When comparing ./pants help with/without pants2, there appears to be a fixed ~2 second overhead on my machine for python3.
Profiling using https://www.pantsbuild.org/invoking.html#profiling-pants doesn't show any significant difference between pants2 and pants, so it's possible that this overhead exists below/before pants' own entrypoint...?
Looking at https://speed.python.org/comparison/ and filtering down to the startup benchmarks.

It would seem that startup time for python has gone up in python 3. Not sure that explains 100% of the regression.
It looks like the scale on that graph is seconds, so I don't think that python 3's inherent startup time could be causing this.
Given that https://www.pantsbuild.org/invoking.html#profiling-pants isn't showing anything useful, another possibility is that the overhead is outside of pants' own entrypoint somewhere... but it's likely that just adding some debug print('in $x: {}'.format(time.time())) lines in a bunch of spots would be sufficient to locate this (and then figure out why our profiling harness isn't catching it).
I've also updated the description: this repros for:
$ time ./pants2 help > /dev/null
real 0m1.952s
user 0m1.207s
sys 0m0.669s
vs
$ time ./pants help > /dev/null
real 0m4.221s
user 0m3.594s
sys 0m0.550s
Found one whole second of additional overhead due to this line: module = importlib.import_module(module_path).
Apply this diff:
diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca161..cd39da454 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -7,6 +7,7 @@ from __future__ import absolute_import, division, print_function, unicode_litera
import importlib
import locale
import os
+import time
import sys
import warnings
from builtins import object
@@ -99,11 +100,23 @@ class PantsLoader(object):
@staticmethod
def load_and_execute(entrypoint):
+ time_before_load = time.time()
+ print("time before load: {}".format(time_before_load))
+
assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
module_path, func_name = entrypoint.split(':', 1)
+ time_after_parsing_entrypoint = time.time()
+ print("time after parsing entrypoint: {} (elapsed since last {})".format(time_after_parsing_entrypoint, time_after_parsing_entrypoint - time_before_load))
+
module = importlib.import_module(module_path)
+ time_after_importing_module = time.time()
+ print("time after importing module: {} (elapsed since last {})".format(time_after_importing_module, time_after_importing_module - time_after_parsing_entrypoint))
+
entrypoint_main = getattr(module, func_name)
assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
+ time_after_getting_entrypoint = time.time()
+ print("time after getting entrypoint: {} (elapsed since last {})".format(time_after_getting_entrypoint, time_after_getting_entrypoint - time_after_importing_module))
+
entrypoint_main()
@classmethod
Running ./pants2 help:
time before load: 1553648263.19
time after parsing entrypoint: 1553648263.19 (elapsed since last 4.19616699219e-05)
time after importing module: 1553648263.79 (elapsed since last 0.609071016312)
time after getting entrypoint: 1553648263.79 (elapsed since last 2.00271606445e-05)
Running ./pants help:
time before load: 1553648215.361344
time after parsing entrypoint: 1553648215.361392 (elapsed since last 4.792213439941406e-05)
time after importing module: 1553648217.008014 (elapsed since last 1.6466219425201416)
time after getting entrypoint: 1553648217.008049 (elapsed since last 3.504753112792969e-05)
--
Initial idea why this could be happening: as of Py3.3, importlib.import_module imports parent packages too: https://docs.python.org/3/library/importlib.html#importlib.import_module.
Also, this only explains half of the slow down.
Initial idea why this could be happening: as of Py3.3, importlib.import_module imports parent packages too: https://docs.python.org/3/library/importlib.html#importlib.import_module.
I don't think this is the issue. Even with implementing our own import loader by following the tutorial from https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly and https://docs.python.org/3/library/importlib.html#approximating-importlib-import-module, and intentionally leaving out the code to resolve parents, we still get an execution of 1.5 - 1.8 seconds due to the line spec.loader.exec_module(module).
diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca161..048c9e5fa 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -7,6 +7,7 @@ from __future__ import absolute_import, division, print_function, unicode_litera
import importlib
import locale
import os
+import time
import sys
import warnings
from builtins import object
@@ -97,13 +98,35 @@ class PantsLoader(object):
def determine_entrypoint(env_var, default):
return os.environ.pop(env_var, default)
+ @staticmethod
+ def load_module(name):
+ """An approximate implementation of import."""
+ spec = importlib.util.find_spec(name)
+ module = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(module)
+ sys.modules[name] = module
+ return module
+
@staticmethod
def load_and_execute(entrypoint):
+ time_before_load = time.time()
+ print("time before load: {}".format(time_before_load))
+
assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
module_path, func_name = entrypoint.split(':', 1)
- module = importlib.import_module(module_path)
+ time_after_parsing_entrypoint = time.time()
+ print("time after parsing entrypoint: {} (elapsed since last {})".format(time_after_parsing_entrypoint, time_after_parsing_entrypoint - time_before_load))
+
+ # module = importlib.import_module(module_path)
+ module = PantsLoader.load_module(module_path)
+ time_after_importing_module = time.time()
+ print("time after importing module: {} (elapsed since last {})".format(time_after_importing_module, time_after_importing_module - time_after_parsing_entrypoint))
+
entrypoint_main = getattr(module, func_name)
assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
+ time_after_getting_entrypoint = time.time()
+ print("time after getting entrypoint: {} (elapsed since last {})".format(time_after_getting_entrypoint, time_after_getting_entrypoint - time_after_importing_module))
+
entrypoint_main()
@classmethod
So, the issue is exec_module(). I don't know why this would be slower in Py3 and am not sure how we could fix it.
Going to take a break from the investigation.
Also note this issue reproduces with simply ./pants vs ./pants2.
$ time ./pants
user 3.91s
sys 0.46s
vs
$ time ./pants2
user 1.11s
sys 0.55s
@stuhood you may want to update the issue description.
Distilling down observations a bit:
Re: import / exec_module - these are the same thing for our purposes - a detail of importing is the fact that it means to execute a module in a new namespace. This is true in python2 and python3 and we really have no control over that - the language requires imports work this way.
Narrowing the testing above and further eliminating the resolution of parents red-herring by switching to __import__:
$ git diff
diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca161..c987228ec 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -4,7 +4,6 @@
from __future__ import absolute_import, division, print_function, unicode_literals
-import importlib
import locale
import os
import sys
@@ -101,7 +100,13 @@ class PantsLoader(object):
def load_and_execute(entrypoint):
assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
module_path, func_name = entrypoint.split(':', 1)
- module = importlib.import_module(module_path)
+
+ import time
+ start = time.time()
+ __import__(module_path)
+ module = sys.modules[module_path]
+ print('>>> import took {}'.format(time.time() - start))
+
entrypoint_main = getattr(module, func_name)
assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
entrypoint_main()
py3:
$ time ./pants -V
>>> import took 1.0852248668670654
1.15.0rc0
real 0m3.074s
user 0m2.675s
sys 0m0.283s
py2
$ time ./pants2 -V
>>> import took 0.541138887405
1.15.0rc0
real 0m1.512s
user 0m1.181s
sys 0m0.227s
-V executes as little code as possible so the overall times here represent almost pure importing. We measure one bit of that - the time it takes to import the pants entrypoint - and this takes 2x as long under python3. We already know from historical work that a large portion of pants start up time is consumed by gathering options from Tasks and Subsystems. This involves importing more code outside of what we measured here. I think it's safe to say then that import being 2x slower explains everything here.
Figuring out why import is 2x slower remains. Is it something we have control over or not?
Eeinteresting - helpful new python3.7 feature:
$ PYTHONPROFILEIMPORTTIME=1 ./pants -V &> /tmp/import-times.txt
Yields: import-times.txt
Via grep "import time:" /tmp/import-times.txt | cut -d: -f2 | sort -n | sed -re "s/\|\s+/| /g" | tail -10
self [us] | cumulative | imported package
...
42707 | 42707 | pants.engine.isolated_process
47359 | 415113 | pants.backend.python.rules.python_test_runner
101299 | 101461 | pants.backend.native.subsystems.binaries.gcc
107934 | 108030 | pants.backend.native.subsystems.binaries.binutils
108567 | 182715 | pants.engine.legacy.options_parsing
151087 | 151087 | pants.backend.native.subsystems.binaries.llvm
182072 | 182072 | pants.engine.build_files
208188 | 208188 | pants.backend.native.subsystems.xcode_cli_tools
362700 | 366863 | pants.engine.legacy.graph
550027 | 1119320 | pants.backend.native.subsystems.native_toolchain
The interesting times here are in the left-hand column - they give the time to import the module itself. The heaviest hitter takes 500ms and their are several 100ms+.
Also interesting is comparing @rule counts:
$ git grep @rule src/**/*.py | cut -d: -f1 | sort | uniq -c | sort -n | tail -10
2 src/python/pants/backend/native/subsystems/binaries/binutils.py
2 src/python/pants/backend/native/subsystems/binaries/gcc.py
3 src/python/pants/backend/native/subsystems/binaries/llvm.py
3 src/python/pants/engine/legacy/options_parsing.py
3 src/python/pants/init/engine_initializer.py
4 src/python/pants/backend/native/subsystems/xcode_cli_tools.py
4 src/python/pants/engine/build_files.py
7 src/python/pants/engine/legacy/graph.py
12 src/python/pants/engine/rules.py
14 src/python/pants/backend/native/subsystems/native_toolchain.py
The ratios seem to track the import self time ratios well.
Perhaps putting an even finer point on things is looking at the tree output of the profile combined with the diff above for emphasis:
$ PYTHONPROFILEIMPORTTIME=1 ./pants -V
...
import time: 636 | 1019874 | pants.pantsd.pants_daemon
import time: 300 | 1030937 | pants.bin.remote_pants_runner
import time: 216 | 1031152 | pants.bin.pants_runner
import time: 165 | 1114329 | pants.bin.pants_exe
>>> import took 1.1143436431884766
import time: 182 | 182 | pants.build_graph.aliased_target
import time: 177 | 177 | pants.build_graph.intransitive_dependency
import time: 206 | 206 | pants.build_graph.prep_command
import time: 216 | 216 | netrc
...
import time: 133 | 133 | pants.backend.python.targets.python_requirement_library
import time: 224 | 224 | pants.build_graph.import_remote_sources_mixin
import time: 212 | 436 | pants.backend.python.targets.import_wheels_mixin
import time: 225 | 660 | pants.backend.python.targets.unpacked_whls
import time: 122 | 122 | pants.backend.python.tasks
import time: 98 | 98 | pants.backend.native
import time: 159 | 256 | pants.backend.native.targets
import time: 94 | 94 | pants.backend.native.subsystems
import time: 93 | 93 | pants.backend.native.subsystems.utils
import time: 215 | 308 | pants.backend.native.subsystems.utils.mirrored_target_option_mixin
import time: 143 | 143 | pants.option.compiler_option_sets_mixin
import time: 687 | 1231 | pants.backend.native.subsystems.native_build_step
import time: 287 | 287 | pants.backend.native.targets.native_artifact
import time: 316 | 2088 | pants.backend.native.targets.native_library
import time: 135 | 135 | pants.backend.native.tasks
import time: 97 | 97 | pants.backend.native.config
import time: 2152 | 2249 | pants.backend.native.config.environment
import time: 141 | 141 | pants.backend.native.subsystems.native_build_settings
import time: 98 | 98 | pants.backend.native.subsystems.binaries
import time: 109589 | 109687 | pants.backend.native.subsystems.binaries.binutils
import time: 175 | 175 | pants.backend.native.subsystems.utils.archive_file_mapper
import time: 102889 | 103064 | pants.backend.native.subsystems.binaries.gcc
import time: 153892 | 153892 | pants.backend.native.subsystems.binaries.llvm
import time: 257 | 257 | pants.backend.native.subsystems.utils.parse_search_dirs
import time: 415 | 671 | pants.backend.native.subsystems.libc_dev
import time: 205049 | 205049 | pants.backend.native.subsystems.xcode_cli_tools
import time: 525163 | 1097523 | pants.backend.native.subsystems.native_toolchain
import time: 242 | 242 | pants.backend.native.targets.packaged_native_library
import time: 156 | 156 | pants.build_graph.dependency_context
import time: 367 | 1098427 | pants.backend.native.tasks.native_task
import time: 951 | 1099377 | pants.backend.native.tasks.native_compile
import time: 687 | 1102447 | pants.backend.native.tasks.link_shared_libraries
import time: 156 | 156 | pex.compiler
import time: 365 | 521 | pex.pex_builder
import time: 272 | 272 | pants.backend.python.subsystems.python_setup
import time: 452 | 1244 | pants.backend.python.subsystems.pex_build_util
import time: 287 | 287 | pex.environment
import time: 374 | 661 | pex.pex
import time: 176 | 836 | pants.binaries.executable_pex_tool
import time: 342 | 1178 | pants.backend.python.subsystems.python_native_code
import time: 465 | 1107543 | pants.backend.python.tasks.build_local_python_distributions
import time: 227 | 227 | pants.backend.python.tasks.gather_sources
import time: 124 | 124 | pants.backend.python.subsystems.python_tool_base
import time: 151 | 274 | pants.backend.python.subsystems.isort
import time: 1069 | 1069 | pants.backend.python.interpreter_cache
import time: 287 | 1356 | pants.backend.python.tasks.python_tool_prep_base
import time: 215 | 1844 | pants.backend.python.tasks.isort_prep
import time: 241 | 241 | pants.backend.python.tasks.isort_run
import time: 135 | 135 | pants.util.fileutil
...
import time: 448 | 18731 | pants.contrib.python.checks.tasks.checkstyle.checkstyle
import time: 314 | 314 | pants.contrib.python.checks.tasks.python_eval
import time: 87 | 87 | pants.contrib.scrooge.tasks
import time: 144 | 144 | pants.contrib.scrooge.tasks.java_thrift_library_fingerprint_strategy
import time: 300 | 300 | pants.contrib.scrooge.tasks.thrift_util
import time: 702 | 1232 | pants.contrib.scrooge.tasks.scrooge_gen
import time: 273 | 273 | pants.contrib.scrooge.tasks.thrift_linter
import time: 149 | 149 | pants.contrib.thrifty.java_thrifty_library
import time: 205 | 354 | pants.contrib.thrifty.java_thrifty_gen
import time: 215 | 215 | multiprocessing.dummy.connection
import time: 313 | 527 | multiprocessing.dummy
It really does look like something about the @rule decorator code is very slow to execute on import in general which is now doubly slow under py3.