Pants: Allow more dynamic source_roots

Created on 23 Apr 2020  路  24Comments  路  Source: pantsbuild/pants

I have a case were I want to give a python package more control about the source_roots,
ATM I have 'Packages/Python' as the source_root of my Python code,
but I have some special cases like:

Packages/Python/foo/foo/__init__.py
Packages/Python/foo/BUILD (which builds foo library with sources=['foo/**/*']),

this causes me to import foo.foo and not just foo.
I guess the source_root should be Packages/Python/foo (so I could import foo and not foo.foo),
but this shouldn't be within pants.toml since its a Package choice

python

Most helpful comment

That said, one way to get the effect is to re-export symbols in the higher-up package:

Packages/Python/foo/__init__.py can:

from .foo_lib import a

This introduces the symbol a into the package foo.

So now you can from foo import a.

This would be the idiomatic way of doing what you want in Python.

(Note that the __init__.py file needs to be in a python_library target, which it will by default unless you explicitly set sources=)

All 24 comments

If the python_library() target was able to declare e.g. source_root='foo', something like:

python_library(
  sources=['foo/**/*'],
  source_root='foo',
  ...,
)

Then pants could use that to infer that when other libraries import this one, they should import not foo.some_file, but just import some_file, if some_file.py is contained within foo/.

Can you explain more about why this is necessary/valuable? Generally it seems like having imports conform to the tree structure of the corresponding sources (starting at one of possibly several non-overlapping source roots) is a good idea.

This doesn't seem possible in any tool today. In fact, with no tool at all, just with raw python, you'd have to do some crazy things with sys.path (or PYTHONPATH) to get this to work.

Basically what you're asking for is overlapping source roots, which is very non-idiomatic in Python... Is this in an attempt to mirror C++ behavior?

That said, one way to get the effect is to re-export symbols in the higher-up package:

Packages/Python/foo/__init__.py can:

from .foo_lib import a

This introduces the symbol a into the package foo.

So now you can from foo import a.

This would be the idiomatic way of doing what you want in Python.

(Note that the __init__.py file needs to be in a python_library target, which it will by default unless you explicitly set sources=)

Here is an example of what I mean, f.e with setup.py I could get from
Screenshot from 2020-04-24 04-45-34
this modules: f, f.g, a, b, c, c.d, c.d.e
You can see that the source_roots can be different, but I'm not sure how to make this, or if even the place to specify this is in the pants.toml source_roots (which I think shouldn't be the place for this)

So it sounds like you want the source roots to be inferred from the locations of the setup.py files, more or less?

In this case yes,
but the idea is to have a BUILD file to specify this, and not set at as a global rule..
its just to show what I mean..

Just think of the problem itself, I have a big repo (monorepo ), and python packages can be in different hierarchies and paths within..

I don't have a src/python layout, also not sure I want to have this, usually you don't have this kind of problem, since each python package has a setup.py and you can create these packages by using it..

I guess if I don't have any other option, I will need to be creative and "force" this with symlinks or something.. (Although I hope this not what I have to do, and really didn't tested it yet, so don't really know if you work like that with symlinks- Again please try to ignore this for now^^)

Interesting. So the problem with this is that to know what all the source roots are, we'd have to look at every BUILD file in the repo.

However we don't usually need to know that, so possibly this isn't really a problem. Usually we only need to know what source root a specific file or target belongs to, so climbing up the file hierarchy to look for a BUILD file signaling a root is cheap.

How would you feel about some other sentinel file, like an empty file named SOURCE_ROOT?

Or .SOURCE_ROOT

Then we only need to check for existence of a file, instead of parsing BUILD files and so on.

@benjyw that sound AWESOME :+1: :+1: :+1:

FWIW, Pants originally required source roots to be declared explicitly and this was done in BUILD files (here is an example just before the feature's death circa 2015): https://github.com/pantsbuild/pants/blob/238143038b99eb087b513e7052a494bc3de6db75/contrib/scrooge/BUILD#L4-L6

We then added support for auto-detecting common layouts and then removed support for specifying source roots in BUILD files, moving the configuration up to the central point we have now. I for one would be happy to see support added back for specifying source roots in BUILD files in particular as preferrable to a type of sentinel file unless the performance of parsing a handfull of BUILDs is deemed onerous, but it shouldn't be afaict.

Yeah, hope to see this soon as well:X
or the other suggested solution, both sounds good (as long when you specify it in a BUILD you don't need to specify this for the rest of the inner hierarchy)

The advantage of a sentinel file like SOURCE_ROOT is not having to eval it. This assumes that we get rid of the (never used) ability to map source roots to languages, which I added prematurely.

@cow0w Will look into adding one of these two options very soon.

It's already semi-sortof the case that we allow sourceroots to be created dynamically (or it was?). If sources were not located within a sourceroot, we would create a sourceroot at the location of the BUILD file that was capturing the sources.

This meant that patterns like:

<root>/project_directory/BUILD             # containing a target that captures `a/b/c/**`
<root>/project_directory/a/b/c/file.py

...would work correctly. People found this very useful for configuration (which didn't have inherent namespacing), but it also works for this case. It's just challenging to explain, perhaps.

The feature @stuhood describes is what Pants does when it cannot find a source root, it assumes the source root is the parent directory of the BUILD file owning the sources. Using that feature as a hack to declare source roots is fragile though since the source root breaks the minute you sprinkle BUILD files in the subtree.

Using that feature as a hack to declare source roots is fragile though since the source root breaks the minute you sprinkle BUILD files in the subtree.

Only if the sourceroot "creation" is sideffecting... and I don't think it is anymore. IIRC, the implementation of this was more like "is there a declared source root? if not, compute one for this target".

EDIT: See https://github.com/pantsbuild/pants/blob/ac5465ad5aee81ac2c1a5462148ba284fb277526/src/python/pants/source/source_root.py#L90-L104

If I have a non-source root path a/b/c and I drop in a BUILD with one target that recursive globs the a/b/c subtree, then for the file a/b/c/d/e/f.py I get a source root of a/b/c and the resulting python module d.e.f. If later I want more fine grained targets and add a BUILD file at a/b/c/d to own a/b/c/d/e/f.py I now get a source root of a/b/c/d and a python module of e.f. The fact the module changed because the source root changed will break imports from a/b/c/h.py.

If later I want more fine grained targets and add a BUILD file at a/b/c/d to own a/b/c/d/e/f.py I now get a source root of a/b/c/d and a python module of e.f. The fact the module changed because the source root changed will break imports from a/b/c/h.py.

Yes, that's true. You have a workaround, which is to define both targets in the BUILD file at a/b/c/... which isn't great. But it might be preferable to a SOURCEROOT file... because the BUILD file then plays the role of the SOURCEROOT file.


Another related question is: what should happen here in the case of dependency inference? Should avoid any magic that we can't add inference for later.

Option #357, leverage BUILD file extensions and deem BUILD.sourceroot special. It's a BUILD file that can contain targets, but it also denotes a source root without inspecting its contents.

That is the best of all magic!

First step is probably to get rid of the sourceroot->language and sourceroot->(source/test/3rdparty) mapping, which we only currently use in v1 golang buildgen and nowhere else.

I added all that back in the day expecting it to be useful, but it really isn't, and doesn't map to reality particularly well in many cases: there's no reason to segregate tests and 3rdparty other than custom, which at least in the case of tests is bad custom IMO.

We believe that this is fixed by #9977, which should be released as part of 1.30.x.

Was this page helpful?
0 / 5 - 0 ratings