Originally reported by: gwideman (Bitbucket: gwideman, GitHub: gwideman)
#!python
setup(...
packages=find_packages(..)
)
on the developer machine will fail to find packages that lack a __init__.py file, as is allowed in Python 3.3. However, such packages listed explicitly: packages=['mypkg'] do appear to get included and later installed.
Note: When testing this, before each test be sure to delete all generated metadata, including that which setup may previously have placed in the original source directory, as it seems that setup may use metadata created on a previous run in order to include files.
This is part of a general problem reported in issue #83, but I've logged it separately as it's specifically about setuptools.
_Original comment by_ gwideman (Bitbucket: gwideman, GitHub: gwideman):
@pje
the net effect of having a namespace package inside a
non-namespace package was to make the subpackage a
non-namespace package
OK, but then there seems to be no way to specify, either to Python, or to package-user programmers, that a subdir is _not_ intended to be a package.
update to the PEP is advisable
I have been leaning that way for some time, but as a non-luminary I don't carry much weight :-).
If this comes up for more deliberation, I do advocate that the overall package behavior be not just deterministic and implementable, but also comprehensible and explicable as a whole. That is, the whole of the __init__.py and non-__init__.py should be described as a coherent whole, and not as two distinct things whose interactions and side effects are left to the imagination.
Nick Coghlan proposed an "explanation requirement" with respect to directly executing a module from a package. I think it applies to the whole package concept. https://mail.python.org/pipermail/import-sig/2012-March/000423.html
_Original comment by_ pje (Bitbucket: pje, GitHub: pje):
It was once briefly discussed in relation to PEP 402 (420's predecessor) that the net effect of having a namespace package inside a non-namespace package was to make the subpackage a non-namespace package, since its path is limited to one item (the subdirectory of the parent). But neither PEP specs this explicitly, it's just implied. There didn't seem to be any reason to restrict this.
Another issue that's come up recently on distutils-sig is that nspkg.pth support is broken for PEP 420 currently -- the magic really shouldn't be running if PEP 420 support is available.
I'm beginning to think that an update to the PEP is advisable, to both specify how to detect support, and spelling out package detection heuristics in a better way. At any rate, the discussion needs to expand beyond the ticket tracker, probably to import-sig and distutils-sig.
_Original comment by_ gwideman (Bitbucket: gwideman, GitHub: gwideman):
Could someone clarify whether there was explicit thinking/spec regarding whether PEP420 package rules apply within a package that has an
__init__.py file? From Wyatt's earlier comment and blog post, I believe he thinks that PEP420 applies even within a __init__.py-marked directory tree, at least so far as import's behavior. He finds that surprising (as do I), and proposed heuristics that disable that behavior. Was there some deliberation on that we can read about? I'd rather be persuaded that it's a good thing than whinge about it in ignorance :-).
_Original comment by_ pje (Bitbucket: pje, GitHub: pje):
Heh. If you think this is hard in Python, you should check out JavaScript's module systems, or lacks thereof. ;-)
But no, essentially, there is no way to be unambiguous without adding __init__.py, and it's still recommended that you do, unless you're using namespace packages.
And at the moment, setup.py and distribution metadata are supposed to catalog the existence of same. So that is the only other way to know: use .egg-info or .dist-info. Any packages that don't use __init__.py and aren't declared in distribution info can't be picked up unambiguously.
_Original comment by_ gwideman (Bitbucket: gwideman, GitHub: gwideman):
"that there is no a priori way to tell that an implicit
namespace package is a namespace package from the file
system alone"
That's the conclusion I've come to as well.
I know a lot of smart people put a lot of thought into this. I've read some of the reams of discussion on import-sig (notably including Nick Coghlan's https://mail.python.org/pipermail/import-sig/2012-March/000423.html and subsequent).
But I am puzzled how the package concept is viable if it lacks a way for packages to specify (explicitly or implicitly) their own structure unambiguously. I didn't reach any explanation of how everything that interacts with packages (programmers, IDEs, tools) is intended to work with packages that lack a way to unambiguously specify their structure. Is there one?
_Original comment by_ pje (Bitbucket: pje, GitHub: pje):
A PEP 420 package _is_ a namespace package - that's the whole concept behind PEP 420.
One thing that was discussed in relation to PEP 420 and its predecessors on python-dev and import-sig, is that there is _no_ a priori way to tell that an implicit namespace package is a namespace package from the file system alone. If you have a directory tree ./foo/bar/baz/spam.py, where ./foo and ./foo/bar don't have any importable files, then it's possible that foo.bar.baz and its parents are all namespace packages. So you'd have to scan all possible subdirectories to unlimited depth (while detecting symlink loops) to correctly "detect" all possible namespace packages... and then you can have false positives.
In the presence of this ambiguity, the tools should refuse the temptation to guess.
(Also, if you _do_ make find_packages() guess, you'll have a counter-problem, where Distribution will have to find out on its own that those packages are namespace packages -- my guess is that they won't be handled properly in all installation cases if you don't. In other words, guessing will be required on both sides of the fence!)
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
@pje excellent suggestion to use namespace_packages directly.
I do think, however, that PEP420 doesn't just apply to namespace packages but to any package, in which case the use of namespace_packages only serves as another improvement in the heuristic.
_Original comment by_ pje (Bitbucket: pje, GitHub: pje):
I forgot to mention/may not have made this clear: currently, namespace packages must be _explicitly listed_ in the namespace_packages argument to setup(), which means there is no real reason from an end-user perspective to have find_packages() detect them. That's why it would be simplest to change setuptools.dist:check_nsp to automatically add the namespace packages to the list of packages, instead of giving an error -- at that point, nobody should even _notice_ that find_packages() isn't finding the packages in question. (Though the docs should still be updated to clarify that any __init__-less packages must still be manually listed in namespace_packages.)
_Original comment by_ pje (Bitbucket: pje, GitHub: pje):
The easiest way to fix this problem is to simply auto-add namespace_packages to the packages list in the Distribution object. find_packages should _not_ be used for this, because of the possibility of false positives.
In other words, if you want to include namespace packages, list them in namespace_packages, and _don't_ list them in the regular packages. This means that find_packages should _only_ find traditional packages, not namespace packages.
The explicit listing of namespace_packages needs to happen for backward compatibility anyway, and setuptools already installs those without __init__.py on older Pythons.
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
Case in point is the recent refactorings I made to the find_packages function.
I wanted to be rid of the 'stack' concept (which mutated a list for the purposes of iteration) and the 'out list' (which initialized an empty list and populated it incrementally for iteration). I wanted instead something that leveraged clean iteration over the essential objects.
Once I did that though, it becomes obvious why a include_ns_packages parameter in find_packages would significantly complicate the implementation.
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
Especially when I see this:
def do_something(switch=False):
if switch:
do_one_thing()
else:
do_another()
Of course, it's usually more complicated than that, but that function could be re-written to just have the caller call the function he cares about.
More importantly, an important lesson I've taken from cleanroom development, functional programming, and test-driven design is that functions should be small, isolated, and with as few branches and side-effects as possible. Adding a boolean flag that creates in internal branch ties the external interface to a branch in the internal behavior. This means that to comprehend fully the meaning of the flag, one must comprehend the entire function and all of the states it might be in.
Proper variable naming can help indicate intent, but it won't overcome the increasing cognitive burden of the behavior.
This becomes increasingly apparent when one attempts to refactor such a function only to find that the single boolean parameter must be passed unused to reach its inner function.
I realize my explanation probably sounds esoteric, but it stems from my attempt to explain what seems intuitive to me after reading books like Refactoring.
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
In general, I avoid parameters, especially simple booleans, that select behavior. I'd prefer accepting/passing strings or actual functions to select varying behaviors.
But even if one assumes a keyword argument as Wyatt suggests, I'd be inclined to have an alias where the default is True.
_Original comment by_ wyatt (Bitbucket: wyatt, GitHub: wyatt):
I was just thinking about the idea of adding a keyword arg, too. It would only be relevant/active on Python 3.3 and up. Perhaps it would be disabled by default on 3.3 too so that people who aren't using namespace packages--which is probably the common case--wouldn't need to think about it.
Something like find_packages(include=['mypkg'], include_ns_packages=True) seems like a decent API. And for projects that aren't using namespace packages in 3.3 and up, find_packages() with no args would work just like it does today.
_Original comment by_ gwideman (Bitbucket: gwideman, GitHub: gwideman):
@Jason R. Coombs: "defer some extra weight [to me]" Hahaha -- that's a lot of responsiblity!
First, I thought I posted this additional comment, but perhaps not:
Is adding an optional named argument to the existing function to select the 420-inclusive behavior a way forward that avoids the naming issue? Or is this beyond the pale for some reason?
If a new function is the way to go, then I certainly see the reasonableness of all your points.
If long names are out, then personally I lean strongly to the find_420_package name. Like I mentioned, it precisely captures the intent of the function, and its "technical" appearance correctly conveys that there's some nuances to understand.
I strongly dislike the names that change just "find" to "locate" or "discover".
I would also be quite happy with find_package_ex, or find_package2. These aren't beautiful, but at least they convey that this function is newer in some way than the old one.
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
@gwideman I do agree with your sentiment. I am concerned about the counter sentiment which is that this function is meant to be a convenience function, providing a shortcut for specifying the project names explicitly (and sometimes redundantly). A secondary advantage is that the function adds to the simplicity and elegance of the setup function. Having a function name like 'find_distribution_and_namespace_packages', while some environments might auto-complete, many (most?) won't. And furthermore, for a function like this, the aesthetic is relevant. A clumsy name, while functional, would detract from use (and thus the appeal of setuptools).
Let me elaborate on my thoughts about the alternative names. Yes, they do mirror find_packages semantically without reflecting the disparity. This aspect was by design, with the intention that (a) it would be documented clearly (probably with a show-by-default deprecation warning), and (b) the new function would entirely supersede the old name. So, I'm somewhat reluctant to have a new function carry the baggage of a legacy function that's to be deprecated.
That said, you make some good arguments, and you've pushed me closer to the 'find_420_packages' name.
I should mention, I've considered 'find_all_packages' but rejected it because 'all' is vague yet still clumsy. I've also considered the Microsoft technique, find_packages_ex, but that's clumsy too and even more vague.
Given this ticket/PR is your initiative, I'm inclined to defer some extra weight to your judgment, so please consider these aspects and give me your favorite (or two).
Thanks again for your help on this.
_Original comment by_ gwideman (Bitbucket: gwideman, GitHub: gwideman):
@ Jason: Where function names suggest an overlap of functionality, I strongly favor that the distinctions in the names actually indicate what the functional distinctions are.
Cases where functions are distinguished only by arbitrary differences in their names tend to present endless confusion, searches of docs, and so on. The difference between "find", "locate" and "discover" does not pertain to scope, which is the salient difference in functionality.
Also, having two overlapping functions with names that are not alphabetically adjacent (in docs and autocomplete) misses the opportunity to alert the unwary programmer that there's a choice to be made.
If the distinctness of meaning is: "find_distribution_and_namespace_packages", then I, for one, am in favor of calling it that. It's long, but we have copy/paste and IDE autocomplete.
If that's considered too long, then "find_420_packages", while somewhat cryptic, is at least precise. If it suggests cruft, that's perhaps a good thing -- that here some complexity lies, please be fully informed of the nuances of this function.
Side question:
"once Python 3.2 is the baseline" <-- I'm puzzling what's the salience of 3.2 to this topic, or did you mean 3.3?
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
After further consideration on this issue, particularly around backward-compatibility, I'm inclined to back out the changes as published and instead create a new function, designed to supersede find_packages, with PEP-420 support.
This approach would give opt-in capability for projects that use PEP-420 package structures, but would leave the existing functionality unchanged.
I've been brainstorming over what this new function could be called. I'm leaning toward "discover_packages", "locate_packages", or "find_420_packages". The latter provides better clarity of its original purpose, but will appear like cruft in the future (once Python 3.2 is the baseline).
_Original comment by_ wyatt (Bitbucket: wyatt, GitHub: wyatt):
The heuristics are tricky. Here's what I came up with (my original patch included similar heuristics):
__init__.py (these directories _could_ contain .py files, e.g. a directory tree for a "scaffold").As far as excluding certain hard coded directories, it seems like that shouldn't be necessary, since that's basically the purpose of the new include arg. I.e., you say you want to include 'mypkg*' and then the sibling directories of mypkg (like build and docs) will automatically be excluded.
In addition, you can use a little trick to exclude non-package directories that are inside a package: add an extension to them (e.g., mypkg/scaffolds.data). These will be excluded automatically by find_packages(), and they're also not importable.
I also wrote a blog post about this: http://wyattbaldwin.com/2014/02/07/pep-420-namespace-packages-somewhat-surprising/.
_Original comment by_ arfrever (Bitbucket: arfrever, GitHub: arfrever):
I suggest to skip directories matching some hardcoded patterns by default (unless such directories are matched by element of include parameter):
build* (When e.g. python setup.py build is run multiple times, contents of build directory would be copied to a subdirectory of itself (build/lib/build/lib/build/lib/...) :( .)doc* (Often with Sphinx-specific conf.py file)_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
By the way, I don't think there's a "big plan" regarding packages. PEP-420 was the plan and where it's deficient, more discussion should probably happen in python-dev.
I was there for the implementation of PEP-420. I feel like it's a good thing to simply and directly allow folders to represent packages. In the past, I was always bothered by the fact that it took two entities (a directory and an __init__ file) to represent a package.
Regardless, the implementation for Python 3.3 and 3.4 is set, so setuptools should support those models to the best of its ability. I think for find_packages, it might be worthwhile to do more heuristics, namely -- only inferring a directory is a package if it has modules somewhere within it.
_Original comment by_ gwideman (Bitbucket: gwideman, GitHub: gwideman):
Jason:
If we stick with the generous (and probably more correct) implementation
to accept any directory as a package, package maintainers are going
to have to be a lot more explicit about which directories are packages.
It almost renders find_packages useless.
I think you are reaching a level of head scratching similar to my 2013-11-05 comment. It seems to me that PEP 420's implications weren't fully thought through, as it undermines the ideas about what the package concept facilitates and what it protects against (not just causing problems for find_packages() ).
Perhaps someone at a higher pay grade could distill a little better what the big-picture goal is for packages, and figure out whether PEP 420 gets there. To that end, it might be worthwhile recording in more detail the implications that you've encountered.
For example, your discussion of package maintainers needing "to be a lot more explicit" -- what form do you envision that taking?
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
Add support for PEP 420 namespace packages to find_packages()
On Python 3.3+, find_packages() now considers any subdirectory of the
start directory that's not a regular package (i.e., that doesn't have an
__init__.py) to be a namespace package. Because this will often
include non-package directories, a new include argument has been added
to find_packages().
include can make it easier to narrow which directories are considered
packages instead of having to specify numerous excludes. In
particular, it's an easy way to keep top-level, non-source directories
from being considered packages.
The other way this supports PEP 420 is by making sure __pycache__
directories are never added to the list of packages.
Refs issue #97
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
I've merged the pull request, but I find that the output includes quite a few directories not previously covered. Consider the comparison on setuptools itself:
> python
Python 3.4.0rc3 (v3.4.0rc3:8a81cdab3e9d, Mar 9 2014, 20:24:50) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import setuptools
>>> setuptools.find_packages()
['build', 'command', 'dist', 'docs', 'foo', 'setuptools', 'temp', 'tests', '_markerlib', 'build.lib', 'docs.build', 'docs._templates', 'docs._theme', 'setuptools.command', 'setuptools.tests', 'tests.shlib_test', 'build.lib.setuptools', 'build.lib._markerlib', 'docs.build.doctrees', 'docs.build.html', 'docs._theme.nature', 'setuptools.tests.indexes', 'setuptools.tests.svn_data', 'build.lib.setuptools.command', 'build.lib.setuptools.tests', 'docs.build.html._sources', 'docs.build.html._static', 'docs._theme.nature.static', 'setuptools.tests.indexes.test_links_priority', 'setuptools.tests.indexes.test_links_priority.simple', 'setuptools.tests.indexes.test_links_priority.simple.foobar']
> py -2.7
Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import setuptools
>>> setuptools.find_packages()
['setuptools', '_markerlib', 'setuptools.command', 'setuptools.tests']
I think perhaps the example above using the heuristic of testing for the presence of *.py files may be better. On the other hand, Python will import any directory.
> py
Python 3.4.0rc3 (v3.4.0rc3:8a81cdab3e9d, Mar 9 2014, 20:24:50) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir('foo')
[]
>>> import foo
>>>
If we stick with the generous (and probably more correct) implementation to accept any directory as a package, package maintainers are going to have to be a lot more explicit about which directories are packages. It almost renders find_packages useless.
Some more consideration is going to have to go into this idea.
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
Add support for PEP 420 namespace packages to find_packages()
On Python 3.3+, find_packages() now considers any subdirectory of the
start directory that's not a regular package (i.e., that doesn't have an
__init__.py) to be a namespace package.
The other way this supports PEP 420 is by making sure __pycache__
directories are never added to the list of packages.
Fixes issue #97
_Original comment by_ wyatt (Bitbucket: wyatt, GitHub: wyatt):
I just pushed two more alternative approaches. The first uses __import__ to determine which directories are packages, and the second works similarly to the existing version but also considers _all_ directories under where to be namespace packages when running under Python 3.3+.
_Original comment by_ wyatt (Bitbucket: wyatt, GitHub: wyatt):
I ran into this issue yesterday and took a quick stab at a patch.
It works pretty much the same as before, but it's necessary to specify what to include when there are top level, non-package directories that contain .py files.
#!python
setup(
# ...
find_packages(include=['my_pep420_ns_package'])
# ...
)
It's slightly more complicated than I'd prefer because of the need to specify include in some cases. I was thinking about putting the where directory on sys.path and then trying to import subdirectories, but I wanted to try a simple(r) approach first.
On the other hand, maybe allowing includes to be specified is generally useful. It also makes find_packages() work in a way that's more similar to how the basic packages list works (i.e., you only specify what to include). In fact, if distutils.core.setup() would allow packages to be specified using wildcards, find_packages() probably wouldn't be necessary in most cases.
Anyway, here's the patch:
#!diff
diff -r f19978009ac8 setuptools/__init__.py
--- a/setuptools/__init__.py Thu Feb 06 09:31:48 2014 -0500
+++ b/setuptools/__init__.py Thu Feb 06 13:09:25 2014 -0800
@@ -1,5 +1,6 @@
"""Extensions to the 'distutils' for large or complex distributions"""
+import glob
import os
import sys
import distutils.core
@@ -27,31 +28,58 @@
# Standard package names for fixer packages
lib2to3_fixer_packages = ['lib2to3.fixes']
-def find_packages(where='.', exclude=()):
+def find_packages(where='.', exclude=(), include=()):
"""Return a list all Python packages found within directory 'where'
'where' should be supplied as a "cross-platform" (i.e. URL-style) path; it
- will be converted to the appropriate local path syntax. 'exclude' is a
- sequence of package names to exclude; '*' can be used as a wildcard in the
- names, such that 'foo.*' will exclude all subpackages of 'foo' (but not
- 'foo' itself).
+ will be converted to the appropriate local path syntax.
+
+ 'exclude' is a sequence of package names to exclude; '*' can be used
+ as a wildcard in the names, such that 'foo.*' will exclude all
+ subpackages of 'foo' (but not 'foo' itself).
+
+ 'include' is a sequence of package names to include. If it's
+ specified, only the named packages will be included. If it's not
+ specified, all found packages will be included (this is the same as
+ specifying include='*'). 'include' can include wild card patterns
+ just like 'exclude'.
+
+ The list of includes is built up first and then any excludes are
+ removed from it.
+
"""
+ from fnmatch import fnmatchcase
+ include = list(include or '*')
out = []
stack=[(convert_path(where), '')]
+ pep420 = sys.version_info[:2] >= (3, 3)
+ in_regular_pkg = False
while stack:
where,prefix = stack.pop(0)
for name in os.listdir(where):
fn = os.path.join(where,name)
+ candidate = prefix + name
+ for pat in include:
+ if fnmatchcase(candidate, pat):
+ break
+ else:
+ continue
+ in_regular_pkg = os.path.isfile(os.path.join(fn, '__init__.py'))
+ # Namespace packages can't be nested inside regular packages
+ in_pep420_ns_pkg = (
+ not in_regular_pkg
+ and pep420
+ and glob.glob(os.path.join(fn, '*.py'))
+ )
looks_like_package = (
'.' not in name
and os.path.isdir(fn)
- and os.path.isfile(os.path.join(fn, '__init__.py'))
+ and (in_regular_pkg or in_pep420_ns_pkg)
)
if looks_like_package:
- out.append(prefix+name)
- stack.append((fn, prefix+name+'.'))
+ out.append(candidate)
+ stack.append((fn, candidate + '.'))
for pat in list(exclude)+['ez_setup']:
- from fnmatch import fnmatchcase
out = [item for item in out if not fnmatchcase(item,pat)]
return out
_Original comment by_ gwideman (Bitbucket: gwideman, GitHub: gwideman):
Yes, the more I look at PEP 420-style packages, the less I am able to see what distinction there is between a package, and simply a directory tree containing python files at any level, a tree that import+sys.path can navigate arbitrarily.
Packages become whatever directories can be reached by import combined with sys.path. And neither the imports or sys.path are readily available to packaging software.
It may be that "packages" as a special level of modularity were an unnecessary idea in the first place, and it's as well to discard them.
But if packages _are_ thought to be an important level of modularity, then it should be up to the package developer to positively ordain them (as with __init__.py), not left up to the caller and whatever the caller's sys.path happens to be. At least, so I'm thinking at the moment.
So I'm a bit mystified what the big plan is regarding packages.
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
I'd like to say that Setuptools should support PEP 420 packages. The harder question is how should find_packages discover such packages? PEP 420 allows importing of any importable filename ending in .py in any directory in sys.path. However, sys.path isn't relevant when building distributions and discovering packages.
I imagine setuptools could include all suitable directories, possibly omitting explicitly excluded paths. This change would be backwards-incompatible, but could provide packagers with options to produce compatible dists.
I considered using Python's import machinery to find packages, but I don't believe the import machinery provides any discovery tools.
The other thing to consider - if setuptools supports this mode, it will not work on other versions of Python, so will require Python 3.3+.
This gives me an idea - perhaps setuptools can use the package metadata to determine when PEP 420 package discovery is enabled. It could inspect the Trove classifiers and if Python versions are specified and only include versions >= 3.3, the feature would be enabled.
That technique would be somewhat implicit, but properly-documented, it could provide an elegant way to allow a packager to both include PEP 420 packages and declare the Python requirement at the same time.
Another issue that's come up recently on distutils-sig is that nspkg.pth support is broken for PEP 420 currently -- the magic really shouldn't be running if PEP 420 support is available.
This issue has since been addressed in #805.
That's why it would be simplest to change setuptools.dist:check_nsp to automatically add the namespace packages to the list of packages, instead of giving an error -- at that point, nobody should even notice that find_packages() isn't finding the packages in question. (Though the docs should still be updated to clarify that any init-less packages must still be manually listed in namespace_packages.)
This sounds like a fine recommendation, which I will attempt to implement.
In the issue-97 branch, I've attempted to implement the recommendation, but I'm pretty sure this doesn't solve the problem. In particular, it still doesn't address the issue of a very basic use case like zope.interface. If zope/__init__.py does not exist, setuptools.find_packages() will still not find zope/interface/__init__.py.
Anyone know if there has been any more work on this? I'm currently in the situation of wanting to switch from using an "old school" namespace package to a PEP-420 one. In my case I have only a single (company-wide) namespace package, and various distributions that install packages into it, so for me, the solution @jaraco was working on in the issue-97 branch would be great, if only find_packages was updated as well, such that it also looks at namespace_packages when taking the decision on whether it should enter a directory in its search.
What I mean is, if find_packages encounters:
some_project/setup.py
some_project/my_company/foo/__init__.py
some_project/my_company/foo/stuff.py
some_project/my_company/bar/__init__.py
some_project/my_company/bar/morestuff.py
Then find_packages would normally not enter into my_company (the PEP-420 ns package). But if it was changed to also look at namespace_packages, it would enter, because I have my_company in there (I already do that). @jaraco: Would something like that work or am I babbling? Did you have any plans like that for your branch?
I would be perfectly fine with having to explicitly specify all my namespace packages in namespace_packages (in fact, I already do that with this single ns package that I have). Having find_packages work would still be a great boon compared to explicitly having to list all packages.
My recommendation for now is that package maintainers continue to use __init__.py files in their namespace packages to mark the folder as a package for find_packages. Either that or enumerate the packages explicitly and not use find_packages. I haven't made any advancement on the branch, but I do agree, what you propose would be preferable.
I've flagged this issue as help-wanted because I simply don't have time to address all of these issues on my own. Please feel free to grab the work I've done and advance it, or start again from the current master.
@jaraco Alright, I realized that what I said earlier doesn't really make sense, since find_package has no way of getting a hold of what was passed in to namespace_packages (right?).
We're moving to PEP 420, so what I've done for now is to keep a setup_helper.py next to my setup.py with:
from distutils.util import convert_path
from fnmatch import fnmatchcase
import os
def find_packages(where='.', exclude=(), include=('*',), pep420_dirs=()):
"""Our own version of the setuptools find_package helper
The additional 'pep420_dirs' parameter specifies directories which are
PEP 420 packages. These will be considered packages despite lacking a
__init__.py. We use this until setuptools issue #97 is fixed.
"""
where = convert_path(where)
exclude += ('*__pycache__',)
packages = []
for root, dirs, files in os.walk(where, followlinks=True):
all_dirs = dirs[:]
dirs[:] = []
for dir in all_dirs:
full_path = os.path.join(root, dir)
rel_path = os.path.relpath(full_path, where)
package = rel_path.replace(os.path.sep, '.')
included = any(fnmatchcase(package, pat=pat) for pat in include)
excluded = any(fnmatchcase(package, pat=pat) for pat in exclude)
has_init = os.path.isfile(os.path.join(full_path, '__init__.py'))
if ('.' in dir) or not (rel_path in pep420_dirs or has_init):
continue
if included and not excluded:
packages.append(package)
dirs.append(dir)
return packages
and then I do
packages=find_packages(pep420_dirs=['my_company']),
namespace_packages=['my_company'],
in the setup call. It's not too pretty, but works well enough until there's a better solution.
@jaraco I haven't hacked on setuptools before, but I could give it a shot.
Though, the only way I can think of to make the find_packages search be guided by both presence of __init__.py and namespace_packages is to somehow delay the entire search until namespace_packages is known.
So find_packages would do nothing but store away the fact that it has been called, its parameters, and that the search is still a TODO. And then at some later stage the actual search is done. But this would break the use of find_packages in any other context except for the call to setup(..) :(
Do you see some other way?
Perhaps find_packages should be left completely alone, and some sentient value should instead be stored, e.g. packages=RequestFindPackages(...), and then at a later stage, if isinstance(packages, RequestFindPackages) the search is performed (obviously with some better name...).
@jaraco One simple solution that I kind of like, to avoid having to defer the search until namespace_packages is known, would be to change the behavior of find_packages such that a directory in include is considered a package despite lacking __init__.py, and then instruct users to add their PEP 420 packages to include.
So it would be something like
implicit_namespace_packages=[...]
setup(
...
packages=find_packages(include=[...] + implicit_namespace_packages, ...)
namespace_packages=[...] + implicit_namespace_packages
)
I know it requires the user to introduce a variable to avoid having to type the list of implicit namespace packages twice, but I could live with that. For many use cases (e.g. a "suite" of distributions) I think the list of PEP 420 packages should be short (often just a single one), and the user might then even skip the variable.
I think it makes sense that if we've now discovered that find_packages needs more info to do its job in the presence of PEP 420 packages, that the info must be passed in as a parameter.
What do you think?
Regarding the first idea (deferring the search until namespace_packages is known), we could of course just do the search over again, but this time guided by namespace_packages, and during the deferred search only add packages which are not already added. A little more inefficient, but would preserve the behavior of find_packages for those who use it outside of setup(...).
But the problem with this of course is that if find_packages records the fact that it's been called (and its parameters), there's no way to know in the later stage that find_packages was called to construct packages, or if it just happened to be called for some other purpose before setup was called, so could lead to unwanted behavior. Or, put another way, there's no way to know if packages was constructed manually or by a call to find_packages.
Yet another crazy idea: We could allow dict entries in packages, to signify "search requests":
setup(
packages=[
'some_hardcoded_package',
{ 'exclude': '*.tests' }
],
namespace_packages=['my_company']
)
The dicts would support the normal find_packages parameters as keys, and would be expanded to the result of the search at a later time when namespace_packages is known. It's a little funky, but I think I like it.
Or maybe it's clearer to make it a name tuple FoundPackages à la
setup(
packages=[
'some_hardcoded_package',
FoundPackages(exclude='*.tests')
],
namespace_packages=['my_company']
)
No matter what, the downside of this approach is that find_packages then still won't have the capability, for the cases where someone is using it in another context. But perhaps we could also give it an implicit_namespace_packages parameter, for those use cases.
Of my ideas so far, the one that is the least amount of code is to change the meaning of include. This is what it would look like:
diff --git a/setuptools/__init__.py b/setuptools/__init__.py
index 54577ce..ac7f524 100644
--- a/setuptools/__init__.py
+++ b/setuptools/__init__.py
@@ -52,18 +52,24 @@ class PackageFinder(object):
specified, only the named packages will be included. If it's not
specified, all found packages will be included. 'include' can contain
shell style wildcard patterns just like 'exclude'.
+
+ Note: PEP 420 packages will not be included unless they are explicitly
+ listed with their full names in 'include', so if you use PEP 420
+ you would use something like find_packages(include=('*', 'foo'))
+ if foo is a PEP 420 package.
"""
return list(cls._find_packages_iter(
convert_path(where),
cls._build_filter('ez_setup', '*__pycache__', *exclude),
- cls._build_filter(*include)))
+ cls._build_filter(*include), include))
@classmethod
- def _find_packages_iter(cls, where, exclude, include):
+ def _find_packages_iter(cls, where, excluded, included, include):
"""
- All the packages found in 'where' that pass the 'include' filter, but
- not the 'exclude' filter.
+ All the packages found in 'where' that pass the 'included' filter, but
+ not the 'excluded' filter. If a package is in 'include', it's considered
+ a package even if it lacks an __init__.py.
"""
for root, dirs, files in os.walk(where, followlinks=True):
# Copy dirs to iterate over it, then empty dirs.
@@ -76,11 +82,11 @@ class PackageFinder(object):
package = rel_path.replace(os.path.sep, '.')
# Skip directory trees that are not valid packages
- if ('.' in dir or not cls._looks_like_package(full_path)):
+ if ('.' in dir or (not cls._looks_like_package(full_path) and package not in include)):
continue
# Should this package be included?
- if include(package) and not exclude(package):
+ if included(package) and not excluded(package):
yield package
# Keep searching subdirectories, as there may be more packages
And then e.g.
setup(
packages=find_packages(include=['*', 'my_company']),
namespace_packages('my_company')
)
I realize I've had a misconception about one thing in my thinking: I thought that you still had to list PEP 420 packages in the namespace_packages parameter to setup. But I realize now that you only need to do this for old style namespace packages (those with __init__.py). So ignore all my talk about trying to avoid repetition.
In light of that, I think the absolutely best/easiest solution is to simply add an namespace_packages parameter to find_packages, where the use must list any PEP 420 packages explicitly. I'll prepare a PR to do so.
Hello.
I made a small patch here with something that works for me, but needs a bit more work after discussion: https://github.com/pypa/setuptools/pull/943
This is a solution so that the simple case works without requiring any extra parameters be used.
I modified sampleproject here: https://github.com/illume/sampleproject
Note how there is only a file sample/sample.py and no __init__.py.
This is the simplest possible package. I think this is a very important use case for many people, and I don't think find_packages() should require doing anything special with any extra parameters for this case. I've seen many people trip over this issue, since it's pretty much the simplest type of package you can make.
It works for me on the sampleproject above, and some other projects I tested. I handled the case where it ignores directories with site-packages in them (probably a venv, definitely not a package).
There are a few failing tests still. Also, I need to make it only look for namespace packages in python 3.3+
What do you think?
@illume I think if you read the discussion above, there were concerns that simply using "if a dir has any .py files below it, it's a package" as the guiding logic in the search would make it too liberal for many projects.
Thanks @estan for your PR. It's a thorough implementation for our consideration. I'm considering it, but it does have a couple of smells that I'd like to avoid, namely that it passes a parameter through several calls, but also because that parameter has to be supplied twice (for packages and namespace_packages).
Someone recently suggested a different approach - using a sentinel file to signal a namespace package. Consider, for example, allowing a __nspkg__ 0-length file (or similar) to declare a directory to be a namespace package.
Such an approach would have the simple elegance of being stored with the source control system and being declarative. It would have the additional benefit of enabling the replacing the packages and namespace_packages parameters to discover these packages, such that every setup.py might look simply like:
setup(
packages=find_packages(),
namespace_packages=find_namespace_packages(),
)
What do the subscribers of this ticket think of such an idea?
@ncoghlan: How does such a proposal strike you? My instinct is that such a convention should be more widely reviewed.
How about a find_packages that works correctly on python 3.3+, but falls back to current behaviour for older pythons? Legacy find_packages can still work as is for older packages.
The problem with adding a separate namespace_packages function, or a separate argument is that I don't think people will know to look for them. How do users figure out the reason their package that works in python doesn't work when packaged?
Common folders like doc, and virtualenv ones can be removed from consideration by find_packages by looking for things like 'site-packages' and common sphinx files. Note, that we already have an exclude argument in find_packages.
Is there a tool to test setuptools changes over a wide range of packages already? It would be nice to base decisions on data. I'd be happy to do the work to find out if filtering out false positives will be effective. Please let me know if there is a tool like this already, I'd prefer not to write it if it already exists :)
When people have a sample/sample.py which works as a package in python, and then doesn't with packaging, we have a major packaging wart.
If find_packages is not going to be fixed, perhaps we can stop recommending to use it in the docs, and instead recommend a 'new_find_packages' or whatever a new api could be called?
@jaraco: I was under the impression that in current versions, you don't have to list implicit namespace packages in namespace_packages (the setup parameter). So there would be no duplication, right? My new parameter would be the only place where you have to list them. But I may have misunderstood that.
That said, I really like the approach with __nspkg__ markers that you describe.
@illume: "that works correctly on python 3.3+" <-- I think that trying to figure out what this means is what has made this issue so longstanding. People seem to think that taking all directories which have .py files below them as implicit namespace packages is a too liberal approach, even if the user can exclude things, and even if there are a set of default exclusions. If you look back in this thread you'll see that it has already been discussed.
@jaraco: Regarding "passing a parameter through several calls", do you mean that namespace_packages is passed on from find to find_packages_iter? If so, how else should it be implemented?
@estan I don't think a broken by default find_packages is good, even if people have discussed that before (yes, I did read the previous discussion). The accepted PEP 420 now says that packages are directories with .py files in them. Rather than dismiss the approach I suggested out of hand I'd like to look into it to see if it is a real problem. I haven't been able to find a testing tool for setuptools, so it looks like I'll have to hack something up.
When I help people with packaging this is one of the main issues that keeps coming up over and over again, and I doubt having a separate argument or function will help. Packaging in python is already arcane enough without adding another parameter or function call you'd need to add to make simple packages work.
@illume: Yes, personally I actually do think that approach could be viable. Sorry if I sounded dismissive.
Lets try to investigate what the impact would be for existing projects. Of the 100 most downloaded projects on PyPI, here are the 29 of them that uses find_packages and how they call it:
amqp-2.1.4 find_packages(exclude=['ez_setup', 't', 't.*'])
anyjson-0.3.3 find_packages(exclude=['ez_setup', 'examples', 'tests'])
awscli-1.11.44 find_packages(exclude=['tests*'])
beautifulsoup4-4.5.3 find_packages(exclude=['tests*'])
billiard-3.5.0.2 find_packages(exclude=['ez_setup', 't', 't.*'])
botocore-1.5.7 find_packages(exclude=['tests*'])
celery-4.0.2 find_packages(exclude=['t', 't.*'])
cryptography-1.7.2 find_packages(where="src", exclude=["_cffi_src", "_cffi_src.*"])
Django-1.10.5 find_packages(exclude=['django.conf.project_template', 'django.conf.app_template', 'django.bin'])
Fabric-1.13.1 find_packages()
gevent-1.2.1 find_packages('src')
gunicorn-19.6.0 find_packages(exclude=['examples', 'tests'])
jmespath-0.9.1 find_packages(exclude=['tests'])
kombu-4.0.2 find_packages(exclude=['t', 't.*'])
logilab-common-1.3.0 find_packages(exclude=['contrib', 'docs', 'test*'])
Mako-1.0.6 find_packages('.', exclude=['examples*', 'test*'])
nose-1.3.7 find_packages()
Paste-2.0.3 find_packages(exclude=['ez_setup', 'examples', 'packages', 'tests*'])
PasteDeploy-1.5.2 find_packages(exclude=['tests'])
Pillow-4.0.0 find_packages()
pip-9.0.1 find_packages(exclude=["contrib", "docs", "tests*", "tasks"])
Pygments-2.2.0 find_packages()
pyOpenSSL-16.2.0 find_packages(where="src")
setuptools-34.1.0 find_packages(exclude=['*.tests'])
setuptools-git-1.1 find_packages()
Sphinx-1.5.2 find_packages(exclude=['tests'])
SQLAlchemy-1.1.5 find_packages('lib')
supervisor-3.3.1 find_packages()
zope.interface-4.3.3 find_packages('src')
It would be interesting to see how an "any dir with a .py below it" approach would compare to the current behavior for these projects. I might script this and check when I have the time.
EDIT: And with compare, I mean see how many "false positives" that approach would give, what the extra unwanted dirs would be. This could then be a guidance on what to exclude by default, and a measure of how much extra work it would be for these projects to adapt to the new behavior.
Consider allowing a __nspkg__ 0-length file to declare a directory to be a namespace package.
It could even be named __init__.py!
I am hearing some support for the idea that namespace packages should be marked in some way. Frankly I consider that to be a bare minimum requirement to take packages seriously. It makes no sense to me that a package is in "the eye of the beholder". The developer who places python files in a directory must surely _know_ whether they are developing a package or just a bunch of files. Furthermore, their development in that directory is predicated on it being or not being a package.
Yet if the directory is not marked as a package, then human readers don't know how to use it, and as we've seen in this now three year thread, automated tools don't know what to do with it either.
A package is a fundamental unit of code with which one builds (not to mention deploys) applications, and yet somehow python lacks a way to intentionally establish where one is? There's much to love about python, but this incapacity to identify what is a package is ridiculous.
Serious question: Is there any difference in actually accessing a package (via import etc) between an __init__.py package and a namespace package? Maybe there's a difference of intent. And obviously one has an __init__.py file that may limit access to the enclosed modules and objects. But if you use an __init__.py file that gives access to all enclosed, this would be imported and called just like the same package with no __init__.py file (ie: a namespace package), no? Why are there even these two concepts?
Maybe I'm missing something, but it just seems to me that namespace packages need to be rethought.
@gwideman: I agree with pretty much everything you say. I think that PEP 420 is not well thought out, especially for tools needing to deduce package structure from the file system (or vice versa actually, in the case of py.test's automatic PYTHONPATH-prepending during py.test some_dir...).
In my case, my reason for moving from a classical namespace package for our company stuff (spread across various distributions) to a PEP 420 was this: I had made an internal dev tool, which I packaged as an Ubuntu package and uploaded to our company internal APT repo, and since the debhelper for Python distributions actually strips the __init__.py for namespace packages that are declared as such with namespace_packages in the setup call (wanting to encourage PEP 420 I guess, and avoid file conflicts), the namespace package was in fact turned into a PEP 420 one. The result was that developers working on other distributions of ours, with packages in that same namespace package, who set up their PYTHONPATH to point at the distribution they were hacking on, got the dev tool broken since Python couldn't see its PEP 420 package (the classical one in PYTHONPATH clouded it)... So that's my story on why I tried moving all our packages to PEP 420 and finding this issue #97.
@gwideman: I think that @jaraco meant that the __nspkg__ would only be present in the package dir in the distribution, not installed when doing setup.py install, so just a marker for the purpose of getting find_packages working (and possibly other tools), right @jaraco?
If it's to also be installed, then I think distro packagers will start yelling because it's yet another thing that will cause conflicts (in addition to classical __init__.py carrying namespace packages)..
@estan cool! How did you get those packages? Is there a tool of some sort to download them? I noticed there is a tool called Basket, and the list of top downloaded packages from https://python3wos.appspot.com/ Are these the correct places? Also, I'll grab a list of the 40 most recently updates packages as per https://pypi.python.org/pypi . I'll make a vagrant script to do it all inside.
I was thinking to modify find_packages to call both methods, and log (package_name, old_find_packages, new_packages, are_different). So then I can investigate the packages with differences (extra unwanted dirs).
[Edit: ok, I found tests/test_pypi.py which seems to have a lot of this stuff already]
Regarding "passing a parameter through several calls", do you mean that namespace_packages is passed on from find to find_packages_iter? If so, how else should it be implemented?
Indeed, I don't see a better way. And perhaps I'm being too sensitive here. I'll drop that concern.
I was under the impression that in current versions, you don't have to list implicit namespace packages in namespace_packages (the setup parameter). So there would be no duplication, right?
You may be right, though I haven't looked into it deeply. I do know that if you want the __init__.py file for backward compatibility, you do need to declare the namespace package in setup.py. The tools do need some way to know all namespace packages in order to generate the appropriate -nspkg.pth files on older Pythons, but for pure PEP 420 packages, that parameter may be unnecessary.
It [
__nspkg__] could even be named __init__.py!
It can't because the presence of a __init__.py precludes the initialization or inclusion of that package as a namespace package.
Serious question: Is there any difference in actually accessing a package (via import etc) between an __init__.py package and a namespace package? Maybe there's a difference of intent. And obviously one has an __init__.py file that may limit access to the enclosed modules and objects. But if you use an __init__.py file that gives access to all enclosed, this would be imported and called just like the same package with no __init__.py file (ie: a namespace package), no? Why are there even these two concepts?
Yes. There's some good reading in pep 402 and the second paragraph of the PEP 420 spec specifically says namespace packages cannot contain a __init__.py. This behavior allows a package to be a "regular" package and exclude the possibility of other modules appearing in the same namespace.
Maybe I'm missing something, but it just seems to me that namespace packages need to be rethought.
For better or worse, this design is the one we must work with in the build tools. The design won't change until at least Python 3.7, and before that happens, the issue will need to be well-understood and a cogent solution presented, probably in a PEP.
PEP 420 is not well thought out, especially for tools needing to deduce package structure from the file system
Perhaps. The PEP focuses on the import system and not build/packaging systems. The word "discover" appears twice and only in reference to the import system. I wonder how other languages handle this condition - how does Java or Perl (the two languages referenced in the PEPs) deal with package discovery during the build process? Does that provide any inspiration? I know nothing about Perl, but my experience with Java makes me think they're just very explicit about it, not dissimilar to the proposal in #915, or even more explicit without any discovery, such as setup(packages['myns.foo']).
debhelper for Python distributions actually strips the __init__.py for namespace packages that are declared as such with namespace_packages in the setup call (wanting to encourage PEP 420 I guess, and avoid file conflicts)
Actually, this behavior is baked into setuptools and pip as well. In setuptools, it happens with setup.py install --single-version-externally-managed, which is probably what the deb helper is invoking and what pip invokes. In addition to stripping the __init__.py, it also adds a -nspkg.pth file which provides backward compatibility in older Pythons. In 28.3, Setuptools tried relying entirely on PEP 420 for Python 3.3+, but in version 31, this change was reverted as the -nspkg.pth is necessary to override PEP-420 behavior for namespace packages which have the __init__.py for compatibility reasons.
The result was that developers working on other distributions of ours, with packages in that same namespace package, who set up their PYTHONPATH to point at the distribution they were hacking on, got the dev tool broken since Python couldn't see its PEP 420 package
The release of Setuptools 31 addresses this use case through the use of setup.py develop (invoked by pip install -e), which should be preferred over PYTHONPATH hacks.
And I should point out that if you want to support Python 3.3+ only in your namespace package and rely on PEP 420, that is possible today - you just can't use find_packages but must specify your packages explicitly in setup.py.
__nspkg__would only be present in the package dir in the distribution, not installed when doing setup.py install, so just a marker for the purpose of getting find_packages working
Yes, though I hadn't thought through it too deeply. The marker would have to be included in sdists. To be conservative, it would be nice if it were not installed to the environment, to prevent its abuse for other purposes.
@estan I modified test_pypi.py a bit to go off installing packages with the modified setuptools. Luckily all the packages install ok with (pip install packagename). However the failure with this plan is that the packages already have the correct files in there. I can't use the whl or tar.gz either, since they both have only the required files in there. So I need to go and find the repos for them in order to see if it works. I started to look at the 'home_page' attribute in the json to detect the github/bitbucket repo. (there's 88 of 400 packages with the required git repo in the home_page). So this will let me download quite a few packages. I'll wait overnight for the repos to finish downloading and continue tomorrow.
@jaraco Thanks for your patient response, and the pointer to PEP 402 (in addition to PEP 420). The current state-of-play clearly presents a thorny problem, and I understand that making find_packages() work most usefully with the existing import design. But in addition, the explorations themselves provide important insights on the way to understanding the issues with PEP 420.
@illume: What I did to get that list was just to download the first two pages of http://pypi-ranking.info/alltime , do some grepping+sed to get a text list, and then using pip download -d somedir --no-binary :all: --no-deps <package> to download the source dist for each of them, extract them, and then grep for find_packages in their setup.py to determine if they were using it.
My plan was then simply to create two virtualenvs, one with original setuptools and one with a patched one, and run something like from setuptools import find_packages; print(find_packages(...)) from each of the two virtualenvs against each package, and each time diff the two lists.
Regarding Java and Go: they're very prescriptive about how developers lay
out their source directories. People put up with that because they're
solely used as application development languages, and not for ad hoc
scripting.
Regarding Perl: if I recall correctly, the package membership declaration
has to appear in the module source file as well as being part of the
directory layout
I think for setuptools, it would be fine to have a "source only" marker
file that was used to lay down a directory on target systems. I'd suggest
"__nsdir__" as the name, though - you're requesting that the build system
add that directory to the namespace package, not defining the namespace
package directly.
@ncoghlan: Good point about the name (__nsdir__ instead of __nspkg__).
I don't have much more to offer in this discussion I think. Personally I'd be fine with either of these approaches:
__nsdir__ marker files to guide the search.namespace_packages parameter à la my PR to guide the search..py files below it an implicit namespace package, with a good set of default excludes, where the user have to use include/exclude to guide the search if the default excludes are not enough.@estan thanks for the download command. I tried it and had problems, since it tried to call setup.py egg_info on the packages? It's ok though, because I could get the source download link easily from pypi. Unfortunately using the source may be misleading since, find_packages itself determines what goes inside the source file when sdist makes it. So I had to download the git repos as well for a more accurate test. It is good however to use the pypi source downloads to find if find_packages is used, and which arguments are being used.
Anyway, after trying all that now I have about 80 git repos and 186 source packages.
29 out of the 186 packages with source downloaded off pypi out of 400 attempted to download use find_packages with arguments. 19 don't use arguments.
$ find . -name setup.py -depth 2 -exec grep -H find_packages( {} \; | grep -v find_packages() | wc -l
29
$ find . -name setup.py -depth 2 -exec grep -H find_packages() {} \; | wc -l
19
$ find . -name setup.py -depth 2 -exec ls {} \; | wc -l
186
docs and tests are quite common folders. conf.py can be used to filter out docs(a common sphinx folder) or docs/*.rst, and tests/test_*.py seems to be a good filter for the tests dir.
These are what people are using in their exclude= param. ez_setup, docs, examples, tests, examples, test, tests.*, benchmarks, contrib, packages.
I will come back with another report after more investigation. Cheers!
Here is a gist that you can include in your setup.py and resolve PEP420 namespace packages and standard packages (those with an __init__.py)
https://gist.github.com/silkentrance/2710c559f07560d87d33cb6698dfe7a1
It does not find namespace packages that use the old scheme, though. I have code here, that does that, but it is unreliable.
Python 3.5+ only
I've opened #1140 to resolve this.
@jaraco since you rejected #1140, the only way to do this is via setup.cfg or setting packages to find:. Agreed?
@agronholm I apologize. I didn't mean to be rejecting #1140 outright. It did highlight two alternate paths, vendoring OrderedDict or dropping support for Python 2.6, such that the namespace_packages declaration can be re-used. But now that I think about it more, we probably don't want to re-use that declaration, because it does more than simply declare namespace packages. It also signals the behavior that creates -nspkg.pth files and other namespace package management, which we may not want conflated with package discovery.
As I re-read this issue and consider the two PRs, I guess there is a consensus that we should add this interface. We should probably base the solution on #943 as I promised to do there.
the only way to do this is via setup.cfg or settings
packagestofind:
I'm not sure I follow. And the declarative config, including find: didn't exist at the time. Does find: already support declaration of namespace packages?
I'm not sure I follow. And the declarative config, including find: didn't exist at the time. Does find: already support declaration of namespace packages?
No, but unlike find_packages(), it could be made to reuse the existing declaration. That declaration in turn can be made not to create nspkg.pth files on Python 3, where PEP 420 namespace packages already work. Would there be a problem with this?
Note: if the only Python 2.6 you care about is the one in RHEL 6, that actually provides collections.OrderedDict (other things in RHEL needed it, so it got backported a few years back).
I don't like #943. It would consider my tests directory a namespace package because it contains Python modules. Ditto for docs.
Yes, I agree. #1140 is better. I'll revive it.
unlike find_packages(),
find:could be made to reuse the existing declaration
That's in intriguing idea. If we could deprecate find_packages altogether or at least make find: the preferred mechanism for package discovery, perhaps that would be the best way. I do really want to avoid having two mechanisms, each of which have unique features and detriments.
So the new consensus would be to deprecate find_packages() in favor of find: + namespace_packages. The latter would be changed to not require a placeholder __init__.py in the namespace package directory. I will look into making a PR with these changes.
Is there any progress on this topic?
PEP 420 was approved by Guido on May 24, 2012
This issue was created on Nov 5, 2013
And the only output is a documentation change (roughly 5 years later) that find_package does not support PEP 420!?
EOL of Python 2.7 is January 1st, 2020.
Python 3.3 which introduced this feature the first time has already reached EOL
So basically everyone should switch to a modern Python version, which support PEP 420, in the next few month, but there is no useful way to use it within setuptools?
So basically everyone should switch to a modern Python version, which support PEP 420, in the next few month, but there is no useful way to use it within setuptools?
What is preventing you from manually specifying the packages? That's what I do.
What is preventing you from manually specifying the packages? That's what I do.
Yes, of course, because there is no other option.
But what's the meaning of find_packages if it does not work with modern Python?
But what's the meaning of find_packages if it does not work with modern Python?
I'll remind that PEP 420 namespaces is a feature not needed by most people so that statement seems a bit unfair. That said, this will probably require a PR before it goes anywhere.
This issue was created on Nov 5, 2013
"You're welcome".
And 50% of my rationale for raising this issue was that PEP420 seemed to break what people relied upon to indicate a package, further weakening Python's chronic lack of explicit definition or record of programmers' intent as to what constitutes a complete "thing", a la "project" in other development environments. (And obliquely, agronholm's comment "I'll remind that PEP 420 namespaces is a feature not needed by most people" somewhat reinforces that observation.) For all that I like about Python, this is a problem area in my view, and continues to be.
Don't get me wrong – I would really like this to be fixed. I've just been busy with other projects to do it myself, since for me it's just a very minor inconvenience. I bet the changes are easy enough to implement that just doing a PR would be more productive than keeping on posting on this issue.
@agronholm I believe that most of the functionality is already there. Yet, PEP420PackageFinder.find is not being exported yet and might be done so via find_packages_ns.
However, given a python environment that supports PEP420, find_packages should be able to automatically forward everything to find_packages_ns.
I am currently working on a PR for this.
Has anyone tried using the interface currently presented by Setuptools, namely:
setup(
packages=setuptools.PEP420PackageFinder.find(),
)
Most of my projects have migrated or are migrating their metadata and options to setup.cfg. How do I use that from setup.cfg?
Has anyone tried using the interface currently presented by Setuptools, namely:
setup( packages=setuptools.PEP420PackageFinder.find(), )
I tried it now and with using the exclude parameter it would work for me.
But what do you mean with currently presented? Will you remove this interface again? Should this be an official interface everyone could/should use? At the moment one of PyCharms inspection reports a warning

Good point. I think I prepared/exposed that interface for comment and experimentation. If it's a suitable solution, we should definitely present it in a more official manner. I think I'd intended to test it out with the pmxbot namespace, as it doesn't have any Python 2 releases, but I never got around to it. If the behavior is suitable we should expose it with a more friendly name, such as find_packages_ns as silkentrance's PR suggests.
@jaroco feel free to contribute to the PR
@agronholm the official documentation https://docs.python.org/3/distutils/configfile.html lacks a lot, do you have more information?
the official documentation https://docs.python.org/3/distutils/configfile.html lacks a lot, do you have more information?
Regarding?
@agronholm where command is one of the Distutils commands
regardless, distutils is different from setuptools, unless both have been merged, of which i am unaware, see also https://stackoverflow.com/a/25372045.
so this might not affect you at all.
@silkentrance I still have no clue what you're asking me about.
The way I see this going forward is by making use of the namespace_packages option in the package finder. Is this what you are doing in your PR?
@pganssle How do I use find_packages_ns from setup.cfg? It's not obvious from your commit.
@agronholm It was @silkentrance that did the implementation, I just merged it. Good point, though, I don't believe this is supported. I will create a new issue.
Most helpful comment
_Original comment by_ jaraco (Bitbucket: jaraco, GitHub: jaraco):
I believe this issue is more than a minor one and should be addressed sooner than later.