Hi, firstly, thanks for the work you maintainers do on this library :)
I could be incorrect in saying this, but it's come to my understanding that setting include_package_data=True in setup includes data files specified by MANIFEST.in and only in MANIFEST.in. At the very least, this behaviour is not clearly explained in the docs, and at the worst it is unexpected and implicit behaviour.
Here is a sentence from the docs which seems to be trying to say the above, although elsewhere in the docs include_package_data is not explained in this way:
If using the setuptools-specific include_package_data argument, files specified by package_data will not be automatically added to the manifest unless they are listed in the MANIFEST.in file.
Doesn't this make package_data redundant when include_package_data is True? Does it really make sense to have a param called include_package_data which implicitly ignores package_data?
In summary, this is the behaviour I seem to have come across when using these options in various combinations:
- package_data with include_package_data=False: package_data are included as well as everything in MANIFEST.in.
- package_data with include_package_data=True: package_data not specified by MANIFEST.in are excluded.
- package_data with include_package_data=True:

I'm sorry if I've got this wrong, any correspondence is much appreciated :)
I tend to agree that the intuitive behavior would either be:
include_package_data + package_data does the union of everything found between the two of them

My preference would be 1, but that is a backwards-incompatible change that could probably cause all kinds of issues with existing packages, and a shocking number of projects don't test their packages as installed. I think we can detect if both are specified and switch it over to a warning for a few releases, then an error.
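On the point about projects not testing their packages as installed: one way to catch missing data files early is to inspect the built archive before publishing. The following is only a minimal sketch, not anything setuptools provides. The function names `archive_members` and `missing_data_files` are made up for illustration, and it assumes the archive is either a wheel (a zip file) or a tar-based sdist.

```python
import tarfile
import zipfile


def archive_members(archive_path):
    """Return the set of entry names stored in a wheel (zip) or sdist (tar)."""
    archive_path = str(archive_path)
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            return set(zf.namelist())
    # Anything that is not a zip is assumed to be a tar archive (e.g. .tar.gz).
    with tarfile.open(archive_path) as tf:
        return {member.name for member in tf.getmembers() if member.isfile()}


def missing_data_files(archive_path, expected_suffixes):
    """Return the expected path suffixes that no archive entry ends with.

    Suffix matching is used because sdist entries are prefixed with a
    top-level "<name>-<version>/" directory, while wheel entries are not.
    """
    members = archive_members(archive_path)
    return [
        suffix
        for suffix in expected_suffixes
        if not any(name.endswith(suffix) for name in members)
    ]
```

Running `missing_data_files` against both the sdist and the wheel of the same project, with a list of the data files you expect to ship, would show whether one of them silently drops files that the other includes.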
Also, I find include_package_data to be a very misleading name. I suspect we'd be better off deprecating this keyword entirely. It could be replaced with a find_package_data function similar to find_packages. Alternatively (and I like this less), we could do something like add an include_manifest_data or include_manifest_specified_data, which would have the "union of specified package data and manifest data" behavior.
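To make the find_package_data idea a bit more concrete, here is a rough sketch of what such a helper might look like. To be clear, this function does not exist in setuptools; the name comes from the suggestion above, and the signature (`where`, `patterns`, `exclude`) and the matching rules are assumptions chosen for illustration. It treats any directory containing `__init__.py` as a package and assigns files in non-package subdirectories to the nearest enclosing package.

```python
import fnmatch
import os


def find_package_data(where=".", patterns=("*",), exclude=("*.py", "*.pyc")):
    """Hypothetical helper: map dotted package names to their data files.

    The result has the same shape as the package_data argument: a dict of
    package name -> list of paths relative to that package's directory.
    """
    where = os.path.abspath(where)
    result = {}
    for dirpath, dirnames, filenames in os.walk(where):
        dirnames.sort()  # make the traversal order deterministic
        # Walk upwards to the nearest directory that contains __init__.py.
        pkg_dir = dirpath
        while pkg_dir != where and not os.path.isfile(
            os.path.join(pkg_dir, "__init__.py")
        ):
            pkg_dir = os.path.dirname(pkg_dir)
        if pkg_dir == where:
            continue  # this directory is not inside any package
        package = os.path.relpath(pkg_dir, where).replace(os.sep, ".")
        for filename in sorted(filenames):
            if not any(fnmatch.fnmatch(filename, p) for p in patterns):
                continue
            if any(fnmatch.fnmatch(filename, p) for p in exclude):
                continue
            rel = os.path.relpath(os.path.join(dirpath, filename), pkg_dir)
            # package_data paths use forward slashes on every platform.
            result.setdefault(package, []).append(rel.replace(os.sep, "/"))
    return result
```

With something like this, a setup script could pass `package_data=find_package_data("src")` explicitly, which keeps the set of shipped data files visible in one place rather than depending on how include_package_data interacts with MANIFEST.in.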
As a newcomer to the Python packaging world, I experienced that confusion firsthand.
This is completely confusing and frustrating. I hope to see this resolved very soon.
I suspect this issue is related to this hacky behavior.