Setuptools: Ignoring package_data when include_package_data is True

Created on 18 Aug 2018 · 4 Comments · Source: pypa/setuptools

Hi, firstly, thanks for the work you maintainers do on this library :)

I may be wrong, but my understanding is that setting include_package_data=True in setup() includes only the data files specified by MANIFEST.in. At the very least, this behaviour is not clearly explained in the docs; at worst, it is unexpected and implicit behaviour.

Here is a sentence from the docs which seems to be trying to say the above, although elsewhere in the docs include_package_data is not explained in this way:

If using the setuptools-specific include_package_data argument, files specified by package_data will not be automatically added to the manifest unless they are listed in the MANIFEST.in file.

Doesn't this make package_data redundant when include_package_data is True? Does it really make sense to have a param called include_package_data which implicitly ignores package_data?

In summary, this is the behaviour I seem to have come across when using these options in various combinations:

  1. Including files in package_data with include_package_data=False:

    • Files specified by package_data are included as well as everything in MANIFEST.in.

  2. Including files in package_data with include_package_data=True:

    • Only the files specified by MANIFEST.in are included; files in package_data that are not specified by MANIFEST.in are excluded.

  3. Omitting package_data with include_package_data=True:

    • Same result as point 2.
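For concreteness, the three combinations above can be written out as setup() keyword arguments. This is only a sketch: the package name mypkg, the file layout, and the CASE_* names are made up for illustration, and the comments restate what I observed rather than documented behaviour.

```python
# Hypothetical project layout, for illustration only:
#   MANIFEST.in                  -> contains "include mypkg/templates/*.html"
#   mypkg/__init__.py
#   mypkg/data/schema.json       (matched by package_data below)
#   mypkg/templates/index.html   (matched by MANIFEST.in only)

COMMON = {"name": "mypkg", "version": "0.1", "packages": ["mypkg"]}

# Case 1: package_data given, include_package_data=False.
CASE_1 = {
    **COMMON,
    "package_data": {"mypkg": ["data/*.json"]},
    "include_package_data": False,
}

# Case 2: package_data given, include_package_data=True.
# Observed: only the MANIFEST.in files end up included.
CASE_2 = {
    **COMMON,
    "package_data": {"mypkg": ["data/*.json"]},
    "include_package_data": True,
}

# Case 3: no package_data, include_package_data=True.
# Observed: same result as case 2.
CASE_3 = {**COMMON, "include_package_data": True}

# In a real setup.py, one of these would be passed on, e.g.:
#     from setuptools import setup
#     setup(**CASE_2)
```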

I'm sorry if I've got this wrong; any correspondence is much appreciated :)

Labels: Needs Discussion, documentation, enhancement, major

All 4 comments

I tend to agree that the intuitive behavior would either be:

  1. include_package_data + package_data does the union of everything found between the two of them
  2. an error is raised as these are incompatible arguments
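To make option 1 concrete, here is a rough sketch of the "union" semantics over plain lists of paths. It is not how setuptools computes anything; the function name union_of_data_files and its parameters are hypothetical, and fnmatch is used as a stand-in for whatever pattern matching a real implementation would use.

```python
from fnmatch import fnmatch


def union_of_data_files(package_files, package_data_patterns, manifest_files):
    """Return files selected by package_data patterns OR listed by the manifest.

    All paths are relative to the package directory (an assumption of this sketch).
    """
    available = set(package_files)
    # Files picked up by the package_data glob patterns.
    from_patterns = {
        path
        for path in available
        if any(fnmatch(path, pattern) for pattern in package_data_patterns)
    }
    # Files the manifest names that actually exist in the package.
    from_manifest = available & set(manifest_files)
    return sorted(from_patterns | from_manifest)


files = ["data/schema.json", "templates/index.html", "notes.txt"]
selected = union_of_data_files(
    files,
    package_data_patterns=["data/*.json"],
    manifest_files=["templates/index.html"],
)
print(selected)
```

With these inputs, both the JSON file (from the pattern) and the HTML template (from the manifest) are selected, while notes.txt is left out.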

My preference would be 1, but that is a backwards-incompatible change that could probably cause all kinds of issues with existing packages, and a shocking number of projects don't test their packages as installed. I think we can detect if both are specified and switch it over to a warning for a few releases, then an error.
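The detection step could look roughly like the sketch below. The helper name warn_if_both_specified, the warning text, and the choice of DeprecationWarning are all assumptions made for illustration; nothing like this is claimed to exist in setuptools.

```python
import warnings


def warn_if_both_specified(setup_kwargs):
    """Warn when package_data and include_package_data are used together.

    Returns True if the combination was detected, False otherwise.
    """
    both = bool(setup_kwargs.get("package_data")) and bool(
        setup_kwargs.get("include_package_data")
    )
    if both:
        warnings.warn(
            "package_data and include_package_data=True were both given; "
            "this combination may become an error in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
    return both
```

Turning the warning into an error later would only mean replacing the warnings.warn call with a raise, which keeps the migration path simple.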

Also, I find include_package_data to be a very misleading name. I suspect we'd be better off deprecating this keyword entirely. It could be replaced with a find_package_data function similar to find_packages. Alternatively (and I like this less), we could do something like add an include_manifest_data or include_manifest_specified_data, which would have the "union of specified package data and manifest data" behavior.
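As a very rough idea of what a find_package_data helper might look like, here is a sketch built on pathlib. The function does not exist in setuptools; its name, signature, and the default of skipping .py and .pyc files are assumptions for illustration only.

```python
from pathlib import Path


def find_package_data(package_dir, patterns=("*",), exclude_suffixes=(".py", ".pyc")):
    """Collect non-Python files under package_dir, relative to that directory.

    Hypothetical helper: walks the tree recursively and returns sorted
    POSIX-style relative paths for every file matching one of the patterns.
    """
    root = Path(package_dir)
    found = set()
    for pattern in patterns:
        for path in root.rglob(pattern):
            if path.is_file() and path.suffix not in exclude_suffixes:
                found.add(path.relative_to(root).as_posix())
    return sorted(found)
```

The result could then be passed along as something like package_data={"mypkg": find_package_data("mypkg")}, mirroring how packages=find_packages() is used today.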

As a newcomer to the Python packaging world, I experienced that confusion first hand.

This is completely confusing and frustrating. I hope to see this resolved very soon.

I suspect this issue is related to this hacky behavior.
