Poetry: Support for data_files

Created on 12 Feb 2019 · 25 Comments · Source: python-poetry/poetry

  • [x] I have searched the issues of this repo and believe that this is not a duplicate.
  • [x] I have searched the documentation and believe that my question is not covered.

Feature Request

Poetry does not currently support the setup(data_files=[...]) argument, which lets you include data files that live outside the package directories. This functionality is generally used for shipping non-code files that your library might need at run time, or that other libraries need in order to build. Examples include protobuf .proto files, Avro schemas, Thrift IDL, etc.
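
For context, the arguments of a setup.py that relies on this feature might look roughly like the sketch below. The project name, version, and file paths are hypothetical; the only point is the shape of data_files, a list of (target directory, [source files]) pairs in which a relative target directory is resolved against the installation prefix.

```python
# Hypothetical setup.py arguments: "mylib" and the schema paths are
# illustrative assumptions, not taken from a real project.
SETUP_KWARGS = {
    "name": "mylib",
    "version": "0.1.0",
    "packages": ["mylib"],
    # Each entry is (target directory, [source files]). A relative target
    # directory is interpreted relative to the installation prefix.
    "data_files": [
        ("share/mylib/proto", ["schemas/event.proto"]),
        ("share/mylib/avro", ["schemas/event.avsc"]),
    ],
}

# In a real setup.py these would be passed on to setuptools:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```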

Feature

All 25 comments

I used data_files to ship a systemd unit file. This is a very important feature!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@sdispater I guess this will have to wait until after 1.0, right?

This is critical for a (basic) GUI app.

Do I understand correctly that data files that live next to the package modules are supported?

So this layout should work, and config_data.csv will be packaged?

pkgname/
  pyproject.toml
  src/
    pkgname/
      __init__.py
      data_files/
        config_data.csv

Every time I jump head first into a new tool, I smash my face into the bottom of the pool.

This should now be covered by https://python-poetry.org/docs/pyproject/#include-and-exclude. If this is not the case, please feel free to comment here or open a new issue with the specific scenario not covered.

That doesn't let you specify where they should go. How are users supposed to install a .desktop file for desktop environment integration?

There are at least two use cases:

  1. https://docs.python.org/2/distutils/setupscript.html#installing-package-data
  2. https://docs.python.org/2/distutils/setupscript.html#installing-additional-files

AIUI, the include/exclude mechanism does not match either of them; it just adds the files to the package.

In practice, if my package is going to be installed in /some/path/lib/python3.6/site-packages/, then those files are installed directly into that directory.

Both use cases are required to be able to move away from setuptools, and neither is implemented in Poetry yet.

Note: package_data can be implemented with include, but that does not provide much flexibility: we have to hardcode a lot of paths, whereas package_data is package-oriented (as makes sense for packaging). The empty-package '' case is also extremely important: {'': ['assets/*']} is very expressive for anyone who has many files across many packages (which may be added and removed); with include they would all need to be listed explicitly.
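
To illustrate the semantics described above, the following sketch mimics how the '' key applies to every package while a named key applies to one package only. It is not setuptools' implementation: it matches patterns against an in-memory file list with fnmatchcase (whose * also matches path separators, unlike a shell glob), and the package and file names are made up.

```python
from fnmatch import fnmatchcase


def select_package_data(package_data, files_by_package):
    """Return, per package, the files matched by package_data patterns.

    Patterns under the '' key apply to every package; patterns under a
    package name apply to that package only.
    """
    selected = {}
    for package, files in files_by_package.items():
        patterns = package_data.get("", []) + package_data.get(package, [])
        selected[package] = [
            path for path in files
            if any(fnmatchcase(path, pattern) for pattern in patterns)
        ]
    return selected


# Hypothetical project contents, with paths relative to each package directory.
FILES = {
    "X": ["assets/logo.png", "foo/a.txt", "module.py"],
    "Y": ["assets/icon.svg", "foo/b.txt"],
}
PACKAGE_DATA = {"": ["assets/*"], "X": ["foo/*"]}

RESULT = select_package_data(PACKAGE_DATA, FILES)
```

Here the assets directory is picked up in both packages, while foo is picked up only in X; adding a new package with its own assets directory would need no change to the mapping.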

@kalfa, as you later realized, include/exclude matches exactly the first use case you linked, package data. Can you provide a concrete example of package data that is easy to do with setuptools but difficult with Poetry?

Package data is less expressive with include/exclude and more difficult to read.
Overall, it is possible to cover most if not all use cases.

The setuptools approach is more compact and readable:

  • Install the directory assets in each package.
  • Install the directory foo in package X.

package_data={'': ['assets/*'], 'X': ['foo/*']}

With Poetry I have to specify a list of more obscure patterns, but for simple enough projects it is good enough. As you said, I only understood its potential later.

What is missing is the other use case, which is what this ticket is about. The ticket has been closed and IMHO should be reopened.

I'm porting setup.py files to pyproject.toml and trying to build the same wheel. Happy to find out I'm wrong and that it's possible.

Yeah, .desktop files need explicit support, and I don't think Poetry has a way to install them into the appropriate /usr/share/applications folder.

"data_files" are delivered relative to sys.prefix, whereas "package_data" is delivered to site-packages. I don't think it's possible to deliver files relative to sys.prefix using Poetry's include/exclude options.

Another use-case is the distribution of man pages with the package.
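
The difference between the two install locations can be sketched as follows, using a man page as the data_files example. This is a simplification under stated assumptions: it takes sys.prefix as the base for data_files, as described above, and sysconfig's "purelib" path as site-packages; the package and file names are hypothetical, and the actual base can differ between installers and install schemes.

```python
import os
import sys
import sysconfig


def package_data_destination(package, relative_path):
    """Approximate install path of a package_data file (inside the package)."""
    site_packages = sysconfig.get_path("purelib")
    return os.path.join(site_packages, package, relative_path)


def data_files_destination(target_dir, filename):
    """Approximate install path of a data_files entry with a relative target."""
    return os.path.join(sys.prefix, target_dir, filename)


# Hypothetical names, for illustration only.
print(package_data_destination("myapp", "templates/base.html"))
print(data_files_destination("share/man/man1", "myapp.1"))
```

The first path always sits under the installed package, which is what include/exclude can reproduce; the second sits next to the interpreter's own share directory, which is the part Poetry does not cover.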

I tried to move a Unix console app that follows the FHS from setuptools to Poetry, but got stuck on this issue, and it looks like I will have to roll back :(

@kalfa in the post above you wrote that

package_data can be implemented with include, but does not provide a lot of flexibility. we have to hardcode lots of path

Could you please provide an example of how I can achieve this with poetry:

data_files=[('/etc/myapp/', ['myapp.conf'])]

@Ezhvsalate

@kalfa in the post above you wrote that

package_data can be implemented with include

Could you please provide an example of how I can achieve this with poetry:

data_files=[('/etc/myapp/', ['myapp.conf'])]

I don't think you can yet (unless it has been implemented since I wrote my comments and I'm unaware of it).

Only package_data has an equivalent in Poetry.

What you mentioned is the same use case as desktop files & co.

@kalfa thank you, got it.

@abn Is there any chance of this feature being implemented? Maybe there is some way to reopen the issue? I also found a pull request with an implementation, https://github.com/python-poetry/poetry/pull/901, but it is closed as well.

From my point of view (and many others) using data_files (the one from _setuptools_) is a bad practice. And I would venture that it is why it is not supported in _poetry_.

The idea that _pip_-installing a project could result in files being written to random locations on the local file system is discomforting. Things like: data_files=[('/etc/myapp/', ['myapp.conf'])] get a no from me. For one, it would mean _pip_-installing with sudo, which is also a no (there are way too many issues coming with that).

It is true, that there is a need for such things, in particular for applications. But from what I understood Python's packaging ecosystem was initially built with libraries in mind, applications were (and still very much are) some kind of second class citizens in a sense. So packaging applications in order to distribute them on _PyPI_ is still very awkward. There are many other issues showing this divide between libraries and applications all around the Python packaging ecosystem in general, _poetry_ included (but not _poetry_'s fault in any sense, as far as I have seen).

My usual recommendation when something like data_files is needed is to go beyond the standard/common Python packaging techniques and reach for the packaging techniques specific to the operating system. So for example, for Linux I would recommend looking into packaging your applications with _apt_/.deb, _yum_/.rpm, _pacman_, _appimage_, _snap_, etc. Give _pyinstaller_, or _BeeWare_'s _briefcase_, or other similar tools a look. Those would probably give you a much better experience for such things.

The _setuptools_ package_data on the other hand, is perfectly fine and encouraged. It only results in files written in the venv/lib/site-packages/mylibrary directory of your own package for the environment. So for _poetry_, as it was already mentioned, use include and exclude. More often than not, those are sufficient, no need to write files to random places on the file system. Also remember to use _importlib.resources_ to read those files, never rely on paths relative to __file__.
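
As a rough illustration of the importlib.resources advice, the sketch below reads a data file through the package rather than through a path derived from __file__. It assumes Python 3.9 or later (for importlib.resources.files), and it builds a throwaway package named demo_pkg_with_data in a temporary directory purely so the example is self-contained; both the package name and the file contents are made up.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_package_text(package, resource):
    """Read a text resource shipped inside an importable package."""
    return (
        importlib.resources.files(package)
        .joinpath(resource)
        .read_text(encoding="utf-8")
    )


def demo():
    """Build a throwaway package with one data file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg_with_data"  # hypothetical package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
        (pkg_dir / "config_data.csv").write_text(
            "key,value\nanswer,42\n", encoding="utf-8"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return read_package_text("demo_pkg_with_data", "config_data.csv")
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg_with_data", None)


print(demo())
```

In a real project only read_package_text would be needed, called with the name of the installed package and the resource path inside it.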

I will also add that for things such as configuration files, user data files, cache files, etc. you should have a look at _appdirs_.

So, in short:
If you need data_files, think twice. If you really need data_files, I would recommend you to rely on something more than just the common Python packaging tools. Go for pyinstaller, or briefcase, etc. or for more heavy duty tools (apt, yum, pacman, etc.). Because you want OS-specific or Linux distro-specific things anyway. More generally if you want to distribute applications, you might want to look beyond distributing as _sdist_ and _wheels_, those are not really made for applications.

To add onto @sinoroc's excellent explanation, remember that Python is a cross-platform language. People install Python software on Windows, and if you publish an application on PyPI, Windows users might expect that it will work on their system. If you install platform-specific files (e.g. /etc/whatever), then you might need a platform-specific installer.

@sinoroc From what I see in this issue, it looks like the reason data_files is not supported in Poetry currently is that the maintainers do not see the use case for it. It is certainly true that use of data_files should be minimized (libraries almost never need it), but applications in many cases have no other option to bundle assets properly.

The idea that pip-installing a project could result in files being written to random locations on the local file system is discomforting.

Even without data_files, this is the case. Arbitrary code is executed whenever you pip install a package; pip installing a package means you trust that package.

Additionally, data_files does not require that you specify absolute paths for your files to be installed into (in fact, it's discouraged). Relative paths work (e.g. ('share/applications', ['xyz.desktop'])), and the files will be installed relative to either sys.prefix or site.USER_BASE.
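
A minimal sketch of that resolution rule, assuming the base is sys.prefix for a regular install and the user base for a per-user install. The helper name resolve_data_file is hypothetical, and rejecting absolute targets is a choice made for this sketch rather than the behavior of any installer.

```python
import os
import site
import sys


def resolve_data_file(target_dir, filename, user_install=False):
    """Resolve a relative data_files target against the install base."""
    if os.path.isabs(target_dir):
        # Sketch-only policy: refuse absolute targets such as /etc/myapp.
        raise ValueError("use a target directory relative to the prefix")
    base = site.getuserbase() if user_install else sys.prefix
    return os.path.join(base, target_dir, filename)


print(resolve_data_file("share/applications", "xyz.desktop"))
print(resolve_data_file("share/applications", "xyz.desktop", user_install=True))
```

Inside a virtual environment the first path stays within the environment, so no elevated privileges are involved.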

Your recommendation for using tools like pyinstaller, briefcase, Debian packages, etc. isn't really possible for application developers in a lot of cases. If, for example, an application wanted to support only Linux, there are still a lot of different kinds of package formats that the application developer would have to support. For that reason, distribution-specific package formats are usually created and maintained specifically for those distributions by someone on behalf of the distribution, rather than the maintainer of the application. Also, many of these formats take advantage of using the application's setuptools setup to install data files (see for example pybuild).

include is not a replacement for data_files in many cases, as other users have mentioned here (application desktop files, systemd unit files, man pages, etc).

@thejohnfreeman To address your concern of Linux-only applications on PyPI, it is not a requirement that Poetry packages are published to PyPI. A lot of applications won't be. Also, PyPI has classifiers to mark applications as supporting only Linux. Users should not be blindly installing applications from PyPI --- that is a recipe for disaster.

@thomassross I totally understand your point of view. But my point still stands: I do not believe data_files is a good practice for the common use cases. And as far as I understood, one of the big drivers for the development of _poetry_ is to enforce good practices.

There are obviously very legitimate use cases where data_files are helpful and a good solution. For example if the project is only used in controlled environment for private usage, then I have nothing against using data_files.

So I would side on not adding support for data_files in _poetry_, and I would absolutely encourage a _plugin_ that adds this feature (plugin system is scheduled for _v1.2_).

While it may not be terribly common, it is still a necessary piece of functionality for many applications if they want to fully take advantage of Poetry. I personally would like to see it in Poetry core (with a warning in the documentation recommending include where it's possible to use it, if required).

In any case, it would be great if we could get a response from the project maintainers on how they feel about implementing this functionality (@abn?).

This is a feature that is preventing me from adopting Poetry in some of my own projects.

@sinoroc
(I moved the discussion to here.)

Could you add or update an example in the include and exclude section showing what the relative paths are based on? Are they relative to where pyproject.toml is located, or to the package folder?

For example,

dummy_folder/
    pyproject.toml
    CHANGE.log
    my_package/
        __init__.py
        my_data.csv

The pyproject.toml is for my_package/, and I am not sure what I should specify in include = [] in pyproject.toml: is it my_data.csv or my_package/my_data.csv?

If it is the latter, would simply specifying CHANGE.log fail, because only things in my_package/ are installed to site-packages?

@hyliu1989 I am probably not the best person to answer this, but I will give it a shot in the other thread.

Another piece of information on the topic:

data_files

Warning: data_files is deprecated. It does not work with wheels, so it should be avoided.

A list of strings specifying the data files to install.

-- https://setuptools.readthedocs.io/en/latest/references/keywords.html?highlight=data_files

Also maybe related: https://github.com/pypa/wheel/issues/92
