Conan: Extract 7z archives

Created on 21 Dec 2015 · 17 Comments · Source: conan-io/conan

conans.tools.unzip supports .zip and .tar.* archives, as far as I can see. It would be nice if .7z were supported as well, since it usually compresses much better.

Boost is 131 MB (.zip) vs 71 MB (.7z) for example (http://sourceforge.net/projects/boost/files/boost/1.60.0/).

All 17 comments

Good idea.

I have researched this a little, and the main problem is the lack of a good portable package for LZMA compression. pylzma exists, but it requires building binaries, which fails on Windows. I have been able to use this precompiled wheel:

http://www.lfd.uci.edu/~gohlke/pythonlibs/#pylzma => pylzma-0.4.8-cp27-none-win32.whl

Then, the following code works, listing boost 7z:

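import py7zlib  # provided by the pylzma package

f7file = "<mypath>/boost_1_60_0.7z"

# 7z archives are binary, so the file must be opened in 'rb' mode
with open(f7file, 'rb') as f:
    z = py7zlib.Archive7z(f)
    z.list()  # list the contents of the archive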

Such a binary dependency can cause trouble when packaging the application, but let's try it.

You could use cmake -E tar ... to handle packing and unpacking 7z. Latest CMake 3.5 also supports LZMA2 and newer filters.
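
As a rough illustration of this suggestion, a recipe could shell out to CMake's command-line tar mode. This is only a sketch: the helper names `cmake_extract_command` and `extract_with_cmake` are hypothetical (they are not part of conan), and it assumes Python 3 and a `cmake` executable on the PATH that is recent enough to read the archive format in question.

```python
import os
import subprocess


def cmake_extract_command(archive_path):
    """Build the `cmake -E tar` command line that extracts an archive.

    Hypothetical helper, not part of conan. `cmake -E tar xf` extracts into
    the current working directory, so the archive path is made absolute.
    """
    return ["cmake", "-E", "tar", "xf", os.path.abspath(archive_path)]


def extract_with_cmake(archive_path, destination):
    """Extract `archive_path` into `destination` (assumes cmake is on PATH)."""
    os.makedirs(destination, exist_ok=True)
    # Run from the destination folder, because that is where cmake unpacks.
    subprocess.check_call(cmake_extract_command(archive_path), cwd=destination)
```

A call such as `extract_with_cmake("boost_1_60_0.7z", "src")` would then unpack into `src`, provided the installed CMake supports the format.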

Thanks for your suggestion. The issue is that it would introduce a hard dependency on cmake, which is currently optional. You can use conan without cmake at all (in fact, we are aware of active users in companies that do not use cmake in their projects). Furthermore, it would introduce a dependency on modern cmake, while conan tries to remain backwards compatible down to cmake 2.8, which (unfortunately) is the default on many widely used systems and is actively used by many developers in many projects.

There is an ongoing effort to add Python 3 support, which could provide some of this functionality, but it would not be possible to make it available to all conan users, as many of them will still use Python 2.7.

I think the best possible solution here is to bundle the libarchive-based bsdtar program with conan and invoke it to unpack archives. It supports all widespread archive formats and can autodetect which format to use based on the provided file.

I don't think this is the best approach. The downloads were last modified in 2008. Besides, integrating calls to external programs for certain tasks is complicated, very error-prone, and very difficult to get right. Furthermore, it would have to be done on all OSs and in all packages and installers, and we have had reports that conan runs on a few more platforms than the strictly supported ones. The differences between Python 2.7 and 3.5 would add extra complexity too. It must be a fully integrated, portable and maintained solution, otherwise devops (releases), troubleshooting and support will kill us. Given that this is mainly a speed/space optimization, not a strictly functional requirement, I'd say the best approach is to gradually move to Python 3.5 as the mainstream version until we deprecate Python 2.7, and get this functionality from Python 3.5.

In any case, I am of course not against this feature; it is just that we don't have the resources to address it. If someone wants to implement it, test it, create packages/installers for all platforms, and provide support for this functionality, that is very welcome.

> the downloads date back to last modified in 2008

You'll need to build it; libarchive is actively developed.

> the integration to call external programs to do certain tasks is complicated and very error-prone, very difficult to get it right

In my view, UNIX-way integration is much easier and less error-prone than writing your own code with libraries. In this case all you need to do is pass the input file and the working directory to bsdtar, and it will do all the other work.

_UPD_ It's basically the same as self.run() in conanfile.py

> Furthermore, it is something to be done in all OSs and all packages and installers

Just require it as a system dependency for unsupported systems

Please be aware that some older 7z versions create non-standard / corrupted 7z archives. 7z is able to gracefully handle those but libarchive (master) is not.

This leads to archives which cannot be unpacked with bsdtar or cmake. Because the issue only happens occasionally when compressing, it might go unnoticed with bsdtar for some time.

I'm not sure it makes sense to support corrupted archives, but if you need this I think the right way would be to improve libarchive, or to invoke an external 7z program (possibly as a fallback if bsdtar fails and the file is in 7z format).
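
The bsdtar-first, 7z-as-fallback idea could be sketched roughly as follows. The function name `extract_with_fallback` and its `run` parameter are hypothetical and not part of conan; the sketch assumes `bsdtar` and `7z` executables are available on the PATH and that the destination directory already exists.

```python
import subprocess


def extract_with_fallback(archive_path, destination, run=subprocess.check_call):
    """Try bsdtar first; fall back to the 7z program for .7z archives.

    Hypothetical helper. `run` is injectable so that the control flow can
    be exercised without having either tool installed.
    Returns the name of the tool that performed the extraction.
    """
    try:
        # bsdtar autodetects the archive format from the file itself.
        run(["bsdtar", "-xf", archive_path, "-C", destination])
        return "bsdtar"
    except (OSError, subprocess.CalledProcessError):
        # Only 7z archives get a second chance; other failures propagate.
        if not archive_path.lower().endswith(".7z"):
            raise
        # 7z expects the output directory glued to the -o switch;
        # -y answers "yes" to any prompt so the call never blocks.
        run(["7z", "x", "-y", "-o" + destination, archive_path])
        return "7z"
```

Keeping the command runner injectable is just one way to make such glue code testable; a real integration would also need to decide how to report which tool failed and why.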

There are some wrappers around libarchive: PyEasyArchive, python-libarchive-c (which is a rewrite of PyEasyArchive) and the old Google SWIG-based python-libarchive. All of these depend on linking to libarchive at run time. So the first option is to ship the libarchive library with conan.

The second option is to use pyliblzma. I think it's better because we can statically compile xz (liblzma) into the pyliblzma package.

About pyliblzma, I see a few issues. It is at version 0.5.3, last uploaded in 2010, which means no maintenance and no support. There is no GitHub repository. It uses the LGPL license, which would invalidate the MIT license used in the conan installers. It also requires compiling and redistributing the library.

I see this issue as Python-specific rather than conan-specific. Approached as such, a contribution here would benefit a huge OSS community. I think the best approach would be to provide Python packages (preferably wheels) that bundle libarchive and python-libarchive-c and that work across the major OSs: Windows, Linux (several distros and versions) and OSX.

I think conan would benefit from 7z when compressing as well, as compressing large packages can be very slow (e.g. Qt on Windows, which has a package folder of over 1.2 GB).

Should another issue be created for this, since a large chunk of the work would be getting LZMA support in the first place?

@packadal sure.

Think about an appropriate compression algorithm for binary data.

For example xz utils page says:

> The core of the XZ Utils compression code is based on LZMA SDK, but it has been modified quite a lot to be suitable for XZ Utils. The primary compression algorithm is currently LZMA2, which is used inside the .xz container format. With typical files, XZ Utils create 30 % smaller output than gzip and 15 % smaller output than bzip2.
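
A quick way to check figures like these against your own package contents is to compare the three codecs directly. The sketch below uses only the Python 3 standard library; the function name `compressed_sizes` is hypothetical, and the sample data is an arbitrary stand-in, so the resulting ratios say nothing about real packages.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data):
    """Return the compressed size in bytes of `data` for each stdlib codec."""
    return {
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        # lzma.compress() writes the .xz container format by default.
        "xz": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    # Arbitrary, highly repetitive stand-in for real package content.
    sample = b"int main() { return 0; }\n" * 10000
    sizes = compressed_sizes(sample)
    for name in sorted(sizes, key=sizes.get):
        print("%-5s %d bytes" % (name, sizes[name]))
```

For a meaningful comparison, `sample` should be replaced by the bytes of an actual package folder or tarball, since the ranking of the codecs depends heavily on the data.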

I was looking for .xz support in conan (some third-party packages only publish releases as .xz).
I'm not sure where I should post this, but since it has been discussed here, let's go.

Based on what I have read (PR #698, issues #648 and #446, and this one),
the status looks like this:

  1. Python 3 has native support ( :+1: ), but conan needs to support Python 2 too (which is good!)
  2. Alternatives in Python 2:
    1. Use backports.lzma
      • requires xz utils
      • may have problems automating the installation (?)
    2. Use pyliblzma
      • it is not maintained
      • may have problems on Windows (?)
    3. Use libarchive with some Python wrappers (see the comment above by @opilar)
      • needs a run-time link to libarchive


It all seems to come down to linking with some native dependency (xz utils or libarchive) and providing
a Python wrapper to the build.
So, why not create a conan package to build/link these dependencies?
xz utils even has prebuilt binaries for Windows and macOS (I have not seen any for libarchive yet).

Also, the Python wrappers could be downloaded with pip and added to the PYTHONPATH (using the new tools).

I was planning to do that, but right now I am a little busy.
If the package could not be included with conan directly, users could be instructed to
add it to the build.

P.S.: I created this gist as a workaround: https://gist.github.com/paulobrizolara/625f2cd18ff98c3a4c0795fd291f46d1

Thanks @paulobrizolara for your research and effort into this.
Yes, I think a conan package around some of the native xz utilities could be useful. It could be used to provide support for tools.unzip or equivalent, but not for the main conan package tgz files. Until Python (both 2 and 3) has very robust, built-in support for lzma, that would be extremely risky.

For the other use case, every package that downloads xz archives would have to require the xz package, which might be slightly annoying, but yes, it is better than nothing. I am really looking forward to the day that Python has real native support for lzma in both py2 and py3; I don't know if we will see it :(

Now that we are officially deprecating Python 2, let's implement the wrapper for 7z file compression/decompression; we can raise an error if Python 2 tries to use it.

I am ok with this, but making it very clear that this feature in python 2 won't be supported at all.

Implemented in #3197. It will be released in conan 1.6.
Only .xz will be supported, not 7z, which is proprietary and doesn't have adequate Python support.
Also, it will only be supported in Python 3 with lzma support enabled. Python 2 will raise an error (and Python 2 will be deprecated anyway).
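
For readers who want to see what "Python 3 with lzma support enabled" means in practice, here is a minimal sketch of extracting a .tar.xz archive with the standard library alone. It is not the code from #3197; the function name `untar_xz` and its error messages are hypothetical.

```python
import sys
import tarfile


def untar_xz(archive_path, destination):
    """Extract a .tar.xz archive using only the standard library.

    Hypothetical helper, not conan's actual implementation.
    """
    if sys.version_info.major < 3:
        raise RuntimeError("XZ archives require Python 3 with lzma support")
    try:
        # The lzma module is missing when Python was built without liblzma.
        import lzma  # noqa: F401
    except ImportError:
        raise RuntimeError("This Python build has no lzma module")
    # "r:xz" asks tarfile to read through the xz decompressor explicitly.
    with tarfile.open(archive_path, "r:xz") as tar:
        tar.extractall(destination)
```

Checking for the `lzma` module separately from the Python version matters because a Python 3 interpreter compiled without liblzma headers imports fine but cannot open xz streams.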
