Debian's dpkg-source has a nice feature when unpacking upstream tarballs to ensure that the target directory always has a specific name. It will do the following:
The following situations exist then:
Archive contains a directory tree in the following form:
dir/
dir/CMakeLists.txt
dir/sourcefile.c
Calling dpkg-source -x would create a directory of the expected name (in Debian's case package-version) and that directory would contain the CMakeLists.txt file and the sourcefile.c file - regardless of the name dir/ within the archive.
Archive contains a directory tree in the following form:
CMakeLists.txt
sourcefile.c
Calling dpkg-source -x would create the same result as before, even though the archive doesn't contain a subdirectory.
I think something like that would be interesting for conan, because I see the following pattern very often in recipies:
_source_subfolder (or something otherwise deterministic)Since the rename step requires the user to know the directory name within the source archive, and projects might change that name between releases, this can add quite a bit of complexity to the conan file.
Since this changes the behavior of unzip / get I would suggest adding a keyword argument autodir (default False) that introduces this new behavior.
Example usage could be:
tools.get(url, sha256=hash, destination='_source_subfolder', autodir=True)
This would ensure that the contents of the downloaded archive is available in the directory _source_subfolder according to the described algorithm, regardless of how the archive was organized internally.
I could implement this and provide a pull request if this is something you'd be interested in adding to conan.
Hi! I've seen this behavior on desktop tools too, and it is useful in general terms, but for an automated tool... I think it is not so annoying to have that _extra_ directory. I would like to hear more feedback from the community that is contributing with recipes to Conan Center to know if it would be useful or they will still rename the directory to _source_subfolder (cc/ @conan-io/community-leaders ).
If it is really useful, as an opt-in, it won't break anything and we can consider including it.
Thanks for the suggestion!
Just fyi, because I feel that I probably wasn't completely clear in my initial posting based on your response: the main issue I see is not the fact that there might be an extra directory (there'll be an extra directory anyway, even with this scheme), but that the name of the extra directory has to be constructed every time from the package version, adding a lot of boiler-plate code - especially if upstream doesn't follow the packagename-version scheme. (Most packages do, but some don't.)
Anyway, thanks for considering this, and as I said, I'd be willing to implement this if you approve.
I see, it is about providing a destination that is the final directory, so the result of.
tools.get(url='https://.../lib-2.0.tar.gz', destination='_source_subfolder',...)
would be:
.../_souce_subfolder/
├───CMakeLists.txt
├───sourcefile.c
instead of:
.../_souce_subfolder/lib-2.0/
├───CMakeLists.txt
├───sourcefile.c
From the UX perspective, I find it a useful feature, although the tarfile library, which is the one we are using to unzip the files, doesn't have that feature available (docs here):
TarFile.extractall(path=".", members=None, *, numeric_owner=False)
Extract all members from the archive to the current working directory or directory path.
As a Conan developer, I prefer to keep things simple and, whenever it is possible, to keep the same signature as the underlying functions we are calling instead of adding a layer of functionality on top of it. That's my only concern, but I'll tag this as a feature request, and share it with the team, so we can prioritize it. Nevertheless, if you want to contribute to Conan and you create the PR we will consider it.
👍 @chris-se, waiting for your PR, if you need help writing the feature or the tests, just ping me. Thanks!
Thanks! I'll start working on it.
thanks for the suggestion. for the time being, I was thinking about the same. I have simple implementation (not for conan, for my own scripts):
def extract(filename):
if filename.endswith(".zip"):
f = zipfile.ZipFile(filename, 'r')
names = f.namelist()
else:
f = tarfile.open(filename, 'r')
names = f.getnames()
dstdir = TMP
dirname = os.path.commonprefix(names)
if not dirname:
dirname = os.path.splitext(filename)[0]
dstdir = os.path.join(TMP, dirname)
dirname = os.path.join(TMP, dirname)
if os.path.isdir(dirname):
logging.info('extracting "%s" into "%s"... (cached)' % (filename, dirname))
else:
logging.info('extracting "%s" into "%s"...' % (filename, dirname))
f.extractall(path=dstdir)
return dirname
It'd be really great to have that for the CCI. Some archive names are harder to decipher, like for example ones from direct commit hash links - you either pass the hash to the recipe or glob downloaded files, both ways are inconvenient.
It would be awesome to have that!
As there has not been progress here, and we want to release 1.23, removing the milestone. When we have something to review, we will plan the release to be merged in. Thanks!
I'm a big fan of this. In Conan-Center there are several recipes that are doing tricks in the source() method like creating a self._source_folder and carrying that subfolder in all the recipe. And the only reason is that the zip subfolder (something like whatever_1.2.3) is renamed to self._source_folder and that is the way to skip the extract subfolder for all the contents.
I'm working on a PR to introduce a new argument for the unzipping that will cause that the directory containing all the files is removed and all the contents moved to the parent.
I detected also this pain while working on a POC of the recipe layouts here: https://github.com/conan-io/conan/pull/6261 where we aim to avoid this "subfolders" tricks by declaring in the recipe the relevant folders.
https://github.com/conan-io/conan/pull/8208 implemented strip_root argument in tools.unzip(), tools.get(), etc, and it will be released in Conan 1.33.
Most helpful comment
I'm a big fan of this. In Conan-Center there are several recipes that are doing tricks in the
source()method like creating aself._source_folderand carrying that subfolder in all the recipe. And the only reason is that the zip subfolder (something likewhatever_1.2.3) is renamed toself._source_folderand that is the way to skip the extract subfolder for all the contents.I'm working on a PR to introduce a new argument for the unzipping that will cause that the directory containing all the files is removed and all the contents moved to the parent.
I detected also this pain while working on a POC of the recipe layouts here: https://github.com/conan-io/conan/pull/6261 where we aim to avoid this "subfolders" tricks by declaring in the recipe the relevant folders.