Archives automatically generated by GIT (eg https://github.com/symfony/symfony/archive/v3.0.6.zip ) contain a copy of the repository.
These archives may contain tests/examples/docs that are not necessary on production machines (and could potentially lead to security issues).
Developers may tell git to exclude files/folders via an export-ignore option in the .gitattributes file.
BTW, someone may still want docs/tests included in packages during the development phase (that's for instance the main reason why big projects like Symfony are keeping tests/docs in packages).
What about improving Composer to handle this problem?
I mean, what I'm thinking about is to add something like this to to composer.json:
{
"name": "symfony/class-loader",
"type": "library",
"devFiles": [
"/Tests",
"/CHANGELOG.md",
"/phpunit.xml.dist",
".git*"
]
}
We could then add to composer a new option (eg --no-dev-files): if this option is specified, Composer should delete the files/directories that satisfy the rules specified in the devFiles section of composer.json.
This approach would make everyone happy...
First of all, composer should not be run on production (note, should not, not cannot).
Second, there have been many requests for options similar to .gitattributes export-ignore, etc. Composer is not going to provide this. This should be a part of your packaging process/job.
@alcohol @kamilsk I think you misunderstood what I mean.
Let me explain the origin of this issue.
concrete5 uses a lot of composer packages. When building a new release archive to be downloaded by end-users, we run a composer install.
Doing so, we have a lot of unnecessary files (tests, docs, examples, ...) that increase the size of the archive, and it may results in some problems (security issues, version disclosures, ...).
So, since currently there's no automatic way to determine the production-only files, we need to manually parse all the files and directories of all the composer packages, trying to understand which are the unnecessary ones (see our cleanup-script).
This manual operation must be done every time a composer package gets updated.
Furthermore this is somehow dangerous: we need to deeply analyze every package, to determine the files to be removed.
Ideally, package maintainers could add export-ignore to the files that are not required for production, but many package developers don't want to do it because using git clone may be really slow and because they want tests to be run even with a composer install.
So, at the moment there's basically no way to tell which are the production-only files.
What I'm suggesting here is a solution of this problem.
Do you see any alternative?
@mlocati, you can create plugin and use extra for this purpose. In any case, you will request the vendors of the packages to add a list of dev files. Whether it's devFiles or extra:devfiles, this list will need to fill out.
you can create plugin and use extra for this purpose
I think that if composer would officially adopt a solution like this, the maintainers of all the packages would be more positive about a request to keep such devFiles entries updated.
That's an issue that many have... For instance, here's a list of Symphony-related stuff:
As you can see, it's a very recurring issue, and an "official" solution could be appreciated by a lot of people...
It's only an issue to people making an issue out of it :-)
It's only an issue to people making an issue out of it :-)
...as everything :wink:
One NoToast exception may be fake, many of them could be a signal that something should be taken in account
I think that if composer would officially adopt a solution like this
I don't think so. After that we have two options to resolve similar problem: .gitattributes and composer.json:devFiles. But first like "pre-hook" (remove before archiving), and second like "post-hook" (remove after delivering).
Do you know that the symfony maintainers ready to support this?
After that we have two options to resolve similar problem: .gitattributes and composer.json:devFiles. But first like "pre-hook" (remove before archiving), and second like "post-hook" (remove after delivering).
.gitattributes has already been taken in account, and discarded.
Do you know that the symfony maintainers ready to support this?
I don't know... Let's see what @fabpot @webmozart @nicolas-grekas @Tobion think about this devFiles alternative to .gitattributes...
PS: composer itself too has the need to strip out development stuff (see here)... In the case of composer it's easy to keep the list of production-useless stuff up to date.
In other projects it's quite hard (for concrete5 we have 100+ composer packages).
@mlocati but that is part of our packaging process/job (or "compile" step, if you prefer). It is maintained by us. This is not a Composer feature :-)
@mlocati but that is part of our packaging process/job (or "compile" step, if you prefer). It is maintained by us. This is not a Composer feature :-)
@alcohol That's the problem that my proposal tries to fix!
You (composer) have to keep the list of stuff to be excluded from .phar, I have to keep the list of stuff to be removed from our distribution .zip archive, ... It's a problem that every project has (or should have) when generating the final distribution files.
Imagine if you simply use something like php bin/composer install -q --no-dev --remove-dev here: it would simplify the packaging process a lot.
In your case it's simple (include .php files only, exclude tests dirs), in other big projects it's a hard, manual, recurring, time-expensive and error-proning process.
@mlocati what problem? I don't see a problem.
Composer has nothing to do with our compile/build step.
@mlocati what problem? I don't see a problem.
As I described above, the problem is to strip out files that are not needed for the production. When building composer, it's an easy step (keep only .php files, remove test dirs). For big projects it's a problem: we need to parse every package and determine the list of production-useless stuff.
Composer has nothing to do with our compile/build step.
In your compile step you have to keep the list of stuff to be excluded from .phar.
For concrete5 I have to keep the list of stuff to be removed from our distribution .zip archive.
What I'm suggesting here is a way to automatize this hard, manual, recurring, time-expensive and error-proning process.
I think you misunderstood. I don't consider it a problem, period.
Also, implementing your proposal in Composer would not change our compile/build process at all since we do not use composer for this.
I can understand that your use-case is more complicated. But I simply do not agree that this is actually an issue that warrants attention/time. Storage space is negligible in this day and age. The cost (time, resources) it would require to implement your suggestion (and get everyone onboard to use it) would far exceed any gains it might provide.
Anyway, unsubscribing from this topic as I (personally) have zero further interest in discussing this.
we do not use composer for this.
You do: https://github.com/composer/composer/blob/1.1.1/bin/compile#L14
@mlocati IMO if someone builds a plugin that works well it could well become a community standard that most packages accept and just add their data in the extra key.. We can't solve everything in composer itself, because not everybody agrees how things should be done, and because our time to maintain things is finite.
@Seldaek, @mlocati I can try
Potential syntax looks like this https://github.com/kamilsk/Common/blob/3.x/composer.yml#L75-L78
Project will be located here https://github.com/octolab/Cleaner/blob/master/package.meta
@mlocati, any suggestions?
@Seldaek Thanks for your clean and wise answer
@kamilsk Great!
For the dev-files option I'd adopt a pattern syntax compatible with the one of .gitattributes. It's well known and people that will want to switch from .gitattributes/export-ignore to composer.json/dev-files will have to do less work.
There's one more improvement I'm thinking of.
Let's take for instance doctrine/dbal.
It may be used via command line (executing the command in its bin directory) as well as a PHP-library only.
In some projects, its command line version may not be used (so it could be removed), in other projects this CLI command is required.
What about adding some details about the files that could be removed?
I mean, something like this:
{
"devFiles": {
"docs": ["/CHANGELOG.md"],
"tests": ["/Tests", "/phpunit.xml.dist"],
"bin": ["/bin"],
"others": ".git*"
}
}
That way, people that may want to keep the "bin" could delete all the files except the "bin" ones.
PS: It's just a raw and odd idea :wink:
@mlocati, I think you can close this issue, I open related here https://github.com/octolab/Cleaner/issues/1 and try to complete all works on MVP until 31 May https://github.com/octolab/Cleaner/milestones
@mlocati, I think you can close this issue
@alcohol closed this (even before understanding the issue)
yep, my mistake
@mlocati I simply do not agree it is an issue. Please do not speak on my behalf. :-)
@mlocati I simply do not agree it is an issue.
Maybe we have some language barrier here (obviously on my side): for me a GitHub "issue" is not the same as "bug", it's also something that's worth discussing about.
Please do not speak on my behalf. :-)
Sorry if I somehow offended you. The problem is that seeing an issue closed before a deeper discussion (like we've had after you closed this issue) is somehow hurting. It's like if you told me "shut out, don't say stupid things". GitHub is a collaborative place, where we all (love to) speak about software and the ways to improve it, isn't it?
@mlocati I am simply saying that I do not agree that there is a "problem" to solve here. Composer is a dependency manager. Not a packaging tool.
@alcohol imagine
and it is really composer-related, because authors of composer packages have to participate to make it working.
Imagine that could do this just as well as the final step in your continuous integration pipeline (either as the last task in your main job, or a separate job). Now what if I told you that you do not have to imagine this, but that it is already possible?!
PROFIT
@alcohol could you please clarify (provide a link or keyword), how exactly it is possible as a part of integration pipeline?
Pretty sure you can run a simple set of find X --delete recursively. After cleanup, create your deployable archive (tar, gz, etc) and voila, it's as small as you aim it to be.
find X --delete
What should we use for X?
@alcohol Yes it could work, but only if i know X. I do know X for my own codebase, but i dont know and i can not know X for my 9000 dependencies. That is exactly the purpose of this feature request!
Le sigh. I'm unsubscribing from this thread.
+1 for this feature.
@dan-da, you can contribute to octolab/cleaner instead of continuing this thread. The composer is flexible and can be extended by plugins - thanks to the authors.
I can see @alcohol stated numerously that he didn't consider this issue a real issue, without 0 explaination to why he thinks so :/
The only real argument I found was:
Composer is a dependency manager. Not a packaging tool.
Yes, I agree. So why can't dependencies' authors decide what's part of a "dependecy", and what parts are unrelated noice that's required inside VCS for other reasons?
Because it's just that, noise. If your OCD can't handle that, then deal with it in your packaging process. But there is no reason for the Composer team to waste time on building nor maintaining a feature that provides zero actual gain.
@alcohol Ok, perhaps a map of excluded files is indeed hard to implement, test and maintain, I can agree on that.
But how about, instead of specifying a map of excluded files, let's specify a baseDirectory in composer.json. We could set the directory to src, and composer would only download that one folder, and ignore the rest. That sounds like an easy feature to implement, right? Certainly simpler than a map of exluded files.
We could set the directory to
src, and composer would only download that one folder, and ignore the rest. That sounds like an easy feature to implement, right? Certainly simpler than a map of exluded files.
How do you download only a subfolder ? If you use source install, you cannot do it (well, with SVN you can do it, but most of the packages are using Git today, not SVN, and Git cannot do it). And for dist install, we can only download what the dist archive contains. We cannot magically download a trimmed-down archive (and most open-source packages rely on downloads endpoints of Github and Gitlab for instance, which don't have such sub-folder download endpoints)
@stof There is git clone --filter flag, that allows for downloading only a specified directory.
For dist install, we can handle it with .gitattributes, so for dist, composer could still download whole package.
@Danon and supporting gitattributes or no is not something done by composer at all, but by the packaging tool (github respects it for instance when building the download archive, and many packages already uses it)
It all comes down to value in the end. There is no value here. This discussion itself has already cost more than you would ever gain from it.
@stof So why can't we have baseDirectory that composer could use, utilizing git clone --filter, to install only one folder in the dependency downloaders?
@Danon what's wrong with https://github.com/octolab/cleaner? Looks exactly what we are looking for.
Note about your baseDirectory proposal - sadly the usage of it would be very limited, because a huge amount of libraries, frameworks, etc. use more than one folder which can not be deleted on prod (folders like src, config, templates, etc.) So, baseDirectory does not exist most of the time.
@Danon what's wrong with https://github.com/octolab/cleaner? Looks exactly what we are looking for.
@ivastly I don't care about my local or hosted environment, i could delete noise files on my computer with ease 馃棥
I care about users of my dependencies. I have test, ReadMe.md, other unrelated files in my VCS, and users of my package will have those files downloaded by composer.
I don't want to tell users to use octolab/cleaner or to add --prefer-dist when require'ing. I would just like to specify myself what parts should be in packagist, and what not.
Note about your
baseDirectoryproposal - sadly the usage of it would be very limited, because a huge amount of libraries, frameworks, etc. use more than one folder which can not be deleted on prod (folders like src, config, templates, etc.) So,baseDirectorydoes not exist most of the time.
True, it wouldn't, but at least it would work for some projects (ones that have a single directory).
PS: I'm sure you could use git clone --filter to clone more than one directory, so perhaps composer could allow for a flat array of folders that should be downloaded?
I would just like to specify myself what parts should be in packagist
Packagist does not store any of your files, only metadata.
You could remove these "noise" files from your repository in your tags. That way, no matter if the user downloads dist or source, they will not get these "noise" files.
But seriously, as I said before, there is absolutely no value here to be gained. The amount of time spend discussing this has cost more in terms of $$ than you could ever gain back from removing these files.
You could remove these "noise" files from your repository in your tags
Or you can add those files in your .gitattributes files (marking them with export-ignore)
You could remove these "noise" files from your repository in your tags
Or you can add those files in your
.gitattributesfiles (marking them withexport-ignore)
That requires users to use --pref-dist, and not everyone knows that
@Danon using dists is the default behavior of composer for releases (only dev branches are installed from source by default).
Oh.
Nice!
What does --pref-dist flag do then?
@mlocati okay, export-ignore sounds good, so why you have created the issue then? 馃槂 What is missing in export-ignore approach, which you want to implement in composer? Is it advanced regex syntax?
What is missing in
export-ignoreapproach
@ivastly Someone may need all the files in the git repository without using git clone (which is the only way to get the whole archive contents in case you use export-ignore in .gitattributes) - see for example https://github.com/maxmind/MaxMind-DB-Reader-php/issues/55
Exists several plugins to remove unwanted files, but none works very universally. That's why I started to develop new plugin https://github.com/liborm85/composer-vendor-cleaner , that cleans files and directories in vendor directory by glob pattern syntax.
Most helpful comment
@alcohol Yes it could work, but only if i know
X. I do knowXfor my own codebase, but i dont know and i can not knowXfor my 9000 dependencies. That is exactly the purpose of this feature request!