zstd command-line tool should accept options from the environment

Created on 21 Nov 2018  路  9Comments  路  Source: facebook/zstd

Like gzip with $GZIP and xz with $XZ_OPT . Quoting xz's manpage:

   XZ_OPT This is for passing options to xz when it  is  not  possible  to  set  the  options
          directly  on the xz command line.  This is the case e.g. when xz is run by a script
          or tool, e.g. GNU tar(1):

                 XZ_OPT=-2v tar caf foo.tar.xz foo

          Scripts may use XZ_OPT e.g. to set script-specific default compression options.  It
          is  still recommended to allow users to override XZ_OPT if that is reasonable, e.g.
          in sh(1) scripts one may use something like this:

                 XZ_OPT=${XZ_OPT-"-7e"}
                 export XZ_OPT
feature request

Most helpful comment

Hi @Cyan4973, thanks for the great work around here.

What about setting the number of parallel threads used for compression?
export XZ_OPT="-T 0" does the trick for xz.

I see you opted for environment variables that are limited to a specific argument. Could you add a new one for the number of threads, say ZSTD_THREADS?

BTW, ZSTD_CLEVEL should be documented on the zstd man page.

All 9 comments

This is debatable.
The environment variables have been curtailed within gzip because they can be abused and become a security hazard. I don't know the details, but it's something like it becomes possible to make gzip erase files, replace files or change rights to files, all transparently, piggy-backing on user commands through environment variables.

Given that we have no strong scenario where environment variables are needed, but we already know this feature caused security risks in the past, we prefer to not go into this direction.

Understood, but this implies that GNU tar should not receive a short-hand switch to compress as .tar.zst, like -z for gzip, -j for .tar.bz2 and -J for .tar.xz. Compression should always be done with a pipeline or by using "-Izstd -19".

You mean, in order to pass a specific compression level since the short-hand will only use the default one ?
Fair point.

I believe we could consider supporting an environment variable limited to compression level only.
This would reduce opportunity to do strange things through combination of commands bundled into the environment variables.

Support for ZSTD_CLEVEL environment variable
has been added by @yijinfb in https://github.com/facebook/zstd/pull/1463 .

It makes it possible to override the default compression level.

What about using a dictionary when decompressing a tar archive?

This is not recommended.

Dictionaries are effective when shared across a lot of small data.

A tar archive is like a single big stream of bytes. Dictionary efficiency will be underwhelming, since it will save a few KB, which is likely a tiny amount compared to the archive. On the other hand, the requirement to have the dictionary available at decompression time will be a deployment challenge.

Hi @Cyan4973, thanks for the great work around here.

What about setting the number of parallel threads used for compression?
export XZ_OPT="-T 0" does the trick for xz.

I see you opted for environment variables that are limited to a specific argument. Could you add a new one for the number of threads, say ZSTD_THREADS?

BTW, ZSTD_CLEVEL should be documented on the zstd man page.

ZSTD_CLEVEL should be documented on the zstd man page.

You are right. It has been recently added to the man page (within dev branch), it should be present in next release.

What about setting the number of parallel threads used for compression?

That could be an interesting option.

Our concerns is that environment arguments can be used to silently alter program behavior, potentially leading to some form of denial, such as resource exhaustion.

When it comes to multithreading, the memory budget is basically multiplied by the number of threads. 2 threads is unlikely to be more dangerous than one, but what about 200 ?

I believe that if we can have a good answer to this risk, it could unlock this candidate feature.

Good to know it is already documented and thanks for the detailed reply.

Unfortunately I am unable to provide a good answer to the "risk" you mention because I do not perceive it as a risk. It seems to me that you are trying to protect the user from himself. If something else but the user is setting such environment variable, then he probably has much greater concerns other than resource exhaustion.

Silently altering zstd behaviour is precisely what I would like to have, in order to be able to use it comfortably from tar, as can be done with other compression programs. As is, I have to use -Izstd -T0 or -Izstdmt, similar to what @thiagomacieira mentions. Otherwise I would be wasting resources everytime I use it on my machines.

zstd is a command line tool, meaning it is mostly used by professionals or enthusiasts. Like many other professional tools, it can do bad things, but only if you use it improperly. That's why we have manuals. Limiting the flexibility of a professional tool in order to make it just a little less dangerous to the novice (that doesn't care to read the manual) is a tradeoff that I am not fond of.
(Yes, zstd will certainly be used by novice users through some sort of GUI. In that event it is up to the GUI programmer, whom is a professional or enthusiast, to make the right decisions.)

Summing up, IMHO it would suffice to add a warning to the manual stating that multi-threading will lead to memory exhaustion.

Anyway, as an half-baked measure to do damage control to the unwary user, instead of ZSTD_THREADS you could have a boolean variable, ZSTD_MT or ZSTD_MULTITHREADED, that adds -T 0 when set to true. Most machines tend to have a good balance between available CPU cores and installed memory. Seems safe to assume that -T 0 will leave most machines way bellow the swap line.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

planet36 picture planet36  路  3Comments

sergeevabc picture sergeevabc  路  3Comments

itsnotvalid picture itsnotvalid  路  3Comments

dciliske picture dciliske  路  3Comments

indygreg picture indygreg  路  3Comments