Data.table: fread may lead to vulnerabilities

Created on 12 May 2016 · 11 Comments · Source: Rdatatable/data.table

If the argument to fread is a file path, fread opens the file and reads it. If the argument contains a space, fread interprets it as a system command and runs it.

I am concerned that filenames are the kind of information that people pass in from websites. As Rserve and Shiny become more widely used, fread may open up a vulnerability. Passing in a filename that contains a space can trigger fread to switch from reading a file to running arbitrary code, including deleting files. This is reminiscent of ImageTragick, which allowed a crafted image upload to cause arbitrary code execution.

For this to be an issue, there would need to be a way for a string from a web form or API to reach fread. I have no idea how common this is, and a careful person might protect against it. Nevertheless, this handy fread behaviour creates a potential vulnerability. Ideally, reading files and executing code would be kept clearly separate.
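
As an illustration of the kind of protection a careful caller might add, here is a minimal base R sketch. It treats the incoming string as untrusted, accepts only a plain file name, and resolves it inside a directory the application controls. The names `safe_upload_path` and `upload_dir` are hypothetical and are not part of data.table.

```r
# Hypothetical helper (not part of data.table): map an untrusted file name
# to a path inside `upload_dir`, or stop with an error.
safe_upload_path <- function(user_input, upload_dir) {
  stopifnot(is.character(user_input), length(user_input) == 1L, !is.na(user_input))
  # Accept only plain file names: letters, digits, dot, underscore, hyphen.
  # This rejects spaces, path separators and shell metacharacters.
  if (!grepl("^[A-Za-z0-9._-]+$", user_input)) {
    stop("rejected file name: ", user_input)
  }
  path <- file.path(upload_dir, user_input)
  # file.exists() is also TRUE for directories (e.g. ".."), so exclude them.
  if (!file.exists(path) || dir.exists(path)) {
    stop("no such file: ", user_input)
  }
  path
}

# Usage with a throwaway directory and file.
upload_dir <- tempfile("uploads")
dir.create(upload_dir)
writeLines(c("a,b", "1,2"), file.path(upload_dir, "data.csv"))

ok  <- safe_upload_path("data.csv", upload_dir)
bad <- tryCatch(safe_upload_path("rm -rf *", upload_dir),
                error = function(e) "rejected")
```

The returned path can then be handed to whatever reader the application uses. The check is deliberately strict, so an application with richer file names would need to loosen the pattern with care.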

All 11 comments

Here is an example of this kind of thing in the wild (well, in GitHub code).

Adding a new argument, file, which accepts only a file path seems a pretty easy and sufficient solution.

Interesting. There are different ways to do it, then:
fread(file="rm -rf *") would fail with 'file not found'. (First argument is called input=, not used here.)
fread("rm -rf *", file=TRUE) where file='auto' by default
and/or
freadFile("rm -rf *") could call either of above
and/or
options(datatable.webSafeMode = TRUE) would turn off running system commands from fread.
and/or
by default only allow a restricted list of safe commands e.g. grep, cut, gunzip etc.

by default only allow a restricted list of safe commands e.g. grep, cut, gunzip etc.

I like it. It could be managed well as an option, so any application can add its own commands if needed, or set the option to NULL to disable system calls and allow file paths only.
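
To make the allow-list idea concrete, here is a rough base R sketch of how such a check might look. It is not data.table's implementation. The function name `is_allowed_command` and its `allowed` argument are hypothetical, and the character filter is an assumption chosen to keep the parsing simple: it accepts only the pipe as a separator and refuses anything that could chain, substitute or redirect commands.

```r
# Hypothetical allow-list check (not data.table's implementation).
# `allowed = NULL` disables system commands entirely.
is_allowed_command <- function(cmd, allowed = c("grep", "cut", "gunzip")) {
  stopifnot(is.character(cmd), length(cmd) == 1L, !is.na(cmd))
  if (is.null(allowed)) return(FALSE)
  # Accept only a conservative character set. This excludes ; & ` $ < > ( )
  # quotes, globs and newlines, so `|` is the only way to join commands.
  if (!grepl("^[A-Za-z0-9 _./|=,:-]+$", cmd)) return(FALSE)
  stages <- trimws(strsplit(cmd, "|", fixed = TRUE)[[1]])
  # An empty stage means a leading pipe or `||`; refuse both.
  if (length(stages) == 0L || any(stages == "")) return(FALSE)
  # The program name is the first whitespace-delimited token of each stage.
  programs <- vapply(strsplit(stages, "[[:space:]]+"), `[`, character(1), 1L)
  all(programs %in% allowed)
}

is_allowed_command("grep -v total sales.csv | cut -d, -f1")   # allowed programs only
is_allowed_command("rm -rf data")                             # rm is not on the list
is_allowed_command("grep x file.csv", allowed = NULL)         # commands disabled
```

Even with such a check, an allowed program can still be pointed at files the application did not intend to expose, so an allow-list narrows the risk rather than removing it.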

by default only allow a restricted list of safe commands e.g. grep, cut, gunzip etc.

@mattdowle @jangorecki I understand the appeal of this, but it seems like it would be a shame to give up the flexibility that fread currently provides. For example, combining fread with the aws command line interface allows users to read data into R straight from Amazon S3 (not many languages can do this).

dat <- fread(paste0("aws s3 cp s3://model-data/data.csv.gz - | gunzip"))

This is pretty awesome IMHO.

I agree with @mgahan - restricting it by default would have a severe negative impact for me. I do all kinds of crazy things now with fread and shell commands.

I like the idea of adding an argument which means that fread will only read files (rather than evaluate system commands). This would be compatible with other uses of fread, but would provide a way of protecting against malicious input in a context where fread was processing user input.

Of course, the person writing the code would need to know that they should use this argument ...
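
A minimal sketch of what using such an argument could look like in code that handles user input. It assumes a data.table version in which fread() accepts a file argument, as proposed earlier in this thread. The wrapper name `read_file_only` is hypothetical.

```r
# Hypothetical wrapper: always treat `path` as a file, never as a command.
# Assumes a data.table version in which fread() has a `file` argument.
read_file_only <- function(path, ...) {
  stopifnot(is.character(path), length(path) == 1L, !is.na(path))
  # Fail early, before fread is involved at all, if this is not a regular file.
  if (!file.exists(path) || dir.exists(path)) {
    stop("file not found: ", path)
  }
  data.table::fread(file = path, ...)
}

# A command-like string is reported as a missing file instead of being run.
res <- tryCatch(read_file_only("rm -rf *"),
                error = function(e) conditionMessage(e))
```

Routing every user-facing read through one wrapper like this means the author only has to remember the argument once.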

Apart from the security concerns mentioned here, I would also like to see the distinction between input and file args, so I can give a one-line character input without it being interpreted as a command-line call.

I really like the sprintf API, e.g.

fread(sprintf("hadoop fs -cat %s", hdfs.url))

It gives a nicer API for more complex substitution in a piped command string. Maybe that could be handled somewhere alongside the current options?
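
If the substituted value can come from outside the program, the sprintf pattern is safer when the value is shell-quoted first. A small sketch using base R's shQuote follows. The helper name `hdfs_cat_command` and the example URL are hypothetical.

```r
# Hypothetical helper: build the command string, quoting the substituted value
# so that the shell sees it as a single argument.
hdfs_cat_command <- function(hdfs.url) {
  stopifnot(is.character(hdfs.url), length(hdfs.url) == 1L, !is.na(hdfs.url))
  # type = "sh" wraps the value in single quotes on every platform.
  sprintf("hadoop fs -cat %s", shQuote(hdfs.url, type = "sh"))
}

# The trailing "; rm -rf ~" stays inside the quoted argument.
cmd <- hdfs_cat_command("hdfs://namenode/data/my file.csv; rm -rf ~")
cat(cmd, "\n")
```

Quoting stops the shell from splitting or chaining on the value, but a value beginning with a hyphen could still be read as an option by the program itself, so validating the URL remains worthwhile.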

@jangorecki perhaps put more emphasis in the man page on why this is useful and when developers should strictly be using file instead of input.
