Data.table: fread input random lines. Sampling

Created on 18 Mar 2015 · 6 Comments · Source: Rdatatable/data.table

Hello.
It would be great if data.table's fread (or its other import functions) could read random, nonconsecutive rows from a file. I think this is also called "sampling" lines.

As far as I know it can only read a single contiguous range.

There could be two options:

  • The user could pass it a vector of numbers specifying the lines to be read.
  • Or, more simply, let fread choose the rows itself, for example with a uniform random number generator.
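
Both options can be approximated today in plain R without any change to fread. The following is only a minimal sketch: `read_rows` is a hypothetical helper, not part of data.table, and it assumes the file has a single header line and no quoted fields containing embedded newlines. It scans the file in chunks, keeps only the requested lines, and hands them to fread through its `text` argument.

```r
library(data.table)

# Hypothetical helper (not part of data.table): read only the data rows whose
# 1-based numbers are listed in `rows`. The file is scanned in chunks, so the
# whole file is never held in memory at once.
# Assumptions: one header line, no embedded newlines inside quoted fields.
read_rows <- function(path, rows, chunk_size = 10000L) {
  stopifnot(length(rows) > 0L)
  rows <- sort(unique(rows))              # result comes back in file order
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  kept <- character(0)
  offset <- 0L                            # number of data rows consumed so far
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break        # end of file
    # Requested rows that fall inside this chunk, as positions within it
    idx <- rows[rows > offset & rows <= offset + length(chunk)] - offset
    kept <- c(kept, chunk[idx])
    offset <- offset + length(chunk)
    if (offset >= max(rows)) break        # nothing left to look for
  }
  if (length(kept) == 0L) stop("none of the requested rows exist in the file")
  fread(text = c(header, kept))
}

# Small demonstration on a temporary file with 100 data rows
demo_path <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:100, x = (1:100) * 2), demo_path)

fixed_rows  <- read_rows(demo_path, c(3, 50, 100))       # option 1: chosen rows
random_rows <- read_rows(demo_path, sample.int(100, 5))  # option 2: random rows
```

For the random case the total number of data rows must be known in advance (100 here); for an unknown file that would need a first counting pass, which this sketch leaves out.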

Reading a file in this way would allow the user to have a quick glimpse of very big files.

duplicate fread

All 6 comments

The shuf command already allows for this.

fread("shuf -n 5 iris.csv")
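
One caveat with the one-liner: shuf treats the header like any other line, so the header may end up among the sampled rows or be missing altogether. A possible workaround, sketched below, reads the column names separately and shuffles only the data lines. `sample_with_shuf` is a hypothetical wrapper, and it assumes a Unix-like system with `tail` and `shuf` (GNU coreutils) on the PATH.

```r
library(data.table)

# Hypothetical wrapper (not part of data.table). Assumes a Unix-like system
# with `tail` and `shuf` available, and a file with exactly one header line.
sample_with_shuf <- function(path, n) {
  # nrows = 0 returns an empty table that still carries the column names
  col_names <- names(fread(file = path, nrows = 0))
  # tail -n +2 skips the header; shuf -n picks n of the remaining lines
  cmd <- sprintf("tail -n +2 %s | shuf -n %d", shQuote(path), as.integer(n))
  fread(cmd = cmd, header = FALSE, col.names = col_names)
}
```

With a file such as iris.csv in the working directory, `sample_with_shuf("iris.csv", 5)` would return five randomly chosen data rows with the original column names.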

+1

@mgahan
I get this error:
'shuf' is not recognized as an internal or external command,

Maybe that's only available on Linux, isn't it?

Good point @skanskan. I am not sure how to do this on Windows.

That's why I think it should be implemented directly in fread. I don't think it would be difficult.

This is basically #583 in disguise.
