Hello.
It would be great if data.table's `fread` (or other import functions) could read random, nonconsecutive rows from a file. This is usually called "sampling" rows.
As far as I know, it can only read a single contiguous block of rows (using `skip` and `nrows`).
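Here is a minimal sketch of what is possible today. `skip` and `nrows` are real `fread` arguments. The temporary file and the row numbers are just placeholders. Emulating random rows with one `fread` call per row works, but each call has to locate its starting line from the top of the file again, so it does not scale to many rows.

```r
library(data.table)

# Write a small example file so the snippet is self-contained.
csv <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:100, x = (1:100)^2), csv)

# One contiguous block: skip the header plus 10 data rows, then take 5.
# header = FALSE because the header line is among the skipped lines.
block <- fread(csv, skip = 11, nrows = 5, header = FALSE)

# Workaround for random rows: one single-row read per sampled row.
# skip = r skips the header and the first r - 1 data rows.
set.seed(1)
rows <- sort(sample.int(100, 5))
sampled <- rbindlist(lapply(rows, function(r) {
  fread(csv, skip = r, nrows = 1, header = FALSE)
}))
```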
There could be two options:
Reading a file this way would let users take a quick look at very large files.
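The `shuf` command (GNU coreutils) already allows for this:

```r
# Keep the header line, then let shuf pick 5 of the remaining lines
fread(cmd = "head -n 1 iris.csv; tail -n +2 iris.csv | shuf -n 5")
```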
+1
@mgahan
I get this error:
'shuf' is not recognized as an internal or external command,
Maybe `shuf` is only available on Linux?
Good point @skanskan. I am not sure how to do this on Windows.
That's why I think it should be implemented directly in `fread`. It shouldn't be difficult.
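Until something like this exists in `fread`, a cross-platform workaround can be written in plain R, so it also runs on Windows where `shuf` is missing. This is a minimal sketch. `fread_sample()` is a hypothetical helper, not a data.table function. It assumes the file has a single header line and one record per line, so quoted fields containing newlines would break it. It makes two streaming passes: the first counts the data lines, and the second keeps only the randomly chosen ones. It then hands them to `fread(text = ...)`.

```r
library(data.table)

# Hypothetical helper (not part of data.table): sample n data rows from a
# delimited file with one header line, assuming one record per line.
fread_sample <- function(file, n, chunk_size = 100000L) {
  # Pass 1: count the data lines after the header, chunk by chunk.
  con <- file(file, open = "r")
  readLines(con, n = 1L)
  total <- 0L
  repeat {
    len <- length(readLines(con, n = chunk_size))
    if (len == 0L) break
    total <- total + len
  }
  close(con)

  # Choose which data lines to keep (without replacement), in file order.
  keep <- sort(sample.int(total, min(n, total)))

  # Pass 2: re-read the file and collect only the chosen lines.
  con <- file(file, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  picked <- character(0)
  offset <- 0L
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break
    # Positions of the chosen lines that fall inside this chunk.
    idx <- keep[keep > offset & keep <= offset + length(chunk)] - offset
    picked <- c(picked, chunk[idx])
    offset <- offset + length(chunk)
  }

  # Parse the header plus the sampled lines with fread as usual.
  fread(text = c(header, picked))
}
```

Usage would look like `fread_sample("iris.csv", 5)`. Memory use stays bounded by one chunk plus the sample, at the cost of reading the file twice. A single-pass reservoir sampling variant would avoid the counting pass, but a per-line loop is much slower in plain R.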
This is basically #583 in disguise.