I create a toy example.
temp <- data.table(a=c(1,NA,2,3,999,NA))
I save it:
fwrite(temp, "temp.csv", quote=FALSE, sep=",", append=F)
and read it again:
my <- fread("temp.csv", stringsAsFactors=F)
As you can see only the first line is read.
I don't know if it's a problem with fread or with fwrite's output file.
library(data.table)
temp = data.table(a=c(1,NA,2,3,999,NA))
tmp = tempfile()
fwrite(temp, tmp, quote=FALSE)
system(paste('cat', tmp))
# a
# 1
#
# 2
# 3
# 999
#
There are no separators in any line, and the last line is blank.
Absent separators, fread has no way of knowing whether there are just blank lines or whether they are supposed to have missing data.
fread warns you about this:
Warning message:
In fread("temp.csv", stringsAsFactors = F) :
Stopped reading at empty line 3 but text exists afterwards (discarded): 2
One potential fix: add a dummy column:
temp[ , b := NA]
Now tmp will have separators so fread can tell which lines have data.
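To make the point about separators concrete, here is a rough base-R sketch of the text that ends up in the file in each case. It does not call fwrite; it assumes fields are joined with "," and that NA is written as an empty field, and fmt is a hypothetical helper used only for this illustration.

```r
# Base-R sketch, not fwrite itself. Assumption: fields are joined by ","
# and NA is written as an empty field (na = "").
a <- c(1, NA, 2, 3, 999, NA)

# Hypothetical helper: render a vector the way it would appear in the file.
fmt <- function(v) ifelse(is.na(v), "", as.character(v))

# One column: every NA row becomes a completely empty line.
one_col <- c("a", fmt(a))

# Two columns (b is all NA): every row keeps at least one separator.
two_col <- c("a,b", paste(fmt(a), fmt(rep(NA, length(a))), sep = ","))

print(one_col)
print(two_col)

any_blank_one <- any(one_col == "")  # are there truly blank lines?
any_blank_two <- any(two_col == "")
```

With the dummy column, a row where a is NA is written as a lone "," rather than an empty line, so a reader can still tell it is a data row.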
@MichaelChirico Side note: to cat it to the console, just use fwrite, which puts it there by default:
fwrite(temp)
# a
# 1
#
# 2
# 3
# 999
So it can be reproduced like...
fread(paste(capture.output(fwrite(temp)), collapse="\n"))
# a
# 1: 1
# Warning message:
# In fread(paste(capture.output(fwrite(temp)), collapse = "\n")) :
# Found the last consistent line but text exists afterwards (discarded): <<2>>
Yeah, I'm inclined towards saying the file should be written better if it wants blank lines read as NA (rather than reconfigure fread to treat this one-column case specially). I mean:
fread(paste(capture.output(fwrite(temp, na="NA")), collapse="\n"))
@franknarf1 nice, the default to write to stdout is an update I missed, wasn't like that initially. Matches write.table behavior :+1:
And I like your fix better, but not sure if it's the user's responsibility to handle a case like that, or if na = if (ncol(x) > 1L) '' else 'NA' as the default is a better fix
The warning message is there. And there are arguments to control it.
Using v1.10.4 on CRAN :
> fread("temp.csv")
a
1: 1
Warning message:
In fread("temp.csv") :
Stopped reading at empty line 3 but text exists afterwards (discarded): 2
> fread("temp.csv", fill=TRUE)
a
1: 1
2: NA
3: 2
4: 3
5: 999
6: NA
> fread("temp.csv", blank.lines.skip=TRUE)
a
1: 1
2: 2
3: 3
4: 999
>
Perhaps "Consider fill=TRUE and blank.lines.skip=TRUE" should be added to the warning message? (TODO1)
Also the empty lines can be controlled in fwrite with the na= argument.
> temp
a
1: 1
2: NA
3: 2
4: 3
5: 999
6: NA
> fwrite(temp, "temp.csv")
> system("more temp.csv")
a
1

2
3
999

> fwrite(temp, "temp.csv", na="NA")
> system("more temp.csv")
a
1
NA
2
3
999
NA
> fread("temp.csv")
a
1: 1
2: NA
3: 2
4: 3
5: 999
6: NA
>
I just read again and understood @MichaelChirico's comment now :
or if na = if (ncol(x) > 1L) '' else 'NA' as the default is a better fix
Yes - nice idea! Happy to make that change. (TODO2)
Closed by #2451
Great.
I've checked that it's also working well when a whole row is full of NA.
temp = data.table(a=c(1,NA,2,3,999,NA), b=c(1,NA,2,3,999,NA))
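A round-trip sketch of that check, assuming a data.table version that includes the change referenced above; the temporary file and the name back are just for illustration.

```r
library(data.table)

# Rows 2 and 6 are NA in every column.
temp <- data.table(a = c(1, NA, 2, 3, 999, NA), b = c(1, NA, 2, 3, 999, NA))

tmp <- tempfile(fileext = ".csv")
fwrite(temp, tmp)   # all-NA rows are written as a lone separator
back <- fread(tmp)  # read the file back in
print(back)

unlink(tmp)         # clean up the temporary file
```

Because each all-NA row still contains a separator, it is not a blank line and should come back as a row of NA values.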
Now that fread handles the blank lines in single-column files, this change in dev can be reverted back to how it was on CRAN which is simpler and cleaner.
CRAN version has fwrite(..., na="", ...)
dev changed to fwrite(..., na = if (length(x) > 1L) "" else "NA", ...)
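For readers unfamiliar with defaults that depend on another argument, here is a small base-R sketch of how such a default evaluates. write_sketch is a hypothetical stand-in, not fwrite, and plain lists are used in place of data.tables (length() of a data.table is its number of columns, as it is for a list).

```r
# Hypothetical stand-in for the dev signature; only the `na` default matters.
# R evaluates default arguments lazily, so the default can refer to `x`.
write_sketch <- function(x, na = if (length(x) > 1L) "" else "NA") {
  na
}

one_col <- list(a = c(1, NA))
two_col <- list(a = c(1, NA), b = c(2, NA))

na_one      <- write_sketch(one_col)           # single column: "NA"
na_two      <- write_sketch(two_col)           # several columns: ""
na_explicit <- write_sketch(one_col, na = "")  # an explicit value wins

print(c(na_one, na_two, na_explicit))
```

An explicitly supplied na always overrides the default, so the conditional only affects callers who leave na unset.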