Data.table: fwrite/fread single column input with NA -vs- empty lines

Created on 7 Apr 2017 · 9 Comments · Source: Rdatatable/data.table

I created a toy example:

temp <- data.table(a=c(1,NA,2,3,999,NA))

I save it:
fwrite(temp, "temp.csv", quote=FALSE, sep=",", append=F)

and read it again:
my <- fread("temp.csv", stringsAsFactors=F)

As you can see, only the first row is read.

I don't know if it's a problem with fread or with fwrite's output file.

fread fwrite

All 9 comments

library(data.table)
temp = data.table(a=c(1,NA,2,3,999,NA))
tmp = tempfile()
fwrite(temp, tmp, quote=FALSE)
system(paste('cat', tmp))
# a
# 1
# 
# 2
# 3
# 999
# 

There are no separators in any line, and the last line is blank.

Absent separators, fread has no way of knowing whether there are just blank lines or whether they are supposed to have missing data.

fread warns you about this:

Warning message:

In fread("temp.csv", stringsAsFactors = F) :
Stopped reading at empty line 3 but text exists afterwards (discarded): 2

One potential fix: add a dummy column:

temp[ , b := NA]

Once the table is written out again, tmp will have separators, so fread can tell which lines have data.
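The dummy-column workaround can be checked end to end. The following is a minimal sketch, assuming data.table is installed; the names with_dummy and back are only illustrative, and the helper column is dropped again after reading.

```r
library(data.table)

temp <- data.table(a = c(1, NA, 2, 3, 999, NA))
tmp <- tempfile(fileext = ".csv")

# Work on a copy so the original table keeps its single column.
with_dummy <- copy(temp)
with_dummy[, b := NA]

# Every line now contains a separator, e.g. "1," for a value and "," for NA.
fwrite(with_dummy, tmp)

back <- fread(tmp)
back[, b := NULL]  # drop the helper column again

nrow(back)  # the NA rows are kept because their lines are no longer empty
```

The cost of this approach is an extra column in the file, so it mostly suits cases where the file format is under your control.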

@MichaelChirico Side note: to cat it to the console, just use fwrite, which puts it there by default:

fwrite(temp)
# a
# 1
# 
# 2
# 3
# 999

So it can be reproduced like...

fread(paste(capture.output(fwrite(temp)), collapse="\n"))

#    a
# 1: 1
# Warning message:
# In fread(paste(capture.output(fwrite(temp)), collapse = "\n")) :
#   Found the last consistent line but text exists afterwards (discarded): <<2>>

Yeah, I'm inclined towards saying the file should be written better if it wants blank lines read as NA (rather than reconfigure fread to treat this one-column case specially). I mean:

fread(paste(capture.output(fwrite(temp, na="NA")), collapse="\n"))

@franknarf1 nice, writing to stdout by default is an update I missed; it wasn't like that initially. It matches write.table's behavior :+1:

And I like your fix better, but I'm not sure whether it's the user's responsibility to handle a case like that, or whether making na = if (ncol(x) > 1L) '' else 'NA' the default is a better fix.

The warning message is there. And there are arguments to control it.
Using v1.10.4 on CRAN:

> fread("temp.csv")
   a
1: 1
Warning message:
In fread("temp.csv") :
  Stopped reading at empty line 3 but text exists afterwards (discarded): 2
> fread("temp.csv", fill=TRUE)
     a
1:   1
2:  NA
3:   2
4:   3
5: 999
6:  NA
> fread("temp.csv", blank.lines.skip=TRUE)
     a
1:   1
2:   2
3:   3
4: 999
> 

Perhaps "Consider fill=TRUE and blank.lines.skip=TRUE" should be added to the warning message? (TODO1)

Also, the empty lines can be controlled in fwrite with the na= argument.

> temp
     a
1:   1
2:  NA
3:   2
4:   3
5: 999
6:  NA
> fwrite(temp, "temp.csv")
> system("more temp.csv")
a
1

2
3
999

> fwrite(temp, "temp.csv", na="NA")
> system("more temp.csv")
a
1
NA
2
3
999
NA
> fread("temp.csv")
     a
1:   1
2:  NA
3:   2
4:   3
5: 999
6:  NA
> 

I just read it again and now understand @MichaelChirico's comment:

or if na = if (ncol(x) > 1L) '' else 'NA' as the default is a better fix

Yes - nice idea! Happy to make that change. (TODO2)
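Until such a default is available in the installed version, the same rule can be applied on the user side. The sketch below is only an illustration: fwrite_na_aware is a hypothetical helper, not part of data.table, and it assumes the caller does not also pass na= through the dots.

```r
library(data.table)

# Hypothetical wrapper: write NA as "NA" for single-column tables,
# and as an empty field otherwise (where separators mark the row).
fwrite_na_aware <- function(x, file, ...) {
  na_string <- if (ncol(x) > 1L) "" else "NA"
  fwrite(x, file, na = na_string, ...)
}

temp <- data.table(a = c(1, NA, 2, 3, 999, NA))
tmp <- tempfile(fileext = ".csv")
fwrite_na_aware(temp, tmp)

file_lines <- readLines(tmp)  # the NA rows appear as the literal text "NA"
back <- fread(tmp)            # fread treats "NA" as missing by default
```

Because no line in the file is empty, reading it back does not depend on fill=TRUE or blank.lines.skip.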

Closed by #2451

Great.
I've checked that it also works well when a whole row is NA:

temp = data.table(a=c(1,NA,2,3,999,NA), b=c(1,NA,2,3,999,NA))
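A check along these lines could look as follows. This is a sketch rather than the exact test that was run; two_col, tmp2 and back2 are illustrative names, and fwrite's NA string is left at whatever the installed version uses by default.

```r
library(data.table)

two_col <- data.table(a = c(1, NA, 2, 3, 999, NA),
                      b = c(1, NA, 2, 3, 999, NA))
tmp2 <- tempfile(fileext = ".csv")

# With two columns, even an all-NA row still contains the separator,
# so that line in the file is not empty.
fwrite(two_col, tmp2)
back2 <- fread(tmp2)

# Compare values column by column; fread may return integer columns
# where the original held doubles, hence the as.numeric() conversion.
same_a <- identical(as.numeric(back2$a), two_col$a)
same_b <- identical(as.numeric(back2$b), two_col$b)
```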

Now that fread handles blank lines in single-column files, this change in dev can be reverted to how it was on CRAN, which is simpler and cleaner.
The CRAN version has fwrite(..., na="", ...)
dev was changed to fwrite(..., na = if (length(x) > 1L) "" else "NA", ...)
