When I use fread, I get this error: "NAstring <<0>> is recognized as type boolean, this is not permitted". My na.strings argument is:
na.strings = c("", "-", "_", "..", "...", "--", "*", "*" ,
"n/a", "n.a.", "#VALUE!", "0", "Inf", "-Inf", "NAN", "r", "e")
If I remove "0" from na.strings, fread no longer raises the error.
However, cells that contain "r" or "e" are not converted to NA, and their columns are read as character.
Please advise.
Please include your data.table package version & system info, and the file itself, if you can.
data.table::fread("a,b,c\n0,1,2", na.strings = "0", verbose = TRUE)
#> Input contains a \n or is "". Taking this to be text input (not a filename)
#> [01] Check arguments
#> Using 12 threads (omp_get_max_threads()=12, nth=12)
#> Error in data.table::fread("a,b,c\n0,1,2", na.strings = "0", verbose = TRUE):
#> freadMain: NAstring <<0>> is recognized as type boolean, this is not permitted.
Sys.info()
          sysname           release           version          nodename
        "Windows"        ">= 8 x64"      "build 9200" "IRT-310677-Z440"
          machine             login              user    effective_user
         "x86-64"          "310677"          "310677"          "310677"
data.table package: 1.11.4
The file is huge, almost 1.44 GB.
Thanks @HughParsonage, darn, that looks no bueno.
I guess there should be some interaction with logical01 but there doesn't appear to be.
Is it a bug?
@msgoussi yes, you can work around it with something like
data.table::fread("a,b,c\n0,1,2")[a==0, a:=NA]
If I have 500 columns and need to clean them, treating the following strings as NA: c("", "-", "_", "..", "...", "--", "**", "" ,
"n/a", "n.a.", "#VALUE!", "0", "Inf", "-Inf", "NAN", "r", "e"), the workaround will not look good. That is my opinion.
You're correct, but until the bug is fixed, you can at least handle all
the other NA strings with fread and then replace only 0 in your code.
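A minimal sketch of that two-step approach for a table with many columns is below. The sample text and the short list of NA strings are made up for illustration; the loop relies on data.table's `set()`, which updates the table by reference, so it should stay reasonably cheap even with hundreds of columns.

```r
library(data.table)

# Hypothetical sample input standing in for the real file.
txt <- "a,b,c\n0,1,-\n3,n/a,0\n5,0,7"

# Step 1: let fread handle every NA string except "0".
dt <- fread(txt, na.strings = c("-", "n/a"))

# Step 2: replace zeros with NA column by column.
# which() drops rows that are already NA, and set() modifies dt in place.
for (j in seq_along(dt)) {
  zero_rows <- which(dt[[j]] == 0)
  if (length(zero_rows) > 0L) set(dt, i = zero_rows, j = j, value = NA)
}
```

To limit the replacement to some columns only, loop over a subset of column indices instead of `seq_along(dt)`.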
I have an R package on GitHub, ribailey/gghybrid, which includes a function to read genomic data files with potentially millions of columns. One of the main programs for producing these input files, PLINK, always codes missing data as zero. I use fread within my function to read in the data and declare missing values, so because of the bug described here I can't read the most common file type people might want to use. Has there been any progress on this? Many thanks, Richard.
Removing || strcmp(ch,"1")==0 || strcmp(ch,"0")==0 from fread.c seems like an option to "fix" this.
This only breaks one test case, which explicitly tests na.strings = '1'.
The interaction with logical01 also seems legitimate:
data.table::fread("a,b,c\n0,1,2\n1,0,2", na.strings = "0", logical01=T)
        a      b     c
   <lgcl> <lgcl> <int>
1:     NA   TRUE     2
2:   TRUE     NA     2
I think the expected behavior is that we should not allow logical01=TRUE and na.strings = "0" at the same time. If this is agreed, I'm happy to file a PR for this.
Agreed @shrektan. Nor na.strings = "1" though I guess that's a pretty obscure use case.
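As a rough illustration of the proposed rule, the check could look something like the sketch below. `fread_checked` is a hypothetical wrapper written for this thread, not part of data.table, and the error message wording is invented; the actual fix would presumably live inside fread itself.

```r
library(data.table)

# Hypothetical wrapper: refuse the ambiguous combination before calling fread.
fread_checked <- function(..., na.strings = "NA", logical01 = FALSE) {
  if (isTRUE(logical01) && any(na.strings %in% c("0", "1"))) {
    stop("na.strings must not contain \"0\" or \"1\" when logical01 = TRUE")
  }
  fread(..., na.strings = na.strings, logical01 = logical01)
}

# The guard fires before any parsing happens.
res <- tryCatch(
  fread_checked("a,b,c\n0,1,2\n1,0,2", na.strings = "0", logical01 = TRUE),
  error = function(e) conditionMessage(e)
)
```

With `logical01 = FALSE` the wrapper simply passes the arguments through, so whether `na.strings = "0"` is then accepted depends on the installed data.table version.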