julia> english = readdlm(IOBuffer(readstring("/usr/share/dict/words")), ' ')
235886×1 Array{Any,2}:
"A"
"a"
"aa"
"aal"
"aalii"
"aam"
"Aani"
"aardvark"
"aardwolf"
"Aaron"
"Aaronic"
"Aaronical"
"Aaronite"
"Aaronitic"
"Aaru"
"Ab"
"aba"
"Ababdeh"
"Ababua"
"abac"
"abaca"
"abacate"
"abacay"
â‹®
"zymoscope"
"zymosimeter"
"zymosis"
"zymosterol"
"zymosthenic"
"zymotechnic"
"zymotechnical"
"zymotechnics"
"zymotechny"
"zymotic"
"zymotically"
"zymotize"
"zymotoxic"
"zymurgy"
"Zyrenian"
"Zyrian"
"Zyryan"
"zythem"
"Zythia"
"zythum"
"Zyzomys"
"Zyzzogeton"
julia> english_lower = map(lowercase, english);
ERROR: MethodError: no method matching lowercase(::Bool)
in collect_to!(::Array{String,2}, ::Base.Generator{Array{Any,2},Base.#lowercase}, ::Int64, ::Int64) at ./array.jl:267
in _collect(::Array{Any,2}, ::Base.Generator{Array{Any,2},Base.#lowercase}, ::Base.EltypeUnknown, ::Base.HasShape) at ./array.jl:249
in map(::Function, ::Array{Any,2}) at ./abstractarray.jl:1131
in eval(::Module, ::Any) at ./boot.jl:225
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46
julia> Set(typeof(x) for x in english)
Set(Any[Bool,SubString{String},Float64])
julia> english_str = map(strip, split(readstring("/usr/share/dict/words")));
julia> foreach(english_str) do x
try
parse(Float64, x)
println(x)
end
end
infinity
Nan
nan
What did you expect from can_parse_float
? The words that can be parsed as Float64
s are already Float64
s
julia> filter(t -> isa(t, Float64), english)
3-element Array{Any,1}:
Inf
NaN
NaN
and the second argument in parse should be a string.
julia> parse(Float64, NaN)
ERROR: MethodError: no method matching parse(::Type{Float64}, ::Float64)
Closest candidates are:
parse{T<:AbstractFloat}(::Type{T<:AbstractFloat}, ::AbstractString)
in eval(::Module, ::Any) at ./boot.jl:225
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46
Yes, I am not thinking this early. I updated the original comment. I feel that this should return a list of strings by default (would be what I would expect) but this is debatable because Inf / NaN / true / false
are in the dictionary.
I think every other DLM reader I've encountered will first check if a column is entirely boolean/numeric, and if not will just return strings for each cell. Returning a hybrid of both is weird.
Solution: delete readdlm
.
@tanmaykm said he is working on a better readdlm
. And deleting it is not a solution, unless you have something better to offer.
How is deleting it not a solution? Both DataFrames and CSV.jl already offer better solutions to this.
Even if the code that's currently in base is going to be improved, there's not much reason for it to remain in this repository and coupled to the release schedule of the core language and compiler.
💯 CSV parsing is complicated enough that it should be moved out of the base language.
Yes, CSV parsing is complicated, but dlm parsing is not. The current Julia repo has a bunch of other things that make things convenient for users of such systems - and I don't see why reading simple delimited files is the thing that has to be booted. Why do I have to install a package to load a simple file?
This implementation is a nightmare and it's awful to maintain when we change things in Base.
The fact that we have readdlm
in base leads to also having readcsv
in base, which leads to users, naïvely (the fools!) actually trying to use it to read CSV files, only to be told, "don't do that, you should use DataFrames or CSV". That is a ludicrous situation.
I think we should have readdlm only, and have it only support simple
delimited files, not csv, and remove the readcsv function from Base.
readcsv
is only 2 lines, and I am fine with removing them to avoid the confusion.
readcsv(io; opts...) = readdlm(io, ','; opts...)
readcsv(io, T::Type; opts...) = readdlm(io, ',', T; opts...)
The implementation is a nightmare for good reason. I think it will simplify over time as the compiler gets better, and by being strict about the files that it accepts.
+1 for removing. Apart from threads by people complaining about poor performance and lack of options of this function, Julia has been criticized for having a too big standard library. readdlm
is a typical case as it's not even in a submodule which could eventually become a separate package.
Yes, I think we should deprecate readcsv
right away, and plan to move readdlm into the "standard library" when that process happens, hopefully early in the 0.6 cycle.
reading in simple homogeneous (typed) text files is really handy esp. for testing so delegating this job to CSV.jl or similar package seems like overkill. I think that readdlm
should be similar to Numpy's loadtxt
. It handles the simple cases _reliably_ by specifying the input type up front and punting on any input that is hard. I would rather have something simple that works reliably than something super complicated and magical that does not.
I would be fine with having something simple, fast and reliable in Base; that is not what we have atm.
Most helpful comment
Even if the code that's currently in base is going to be improved, there's not much reason for it to remain in this repository and coupled to the release schedule of the core language and compiler.