From the help:
read(s::IO, nb=typemax(Int))
Read at most nb bytes from s, returning a Vector{UInt8} of the bytes read.
But passing typemax(Int) as the second argument actually tries to allocate that much memory and fails. So the one-argument read does something other than what the help advertises.
I would like to use typemax(Int) as a default argument for my own function, and pass it to read. read(s, nb) throws an OutOfMemoryError, so I have to test and use the 1-arg read(s) instead.
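For illustration, the workaround described above might look like the following minimal sketch. The name `read_upto` is hypothetical and not part of Base; the only assumption is that the one-argument `read(io)` does not hit the allocation problem, as reported.

```julia
# Hypothetical wrapper (not part of Base): forwards to the 1-arg read
# when nb was left at its default, to avoid read(io, typemax(Int)).
function read_upto(io::IO, nb::Integer = typemax(Int))
    return nb == typemax(Int) ? read(io) : read(io, nb)
end

# Usage with a temporary file
path, tmp = mktemp()
write(tmp, "hello world")
close(tmp)

all_bytes  = open(read_upto, path)               # default: read everything
first_five = open(io -> read_upto(io, 5), path)  # explicit limit
rm(path)
```

The check has to be repeated in every function that wants to forward such a default, which is the inconvenience being reported.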
It should allocate 1024 bytes if you pass typemax(Int): https://github.com/JuliaLang/julia/blob/9562bdfc8bc4ef09426b697ee8e29d06e1643575/base/io.jl#L840. What Julia version are you using?
version 1.1
I am reading a 260MB file.
Can you post the exact code you run?
It happens in iostream.jl, not io.jl:
io = open(parsed_args[:fileName])
read(io, typemax(Int))
ERROR: OutOfMemoryError()
Stacktrace:
[1] Type at ./boot.jl:402 [inlined]
[2] #read#314(::Bool, ::Function, ::IOStream, ::Int64) at ./iostream.jl:506
[3] read(::IOStream, ::Int64) at ./iostream.jl:506
[4] top-level scope at none:0
OK, I've found the culprit:
https://github.com/JuliaLang/julia/blob/9562bdfc8bc4ef09426b697ee8e29d06e1643575/base/iostream.jl#L511-L512
We should probably just do b = Vector{UInt8}(undef, nb == typemax(Int) ? 1024 : nb), as in the other method.
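As a standalone sketch of that sizing rule (the helper name `initial_buffer_length` is hypothetical and this is not the actual Base code): treat typemax(Int) as "no limit given" and start from a small buffer, otherwise allocate exactly what was asked for.

```julia
# Hypothetical helper, not the Base implementation: choose the initial
# buffer length, treating typemax(Int) as a "no limit" sentinel.
initial_buffer_length(nb::Integer) = nb == typemax(Int) ? 1024 : Int(nb)

b_default  = Vector{UInt8}(undef, initial_buffer_length(typemax(Int)))
b_explicit = Vector{UInt8}(undef, initial_buffer_length(10))
```

This only covers the initial allocation; the buffer would still have to grow while reading when more than 1024 bytes are available.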
How can we write a test covering this? Is it OK to create a temporary file? Also we should ensure we don't file two PRs at the same time (please go ahead if you want to). :-)
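One possible shape for such a test, as a sketch only: write a temporary file larger than the 1024-byte initial buffer and check that read with typemax(Int) returns all of it. The 4096-byte size is an arbitrary choice, and on the affected version this would be expected to throw the OutOfMemoryError shown above rather than pass.

```julia
using Test

# Sketch of a regression test using a temporary file; mktemp removes
# the file again once the do-block returns.
result = mktemp() do path, io
    data = rand(UInt8, 4096)   # arbitrary size, larger than 1024
    write(io, data)
    flush(io)
    open(path) do f
        read(f, typemax(Int)) == data
    end
end

@test result
```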
Actually the situation looks more complex than we thought. When all=false (and readbytes_some! is therefore called), the docstring says a single read call is performed. AFAICT that means we have to allocate a buffer large enough to fit all the data that could be read into it. I thought bytesavailable would give me that information, but it returns 0 in a simple test on a file. So is there a solution? I'm really not familiar with the I/O code.
I think we should remove typemax(Int) from the doc string. It seems to me to have originally been intended as an implementation detail (a way to detect when the second argument wasn't passed). It's not realistically possible to (1) promise a single read call, (2) of unlimited size, (3) without allocating a buffer first.
I agree, but what should we do when all=false? We still need to allocate a buffer of some size, based either on information about the stream (bytesavailable or something similar which works), or choose an arbitrary limit.
For all=false the user has to provide a buffer, or a size in which case we allocate a buffer of that size. For files I suppose you could use the file size, but that doesn't generalize to all streams.
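For the file case, the file-size idea could be sketched on the caller's side roughly as follows. `read_file_upto` is a hypothetical name, and the sketch assumes a regular file opened as an IOStream, so that filesize and position are meaningful; it does not apply to pipes or sockets.

```julia
# Hypothetical helper: clamp nb to the bytes remaining in a regular file,
# so that read never receives an unbounded size.
function read_file_upto(io::IOStream, nb::Integer = typemax(Int))
    remaining = max(filesize(io) - position(io), 0)
    return read(io, min(nb, remaining))
end

# Usage with a temporary file
path, tmp = mktemp()
write(tmp, "hello world")
close(tmp)

whole  = open(read_file_upto, path)
prefix = open(io -> read_file_upto(io, 5), path)
rm(path)
```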
Good point, I had missed that.
But removing nb=typemax(Int) from the docstring doesn't sound desirable: as noted in the OP, it's useful when writing functions that call read to have a good default to pass directly, instead of doing a check and calling the method without nb if the argument was left to its default. If we don't like typemax we could use nothing by default. But typemax(Int) sounds fine here since it really means "as many bytes as possible" and clearly allocating a vector of that length doesn't make sense.
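To make the nothing alternative concrete, here is a sketch of what a caller-side wrapper with that default could look like. `read_some` is a hypothetical name; this is not a proposal for the actual Base signature.

```julia
# Hypothetical wrapper using `nothing` instead of typemax(Int) as the
# "no limit" default.
function read_some(io::IO, nb::Union{Integer,Nothing} = nothing)
    return nb === nothing ? read(io) : read(io, nb)
end

everything  = read_some(IOBuffer("abcdef"))     # no limit given
first_three = read_some(IOBuffer("abcdef"), 3)  # explicit limit
```

Compared with typemax(Int), the sentinel cannot be mistaken for a real byte count, at the cost of a Union in the signature.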
See https://github.com/JuliaLang/julia/pull/30984 for the minimal changes.
Looks like https://github.com/JuliaLang/julia/blob/9562bdfc8bc4ef09426b697ee8e29d06e1643575/base/iostream.jl#L512 should have the same check as in https://github.com/JuliaLang/julia/blob/9562bdfc8bc4ef09426b697ee8e29d06e1643575/base/io.jl#L840