Julia: read(io, typemax(Int)) throws OutOfMemoryError, read(io) does not

Created on 6 Feb 2019 · 13 Comments · Source: JuliaLang/julia

From the help:

read(s::IO, nb=typemax(Int))
Read at most nb bytes from s, returning a Vector{UInt8} of the bytes read.

But passing typemax(Int) as the second argument actually tries to allocate that much memory and fails. So the one-argument read does something other than what the help advertises.

I would like to use typemax(Int) as the default argument of my own function and pass it through to read. But read(s, nb) throws an OutOfMemoryError, so I have to test for the default and call the 1-arg read(s) instead.
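A minimal sketch of the workaround described above, for illustration only. The name read_upto is hypothetical and not part of Base; it treats typemax(Int) as "no limit" and dispatches to the one-argument read in that case.

```julia
# Hypothetical helper (not in Base): forwards to read(io) when no limit is given.
function read_upto(io::IO, nb::Integer=typemax(Int))
    # typemax(Int) is treated as "no limit": use the one-argument method,
    # which grows its buffer as needed instead of allocating nb bytes up front.
    return nb == typemax(Int) ? read(io) : read(io, nb)
end

# Usage on an in-memory stream
io = IOBuffer("hello world")
first5 = read_upto(io, 5)   # at most five bytes
rest = read_upto(io)        # everything that remains
```

The check has to live in every caller that wants this default, which is the inconvenience the issue is about.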


All 13 comments

It should allocate 1024 bytes if you pass typemax(Int): https://github.com/JuliaLang/julia/blob/9562bdfc8bc4ef09426b697ee8e29d06e1643575/base/io.jl#L840. What Julia version are you using?

version 1.1
I am reading a 260MB file.

Can you post the exact code you run?

It happens in iostream.jl, not io.jl.

io = open(parsed_args[:fileName])
read(io, typemax(Int))
ERROR: OutOfMemoryError()
Stacktrace:
 [1] Type at ./boot.jl:402 [inlined]
 [2] #read#314(::Bool, ::Function, ::IOStream, ::Int64) at ./iostream.jl:506
 [3] read(::IOStream, ::Int64) at ./iostream.jl:506
 [4] top-level scope at none:0

OK, I've found the culprit:
https://github.com/JuliaLang/julia/blob/9562bdfc8bc4ef09426b697ee8e29d06e1643575/base/iostream.jl#L511-L512

We should probably just do b = Vector{UInt8}(undef, nb == typemax(Int) ? 1024 : nb) as in the other method.
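As a rough sketch of that suggestion, written as a standalone function rather than a patch to Base: read_capped is a hypothetical name, and the code assumes the default readbytes! behaviour, which grows the buffer when more than length(b) bytes are requested.

```julia
# Hypothetical standalone version of the proposed allocation strategy.
function read_capped(io::IO, nb::Integer=typemax(Int))
    # Start small when no explicit limit was given; otherwise allocate nb bytes.
    b = Vector{UInt8}(undef, nb == typemax(Int) ? 1024 : nb)
    # readbytes! reads at most nb bytes and resizes b if it is too small.
    n = readbytes!(io, b, nb)
    # Trim the buffer to the number of bytes actually read.
    return resize!(b, n)
end

# Usage: 3000 bytes are available, more than the initial 1024-byte buffer.
data = read_capped(IOBuffer(rand(UInt8, 3000)))
head = read_capped(IOBuffer("abcdef"), 4)
```

This only covers reads that are allowed to loop until nb bytes or EOF is reached; a read limited to a single call cannot rely on the buffer growing afterwards.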

How can we write a test covering this? Is it OK to create a temporary file? Also we should ensure we don't file two PRs at the same time (please go ahead if you want to). :-)

Actually the situation looks more complex than we thought. When all=false (and readbytes_some! is therefore called), the docstring says a single read call is performed. AFAICT that means we have to allocate a buffer large enough to fit all the data that could be read into it. I thought bytesavailable would give me that information, but it returns 0 in a simple test on a file. So is there a solution? I'm really not familiar with the I/O code.

I think we should remove typemax(Int) from the doc string. It seems to me to have originally been intended as an implementation detail (a way to detect when the second argument wasn't passed). It's not realistically possible to (1) promise a single read call, (2) of unlimited size, (3) without allocating a buffer first.

I agree, but what should we do when all=false? We still need to allocate a buffer of some size, based either on information about the stream (bytesavailable or something similar which works), or choose an arbitrary limit.

For all=false the user has to provide a buffer, or a size in which case we allocate a buffer of that size. For files I suppose you could use the file size, but that doesn't generalize to all streams.
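For the file case, sizing the buffer from the file size might look roughly like this. read_remaining is a hypothetical name, and the sketch only illustrates the sizing; it uses the default readbytes! call and makes no claim about how many underlying read calls are issued. It assumes a seekable IOStream, so it does not apply to pipes or sockets.

```julia
# Hypothetical helper: size the buffer from what is left in the file.
function read_remaining(io::IOStream)
    # Bytes between the current position and the end of the file.
    remaining = max(0, filesize(io) - position(io))
    b = Vector{UInt8}(undef, remaining)
    n = readbytes!(io, b, remaining)
    # Trim in case fewer bytes were read than expected.
    return resize!(b, n)
end

# Usage with a temporary file
path, io = mktemp()
write(io, "0123456789")
flush(io)          # make sure the size on disk reflects the write
seekstart(io)
skip(io, 4)        # move past the first four bytes
tail = read_remaining(io)
close(io)
rm(path)
```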

Good point, I had missed that.

But removing nb=typemax(Int) from the docstring doesn't sound desirable: as noted in the OP, when writing functions that call read it's useful to have a good default to pass directly, instead of checking whether the argument was left at its default and calling the method without nb. If we don't like typemax we could use nothing by default. But typemax(Int) sounds fine here since it really means "as many bytes as possible" and clearly allocating a vector of that length doesn't make sense.

See https://github.com/JuliaLang/julia/pull/30984 for the minimal changes.
