fromfile returns invalid data and sometimes segfaults when reading past the end of a file, i.e. it does not check whether the read will go past the end of the file. This leads to a segfault on Ubuntu 16.04, but does not seem to segfault on OSX.
import numpy as np

def test_read_from_file():
    # create an empty file named `empty.bin`
    filename = 'empty.bin'
    open(filename, 'a').close()
    # read large chunk of data, past the end of the file
    dtype = [('data', '<f4', 500,)]
    count = 100000000
    with open(filename, 'rb') as fh:
        data = np.fromfile(fh, dtype, count)
    print(data.shape)
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
platform linux -- Python 3.6.6, pytest-3.8.2, py-1.6.0, pluggy-0.7.1
Just to note, this is reproducible on 1.15.3. I guess we know the size, so this should just raise an error, or read the whole file. If this works silently on some systems, maybe we should add a release note just in case (I would still say we can just fix it).
EDIT: I would tend to error, just thought whole file might be an option because of indexing, but indexing is a bit special in this regard.
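To illustrate the "we know the size, so raise an error" idea, here is a minimal Python-level sketch. It is not how numpy implements or fixes this; `fromfile_checked` is a hypothetical helper name, and the sketch assumes a real file object (one with `fileno()`) and a non-negative `count`.

```python
import os
import numpy as np

def fromfile_checked(fh, dtype, count):
    """Hypothetical wrapper: raise instead of reading past the end of the file."""
    itemsize = np.dtype(dtype).itemsize
    # Bytes between the current position and the end of the file.
    remaining = os.fstat(fh.fileno()).st_size - fh.tell()
    needed = count * itemsize
    if needed > remaining:
        raise ValueError(
            f"requested {count} records ({needed} bytes), "
            f"but only {remaining} bytes remain in the file"
        )
    return np.fromfile(fh, dtype=dtype, count=count)
```

With the empty file from the report, this check fails before any large allocation is attempted.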
Agreed, raising an error sounds like a good idea. There is still something to be said about partial reads, which could be handled in two ways:
count in fromfile is reached, but then we need a mechanism for explicitly returning the actual number of records that were read (implicitly this should be visible in the shape of the resulting array). An error can still be raised because it is not the normal usage scenario.
I don't know which scenario fits better with the numpy philosophy, but the first option sounds more useful.
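The partial-read behaviour mentioned above, where the number of records actually read is returned explicitly and is also visible in the array shape, could look roughly like this at the Python level. This is only a sketch: `fromfile_partial` is a hypothetical name, it assumes a real file object and a non-negative `count`, and it silently ignores a trailing incomplete record.

```python
import os
import numpy as np

def fromfile_partial(fh, dtype, count):
    """Hypothetical wrapper: read at most `count` records, stopping at end of file."""
    dt = np.dtype(dtype)
    remaining = os.fstat(fh.fileno()).st_size - fh.tell()
    # Only whole records are read; a trailing partial record is ignored.
    available = remaining // dt.itemsize
    n_read = min(count, available)
    if n_read == 0:
        # Nothing to read: return an empty array without touching the file.
        return np.empty(0, dtype=dt), 0
    data = np.fromfile(fh, dtype=dt, count=n_read)
    return data, n_read
```

The caller can compare `n_read` with `count` and decide whether a short read is an error.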
I think an error is most reasonable. What I am not sure about right now is whether fromfile supports file-like objects that do not have a known size, or what currently happens in the case of a non-empty sep kwarg.
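For objects that only expose `read()` and have no known size, a size check up front is not possible. One option outside of fromfile is to read the bytes first and build the array from whatever arrived. The sketch below uses `np.frombuffer`; `read_records` is a hypothetical helper, and it assumes that `read()` returns fewer bytes than requested only at the end of the stream.

```python
import numpy as np

def read_records(stream, dtype, count):
    """Hypothetical helper: read up to `count` records from any object with read()."""
    dt = np.dtype(dtype)
    buf = stream.read(count * dt.itemsize)
    # Number of complete records in the bytes that were actually returned.
    n_read = len(buf) // dt.itemsize
    if n_read == 0:
        return np.empty(0, dtype=dt)
    # frombuffer on bytes gives a read-only view; copy to get a writable array.
    return np.frombuffer(buf, dtype=dt, count=n_read).copy()
```

The length of the returned array tells the caller how many records were available. Note that some stream types may still try to allocate the full requested size inside `read()`, so a very large `count` is better read in chunks.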
@amuresan the code for fromfile is in C, but if you have a bit of time, we are always very happy about pull requests, and it seems like a reasonable difficulty to dabble a bit into the C (Python) API.
I believe the problem here is actually that on Ubuntu you are getting a MemoryError that is being handled incorrectly and causing the segfault.
A PR with a fix is here: https://github.com/numpy/numpy/pull/12354
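For context on why a MemoryError is plausible here, the size of the requested read in the reproduction can be worked out directly from the dtype and count in the report. This is only arithmetic on the reported values, not a statement about what the fix does.

```python
import numpy as np

# dtype and count taken from the reproduction in the report
dtype = np.dtype([('data', '<f4', 500)])
count = 100000000

bytes_per_record = dtype.itemsize            # 500 float32 values = 2000 bytes
requested_bytes = bytes_per_record * count   # 200000000000 bytes

print(requested_bytes)
print(f"{requested_bytes / 2**30:.1f} GiB")  # 186.3 GiB
```

An allocation of that size fails on most machines.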