Pandas: ValueError: buffer source array is read-only

Created on 5 Feb 2020  ·  9 Comments  ·  Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('2020', 'now', freq='1h')
>>> arr = np.zeros_like(index)
>>> arr.setflags(write=False)
>>> pd.Series(arr, index=index).resample('1d').agg('last')

-------------------------------------------------------------------------
~/pandas/pandas/_libs/groupby.pyx in pandas._libs.groupby.group_last()
~/pandas/pandas/_libs/groupby.cpython-37m-x86_64-linux-gnu.so in View.MemoryView.memoryview_cwrapper()
~/pandas/pandas/_libs/groupby.cpython-37m-x86_64-linux-gnu.so in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only

Problem description

Groupby fails on some read-only buffers (I couldn't quickly reproduce it with .groupby() itself, sorry).

The ideal fix would be to add a const specifier to the input values here (and in related functions):
https://github.com/pandas-dev/pandas/blob/0b6debf825db39f8f63d300cb8007a7996d002fc/pandas/_libs/groupby.pyx#L853

Unfortunately, Cython does not support const fused types (https://github.com/cython/cython/issues/1772). That was resolved in https://github.com/cython/cython/pull/3118, but despite being a minuscule change, the fix is only scheduled for release in Cython 3.0. I guess we have to wait until then.
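
As a rough illustration of why the const specifier matters: a non-const Cython typed memoryview requests a writable buffer from the array, and a read-only array refuses that request. The sketch below is not the pandas or Cython code path. It only uses Python's built-in memoryview to show the same buffer-protocol distinction, and the helper name `try_write` is made up for this example.

```python
import numpy as np

# A read-only array, as in the report above.
arr = np.zeros(4, dtype=np.int64)
arr.setflags(write=False)

# A plain (read-only) buffer request succeeds; the view is flagged read-only.
view = memoryview(arr)
print(view.readonly)


def try_write(buf):
    """Hypothetical helper: attempt one write through a buffer view."""
    try:
        buf[0] = 1
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


# Writing through the read-only view is rejected ...
print(try_write(view))
# ... while a view over a writable copy accepts the write.
print(try_write(memoryview(arr.copy())))
```

A const memoryview would only need the read-only kind of access shown first, which is why declaring the inputs const is expected to make read-only arrays acceptable.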

Expected Output

Resampling/groupby works with read-only arrays.

Output of pd.show_versions()

pandas 1.1.0.dev0+361.gf0b00f887
cython 0.29.14

Groupby Regression Resample

All 9 comments

I think I've found a related issue with isin (at least, it throws the same sort of error!). It may give some more insight, though it was tricky to reproduce:

# python 3.7, on a Windows machine, Anaconda distribution, Cython = v0.28.5, then v0.29.15
import numpy as np  # v1.16.4, then v1.18.1
import pandas as pd  # v1.0.1

arr = np.array([1,2,3], dtype=np.int64) # works fine if I don't set the dtype!

arr.setflags(write=False) # make it read-only

df = pd.DataFrame({"col": [2,4,8]})

test = df.col.isin(arr)

... and this then raises an exception as above:

...
pandas\_libs\hashtable_func_helper.pxi in pandas._libs.hashtable.ismember_int64()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\_libs\hashtable.cp37-win_amd64.pyd in View.MemoryView.memoryview_cwrapper()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\_libs\hashtable.cp37-win_amd64.pyd in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only

Interesting, huh?

FWIW I originally ran into this when using Google's BigQuery python client to retrieve data from a table with an array-type field. When retrieved via .to_dataframe() it came back as a column of np.array(dtype=int64)s with the OWNDATA and WRITEABLE flags set to False. The latter flag was the problem, although as stated above the data-type appears to matter (note that the stack trace refers to pandas._libs.hashtable.ismember_int64()).
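
Until the const fix lands, one possible stopgap on the caller's side is to hand pandas a writable copy whenever the WRITEABLE flag is off. This is only a minimal sketch under that assumption: `ensure_writeable` is a hypothetical helper, not a pandas or NumPy function, and it costs an extra copy of the data.

```python
import numpy as np


def ensure_writeable(values):
    """Hypothetical helper: return `values` unchanged if it is writable,
    otherwise return a writable copy."""
    values = np.asarray(values)
    if values.flags.writeable:
        return values
    # ndarray.copy() allocates new memory, and the new array is writable.
    return values.copy()


read_only = np.array([1, 2, 3], dtype=np.int64)
read_only.setflags(write=False)

safe = ensure_writeable(read_only)
print(safe.flags.writeable)
```

With this helper, the failing call above would become df.col.isin(ensure_writeable(arr)). The original array is left untouched and still read-only.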

@jbrockmendel you think this is essentially unfixable until we use cython>=0.30? If so, let's remove it from the 1.0.2 milestone.

> you think this is essentially unfixable until we use cython>=0.30?

It's viable, just invasive and a hassle, whereas with cython 0.30 it should be a 1-liner. It depends on when we expect that to be available. cc @scoder?

> when we expect that to be available

This year. :)
I'm planning to release an "official alpha" in a couple of weeks, as soon as I catch up with the currently pending PRs, and as soon as it's in a state that allows users to prepare for it.

(Remember that the good thing about Cython is that you can control when and how the C code is generated, so you're rather free to use whatever released or unreleased Cython package you want, as long as you test it sufficiently on your side. That said, it's obviously easier to use a proper PyPI release, that's why I'd like to get at least _something_ out soon.)

(Also, the "minuscule change" for this feature is only small in Cython's master/3.0 branch. It's not quite that small in a backport.)

@scoder thanks for the rundown. Big releases aren't easy, and we appreciate all the time and effort you put in.

@TomAugspurger I'm inclined to push this off to 1.0.3 as long as we have lower-hanging-fruit regressions to handle

Agreed.

Do we know what caused this regression? Because if this was supported before with cython < 0.30, why is it hard to support it now with that cython version?
