Pandas: ValueError: buffer source array is read-only

Created on 5 Feb 2020 · 9 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('2020', 'now', freq='1h')
>>> arr = np.zeros_like(index)
>>> arr.setflags(write=False)
>>> pd.Series(arr, index=index).resample('1d').agg('last')

-------------------------------------------------------------------------
~/pandas/pandas/_libs/groupby.pyx in pandas._libs.groupby.group_last()
~/pandas/pandas/_libs/groupby.cpython-37m-x86_64-linux-gnu.so in View.MemoryView.memoryview_cwrapper()
~/pandas/pandas/_libs/groupby.cpython-37m-x86_64-linux-gnu.so in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only

Problem description

Groupby fails on some read-only buffers (I couldn't quickly reproduce it with .groupby() itself, sorry).
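
Until this is fixed on the Cython side, one stopgap is to hand pandas a writable copy of the data. The following is a minimal sketch, not a pandas API: `ensure_writeable` is a hypothetical helper name, and it assumes the cost of one extra copy is acceptable.

```python
import numpy as np


def ensure_writeable(arr):
    """Hypothetical helper: return arr as-is if writable, else a writable copy."""
    arr = np.asarray(arr)
    if arr.flags.writeable:
        return arr
    # ndarray.copy() always returns a new, writable array that owns its data.
    return arr.copy()


ro = np.zeros(5, dtype=np.int64)
ro.setflags(write=False)  # same flag the report sets

rw = ensure_writeable(ro)
print(ro.flags.writeable, rw.flags.writeable)  # False True
```

In the example above, this would mean passing `ensure_writeable(arr)` to `pd.Series(...)` instead of `arr`. The original array is left untouched, so other code relying on its read-only flag is unaffected.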

The obvious fix would be to add a const specifier to the input values here (and related entries):
https://github.com/pandas-dev/pandas/blob/0b6debf825db39f8f63d300cb8007a7996d002fc/pandas/_libs/groupby.pyx#L853

if it were not for Cython's lack of support for const fused types (https://github.com/cython/cython/issues/1772). That was resolved in https://github.com/cython/cython/pull/3118 but, despite being a minuscule change, is only scheduled for release in Cython 3.0. I guess we wait until then.
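
For context on why `const` matters: as far as I understand it, a non-const Cython typed memoryview asks the array for a writable buffer, and a read-only array refuses that request. The read-only state is visible from plain Python through the buffer protocol. A small sketch using only NumPy and the built-in `memoryview` (it does not touch any pandas or Cython internals):

```python
import numpy as np

arr = np.arange(3, dtype=np.int64)

# A writable array exports a writable buffer.
print(memoryview(arr).readonly)  # False

# After clearing the write flag, the exported buffer is read-only.
# A consumer that insists on a writable buffer cannot accept it.
arr.setflags(write=False)
print(memoryview(arr).readonly)  # True
```

Declaring the input as `const` tells the consumer that a read-only buffer is good enough, which is why the fix is described as small once const fused types are available.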

Expected Output

Resampling/groupby works with read-only arrays.

Output of `pd.show_versions()`

pandas 1.1.0.dev0+361.gf0b00f887
cython 0.29.14

Groupby Regression Resample

Source

kernc

All 9 comments

I think I've found a related issue with isin (at least, it throws the same sort of error!). May give some more insight though, as it was tricky to reproduce:

# python 3.7, on a Windows machine, Anaconda distribution, Cython = v0.28.5, then v0.29.15
import numpy as np  # v1.16.4, then v1.18.1
import pandas as pd  # v1.0.1

arr = np.array([1,2,3], dtype=np.int64) # works fine if I don't set the dtype!

arr.setflags(write=False) # make it read-only

df = pd.DataFrame({"col": [2,4,8]})

test = df.col.isin(arr)

... and this then raises an exception as above:

...
pandas\_libs\hashtable_func_helper.pxi in pandas._libs.hashtable.ismember_int64()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\_libs\hashtable.cp37-win_amd64.pyd in View.MemoryView.memoryview_cwrapper()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\_libs\hashtable.cp37-win_amd64.pyd in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only

Interesting, huh?

FWIW I originally ran into this when using Google's BigQuery python client to retrieve data from a table with an array-type field. When retrieved via .to_dataframe() it came back as a column of np.array(dtype=int64)s with the OWNDATA and WRITEABLE flags set to False. The latter flag was the problem, although as stated above the data-type appears to matter (note that the stack trace refers to pandas._libs.hashtable.ismember_int64()).
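
To check whether arrays coming from an external source are affected, the flags can be inspected directly. The sketch below does not use the BigQuery client; as an assumption for illustration, it stands in for such data with `np.frombuffer` over immutable bytes, which likewise yields an array with OWNDATA and WRITEABLE both False.

```python
import numpy as np

# Stand-in for externally supplied data: an int64 view over immutable bytes.
raw = np.array([1, 2, 3], dtype=np.int64).tobytes()
view = np.frombuffer(raw, dtype=np.int64)
print(view.flags.owndata, view.flags.writeable)  # False False

# Copying produces an array that owns its data and is writable.
fixed = view.copy()
print(fixed.flags.owndata, fixed.flags.writeable)  # True True
```

Passing the copy rather than the original view should sidestep the exception in the `isin` example, at the cost of duplicating the data.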

PaddyAlton on 27 Feb 2020

@jbrockmendel you think this is essentially unfixable until we use cython>=0.30? If so, let's remove it from the 1.0.2 milestone.

TomAugspurger on 6 Mar 2020

> you think this is essentially unfixable until we use cython>=0.30?

It's viable, just invasive and a hassle, whereas with cython 0.30 it should be a 1-liner. It depends on when we expect that to be available. cc @scoder?

jbrockmendel on 6 Mar 2020

> when we expect that to be available

This year. :)
I'm planning to release an "official alpha" in a couple of weeks, as soon as I catch up with the currently pending PRs, and as soon as it's in a state that allows users to prepare for it.

scoder on 9 Mar 2020

(Remember that the good thing about Cython is that you can control when and how the C code is generated, so you're rather free to use whatever released or unreleased Cython package you want, as long as you test it sufficiently on your side. That said, it's obviously easier to use a proper PyPI release, that's why I'd like to get at least _something_ out soon.)

scoder on 9 Mar 2020

(Also, the "minuscule change" for this feature is only small in Cython's master/3.0 branch. It's not quite that small in a backport.)

scoder on 9 Mar 2020

@scoder thanks for the rundown. Big releases aren't easy, and we appreciate all the time and effort you put in.

@TomAugspurger I'm inclined to push this off to 1.0.3 as long as we have lower-hanging-fruit regressions to handle

jbrockmendel on 9 Mar 2020

Agreed.

TomAugspurger on 9 Mar 2020

Do we know what caused this regression? Because if this was supported before with cython < 0.30, why is it hard to support it now with that cython version?