Pandas: get_dummies with sparse doesn't convert numeric to sparse

Created on 8 Dec 2017 · 4 Comments · Source: pandas-dev/pandas

I got the error
AttributeError: 'IntBlock' object has no attribute 'sp_index'

when converting a SparseDataFrame to a SciPy csr_matrix using the following code:

dfTotalCat = get_dummies(dfTotalCat, sparse=True)

XTotalCat = csr_matrix(dfTotalCat.to_coo())

The SparseDataFrame is obtained from get_dummies.

Following is the exact error trace:

Traceback (most recent call last):
  File "pandaSrc.py", line 76, in
    XTotalCat = csr_matrix(dfTotalCat.to_coo())
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\sparse\frame.py", line 255, in to_coo
    row = s.sp_index.to_int_index().indices
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\sparse\series.py", line 245, in sp_index
    return self.block.sp_index
AttributeError: 'IntBlock' object has no attribute 'sp_index'

Labels: Bug, Needs Info, Reshaping, Sparse

All 4 comments

Could you make a reproducible example? What's dfTotalCat?

If not _all_ of your columns are dummy encoded, get_dummies returns some columns that are not sparse. It seems that if you call df.to_sparse() before dummy encoding, the error goes away.

@TomAugspurger -- I don't know enough to tell whether this is expected behavior and the docs just need to be updated (it took me a while to figure out...)

Repro code below

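```
import pandas as pd

# Written against the pandas release current at the time of this thread (2017);
# DataFrame.to_sparse() was removed in pandas 1.0.
df = pd.DataFrame(
    {
        "A": ["a", "b", "c", "a"],
        "B": [1, 2, 3, 4],
    }
)
df['A'] = df['A'].astype('category')


def _throw_no_attribute_sp_index_err():
    # Column B is not dummy encoded, so it stays dense and to_coo() fails.
    one_hot = pd.get_dummies(df, sparse=True)
    print(one_hot.columns)
    one_hot.to_coo()


def _no_throw():
    # Converting the whole frame to sparse first makes every column sparse.
    one_hot = pd.get_dummies(df.to_sparse(), sparse=True)
    print(one_hot.columns)
    one_hot.to_coo()
```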
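
For anyone reading this on a newer pandas, here is a minimal sketch of how to see which columns get_dummies left dense. It assumes pandas 1.0 or later, where SparseDataFrame and to_sparse() no longer exist and sparse data lives in ordinary DataFrame columns with a SparseDtype. The helper name dense_columns is made up for this illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "a"], "B": [1, 2, 3, 4]})
df["A"] = df["A"].astype("category")

# Only the categorical column A is dummy encoded; B is passed through as-is.
one_hot = pd.get_dummies(df, sparse=True)


def dense_columns(frame):
    """Return the names of columns whose dtype is not a SparseDtype.

    Hypothetical helper for illustration only.
    """
    return [
        name
        for name, dtype in frame.dtypes.items()
        if not isinstance(dtype, pd.SparseDtype)
    ]


print(one_hot.dtypes)
print(dense_columns(one_hot))
```

Any column reported by the helper would need to be converted before the frame can be treated as fully sparse.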

Thanks. I'm not sure that get_dummies(sparse=True) should convert numeric columns. I think it's best to just document this.
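
If numeric columns are meant to stay dense, the caller can convert them explicitly before building the matrix. Below is a rough sketch of that, assuming pandas 1.0 or later with SciPy installed (so it uses the DataFrame.sparse accessor rather than the SparseDataFrame from the original report). The choice of int64 with a fill value of 0 is an assumption that suits this toy frame; it does not come from the thread.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "a"], "B": [1, 2, 3, 4]})
df["A"] = df["A"].astype("category")

# Ask for int64 dummies so they share a subtype with the numeric column B.
one_hot = pd.get_dummies(df, sparse=True, dtype="int64")

# B is still a dense int64 column at this point. Casting the whole frame to a
# sparse dtype makes every column sparse (assumed fill value: 0).
all_sparse = one_hot.astype(pd.SparseDtype("int64", fill_value=0))

# The .sparse accessor is only available when all columns are sparse.
X = all_sparse.sparse.to_coo().tocsr()

print(X.shape)
print(X.toarray())
```

Storing B as sparse saves nothing here, since none of its values equal the fill value. The cast is only there so that the frame can be handed to to_coo().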

I did a little digging into this. What happens is that get_dummies somehow labels the non-sparse column as sparse even though its underlying block is not sparse, which causes cascading issues such as the sp_index error. I haven't quite figured out why, but my current hypothesis is that it has something to do with how concat works with sparse frames.
