Pandas: TST: Add tests for dtype changes when assigning values via indexing

Created on 8 Nov 2020  ·  12 Comments  ·  Source: pandas-dev/pandas

#37680 will fix the assignment problems reported in #26395.

We should add further tests for iloc/iat and for different dtypes, covering all cases (loc, at, iloc, iat).

cc @jreback

Dtypes Indexing Needs Tests good first issue

All 12 comments

take

I would like to contribute this one

Ping @allenmac347 are you still working on this?

Hi, yes I am still working on making tests. I have been busy with classes lately but with Thanksgiving break coming up I should be more free to work on this issue. My apologies for the delay.

@phofl Hello. After testing loc, iloc, at, and iat, it seems that at and iat cannot upcast at all when the assigned value has a different dtype. This code, for example, raises an error:
import pandas as pd

# column D initially has an int64 dtype
df = pd.DataFrame(index=['A','B','C'])
df['D'] = 0
df.at['B', 'D'] = "hello"

ValueError: invalid literal for int() with base 10: 'hello'

The same happens with iat. Both loc and iloc work fine and promote column D to object dtype. It seems that both at and iat force the assigned value to be cast to the column's existing dtype, which is why assigning a float to an int column via at/iat does not change the column's dtype, as found in the bug and fix you mentioned. Please let me know if there are further specific tests you would like me to try before I submit a pull request.
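
The comparison described above can be reproduced with a small script. This is only a sketch: try_setitem is a hypothetical helper name, not part of pandas, and it reports whatever the installed pandas does rather than asserting a fixed outcome. At the time of this thread, loc and iloc upcast the column to object while at and iat raised ValueError; other pandas versions may behave differently.

```python
import warnings

import pandas as pd


def try_setitem(accessor_name, value):
    """Hypothetical helper: assign `value` into an int64 column through the
    named accessor and describe the outcome as a string."""
    df = pd.DataFrame({"D": [0, 0, 0]}, index=["A", "B", "C"])
    # loc/at are label-based; iloc/iat are position-based
    key = ("B", "D") if accessor_name in ("loc", "at") else (1, 0)
    try:
        with warnings.catch_warnings():
            # some pandas versions warn about incompatible-dtype assignment
            warnings.simplefilter("ignore")
            getattr(df, accessor_name)[key] = value
    except Exception as exc:
        return f"raised {type(exc).__name__}"
    return f"dtype {df['D'].dtype}"


results = {name: try_setitem(name, "hello") for name in ("loc", "iloc", "at", "iat")}
for name, outcome in results.items():
    print(f"{name:>4}: {outcome}")
```

If the four accessors were consistent, as requested later in this thread, all four lines would print the same outcome.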

cc @jbrockmendel is this done on purpose?

is this done on purpose?

No, at and iat should behave just like loc and iloc, just restricted to scalar arguments (and I guess without setitem-with-expansion).

@allenmac347 Are you interested in debugging this further?

@phofl yep, I'd be happy to debug this further

Great, you can ping me if you need some help.

@phofl Hey, just a quick update. After further debugging, I found that the problem occurs in the _set_value() function on line 3187 of pandas/core/frame.py. Here is the relevant piece of code, reached when using at to assign a string to an int64 DataFrame column:
[Screenshot from 2020-11-30 23-46-20]

series._values[loc] = value raises a ValueError, which the except block doesn't catch. I made a somewhat hacky fix where I do the following:

[Screenshot from 2020-11-30 23-45-19]

This would make the code go into the except block where it would use loc to convert the dtype to an object and insert the value into the DataFrame. This seemed to fix the ValueError issue and also now correctly upcasts the DataFrame dtype. However, this fix now causes "pandas/tests/frame/indexing/test_set_value.py::TestSetValue::test_set_value_resize" to fail.

[Screenshot from 2020-11-30 23-47-32]

It appears that this particular test expects pandas to raise a ValueError in this instance, which is why it is failing. How should I move forward? Thanks.
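
Since the screenshots are not preserved, here is a rough sketch of the fast-path-with-fallback idea described in this comment, written against a plain NumPy array. It is not the pandas implementation: set_value_with_fallback is a hypothetical name, and the real fix reportedly fell back to loc rather than casting by hand.

```python
import numpy as np


def set_value_with_fallback(values, pos, value):
    """Hypothetical helper: try a fast in-place write; if the array's dtype
    cannot hold the value, fall back to an object-dtype copy."""
    try:
        # Fast path: for an int64 array and the string "hello" this raises
        # ValueError (invalid literal for int() with base 10).
        values[pos] = value
        return values
    except (ValueError, TypeError):
        # Slow path: upcast to object so any Python value can be stored.
        upcast = values.astype(object)
        upcast[pos] = value
        return upcast


column = np.array([0, 0, 0], dtype="int64")
result = set_value_with_fallback(column, 1, "hello")
print(result.dtype, list(result))
```

Note that the fast path only fails for values NumPy cannot convert. A float written into an int64 array is silently truncated instead of raising, which matches the earlier observation that assigning a float via at/iat leaves the dtype unchanged.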

@jbrockmendel Any suggestions how to handle this?

from pandas import Series

ser = Series([1, 2, 3], index=list("abc"))
ser.loc["b"] = "test"  # works
ser = Series([1, 2, 3], index=list("abc"))
ser.at["b"] = "test"  # raises

The loc case works while the at case raises. This seems odd to me.
