Pandas: DEPR: deprecate .ix

Created on 14 Sep 2016 · 22 Comments · Source: pandas-dev/pandas

enough said.

Deprecate Indexing

jreback

🎉5

All 22 comments

What is the suggested replacement for the deprecated .ix? Is it .loc?

For me .ix works 5-10% faster than .loc:

>>> df.shape
(10000, 211)

>>> df.index
CategoricalIndex(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                  ...
                  'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
                 categories=['A', 'B', 'C'], ordered=False, dtype='category', length=10000)

>>> df.loc[['C']].shape
(8000, 211)

>>> %timeit df.loc['C']
100 loops, best of 3: 5.61 ms per loop

>>> %timeit df.ix['C']
100 loops, best of 3: 5.37 ms per loop

BTW, passing a list into the indexer adds further overhead (roughly 40% for .ix and 80% for .loc in the timings below):

>>> %timeit df.loc['C']
100 loops, best of 3: 5.61 ms per loop

>>> %timeit df.loc[['C']]
100 loops, best of 3: 9.97 ms per loop

>>> %timeit df.ix['C']
100 loops, best of 3: 5.37 ms per loop

>>> %timeit df.ix[['C']]
100 loops, best of 3: 7.57 ms per loop

frol on 15 Sep 2016
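
A rough way to repeat the scalar-versus-list comparison above on a current pandas, as a sketch only. The 1000/1000/8000 split of labels and the all-zero data are assumptions (the thread only shows the shape and the 8000 'C' rows), and .ix is left out because it was removed in pandas 1.0. Absolute numbers will differ by machine and version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed layout: the thread shows 10000 rows, 211 columns and 8000 'C' rows;
# the A/B split and the zero-filled data are made up for illustration.
labels = ["A"] * 1000 + ["B"] * 1000 + ["C"] * 8000
df = pd.DataFrame(np.zeros((10_000, 211)), index=pd.CategoricalIndex(labels))

runs = 20
# Scalar key versus single-element list key, as in the timings above.
scalar_key = timeit.timeit(lambda: df.loc["C"], number=runs)
list_key = timeit.timeit(lambda: df.loc[["C"]], number=runs)

print(f"df.loc['C']:   {scalar_key / runs * 1e3:.2f} ms per call")
print(f"df.loc[['C']]: {list_key / runs * 1e3:.2f} ms per call")
```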

Yes, .loc and .iloc are the expected replacements. Timings are expected to eventually be faster, though a sub-millisecond difference on a single access is pretty meaningless in any real use case.

jreback on 15 Sep 2016

👎3 👍1

@jreback Having terabytes of data and processing it with the help of Dask DataFrame, which uses pandas DataFrames as chunks, turns "milliseconds" into minutes...

frol on 15 Sep 2016

👎1

@frol It doesn't matter how much data you have. You are almost certainly using indexing operations inefficiently.

jreback on 15 Sep 2016

@frol the indexing code paths are going to be rewritten in C/C++ as part of the pandas 2.0 effort, so the microperformance should improve by a factor of 10 or more. Some refactoring or Cythonization may be able to give some quick perf wins in .loc or .iloc

wesm on 15 Sep 2016

Question on .ix deprecation-- suppose you want to set the first row of a DataFrame in a particular column with a value (assume that the index is not an Int64Index). Then you can currently use:

df.ix[0, 'colname'] = 5

In the future can you safely do:

df.iloc[0].loc['colname'] = 5

(this seems to beg for SettingWithCopyWarning)? Or is the only proper option going to be
df.loc[df.index[0], 'colname'] = 5
?

Liam3851 on 15 Sep 2016

👍3

Our experience has been that mixing positional and label indexing has been a significant source of problems for users. Here you might want to do df['colname'][0]

wesm on 15 Sep 2016

👍1

Unambiguously safe setting (the syntax may be nicer in 2.0):

df.iloc[0, df.columns.get_loc('colname')] = 5

df.loc[df.index[0], 'colname'] = 5

jreback on 15 Sep 2016

👍1
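
A minimal sketch of the two setters above on a frame whose index is not integer-based. The frame, its labels, and the second column name `other` are made up for illustration. Both forms perform the assignment in a single indexing call on df, so there is no intermediate object for a chained assignment to miss.

```python
import pandas as pd

# Illustrative frame with a string index (not an integer index).
df = pd.DataFrame(
    {"colname": [1, 2, 3], "other": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Purely positional: translate the column label to its integer position first.
df.iloc[0, df.columns.get_loc("colname")] = 5

# Purely label-based: translate row position 0 to its label first.
df.loc[df.index[0], "other"] = 50

print(df)
```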

@jreback Thanks, makes sense.

Liam3851 on 15 Sep 2016

@jreback I think you have a typo with square brackets used instead of parens?

df.iloc[0, df.columns.get_loc['colname']] = 5

should be

df.iloc[0, df.columns.get_loc('colname')] = 5

johne13 on 25 Dec 2016

@johne13 yes that was a typo, thanks!

jreback on 26 Dec 2016

This looks like it will be really painful for me. Rather than removing ix entirely, could it be switched to a function with keyword only args?

  df.ix(row_idx=[0,2], col_name=["foo", "bar"])

Then I could take a dangerous df.ix[[0,2], ["foo", "bar"]] and, in a fairly straightforward fashion, convert it into an unambiguous index without having to repeat my index name or use df.columns.get_loc.

DavidEscott on 25 Jan 2017

👍2

@DavidEscott Well, you are only delaying the inevitable, so you have some choices:

don't upgrade
ignore the DeprecationWarning (note that this will eventually turn into a FutureWarning and then be removed, but that is a ways down the road)
change your code

No, converting .ix to a function is not possible; it's an indexer, e.g. ix[ ], which is syntactically different.

jreback on 25 Jan 2017

@DavidEscott You're more than welcome to monkey-patch in your own function that does what you want. Since .ix has been a significant source of bugs and user problems, we no longer wish to support it.

wesm on 25 Jan 2017

@wesm I understand that this is not an easy function to maintain, but still I find it unfortunate as it was a VERY expressive way to manipulate DataFrames... I hope someone will be able to make a code snippet to replace ix via monkey-patching?

lrq3000 on 27 Jan 2018

👍1
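
One possible shape for such a snippet, as a sketch rather than a drop-in replacement for .ix: a standalone function with keyword-only arguments, in the spirit of the row_idx/col_name proposal earlier in the thread. The name `select` is hypothetical and not part of pandas. It assumes list-like arguments and is read-only: it returns a new object, so assigning into the result does not modify df.

```python
import pandas as pd


def select(df, *, row_idx, col_name):
    """Hypothetical helper: rows by integer position, columns by label."""
    # .iloc never consults labels and .loc never consults positions,
    # so the caller states explicitly which axis uses which convention.
    return df.iloc[row_idx].loc[:, col_name]


df = pd.DataFrame(
    {"foo": [1, 2, 3], "bar": [4, 5, 6], "baz": [7, 8, 9]},
    index=["x", "y", "z"],
)

subset = select(df, row_idx=[0, 2], col_name=["foo", "bar"])
print(subset)
```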

I just found a use case that makes ix quite valuable to me. I have a DataFrame df such that df['mask'] is a boolean mask that I'd like to filter df on. With ix, I can do df.ix[df.mask, :n] to get the first n columns, filtered by mask. Now the best way seems to be df.loc[df.mask, :].iloc[:, :n], which just reads terribly. Using df.columns.get_loc as an indexing workaround feels very kludgy, whereas the ix solution made for elegant code.

Of course I can assign a temporary df2 = df.loc[df.mask] and work from there, but that's inelegant as well.

JonathanTay on 6 Jun 2018

@JonathanTay To support the boolean indexing case with first-n-columns, in addition to
df.loc[df.mask, :].iloc[:, :n]

you can use the (perhaps prettier, although same length)
df.iloc[df.mask.values, :n]
or
df.loc[df.mask, df.columns[:n]]

Yes it's 7 more characters than
df.ix[df.mask, :n]

but generally not having to worry about subtle bugs from .ix inference is worth the typing.

Liam3851 on 7 Jun 2018
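
A small check that the three spellings above select the same rows and columns. The frame and n = 2 are made up for illustration. One assumption worth flagging: the sketch uses df["mask"] rather than df.mask, because attribute access resolves to the DataFrame.mask method instead of a column named "mask".

```python
import pandas as pd

# Illustrative frame; the boolean column is literally named "mask".
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [5, 6, 7, 8],
        "c": [9, 10, 11, 12],
        "mask": [True, False, True, False],
    }
)
n = 2

# Bracket access: df.mask would return the DataFrame.mask method.
keep = df["mask"]

chained = df.loc[keep, :].iloc[:, :n]      # .loc for rows, then .iloc for columns
positional = df.iloc[keep.values, :n]      # boolean array plus positional slice
labelled = df.loc[keep, df.columns[:n]]    # boolean Series plus column labels

print(labelled)
```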

Can .ix be replaced by a .loc chained with an .iloc, or by a simple .loc or .iloc?

If so, why not have a wrapper around this and keep backward compatibility, and a useful method?

ManuelLevi on 17 Jul 2018

@ManuelLevi The issue is, _each call_ can be replaced with .iloc, .loc, or a combination, but there's no good way for .ix to tell which to use.

E.g., suppose you have a DataFrame with the index Index([0, 2, 4, 6, 8]) and call .ix[:4] on it. Did you want .ix to implicitly use .iloc (returning the first 4 elements) or .loc (returning the first 3 elements)?

Liam3851 on 18 Jul 2018
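
The ambiguity above can be seen directly with the explicit indexers. This sketch uses a Series with made-up values in place of the DataFrame, with the same index.

```python
import pandas as pd

# Same index as in the example above; the values are arbitrary.
s = pd.Series([10, 20, 30, 40, 50], index=[0, 2, 4, 6, 8])

by_position = s.iloc[:4]  # positions 0, 1, 2, 3
by_label = s.loc[:4]      # labels up to and including 4

print(by_position)
print(by_label)
```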

@Liam3851 I see what you mean.

I usually use .iloc and .loc combined, but the impact of this goes well beyond me. I believe it affects the whole pandas community.

A quick search for df.ix on GitHub shows almost 4M results. Maybe half a million notebooks and almost 200k Python files will break after this. People are counting on many of these open-source tutorials and libraries.

Could there be a simple way to change the function behaviour instead of removing it? Maybe assume integers to always be locations, and other types to always be a label?

ManuelLevi on 20 Jul 2018

This is such a great feature, it would be a shame to lose it...
Please consider some of the suggestions above as a way to ease maintenance.

miguelcdpmarques on 20 Jul 2018

@ManuelLevi As I understand it, ix treats anything that could be a label as a label. This was a source of bugs. For example, if a Series s is indexed by integers [5,3,2,4], should s.ix[0] return the 0th element or raise KeyError? What if s.index = ['a','b','c'] or [0,1,2,3]? @Liam3851 has a point that the bugs and unexpected behaviour just keep coming once you allow the ambiguity. For example, label-based slicing (loc) includes both endpoints, while position-based slicing (iloc) includes the start but not the end.

JonathanTay on 20 Jul 2018
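
A short sketch of both points with the explicit indexers, using made-up values: a scalar lookup on the [5,3,2,4] index, and the differing slice endpoints on a string index.

```python
import pandas as pd

# Integer labels that do not match positions.
s = pd.Series(["w", "x", "y", "z"], index=[5, 3, 2, 4])

first_by_position = s.iloc[0]  # the 0th element, whatever its label

try:
    s.loc[0]                   # there is no label 0 in this index
    label_zero_found = True
except KeyError:
    label_zero_found = False

# Slice endpoints differ between the two indexers.
t = pd.Series([1, 2, 3], index=["a", "b", "c"])
label_slice = t.loc["a":"b"]   # includes the end label "b"
position_slice = t.iloc[0:1]   # excludes position 1

print(first_by_position, label_zero_found)
print(len(label_slice), len(position_slice))
```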
