Pandas: DEPR: deprecate .ix

Created on 14 Sep 2016  ·  22 Comments  ·  Source: pandas-dev/pandas

enough said.

Deprecate Indexing

All 22 comments

What is the suggested replacement for the deprecated .ix? Is it .loc?

For me .ix works 5-10% faster than .loc:

>>> df.shape
(10000, 211)

>>> df.index
CategoricalIndex(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                  ...
                  'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
                 categories=['A', 'B', 'C'], ordered=False, dtype='category', length=10000)

>>> df.loc[['C']].shape
(8000, 211)

>>> %timeit df.loc['C']
100 loops, best of 3: 5.61 ms per loop

>>> %timeit df.ix['C']
100 loops, best of 3: 5.37 ms per loop

BTW, passing a list into the indexer adds another 25-50% overhead:

>>> %timeit df.loc['C']
100 loops, best of 3: 5.61 ms per loop

>>> %timeit df.loc[['C']]
100 loops, best of 3: 9.97 ms per loop

>>> %timeit df.ix['C']
100 loops, best of 3: 5.37 ms per loop

>>> %timeit df.ix[['C']]
100 loops, best of 3: 7.57 ms per loop

Yes, .loc and .iloc are the expected replacements. Timings are expected to be faster eventually, though a sub-millisecond difference on a single access is pretty meaningless in any real use case.

@jreback Having terabytes of data and processing it with the help of Dask DataFrame, which uses pandas DataFrames as chunks, turns "milliseconds" into minutes...

@frol It doesn't matter how much data you have. You are almost certainly using indexing operations inefficiently.

@frol the indexing code paths are going to be rewritten in C/C++ as part of the pandas 2.0 effort, so the microperformance should improve by a factor of 10 or more. Some refactoring or Cythonization may be able to give some quick perf wins in .loc or .iloc

Question on .ix deprecation-- suppose you want to set the first row of a DataFrame in a particular column with a value (assume that the index is not an Int64Index). Then you can currently use:

df.ix[0, 'colname'] = 5

In the future can you safely do:

df.iloc[0].loc['colname'] = 5

(this seems to beg for SettingWithCopyWarning)? Or is the only proper option going to be
df.loc[df.index[0], 'colname'] = 5
?

Our experience has been that mixing positional and label indexing has been a significant source of problems for users. Here you might want to do df['colname'][0]

Unambiguously safe setting (the syntax may be nicer in 2.0):

df.iloc[0, df.columns.get_loc('colname')] = 5

or

df.loc[df.index[0], 'colname'] = 5
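
To make the two safe forms above concrete, here is a minimal sketch on a toy frame. The frame, its string index, and the second column name ("other") are made up for illustration; only .iloc, .loc and Index.get_loc are used.

```python
import pandas as pd

# Toy frame with a non-integer index (illustrative data, not from the thread).
df = pd.DataFrame(
    {"colname": [1, 2, 3], "other": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Purely positional: translate the column label to a position first.
df.iloc[0, df.columns.get_loc("colname")] = 5

# Purely label-based: translate the row position to a label first.
df.loc[df.index[0], "other"] = 50

print(df)
```

Both statements assign through a single indexer call on df itself, so neither goes through an intermediate object the way df.iloc[0].loc['colname'] = 5 does.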

@jreback Thanks, makes sense.

@jreback I think you have a typo with square brackets used instead of parens?

df.iloc[0, df.columns.get_loc['colname']] = 5

should be

df.iloc[0, df.columns.get_loc('colname')] = 5

@johne13 yes that was a typo, thanks!

This looks like it will be really painful for me. Rather than removing ix entirely, could it be switched to a function with keyword only args?

  df.ix(row_idx=[0,2], col_name=["foo", "bar"])

Then I can take a dangerous df.ix[[0,2], ["foo", "bar"]] and, in a fairly straightforward fashion, convert it into an unambiguous index without having to repeat my index name or use df.columns.get_loc?

@DavidEscott well, you are only delaying the inevitable, so you have some choices:

  • don't upgrade
  • ignore the DeprecationWarning (note that this will eventually turn into a FutureWarning and then eventually be removed, but that is a ways down the road)
  • change your code.

No, converting .ix to a function is not possible; it's an indexer, e.g. ix[ ], which is syntactically different.

@DavidEscott you're more than welcome to monkey-patch in your own function that does what you want. Since .ix has been a significant source of bugs and user problems, we no longer wish to support it

@wesm I understand that this is not an easy function to maintain, but still I find it unfortunate as it was a VERY expressive way to manipulate DataFrames... I hope someone will be able to make a code snippet to replace ix via monkey-patching?
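
For anyone wanting the keyword-only variant suggested above, a rough sketch of a standalone helper might look like the following. The function name and its row_idx / col_name parameters are hypothetical (taken from the proposal in this thread, not from pandas), and it assumes unique column labels. It only combines .iloc with Index.get_indexer, so it is a convenience wrapper and not a replacement for .ix inference.

```python
import pandas as pd


def select_pos_rows_label_cols(df, *, row_idx, col_name):
    """Hypothetical helper: rows by position, columns by label."""
    # get_indexer maps labels to positions and returns -1 for unknown labels.
    col_pos = df.columns.get_indexer(col_name)
    if (col_pos == -1).any():
        missing = [c for c, p in zip(col_name, col_pos) if p == -1]
        raise KeyError(missing)
    return df.iloc[row_idx, col_pos]


# Illustrative data only.
df = pd.DataFrame(
    {"foo": [1, 2, 3], "bar": [4, 5, 6], "baz": [7, 8, 9]},
    index=["x", "y", "z"],
)
subset = select_pos_rows_label_cols(df, row_idx=[0, 2], col_name=["foo", "bar"])
print(subset)
```

Because the arguments are keyword-only, the caller has to state which axis is positional and which is label-based, which is the ambiguity .ix left to inference.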

I just found a use case that makes ix quite valuable to me. I have a DataFrame df such that df['mask'] is a boolean mask that I'd like to filter df on. With ix, I can do df.ix[df.mask, :n] to get the first n columns, filtered by mask. Now the best way seems to be df.loc[df.mask, :].iloc[:, :n], which just reads terribly. Using df.columns.get_loc as an indexing workaround feels very kludgy, whereas the ix solution made for elegant code.

Of course I can assign a temporary df2 = df.loc[df.mask] and work from there, but that's inelegant as well.

@JonathanTay To support the boolean indexing case with first-n-columns, in addition to
df.loc[df.mask, :].iloc[:, :n]

you can use the (perhaps prettier, although same length)
df.iloc[df.mask.values, :n]
or
df.loc[df.mask, df.columns[:n]]

Yes it's 7 more characters than
df.ix[df.mask, :n]

but generally not having to worry about subtle bugs from .ix inference is worth the typing.
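
As a quick check that the three spellings above select the same data, here is a small sketch on made-up data. One assumption to flag: the column is read as df["mask"] instead of df.mask, because attribute access on a DataFrame resolves to the DataFrame.mask method, not to a column named "mask".

```python
import pandas as pd

# Illustrative frame with a boolean column literally named "mask".
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [5, 6, 7, 8],
        "c": [9, 10, 11, 12],
        "mask": [True, False, True, False],
    }
)
n = 2
keep = df["mask"]  # bracket access; df.mask is a method

chained = df.loc[keep, :].iloc[:, :n]      # label filter, then positional slice
positional = df.iloc[keep.values, :n]      # boolean array + positional slice
by_label = df.loc[keep, df.columns[:n]]    # boolean Series + column labels

print(by_label)
```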

Can .ix be replaced by a .loc chained with an .iloc, or by a simple .loc or .iloc?

If so, why not have a wrapper around this and keep backward compatibility, and a useful method?

@ManuelLevi The issue is, _each call_ can be replaced with .iloc, .loc, or a combination, but there's no good way for .ix to tell which to use.

E.g. suppose you provide a DataFrame with the Index([0, 2, 4, 6, 8]) and call .ix[:4] on it. Did you want .ix to implicitly use .iloc (returning the first 4 elements) or .loc (returning the first 3 elements)?
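
The ambiguity is easy to reproduce with the explicit indexers. A minimal sketch, using the index from the example and an invented value column "v":

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30, 40, 50]}, index=[0, 2, 4, 6, 8])

by_position = df.iloc[:4]  # positions 0, 1, 2, 3
by_label = df.loc[:4]      # labels up to and including 4, i.e. 0, 2, 4

print(len(by_position), len(by_label))
```

The same slice text gives four rows under one reading and three under the other.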

@Liam3851 I see what you mean.

I usually use .iloc and .loc combined, but the impact of this goes well beyond me. I believe it affects the whole pandas community.

A quick search for df.ix on GitHub shows almost 4M results. Maybe half a million notebooks and almost 200k Python files will break after this. Many of these are open-source tutorials and libraries that people are counting on.

Could there be a simple way to change the function behaviour instead of removing it? Maybe assume integers to always be locations, and other types to always be a label?

This is such a great feature; it would be a shame to lose it...
Please consider some of the suggestions above as a way to ease maintenance.

@ManuelLevi As I understand it, ix treats anything that could be a label as a label. This was a source of bugs. For example, if a Series s is indexed by integers [5,3,2,4], then should s.ix[0] return the 0th element or raise KeyError? What if s.index = ['a','b','c'] or [0,1,2,3]? @Liam3851 has a point that the bugs and unexpected behaviour just keep coming once you allow the ambiguity. For example, label-based slicing (loc) includes both endpoints, while position-based slicing (iloc) includes the start but not the end.
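
Both points can be seen with the explicit indexers. A small sketch, with Series values and the second Series invented for illustration:

```python
import pandas as pd

# Integer labels that do not match positions.
s = pd.Series(["w", "x", "y", "z"], index=[5, 3, 2, 4])

first_by_position = s.iloc[0]  # position 0, whatever its label is

try:
    s.loc[0]                   # there is no label 0
    label_found = True
except KeyError:
    label_found = False

# Slice endpoints: .loc includes the stop label, .iloc excludes the stop position.
t = pd.Series(range(5), index=list("abcde"))
label_slice = t.loc["b":"d"]   # labels b, c, d
position_slice = t.iloc[1:3]   # positions 1 and 2 (labels b, c)

print(first_by_position, label_found)
print(label_slice.index.tolist(), position_slice.index.tolist())
```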
