Pandas: ENH: add ignore_index argument to DataFrame.explode / Series.explode

Created on 22 Jun 2020 · 10 Comments · Source: pandas-dev/pandas

When we use DataFrame.explode right now, it repeats the original index label for each element of the exploded list-like. To keep it consistent with methods like DataFrame.sort_values, DataFrame.append and pd.concat, we can add an ignore_index argument, which resets the index to 0, 1, ..., n - 1.

import pandas as pd

df = pd.DataFrame({'id': range(0, 30, 10),
                   'values': [list('abc'), list('def'), list('ghi')]})
print(df)

   id     values
0   0  [a, b, c]
1  10  [d, e, f]
2  20  [g, h, i]

print(df.explode('values'))
   id values
0   0      a
0   0      b
0   0      c
1  10      d
1  10      e
1  10      f
2  20      g
2  20      h
2  20      i

Expected behaviour with addition of the argument:

df.explode('values', ignore_index=True)

   id values
0   0      a
1   0      b
2   0      c
3  10      d
4  10      e
5  10      f
6  20      g
7  20      h
8  20      i
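
For comparison, the requested output can already be produced by chaining reset_index(drop=True) after explode. The sketch below only illustrates that workaround with the example frame from this issue; the variable names exploded and renumbered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": range(0, 30, 10),
                   "values": [list("abc"), list("def"), list("ghi")]})

# explode repeats the original row label for every element of each list
exploded = df.explode("values")

# Workaround: drop the repeated labels and renumber the rows from 0
renumbered = exploded.reset_index(drop=True)
print(renumbered)
```

The proposed ignore_index=True argument would fold the second step into the explode call itself.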

Enhancement Needs Discussion Reshaping

erfannariman

All 10 comments

If this change looks okay to one of the devs, I can submit a PR for this.

erfannariman on 22 Jun 2020

take

erfannariman on 22 Jun 2020

I think something like this was discussed when explode was originally implemented. @erfannariman can you go through the original pull request implementing explode and summarize the discussion on this point?

TomAugspurger on 22 Jun 2020

I looked at the following discussions and couldn't find anything about resetting the index:

#16538 https://github.com/pandas-dev/pandas/issues/16538

#10511 https://github.com/pandas-dev/pandas/issues/10511

#27267 https://github.com/pandas-dev/pandas/pull/27267

Not sure if I missed anything. @TomAugspurger

erfannariman on 22 Jun 2020

Thanks for checking.

TomAugspurger on 22 Jun 2020

It's OK to add this argument (it was added elsewhere after explode already existed).

jreback on 22 Jun 2020

What's the upside of adding this as an argument instead of just calling reset_index?

WillAyd on 22 Jun 2020

I'm not sure I'm in a position to comment on your question, but in terms of API design, isn't that in line with other methods like DataFrame.append, DataFrame.sort_values and pd.concat? Or do you mean in terms of internals? @WillAyd

erfannariman on 22 Jun 2020

Ah OK, that makes sense since we do this elsewhere.

WillAyd on 22 Jun 2020

For ignore_index, other examples where this was added recently are drop_duplicates (https://github.com/pandas-dev/pandas/pull/30405) and sort_values (https://github.com/pandas-dev/pandas/pull/30402).
Another reason to add it is that it can be a bit more performant (it avoids the additional copy you would get with reset_index(drop=True)).
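
As a point of reference, here is a minimal sketch of how ignore_index behaves in the methods mentioned in this thread. It assumes a pandas version that includes the two pull requests linked above; the small frame and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"key": [3, 1, 3, 2]})

# Each call returns a result labelled 0..n-1 instead of the original labels
sorted_df = df.sort_values("key", ignore_index=True)
deduped = df.drop_duplicates(ignore_index=True)
stacked = pd.concat([df, df], ignore_index=True)

print(sorted_df.index.tolist())
print(deduped.index.tolist())
print(stacked.index.tolist())
```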

One aspect related to the index of the result that was briefly discussed in the original PR (https://github.com/pandas-dev/pandas/pull/27267#pullrequestreview-259233262) is whether to add a level to the index with a "count", thus resulting in a MultiIndex (which could, e.g., be useful if you want to do an unstack in a next step).

I personally think that could still be useful, and we could potentially think about combining that in a single keyword. However, since ignore_index is already used in other places, it is probably better to consider this separately.
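
To make the "count" level idea concrete, here is a rough sketch of how such a MultiIndex can be built by hand today. This is not an existing explode option; the column name count and the intermediate variables are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": range(0, 30, 10),
                   "values": [list("abc"), list("def"), list("ghi")]})

exploded = df.explode("values")

# Position of each element within its original row: 0, 1, 2, 0, 1, 2, ...
count = exploded.groupby(level=0).cumcount().to_numpy()

# Append the counter as a second index level, which yields a MultiIndex
with_count = exploded.assign(count=count).set_index("count", append=True)

# (row label, count) pairs are unique, so the counter level can be unstacked
wide = with_count["values"].unstack("count")
print(wide)
```

With unique (row, count) pairs, the unstack step spreads each original list across columns again, which is the follow-up use case mentioned above.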