Pandas: ENH: add ignore_index argument to DataFrame.explode / Series.explode

Created on 22 Jun 2020 · 10 Comments · Source: pandas-dev/pandas

Right now, DataFrame.explode repeats the original index label for each element of the exploded list-like. To keep it consistent with methods like DataFrame.sort_values, DataFrame.append and pd.concat, we can add an ignore_index argument, which resets the index of the result to 0, 1, ..., n - 1.

import pandas as pd

df = pd.DataFrame({'id': range(0, 30, 10),
                   'values': [list('abc'), list('def'), list('ghi')]})
print(df)

   id     values
0   0  [a, b, c]
1  10  [d, e, f]
2  20  [g, h, i]

print(df.explode('values'))
   id values
0   0      a
0   0      b
0   0      c
1  10      d
1  10      e
1  10      f
2  20      g
2  20      h
2  20      i

Expected behaviour with addition of the argument:

df.explode('values', ignore_index=True)

   id values
0   0      a
1   0      b
2   0      c
3  10      d
4  10      e
5  10      f
6  20      g
7  20      h
8  20      i

Labels: Enhancement, Needs Discussion, Reshaping

All 10 comments

If this change looks okay to one of the devs, I can submit a PR for this.

take

I think something like this was discussed when explode was originally implemented. @erfannariman can you go through the original pull request implementing explode and summarize the discussion on this point?

I looked at the following discussions and couldn't find anything about resetting the index:

  1. https://github.com/pandas-dev/pandas/issues/16538
  2. https://github.com/pandas-dev/pandas/issues/10511
  3. https://github.com/pandas-dev/pandas/pull/27267
  4. https://github.com/pandas-dev/pandas/issues/28005

Not sure if I missed anything. @TomAugspurger

Thanks for checking.

It's OK to add this argument (it was added elsewhere after explode already existed).

What's the upside of adding this as an argument instead of just calling reset_index?
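For reference, the reset_index alternative mentioned in this question can be sketched as follows. It reuses the example frame from the issue description; the variable names `exploded` and `renumbered` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(0, 30, 10),
    "values": [list("abc"), list("def"), list("ghi")],
})

# explode repeats the original row label for every list element.
exploded = df.explode("values")

# Chaining reset_index(drop=True) discards the repeated labels and
# replaces them with a fresh 0..n-1 index.
renumbered = exploded.reset_index(drop=True)

print(exploded.index.tolist())
print(renumbered.index.tolist())
```

The proposed ignore_index=True would be expected to give the same result as `renumbered` in a single call.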

Not sure if I'm in a position to comment on your question, but in terms of API design, isn't that in line with other methods like DataFrame.append, DataFrame.sort_values and pd.concat? Or do you mean internally? @WillAyd

Ah OK, that makes sense since we do it elsewhere.

For ignore_index, other examples where this was added recently are drop_duplicates (https://github.com/pandas-dev/pandas/pull/30405) and sort_values (https://github.com/pandas-dev/pandas/pull/30402).
Another reason to add it is that it can be a bit more performant (it avoids the additional copy you would have with reset_index(drop=True)).
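As a small illustration of how the existing ignore_index keyword behaves in sort_values and drop_duplicates (assuming pandas 1.0 or later, where both accept it), here is a sketch on a made-up frame; the column names `key` and `val` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": [3, 1, 1, 2], "val": list("abcd")})

# Without ignore_index, the original row labels travel with the rows.
sorted_keep = df.sort_values("key")
deduped_keep = df.drop_duplicates("key")

# With ignore_index=True, the result is relabelled 0..n-1.
sorted_reset = df.sort_values("key", ignore_index=True)
deduped_reset = df.drop_duplicates("key", ignore_index=True)

print(deduped_keep.index.tolist())
print(deduped_reset.index.tolist())
```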


One aspect related to the index of the result that was briefly discussed in the original PR (https://github.com/pandas-dev/pandas/pull/27267#pullrequestreview-259233262) is whether to add a level to the index with a "count", thus resulting in a MultiIndex (which could, e.g., be useful if you want to do an unstack in a next step).
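A rough sketch of what such a "count" level might look like, built by hand with existing pandas calls. This is not a proposed implementation; the level names `row` and `count` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(0, 30, 10),
    "values": [list("abc"), list("def"), list("ghi")],
})

exploded = df.explode("values")

# Position of each element within its original row: 0, 1, 2, 0, 1, 2, ...
count = exploded.groupby(level=0).cumcount()

# Combine the repeated row label and the counter into a two-level index.
# The names "row" and "count" are hypothetical.
with_count = exploded.copy()
with_count.index = pd.MultiIndex.from_arrays(
    [exploded.index, count], names=["row", "count"]
)

# The counter level makes it possible to pivot the elements back into columns.
wide = with_count["values"].unstack("count")
print(wide)
```

With the counter in the index, every (row, count) pair is unique, which is what unstack needs here.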

I personally think that could still be useful, and we could potentially think about combining that in a single keyword. However, since ignore_index is already used in other places, probably better to consider this separately.
