Pandas: ENH: Pandas `DataFrame.append` and `Series.append` methods should get an `inplace` kwarg

Created on 4 Dec 2016 · 6 Comments · Source: pandas-dev/pandas

Problem description

Currently, appending to a DataFrame requires the following approach:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(5, 3), columns=list('abc'))
df = df.append(pd.DataFrame(np.random.rand(5, 3), columns=list('abc')))

append is a DataFrame or Series method, and as such should be able to modify the DataFrame or Series in place. If in-place modification is not required, one may use concat or set the inplace kwarg to False. It will avoid an explicit assignment operation, which is quite slow in Python, as we all know. Further, it will make the expected behavior similar to Python lists, and avoid questions such as these: 1, 2...
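The comparison with Python lists can be made concrete with a minimal sketch using only built-in types. The variable names are illustrative. It shows the two conventions side by side: `list.append` mutates the object and returns `None`, while an immutable type such as a tuple forces the "build a new object and reassign" pattern that `DataFrame.append` follows.

```python
# list.append mutates the list in place and returns None.
items = [1, 2]
alias = items                 # second reference to the same list
returned = items.append(3)    # alias sees the change too

# Tuples are immutable, so "appending" builds a new object
# that has to be assigned back to the name.
point = (1, 2)
original_point = point
point = point + (3,)          # original_point is left untouched
```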

Additionally, at present append is a full subset of concat, and as such it need not exist at all. Given the vast number of functions for appending a DataFrame or Series to another in Pandas, it makes sense that each should have its own merits and demerits. Gaining an inplace kwarg will clearly distinguish append from concat, and simplify code.
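As a rough illustration of the overlap with concat, the sketch below performs the same row-wise append with `pd.concat`. The names `df1`, `df2` and `combined` are illustrative only. `pd.concat` is used rather than `append` so the snippet also runs on pandas 2.0 and later, where `DataFrame.append` no longer exists.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))
df2 = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))

# pd.concat stacks the rows of both frames and returns a new DataFrame;
# neither input is modified.
combined = pd.concat([df1, df2])
```

The result still has to be assigned to a name, which is exactly the step the proposed inplace kwarg is meant to remove.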

I understand that this issue was raised in #2801 a long time ago. However, the conversation there drifted from the simplification offered by the inplace kwarg to performance enhancement. I (and many like me) am looking for ease of use, and not so much for performance. Also, we expect the data to fit in memory (which is a limitation even with the current version of append).

Expected Code

df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
df.append(pd.DataFrame(np.random.rand(5,3), columns=list('abc')), inplace=True)

API Design · Performance · Reshaping

Source

dragonator4

Most helpful comment

In the case of a namedtuple which contains a Series object, the inplace approach would be a nice feature to have.
This is not related in any way to performance, but it would be a way to expose data to users.

Indeed, namedtuple objects by design give a library author a way to expose data to users while only allowing them to modify it in place.
Trying to overwrite an attribute of a namedtuple intentionally raises AttributeError: can't set attribute, so that the user does not try to affect your library. But mutable attributes are allowed.

Consider the following dummy code:

from collections import namedtuple
from pandas import Series

# ----- Library part ------
sample_schema = {
    "name": str,
    "some_info": str,
    "content": Series
}

my_data_type = namedtuple("MyDataType", sample_schema.keys())

exposed_data = my_data_type(
    name="Library data",
    some_info="Modify the content as you want",
    content=Series({"a": 0})
)


# ----- User code part ------
series_to_be_appended = Series({"b": 0})

# This is forbidden: namedtuple fields cannot be reassigned (raises AttributeError)
exposed_data.content = exposed_data.content.append(series_to_be_appended)

# This would be allowed but is not implemented in Series
exposed_data.content.append(series_to_be_appended, inplace=True)

The name and some_info attributes are strings and therefore immutable. A user would not (easily) be able to affect them. But here the content can be modified as long as it is not set to a new object altogether.
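The namedtuple behavior described here can be checked with a minimal runnable sketch that uses only the standard library. As an assumption for illustration, a plain list stands in for the Series, since a list already has an in-place `append`. The two-field `MyDataType` and the `rebinding_allowed` flag are likewise hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's data type; a list replaces the Series.
MyDataType = namedtuple("MyDataType", ["name", "content"])
exposed_data = MyDataType(name="Library data", content=["a"])

# Rebinding a field to a new object is rejected by namedtuple.
try:
    exposed_data.content = exposed_data.content + ["b"]
    rebinding_allowed = True
except AttributeError:
    rebinding_allowed = False

# Mutating the object the field already points to is allowed.
exposed_data.content.append("b")
```

After this runs, the field still refers to the same list object, now holding both entries. That is the access pattern an in-place `Series.append` would make available.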

I would think inplace methods are nice to have on any mutable object in general.

remidebette on 2 Aug 2017

👍4

All 6 comments

I am opposed to this for the exact reasons discussed in #2801: it would mislead users who might expect a performance benefit.

shoyer on 4 Dec 2016

Virtually all pandas methods return a new object, the exception being the indexing operations. Using inplace is not idiomatic, is quite unreadable, and is not (more) performant at all.

Closing, though if someone thinks that we should add a signature like (...., inplace=False), and then raise a TypeError if inplace=True to give a nice error message, then we can reopen for that purpose.

In [2]: df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
   ...: df.append(pd.DataFrame(np.random.rand(5,3), columns=list('abc')), inplace=True)
TypeError: append() got an unexpected keyword argument 'inplace'
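The "nice error message" idea could look roughly like the sketch below. `append_frame` is a hypothetical free function written for illustration, not a pandas API, and the message wording is an assumption. It accepts `inplace` only in order to reject `inplace=True` with a hint pointing at the supported pattern.

```python
import pandas as pd


def append_frame(df, other, inplace=False):
    """Hypothetical helper (not part of pandas): append rows, rejecting inplace=True."""
    if inplace:
        # Fail early with an explanation instead of an "unexpected keyword" error.
        raise TypeError(
            "inplace=True is not supported when appending; "
            "assign the result instead: df = pd.concat([df, other])"
        )
    # Always return a new object, like most pandas methods.
    return pd.concat([df, other])
```

Calling `append_frame(df, other)` returns the combined frame, while `append_frame(df, other, inplace=True)` raises a `TypeError` carrying the hint.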

jreback on 4 Dec 2016

👎1

So the consensus among the maintainers is that it would be too confusing to have an append() method which actually appends?

I'd suggest removing the method from DataFrame entirely, or potentially renaming it. Someone familiar with pandas might find it confusing, but the opposite is currently true for those of us without your level of experience.

rtruxal on 13 Mar 2019

👍1

Agreeing here.
Never understood why Pandas allows itself an API with its own logic rather than sharing that of Python itself. One can get used to the fact that most pandas methods return new objects rather than modifying their objects, although it's counter-intuitive. (Pandas' standard behavior is imho counter-intuitive for everyone who uses more Python than Pandas, which should be most of the user base.) And one can get used to the fact that most Pandas methods behave as a user would expect when inplace=True is passed as an argument.

I can still live with that. But not adding the possibility to specify inplace for append(), simply defaulting it to False, which effectively keeps the method unchanged for all who want it that way but greatly helps those who need it, is something I cannot follow. Sorry.

paulstapor on 26 Oct 2020

Adding a use case:

I have a lot of CSV files, with few entries in each, many of which have additional columns.
I want a combined dataframe that includes the additional columns. (I landed right on the pandas.DataFrame.append() docs.)

Columns in other that are not in the caller are added as new columns.

The line above reassured me that I had landed in the right place.

combined_dataframe = pd.DataFrame()
for dataframe in list_of_dataframes_read_from_csvs:
    combined_dataframe.append(dataframe, inplace=True)

This raised an error; I checked the docs, found no inplace for append(), and that led me to this issue.
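For this use case, a sketch that works without an inplace kwarg is to pass the whole list to `pd.concat` once. The two small frames below are made-up stand-ins for the data read from the CSV files. With the default outer join, columns missing from a frame are filled with NaN.

```python
import pandas as pd

# Stand-ins for frames read from CSV files; the second has an extra column.
list_of_dataframes_read_from_csvs = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    pd.DataFrame({"a": [5], "b": [6], "c": [7]}),
]

# One concat call over the whole list; ignore_index renumbers the rows 0..n-1.
combined_dataframe = pd.concat(list_of_dataframes_read_from_csvs, ignore_index=True)
```

Concatenating once also avoids building a new intermediate DataFrame on every loop iteration.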

aitikgupta on 16 Nov 2020
