In [1]: import pandas as pd
In [2]: pd.Series([1,2,3]).sum(numeric_only=False)
Out[2]: 6
In [3]: pd.Series([1,2,3]).sum(numeric_only=True)
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-3-2c46bd289e26> in <module>()
----> 1 pd.Series([1,2,3]).sum(numeric_only=True)
/users/is/whughes/pyenvs/research/lib/python2.7/site-packages/pandas-0.16.2_ahl1-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
4253 skipna=skipna)
4254 return self._reduce(f, name, axis=axis,
-> 4255 skipna=skipna, numeric_only=numeric_only)
4256 stat_func.__name__ = name
4257 return stat_func
/users/is/whughes/pyenvs/research/lib/python2.7/site-packages/pandas-0.16.2_ahl1-py2.7-linux-x86_64.egg/pandas/core/series.pyc in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
2081 if numeric_only:
2082 raise NotImplementedError(
-> 2083 'Series.{0} does not implement numeric_only.'.format(name))
2084 return op(delegate, skipna=skipna, **kwds)
2085
NotImplementedError: Series.sum does not implement numeric_only.
The docstring suggests this is a legitimate argument:
Return the sum of the values for the requested axis
Parameters
----------
axis : {index (0)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar
numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use
everything, then use only numeric data
Returns
-------
sum : scalar or Series (if level specified)
However, strangely, there's an explicit test that this throws an exception: https://github.com/pydata/pandas/blob/054821dc90ded4263edf7c8d5b333c1d65ff53a4/pandas/tests/test_series.py#L2724
This is just for compat, as it's a general parameter that matters for DataFrames (and the function is auto-generated). If you can find a way to not expose it without jumping through hoops, that would be OK.
OK, so numeric_only is accepted by Series.sum simply for compatibility with DataFrame.sum. You're proposing we find a way to hide this specific parameter in the docstring.
Have I understood correctly?
Ok
I'll freely admit I'm a pandas novice, but I ran headlong into what I think was this bug just now. I wanted numeric_only with Series.mean rather than sum; I assume that falls under this issue as well. The documentation says this option exists but the code says it doesn't. pandas version 0.18.1, documentation from a matching-version manual (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html) (although obviously that link may age out).
@smlewis - can you show an example of some data where you needed this and what you expected to happen? Note that the implemented use case is for selecting numeric _columns_, like
df = pd.DataFrame({'a': [2,3,4], 'b': pd.timedelta_range('1s', periods=3)})
df
Out[63]:
a b
0 2 0 days 00:00:01
1 3 1 days 00:00:01
2 4 2 days 00:00:01
df.mean()
Out[65]:
a 3
b 1 days 00:00:01
dtype: object
df.mean(numeric_only=True)
Out[64]:
a 3.0
dtype: float64
The input file for my dataframe was constructed in a stupid way (by me...): several similar data sources were concatenated so I could process their averages all at once instead of running the script N times. The concatenation meant that each group had its header repeated (except the first, which I'd edited manually to properly name the column; that column was a mangling of the source filename inserted at concatenation time). So you get a data set like this:
source score
alpha 2
alpha 3
alpha 2
beta score
beta 9
beta 8
beta 7
gamma score
gamma 4
gamma 4
gamma 1
This snippet:
import pandas as pd
all_scores = pd.read_csv("scores_for_averaging.csv", delim_whitespace=True)
experiments = all_scores['source'].unique()
for each in experiments:
    exp_slice = all_scores.loc[all_scores['source'] == each]
    #print each, exp_slice['score'].mean(numeric_only=True) #fails: NotImplementedError: Series.mean does not implement numeric_only.
    #print each, exp_slice['score'].mean() #fails: TypeError: Could not convert score987 to numeric
failed because mean() couldn't accept numeric_only to throw out the spurious extra header line for beta, gamma, etc. I just reprocessed my input to not have the header line repeated and then it worked fine. I guess the problem is that the documentation and the code don't match?
Thanks, just curious what the expected use was. Yes, the documentation/method should be updated to match, just tricky to actually do in this case (PR welcome!).
FYI, for a conversion like this (assuming you actually do have a valid mixed type object array), the function you likely want is to_numeric
pd.to_numeric(exp_slice['score'], errors='coerce').mean()
I suppose this could be better documented, but the arg is there for consistency with DataFrame. It really doesn't do anything as a Series is a single dtyped object. Either you get all elements or None (even if mixed). We don't deeply introspect mixed (or object) things.
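For anyone landing here with the same kind of file, here is a minimal sketch of the to_numeric approach applied to the sample data posted above. The inline string stands in for scores_for_averaging.csv, and the use of groupby instead of the explicit loop is my own assumption about what is wanted, not something from the original script.

```python
import io

import pandas as pd

# Inline stand-in for scores_for_averaging.csv (assumption: same layout as
# the sample posted earlier, including the repeated header rows).
raw = """source score
alpha 2
alpha 3
alpha 2
beta score
beta 9
beta 8
beta 7
gamma score
gamma 4
gamma 4
gamma 1
"""

all_scores = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# The stray "score" strings force the column to object dtype, so coerce it:
# anything that cannot be parsed as a number becomes NaN.
all_scores["score"] = pd.to_numeric(all_scores["score"], errors="coerce")

# mean() skips NaN by default, so the repeated header rows drop out.
means = all_scores.groupby("source")["score"].mean()
print(means)
```

The coercion happens once on the whole column, so no per-group slicing is needed, and the repeated header rows simply become NaN values that mean() ignores.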
Thank you!
@jreback Why did you close this pull request? This is still not in documentation.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html