Pandas: warning in bar plot with multiple columns

Created on 13 Dec 2017 · 4 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.21.0'
>>> a = np.random.randint(1, 100, size=10)
>>> b = 100 - a
>>> i = np.arange(100, 110)
>>> 
>>> df = pd.DataFrame(dict(a=a, b=b, i=i))
>>> df.plot.bar(x='i', y=['b','a'], stacked=True)
/Users/adefusco/Applications/miniconda3/envs/projects-data-analysis/lib/python3.6/site-packages/pandas/plotting/_core.py:1714: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
  series.name = label
<matplotlib.axes._subplots.AxesSubplot object at 0x10dec46d8>

Problem description

The warning about series.name = label is raised because, when y is a list, the plotting code does the following. Note that I am not passing the label keyword argument.

y = ['a','b'] # <=== from inputs to function
label = kwds['label'] if 'label' in kwds else y
series = data[y].copy()  # Don't modify
series.name = label

and since series is actually a DataFrame here, Pandas thinks that a new column is being created with the values ['a','b'].
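
The mechanism can be reproduced outside the plotting code. The following is a minimal sketch, assuming a pandas version (0.21 or later) that emits this warning: assigning a list-like value to an attribute that does not yet exist on a DataFrame triggers the UserWarning, whereas the same assignment on a Series simply sets its name. The frame and variable names are illustrative and not taken from pandas/plotting/_core.py.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Selecting with a list of labels returns a DataFrame, not a Series.
subset = df[["a", "b"]].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A DataFrame has no `name` attribute, so pandas suspects an attempt
    # to create a column through attribute assignment and warns.
    subset.name = ["a", "b"]

messages = [str(w.message) for w in caught]
print(messages)

# Selecting with a single label returns a Series, where `name` is a real
# attribute and can be assigned without any warning.
series = df["a"].copy()
series.name = "renamed"
```

This suggests the warning is a side effect of the list-valued y reaching code written for a single column, rather than a problem with the data itself.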

Expected Output

The warning message does not occur if the Index is used as the x-axis

df[['b','a']].plot.bar(stacked=True)

Proposed solution

In pandas/plotting/_core.py would the following be reasonable?

if y is not None:
    if is_scalar(y):
        if is_integer(y) and not data.columns.holds_integer():
            y = data.columns[y]

        label = kwds['label'] if 'label' in kwds else y
        series = data[y].copy()  # Don't modify
        series.name = label

        data = series

    elif is_dict_like(y):
        # Assumes y maps new label -> existing column name
        data = data[list(y.values())].copy()
        data = data.rename(columns={col: label for label, col in y.items()})

    elif is_list_like(y):
        data = data[y].copy()

    elif not isinstance(data[y], ABCSeries):
        raise ValueError("y must be a label or position")

... continue with plot

This allows the following options, using the DataFrame defined above.

df.plot.bar(x='i', y=1)              # <-- plot the second column
df.plot.bar(x='i', y=['b','a'])      # <-- plot multiple columns
df.plot.bar(x='i', y=dict(y='a', z='b')) # <-- plot multiple columns and with custom labels
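
The dispatch proposed above can be tried without rebuilding Pandas. This is only a rough standalone sketch: select_y is a hypothetical helper, not part of pandas; it uses the public pandas.api.types checks, substitutes is_integer_dtype for the columns.holds_integer() call, and assumes a dict maps new label to existing column name.

```python
import pandas as pd
from pandas.api.types import (
    is_dict_like,
    is_integer,
    is_integer_dtype,
    is_list_like,
    is_scalar,
)


def select_y(data, y, label=None):
    """Hypothetical helper: pick the column(s) to plot from ``data``."""
    if is_scalar(y):
        # An integer y is a position unless the columns are integer-labelled.
        if is_integer(y) and not is_integer_dtype(data.columns):
            y = data.columns[y]
        series = data[y].copy()  # Don't modify the caller's data
        series.name = label if label is not None else y
        return series
    if is_dict_like(y):
        # Assumption: y maps new label -> existing column name.
        renames = {col: new for new, col in y.items()}
        return data[list(renames)].rename(columns=renames)
    if is_list_like(y):
        return data[list(y)].copy()
    raise ValueError("y must be a label, position, list of labels, or dict")


df = pd.DataFrame({"a": [1, 2], "b": [99, 98], "i": [100, 101]})

by_position = select_y(df, 1)                  # Series named 'b'
by_list = select_y(df, ["b", "a"])             # DataFrame with columns b, a
by_dict = select_y(df, {"y": "a", "z": "b"})   # DataFrame with columns y, z
```

The dict check comes before the list check because a dict is itself list-like.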

After I teach myself how to build Pandas I'll test this change.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

All 4 comments

Can you check if this was fixed by https://github.com/pandas-dev/pandas/pull/18695?

Actually, #18695 breaks my plot entirely by only allowing y to be a scalar. I'll submit a PR along with tests soon.

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.22.0.dev0+356.g9705a4806'
>>> a = np.random.randint(1, 100, size=10)
>>> b = 100 - a
>>> i = np.arange(100, 110)
>>> 
>>> df = pd.DataFrame(dict(a=a, b=b, i=i))
>>> df.plot.bar(x='i', y=['b','a'], stacked=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 2701, in bar
    return self(kind='bar', x=x, y=y, **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 2666, in __call__
    sort_columns=sort_columns, **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 1905, in plot_frame
    **kwds)
  File "/Users/adefusco/Development/Continuum/pandas/pandas/plotting/_core.py", line 1716, in _plot
    raise ValueError("y must be a label or position")
ValueError: y must be a label or position

Yeah, that was the point of #18695, to raise when the user passes invalid arguments. x and y are supposed to be single labels or positions. Passing x and y sends the code down a path that's expecting all the other kwargs to deal with single values, not multiple.

If you want to plot multiple, I'd recommend df.set_index('i')[['b', 'a']].plot.bar(stacked=True).
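
For reference, a small sketch of what that recommendation does to the data before plotting. It reuses the DataFrame from the report; the fixed random seed and the plot_data name are additions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
a = rng.integers(1, 100, size=10)
b = 100 - a
i = np.arange(100, 110)
df = pd.DataFrame(dict(a=a, b=b, i=i))

# Move 'i' into the index so it becomes the x-axis, then select the
# columns to stack, in the order they should be drawn.
plot_data = df.set_index("i")[["b", "a"]]
print(plot_data.head())
```

With matplotlib installed, plot_data.plot.bar(stacked=True) then draws one stacked bar per value of i, without passing x or y at all.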

Ok, thanks.
