Pandas: df.groupby(.).plot.scatter() creates a spurious initial plot

Created on 23 Jun 2018 · 5 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

# Four points that all belong to a single category
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=['x', 'y'])
df['cat'] = [1, 1, 1, 1]
df.groupby('cat').plot.scatter(x='x', y='y')

Problem description

The above creates 2 plots rather than one (notice there is only one category).

Tested in Jupyter 5.4.0 with %matplotlib inline.

Expected Output

A single plot.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.24.0.dev0+141.gf1ffc5fae
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.2.0
Cython: 0.25.2
numpy: 1.14.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.2.2.post1153+gff6786446
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

Groupby Visualization

All 5 comments

This seems to happen because of this line:

https://github.com/pandas-dev/pandas/blob/b2305ea6d7ca1787bce8eb646a1791a0399353e7/pandas/_libs/reduction.pyx#L515

The Cython code appears to first run the function (in this case scatter) on the first group to check that it does not cause a segmentation fault, and then to loop through all groups (including the one it just ran), applying the function to each. So you always end up with one extra plot.

I think the solution is probably to save the output of the first run and skip that group during the loop.

If no one else is picking this up I'd be willing to look into it and see if I can come up with a PR.
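
The fix proposed above (keep the result of the trial run and do not repeat the first group in the loop) can be sketched in plain Python. This is only an illustration of the idea, not the actual pandas or Cython code; `apply_with_trial_run` and `fake_plot` are hypothetical names invented for this sketch.

```python
def apply_with_trial_run(groups, func):
    """Apply func to every group, calling it exactly once per group.

    Hypothetical helper: the first group is evaluated up front (the
    "trial run") and its result is reused instead of being recomputed.
    """
    results = []
    iterator = iter(groups)
    try:
        first = next(iterator)
    except StopIteration:
        return results  # no groups, nothing to apply

    # Trial run on the first group; keep the result instead of discarding it.
    results.append(func(first))

    # Continue with the remaining groups only, so the first is not repeated.
    for group in iterator:
        results.append(func(group))
    return results


calls = []


def fake_plot(group):
    # Hypothetical stand-in for a side-effecting call such as plot.scatter().
    calls.append(group)
    return len(group)


# One group in, so func should be called once, not twice.
out = apply_with_trial_run([[1, 3, 5, 7]], fake_plot)
```

With a single group, `fake_plot` is recorded in `calls` only once, which corresponds to the single plot the report expects.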

@javadnoorb sounds reasonable. PRs are always welcome!

I created #21963 to specifically refer to the creation of one plot per category, so that this issue is devoted to the spurious initial plot only (what @javadnoorb is fixing).

I could not reproduce this issue: running the code above on pandas 0.25 correctly produces a single scatter plot. I think this issue can be closed. @TomAugspurger @toobaz

Yes, I think this is fixed. Thanks!
