Pandas: Better error message when using DataFrame.hist() without numerical columns

Created on 26 Jun 2015 · 12 Comments · Source: pandas-dev/pandas

pandas version: 0.16.2
matplotlib version: 1.4.3 (an older version produced a different error message)

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10,2))
df_o = df.astype(np.object)
df_o.hist()

ValueError                                Traceback (most recent call last)
<ipython-input-1-26253737011d> in <module>()
      4 df = pd.DataFrame(np.random.rand(10,2))
      5 df_o = df.astype(np.object)
----> 6 df_o.hist()

/usr/local/lib/python2.7/dist-packages/pandas/tools/plotting.pyc in hist_frame(data, column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, figsize, layout, bins, **kwds)
   2764     fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False,
   2765                           sharex=sharex, sharey=sharey, figsize=figsize,
-> 2766                           layout=layout)
   2767     _axes = _flatten(axes)
   2768 

/usr/local/lib/python2.7/dist-packages/pandas/tools/plotting.pyc in _subplots(naxes, sharex, sharey, squeeze, subplot_kw, ax, layout, layout_type, **fig_kw)
   3244 
   3245     # Create first subplot separately, so we can share it if requested
-> 3246     ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
   3247 
   3248     if sharex:

/usr/local/lib/python2.7/dist-packages/matplotlib/figure.pyc in add_subplot(self, *args, **kwargs)
    962                     self._axstack.remove(ax)
    963 
--> 964             a = subplot_class_factory(projection_class)(self, *args, **kwargs)
    965 
    966         self._axstack.add(key, a)

/usr/local/lib/python2.7/dist-packages/matplotlib/axes/_subplots.pyc in __init__(self, fig, *args, **kwargs)
     62                     raise ValueError(
     63                         "num must be 0 <= num <= {maxn}, not {num}".format(
---> 64                             maxn=rows*cols, num=num))
     65                 if num == 0:
     66                     warnings.warn("The use of 0 (which ends up being the "

ValueError: num must be 0 <= num <= 0, not 1

Labels: Dtypes, Error Reporting, Visualization, good first issue

goretkin

All 12 comments

We (only?) plot numeric types (df._get_numeric_data IIRC).

What's your use case here that you're getting object dtypes? Integer NaNs? You'll typically want to avoid object dtypes since they're much slower for numeric operations.

TomAugspurger on 26 Jun 2015
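
To make the point above concrete, here is a minimal sketch. It assumes the public `select_dtypes` method as a stand-in for the private `_get_numeric_data` helper mentioned in the comment. An object-dtype frame has no numeric columns, so there are zero axes to lay out, which is consistent with the `maxn=rows*cols` value of 0 shown in the traceback.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))
df_o = df.astype(object)  # same values, but stored as generic Python objects

# select_dtypes is used here as a public stand-in for the private
# _get_numeric_data helper; that substitution is an assumption of this sketch.
numeric = df.select_dtypes(include=[np.number])
numeric_o = df_o.select_dtypes(include=[np.number])

print(numeric.shape[1])    # 2 numeric columns in the float frame
print(numeric_o.shape[1])  # 0 numeric columns in the object frame
```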

I agree with what you're saying, and I think a suitable fix would include a more explicit check for the dtypes and show an error. As it is, I spent some time trying to figure out what the issue was, especially because the string representation of the DataFrame doesn't show the dtypes.

I am using floats and integers, but by accident, when I constructed the DataFrame, all entries were NaN objects, and then I populated the DataFrame in a loop.

goretkin on 26 Jun 2015

👍5
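
The situation described above can be sketched as follows. The column names and the fill loop are invented for illustration; the point is only that a frame created empty and filled cell by cell can keep the object dtype, and that `print(df)` gives no hint of it.

```python
import pandas as pd

# An empty frame built from index and columns alone holds NaN with object dtype.
df = pd.DataFrame(index=range(3), columns=["a", "b"])

# Filling it cell by cell keeps the object dtype, even though every value is a float.
for i in range(3):
    df.loc[i, "a"] = float(i)
    df.loc[i, "b"] = float(i) * 2

print(df)         # looks like an ordinary numeric table
print(df.dtypes)  # the dtypes are only visible here

# Convert each column explicitly before plotting.
df_num = df.apply(pd.to_numeric)
print(df_num.dtypes)
```

Checking `df.dtypes` before calling `hist()` is a cheap way to catch this case early.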

An error message like hist method requires numerical columns, nothing to plot or anything clearer would be useful.

datapythonista on 6 Jul 2018

👍7

Any progress on this? It took me quite a few hours to realize it was a dtype error.

zhanwenchen on 21 Aug 2018

Still open. Interested in submitting a PR to fix it?

TomAugspurger on 21 Aug 2018

@TomAugspurger I get a nonsense error here from a DataFrame with 2 columns, but not when histogramming them one at a time as a Series:

ValueError: num must be 1 <= num <= 0, not 1

traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-41-7cfbfac10616> in <module>()
      1 dfi['ml_data'][
----> 2     ['duration_', 'duration__']].dropna().astype('timedelta64[D]').astype(float).hist(bins=20)

/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py in hist_frame(data, column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, figsize, layout, bins, **kwds)
   2176     fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False,
   2177                           sharex=sharex, sharey=sharey, figsize=figsize,
-> 2178                           layout=layout)
   2179     _axes = _flatten(axes)
   2180 

/usr/local/lib/python3.6/dist-packages/pandas/plotting/_tools.py in _subplots(naxes, sharex, sharey, squeeze, subplot_kw, ax, layout, layout_type, **fig_kw)
    235 
    236     # Create first subplot separately, so we can share it if requested
--> 237     ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
    238 
    239     if sharex:

/usr/local/lib/python3.6/dist-packages/matplotlib/figure.py in add_subplot(self, *args, **kwargs)
   1072                     self._axstack.remove(ax)
   1073 
-> 1074             a = subplot_class_factory(projection_class)(self, *args, **kwargs)
   1075 
   1076         self._axstack.add(key, a)

/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_subplots.py in __init__(self, fig, *args, **kwargs)
     62                     raise ValueError(
     63                         "num must be 1 <= num <= {maxn}, not {num}".format(
---> 64                             maxn=rows*cols, num=num))
     65                 self._subplotspec = GridSpec(rows, cols)[int(num) - 1]
     66                 # num - 1 for converting from MATLAB to python indexing

ValueError: num must be 1 <= num <= 0, not 1

colab notebook:

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.33+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
numpy: 1.14.5
matplotlib: 2.1.2

denfromufa on 24 Sep 2018

@denfromufa do you want to fix it?

Here you have the general documentation on how to do it: https://pandas.pydata.org/pandas-docs/stable/contributing.html

The fix should be easy, just checking the type and raising an exception with a useful message.

datapythonista on 25 Sep 2018
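
The check suggested above could look roughly like the sketch below. The helper name `numeric_columns_or_raise` is hypothetical and this is not the pandas implementation; the message wording follows the suggestion made earlier in the thread.

```python
import numpy as np
import pandas as pd


def numeric_columns_or_raise(df):
    """Return the numeric columns of df, or raise a descriptive ValueError.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    numeric = df.select_dtypes(include=[np.number])
    if numeric.shape[1] == 0:
        raise ValueError(
            "hist method requires numerical columns, nothing to plot. "
            "Column dtypes: {}".format(df.dtypes.astype(str).to_dict())
        )
    return numeric


# An object-dtype frame now fails early with a readable message.
df_o = pd.DataFrame(np.random.rand(10, 2)).astype(object)
try:
    numeric_columns_or_raise(df_o)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Failing before any subplot is created means the user sees a dtype-related message instead of matplotlib's complaint about subplot numbering.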

@datapythonista I don't agree with this solution; it should just work. Why raise an exception when the histogram works for each Series? However, I have not had a chance to debug this yet.

denfromufa on 25 Sep 2018

If you make it work even better, feel free to send a PR for it.

datapythonista on 25 Sep 2018

I am struggling with a similar issue. I am building data frames from various sources, each with 2690 rows; from one source I can get the histogram to work, from the other I get the error reported above (and below). My plan was to convert both data frames to a dict using df.to_dict() and back to a data frame using pd.DataFrame.from_dict(), to explore more and reproduce the issue here in case someone could point out what the problem was. But when I do that, they both plot just fine. E.g.

dic=df1.to_dict()
df1=pd.DataFrame.from_dict(dic)

When I try to examine the original data frames for NaNs, etc, I cannot tell a difference. Any idea why converting my dfs to dict and back solves this issue?

df0:

print(df0.sort_values('rate',ascending=False).head(5))
print(pd.isnull(df).sum())

df1:

print(df1.sort_values('rate',ascending=False).head(5))
print(pd.isnull(df).sum())

ValueError                                Traceback (most recent call last)
in
      9 print(df.loc[:,['obnme','rate','total_et','acres']].sort_values('obnme',ascending=False).to_dict())
     10 print(pd.isnull(df).sum())
---> 11 df.hist('rate',bins=np.arange(df['rate'].min(),df['rate'].max(),0.25))
     12 plt.title('ET rate for all WR')
     13 plt.xlabel('ET rate (ft/yr)')

C:\conda3x64\envs\p3x64\lib\site-packages\pandas\plotting\_core.py in hist_frame(data, column, by, grid, xlabelsize, xrot, ylabelsize, yrot, ax, sharex, sharey, figsize, layout, bins, **kwds)
   2406     fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False,
   2407                           sharex=sharex, sharey=sharey, figsize=figsize,
-> 2408                           layout=layout)
   2409     _axes = _flatten(axes)
   2410 

C:\conda3x64\envs\p3x64\lib\site-packages\pandas\plotting\_tools.py in _subplots(naxes, sharex, sharey, squeeze, subplot_kw, ax, layout, layout_type, **fig_kw)
    236 
    237     # Create first subplot separately, so we can share it if requested
--> 238     ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
    239 
    240     if sharex:

C:\conda3x64\envs\p3x64\lib\site-packages\matplotlib\figure.py in add_subplot(self, *args, **kwargs)
   1237                     self._axstack.remove(ax)
   1238 
-> 1239             a = subplot_class_factory(projection_class)(self, *args, **kwargs)
   1240         self._axstack.add(key, a)
   1241         self.sca(a)

C:\conda3x64\envs\p3x64\lib\site-packages\matplotlib\axes\_subplots.py in __init__(self, fig, *args, **kwargs)
     65                     raise ValueError(
     66                         ("num must be 1 <= num <= {maxn}, not {num}"
---> 67                         ).format(maxn=rows*cols, num=num))
     68                 self._subplotspec = GridSpec(
     69                     rows, cols, figure=self.figure)[int(num) - 1]

ValueError: num must be 1 <= num <= 0, not 1
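
On the question above of why a `to_dict()` / `from_dict()` round trip makes the plot work: one plausible explanation is that rebuilding the frame from plain Python values lets pandas infer the dtypes again. This assumes the failing frame held its numbers in object-dtype columns, which the thread does not confirm. The sketch below uses an invented three-row `rate` column as a stand-in.

```python
import pandas as pd

# Assumed stand-in for the failing frame: float values in an object column.
df1 = pd.DataFrame({"rate": [1.0, 2.5, 0.75]}, dtype=object)
print(df1.dtypes)

# The round trip passes through plain Python values, so dtypes are inferred afresh.
dic = df1.to_dict()
df1_roundtrip = pd.DataFrame.from_dict(dic)
print(df1_roundtrip.dtypes)
```

If this is what happened, comparing `df0.dtypes` with `df1.dtypes` on the original frames should show the difference that the NaN counts do not.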