Pandas: ENH/BUG: support TimedeltaIndex plotting

Created on 2 Nov 2014 · 31 Comments · Source: pandas-dev/pandas

This raises:

import pandas as pd
s = pd.Series(range(5), pd.timedelta_range('1day', periods=5))
s.plot()

This will show the timedeltas with a formatted (albeit string) index:

s.index = s.index.format()
s.plot()

I wonder if we can just register a converter somehow, like in #8614?

Bug Enhancement Timedelta Visualization

All 31 comments

I don't think that matplotlib already has a converter for datetime.timedelta, so just registering our Timedelta type will not be enough. Eg plt.plot(s.index.to_pytimedelta(), s) also fails.

But writing a basic converter should not be that difficult I think (and if it also works for datetime.timedelta it could maybe also be pushed upstream to matplotlib)
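To make the idea concrete, here is a minimal sketch of only the conversion step such a converter would need: turning timedeltas into plain numbers that an axis can place. The helper name `timedelta_to_seconds` is hypothetical, and the choice of seconds as the unit is an assumption. A real converter would presumably subclass `matplotlib.units.ConversionInterface` and be registered in `matplotlib.units.registry`; that part is left out so the sketch has no matplotlib dependency.

```python
import datetime


def timedelta_to_seconds(values):
    """Convert one timedelta, or a sequence of them, to float seconds.

    Hypothetical helper: the unit (seconds) is an assumption made for
    this sketch, not something matplotlib or pandas prescribes.
    """
    if isinstance(values, datetime.timedelta):
        return values.total_seconds()
    return [v.total_seconds() for v in values]


# Five daily steps, mirroring the timedelta_range('1day', periods=5) example.
deltas = [datetime.timedelta(days=d) for d in range(1, 6)]
seconds = timedelta_to_seconds(deltas)
print(seconds)
```

Because the helper only relies on `total_seconds()`, it should accept any subclass of `datetime.timedelta` as well.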

Timedelta is a subclass of datetime.timedelta

I just encountered a MemoryError when attempting to plot a TimedeltaIndex!

pd.Series(range(15), pd.timedelta_range(0, freq='D', periods=15)).plot()
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-113-e9a2d53dcace> in <module>()
----> 1 pd.Series(range(15), pd.timedelta_range(0, freq='H', periods=15)).plot()

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2516                  yerr=yerr, xerr=xerr,
   2517                  label=label, secondary_y=secondary_y,
-> 2518                  **kwds)
   2519 
   2520 

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in _plot(data, x, y, subplots, ax, kind, **kwds)
   2322         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   2323 
-> 2324     plot_obj.generate()
   2325     plot_obj.draw()
   2326     return plot_obj.result

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in generate(self)
    925         self._make_legend()
    926         self._post_plot_logic()
--> 927         self._adorn_subplots()
    928 
    929     def _args_adjust(self):

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in _adorn_subplots(self)
   1058                     ax.set_xticklabels(xticklabels)
   1059                 self._apply_axis_properties(ax.xaxis, rot=self.rot,
-> 1060                                             fontsize=self.fontsize)
   1061                 self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize)
   1062             elif self.orientation == 'horizontal':

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in _apply_axis_properties(self, axis, rot, fontsize)
   1069 
   1070     def _apply_axis_properties(self, axis, rot=None, fontsize=None):
-> 1071         labels = axis.get_majorticklabels() + axis.get_minorticklabels()
   1072         for label in labels:
   1073             if rot is not None:

/Users/shoyer/miniconda/envs/rapid/lib/python2.7/site-packages/matplotlib/axis.pyc in get_majorticklabels(self)
   1166     def get_majorticklabels(self):
   1167         'Return a list of Text instances for the major ticklabels'
-> 1168         ticks = self.get_major_ticks()
   1169         labels1 = [tick.label1 for tick in ticks if tick.label1On]
   1170         labels2 = [tick.label2 for tick in ticks if tick.label2On]

/Users/shoyer/miniconda/envs/rapid/lib/python2.7/site-packages/matplotlib/axis.pyc in get_major_ticks(self, numticks)
   1295         'get the tick instances; grow as necessary'
   1296         if numticks is None:
-> 1297             numticks = len(self.get_major_locator()())
   1298         if len(self.majorTicks) < numticks:
   1299             # update the new tick label properties from the old

/Users/shoyer/dev/pandas/pandas/tseries/converter.pyc in __call__(self)
    901             vmin, vmax = vmax, vmin
    902         if self.isdynamic:
--> 903             locs = self._get_default_locs(vmin, vmax)
    904         else:  # pragma: no cover
    905             base = self.base

/Users/shoyer/dev/pandas/pandas/tseries/converter.pyc in _get_default_locs(self, vmin, vmax)
    882 
    883         if self.plot_obj.date_axis_info is None:
--> 884             self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq)
    885 
    886         locator = self.plot_obj.date_axis_info

/Users/shoyer/dev/pandas/pandas/tseries/converter.pyc in _daily_finder(vmin, vmax, freq)
    505                     Period(ordinal=int(vmax), freq=freq))
    506     span = vmax.ordinal - vmin.ordinal + 1
--> 507     dates_ = PeriodIndex(start=vmin, end=vmax, freq=freq)
    508     # Initialize the output
    509     info = np.zeros(span,

/Users/shoyer/dev/pandas/pandas/tseries/period.pyc in __new__(cls, data, ordinal, freq, start, end, periods, copy, name, tz, **kwargs)
    637             else:
    638                 data, freq = cls._generate_range(start, end, periods,
--> 639                                                  freq, kwargs)
    640         else:
    641             ordinal, freq = cls._from_arraylike(data, freq, tz)

/Users/shoyer/dev/pandas/pandas/tseries/period.pyc in _generate_range(cls, start, end, periods, freq, fields)
    651                 raise ValueError('Can either instantiate from fields '
    652                                  'or endpoints, but not both')
--> 653             subarr, freq = _get_ordinal_range(start, end, periods, freq)
    654         elif field_count > 0:
    655             subarr, freq = _range_from_fields(freq=freq, **fields)

/Users/shoyer/dev/pandas/pandas/tseries/period.pyc in _get_ordinal_range(start, end, periods, freq)
   1317                              dtype=np.int64)
   1318     else:
-> 1319         data = np.arange(start.ordinal, end.ordinal + 1, dtype=np.int64)
   1320 
   1321     return data, freq

MemoryError: 

> /Users/shoyer/dev/pandas/pandas/tseries/period.py(1319)_get_ordinal_range()
   1318     else:
-> 1319         data = np.arange(start.ordinal, end.ordinal + 1, dtype=np.int64)
   1320 

Working on this. Doesn't look too bad.

As an update, it's a bit worse than I thought. I think it was @changhiskhan who put in a ton of heuristics for figuring out what resolution to draw when plotting datetimes. I wasn't sure if we'd need that for timedeltas, and then I got busy with other things. My branch is here.

As a workaround, the following works with master:

import matplotlib.pyplot as plt
plt.plot(s.index, s.values)

I don't think freq adjustment for different timedeltas is mandatory for an initial version. If that's OK, I'll give it a try.

Coming here from #10650, and adding a little more info just in case it can help. In my case, the bug manifests in _get_ordinal_range's end parameter having a huge ordinal. This means the following line:

data = np.arange(start.ordinal, end.ordinal + 1, mult, dtype=np.int64)

allocates a gigantic array. To be specific, when doing:

pd.Series(np.random.randn(4), index=pd.timedelta_range('0:00:00', periods=4, freq='min')).plot()

the values of start.ordinal and end.ordinal are 0 and 180000000000, respectively.
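As a back-of-the-envelope check of those numbers (a sketch, not pandas code): four one-minute steps end at 3 minutes, which is 180000000000 when expressed in nanoseconds, and an int64 array with one element per ordinal in that range would need roughly 1.44 TB.

```python
NS_PER_MINUTE = 60 * 10**9  # nanoseconds in one minute

periods = 4                                      # as in the timedelta_range call
start_ordinal = 0
end_ordinal = (periods - 1) * NS_PER_MINUTE      # last point, in nanoseconds

# np.arange(start, end + 1, dtype=np.int64) would hold this many 8-byte items.
n_elements = end_ordinal - start_ordinal + 1
n_bytes = n_elements * 8

print(end_ordinal)               # 180000000000
print(round(n_bytes / 1e12, 2))  # 1.44 (terabytes)
```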

@lucas-eyer is the mult parameter on that line appropriate, or is it some very small number? That might be the source of the issue...

I don't know what appropriate would be, but it's 1 (one).

Edit: pip freeze | grep pandas gives pandas==0.17.0.

I also just ran into this issue on 0.17.1. I'm not very familiar with the code, but it appears the issue is in pandas.tseries.converter.

The issue is that vmin and vmax as specified in the call to _get_default_locs in the get_major_locator function are in nanoseconds as returned from XAxis.get_view_interval:

def __call__(self):
    'Return the locations of the ticks.'
    # axis calls Locator.set_axis inside set_m<xxxx>_formatter
    vi = tuple(self.axis.get_view_interval())             # THIS IS IN NANOS
    if vi != self.plot_obj.view_interval:
        self.plot_obj.date_axis_info = None
    self.plot_obj.view_interval = vi
    vmin, vmax = vi
    if vmax < vmin:
        vmin, vmax = vmax, vmin
    if self.isdynamic:
        locs = self._get_default_locs(vmin, vmax)     # VMIN AND VMAX ARE IN NANOS
    else:  # pragma: no cover
        base = self.base
        (d, m) = divmod(vmin, base)
        vmin = (d + 1) * base
        locs = lrange(vmin, vmax + 1, base)
    return locs

But downstream in _daily_finder the freq parameter is used, which means that the system is interpreting the deltas in terms of minutes/hours/etc. rather than nanos:

def _daily_finder(vmin, vmax, freq):
    periodsperday = -1

    if freq >= FreqGroup.FR_HR:
        if freq == FreqGroup.FR_NS:
            periodsperday = 24 * 60 * 60 * 1000000000
        # ... further branches map the other frequencies to periodsperday
    # save this for later usage
    vmin_orig = vmin

    (vmin, vmax) = (Period(ordinal=int(vmin), freq=freq),    # NOW THESE ARE INTERPRETED AS MINUTES (or whatever freq)
                    Period(ordinal=int(vmax), freq=freq))

Replacing the final line above with

 (vmin, vmax) = (Period(ordinal=int(vmin), freq='N'), Period(ordinal=int(vmax), freq='N'))

appears to fix the issue.
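The unit mismatch described above can be illustrated without pandas. This is only a sketch of the arithmetic, using the 0 to 180000000000 view interval reported earlier in the thread; the variable names are made up for the illustration and this is not the code path pandas takes.

```python
import datetime

NS_PER_MINUTE = 60 * 10**9

# View interval as reported for the 4-point, 1-minute example (nanoseconds).
vmin_ns, vmax_ns = 0, 180000000000

# Read correctly as nanoseconds, vmax is three minutes.
as_nanos = datetime.timedelta(microseconds=vmax_ns // 1000)

# Read as a count of minutes (the plot's freq), it is about 125 million days.
as_minutes = datetime.timedelta(minutes=vmax_ns)

# Rescaling the nanosecond bounds to minutes gives the expected 4 positions.
n_positions = (vmax_ns - vmin_ns) // NS_PER_MINUTE + 1

print(as_nanos)         # 0:03:00
print(as_minutes.days)  # 125000000
print(n_positions)      # 4
```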

@Liam3851 glad you have tracked this down! Any chance you're interested in making a pull request with the fix? :)

Sure, I just have to figure out how to do it lol. Longtime pandas user but kinda new on this github thingy. I'll head over to the FAQ.

Great! Give it a try and let us know if you have any questions :).

Lots of love from me too @Liam3851!

Hmm, OK, it's still slightly more complicated. I was testing the fix: the bounds are now right and the graphs themselves look correct, but the axis labels don't always work properly (sometimes they disappear), probably because of how the labels are interpreted. I'm busy these next few days, but I'll try to get around to making the fix sound.

Just guessing, but you could be hitting what I ran into. I can't remember how much progress, if any, I made on that.

@TomAugspurger Hmm.. I'll try your version to see what it does. From the diff it looks like we're taking slightly different paths. It looks like you were building a TimedeltaConverter that worked parallel to DatetimeConverter and TimeConverter; I've been trying to fix the codepath the timedeltas are currently taking (through DatetimeConverter). But it's entirely possible that getting it to look just right will require going down your path.

I’d say getting it somewhat functional is good enough for now. Hopefully you don’t have to go down that rabbit hole.

Hello. I am using pandas version 0.19.0 and matplotlib version 1.5.3 with Python 3, and this issue is still there: if I try to plot a DataFrame whose index is a timedelta, I get a MemoryError. I am working around this by calling plt.plot(df.index, df.values), but it would be nice if there was a proper fix for this...

@sam-cohan As you can see, the issue is still open, so it's indeed not yet solved. But any help is certainly welcome!

Sorry I was looking at the wrong "Closed" :)

Really wish this was fixed. I'm using datetime as a workaround, but stringing along 1970-01-01 to do time deltas is not fun.

@TomAugspurger does your branch with a first attempt still exist? (the link above is not working anymore)

So the issue here is that TimedeltaIndex uses Int64Index as its base class, but we are trying to plot it with the routines written for PeriodIndex, which rely on DatetimeIndex (matplotlib.dates) underneath. matplotlib.dates scales the view interval to the selected frequency; Int64Index does not, which explains the issues above.

Options:

  1. Rebase timedelta index on DatetimeIndex
  2. Write another routine to plot time deltas like this: http://stackoverflow.com/questions/15240003/matplotlib-intelligent-axis-labels-for-timedelta. I think this is the easiest path forward, but I need help figuring out how to hook it in. With the time series mix-ins for plotting I'd have to override the plotting routines based on the type of index somewhere.
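For option 2, the core of such a routine is a tick formatter. The sketch below is not the code from the linked answer or from any PR; `format_timedelta_ns` is a hypothetical name, and it assumes axis values are nanosecond counts (as the view interval was reported to be earlier in this thread). It uses the `(x, pos)` signature that matplotlib's `FuncFormatter` passes to its callback.

```python
def format_timedelta_ns(x, pos=None):
    """Format a nanosecond count as '[-][D days ]HH:MM:SS'.

    Hypothetical helper; sub-second precision is dropped for brevity.
    `pos` is unused and only present to match the (x, pos) signature.
    """
    sign = '-' if x < 0 else ''
    total_seconds = int(abs(x)) // 10**9
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    prefix = '{} days '.format(days) if days else ''
    return '{}{}{:02d}:{:02d}:{:02d}'.format(
        sign, prefix, hours, minutes, seconds)


print(format_timedelta_ns(180000000000))   # 00:03:00
print(format_timedelta_ns(90061 * 10**9))  # 1 days 01:01:01
```

With matplotlib available, it could be attached via `ax.xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(format_timedelta_ns))`; choosing good tick locations is a separate problem that this sketch does not address.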

@jgoppert you should take a look at pandas/tseries/converter.py and the TimeConverter and DatetimeConverter classes. A possible way forward is to make a new TimedeltaConverter similar to those.

@jorisvandenbossche I did consider that approach, but I think having a separate matplotlib plotting function is cleaner and will require less maintenance. We also won't have to worry about ever seeing Jan 1970 on the timedelta plot like we do on the PeriodIndex-based plots now. It seems pretty robust, and I have added nanosecond-level precision labels.

@TomAugspurger does your branch with a first attempt still exist? (the link above is not working anymore)

Seems like I deleted that branch when I was cleaning up my fork. I didn't get far beyond the TimedeltaConverter, which is pretty straightforward. IIRC the difficult part was getting the dynamic relabeling to work like datetimes do (which can be a separate fix from fixing the memory error).

@TomAugspurger can you take a look at my PR? It's a totally different approach, but it seems to work for me.

Does this mean the fix for this will be in next release? If so, what is the timeline for that? Thanks in advance.

@sam-cohan yes it will be in 0.20.0

I think we are still about 1 month away from an rc.
