Pandas: Week Start/End Resamping: WS/WE frequencies

Created on 6 May 2014  路  17Comments  路  Source: pandas-dev/pandas

There are a number of offset aliases listed here:
http://pandas.pydata.org/pandas-docs/dev/timeseries.html#offset-aliases

I'm currently trying to resample a timeseries where the buckets are return with the beginning and/or end of the week. There are a number of Start Month/End Month frequency aliases. Am I missing something or should we include Week End/Week Start resampling aliases as well?

API Design Enhancement Frequency Resample Timeseries

Most helpful comment

+1 on this. I keep running into issues, and none of the solutions are nice.... a simple WS would make my life easier.

Take this for example. Feb in 2016 started on a MON, and the last week of march 31 was on THURS.

I want to group by week (starting on MON and ending on SUN), count the number of days in the week, but use the first day of the week as the label. That means that for the month of Feb in 2016, my expected output is:

2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1

I can do this a few different ways, but they all seem cumbersome and not very intuitive.

import pandas as pd
df = pd.DataFrame(pd.date_range('2016-02-01', '2016-02-29').rename('dt'))
df['val'] = 1

# option 1
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left', label='left'))['val'].sum()

# option 2
a = df.groupby(pd.Grouper(key='dt',freq='W'))['val'].sum()
a.index = a.index + pd.DateOffset(weekday=0, days=-7)

# option 3
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left'))['val'].sum()
a.index = a.index + pd.DateOffset(days=-7)

# option 4
a = df.groupby(pd.Grouper(key='dt',freq='W', label='left'))['val'].sum()
a.index = a.index + pd.DateOffset(days=1)

# ideal solution
a = df.groupby(pd.Grouper(key='dt',freq='WS'))['val'].sum()

All 17 comments

look at the next section
u can specify w-mon for example

Yes, the anchored offsets are nice. But these will change the values:

In [1]: import pandas as pd

In [2]: import datetime as dt

In [3]: daily_daterange = pd.date_range(start=dt.datetime(2013, 1, 31),
   ...:                                 end=dt.datetime(2013,2,28),
   ...:                                 freq='B')

In [4]: daily_dataset = pd.DataFrame({'value' : range(len(daily_daterange))}, index=daily_daterange)

In [5]: daily_dataset.resample('W-MON')
Out[5]: 
            value
2013-02-04      1
2013-02-11      5
2013-02-18     10
2013-02-25     15
2013-03-04     19

In [6]: daily_dataset.resample('W-SUN')
Out[6]: 
            value
2013-02-03    0.5
2013-02-10    4.0
2013-02-17    9.0
2013-02-24   14.0
2013-03-03   18.5

I want the values to stay the same and the buckets returned to be the beginning of the week.

ok

so give an example then of what u want

Thu  2013-01-31  val_1
Fri  2013-02-01  val_2
Sat  2013-02-02  val_3
Sun  2013-02-03  val_4
In [4]: daily_dataset.resample('W-THU')
Out[4]: 
            value
2013-01-31      0
2013-02-07      3
2013-02-14      8
2013-02-21     13
2013-02-28     18

In [5]: daily_dataset.resample('W-MON')
Out[5]: 
            value
2013-02-04      1
2013-02-11      5
2013-02-18     10
2013-02-25     15
2013-03-04     19

In [6]: daily_dataset.resample('W-SUN')
Out[6]: 
            value
2013-02-03    0.5
2013-02-10    4.0
2013-02-17    9.0
2013-02-24   14.0
2013-03-03   18.5

I want to resample weekly but the bucket returned should be the weeks beginning:

daily_dataset.resample('WBEGIN') 
            value
2013-02-03    0
2013-02-10    3
2013-02-17    8
2013-02-24   13
2013-03-03   18

Essentially, adding a timedelta to the index.

how is this diffferent from W-SUN ? (I mean your values are), but conceptually

Conceptually, nothing is different but the resampling of the values is significant. I believe month end and month start frequency don't change the values but it does return the a bucket at the beginning of the month or the end of month

You could do something like:

daily_dataset.resample('7D')

Then _fix_ the index with an appropriate offset. Maybe I should submit a PR and not waste too much more time :)

why wouldn't you reindex to a daily freq, then simply divide by 7 starting on the day you want?

+1 on this. I keep running into issues, and none of the solutions are nice.... a simple WS would make my life easier.

Take this for example. Feb in 2016 started on a MON, and the last week of march 31 was on THURS.

I want to group by week (starting on MON and ending on SUN), count the number of days in the week, but use the first day of the week as the label. That means that for the month of Feb in 2016, my expected output is:

2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1

I can do this a few different ways, but they all seem cumbersome and not very intuitive.

import pandas as pd
df = pd.DataFrame(pd.date_range('2016-02-01', '2016-02-29').rename('dt'))
df['val'] = 1

# option 1
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left', label='left'))['val'].sum()

# option 2
a = df.groupby(pd.Grouper(key='dt',freq='W'))['val'].sum()
a.index = a.index + pd.DateOffset(weekday=0, days=-7)

# option 3
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left'))['val'].sum()
a.index = a.index + pd.DateOffset(days=-7)

# option 4
a = df.groupby(pd.Grouper(key='dt',freq='W', label='left'))['val'].sum()
a.index = a.index + pd.DateOffset(days=1)

# ideal solution
a = df.groupby(pd.Grouper(key='dt',freq='WS'))['val'].sum()

FYI you can do this with .resample directly (looking at your option 1). I will comment more following

In [30]: df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left', label='left'))['val'].sum()
    ...: 
Out[30]: 
dt
2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1
Freq: W-MON, Name: val, dtype: int64

In [32]: df.resample('W-MON', on='dt', closed='left', label='left').val.sum()
Out[32]: 
dt
2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1
Freq: W-MON, Name: val, dtype: int64

why is this not the solution you want?

In [65]: df.resample('7D', on='dt').val.sum()
Out[65]: 
dt
2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1
Freq: 7D, Name: val, dtype: int64

Thanks! I actually used groupby because in my real example I'm also adding other groupby cols, I just simplified for posting here.

The issue with the 7D approach is that I would have to make sure the first date available is on a Monday.

For example:

df = pd.DataFrame(pd.date_range('2016-01-29', '2016-02-29').rename('dt'))
df['val'] = 1
df.resample('7D', on='dt').val.sum()

dt
2016-01-29    7
2016-02-05    7
2016-02-12    7
2016-02-19    7
2016-02-26    4
Freq: 7D, Name: val, dtype: int64

I want a solution that will always group by weeks starting on Monday and using that Monday as the label, regardless of what dates are available (in my real data, the dates come from elsewhere).

@aiguofer

why is this not reasonable? (or the .resample version)

# option 1
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left', label='left'))['val'].sum()

you are esentially saying that the defaults for anchored are not what you want.

In my mind, that's essentialy saying, "end the week on Monday, but close the inclusion on the left (so that the monday is included on the 'following' week), and use the previous week's label"

Additionally, since there is a MS and QS, it seems odd to have to do it differently with weeks.

In my app, I have a dropdown choice for how to group and I have to add a check for W to then add the other parameters, it'd be much simpler to do WS.

ahh, I c, you are angling for a WS and WE frequencies. IOW, like W-SUN, but have a tiny bit of special handling on inclusions in resampling.

this would not be very hard to implement actually. Its much more work to properly test this :>

ok if you would like to put up a PR for WS/WE (I actually think these should just work in resampling directly). that would be great!

Exactly. And I think it could even work with a specific day anchoring.

So for example, if someone wanted to bucket on weeks starting on SUN using the first day as label, they could do WS-SUN. If they wanted the end then they'd do W-SAT.

Thanks...it helped a lot :)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

hiiwave picture hiiwave  路  3Comments

nathanielatom picture nathanielatom  路  3Comments

ericdf picture ericdf  路  3Comments

venuktan picture venuktan  路  3Comments

Ashutosh-Srivastav picture Ashutosh-Srivastav  路  3Comments