Lubridate: Add function season()

Created on 13 Nov 2017 · 7 Comments · Source: tidyverse/lubridate

What do you think of adding a function season()? Something like:

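library(lubridate)  # provides month()

season <- function(timedate) {
  m <- month(timedate)
  # switch() with a numeric first argument returns the m-th alternative,
  # so the 12 labels below map January..December to their season
  s <- sapply(m, function(x) switch(x,
                                    "DJF", "DJF",
                                    "MAM", "MAM", "MAM",
                                    "JJA", "JJA", "JJA",
                                    "SON", "SON", "SON",
                                    "DJF")
  )
  # Explicit levels keep the seasons in seasonal order instead of alphabetical
  factor(s, levels = c("MAM", "JJA", "SON", "DJF"))
}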

Because the everyday season name depends on the hemisphere, I use the initials of the months instead (a common convention in meteorology). I added the factor conversion so that the seasons are ordered correctly when plotting.
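To illustrate why the factor conversion matters: a factor built without explicit levels sorts its labels alphabetically, which puts DJF before JJA and MAM. A small base-R sketch (the example labels are made up):

```r
# Season labels in the order they might appear in some data
s <- c("SON", "MAM", "DJF", "JJA")

# Without explicit levels, factor() sorts the labels alphabetically
alphabetical <- factor(s)
print(levels(alphabetical))  # "DJF" "JJA" "MAM" "SON"

# With explicit levels, plots and tables follow the seasonal cycle
seasonal <- factor(s, levels = c("MAM", "JJA", "SON", "DJF"))
print(levels(seasonal))      # "MAM" "JJA" "SON" "DJF"
```

Plotting functions such as ggplot2 use the factor level order for axes and legends, which is why the explicit levels are worth setting.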

The only issue I see is that there are several different definitions of season; however, you have chosen the meteorological one in floor_date(), so it's easy to be consistent here.

All 7 comments

I would prefer common names for the labels. Those DJF, MAM ... look pretty cryptic.

What do you think of an additional argument to choose a convention? Something like:

season <- function(timedate, convention = "northern_hemisphere") {
  s_terms <- switch(convention, 
                    "northern_hemisphere" = c("spring", "summer", "autumn", "winter"),
                    "southern_hemisphere" = c("autumn", "winter", "spring", "summer"),
                    "month_initials"      = c("MAM",    "JJA",    "SON",    "DJF"),
                    stop("Wrong value of convention")
  )

  m <- month(timedate)
  s <- sapply(m, 
              function(x) switch(x,
                                 s_terms[4], s_terms[4],
                                 s_terms[1], s_terms[1], s_terms[1],
                                 s_terms[2], s_terms[2], s_terms[2],
                                 s_terms[3], s_terms[3], s_terms[3],
                                 s_terms[4]
              )
  )
  factor(s, levels = s_terms)
}

This way, we can offer both the everyday season names (which depend on the hemisphere) and the hemisphere-independent month initials.

Yes, that could work. Actually, I think leaving month initials as the default is the right approach. Returning integers by default (as day(), month() and quarter() do) would be even more consistent, but what would the convention be? Winter = 1, fall = 4?

BTW, that sapply() + switch() combination is very inefficient, because it calls switch() once per element. It could be replaced with direct vector subsetting.
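A sketch of that direct subsetting next to the sapply() + switch() version. It assumes month numbers 1 to 12 as input; to keep it runnable without lubridate, the months are extracted with base R's as.POSIXlt()$mon + 1 rather than month(). The names season_labels, season_switch() and season_subset() are made up for this example.

```r
# Lookup table: element i is the meteorological season of month i
season_labels <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
                   "JJA", "JJA", "SON", "SON", "SON", "DJF")

# Element-wise approach: one switch() call per month value
season_switch <- function(m) {
  sapply(m, function(x) switch(x,
                               "DJF", "DJF",
                               "MAM", "MAM", "MAM",
                               "JJA", "JJA", "JJA",
                               "SON", "SON", "SON",
                               "DJF"))
}

# Vectorized approach: index the lookup table with the whole month vector at once
season_subset <- function(m) {
  season_labels[m]
}

dates <- as.Date(c("2017-01-15", "2017-04-01", "2017-07-31",
                   "2017-10-10", "2017-12-24"))
m <- as.POSIXlt(dates)$mon + 1L  # POSIXlt months are 0-based
print(season_subset(m))          # "DJF" "MAM" "JJA" "SON" "DJF"
```

Both functions give the same labels; the subsetting version does a single vectorized index operation instead of an R-level loop.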

I have the feeling that using numbers for seasons, although consistent, is more confusing than using the month initials. For example, I would choose spring=1, winter=4, while others might start with winter. The month-initial strings are not self-explanatory either, but once you understand the system _once_, it is easy to remember and always consistent. At least in meteorology and climatology, the system is widely used.

Here is the updated function. Did you have something like this in mind?

season <- function(timedate, convention = "month_initials") {
  s_terms <- switch(convention, 
                    "northern_hemisphere" = c("spring", "summer", "autumn", "winter"),
                    "southern_hemisphere" = c("autumn", "winter", "spring", "summer"),
                    "month_initials"      = c("MAM",    "JJA",    "SON",    "DJF"),
                    stop("Wrong value of convention")
  )

  m <- month(timedate)
  s <- factor(character(length(m)), levels = s_terms)
  s[m %in% c( 3,  4,  5)] <- s_terms[1]
  s[m %in% c( 6,  7,  8)] <- s_terms[2]
  s[m %in% c( 9, 10, 11)] <- s_terms[3]
  s[m %in% c(12,  1,  2)] <- s_terms[4]
  s
}

> For example, I would choose spring=1, winter=4.

Well, there is also localization, which is now supported for wday() and month(). Month abbreviations are for English month names, you know ;) I will think a bit more about this. It might be that spring=1 is a better idea indeed.

Using numbers is a good idea because they are much easier and more efficient to use in programs. They don't depend on locales, hemispheres etc.
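One way to read this suggestion, sketched in base R: compute an integer season code that does not depend on locale or hemisphere, and attach names only at presentation time. The function season_code() and the label vectors northern and southern are hypothetical, and the numbering (1 = DJF, 2 = MAM, 3 = JJA, 4 = SON) is just one possible convention.

```r
# Integer season code for months 1..12 (1 = DJF, 2 = MAM, 3 = JJA, 4 = SON)
season_code <- function(m) {
  c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 1L)[m]
}

# Labels per convention; the underlying codes stay the same
northern <- c("winter", "spring", "summer", "autumn")
southern <- c("summer", "autumn", "winter", "spring")

codes <- season_code(c(1L, 7L))                        # January and July
print(factor(codes, levels = 1:4, labels = northern))  # winter, summer
print(factor(codes, levels = 1:4, labels = southern))  # summer, winter
```

The hemisphere only changes the label vector, not the computation, which is the property the comment above argues for.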

> Did you have something like this in mind?

Nope. I meant:

months2seasons <- c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1)
months2seasons[month(x)]

with months2seasons stored outside the function for micro-efficiency.
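A minimal sketch of that layout, assuming a hypothetical season() that returns integer codes. It uses base R's as.POSIXlt()$mon + 1 in place of lubridate's month() so it runs without dependencies; the lookup vector is created once at top level and only indexed inside the function.

```r
# Built once when the script (or package) is loaded, not on every call
months2seasons <- c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 1L)

# Hypothetical season(): integer code per date, 1 = DJF ... 4 = SON
season <- function(x) {
  m <- as.POSIXlt(x)$mon + 1L  # month number 1..12; stands in for month(x)
  months2seasons[m]            # NA dates give NA_integer_
}

x <- as.Date(c("2017-02-28", "2017-03-01", "2017-11-30", "2017-12-01"))
print(season(x))  # 1 2 4 1
```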

Just my 2 cents

> I have the feeling that being consistent by using numbers to describe seasons is more confusing than using the month initials.

I agree that the definition of "seasons" varies widely between use cases. However, this is so commonly useful (feature extraction to control for seasonality) that I think it should be a feature. There could be a sensible default plus an optional parameter to redefine seasons.
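A sketch of "a sensible default plus an optional parameter", under stated assumptions: the helper season_of() and its seasons argument are hypothetical (not part of lubridate), the default is the meteorological mapping, and the two-season wet/dry scheme is an invented example of a user-supplied definition. Months are read with base R's as.POSIXlt().

```r
# Default: meteorological seasons, one label per month (January first)
meteorological <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
                    "JJA", "JJA", "SON", "SON", "SON", "DJF")

# Hypothetical helper: 'seasons' lets the caller redefine the mapping
season_of <- function(x, seasons = meteorological) {
  stopifnot(length(seasons) == 12)
  m <- as.POSIXlt(x)$mon + 1L  # month number 1..12
  # Levels follow first appearance in 'seasons', so ordering is user-controlled
  factor(seasons[m], levels = unique(seasons))
}

x <- as.Date(c("2019-01-10", "2019-05-10", "2019-08-10"))
print(season_of(x))  # DJF MAM JJA

# Invented two-season scheme: dry Jan-Apr, wet May-Oct, dry Nov-Dec
wet_dry <- rep(c("dry", "wet", "dry"), times = c(4, 6, 2))
print(season_of(x, seasons = wet_dry))  # dry wet wet
```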

Due to the challenges of localisation and the varying definitions of seasons, I think this is outside the scope of lubridate. It's easy to implement whatever definition of season you want with [ and factor():

library(lubridate)  # provides today() and month()

x <- today() - sample(100)

# One label per month, January first
seasons <- c(
  "DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
  "JJA", "JJA", "SON", "SON", "SON", "DJF"
)
factor(seasons[month(x)], levels = unique(seasons))

Created on 2019-11-18 by the reprex package (v0.3.0)