Cudf: [FEA] Add duration column types

Created on 25 May 2020 · 6 comments · Source: rapidsai/cudf

Is your feature request related to a problem? Please describe.
To support arithmetic on timestamps, we need duration types.
We need to support the difference between two timestamps at every timestamp resolution, i.e. days, seconds, milliseconds, microseconds and nanoseconds.

Describe the solution you'd like
It should be based on std::chrono::duration in libcu++.

See also #4074
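
As a rough illustration of the proposal, the sketch below defines one duration type per timestamp resolution on top of `std::chrono::duration`. It uses the host `std::chrono` as a stand-in for the libcu++ equivalent. The alias names, the representation widths and the `timestamp`, `difference` and `shift` helpers are assumptions made for this example, not necessarily what libcudf ships.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hypothetical aliases: one duration type per timestamp resolution.
using duration_D  = std::chrono::duration<std::int32_t, std::ratio<86400>>;
using duration_s  = std::chrono::duration<std::int64_t>;
using duration_ms = std::chrono::duration<std::int64_t, std::milli>;
using duration_us = std::chrono::duration<std::int64_t, std::micro>;
using duration_ns = std::chrono::duration<std::int64_t, std::nano>;

// Hypothetical timestamp alias: a time point counted in the given duration.
template <typename Duration>
using timestamp = std::chrono::time_point<std::chrono::system_clock, Duration>;

// timestamp - timestamp yields a duration of the same resolution.
inline duration_ms difference(timestamp<duration_ms> lhs, timestamp<duration_ms> rhs)
{
  return lhs - rhs;
}

// timestamp + duration yields a timestamp of the same resolution.
inline timestamp<duration_ms> shift(timestamp<duration_ms> ts, duration_ms delta)
{
  return ts + delta;
}
```

With types like these, subtracting two millisecond timestamps gives a millisecond duration, and adding that duration back to the earlier timestamp recovers the later one.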

  • [x] add duration types (#5359 #5394)
  • [x] factories (#5359 #5394)
  • [x] binary and unary operators support (#5394)
  • [x] aggregation support
    • [x] mean, sum, product - add for duration types, disable for timestamps (#5319)
    • [x] device operators (sum, product)
    • [x] column reduction
    • [x] rolling, grouped_rolling (#5419)
    • [x] groupby-aggregations (#5789)

  • [x] duration string support (parsing, to_string) (#5625)
  • [x] unary casts (#5394)
  • [x] cuIO area (#5903 #6076 #6281)
  • [x] typed tests to include the new types (#5419)
  • [x] prevent construction of timestamps with integers and force construction of timestamps with duration (#5735)
  • [x] support Spark SQL add_months API (#5931)
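
The checklist item about forcing timestamps to be constructed from durations mirrors how `std::chrono::time_point` already behaves. The sketch below checks that behaviour with the host standard library. The `timestamp_ms` alias and the `make_timestamp_ms` helper are hypothetical names for this example, and this is not a claim about how #5735 implemented the restriction.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>
#include <type_traits>

using duration_ms  = std::chrono::duration<std::int64_t, std::milli>;
using timestamp_ms = std::chrono::time_point<std::chrono::system_clock, duration_ms>;

// A time_point cannot be built from a bare integer ...
static_assert(!std::is_constructible<timestamp_ms, std::int64_t>::value,
              "integers must be wrapped in a duration first");
// ... but it can be built from a duration of the matching resolution ...
static_assert(std::is_constructible<timestamp_ms, duration_ms>::value,
              "durations are accepted");
// ... and only explicitly, never through an implicit conversion.
static_assert(!std::is_convertible<duration_ms, timestamp_ms>::value,
              "the duration constructor is explicit");

// Hypothetical helper: the integer has to pass through a duration on the way in.
inline timestamp_ms make_timestamp_ms(std::int64_t ms_since_epoch)
{
  return timestamp_ms{duration_ms{ms_since_epoch}};
}
```

Requiring the intermediate duration makes the unit of the integer visible at the call site, so a count of seconds cannot be silently read as milliseconds.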
feature request libcudf

All 6 comments

It's possible I missed a discussion regarding this.

But, just to clarify, does this need one duration type or multiple (duration_s, duration_ms, duration_us, duration_ns)? Asking because duration_ns would be required to support differences involving timestamp_ns, but it wouldn't be able to represent the full range of differences between two timestamp_s values.
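
The range concern can be made concrete with a small sketch, assuming 64-bit signed counts for both resolutions (an assumption for this example, as are the alias and helper names). A 64-bit nanosecond count spans only about 9.2 billion seconds in each direction, roughly 292 years, while a 64-bit second count spans far more.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hypothetical aliases with 64-bit signed counts.
using duration_s  = std::chrono::duration<std::int64_t>;
using duration_ns = std::chrono::duration<std::int64_t, std::nano>;

// Largest whole number of seconds that a 64-bit nanosecond count can hold.
inline duration_s max_ns_span_in_seconds()
{
  return std::chrono::duration_cast<duration_s>(duration_ns::max());
}

// True if a second-resolution difference can be converted to nanoseconds
// without overflowing the 64-bit count.
inline bool fits_in_ns(duration_s diff)
{
  auto const limit = max_ns_span_in_seconds().count();
  return diff.count() >= -limit && diff.count() <= limit;
}
```

A one-hour difference fits comfortably, whereas a difference of ten billion seconds between two second-resolution timestamps does not, which is why a single nanosecond duration type cannot serve every timestamp resolution.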

does this need one duration type or multiple?

Unfortunately, multiple. I believe we need a duration type for every resolution of timestamp.

Ok, I'll reword the issue

Bumping this to 0.16 as the cuIO piece will not land in 0.15.

Parquet support for duration (https://github.com/rapidsai/cudf/pull/5903) is already done. ORC does not support a duration (INTERVAL) type.
Only duration support for JSON and CSV is still pending in cuIO.

All features for duration type are complete.
