Describe the bug
When casting a TIMESTAMP_MICROSECONDS value to TIMESTAMP_SECONDS, I have to add one second to the resulting value if the TIMESTAMP_MICROSECONDS value is negative (i.e., before Jan 1, 1970).
Steps/Code to reproduce bug
I reproduced this using Spark. I don't know of another way to make it happen. If you need me to, I can try to make a C++ unit test for this.
Expected behavior
Spark and cudf produce the same result.
This sounds related to #3664
Nope. I reverted the change made for #3664. That causes casting to days to fail in Spark, but it had no impact on the MICROS to SECONDS issue. The patch is simple enough that I think I did it right.
A repro is still greatly appreciated.
C++ repro
TEST_F(CastTimestamps, PriorTo1970usTos)
{
  using namespace cudf::test;

  // 1965-10-26 14:01:12.762000000 GMT
  auto timestamps_us = cudf::test::fixed_width_column_wrapper<cudf::timestamp_us>({-131968727238000});
  auto timestamps_s  = cudf::experimental::cast(timestamps_us, cudf::data_type{cudf::TIMESTAMP_SECONDS});
  cudf::test::print(timestamps_s->view());
}
expected: 1965-10-26T14:01:12Z
output:   1965-10-27T00:00:00Z
@revans2 please confirm.
@rgsl888prabhu this seems more like a bug in test::print (which relies on cudf::strings::from_timestamps) than in the cast. E.g., adding these two lines seems to make the test pass:
auto expected = fixed_width_column_wrapper<cudf::timestamp_s>({-131968728});
expect_columns_equal(timestamps_s->view(), expected);
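For reference, the full test with those two lines folded in would look roughly like this (just a sketch, reusing the value and helpers from the repro above):

TEST_F(CastTimestamps, PriorTo1970usTos)
{
  using namespace cudf::test;

  // 1965-10-26 14:01:12.762000000 GMT
  auto timestamps_us = fixed_width_column_wrapper<cudf::timestamp_us>({-131968727238000});
  auto timestamps_s  = cudf::experimental::cast(timestamps_us, cudf::data_type{cudf::TIMESTAMP_SECONDS});

  // -131968728 s == 1965-10-26 14:01:12 GMT (floored, i.e. one second earlier than
  // truncation toward zero would give)
  auto expected = fixed_width_column_wrapper<cudf::timestamp_s>({-131968728});
  expect_columns_equal(timestamps_s->view(), expected);
}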
I'll try to get a repro case shortly.
Here is the test case that I came up with.
diff --git a/cpp/tests/unary/unary_ops_test.cu b/cpp/tests/unary/unary_ops_test.cu
index a62f46197..2d1ec83af 100644
--- a/cpp/tests/unary/unary_ops_test.cu
+++ b/cpp/tests/unary/unary_ops_test.cu
@@ -594,6 +594,19 @@ void validate_cast_result(cudf::column_view expected, cudf::column_view actual)
struct CastTimestampsSimple : public cudf::test::BaseFixture {
};
+TEST_F(CastTimestampsSimple, PriorTo1970usTos)
+{
+ using namespace cudf::test;
+
+ // 723-3-21 8:29:37.781000 GMT
+ auto timestamps_us = make_column<cudf::timestamp_us>(std::vector<int64_t>{-39344693986219000});
+ // Spark produces -39344693986 cudf produces -39344693987
+ auto expected_s = make_column<cudf::timestamp_s>(std::vector<int64_t>{-39344693986});
+ auto timestamps_s = cudf::experimental::cast(timestamps_us, cudf::data_type{cudf::TIMESTAMP_SECONDS});
+
+ validate_cast_result<cudf::timestamp_s, cudf::timestamp_s>(expected_s, *timestamps_s);
+}
+
TEST_F(CastTimestampsSimple, IsIdempotent)
{
using namespace cudf::test;
This test does pass if #3670 is reverted.
The logic is: if the value is negative, we still need to floor it (round toward negative infinity) rather than truncate toward zero.
timestamp_s(-1) == 1969/12/31 23:59:59
So the day is still 31 December. It won't be the next day until one more second.
timestamp_us(-1) == 1969/12/31 23:59:59.999999
It won't be 1970/1/1 for one more microsecond. So at seconds resolution, the time is still 1969/12/31 23:59:59 == timestamp_s(-1).
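To make the floor-vs-truncate difference concrete, here is a minimal host-side sketch (plain int64_t math, not the actual cudf kernel) showing why truncating toward zero shifts a negative microsecond timestamp forward by one second:

#include <cstdint>
#include <cstdio>

// Truncation toward zero: what plain C++ integer division does.
int64_t truncate_us_to_s(int64_t us) { return us / 1000000; }

// Floor division: round toward negative infinity instead.
int64_t floor_us_to_s(int64_t us)
{
  int64_t q = us / 1000000;
  return (us < 0 && us % 1000000 != 0) ? q - 1 : q;
}

int main()
{
  int64_t us = -1;  // 1969-12-31 23:59:59.999999 GMT
  std::printf("truncate: %lld\n", (long long)truncate_us_to_s(us));  // 0  -> 1970-01-01 00:00:00
  std::printf("floor:    %lld\n", (long long)floor_us_to_s(us));     // -1 -> 1969-12-31 23:59:59
  return 0;
}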
Normally users want truncating a timestamp to a lower resolution to behave the same regardless of where the timestamp falls relative to some arbitrary epoch that they may not even know about. I verified that the CPU date_trunc behavior in both PostgreSQL and Spark on a timestamp before the epoch matches what was implemented in #3670. So Spark seems to be a bit self-inconsistent here depending on whether date_trunc or unix_timestamp is used.
Maybe I was arguing for the wrong level of compatibility in #3670. SQL wants the current behavior, but apparently C++ and Java truncate toward zero rather than toward negative infinity, even for time values, which produces these weird results where timestamps after the epoch are truncated down but timestamps before the epoch are effectively rounded up (ceiled).
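For what it's worth, standard C++ <chrono> exposes both conventions side by side, which shows the distinction being discussed here: duration_cast truncates toward zero, while std::chrono::floor (C++17) rounds toward negative infinity.

#include <cassert>
#include <chrono>

int main()
{
  using namespace std::chrono;
  microseconds us{-1};  // one microsecond before the epoch
  assert(duration_cast<seconds>(us).count() == 0);            // truncates toward zero (the C++/Java default)
  assert(std::chrono::floor<seconds>(us).count() == -1);      // rounds toward negative infinity (the #3670 behavior)
  return 0;
}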
We just need to pick what cudf is going to do here and the Spark plugin can do some postprocessing if necessary to adjust to that behavior. Adding @jrhemstad and @kkraus14 for their opinions. I don't know the Pandas expected behavior for downcasting the resolution of timestamps before and after the epoch.
Yes, because Spark is self-inconsistent, I am fine with following a single standard and working around it in the places where we need to.
@revans2 so should I close this then?