cuDF: [BUG] Negative timestamps written to ORC don't match after being read back in

Created on 12 Jun 2020 · 3 Comments · Source: rapidsai/cudf

Describe the bug
When negative timestamps are written with write_orc and then read back in with read_orc, the output doesn't match the input.

Steps/Code to reproduce bug

TEST_F(OrcWriterTest, negTimestampsNano) 
{
  using namespace cudf::test;
  auto timestamps_ns = fixed_width_column_wrapper<cudf::timestamp_ns>{
    -131968727238000000,  
    -1530705634500000000,  
    -1674638741932929000,  
  }; 

  std::vector<std::unique_ptr<column>> cols;
  cols.push_back(timestamps_ns.release());
  auto expected = std::make_unique<table>(std::move(cols));
  EXPECT_EQ(1, expected->num_columns());

  auto filepath = temp_env->get_temp_filepath("OrcNegTimestamp.orc");
  cudf_io::write_orc_args out_args{cudf_io::sink_info{filepath}, expected->view()}; 

  cudf_io::write_orc(out_args);

  cudf_io::read_orc_args in_args{cudf_io::source_info{filepath}};
  in_args.use_index = false;
  auto result       = cudf_io::read_orc(in_args);

  expect_columns_equal(expected->view().column(0), result.tbl->view().column(0), true);
  expect_tables_equal(expected->view(), result.tbl->view());
}

TEST_F(OrcWriterTest, negTimestamps) 
{
  using namespace cudf::test;
  auto timestamps_us = fixed_width_column_wrapper<cudf::timestamp_us>{
    -131968727238000,  // 1965-10-26 14:01:12.762 GMT
    -1530705634500000,  // 1921-06-30 11:59:25.500 GMT
    -1674638741932929,  // 1916-12-07 14:34:18.067 GMT
  }; 

  std::vector<std::unique_ptr<column>> cols;
  cols.push_back(timestamps_us.release());
  auto expected = std::make_unique<table>(std::move(cols));
  EXPECT_EQ(1, expected->num_columns());

  auto filepath = temp_env->get_temp_filepath("OrcNegTimestamp.orc");
  cudf_io::write_orc_args out_args{cudf_io::sink_info{filepath}, expected->view()}; 

  cudf_io::write_orc(out_args);

  cudf_io::read_orc_args in_args{cudf_io::source_info{filepath}};
  in_args.use_index = false;
  auto result       = cudf_io::read_orc(in_args);

  expect_columns_equal(expected->view().column(0), result.tbl->view().column(0), true);
  expect_tables_equal(expected->view(), result.tbl->view());
}

Expected behavior
The above tests should pass.

bug cuIO

All 3 comments

@jlowe can you please verify if the CPP tests look OK?

There's a small bug in the second test. It expects microseconds to be returned, but it needs to ask for that explicitly via something like:

  in_args.timestamp_type = cudf::data_type{cudf::type_id::TIMESTAMP_MICROSECONDS};

Once that small change is made, it's notable that both tests pass if we simply make the test values positive. It's also interesting that when a test fails, the printed output shows the two sides as identical. Either the delta is sub-second or something weird is going on. For example:

differences:
lhs[0] = 1965-10-27T00:00:00Z, rhs[0] = 1965-10-27T00:00:00Z
lhs[1] = 1921-07-01T00:00:00Z, rhs[1] = 1921-07-01T00:00:00Z
lhs[2] = 1916-12-08T00:00:00Z, rhs[2] = 1916-12-08T00:00:00Z
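
To make the "sub-second delta" possibility concrete, here is a minimal standalone sketch of one general pitfall with negative timestamps. It is an illustration only, not a diagnosis of the cudf writer or reader. ORC stores a timestamp as a seconds value plus a separate nanoseconds value, and C++ integer division truncates toward zero, so a negative input produces a negative remainder. All names below (split_ts, split_truncating, split_flooring, join_ts, lossy_round_trip) are hypothetical helpers made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair: whole seconds plus a sub-second part in nanoseconds.
struct split_ts {
  int64_t seconds;
  int64_t nanos;
};

constexpr int64_t ns_per_s = 1000000000LL;

// Truncating split: for a negative input the remainder is negative too.
inline split_ts split_truncating(int64_t ns)
{
  return {ns / ns_per_s, ns % ns_per_s};
}

// Flooring split: the remainder is always in [0, 1e9).
inline split_ts split_flooring(int64_t ns)
{
  int64_t s = ns / ns_per_s;
  int64_t r = ns % ns_per_s;
  if (r < 0) {
    r += ns_per_s;
    s -= 1;
  }
  return {s, r};
}

// Recombine the two parts into nanoseconds since the epoch.
inline int64_t join_ts(split_ts t) { return t.seconds * ns_per_s + t.nanos; }

// Assumed failure mode: truncated seconds are kept, but the sub-second part
// is stored as a non-negative magnitude, so its sign is lost.
inline int64_t lossy_round_trip(int64_t ns)
{
  split_ts t = split_truncating(ns);
  if (t.nanos < 0) { t.nanos = -t.nanos; }
  return join_ts(t);
}
```

For the first test value, -131968727238000000 ns, the flooring split gives (-131968728 s, 762000000 ns) and round-trips exactly, while the assumed lossy path returns -131968726762000000, which is off by 476 ms. A mismatch of that size would be invisible in output printed at one-second resolution.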

Don't rely on the timestamp print output. Printing negative timestamps involves converting them to strings for the console output, which is not supported yet (#5189: time values are shown as 00s) and should be fixed in #5452.
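
Until the printing is fixed, one way to check what a raw value actually means is to format it outside of cudf. The sketch below is a standalone helper, not part of cudf: format_utc_us and floor_div are hypothetical names. It turns microseconds since the Unix epoch into a UTC string, using flooring division so that pre-1970 values work, with the day-to-date step following Howard Hinnant's well-known civil_from_days algorithm.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Floor division for a positive divisor (plain '/' truncates toward zero).
inline int64_t floor_div(int64_t a, int64_t b)
{
  int64_t q = a / b;
  return (a % b < 0) ? q - 1 : q;
}

// Hypothetical helper: format microseconds since 1970-01-01T00:00:00Z as UTC.
inline std::string format_utc_us(int64_t us)
{
  const int64_t us_per_day = 86400LL * 1000000LL;
  const int64_t days = floor_div(us, us_per_day);  // negative before 1970
  const int64_t rem  = us - days * us_per_day;     // always in [0, us_per_day)

  // Days since epoch -> year/month/day (civil_from_days).
  const int64_t z   = days + 719468;
  const int64_t era = floor_div(z, 146097);
  const int64_t doe = z - era * 146097;                                   // [0, 146096]
  const int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
  const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);            // [0, 365]
  const int64_t mp  = (5 * doy + 2) / 153;                                // [0, 11]
  const int64_t d   = doy - (153 * mp + 2) / 5 + 1;                       // [1, 31]
  const int64_t m   = mp < 10 ? mp + 3 : mp - 9;                          // [1, 12]
  const int64_t y   = yoe + era * 400 + (m <= 2 ? 1 : 0);

  // Time of day from the non-negative remainder.
  const int64_t secs = rem / 1000000;
  const int64_t frac = rem % 1000000;

  char buf[40];
  std::snprintf(buf, sizeof buf, "%04lld-%02lld-%02lldT%02lld:%02lld:%02lld.%06lldZ",
                static_cast<long long>(y), static_cast<long long>(m),
                static_cast<long long>(d), static_cast<long long>(secs / 3600),
                static_cast<long long>((secs / 60) % 60),
                static_cast<long long>(secs % 60), static_cast<long long>(frac));
  return std::string(buf);
}
```

Calling format_utc_us(-131968727238000) yields 1965-10-26T14:01:12.762000Z, which matches the comment on the first microsecond test value. For the test itself, comparing the raw integer values of the two columns avoids string conversion altogether.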
