Describe the bug
Reader and writer tests can pass erroneously if the readers and writers change in a way that preserves isomorphism (the round trip still returns the original data), regardless of whether the contents written to the file are correct. Unlikely, but possible.
Expected behavior
Examples:
https://github.com/rapidsai/cudf/blob/2bc8eb0edfdc2974a17b396f572d9f16f23caf3c/cpp/tests/io/csv_test.cpp#L902-L911
https://github.com/rapidsai/cudf/blob/2bc8eb0edfdc2974a17b396f572d9f16f23caf3c/cpp/tests/io/orc_test.cpp#L165-L173
https://github.com/rapidsai/cudf/blob/2bc8eb0edfdc2974a17b396f572d9f16f23caf3c/cpp/tests/io/parquet_test.cpp#L191-L200
Is this issue about the use of CUDF_TEST_EXPECT_TABLES_EQUIVALENT?
No, this issue is about:
data == decode(encode(data)) // tested; guarantees only that the round trip is an isomorphism
expected_file == encode(data) // not tested, so nothing guarantees the expected bytes are written to the file
expected_data == decode(file) // not tested, so nothing guarantees the expected data is read from the file
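To make the gap concrete, here is a minimal Python sketch. It is not cudf code: `encode` and `decode` are hypothetical toy functions that share the same bug (a `;` delimiter where `,` is expected). The round-trip check passes while both one-sided checks fail.

```python
# Toy writer/reader pair, both hypothetical and deliberately wrong in the
# same way: they use ';' as the delimiter instead of ','.
def encode(rows):
    """Serialize rows of strings, one line per row."""
    return "".join(";".join(row) + "\n" for row in rows)


def decode(text):
    """Parse text produced by encode() back into rows of strings."""
    return [line.split(";") for line in text.splitlines()]


data = [["1", "2"], ["3", "4"]]
expected_file = "1,2\n3,4\n"  # what a correct CSV writer should produce

# The check the existing tests perform: the bug cancels out.
round_trip_ok = decode(encode(data)) == data

# The two checks that are missing: each one exposes the bug.
writer_ok = encode(data) == expected_file
reader_ok = decode(expected_file) == data

print(round_trip_ok, writer_ok, reader_ok)
```

Because the writer and reader are wrong in a matching way, only a comparison against something outside the pair (expected bytes or expected data) can catch the error.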
I don't think we have a way to test this in C++ tests.
We do have Python tests that leverage readers/writers from other libraries to validate the file content.
That said, how do we resolve this issue? Migrate such tests to Python?
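For reference, a rough sketch of what cross-validating against another library looks like. This is an illustration only: the standard-library `csv` module stands in for the independent library, and `toy_write_csv` / `toy_read_csv` are hypothetical placeholders for the writer and reader under test, not cudf APIs.

```python
import csv
import io


def toy_write_csv(rows):
    """Hypothetical writer under test (no quoting support)."""
    return "".join(",".join(row) + "\r\n" for row in rows)


def toy_read_csv(text):
    """Hypothetical reader under test (no quoting support)."""
    return [line.split(",") for line in text.splitlines()]


data = [["a", "b"], ["1", "2"]]

# Writer check: the independent reader must recover the data we wrote.
written = toy_write_csv(data)
read_by_reference = list(csv.reader(io.StringIO(written, newline="")))

# Reader check: our reader must recover data the independent writer wrote.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(data)
read_by_toy = toy_read_csv(buf.getvalue())

print(read_by_reference == data, read_by_toy == data)
```

Each side is checked against an implementation that does not share its code, so a matching bug in the writer and reader no longer cancels out.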
> I don't think we have a way to test this in C++ tests.
Pretty sure that's accurate.
It's technically possible to test them in C++, but it's _very_ inconvenient because we'd need sample files and associated expected data. It'll definitely be easier to test on the Python layer. It's possible we already account for these cases in the Python tests, but imho we should check those tests before closing this issue.
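A rough sketch of the "sample files plus expected data" approach, in Python for brevity. Everything here is hypothetical: `read_csv_bytes` is a placeholder for the reader under test (implemented with the standard-library `csv` module so the example runs), and the inline byte strings stand in for sample files that would be checked into the repository.

```python
import csv
import io


def read_csv_bytes(raw):
    """Placeholder for the reader under test; returns rows of strings."""
    return list(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))


# Each case pairs fixed file contents with the data they must decode to.
# In a real suite the bytes would come from sample files on disk.
CASES = [
    (b"a,b\n1,2\n", [["a", "b"], ["1", "2"]]),
    (b'x\n"1,5"\n', [["x"], ["1,5"]]),  # quoted field containing a comma
]

results = [read_csv_bytes(raw) == expected for raw, expected in CASES]
print(results)
```

The cost mentioned above is visible even at this scale: every format feature needs its own sample file and hand-written expected data, which is why this is harder to maintain than round-trip tests.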