Presto: Join triggers "Nulls fraction should be within [0, 1]" error on Hive/Parquet tables

Created on 1 May 2019 · 3 Comments · Source: prestodb/presto

Hi, I am trying to run a simple join between two tables like so:

SELECT * FROM table_one A JOIN table_two B ON A.USER_KEY = B.USER_KEY;

and this always results in:

Query [...] failed: Nulls fraction should be within [0, 1] or NaN, got: -1.606566858743409E-9

I am able to run a select * on each table separately without problems (both are backed by Parquet files). I couldn't find anything in the existing issues or on Google. Does anybody know where this could come from?

Thanks!

All 3 comments

cc: @arhimondr @highker

@talnicolas: Thanks for reporting this error! What version of Presto are you running? Can you show us the output of SHOW STATS for each of the two tables you're joining?

Presto version: 0.203-T.0.3

For the stats:

presto:some_db> show stats for (select * from table_one);
         column_name         | data_size | distinct_values_count | nulls_fraction |       row_count       | low_value | high_value 
-----------------------------+-----------+-----------------------+----------------+-----------------------+-----------+------------
 col_a                       | NULL      | NULL                  | NULL           | NULL                  | NULL      | NULL       
 user_key                    | NULL      | NULL                  | NULL           | NULL                  | NULL      | NULL       
 some_other_key              | NULL      |                3543.0 |            0.0 | NULL                  | 20090119  | 20181001   
 NULL                        | NULL      | NULL                  | NULL           | 1.1015643838588234E10 | NULL      | NULL

presto:some_db> show stats for (select * from table_two);
             column_name              |       data_size       | distinct_values_count |    nulls_fraction     |  row_count   | low_value | high_value 
--------------------------------------+-----------------------+-----------------------+-----------------------+--------------+-----------+------------
 col_a                                | NULL                  |                   1.0 | -1.606566858743409E-9 | NULL         | NULL      | NULL       
 col_b                                |  1.4012488751183999E9 |                 137.0 | -1.606566858743409E-9 | NULL         | NULL      | NULL       
 col_c                                | 2.0053009232965504E10 |          6.13331008E8 | -1.606566858743409E-9 | NULL         | NULL      | NULL       
 user_key                             | NULL                  |          6.11070144E8 | -1.606566858743409E-9 | NULL         | NULL      | NULL       
 NULL                                 | NULL                  | NULL                  | NULL                  | 6.22445306E8 | NULL      | NULL

Hope this helps.

Thanks.

There were some bugs in the cost estimation code. Try upgrading to a newer Presto version and see whether the issue recurs.
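
For anyone who wants to confirm which columns carry the out-of-range statistic, here is a minimal Python sketch that applies the rule from the error message ("within [0, 1] or NaN") to the nulls_fraction values copied from the SHOW STATS output above. This is not Presto's code: the helper names are hypothetical, and treating a NULL statistic as acceptable is an assumption.

```python
import math

# nulls_fraction values copied from the SHOW STATS output in this thread.
# None stands for SQL NULL, i.e. the statistic is not available.
TABLE_ONE_NULLS_FRACTION = {
    "col_a": None,
    "user_key": None,
    "some_other_key": 0.0,
}
TABLE_TWO_NULLS_FRACTION = {
    "col_a": -1.606566858743409e-9,
    "col_b": -1.606566858743409e-9,
    "col_c": -1.606566858743409e-9,
    "user_key": -1.606566858743409e-9,
}


def is_valid_nulls_fraction(value):
    """Apply the rule stated in the error message: within [0, 1] or NaN.

    Assumption: a missing statistic (None) is treated as acceptable.
    """
    if value is None or math.isnan(value):
        return True
    return 0.0 <= value <= 1.0


def invalid_columns(stats):
    """Return the sorted names of columns whose nulls_fraction breaks the rule."""
    return sorted(
        name for name, value in stats.items()
        if not is_valid_nulls_fraction(value)
    )


print("table_one:", invalid_columns(TABLE_ONE_NULLS_FRACTION))
print("table_two:", invalid_columns(TABLE_TWO_NULLS_FRACTION))
```

With the values from this thread, table_one has no offending columns, while every listed column of table_two, including the join key user_key, has the same slightly negative nulls_fraction that appears in the error message.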
