Presto: Proper truncation missing in casts to varchar(x)

Created on 27 Mar 2019  路  13Comments  路  Source: prestosql/presto

presto:default> select cast(bigint '12345678' as varchar(2));
 _col0
-------
 12

but ...

presto:default> select cast(bigint '12345678' as varchar(2)) = '12';
 _col0
-------
 false

This is because https://github.com/prestosql/presto/blob/c74b775e103a7a239aafef1b7ca15fe6370495fe/presto-main/src/main/java/io/prestosql/type/BigintOperators.java#L270-L276
should consider x (@LiteralParameter("x") long x) and truncate the result to make sure no more than x characters are returned.

Same probably applies to casts from other types to varchar.

bug correctness

Most helpful comment

Is someone already working on this? I would like to give it a try...

All 13 comments

Is someone already working on this? I would like to give it a try...

@findepi Does the work involve here involve just truncating the values in castToVarchar
return utf8Slice(String.valueOf(value));
or is this on hold awaiting clarity as discussion on prestodb/presto#7815 seem to inconclusive

@ankitdixit each case needs to be looked at separately.

  • in some cases truncation should be applied
  • in some cases exception should be raised
  • I imagine, in some cases the outcome depends on actual value being converted (truncation or exception)

If you want to work on this, I would propose to raise a separate PR for each case being fixed.
(all integer types {{tinyint,smallint,integer,bigint}} should be considered as one case; same for floating point: {{real,double}})

@findepi @dain @sopel39 trying to summarize from comments on https://github.com/prestodb/presto/issues/7815/, please let me know if this makes sense.

Approach could be to add a 2 new session propertes

  • cast_legacy_behavior with default value true.
  • cast_notation_type with values simple and scientific
    We cast on the type based on the value of cast notation

Behaviour:

  • Integer -> varchar

    • select cast(cast(bigint '12' as varchar(2))



      • both flag values: '12'



    • select cast(bigint '123' as varchar(2))



      • cast_legacy_behavior = true :'12'


      • cast_legacy_behavior = false : throw CastException



  • Decimal->varchar:

    • select cast(cast(1.0 as decimal(2,1)) as varchar(3)))



      • both flag values: '1.0'



    • select cast(cast(1.0 as decimal(2,1)) as varchar(2))



      • both flag values: '1.0'



    • select cast(cast(1.1 as decimal(2,1)) as varchar(2))



      • cast_legacy_behavior = true :'1.0'


      • cast_legacy_behavior = false : throw CastException



  • real/double: Find whether conventional or scientific notation is smaller and use that

    • select cast(cast(1.0 as double) as varchar(2))



      • cast_legacy_behavior = true : '1.'


      • cast_legacy_behavior = false : '1'



  • real/double: Find whether conventional or scientific notation is smaller and use that

    • select cast(cast(1.1 as double) as varchar(1))



      • cast_legacy_behavior = true : '1.'


      • cast_legacy_behavior = false : throw CastException



    • select cast(cast(1.0 as double) as varchar(x)) (for all x >= 3)



      • both flag values : 1.0



    • select cast(cast(10.0 as double) as varchar(x)) (x>4)



      • cast_legacy_behavior = true : '10.0'


      • cast_legacy_behavior = false : ''1.0e1' [scientific notation, if x >4], if x=4, throw exception


      • cast_legacy_behavior = false : ''10.0' [simple]



We can also introduce another session property which can be used to chose notation type (conventional/scientific)

@ankitdixit I don't have an opinion on this, so I'll let @martint and @findepi reply.

select cast(cast(1.0 as double) as varchar(2))
cast_legacy_behavior = false : throw CastException

why would this fail?

@sopel39 from https://github.com/prestodb/presto/issues/7815#issuecomment-342013400
Additionally, since we don't use scientific notation for double/float textual representation we should probably fail when doing casting from double to varchar when any digit is truncated. Assumption is that if user wants to truncate he can simply cast to decimal and round to n-th decimal places before casting to varchar.
I thought this meant we should not not truncate to '1.' and rather throw CastException

@ankitdixit yes, but
select cast(cast(1.0 as double) as varchar(2))
could simply return
'1'

@sopel39 , I thought changing 1.0 to 1 would be losing precision in the generic sense as then 1.0 will cast as varchar(2) but 1.1 will fail.
Which also makes sense, then 1.0,2.0... will work but 1.1, 2.1... will not.

I thought changing 1.0 to 1 would be losing precision

I think you can think of any floating point as having infinite number of trailing zeros. In this context you won't lose any precision.

For reference, here's what the SQL spec says about this:

Screen Shot 2019-09-28 at 4 46 56 PM

Screen Shot 2019-09-28 at 4 48 32 PM

@ankitdixit, are you still planning to work on this?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

fredabood picture fredabood  路  3Comments

JamesRTaylor picture JamesRTaylor  路  5Comments

lxynov picture lxynov  路  4Comments

byungnam picture byungnam  路  4Comments

bill-warshaw picture bill-warshaw  路  4Comments