Crystal: JSON parse float is wrong

Created on 13 Nov 2017 · 21Comments · Source: crystal-lang/crystal

require "json"

p Array(Float64).from_json("[0.00045808]")
# [0.00045808]
p Array(Float64).from_json("[0.00045809]")
# [0.00045808999999999997]
p Array(Float64).from_json("[0.00045808999999999997]")
# [0.00589844844299458]
p "0.00045808999999999997".to_f64
# 0.00045808999999999997

I think JSON parse float uses something different to parse the floats, but it's not correct for all cases.

Source

benoist

Most helpful comment

I suggest we copy an algorithm from a known-fast json implementation.

RX14 on 14 Nov 2017

👍3 ❤1

All 21 comments

It's not linked to json nor crystal:

crystal eval puts 0.00045809 < Float64, small error step
crystal eval puts "0.00045809".to_f32 < Float32, no error step

It's normal for a float to not be exact. This code above shows that the JSON you get from is exported in 32-bit float format, not Float64 😉

anykeyh on 13 Nov 2017

I know floats are not exact but in this case its a 12x difference and even floats are not that bad.

0.0058
0.00045

Float32.new("0.00045808999999999997")
=> 0.00045809

That gets correctly rounded. To me it still appears to be an issue when parsing the JSON.

benoist on 13 Nov 2017

Oh yes my bad, I didn't saw the real problem

p Array(Float64).from_json("[0.00045808999999999997]") #=> [0.00589844844299458]

anykeyh on 13 Nov 2017

Yeah I think I've figured out what it is, see the following current implementation:

  private def consume_float(negative, integer, digits)
    append_number_char
    divisor = 1_u64
    char = next_char

    unless '0' <= char <= '9'
      unexpected_char
    end

    while '0' <= char <= '9'
      append_number_char
      integer *= 10
      integer += char - '0'
      divisor *= 10
      char = next_char
    end
    float = integer.to_f64 / divisor

    if char == 'e' || char == 'E'
      consume_exponent(negative, float, digits)
    else
      @token.type = :FLOAT
      # If there's a chance of overflow, we parse the raw string
      if digits >= 18
        @token.float_value = number_string.to_f64
      else
        @token.float_value = negative ? -float : float
      end
      number_end
    end
  end

The digits are actually not incremented so the overflow check fails. In case of my example the divisor indeed overflows, but it's still used to calculate the float.

  private def consume_float(negative, integer, digits)
    append_number_char
    divisor = 1_u64
    char = next_char

    unless '0' <= char <= '9'
      unexpected_char
    end

    while '0' <= char <= '9'
      append_number_char
      integer *= 10
      integer += char - '0'
      divisor *= 10
      digits += 1
      char = next_char
    end
    float = integer.to_f64 / divisor

    if char == 'e' || char == 'E'
      consume_exponent(negative, float, digits)
    else
      @token.type = :FLOAT
      # If there's a chance of overflow, we parse the raw string
      if digits >= 18
        @token.float_value = number_string.to_f64
      else
        @token.float_value = negative ? -float : float
      end
      number_end
    end
  end

Adding the digits += 1 fixes the problem

@asterite do you agree this is the correct implementation?

benoist on 13 Nov 2017

Maybe it would be best to not have a custom float parser but employ String#to_f32 (which utilizes LibC.strtof) as it is already for Float64 values.

straight-shoota on 13 Nov 2017

The downside of that would be having to allocate an extra string.

jhass on 13 Nov 2017

Yes, however that's only an extra for the string based lexer. The IO based lexer already concatenates the characters to a memory IO for Float64 parsing.

straight-shoota on 13 Nov 2017

perhaps we can just stop incrementing the integer and divisor after 18 digits?

(45808999999999 / 100000000000000000.0)
 => 0.00045808999999999

benoist on 13 Nov 2017

I suggest we copy an algorithm from a known-fast json implementation.

RX14 on 14 Nov 2017

👍3 ❤1

Hmm, it's very disturbing:

json = JSON.parse(%Q[{"amount":0.00012145842417411307}])
# => {"amount" => 0.0015639203059625239}

Please fix it.

Basically, my finances-sensitive apps depends on that. Damn.

vladfaust on 20 Nov 2017

Just relax. I’ve temporarily fixed it in my code base by monkey patching the method myself. You can copy the example I gave until it’s been fixed the correct way.

benoist on 20 Nov 2017

👍1

@vladfaust Probably you shouldn't rely on floating-points and JSON serialization for finance-sensitive data anyway :smile:

hugoabonizio on 21 Nov 2017

🎉1

@hugoabonizio what are alternatives if I'm working on JSON API? Separate "before-comma" and "after-comma" fields? No sarcasm.

vladfaust on 22 Nov 2017

@vladfaust Am I misunderstanding that a simple implementation of the pull parser (i.e., a converter interface if using JSON.mapping) would solve this in the meantime?

require "json"

json = %({"amount":0.00012145842417411307})

parser = JSON::PullParser.new(json)
parser.read_object do
  pp parser.read_raw.to_f 
end

# => 0.00012145842417411307

z64 on 22 Nov 2017

@vladfaust You can use either integers or BigDecimal. Here's a more in-depth answer: https://stackoverflow.com/a/3730040.

hugoabonizio on 22 Nov 2017

@z64 I have a JSON API which parses incoming "application/json" requests, so reading floats only is not an option.

@hugoabonizio yes, separated before and after comma values. This is an ugly workaround, though. Clients want to use floating numbers for values in my API. I vote for fixing this issue asap! :1st_place_medal:

vladfaust on 22 Nov 2017

👍1

So there's a patch, why on earth is there no PR?

RX14 on 22 Nov 2017

just added https://github.com/crystal-lang/crystal/pull/5323

benoist on 23 Nov 2017

@vladfaust Clients "want" to use floating numbers for values in your API. Those clients don't aren't aware of what they want, because they don't know what floating point actually is. I'll leave it to you as an exercise to find out what 0.1 + 0.2 returns. Use BigDecimals, separated before and after comma values, or else your finances-sensitive "apps" are broken. You can have the UI accept floating point formatted BigDecimals so that the users don't see a difference.

zatherz on 25 Nov 2017

Another commonly used solution when working with currency related data is to convert to cents and make them integers. Thats what I normally do. This time I stumbled on the issue when using an external API from a bitcoin exchange. The smallest value is not a cent but a satoshi. Converting bitcoin to satoshi and use integers would be a good option as well.

benoist on 25 Nov 2017

Surely if BigDecimal implemented from_json by parsing the raw value, JSON decimal values could theoretically be used without ever being converted to float. It's probably quite error-prone on the client's side though.

RX14 on 25 Nov 2017

Was this page helpful?

0 / 5 - 0 ratings