Crystal: JSON parse float is wrong

Created on 13 Nov 2017  路  21Comments  路  Source: crystal-lang/crystal

require "json"

p Array(Float64).from_json("[0.00045808]")
# [0.00045808]
p Array(Float64).from_json("[0.00045809]")
# [0.00045808999999999997]
p Array(Float64).from_json("[0.00045808999999999997]")
# [0.00589844844299458]
p "0.00045808999999999997".to_f64
# 0.00045808999999999997

I think JSON parse float uses something different to parse the floats, but it's not correct for all cases.

Most helpful comment

I suggest we copy an algorithm from a known-fast json implementation.

All 21 comments

It's not linked to json nor crystal:

crystal eval puts 0.00045809 < Float64, small error step
crystal eval puts "0.00045809".to_f32 < Float32, no error step

It's normal for a float to not be exact. This code above shows that the JSON you get from is exported in 32-bit float format, not Float64 馃槈

I know floats are not exact but in this case its a 12x difference and even floats are not that bad.

0.0058
0.00045

Float32.new("0.00045808999999999997")
=> 0.00045809

That gets correctly rounded. To me it still appears to be an issue when parsing the JSON.

Oh yes my bad, I didn't saw the real problem

p Array(Float64).from_json("[0.00045808999999999997]") #=> [0.00589844844299458]

Yeah I think I've figured out what it is, see the following current implementation:

  private def consume_float(negative, integer, digits)
    append_number_char
    divisor = 1_u64
    char = next_char

    unless '0' <= char <= '9'
      unexpected_char
    end

    while '0' <= char <= '9'
      append_number_char
      integer *= 10
      integer += char - '0'
      divisor *= 10
      char = next_char
    end
    float = integer.to_f64 / divisor

    if char == 'e' || char == 'E'
      consume_exponent(negative, float, digits)
    else
      @token.type = :FLOAT
      # If there's a chance of overflow, we parse the raw string
      if digits >= 18
        @token.float_value = number_string.to_f64
      else
        @token.float_value = negative ? -float : float
      end
      number_end
    end
  end

The digits are actually not incremented so the overflow check fails. In case of my example the divisor indeed overflows, but it's still used to calculate the float.

  private def consume_float(negative, integer, digits)
    append_number_char
    divisor = 1_u64
    char = next_char

    unless '0' <= char <= '9'
      unexpected_char
    end

    while '0' <= char <= '9'
      append_number_char
      integer *= 10
      integer += char - '0'
      divisor *= 10
      digits += 1
      char = next_char
    end
    float = integer.to_f64 / divisor

    if char == 'e' || char == 'E'
      consume_exponent(negative, float, digits)
    else
      @token.type = :FLOAT
      # If there's a chance of overflow, we parse the raw string
      if digits >= 18
        @token.float_value = number_string.to_f64
      else
        @token.float_value = negative ? -float : float
      end
      number_end
    end
  end

Adding the digits += 1 fixes the problem

@asterite do you agree this is the correct implementation?

Maybe it would be best to not have a custom float parser but employ String#to_f32 (which utilizes LibC.strtof) as it is already for Float64 values.

The downside of that would be having to allocate an extra string.

Yes, however that's only an extra for the string based lexer. The IO based lexer already concatenates the characters to a memory IO for Float64 parsing.

perhaps we can just stop incrementing the integer and divisor after 18 digits?

(45808999999999 / 100000000000000000.0)
 => 0.00045808999999999

I suggest we copy an algorithm from a known-fast json implementation.

Hmm, it's very disturbing:

json = JSON.parse(%Q[{"amount":0.00012145842417411307}])
# => {"amount" => 0.0015639203059625239}

Please fix it.

Basically, my finances-sensitive apps depends on that. Damn.

Just relax. I鈥檝e temporarily fixed it in my code base by monkey patching the method myself. You can copy the example I gave until it鈥檚 been fixed the correct way.

@vladfaust Probably you shouldn't rely on floating-points and JSON serialization for finance-sensitive data anyway :smile:

@hugoabonizio what are alternatives if I'm working on JSON API? Separate "before-comma" and "after-comma" fields? No sarcasm.

@vladfaust Am I misunderstanding that a simple implementation of the pull parser (i.e., a converter interface if using JSON.mapping) would solve this in the meantime?

require "json"

json = %({"amount":0.00012145842417411307})

parser = JSON::PullParser.new(json)
parser.read_object do
  pp parser.read_raw.to_f 
end

# => 0.00012145842417411307

@vladfaust You can use either integers or BigDecimal. Here's a more in-depth answer: https://stackoverflow.com/a/3730040.

@z64 I have a JSON API which parses incoming "application/json" requests, so reading floats only is not an option.

@hugoabonizio yes, separated before and after comma values. This is an ugly workaround, though. Clients want to use floating numbers for values in my API. I vote for fixing this issue asap! :1st_place_medal:

So there's a patch, why on earth is there no PR?

@vladfaust Clients "want" to use floating numbers for values in your API. Those clients don't aren't aware of what they want, because they don't know what floating point actually is. I'll leave it to you as an exercise to find out what 0.1 + 0.2 returns. Use BigDecimals, separated before and after comma values, or else your finances-sensitive "apps" are broken. You can have the UI accept floating point formatted BigDecimals so that the users don't see a difference.

Another commonly used solution when working with currency related data is to convert to cents and make them integers. Thats what I normally do. This time I stumbled on the issue when using an external API from a bitcoin exchange. The smallest value is not a cent but a satoshi. Converting bitcoin to satoshi and use integers would be a good option as well.

Surely if BigDecimal implemented from_json by parsing the raw value, JSON decimal values could theoretically be used without ever being converted to float. It's probably quite error-prone on the client's side though.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

oprypin picture oprypin  路  3Comments

lgphp picture lgphp  路  3Comments

will picture will  路  3Comments

Papierkorb picture Papierkorb  路  3Comments

lbguilherme picture lbguilherme  路  3Comments