require "json"
p Array(Float64).from_json("[0.00045808]")
# [0.00045808]
p Array(Float64).from_json("[0.00045809]")
# [0.00045808999999999997]
p Array(Float64).from_json("[0.00045808999999999997]")
# [0.00589844844299458]
p "0.00045808999999999997".to_f64
# 0.00045808999999999997
I think the JSON lexer parses floats with its own routine, and it isn't correct in all cases.
It's not related to JSON, nor to Crystal itself:
crystal eval 'puts 0.00045809'          # Float64: small error step
crystal eval 'puts "0.00045809".to_f32' # Float32: no error step
It's normal for a float to not be exact. The code above shows that the JSON you're getting was exported in 32-bit float format, not Float64 :wink:
I know floats are not exact, but in this case it's a ~12x difference (0.0058 vs. 0.00045), and even floats are not that bad.
Float32.new("0.00045808999999999997")
=> 0.00045809
That gets correctly rounded. To me it still appears to be an issue when parsing the JSON.
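For comparison, a libc-backed parser handles the long decimal form fine. A quick check, shown in Ruby only because its String#to_f also goes through the C library's strtod:

```ruby
# Both spellings denote the same nearest IEEE 754 double, so a correct
# parser maps them to the identical value.
a = "0.00045809".to_f
b = "0.00045808999999999997".to_f
puts a == b  # true: the 17-digit form is just the long repr of the same double
```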
Oh yes, my bad, I didn't see the real problem.
p Array(Float64).from_json("[0.00045808999999999997]") #=> [0.00589844844299458]
Yeah I think I've figured out what it is, see the following current implementation:
private def consume_float(negative, integer, digits)
  append_number_char
  divisor = 1_u64
  char = next_char
  unless '0' <= char <= '9'
    unexpected_char
  end
  while '0' <= char <= '9'
    append_number_char
    integer *= 10
    integer += char - '0'
    divisor *= 10
    char = next_char
  end
  float = integer.to_f64 / divisor
  if char == 'e' || char == 'E'
    consume_exponent(negative, float, digits)
  else
    @token.type = :FLOAT
    # If there's a chance of overflow, we parse the raw string
    if digits >= 18
      @token.float_value = number_string.to_f64
    else
      @token.float_value = negative ? -float : float
    end
    number_end
  end
end
The digits counter is never incremented inside the fractional-digit loop, so the overflow check never triggers. In my example the divisor does overflow (20 fractional digits make it 10^20, which wraps around UInt64), but the wrapped value is still used to calculate the float.
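To make the failure mode concrete: the input has 20 fractional digits, so the divisor becomes 10^20, which doesn't fit in a UInt64 and wraps around, while the 17 significant digits of the mantissa still fit. A sketch of the wrap-around, in Ruby purely for illustration (Crystal's UInt64 arithmetic wrapped silently at the time):

```ruby
M = 1 << 64                       # UInt64 modulus
integer = 45_808_999_999_999_997  # the significant fractional digits; fits in a UInt64
divisor = 10**20 % M              # 20 fractional digits => 10**20, wrapped like a UInt64

puts divisor                      # 7766279631452241920 instead of 10**20
puts integer.to_f / divisor       # ~0.0058984..., the bogus value from the report
```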
private def consume_float(negative, integer, digits)
  append_number_char
  divisor = 1_u64
  char = next_char
  unless '0' <= char <= '9'
    unexpected_char
  end
  while '0' <= char <= '9'
    append_number_char
    integer *= 10
    integer += char - '0'
    divisor *= 10
    digits += 1
    char = next_char
  end
  float = integer.to_f64 / divisor
  if char == 'e' || char == 'E'
    consume_exponent(negative, float, digits)
  else
    @token.type = :FLOAT
    # If there's a chance of overflow, we parse the raw string
    if digits >= 18
      @token.float_value = number_string.to_f64
    else
      @token.float_value = negative ? -float : float
    end
    number_end
  end
end
Adding the digits += 1 fixes the problem
@asterite do you agree this is the correct implementation?
Maybe it would be best not to have a custom float parser, but to use String#to_f32 (which utilizes LibC.strtof), as is already done for Float64 values.
The downside of that would be having to allocate an extra string.
Yes, though that's only an extra allocation for the string-based lexer. The IO-based lexer already concatenates the characters into a memory IO for Float64 parsing.
Perhaps we can just stop incrementing the integer and divisor after 18 digits?
(45808999999999 / 100000000000000000.0)
=> 0.00045808999999999
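A hedged sketch of that idea (Ruby for illustration; the function name is made up): keep consuming characters, but stop feeding integer and divisor once 18 digits have been accumulated, since further digits can no longer change the resulting Float64 meaningfully.

```ruby
# Illustrative only: accumulate at most 18 fractional digits.
def parse_fraction(digits)
  integer = 0
  divisor = 1
  count = 0
  digits.each_char do |c|
    if count < 18              # beyond this, UInt64 math could overflow
      integer = integer * 10 + (c.ord - '0'.ord)
      divisor *= 10
      count += 1
    end                        # extra digits are consumed but ignored
  end
  integer.to_f / divisor
end

puts parse_fraction("00045808999999999997")  # ≈ 0.00045809
```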
I suggest we copy an algorithm from a known-fast json implementation.
Hmm, it's very disturbing:
json = JSON.parse(%Q[{"amount":0.00012145842417411307}])
# => {"amount" => 0.0015639203059625239}
Please fix it.
Basically, my finance-sensitive apps depend on that. Damn.
Just relax. I've temporarily fixed it in my code base by monkey patching the method myself. You can copy the example I gave until it's been fixed the correct way.
@vladfaust Probably you shouldn't rely on floating-points and JSON serialization for finance-sensitive data anyway :smile:
@hugoabonizio what are the alternatives if I'm working on a JSON API? Separate "before-comma" and "after-comma" fields? No sarcasm.
@vladfaust Am I misunderstanding, or would a simple use of the pull parser (i.e. a converter interface, if using JSON.mapping) solve this in the meantime?
require "json"

json = %({"amount":0.00012145842417411307})
parser = JSON::PullParser.new(json)
parser.read_object do
  pp parser.read_raw.to_f
end
# => 0.00012145842417411307
@vladfaust You can use either integers or BigDecimal. Here's a more in-depth answer: https://stackoverflow.com/a/3730040.
@z64 I have a JSON API which parses incoming "application/json" requests, so reading floats only is not an option.
@hugoabonizio yes, separate before- and after-comma values. That's an ugly workaround, though. Clients want to use floating-point numbers for values in my API. I vote for fixing this issue ASAP! :1st_place_medal:
So there's a patch, why on earth is there no PR?
@vladfaust Clients "want" to use floating numbers for values in your API. Those clients aren't aware of what they actually want, because they don't know what floating point is. I'll leave it to you as an exercise to find out what 0.1 + 0.2 returns. Use BigDecimals, or separate before- and after-comma values, or else your finance-sensitive "apps" are broken. You can have the UI accept floating-point-formatted BigDecimals so that the users don't see a difference.
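The exercise referred to above, spelled out (Ruby shown for illustration; Crystal's Float64 behaves the same way, and its BigDecimal is analogous):

```ruby
require "bigdecimal"

puts 0.1 + 0.2        # binary floats can't represent 0.1 exactly
puts 0.1 + 0.2 == 0.3 # false

# Decimal arithmetic avoids the surprise:
puts BigDecimal("0.1") + BigDecimal("0.2") == BigDecimal("0.3")  # true
```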
Another commonly used solution when working with currency-related data is to convert to cents and store integers. That's what I normally do. This time I stumbled on the issue while using an external API from a bitcoin exchange, where the smallest value is not a cent but a satoshi. Converting bitcoin to satoshi and using integers would be a good option as well.
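A sketch of that integer-units approach (Ruby for illustration; the constant and helper names are made up). The decimal string is split and scaled without ever passing through a float:

```ruby
SATOSHI_PER_BTC = 100_000_000  # 1 BTC = 10^8 satoshi

# Parse a decimal BTC amount string into an integer number of satoshi.
def btc_to_satoshi(amount)
  int_part, frac_part = amount.split(".")
  frac_part = (frac_part || "").ljust(8, "0")[0, 8]  # pad/trim to 8 decimal places
  int_part.to_i * SATOSHI_PER_BTC + frac_part.to_i
end

puts btc_to_satoshi("0.00045809")  # 45809
puts btc_to_satoshi("1.5")         # 150000000
```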
Surely if BigDecimal implemented from_json by parsing the raw value, JSON decimal values could theoretically be used without ever being converted to float. It's probably quite error-prone on the client's side though.