In our test case, validator `V` has voting power 1000000000, and some account `A` delegates 10 to `V` and votes independently. `V` votes yes and `A` votes no. The current code calculates the tally of no as `(10 / 1000000010) * 1000000010`, which with the current decimal settings results in 9.999999999999999000, i.e. 9 when converted to tokens. I suggest we calculate it as `(10 * 1000000010) / 1000000010` instead, which should give a better result under limited precision.
/cc @sunnya97
Sounds reasonable
I wonder whether this is reproducible with Launchpad too? CC'ing @ethanfrey @njmurarka
In general `(A * C) / B` gives better precision than `(A / B) * C`, as it does not round off least-significant digits. However, there is more potential for overflow with the first form. If these were `u64` we would have to be very careful about this, and would likely need a GCD algorithm. I believe in this case we are using `sdk.Dec` (which wraps `big.Int`), so no overflow is possible.

I feel I saw a very similar issue recently. In any case `.MulInt(x).Quo(y)` looks better than `.Quo(y).MulInt(x)`, and I see no problem with backporting to Launchpad.
So this is actually a bigger problem than it appears to be.

I was thinking about your reply @ethanfrey, and if order of operations matters so much, we should delay rounding as long as possible, which something like `big.Rat` would give us. Needing to think this hard about order of operations is hard to get right, and it should be easy to just get the answer 10. Then I discovered that the SDK used to use `big.Rat`, but it was replaced with a custom "big decimal" type `sdk.Dec` (#1807).

So I figured I'd do some tests. As expected, `big.Rat` gets the correct answer independent of order of operations. What about other "big decimal" libraries? I tried https://github.com/cockroachdb/apd, which I'd previously suggested in #7113. It properly implements General Decimal Arithmetic/IEEE 754-2008. Sure enough, apd produces the correct result 10 using decimal128, decimal64, and even decimal32 settings! How is it possible that decimal32 is more accurate than `sdk.Dec`, which supposedly has 18 decimal places?

Even worse, `float32` gets the correct result with either order of operations. Run this example and see: https://play.golang.org/p/f9cF7ItDtxe. So `sdk.Dec` is basically less precise than 32-bit floating point...

Looks like we have a bigger problem on our hands. I would say that if we are building "the financial infrastructure of the future", getting arithmetic correct is non-negotiable. I will open a follow-up issue.

Note @odeke-em this sort of stuff is likely good material for fuzzing.
I think decimal32 would have precision/rounding errors with other numbers, like `(123 / 1234567) * 1234567`, or even better `(1 / 12345678901234567890) * 12345678901234567890`. It would be good to check these with many input values.
I think there are two approaches, and both may be useful:

1) use a general-purpose (but possibly slow) library that always gets the correct result regardless of ordering and with no underflow or overflow issues; `big.Rat` is possibly a good approach here
2) use a limited-precision library that gets correct and fast results over a reasonable range of values and returns errors on any under/overflow; this looks like the cockroachdb decimalXXX cases
The argument for moving from `big.Rat` to `big.Dec` was speed, but it came at a cost of correctness. And `big.Rat` should be significantly slower than a decimal64/decimal128 type (arbitrary-precision ops are always much slower than fixed-precision ones). I would support reverting `big.Dec` to use `big.Rat` under the hood (without changing APIs) and testing the cockroach version. If its speed is good, it could be recommended for any operations that really need speed (A * B / C is very unlikely to be a performance bottleneck in anything that even reads one item from disk).
> I would support reverting big.Dec to use big.Rat under the hood (without changing APIs)

So would I. I also think we should keep the current API around at least until 1.0.0 is out. That should give us plenty of time to possibly even overhaul all the numerical computing types and work out better interfaces and APIs.

Happy to do some work on this :+1:
Did some actual benchmarks here: https://github.com/cosmos/cosmos-sdk/pull/7772

Obviously this is just one case to consider of many... but some surprising results:

* `sdk.Dec` can never get `(x/y)*y` quite right, with an error similar to a `float64` or `decimal64`, yet always gets `(x*y)/y` right 🤷
* `big.Rat` always gets both correct and is actually marginally faster than `sdk.Dec`
* `big.Float` configured as a float128 always gets this right and is actually quite fast... there must be some hardware optimization happening there 🤔

> I would support reverting big.Dec to use big.Rat under the hood (without changing APIs)
>
> So would I. I also think we should keep the current API around at least until 1.0.0 is out. That should give us plenty of time to possibly even overhaul all numerical computing types and work out better interfaces and APIs.

Ideally we'd have proper numerical computing and an interface we want to stick with for 1.0.

> Happy to do some work on this 👍

Awesome 👍
Created a new issue for addressing all this: https://github.com/cosmos/cosmos-sdk/issues/7773
@alessio If I recall, weren't we having some issues related to voting power back in the early part of this year? I think they had to do with the fact that if you had a small number of tokens associated with the stake, your power lost so much accuracy that it no longer made a lot of sense. The solution in general was to use small units, so that all magnitudes were larger.
@alessio Is there something I can check to help? I am not clear on what that is.
Regardless of precision, commutativity is an important fundamental property. I am going to evaluate the issues as well as the usages and potential pitfalls. In the future, if performance becomes a problem, we can work toward higher performance, but we shouldn't trade off correctness at all.