Cosmos-sdk: tally calculation has precision issue

Created on 23 Oct 2020  ·  12 Comments  ·  Source: cosmos/cosmos-sdk

Summary of Bug

In our test case, validator V has voting power 1000000000. An account A delegates 10 to V and votes independently: V votes yes and A votes no. The current code calculates the tally of no as:

(10 / 1000000010) * 1000000010

which, with the current decimal settings, results in 9.999999999999999000, i.e. 9 when converted to tokens.
I suggest we calculate it as:

(10 * 1000000010) / 1000000010

which should give a better result under limited precision.
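For reference, a minimal reproduction sketch of the scenario above, assuming the sdk.Dec API from github.com/cosmos/cosmos-sdk/types as of this issue (NewDec, Quo, MulInt64, TruncateInt); the expected outputs are the values described in this report:

```go
package main

import (
	"fmt"

	sdk "github.com/cosmos/cosmos-sdk/types"
)

func main() {
	shares := sdk.NewDec(10)   // account A's delegation to V
	power := int64(1000000010) // V's total power after A's delegation

	// Current order: divide first, then multiply back by the power.
	quoFirst := shares.Quo(sdk.NewDec(power)).MulInt64(power)
	fmt.Println(quoFirst, quoFirst.TruncateInt()) // 9.999999999999999000 9

	// Suggested order: multiply first, then divide.
	mulFirst := shares.MulInt64(power).Quo(sdk.NewDec(power))
	fmt.Println(mulFirst, mulFirst.TruncateInt()) // 10.000000000000000000 10
}
```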

Version

Steps to Reproduce


For Admin Use

  • [ ] Not duplicate issue
  • [ ] Appropriate labels applied
  • [ ] Appropriate contributors tagged
  • [x] Contributor assigned/self-assigned
bug


All 12 comments

/cc @sunnya97

Sounds reasonable

I wonder whether this is reproducible with Launchpad too? CC'ing @ethanfrey @njmurarka

In general (A * C) / B gives better precision than (A / B) * C, as it does not round off least-significant digits.

However, there is more potential for overflow with the first case. If these are u64 we would have to be very careful about this, and likely use a GCD algorithm. I believe in this case we are using sdk.Dec (which wraps big.Int) so no overflow is possible.

I feel I saw a very similar issue recently. In any case, .MultInt(x).Quo(y) looks better than .Quo(y).MultInt(x), and I see no problem with backporting this to Launchpad.
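To illustrate the u64 caution mentioned above, here is a sketch (not SDK code) of computing (a * c) / b on uint64 by cancelling common factors with the denominator first and detecting overflow; mulDiv and gcd are hypothetical helpers:

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// gcd returns the greatest common divisor of a and b (Euclid's algorithm).
func gcd(a, b uint64) uint64 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// mulDiv computes (a * c) / b, cancelling common factors with the denominator
// first to reduce the chance of overflow, and returning an error if the
// quotient still does not fit in 64 bits. The remainder is discarded
// (truncating division).
func mulDiv(a, c, b uint64) (uint64, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	if g := gcd(a, b); g > 1 {
		a, b = a/g, b/g
	}
	if g := gcd(c, b); g > 1 {
		c, b = c/g, b/g
	}
	hi, lo := bits.Mul64(a, c) // full 128-bit product
	if hi >= b {
		return 0, errors.New("overflow: quotient does not fit in uint64")
	}
	q, _ := bits.Div64(hi, lo, b) // 128-by-64-bit division
	return q, nil
}

func main() {
	fmt.Println(mulDiv(10, 1000000010, 1000000010)) // 10 <nil>
}
```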

So this is actually a bigger problem than it appears to be.

I was thinking about your reply @ethanfrey: if order of operations matters this much, we should delay rounding as long as possible, which something like big.Rat would give us. Needing to think that carefully about order of operations is hard to get right, and it should be easy to just get the answer 10. Then I discovered that the SDK used to use big.Rat, but it was replaced with a custom "big decimal" type, sdk.Dec (#1807).

So I figured I'd do some tests. As expected, big.Rat gets the correct answer independent of order of operations. What about other "big decimal" libraries? I tried https://github.com/cockroachdb/apd, which I'd previously suggested in #7113. It properly implements General Decimal Arithmetic/IEEE 754-2008. Sure enough, apd produces the correct result 10 using decimal128, decimal64, and even decimal32 settings! How is it possible that decimal32 is more accurate than sdk.Dec, which supposedly has 18 decimal places?

Even worse, float32 gets the correct result with either order of operations. Run this example and see: https://play.golang.org/p/f9cF7ItDtxe. So sdk.Dec is basically less precise than 32-bit floating point...
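For the big.Rat part of that claim, a quick stdlib-only check (no SDK types assumed) shows that exact rational arithmetic gives 10 in either order:

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	x := big.NewRat(10, 1)         // delegator's shares
	y := big.NewRat(1000000010, 1) // validator's total power

	// (x / y) * y — exact, no intermediate rounding.
	a := new(big.Rat).Quo(x, y)
	a.Mul(a, y)

	// (x * y) / y — also exact.
	b := new(big.Rat).Mul(x, y)
	b.Quo(b, y)

	fmt.Println(a.RatString(), b.RatString()) // 10 10
}
```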

Looks like we have a bigger problem on our hands.

I would say that if we are building "the financial infrastructure of the future", getting arithmetic correct is non-negotiable.

I will open up a follow-up issue.


Note @odeke-em this sort of stuff is likely good material for fuzzing.

I think decimal32 would have precision rounding errors with other numbers, like (123 / 1234567) * 1234567, or even better, (1 / 12345678901234567890) * 12345678901234567890. It would be good to check these with many input values.
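One way to run that kind of many-input check, following the fuzzing suggestion above, is a Go fuzz target (Go 1.18+). This is a sketch only: quoThenMul is a hypothetical stand-in that truncates to 18 decimals (the real sdk.Dec uses rounding), and the first seed already reproduces the case from this report:

```go
// tally_fuzz_test.go — illustrative only.
package tally_test

import (
	"math/big"
	"testing"
)

var scale = new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil) // 10^18

// quoThenMul computes (x/y)*y with 18-decimal fixed-point truncation,
// then truncates back to whole tokens.
func quoThenMul(x, y int64) *big.Int {
	q := new(big.Int).Mul(big.NewInt(x), scale)
	q.Quo(q, big.NewInt(y)) // x/y, chopped to 18 decimals
	q.Mul(q, big.NewInt(y)) // * y
	return q.Quo(q, scale)  // back to whole tokens
}

func FuzzQuoThenMul(f *testing.F) {
	f.Add(int64(10), int64(1000000010)) // the case from this issue
	f.Add(int64(123), int64(1234567))
	f.Fuzz(func(t *testing.T, x, y int64) {
		if x <= 0 || y <= 0 || x > y {
			t.Skip()
		}
		if got := quoThenMul(x, y); got.Cmp(big.NewInt(x)) != 0 {
			t.Errorf("(%d/%d)*%d = %s, want %d", x, y, y, got, x)
		}
	})
}
```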

I think there are two approaches, and both may be useful.

1) use a general purpose (but possibly slow) library to always get the correct result regardless of ordering and with no underflow or overflow issues - big.Rat is possibly a good approach here
2) use a limited precision library that gets correct and fast results over a reasonable range of values and returns errors on any under/overflow - this looks like the cockroachdb decimalXXX cases

The argument for moving from big.Rat to big.Dec was speed, but it came at a cost in correctness. And big.Rat should indeed be significantly slower than a decimal64/decimal128 type (arbitrary-precision ops are always much slower than fixed precision). I would support reverting big.Dec to use big.Rat under the hood (without changing APIs) and testing the cockroach version. If its speed is good, it could be recommended for any operations that really need speed (A * B / C is very unlikely to be a performance bottleneck in any code path that even reads one item from disk).
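For illustration, a minimal sketch of what "big.Rat under the hood without changing APIs" could look like; the type and method names merely mirror a few of sdk.Dec's methods and are not actual SDK code:

```go
package dec

import "math/big"

// Dec is a sketch of a decimal type backed by an exact big.Rat,
// keeping (a subset of) the existing method surface.
type Dec struct{ r *big.Rat }

func NewDec(i int64) Dec { return Dec{new(big.Rat).SetInt64(i)} }

func (d Dec) Quo(d2 Dec) Dec {
	return Dec{new(big.Rat).Quo(d.r, d2.r)} // exact: no rounding until the end
}

func (d Dec) MulInt64(i int64) Dec {
	return Dec{new(big.Rat).Mul(d.r, new(big.Rat).SetInt64(i))}
}

// TruncateInt64 rounds toward zero only at the final conversion step.
func (d Dec) TruncateInt64() int64 {
	return new(big.Int).Quo(d.r.Num(), d.r.Denom()).Int64()
}
```

With this sketch, NewDec(10).Quo(NewDec(1000000010)).MulInt64(1000000010).TruncateInt64() returns 10 regardless of operation order, at the cost of arbitrary-precision arithmetic.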

> I would support reverting big.Dec to use big.Rat under the hood (without changing APIs)

So would I. I also think we should keep the current API around at least until 1.0.0 is out. That should give us plenty of time to possibly even overhaul all numerical computing types and work out better interfaces and APIs.

Happy to do some work on this 👍

Did some actual benchmarks here: https://github.com/cosmos/cosmos-sdk/pull/7772

Obviously this is just one case to consider of many... but some surprising results:

  • sdk.Dec can never get (x/y)*y quite right, with an error similar to a float64 or decimal64, and always gets (x*y)/y right 🤷
  • big.Rat always gets both correct and is actually marginally faster than sdk.Dec
  • the https://github.com/cockroachdb/apd data types are generally a bit slower, although reasonably accurate
  • surprisingly big.Float configured as a float128 always gets this right and is actually quite fast... there must be some hardware optimization happening there 🤔
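For context, a sketch of the kind of micro-benchmark harness behind numbers like these (the real benchmarks are in the PR above); stdlib only, names are illustrative:

```go
// bench_test.go — illustrative micro-benchmarks for (x/y)*y.
package tally_test

import (
	"math/big"
	"testing"
)

func BenchmarkRatQuoMul(b *testing.B) {
	x := big.NewRat(10, 1)
	y := big.NewRat(1000000010, 1)
	for i := 0; i < b.N; i++ {
		r := new(big.Rat).Quo(x, y)
		r.Mul(r, y)
	}
}

func BenchmarkFloat128QuoMul(b *testing.B) {
	// 113 mantissa bits ≈ IEEE binary128 ("float128")
	x := new(big.Float).SetPrec(113).SetInt64(10)
	y := new(big.Float).SetPrec(113).SetInt64(1000000010)
	for i := 0; i < b.N; i++ {
		r := new(big.Float).SetPrec(113).Quo(x, y)
		r.Mul(r, y)
	}
}
```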

> I would support reverting big.Dec to use big.Rat under the hood (without changing APIs)
>
> So would I. I also think we should keep the current API around at least until 1.0.0 is out. That should give us plenty of time to possibly even overhaul all numerical computing types and work out better interfaces and APIs.
>
> Ideally we'd have proper numerical computing and an interface we want to stick with for 1.0.
>
> Happy to do some work on this 👍

Awesome 👍

Created a new issue for addressing all this: https://github.com/cosmos/cosmos-sdk/issues/7773

@alessio If I recall, weren't we having some issues related to power back in the early part of this year? I think they had to do with the fact that if you had a small number of tokens associated with the stake, your power lost so much accuracy that it no longer made a lot of sense. The solution, in general, was to use small units so that all magnitudes were larger.

@alessio Is there something I can check to help? I am not clear on what that is.

Regardless of precision, commutativity is an important fundamental property. I am going to evaluate the issues as well as usages and potential pitfalls. In the future, if performance becomes a problem but correctness isn't, we can work toward higher performance, but we shouldn't trade off correctness at all.
