Cosmos-sdk: Decide about future of sdk.Dec

Created on 2 Nov 2020 · 22Comments · Source: cosmos/cosmos-sdk

Summary

sdk.Dec gets some numerical calculations wrong (always): #7640. We should have a correct approach to numerical computation in the SDK.

Problem Definition

sdk.Dec was created as a custom "big decimal" type in #1807. It appears that the motivation was primarily performance. Currently, sdk.Dec is getting certain calculations always wrong. For instance it never seems to be able to correctly compute that (x / y) * y == x (#7640, #7772 ). It isn't based off of any clear, well-researched specification as far as I can tell. While I applaud the efforts of the authors at making a custom decimal arithmetic implementation and find the algorithm relatively concise and elegant, it's hard to know all the details that have been figured out by IEEE, IBM, etc. without a serious research project.

Proposal

Find a quick correct solution for the current bugs, sdk.Rat is probably a good alternative and isn't actually slower in these albeit very limited tests 🤷 : https://github.com/cosmos/cosmos-sdk/issues/7640#issuecomment-720184641
Do an audit of the SDK's usage of arithmetic and figure out what precision, rounding, etc. we _actually_ need (@odeke-em ?)
Use properly vetted and benchmarked libraries. Maybe okay to roll our own but there need to be a strong reasons and thorough auditing. Probably better to use the built in big.Rat or big.Float or something like https://github.com/cockroachdb/apd
Write up clear documentation on how to do math in the SDK! What types, precision, rounding, order of operations, etc. to do when If we're building the "financial infrastructure of the future" it's our job to get this right. If we need to know to use type X with precision Y in order A*B/A vs A/B*A, developers need to know that.

For Admin Use

[ ] Not duplicate issue
[ ] Appropriate labels applied
[ ] Appropriate contributors tagged
[ ] Contributor assigned/self-assigned

backlog bug proposal 13

Source

aaronc

👍4 👀2

Most helpful comment

Maybe @ValarDragon and/or @rigelrozanski could weigh in here.

ebuchman on 2 Nov 2020

👍4

All 22 comments

Maybe @ValarDragon and/or @rigelrozanski could weigh in here.

ebuchman on 2 Nov 2020

👍4

ref: https://github.com/cosmos/cosmos-sdk/issues/4399 (for arbitrary precision on arithmetic yielding a fixed precision result). This is needed.

alexanderbez on 2 Nov 2020

sdk.Rat was the original approach before the introduction of the sdk.Dec... it quickly led to enormous irrational numbers which used huge amounts of data, which needed to be trimmed down, virtually creating an equivalent to a decimal type (except much more confusing because it was arbitrary base fraction, rather than a base-ten fraction as all decimals are when represented as fractions).
- big.Float will probably not work because of non-determinism, there may be some ugly way to finagle a determinism to a certain number of digits but this seems hacky/unadvisable.
- 👍 to better documentation

Ultimately I stand by earlier thoughts in that we just need to introduce arbitrary precision decimal types, which are clearly and explicitly defined at the instantiation of the decimal type - aka we'd have a decimal interface which would be instantiated with a decimal struct which was created with "X" level of precision.

Over and out GL. Thanks for pinging my @ebuchman - also yeah @ValarDragon has a great grasp on this issue and has worked on the decimal type.

rigelrozanski on 2 Nov 2020

👍2

sdk.Rat was the original approach before the introduction of the sdk.Dec... it quickly led to enormous irrational numbers which used huge amounts of data, which needed to be trimmed down, virtually creating an equivalent to a decimal type (except much more confusing because it was arbitrary base fraction, rather than a base-ten fraction as all decimals are when represented as fractions).

So what's wrong with using sdk.Rats in computation but storing the results as fixed precision decimal? Basically we could have the sdk.Dec interface (18 places of precision when serialized) but big.Rat under the hood.

big.Float will probably not work because of non-determinism, there may be some ugly way to finagle a determinism to a certain number of digits but this seems hacky/unadvisable.

Why would big.Float introduce non-determinism? It has configurable precision and rounding mode. Afaik, the reason why float64 is considered non-deterministic is because rounding modes can vary from machine to machine. big.Float shouldn't have this problem. Or am I missing something?

Ultimately I stand by earlier thoughts in that we just need to introduce arbitrary precision decimal types, which are clearly and explicitly defined at the instantiation of the decimal type - aka we'd have a decimal interface which would be instantiated with a decimal struct which was created with "X" level of precision.

So something like a proper decimal128 implementation could take care of this.

aaronc on 2 Nov 2020

So what's wrong with using sdk.Rats in computation but storing the results as fixed precision decimal? Basically we could have the sdk.Dec interface (18 places of precision when serialized) but big.Rat under the hood.

I think this is exactly what we want.

alexanderbez on 2 Nov 2020

So what's wrong with using sdk.Rats in computation but storing the results as fixed precision decimal? Basically we could have the sdk.Dec interface (18 places of precision when serialized) but big.Rat under the hood.

I think this is exactly what we want.

It is true that big.Rat scales poorly as you perform more and more operations. This replicates @rigelrozanski's findings: https://github.com/cosmos/cosmos-sdk/pull/7772#issuecomment-720678178

Based on that, why not big.Float? Is it really non-deterministic??

aaronc on 2 Nov 2020

Glad this is getting looked into again! Totally agreed that there wasn't enough analysis in #1807, viewing the exact error models / analysis into tracking the errors. (Though in defense of that issue and the subsequent work, it seems clear to me that it only had significant benefits over rational, in both error handling and performance) However, I also believe that #7772 is presenting an oversimplification of what is going on and the guarantees you get. Precision handling is super tricky! A great reference is to see some worked out examples for floats, which is on https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems

I'd like to give some context on the error models, at least so we have some nice shared state. (The conclusions in #7772 didn't seem right to me)

Floats don't simplify that much with regard to errors, they should give you only more fundamental error causes than Decimal. None of Decimal, Float, or Rat will guarantee that either (x / y) * y == x or (x * y) / y == x in all cases. This is intrinsic to having rounding error. Also, with floats and rationals its even _worse_. (a + b) + c is not guaranteed to equal a + (b + c). Decimal is designed to fix precisely that problem, but it does in turn change the error model for divisions. Do note that multiplication is not associative on _any_ of these types, e.g. (x * y) * z may not equal x * (y * z).

+1 to big.Rat's performance and error modes being extremely bad. I believe the benchmarks in #7772 don't accurately reflect the situation. AFAICT they compare correctness, and speed against 64 bit arithmetic. The error model gets really wack once you start hitting the max bit length, as you truncate based on this highly variable numerator/denominator bit length. (At which point you couldn't be comparing against 64 bit arithmetic) And the performance gets quite bad, due to the GCD involved.

Also if you plan on using normal floats (base 2), you get _incredibly_ weird situations for 'normal' decimals.

The problems you are experiencing won't go away no matter which data type you choose, at best you can do things to track what is the error being maintained. Every individual floating point and decimal op has isolated errors you can think about (much easier in the decimal case), and then you can track to what degree these compound as you do more of your computation.

I am personally still in favor of Decimals as default, as you strictly remove the complications of error due to the sliding exponent, and at _least_ get associativity for addition. I also think that we should build code to track the building up error.

ValarDragon on 4 Nov 2020

So what's wrong with using sdk.Rats in computation but storing the results as fixed precision decimal? Basically we could have the sdk.Dec interface (18 places of precision when serialized) but big.Rat under the hood.

The key component of this can be phrased a bit differently. "If we want 18 places of final correct precision, what if we actually compute over 36 decimal places of precision, and then truncate". I think this is great, and exactly what should be done! IIRC, we actually had a long conversation about this at one point, and marked it #postlaunch. (The idea was to separate 'saved' data types, and 'computation' data types) I'm having trouble finding it on github, maybe it was an in person convo.

However I push back on using 36 decimal places within rationals, for reasons alluded to above for its error model being very weird. (Even weirder than floats!) I believe we should be using decimals still, as it gives the most properties. However, I haven't really justified the 36 decimal places of precision. And actually, you can't justify that number for generic computation. You need more decimal places for the more operations you do. It fundamentally has to be white boxed (or tracked with taint analysis/proper error analysis techniques) per region of the code. Happy to brainstorm on designs for implementing this well.

We definitely should make our decimals able to gracefully have differing precisions (or use a library that does)

ValarDragon on 4 Nov 2020

Thanks for chiming in and sharing all those details @ValarDragon!

So I'm currently thinking that it makes sense to align first and foremost on the General Decimal Arithmetic (GDA) specification which aligns with IEEE-754 2008 and has multiple implementations. GDA specifies how operations should work for both fixed and arbitrary position decimal arithmetic. In terms of choosing a specific precision within GDA, https://en.wikipedia.org/wiki/Decimal128_floating-point_format which specifies 34 digits of precision seems like a logical choice. What are your thoughts on taking that approach?

If we are able to agree that GDA is the correct specification to use, and possibly that decimal128 is the right "general purpose" precision, then choosing a particular implementation with particular performance characteristics becomes a secondary matter. (I did benchmarks on Intel RDFP in Julia btw and it performs quite well comparatively so I know good performance is possible with decimal128.)

aaronc on 4 Nov 2020

I don't agree that we want decimal-based floating points. I fail to see how the variable exponent helps us with any problem we have, it should only introduce more problems AFAICT

ValarDragon on 4 Nov 2020

I don't agree that we want decimal-based floating points. I fail to see how the variable exponent helps us with any problem we have, it should only introduce more problems AFAICT

Well GDA does give us a specification to work with that's gone through robust review and lots of usage. So we have something to base off of for correctness.

Also, I think what you're suggesting would basically be GDA with Emin and Emax both fixed to -18 or -34 and arbitrary coefficient size.

aaronc on 5 Nov 2020

Just chiming in, the issue here is that fundamentally commutativity isn’t
working so a wide range of arithmetic with the sdk.Dec type and I observed
that it happens with any fraction that has precision of 18+ decimal points
in order to successfully be represented eg (12/11 * 11) != (12*11 / 11)
which are small values and there is a very wide range that aren’t accurate.
Which brings me to another question: why did we choose big.Int as the
underlying/intermediate type?

On Wed, Nov 4, 2020 at 5:15 PM Aaron Craelius notifications@github.com
wrote:

I don't agree that we want decimal-based floating points. I fail to see
how the variable exponent helps us with any problem we have, it should only
introduce more problems AFAICT

Well GDA does give us a specification to work with that's gone through
robust review and lots of usage. So we have something to base off of for
correctness.

Also, I think what you're suggesting would basically be GDA with Emin and
Emax both fixed to -18 or -34 and arbitrary coefficient size.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/cosmos/cosmos-sdk/issues/7773#issuecomment-722063492,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ABFL3VZ66XLRXTSTHXAAWY3SOH4EPANCNFSM4TG4A4OA
.

odeke-em on 5 Nov 2020

I would just note @odeke-em, that commutativity fundamentally can't be guaranteed as @ValarDragon has pointed out. But, the issue is moreover that it actually never works right even for the easy cases... I think the underlying problem is more related to rounding behavior than big.Int which apd uses as well.

aaronc on 5 Nov 2020

Thanks Aaron! Ideally it should though for most cases with big number
libraries.
Yeah rounding behaviour came up too and there was a particular case in
which the proper digits could be represented in 19 decimal points but we
rounded down in (12/7 * 7) which this produced 12.0000000000000002 or so
(typing from my phone and not close to my machine for a proper citation)

Let me examine how apd handles this.

On Thu, Nov 5, 2020 at 9:12 AM Aaron Craelius notifications@github.com
wrote:

I would just note @odeke-em https://github.com/odeke-em, that
commutativity fundamentally can't be guaranteed as @ValarDragon
https://github.com/ValarDragon has pointed out. But, the issue is
moreover that it actually never works right even for the easy cases... I
think the underlying problem is more related to rounding behavior than
big.Int which apd uses as well.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/cosmos/cosmos-sdk/issues/7773#issuecomment-722513964,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ABFL3V3OW33H3RQ2OH3UBKDSOLMIVANCNFSM4TG4A4OA
.

odeke-em on 5 Nov 2020

In general I would like to better understand all the uses of decimals in the SDK - particularly where division and multiplication are used and rounding happens. I guess I can review that all myself, but ideally we have a summarized list that outlines goals and potential problems.

aaronc on 5 Nov 2020

The core areas where this happens is distribution & slashing.

alexanderbez on 5 Nov 2020

Aaron, I started compiling that list and should have a report by end of the
day or early tomorrow, but was also trying to understand the entire problem
and root causes.

On Thu, Nov 5, 2020 at 9:31 AM Aleksandr Bezobchuk notifications@github.com
wrote:

The core areas where this happens is distribution & slashing.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/cosmos/cosmos-sdk/issues/7773#issuecomment-722525101,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ABFL3V7AH2U46PMLC5ZR6KLSOLOP3ANCNFSM4TG4A4OA
.

odeke-em on 5 Nov 2020

👍2

I was thinking about it independently when @aaronc mentioned about using arbitrary decimal numbers.
Multiplication and division operations are known to be inaccurate.
Let me enumerate the motivations I have in head when talking about designing a right type for represent important numerical values:

People like to use base units and fraction decimals. So definitely supporting a decimal number is very pro UX
Floats are known to not work in any numerical representation. Integers are very predictable and standardize. If we want to have a good precision then the only solution is symbolic representation and symbolic operations.
Having few standards will be a disaster. Even if we will have both sdk.Int and some Dec it will be confusing.

Looking from an experience with other blockchains, I see that using integers is the lowest hanging fruit. However this is not very user friendly. Because we will end up with big numbers with different numerical denomination (eg some denominated in 1e2 others in 1e9 ...) and looking at that numbers is a nightmare. As a user I really want to work with decimal numbers and don't thing about the denomination.

In Ethereum, 18 decimals became kind of a standard. Even some token standards require that all tokens have 18 fractional decimal digits.

With all that in mind, the solution I would see that it's both easy and flexible is:

Do all operations using big.Int. Everything is big.Int and everything has the same factional decimal places, let's call it decimals
To accommodate divisibility requirements form token creators we will need to introduce one more parameter: granularity. It must hold that: 0 <= granularity <= decimals. So, granularity=2 means that user can trade / operate with units of minimal value = 0.01. However the arithmetic will be done
efficient serialization. Since the data structure has only 2 values: granularity and value, we don't need protobuf or any other general serialization. We can just use a little endian bytes representation and concatenate the bytes. granularity will be either int8 or int16 (depending what we will pick for decimals).
developers will need to learn best practices as the have to do with any other blockchain (EVM, Rust + wasmer , ... ) Examples:
- always start with multiplication and then division
- for certain cases firstly multiply by a 10^x number (eg if you want to multiply by 0.2).

If we choose decimals to something low, eg 9, then we can use a very efficient number representation like int128. For bigger numbers we can use int128 as a base and convert to checked int256 whenever we do multiplication. Or leave it to user - this is in fact what Rust on WASM smart contracts are doing.

robert-zaremba on 6 Nov 2020

I've checked apd. Their operational model is similar to the one described above: each value is represented as a pair (big.Int, exponent) (and two other fields, not important here). Arithmetic is done on the big.Int. Firstly they make sure they operate on the same exponent - so upscale the value with a lower exponent, and do normal operation on big.Int only.

robert-zaremba on 6 Nov 2020

Alright, so firstly sorry for the delays, finally finished contract logistics at the end of last week, but now we are good for a near-long term. As for the investigation, here goes

TL;DR: let's use decimal128 or just not perform truncations with sdk.Dec (which does so much work for every API call) yet has low precision and quickly introduces a high error rate: 18 digits of precision quickly run out as soon as you have say 10.XXXXXXXXX; even just (12/7)*7 != (12*7)/7, but for most operations, we'll see a numerator much smaller than the denominator

Background

The usages of sdk.Dec heavily invoke the (Dec).Quo* methods. For every invocation of (Dec).Quo, there is a precision check and rounding that happens, to fit results in 18 digits of precision.

The usage of Dec.Quo happens for a denominator of a usually smaller value (e.g. staking amount, a small portion of all shares), and the denominator of the larger value (e.g. total shares), thus we'll usually be dealing with values less than 1. For example with this transaction Fee 600uatom (0.0006ATOM) delegated 1.5ATOM to Binance, and total ownership was 32.4994ATOM. Calculating the rewards resulted in a 0.XXXXX result

Some sites

DecCoins.MulDecTruncate —> Dec.MulTruncate —> chopPrecisionAndTruncate
x/distribution/keeper.

x/staking/types:

shares.MulInt(…).QuoTruncate(…)
shares.MulInt(…).Quo(…)
shares.MulInt(…).QuoRoundUp(…)
.Add(…)
.Sub(…)
tx.pb.go —> CommissionRate, CommissionRates.(DelegatorShares, Shares)

x/slashing/simulation:

sdk.NewDec(…)—> r.Intn(200) + 1
sdk.NewDecWithPrec(…)

x/staking/keeper:

Keeper.BondedRatio—>QuoInt(stakeSupply)

x/mint/types:

Minter struct as (Inflation, AnnualProvisions)
StakingKeeper interface

x/distribution/keeper:

(Keeper).AllocateTokensToValidator —> tokens.MulDec(val.GetCommission())
(Keeper).calculateDelegationRewardsBetween —> truncates rewards
(Keeper).CalculateDelegationRewards —> truncates

Cons

There are quite many fractional values whose results can't fit in 18 digits of precision. As soon as a single value is not 0, even worse when greater than |10|, we quickly lose precision
Given the truncation performed by Quo, this means that the folks receiving rewards will always be skimmed, less than what they were owed mathematically, hence this is technically unfair to participants.
The code performs a whole lot of truncations, a bunch of big.Int multiplications, and this is magic that is hidden from the API surface nor documented. This is a potential DOS complexity vector since we don't know and haven't tested nor fuzzed complexity paths

Pros of keeping this library

Truncation (from (sdk.Dec).QuoTruncate) is used as a good measure to ensure withdrawals of rewards, stakes, don't consume more values than owed.
From a survey of code sites in the cosmos-sdk, I didn't directly see a site which has high precision losses like the sequence of division first then a multiplication; that got reported in https://github.com/cosmos/cosmos-sdk/issues/7640 and fixed by https://github.com/cosmos/cosmos-sdk/pull/7641. The rest seem to work normally

Suggestions

As we've seen in this issue, suggestions to use decimal128 (it has 34 decimal places) by @aaronc hold: I did calculations with 30 "digits of precision" and a calculation like (12/7) * 7 vs (12*7) / 7 still didn't yield a commutative result, but 34 yields a good result. We need to use in computations, as much precision and then when serializing results, truncate

odeke-em on 10 Nov 2020

Having a fixed precision is effectively what we are discussing above. If we will choose a fixed precision for both fractional part and the natural part, then, as suggested in my comment above we will need to have clear guidance how to do math and have int256 type (or falling back to big.Int) for doing multiplications of big numbers.

robert-zaremba on 10 Nov 2020

Just saw this when investigating what was in v0.40.1.
Noting that this is a breaking change, and unless it makes it in v0.40.0 (very unlikely) needs to wait til v0.41.

That said, I do think it is very important and difficult to get a good fixed point representation that does everything you want to do with floats, but without non-determinism. Just shows a lot of thought went into floating point ops.