sdk.Dec gets some numerical calculations wrong, consistently: #7640. We should have a correct approach to numerical computation in the SDK.
sdk.Dec was created as a custom "big decimal" type in #1807, apparently motivated primarily by performance. Currently, sdk.Dec gets certain calculations wrong every time. For instance, it never seems to correctly compute that (x / y) * y == x (#7640, #7772). It isn't based on any clear, well-researched specification as far as I can tell. While I applaud the authors' effort at a custom decimal arithmetic implementation and find the algorithm relatively concise and elegant, it's hard to know all the details that have been figured out by IEEE, IBM, etc. without a serious research project.
sdk.Rat is probably a good alternative and isn't actually slower in these albeit very limited tests 🤷 : https://github.com/cosmos/cosmos-sdk/issues/7640#issuecomment-720184641. The options are big.Rat or big.Float or something like https://github.com/cockroachdb/apd. Whatever we choose, if it produces different results for X with precision Y in the order A*B/A vs A/B*A, developers need to know that. Maybe @ValarDragon and/or @rigelrozanski could weigh in here.
ref: https://github.com/cosmos/cosmos-sdk/issues/4399 (arbitrary-precision arithmetic yielding a fixed-precision result). This is needed.
big.Float will probably not work because of non-determinism; there may be some ugly way to finagle determinism to a certain number of digits, but this seems hacky/inadvisable. Ultimately I stand by my earlier thoughts in that we just need to introduce arbitrary-precision decimal types which are clearly and explicitly defined at the instantiation of the decimal type, i.e. we'd have a decimal interface instantiated with a decimal struct created with "X" level of precision.
Over and out, GL. Thanks for pinging me @ebuchman - also yeah, @ValarDragon has a great grasp on this issue and has worked on the decimal type.
- sdk.Rat was the original approach before the introduction of sdk.Dec... it quickly led to enormous fractions whose numerators and denominators used huge amounts of data and needed to be trimmed down, virtually recreating a decimal type (except much more confusing, because it was an arbitrary-base fraction rather than the base-ten fraction that all decimals are when represented as fractions).
So what's wrong with using sdk.Rats in computation but storing the results as fixed-precision decimals? Basically we could keep the sdk.Dec interface (18 places of precision when serialized) but use big.Rat under the hood.
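To make that concrete, here is a minimal sketch of the shape being proposed, with hypothetical names (exact big.Rat arithmetic internally, 18 decimal places only at the serialization boundary):

```go
package main

import (
	"fmt"
	"math/big"
)

// Dec here is a hypothetical sketch, not the real sdk.Dec: arithmetic
// is exact via big.Rat; 18 decimal places apply only when serializing.
type Dec struct{ r *big.Rat }

func NewDec(num, den int64) Dec { return Dec{big.NewRat(num, den)} }

func (d Dec) Quo(o Dec) Dec { return Dec{new(big.Rat).Quo(d.r, o.r)} }
func (d Dec) Mul(o Dec) Dec { return Dec{new(big.Rat).Mul(d.r, o.r)} }

// String renders 18 places after the decimal point, like sdk.Dec does.
func (d Dec) String() string { return d.r.FloatString(18) }

func main() {
	x, y := NewDec(12, 1), NewDec(7, 1)
	// exact under the hood, so the identity holds before serialization
	fmt.Println(x.Quo(y).Mul(y)) // 12.000000000000000000
}
```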
big.Float will probably not work because of non-determinism; there may be some ugly way to finagle determinism to a certain number of digits, but this seems hacky/inadvisable.
Why would big.Float introduce non-determinism? It has configurable precision and rounding mode. AFAIK, the reason float64 is considered non-deterministic is that rounding modes can vary from machine to machine; big.Float shouldn't have this problem. Or am I missing something?
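For what it's worth, big.Float does let you pin both the mantissa precision (in bits) and the rounding mode, so within a single implementation the output below is fully specified (a standard-library sketch):

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	// pin precision (bits of mantissa) and rounding mode explicitly
	x := new(big.Float).SetPrec(128).SetMode(big.ToNearestEven).SetInt64(12)
	y := new(big.Float).SetPrec(128).SetMode(big.ToNearestEven).SetInt64(7)

	q := new(big.Float).SetPrec(128).SetMode(big.ToNearestEven).Quo(x, y)

	// with prec and mode fixed, these digits are fully specified;
	// the determinism question is about implementations, not the spec
	fmt.Println(q.Text('f', 40))
}
```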
Ultimately I stand by my earlier thoughts in that we just need to introduce arbitrary-precision decimal types which are clearly and explicitly defined at the instantiation of the decimal type, i.e. we'd have a decimal interface instantiated with a decimal struct created with "X" level of precision.
So something like a proper decimal128 implementation could take care of this.
So what's wrong with using sdk.Rats in computation but storing the results as fixed-precision decimals? Basically we could keep the sdk.Dec interface (18 places of precision when serialized) but use big.Rat under the hood.
I think this is exactly what we want.
It is true that big.Rat scales poorly as you perform more and more operations. This replicates @rigelrozanski's findings: https://github.com/cosmos/cosmos-sdk/pull/7772#issuecomment-720678178
Based on that, why not big.Float? Is it really non-deterministic??
Glad this is getting looked into again! Totally agreed that there wasn't enough analysis in #1807 of the exact error models or of how the errors get tracked. (Though in defense of that issue and the subsequent work, it seems clear to me that it had only significant benefits over rationals, in both error handling and performance.) However, I also believe that #7772 presents an oversimplification of what is going on and of the guarantees you get. Precision handling is super tricky! A great reference is the worked examples for floats at https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems
I'd like to give some context on the error models, at least so we have some nice shared state. (The conclusions in #7772 didn't seem right to me)
Floats don't simplify that much with regard to errors; they only give you more fundamental error causes than Decimal does. None of Decimal, Float, or Rat will guarantee that either (x / y) * y == x or (x * y) / y == x in all cases. This is intrinsic to having rounding error. Also, with floats and rationals it's even _worse_: (a + b) + c is not guaranteed to equal a + (b + c). Decimal is designed to fix precisely that problem, but it in turn changes the error model for divisions. Do note that multiplication is not associative on _any_ of these types, e.g. (x * y) * z may not equal x * (y * z).
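For illustration, both failures are easy to reproduce with plain float64 in Go:

```go
package main

import "fmt"

func main() {
	// (x / y) * y == x fails under rounding
	x, y := 1.0, 49.0
	fmt.Println(x/y*y == x) // false
	fmt.Println(x / y * y)  // 0.9999999999999999

	// addition is not associative either
	a, b, c := 0.1, 0.2, 0.3
	fmt.Println((a+b)+c == a+(b+c)) // false
	fmt.Println((a+b)+c, a+(b+c))   // 0.6000000000000001 0.6
}
```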
+1 to big.Rat's performance and error modes being extremely bad. I believe the benchmarks in #7772 don't accurately reflect the situation: AFAICT they compare correctness and speed against 64-bit arithmetic. The error model gets really wack once you start hitting the max bit length, as you truncate based on a highly variable numerator/denominator bit length (at which point you can no longer be comparing against 64-bit arithmetic). And the performance gets quite bad, due to the GCDs involved.
Also, if you plan on using normal (base-2) floats, you get _incredibly_ weird situations for 'normal' decimal values.
The problems you are experiencing won't go away no matter which data type you choose; at best you can do things to track the error being accumulated. Every individual floating-point or decimal op has isolated errors you can reason about (much easier in the decimal case), and then you can track the degree to which these compound as you do more of your computation.
I am personally still in favor of Decimals as the default, as you strictly remove the complications of error due to the sliding exponent, and at _least_ get associativity for addition. I also think that we should build code to track the accumulating error.
So what's wrong with using sdk.Rats in computation but storing the results as fixed-precision decimals? Basically we could keep the sdk.Dec interface (18 places of precision when serialized) but use big.Rat under the hood.
The key component of this can be phrased a bit differently: "If we want 18 places of final correct precision, what if we actually compute over 36 decimal places of precision, and then truncate?" I think this is great, and exactly what should be done! IIRC, we actually had a long conversation about this at one point and marked it #postlaunch. (The idea was to separate 'saved' data types from 'computation' data types.) I'm having trouble finding it on GitHub; maybe it was an in-person convo.
However, I push back on using 36 decimal places within rationals, for the reasons alluded to above: their error model is very weird (even weirder than floats!). I believe we should still be using decimals, as they give the most properties. However, I haven't really justified the 36 decimal places of precision, and actually you can't justify that number for generic computation: you need more decimal places the more operations you do. It fundamentally has to be white-boxed (or tracked with taint-analysis/proper error-analysis techniques) per region of the code. Happy to brainstorm on designs for implementing this well.
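To make the "compute wide, store narrow" pattern concrete, here's a sketch using the cockroachdb/apd library mentioned earlier (the 36/18 split and the exact API usage are illustrative, not a recommendation):

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/apd/v2"
)

func main() {
	// do intermediate math with 36 significant digits of headroom
	ctx := apd.BaseContext.WithPrecision(36)

	twelve, seven := apd.New(12, 0), apd.New(7, 0)

	q := new(apd.Decimal)
	ctx.Quo(q, twelve, seven) // 12/7 at 36 digits

	r := new(apd.Decimal)
	ctx.Mul(r, q, seven) // back-multiply

	// round to the 18 fractional places we'd actually store
	stored := new(apd.Decimal)
	ctx.Quantize(stored, r, -18)
	fmt.Println(stored) // 12.000000000000000000
}
```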
We definitely should make our decimals able to gracefully have differing precisions (or use a library that does)
Thanks for chiming in and sharing all those details @ValarDragon!
So I'm currently thinking that it makes sense to align first and foremost on the General Decimal Arithmetic (GDA) specification, which aligns with IEEE 754-2008 and has multiple implementations. GDA specifies how operations should work for both fixed- and arbitrary-precision decimal arithmetic. In terms of choosing a specific precision within GDA, decimal128 (https://en.wikipedia.org/wiki/Decimal128_floating-point_format), which specifies 34 digits of precision, seems like a logical choice. What are your thoughts on taking that approach?
If we are able to agree that GDA is the correct specification to use, and possibly that decimal128 is the right "general purpose" precision, then choosing a particular implementation with particular performance characteristics becomes a secondary matter. (I did benchmarks on Intel RDFP in Julia, btw, and it performs quite well comparatively, so I know good performance is possible with decimal128.)
I don't agree that we want decimal-based floating points. I fail to see how the variable exponent helps us with any problem we have, it should only introduce more problems AFAICT
I don't agree that we want decimal-based floating points. I fail to see how the variable exponent helps us with any problem we have, it should only introduce more problems AFAICT
Well GDA does give us a specification to work with that's gone through robust review and lots of usage. So we have something to base off of for correctness.
Also, I think what you're suggesting would basically be GDA with Emin and Emax both fixed to -18 or -34 and arbitrary coefficient size.
Just chiming in: the issue here is that, fundamentally, commutativity isn't working for a wide range of arithmetic with the sdk.Dec type, and I observed that it happens with any fraction that needs 18+ decimal points of precision to be represented, e.g. (12/11 * 11) != (12*11 / 11). These are small values, and there is a very wide range of values that aren't accurate. Which brings me to another question: why did we choose big.Int as the underlying/intermediate type?
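For reference, the non-identity @odeke-em describes can be reproduced with a few lines against sdk.Dec (illustrative snippet, written from memory of the sdk.Dec API):

```go
package main

import (
	"fmt"

	sdk "github.com/cosmos/cosmos-sdk/types"
)

func main() {
	twelve, eleven := sdk.NewDec(12), sdk.NewDec(11)

	a := twelve.Quo(eleven).Mul(eleven) // divide first: rounding error sticks
	b := twelve.Mul(eleven).Quo(eleven) // multiply first: exact division

	fmt.Println(a.Equal(b)) // false
	fmt.Println(a, b)       // 11.999999999999999999 12.000000000000000000
}
```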
I would just note, @odeke-em, that commutativity fundamentally can't be guaranteed, as @ValarDragon has pointed out. But the issue is moreover that it never works right, even for the easy cases... I think the underlying problem is more related to rounding behavior than to big.Int, which apd uses as well.
Thanks Aaron! Ideally it should hold for most cases with big-number libraries, though. Yeah, rounding behaviour came up too: there was a particular case in which the proper digits could be represented in 19 decimal points but we rounded down, and (12/7 * 7) produced 12.0000000000000002 or so (typing from my phone and not close to my machine for a proper citation). Let me examine how apd handles this.
In general I would like to better understand all the uses of decimals in the SDK, particularly where division and multiplication are used and where rounding happens. I guess I can review that all myself, but ideally we'd have a summarized list that outlines goals and potential problems.
The core areas where this happens are distribution & slashing.
Aaron, I started compiling that list and should have a report by end of the day or early tomorrow, but was also trying to understand the entire problem and root causes.
I was thinking about this independently when @aaronc mentioned using arbitrary-precision decimal numbers.
Multiplication and division operations are known to be inaccurate.
Let me enumerate the motivations I have in mind when thinking about how to design the right type to represent important numerical values:
If we mix sdk.Int and some Dec type, it will be confusing. Looking at the experience of other blockchains, I see that using integers is the lowest-hanging fruit. However, this is not very user friendly, because we end up with big numbers under different numerical denominations (e.g. some denominated in 1e2, others in 1e9, ...), and looking at those numbers is a nightmare. As a user I really want to work with decimal numbers and not think about the denomination.
In Ethereum, 18 decimals became kind of a standard. Even some token standards require that all tokens have 18 fractional decimal digits.
With all that in mind, the solution I see as both easy and flexible is:
- Use big.Int. Everything is big.Int, and everything has the same number of fractional decimal places; let's call it decimals.
- Add a granularity, where it must hold that: 0 <= granularity <= decimals. So, granularity=2 means that a user can trade / operate with units of minimal value = 0.01, while the arithmetic is still carried out at the full decimals precision.
- If we choose decimals to be something low, e.g. 9, then we can use a very efficient number representation like int128. For bigger numbers we can use int128 as a base and convert to a checked int256 whenever we do multiplication. Or leave it to the user - this is in fact what Rust WASM smart contracts are doing.
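A minimal sketch of that shared-decimals model (toy code; the names and the truncation choice are mine):

```go
package main

import (
	"fmt"
	"math/big"
)

// toy model: every amount is a big.Int carrying the same number of
// implied fractional digits (value = units * 10^-decimals)
const decimals = 18

var scale = new(big.Int).Exp(big.NewInt(10), big.NewInt(decimals), nil)

// mul multiplies two scaled amounts; the raw product has 2*decimals
// fractional digits, so one scale factor is divided back out
func mul(a, b *big.Int) *big.Int {
	p := new(big.Int).Mul(a, b)
	return p.Quo(p, scale) // truncates toward zero
}

func main() {
	// 1.5 and 2.0 in scaled representation
	oneAndHalf := new(big.Int).Div(new(big.Int).Mul(big.NewInt(15), scale), big.NewInt(10))
	two := new(big.Int).Mul(big.NewInt(2), scale)

	fmt.Println(mul(oneAndHalf, two)) // 3000000000000000000 (= 3.0)
}
```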
I've checked apd. Their operational model is similar to the one described above: each value is represented as a pair (big.Int coefficient, exponent) (plus two other fields, not important here). Arithmetic is done on the big.Int: first they make sure they operate at the same exponent, upscaling the coefficient of the value with the higher exponent, and then they do the normal operation on the big.Int only.
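A toy illustration of that alignment step for addition (my simplification, not apd's actual code):

```go
package main

import (
	"fmt"
	"math/big"
)

// dec mimics apd's representation: value = coeff * 10^exp
type dec struct {
	coeff *big.Int
	exp   int32
}

// add aligns both operands to the smaller exponent by upscaling the
// coefficient of the operand with the larger exponent, then adds the
// big.Int coefficients directly
func add(a, b dec) dec {
	ten := big.NewInt(10)
	for a.exp > b.exp {
		a.coeff = new(big.Int).Mul(a.coeff, ten)
		a.exp--
	}
	for b.exp > a.exp {
		b.coeff = new(big.Int).Mul(b.coeff, ten)
		b.exp--
	}
	return dec{new(big.Int).Add(a.coeff, b.coeff), a.exp}
}

func main() {
	x := dec{big.NewInt(123), -2} // 1.23
	y := dec{big.NewInt(4), -1}   // 0.4
	fmt.Println(add(x, y))        // {163 -2}, i.e. 1.63
}
```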
Alright, so firstly, sorry for the delays; I finally finished contract logistics at the end of last week, so now we are good for the near-to-long term. As for the investigation, here goes.
TL;DR: let's use decimal128, or at least stop performing truncations with sdk.Dec, which does a lot of work for every API call yet has low precision and quickly introduces a high error rate: 18 digits of precision quickly run out as soon as you have, say, 10.XXXXXXXXX; even just (12/7)*7 != (12*7)/7. For most operations, though, we'll see a numerator much smaller than the denominator.
The usages of sdk.Dec heavily invoke the (Dec).Quo* methods. For every invocation of (Dec).Quo there is a precision check and a rounding step, to fit results within 18 digits of precision.
Dec.Quo is usually invoked with a numerator of a smaller value (e.g. a staking amount, a small portion of all shares) and a denominator of a larger value (e.g. total shares), thus we'll usually be dealing with values less than 1. For example, with this transaction: fee 600uatom (0.0006 ATOM), 1.5 ATOM delegated to Binance, and total ownership of 32.4994 ATOM; calculating the rewards resulted in a 0.XXXXX result.
The surveyed call sites:
- x/staking/types
- x/slashing/simulation
- x/staking/keeper
- x/mint/types
- x/distribution/keeper
Once values grow beyond |10|, we quickly lose precision. Truncation (via (sdk.Dec).QuoTruncate) is used as a good measure to ensure withdrawals of rewards and stakes don't consume more value than owed.
From a survey of code sites in the cosmos-sdk, I didn't directly see a site with high precision loss like a division followed by a multiplication; that pattern was reported in https://github.com/cosmos/cosmos-sdk/issues/7640 and fixed by https://github.com/cosmos/cosmos-sdk/pull/7641. The rest seem to work normally.
As we've seen in this issue, @aaronc's suggestion to use decimal128 (which has 34 decimal digits) holds up: I did calculations with 30 "digits of precision" and a calculation like (12/7) * 7 vs (12*7) / 7 still didn't yield matching results, but 34 digits yield a good result. We need to use as much precision as we can in computations, and then truncate when serializing results.
Having a fixed precision is effectively what we are discussing above. If we choose a fixed precision for both the fractional and the integral part, then, as suggested in my comment above, we will need clear guidance on how to do the math, and an int256 type (or a fallback to big.Int) for multiplications of big numbers.
Just saw this when investigating what was in v0.40.1.
Noting that this is a breaking change, and unless it makes it into v0.40.0 (very unlikely) it needs to wait until v0.41.
That said, I do think it is very important, and difficult, to get a good fixed-point representation that does everything you want to do with floats, but without non-determinism. It just shows how much thought went into floating-point ops.