Rust: Lint against float literals like 4444444444_f32 that "overflow" their mantissa

Created on 1 Jul 2018 · 15 Comments · Source: rust-lang/rust

4444444444_f32 is actually the number 4444444700 because f32 doesn't have enough bits to represent 4444444444. This seems like an obvious thing we should be linting against, but http://play.rust-lang.org/?gist=3e360c4ab4deb3ef9cc16b9c9a084f6e&version=stable&mode=debug&edition=2015 emits no warnings today.
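A quick way to observe the rounding (a minimal sketch of my own, separate from the playground link above):

fn main() {
    let x = 4444444444_f32;
    // Display prints the shortest decimal that round-trips to the stored value,
    // which is 4444444700 rather than the literal we wrote:
    println!("{}", x);
    // Widening to f64 is exact, so this shows the value actually stored: 4444444672
    println!("{}", x as f64);
}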

I'm not sure if this should be a new lint or part of the existing overflowing_literals lint. It obviously "feels" like the same sort of issue, and it's hard to imagine a situation where you'd want to allow/deny/forbid this and not the other kinds of overflowing_literals. But it's also clearly not "overflowing" in the usual sense of being an unrepresentable value greater than MAX or less than MIN, rather it's an unrepresentable value in-between multiple representable values.

(thought of this when reading https://github.com/rust-lang/rust/issues/51534#issuecomment-396917857)

A-lint C-feature-request T-lang

All 15 comments

I think it's known and expected that many decimal numbers cannot be exactly represented by floating-point numbers, and linting against every single literal that gets rounded to the closest floating-point number would be inconvenient and unhelpful.

It doesn't seem obvious to me that it's a common problem for people to both want to exactly represent a decimal and be unaware that IEEE-754 floating-point cannot do so in a large number of cases; people who are aware take their own precautions to avoid rounding issues.
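For instance, a literal as ordinary as 0.2 already gets rounded, so a lint that fired on every rounded literal would warn on almost everything (a small illustration, nothing more):

fn main() {
    // 0.2 is not exactly representable in either f32 or f64; each type rounds it
    // to a different nearby value, so the widened f32 differs from the f64:
    assert_ne!(0.2_f32 as f64, 0.2_f64);
}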

It doesn't seem obvious to me that it's a common problem for people

It's arguably not advisable to rely on the programmer's knowledge to avoid logical errors like this. The sum total of such 'obvious' things is what makes C so terrible at memory management.

I propose that the lint use algorithmic checks and provide warnings as suggested by @Ixrec, based on the implementations researched here:

https://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2/

While it's true that it's a reasonably common error to write a decimal number without realizing it isn't being represented exactly, I'm not sure how to turn that insight into a lint that has a small enough false positive rate to be useful. While many people may be surprised by 1e12f32 not being exactly one trillion, it's ludicrous to write that as 999999995904 instead. This gets even worse when we also consider fractional parts (and we really should, since they're an even more common source of confusion and mistakes than large integers). When I want to add 10% to a floating point quantity, it doesn't help to write that as * 1.10000002384185791015625. It won't make people who don't know about floating point learn more, and it won't be very effective at reminding those who do; it'll just weird everyone out.
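Concretely, for the two examples above (a small illustration, not a lint proposal):

fn main() {
    // The nearest f32 to one trillion is 999_999_995_904:
    assert_eq!(1e12_f32 as f64, 999_999_995_904.0);
    // And the f32 nearest to 1.1 is exactly 1.10000002384185791015625,
    // which is what you would have to write to be "exact":
    assert_eq!(format!("{:.23}", 1.1_f32), "1.10000002384185791015625");
}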

@rkruppe, would it not be sufficient to pull the value being cast as a higher-bit representation and compare it bitwise with its requested type? A mismatch equals a rounding error, and this gets flagged as a warning.

@DrizztVD I'm not sure I understand, can you elaborate?

Note that clippy already lints this:

warning: float has excessive precision
 --> src/main.rs:2:34
  |
2 |     println!("Hello, world! {}", 4444444444_f32 );
  |                                  ^^^^^^^^^^^^^^ help: consider changing the type or truncating it to: `4_444_444_700`
  |
  = note: #[warn(excessive_precision)] on by default
  = help: for further information visit https://rust-lang-nursery.github.io/rust-clippy/v0.0.211/index.html#excessive_precision

this seems like an uplift candidate

@rkruppe I'm hesitant to elaborate because the article I linked contains too much good info to do so succinctly without repeating it.

@oli-obk I hope it is uplifted. I've done some DSP work before and the math implementation on the C++ compiler was horrendous because of these precision losses. You'll find people just assign doubles everywhere to try and push the problem to the background instead of designing around it - and then they run out of embedded memory and slow things down on 32-bit microarchitectures. A comprehensive lint on data losses due to type casting would have helped a lot. I remember running into this on Python as well last year, actually... spent two hours looking for my mistake only to find it's technically a compiler (interpreter) mistake.

@DrizztVD I am quite familiar with the subjects that article describes. I'm not asking for a primer on the problem statement; I don't understand what specific algorithm you propose for solving it. For example, I don't know what it means to "pull" a value "as a higher-bit representation", and if I blindly guess that you mean parsing an f32 literal with more precision than f32 has (how much?), then I don't know how comparing that with the literal parsed as f32 will address my concern: it will sometimes not warn about decimal literals that have to be rounded to fit into f32, but most of the time it will.
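Spelled out, that guess would be something like the following (my reading, with a hypothetical literal_is_rounded helper; not a worked-out lint):

// Parse the literal text at a higher precision (f64), parse it at the requested
// type (f32), and flag a mismatch as a rounding error.
fn literal_is_rounded(text: &str) -> bool {
    let wide: f64 = text.parse().unwrap();   // "higher-bit representation"
    let narrow: f32 = text.parse().unwrap(); // the requested type
    (narrow as f64) != wide
}

fn main() {
    assert!(literal_is_rounded("4444444444")); // rounds to a different value, so it warns
    assert!(!literal_is_rounded("444444"));    // exactly representable, no warning
    assert!(literal_is_rounded("1.1"));        // but it also fires on everyday fractions
}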

The clippy lint goes by {f32,f64}::DIGITS, i.e. it warns if you write more decimal digits than can "usually" fit into a float, but it doesn't account for the precision varying by scale (so it's wrong about subnormals), and it uses "does parsing->formatting reproduce the literal exactly?" as guidance, so it misses some things because float formatting prefers fewer digits (e.g., it doesn't fire on 1_000_000_000_000f32, which is actually represented as 15258789 * 2^16 = 999999995904). It's also otherwise conservative; e.g., it doesn't say anything if the literal ends in .0. So I'm not sure it really addresses this issue as written.
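Roughly, that heuristic amounts to something like this (a simplified sketch, not clippy's actual implementation):

// Warn only if the literal has more digits than f32::DIGITS *and* formatting the
// parsed value does not reproduce the original digits.
fn excessive_precision_f32(text: &str) -> bool {
    let digits = text.chars().filter(|c| c.is_ascii_digit()).count() as u32;
    if digits <= f32::DIGITS {
        return false;
    }
    let value: f32 = text.parse().unwrap();
    format!("{}", value) != text
}

fn main() {
    assert!(excessive_precision_f32("4444444444"));     // formats back as 4444444700
    assert!(!excessive_precision_f32("1000000000000")); // formats back as written, so no warning
}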

That is not to say I don't want this lint in rustc -- quite the opposite, it doesn't appear to have false positives and it does catch some interesting cases, so it's a nice candidate. And per my previously stated concern, I'm not sure whether strictly warning about any "overflow of mantissa" is even desirable. I just wanted to make clear what the lint does and doesn't do.

it doesn't appear to have false positives and it does catch some interesting cases
[...]
I just wanted to make clear what the lint does and doesn't do.

I think this is an important point regarding the presentation of the lint, if it were implemented in the compiler. If the lint has false negatives, you don't want people to assume that a literal it stays silent on is exactly represented when it isn't, or you potentially cause even more confusion than not having the lint at all.

@rkruppe I didn't link the article to explain the problem, but to explain the solution. I thought this part was reasonably clear:

My test code calculates the desired 7-digit number using double precision math, assigns it to a float, and then prints the float and the double to 7 digits of precision. The printing is assumed to use correct rounding, and if the results from the float and the double don't match then we know we have a number that cannot be uniquely identified/represented as a float.

This means that using a type in a way that attempts to extract more precision than that type can guarantee is an error - and an error made often enough to warrant a warning. Based on the linked article, it appears that a lint can, in fact, place a bound that mathematically ensures subsequent operations with the value do not assume more precision than can be guaranteed.
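In Rust terms, the article's test might look roughly like this (my own translation, using 7 significant digits as in the article):

// Format the double and the float to 7 significant digits and compare; a mismatch
// means the value cannot be carried at that precision by an f32.
fn exceeds_f32_precision(value: f64) -> bool {
    format!("{:.6e}", value) != format!("{:.6e}", value as f32)
}

fn main() {
    assert!(exceeds_f32_precision(4444444444.0)); // 4.444444e9 vs 4.444445e9
    assert!(!exceeds_f32_precision(444444.0));    // both print 4.444440e5
}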

Regarding your point about 1_000_000_000_000f32, it should be noted that the convention in physics is that such a number has been rounded to one significant figure - meaning that no warning should be produced, because the programmer is ostensibly not interested in a high amount of precision. Trying to write 999999995905_f32 should, however, clearly be marked as problematic, since the programmer is evidently trying to use 12 digits of precision when only 6 can be guaranteed.

Hence, in this case, 444_444_f32 would be fine, but the mathematical bound on precision says that only 6 significant figures can be accurately supported, so 4_444_444_f32 gets flagged because any subsequent operation would be assumed to be correct up to 7 digits, which it is not.
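A sketch of that significant-figure bound for integer literals (assuming trailing zeros are not significant, per the convention above; fractional literals would need separate handling, since trailing zeros after a decimal point do count):

fn significant_figures(text: &str) -> u32 {
    // Integer literals only: strip underscores, then leading and trailing zeros.
    let digits: String = text.chars().filter(|c| c.is_ascii_digit()).collect();
    digits.trim_start_matches('0').trim_end_matches('0').len() as u32
}

fn main() {
    assert!(significant_figures("444_444") <= f32::DIGITS);           // fine
    assert!(significant_figures("4_444_444") > f32::DIGITS);          // flagged
    assert!(significant_figures("1_000_000_000_000") <= f32::DIGITS); // rounded value, no warning
}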

The implementation aims to safeguard the programmer from inserting a value of high precision only to have the compiler silently fall back to a lower precision, thereby throwing away information needed in subsequent operations, based on the accepted significant-figure rules taught in chemistry/physics: http://chemistry.bd.psu.edu/jircitano/sigfigs.html

Making the compiler aware of these accepted conventions would go a long way toward optimising calculation lengths. This is all the more relevant when looking at half-precision as used in artificial intelligence. The compiler should be able to reason about quantisation error and keep it away from the programmer as far as possible unless the programmer explicitly chooses to ignore (deactivate) it.

Regarding your point about 1_000_000_000_000f32, it should be noted that the convention in physics is that such a number has been rounded to one significant figure

Inferring precision based on the number of trailing zeroes makes me very uneasy.

Inferring precision based on the number of trailing zeroes makes me very uneasy.

It's very commonly accepted in Chemistry/physics. The author will state the accuracy if it is something else. Or they'll do this: 1.000e9, which unambiguously contains 4 significant figures.

Also relevant: https://randomascii.wordpress.com/2012/02/11/they-sure-look-equal/

Which version of IEEE 754 is implemented? IEEE 754-2008? And is IEEE 754-2018 support in the pipeline?

I'm not sure if this should be a new lint or part of the existing overflowing_literals lint.

The overflowing_literals lint already fires for _overflowing_ floats, like 1e99999999.

I think there's something interesting here, but I'm not sure how to target it usefully. I find

let pi = 3.14159265358979323846264338327950288419716939937510582097494459230781640;

totally reasonable despite being way excessive, yet I would certainly like to be warned about 1.00000001_f32, and I have no idea how to codify the difference.
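For what it's worth, 1.00000001_f32 silently collapses to 1.0 (a tiny check):

fn main() {
    // 1.00000001 is closer to 1.0 than to the next f32 up (1.00000012), so the
    // extra digits are dropped and the literal is exactly 1.0:
    assert_eq!(1.00000001_f32, 1.0_f32);
}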

(And in the "probably impossible" department, I'd also like to be warned about 3.141592_f32, which looks reasonable -- 7 digits --- but should actually be 3.1415927_f32.)

In a different direction: I'd also sometimes like to be able to get 3.141592653589793 somehow turned into 3.1415925_f32 .. 3.1415927_f32, since it's not obvious to me which of those is consts::PI.
