PowerShell: Number literals: parse integer values too large to fit into Decimal as BigInteger, not Double

Created on 19 Jul 2020 · 25 comments · Source: PowerShell/PowerShell

Note: This is technically a breaking change, but arguably one that falls into Bucket 3: Unlikely Grey Area.

Summary of the new feature/enhancement

Currently and historically, an integer number literal in source code that is too big (in absolute terms) to fit into [decimal] is parsed as a [double] - _this invariably leads to a loss of accuracy_:

# 99999999999999999999999999999 is greater than [decimal]::MaxValue, so it is parsed as a [double]
PS> [bigint] 99999999999999999999999999999
99999999999999991433150857216

Given that PowerShell 7.0 introduced the n suffix to _explicitly_ request a [bigint] and that suffixes are generally _optional_, with PowerShell _picking the appropriate type_, it would make sense to parse integer values outside the range of [decimal] as [bigint].

In other words: it would make sense to parse 99999999999999999999999999999 the same as 99999999999999999999999999999n.
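
For comparison, the explicit n suffix already yields the exact value (a quick demonstration with the same literal; the .GetType() call is only there to reveal the result type):

# Same literal, with the n suffix: parsed as [bigint], no loss of accuracy
PS> 99999999999999999999999999999n
99999999999999999999999999999
PS> (99999999999999999999999999999n).GetType().Name
BigInteger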

Breaking-Change Issue-Enhancement WG-Language

All 25 comments

I'm personally for this, but to my memory this option was pretty quickly vetoed when I was initially implementing the n suffix for BigInteger.

this option was pretty quickly vetoed

@vexx32 Could you find the conclusion?

Looks like it was here-ish: https://github.com/PowerShell/PowerShell/pull/7993#issuecomment-442651543

Looking back at the discussion, the question posed was a little narrower in scope - asking whether we should have hex/byte number literals automatically use BigInteger and leave decimal literals alone (I suppose I assumed it was too much of a reach to change decimal literals as well). Even that wasn't accepted, so we'd need the committee's approval to go ahead and make an even larger change here, for sure.

IMO it would be a good change, but if it does break something it'll be quite confusing and hard to pin down exactly what had changed and why.

Thanks for digging, @vexx32.

True, hypothetically something could break, but (a) number literals that large are unusual, (b) in cases where they were used, it strikes me as likely that the loss of accuracy was _undesired_ and simply went _unnoticed_, and (c) PowerShell's flexibility around handling numbers of different types makes it unlikely that something truly _breaks_ (as opposed to simply now operating on a more accurate value).

(That in _calculations_ the same type promotion as in number literals _isn't_ applied is a separate problem: [long]::MaxValue + 1 becomes a [double](!), and [decimal]::MaxValue + 1 curiously _breaks_).
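
To illustrate that separate problem (a quick demonstration; the .GetType() call is only there to reveal the result type):

# Integer overflow in a calculation quietly widens to [double]...
PS> ([long]::MaxValue + 1).GetType().Name
Double
# ...whereas overflowing [decimal] throws instead of widening:
PS> [decimal]::MaxValue + 1
# -> error: "Value was either too large or too small for a Decimal."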

I should say that BigInteger is not a fully "native" numeric type in PowerShell, nor in .NET. So I would not force the issue.

I'd say that with the introduction of the n suffix [bigint] has gone native - PowerShell-native, that is.

True, [bigint] isn't a .NET _primitive_ type ([bigint].IsPrimitive yields $false), but I'm not sure why that distinction matters, given that PowerShell handles it just fine, and we've chosen to support literals with suffix n.

The issue with how calculations behave in terms of widening is that all calculations are handled there basically the same as they are in .NET, from what I can tell.

Double is the next option up from the integer types, but decimal doesn't get widened in the same way. I assume that's probably because it's already designed to be a fairly precise type, and promoting it would mean a huge loss of precision.

I'd say that with the introduction of the n suffix [bigint] has gone native - PowerShell-native, that is.

We did not update all code paths in PowerShell.
I asked the .NET Runtime team and they said they saw no reason to make this type native, and I agreed, because it is still an edge case.

@iSazonov, what code paths weren't updated and how does that affect the end user?

Why should it matter whether it is a _primitive_ type or not?

Clearly, we've decided to support this type in a PowerShell-idiomatic fashion, by allowing it to be used in number literals.

The _current behavior_ amounts to a subtle source of bugs: the number you _think_ you're expressing in your source code _quietly turns into something else_, with loss of accuracy.

If, for some reason, the _implicit_ promotion to [bigint] is deemed too risky (again: why?), the proper solution would be to _report an error_ (and force users to use n explicitly) - but that would be a _problematic_ breaking change.

@vexx32, from what I can tell .NET (at least via C#) doesn't have implicit type promotions (_update_: in _calculations_) at all, so you either get an _overflow exception_ (long.MaxValue + 1) or you _let the overflow happen_ with an unchecked statement - but then you stay within the original type (and therefore the calculation doesn't work as intended).

So I assume that _PowerShell_ chose those widening rules, and unfortunately it chose different rules for number literals vs. calculations.

Note that even going beyond [long] can result in loss of accuracy:

# LHS is converted from [double], resulting in loss of accuracy: 9223372039002259456
# RHS is accurate, because the entire operation is [bigint]: 9223372039002259454
PS> [bigint] ([long]::MaxValue + [int]::MaxValue) -eq ([bigint] [long]::MaxValue) + [int]::MaxValue
False

@mklement0

@vexx32, from what I can tell .NET (at least via C#) doesn't have an implicit type promotion at all, so you either get an _overflow exception_ (long.MaxValue + 1) or you _let the overflow happen_ with an unchecked statement - but then you stay within the original type (and therefore the calculation doesn't work as intended).

For parsing literals it does:

https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQBMQGoA+ABATARgFgAobAZgAI8KBhCgbxIuasuwBYKBZACgEoGTFsIBuEBBRgBPAA5wKAXgoAGAB65lmrcoDcQ4czETpc3IpXrtV63uLCAviXtA

Anything higher than long and it throws though.


Thanks for the demonstration, @SeeminglyScience, but we were talking about _calculations_, not number-literal parsing.

With respect to number-literal parsing _PowerShell already does its own thing_ (never picking _unsigned_ types, and allowing values larger than [long]), so we're not tied to C#'s behavior.
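
For instance, an untyped integer literal already walks PowerShell's own widening chain - [int], [long], [decimal], and only then [double] - and never lands on an unsigned type (a quick illustration):

# Fits in [uint32], but PowerShell picks the next signed type instead:
PS> (4294967295).GetType().Name
Int64
# Too large for [long] - C# would reject this literal, PowerShell widens to [decimal]:
PS> (9223372036854775808).GetType().Name
Decimal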

what code paths weren't updated and how does that affect the end user?
Why should it matter whether it is a primitive type or not?

There are some code paths for primitive numeric types in the engine. In the mentioned PR we discussed whether we should fix all of them, and the conclusion was that we shouldn't.

I see, @iSazonov.

Given that our intent is clearly to support [bigint], separate from the issue at hand:

  • How do these code paths affect the end user? What doesn't work?
  • Do we need to document these limitations?

Separately:

  • Why do any potential limitations in our [bigint] support suddenly become _more_ of a problem if we implicitly parse too-large-for-[decimal] integers as [bigint]?

IIRC we don't _need_ to change all the code paths, and we invite more trouble by doing so (mainly because BigInteger is one of the few numeric types that _isn't_ capable of being handled by the IConvertible interface). There are a couple of more impactful ones where we can make explicit exceptions for bigint and get the same result without nearly as much hassle.

The remaining work for that is already done if memory serves, just pending review and a decision on whether or not we want to accept it: https://github.com/PowerShell/PowerShell/pull/12629

As such, I'm reasonably in favor of allowing this kind of change myself; it just complements the existing allowances for bigint and saves some confusion where high-order numerals are desired. Yes, they can use the n suffix, but they have to know to go looking for it first. IMO it makes more sense to just give them a bigint first, and if they want to cast it later they can do so without too much hassle.

The BigInteger type does not implement IConvertible, so PowerShell cannot do a generic conversion like [bigint]1 to [int]. Perhaps there are other unimplemented interfaces (like IFormattable). As a result we would have to fix all code paths where numerics are processed. Since BigInteger is an edge case, the conclusion was not to do this.

[bigint] 1 -is [IFormattable] is $true, and [bigint] also implements IEquatable and IComparable.
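
A quick interactive check bears that out (the -is tests below are merely illustrative):

PS> [bigint] 1 -is [IFormattable]
True
PS> [bigint] 1 -is [IConvertible]  # this is the interface [bigint] genuinely lacks
False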

Again, if we need to make [bigint] more usable in PowerShell, that seems worth doing, and that's what @vexx32's PR is about.

By supporting the n suffix we have clearly signaled that we do _not_ consider use of [bigint] an edge case.

Drawing the line at not supporting _implicit_ [bigint] literals is arbitrary, and it prevents us from implicitly fixing the _treacherous_ current behavior (potentially undetected loss of accuracy).

I have scripts that would break with this change. (Granted they are recreational, and not for work.)

I have often wished that integers that grew too large would natively become [bigint], but simply making that so isn't an adequate solution in many cases, as [math] methods can't handle [bigint] parameter values. Sometimes the tradeoff of loss of accuracy is preferable to the loss of functionality.

Sometimes the tradeoff of loss of accuracy is preferable to the loss of functionality.

I think the current quiet loss of accuracy is insidious - a source of subtle bugs, so the change is worth making for that reason alone.

There is no loss of functionality - you can always down-cast to [decimal] or [double] if you want to call a [math] method.
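
A sketch of such a down-cast, using [math]::Sqrt purely as an illustrative method (the $big variable is hypothetical):

PS> $big = 99999999999999999999999999999n
PS> [math]::Sqrt($big)            # fails - [math] has no [bigint]-accepting overload
PS> [math]::Sqrt([double] $big)   # works, after an explicit, deliberate down-cast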

As stated in the OP, the change is technically breaking, but I still see this as a bucket 3 change, especially given that it's likely that code that relies on the current behavior is conceptually flawed (lack of awareness of the loss of accuracy).

Yeah, that's a very good point as well. BigInteger is a more specialized type, and it doesn't have the same common support throughout .NET's math or conversion classes.

It _does_ have some alternate math methods as static methods on the BigInteger class itself, but they aren't a complete 1:1 match for everything that's supported for the more general-purpose number types, and using them still requires some rewriting of scripts.

@vexx32:

If you write code with number literals that you _know_ to result in loss of accuracy - yet you need the [double] or [decimal] type for library calls - you should either signal that with, say, [double] 99999999999999999999999999999 or, preferably, run [bigint] [double] 99999999999999999999999999999 and then use the true resulting integer as the source-code literal: [double] 99999999999999991433150857216 - accompanied by an explanatory comment.
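
Spelled out, that suggested pattern might look like this (the $threshold variable is a hypothetical placeholder):

# Step 1: determine, once, what the [double] approximation actually is:
PS> [bigint] [double] 99999999999999999999999999999
99999999999999991433150857216
# Step 2: hard-code that true value as the literal, with an explanatory comment:
PS> $threshold = [double] 99999999999999991433150857216  # intentionally approximate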

Also note that the [math] problem would only arise for number literals too large to fit into a [decimal], which is the only case in which [bigint] is called for.

The current behavior has no redeeming qualities that I can see.

Sure, but most folks don't know / necessarily need to be aware of that at the moment; it "just works" in many ways (except when you need precision, which I'd _like_ to think concerned folks would have noticed anyway, I suppose).

Currently if you have a script that uses a large literal and somewhere along the line you're using a [math]::Pow() or any other [math] static method, changing this would break that. Whether that's worth doing / a bit too confusing for folks to run into / etc, I don't really know. I don't think the impact would be _huge_ but it would be an unexpected and somewhat inexplicable break for the folks that do hit it. The errors you encounter there tend to be pretty opaque (e.g., could not find an overload for "MethodName" and the argument count: "X" -- no mention of mismatching types in some cases) and so finding what's actually changed is a bit tricky at best.

Subverting the user expectation that a number literal such as 99999999999999999999999999999 is used _as such, accurately_ is something that should never have been implemented: it is counter-intuitive, happens quietly, and can lead to subtle bugs - I am not at all optimistic about concerned folks having awareness.

  • Fixing that by raising a _parsing error_ would be a _problematic_ breaking change - and needlessly limiting.

  • By contrast, widening the type automatically - to [bigint], if [decimal] is too small - is in the spirit of PowerShell and is useful.

    • As noted, the mere fact that we provide a [bigint] _type suffix_, n, indicates that we consider [bigint] a first-class citizen, and just like you don't have to use, say, 2147483648L with the L _explicitly_ to get a [long] - just 2147483648 is sufficient - you shouldn't be _required_ to use n - 99999999999999999999999999999n - to get a [bigint]. (See the demonstration just below.)
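
A quick demonstration of that suffix analogy (the last result shows the _current_ behavior, which this proposal would change to BigInteger):

# [long]: no L suffix needed - the literal widens past [int] automatically:
PS> (2147483648).GetType().Name
Int64
# [bigint]: today, a literal too large for [decimal] silently falls back to [double]:
PS> (99999999999999999999999999999).GetType().Name
Double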

Yes, existing code can break, but the benefit of the change to me far outweighs the risk, especially considering that:

  • only number literals starting at 79228162514264337593543950336 ([decimal]::MaxValue + 1, on the order of 8e28) are affected.

  • existing code may still continue to work, depending on how the numbers are handled; yes, [math] calls would break, but not regular arithmetic operations such as * and +.

I take it back. Sorry. I was thinking of something else.
I misinterpreted the title of the issue and, despite reading the full issue and comments, failed to notice what the proposal is actually about. I don't have a preference for parsing the number literals discussed.
(Just don't change the behavior of [int64]::MaxValue+1.) (Well, you can, but I'll whine about it a little bit.)

Thanks, @TimCurwick.

I find [int64]::MaxValue+1 returning a [double] - i.e., the behavior in _calculations_ as opposed to the parsing of _number literals_ - quite unfortunate as well, but it is indeed a separate issue.

Making calculations use the same type-widening rules as the parsing of literals would make much more sense, but I do realize that this would be a more severe breaking change.
