Go: proposal: spec: change int to be arbitrary precision

Created on 20 Mar 2017 · 114 Comments · Source: golang/go

An idea that has been kicking around for years, but never written down:

The current definition of int (and correspondingly uint) is that it is either 32 or 64 bits. This causes a variety of problems that are small but annoying and add up:

  • overflow when constants like math.MaxUint64 are automatically promoted to int type
  • maximum size of a byte slice is only half the address space on a 32-bit machine
  • int values can overflow silently, yet no one depends on this working. (Those who want overflow use sized types.)
  • great care must be taken with conversion between potentially large values, as information can be lost silently
  • many more

I propose that for Go 2 we make a profound change to the language and have int and uint be arbitrary precision. It can be done efficiently - many other languages have done so - and with the new compiler it should be possible to avoid the overhead completely in many cases. (The usual solution is to represent an integer as a word with one bit reserved; for instance if clear, the word points to a big.Int or equivalent, while if set the bit is just cleared or shifted out.)

The advantages are many:

  • int (and uint, but I'll stop mentioning it now) become very powerful types
  • overflow becomes impossible, simplifying and securing many calculations
  • the default type for len etc. can now capture any size without overflow
  • we could permit any integer type to be converted to int without ceremony, simplifying some arithmetical calculations

Most important, I think it makes Go a lot more interesting. No language in its domain has this feature, and the advantages of security and simplicity it would bring are significant.

Labels: Go2, LanguageChange, NeedsInvestigation, Proposal

Most helpful comment

I don't understand the performance concerns - even if the performance hit would be noticeable in some scenarios, could you not use sized types like int32 or the unsigned word (like the current uint) mentioned by @griesemer?

All 114 comments

I'm a big fan of this, myself. It would elevate int (and uint) to "unrestricted" (for lack of a better word) types, and only the types with explicit sizes (int16, int32, etc.) would be subject to wrap around.

In many cases (loop iteration variables) the compiler may be able to prove that an int is in a certain range and could produce optimal code. In many other cases, the use of int is not speed-critical.

Let's put this and #19624 in the Thunderdome! Two proposals enter, one proposal leaves...

A minor related point about this: The int and uint types were intended to be of the "natural" size for a given platform, typically the register size. If we have true integers we lose that naturally sized type - yet we make use of it in a few places where we want integer algorithms to work well for a given platform (strconv.Itoa comes to mind). We may want to consider introducing an unsigned "word" type instead (in the past we have also used uintptr in that role, but that may not necessarily be guaranteed to be of register size on all platforms).

Representing an int in a single machine word will be tricky. We run across the same problem we had with scalars being in direct interfaces - the GC sees a word which is sometimes a pointer and sometimes isn't.
The only easy solution I see is to reserve some top bit(s) to set for raw integers and disallow those top bits for any heap pointer.

@randall77

Two proposals enter, one proposal leaves...

Actually, I think they're completely compatible. I recommend both!

@randall77

the GC sees a word which is sometimes a pointer and sometimes isn't

Could we fix that with better stack or heap maps? Instead of each word being "either a pointer or a non-pointer", it would be "a pointer, a non-pointer, or an int". I suppose that would require two bits instead of one per address, though - might bloat the maps a bit.

FWIW, the ML family of languages worked around that issue by making the native types int31 and int63 instead of int32 and int64. I do not recommend that approach.

@bcmills Yes, two bits per word should work. That just buys us more flexibility to put the distinguishing bits somewhere other than the top bits of the word - not sure if that is worth it.

I love this proposal in the abstract, but I'm very concerned about the performance impact. I think it's not by chance that "no language in its domain has this feature". Bounds-check elimination has a similar problem, and the Go compiler isn't very good at it even nowadays: it basically handles just the obvious cases, and doesn't even have a proper VRP pass (the one proposed was abandoned because of compile-time concerns). Stuff like a simple multiplication would become a call into the runtime in the general case, and I would be surprised if the Go compiler could avoid that in most cases, if we exclude obvious ones like clearly bounded for loops.

@rasky Languages like Smalltalk and Lisp (and more recently, JavaScript) have pioneered the implementation of such integers - they can be implemented surprisingly efficiently in the common case. A typical implementation reserves the 2 least significant bits as "tags" - making an int on a 64-bit platform effectively 62-bit in the common (small) case - which is likely more than plenty in most cases (and where it isn't, it's either because we need very large integers, or we should be using int64).

One way of using the tag bits is as follows:

  • 00 means small integer (smi)
  • 01 means the value is a pointer p to an arbitrary precision integer (at p-1)
  • 10 typically the result of an operation (see below)
  • 11 reserved for other uses or unused

Given this encoding, if we have two int values x and y, we can optimistically assume they are smis. For instance x + y would be translated into a single ADD machine instruction, followed by a test of the tag bits. If they are not 00, one (or both) of the operands were large ints and then a runtime call is needed. Also, if there was an overflow, a runtime call is needed. This does add 3 instructions in the fast path, but not more (one conditional jump if overflow, one test of tag bits, one conditional jump if tag bits are not 0 - perhaps more depending on how the runtime call is achieved or if there's a conditional call instruction). It's not cheap, but it's much less expensive than a runtime call in the common case. If both operands were smis, the tag bits remain 00, and the result doesn't need further work. The same is true for subtraction. Multiplication requires a bit more work but is also more expensive. Finally, division is the most complicated one, but also the most expensive operation in general.

It might be worthwhile performing a little experiment where one generates this additional code for each integer addition, using dummy conditional branches that will never be taken (or just jump to the end of the instruction sequence) and see what the performance impact is.
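For illustration, here is a minimal Go-level sketch of that fast path (the Int type, tag layout, and slowAdd helper are hypothetical; a real implementation would live in the compiler and runtime, and this assumes a 64-bit platform):

type Int uintptr // low two bits are the tag; 00 means small integer (smi)

const tagMask = 0b11

func isSmi(x Int) bool { return x&tagMask == 0 }

// Add is the optimistic fast path described above: one machine ADD,
// one overflow check, one tag check. Payloads are stored shifted left
// by 2, so adding two tagged words adds the payloads directly and
// keeps the 00 tag.
func Add(x, y Int) Int {
    if isSmi(x) && isSmi(y) {
        sum := x + y
        // 64-bit signed-overflow test: the operands agree in sign
        // but the result's sign differs.
        if ((uintptr(x)^uintptr(sum))&(uintptr(y)^uintptr(sum)))>>63 == 0 {
            return sum // still a valid smi, no further work needed
        }
    }
    return slowAdd(x, y) // runtime call: promote to a big representation
}

func slowAdd(x, y Int) Int {
    panic("big-integer path omitted in this sketch")
}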

Not a fan of this proposal - currently it's quite simple to reason about the resulting code and its performance characteristics when doing simple arithmetic. Also - even if losing two bits on a 64-bit platform is not important, on a 32-bit one it is.

Maybe we could have arbitrary precision ints implemented as a new built-in type (like we do with complex currently)?

Can you discuss how such int variables would be represented in mediums other than RAM, and marshaled/unmarshaled?

Encoding to JSON should be easy and map really well. As far as I know, the JSON spec does not place restrictions on the size of numbers, so a really large int would encode as a base-10 number (and vice versa for decoding). E.g.:

{"number": 12312323123131231312312312312312321312313123123123123123123}

Would map to an int with value 12312323123131231312312312312312321312313123123123123123123.
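For what it's worth, today's math/big.Int already marshals to and from JSON as a bare base-10 number (it implements json.Marshaler and json.Unmarshaler), so this mapping has a working precedent:

package main

import (
    "encoding/json"
    "fmt"
    "math/big"
)

func main() {
    n, _ := new(big.Int).SetString("12312323123131231312312312312312321312313123123123123123123", 10)

    // Marshals as an unquoted base-10 number.
    out, _ := json.Marshal(map[string]*big.Int{"number": n})
    fmt.Println(string(out)) // {"number":12312323123131231312312312312312321312313123123123123123123}

    // And round-trips back without loss.
    var back map[string]*big.Int
    _ = json.Unmarshal(out, &back)
    fmt.Println(back["number"].Cmp(n) == 0) // true
}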

What about something like encoding/gob or encoding/binary?

@shurcooL
1) Printing is obvious (the usual human-readable forms) - applies to JSON.
2) encoding/gob could use a private internal representation.
3) encoding/binary is for fixed-width numbers at the moment, which the proposed ints wouldn't be - but a var-int encoding could work (though the current format would probably be inefficient).

Re: 3, note that the compiler's import/export format already encodes arbitrary-sized integer values because of precise constants.

@shurcooL @griesemer I believe encoding/gob already uses a variable-length encoding for all integer types.
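For a concrete feel of such variable-length encodings, here is encoding/binary's varint (not gob's exact wire format); note it tops out at 64 bits, so arbitrary-precision ints would need something like a length-prefixed scheme instead:

package main

import (
    "encoding/binary"
    "fmt"
)

func main() {
    buf := make([]byte, binary.MaxVarintLen64)
    n := binary.PutVarint(buf, 1234567890123) // zig-zag varint: small magnitudes use few bytes
    v, _ := binary.Varint(buf[:n])
    fmt.Println(n, v) // 6 1234567890123
}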

Go should have an arbitrary precision number type that is more convenient than math/big. That type should not attempt to masquerade as int/uint, as these aren't just used semantically as "number" but more so as "number compatible with c/foolang code that uses the natural local word size".

The root problem here is that golang's design prevents a library from defining an arbitrary precision type that is as semantically and syntactically convenient as a type provided by the runtime/compiler with internal knowledge and kludges. Solve that problem with golang 2.0, or instead we will find ourselves years from now with countless ad hoc accretions like this.

Edit: I'm a fan of this design/feature in scripting languages. I don't see how it works as the base int type in a systems language to replace c/c++/java. I absolutely think we should have a great and convenient arbitrary precision number type, and think the road to that is a golang where library defined types are not second class to ad hoc boundary flaunting extensions of the runtime and compiler provided types.

@jasonwatkinspdx

It's true that one of the roles of int in Go 1 is as an analogue to C's ssize_t. But it doesn't actually matter whether int is _exactly_ the size of the address space or merely _sufficient_ to cover the address space.

Perhaps more importantly, C APIs don't actually _use_ ssize_t all that often: they use size_t instead.
You have to range-check conversions from C.size_t to int, because the latter is potentially only half the range of the former. Under this proposal, you'd need the range check in the other direction instead, because C.size_t would now be smaller than int. The only way to avoid the need for a range check entirely is by making the Go "size" type have _exactly the same range_ as C.size_t, and at that point you're looking at a fairly high risk of overflow bugs (especially from decrementing past zero).
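The decrementing-past-zero hazard in two lines, with today's fixed-size unsigned arithmetic:

var n uint64 // a size_t-like counter
n--          // silently wraps to 18446744073709551615 instead of failing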

@bcmills

But it doesn't actually matter whether int is exactly the size of the address space or merely sufficient to cover the address space.

It does matter. People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values. Making the type that spans the address space arb precision offloads many complexities to anyone that wants to ship data to other systems.

What does a C-ABI-compatible library consumer of golang look like in a world where int/uint is arb precision? Is it really better if the golang side is blind to any issue at the type level and the C side has no choice but to panic?

I do see the value of these types, I just don't want them conflated with int/uint. I'm entirely in favor of a numeric tower in golang being the default numeric type; I just don't want it to pretend to have the same name as the 32-bit/64-bit machine/ISA-determined types.

People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values.

Can you give some examples? I'm having a hard time thinking of anything other than a syscall that spans a process boundary but should be sized to the address space, and programs using raw syscalls generally need to validate their arguments anyway.

What does a c abi compatible library consumer of golang look like in a world where int/uint is arb precision?

The same thing it looks like today: using C.size_t instead of int.

@robpike
It looks harder to reduce the CPU usage of a Go program when ints become arbitrary precision. 😝
I do not need arbitrary precision ints by default, but I do need Go programs to use less CPU, to save my money on GCE VM servers.

programs using raw syscalls generally need to validate their arguments anyway.

I want to capture that in the types, not in a dynamic value range check. I don't think this is an unreasonable expectation of a language that markets itself as a replacement for c/c++/java for systems programming.

I honestly thought it was 11 days later than it really is (i.e., April 1st).

I don't want to lose normal int performance. I don't believe there's a space/time effective way for an arbitrary precision int to be, in many cases, comparable in performance to the fixed width int.

I have nothing against adding an arbitrary precision mpint type (name does not matter), which the compiler accepts mixed in expressions with normal ints providing the conversions as needed. IOW, it would be really nice to be able to use standard operators with arbitrary precision integers, yes, but please only explicitly. Please leave int alone.

What about floats? JSON can have values like 12345678901234567890.12345678901234567890, with many digits before and after the dot, but IIRC no Go type can accurately represent those. That is, no Go type can do math with them. (I know about json.Number.)

Personally, I would keep int as it is and instead added a number type that could represent any number with or without a fractional part.

Please do not make int operations slower by default.
I do not need arbitrary precision ints by default, but I do need Go programs to use less CPU, to save my money on GCE VM servers.

I do not need something like mpint with infix expressions either; *big.Int is enough for me.

I don't understand the performance concerns - even if the performance hit would be noticeable in some scenarios, could you not use sized types like int32 or the unsigned word (like the current uint) mentioned by @griesemer?

I think this might be a good change, but I do have some concerns regarding bitwise operations and performance impact. In my experience, int tends to be the default type for numeric values, which makes me a little hesitant without seeing before and after numbers.

I am 100% for this proposal. Worrying about the performance impact of arbitrary precision ints is like worrying about the performance impact of array bounds checks, in my opinion. It's negligible if implemented correctly.

I also like this because of how inconvenient the big package is and how I always fall back to Python when I'm dealing with big integers.

For an arbitrary precision uint, what is uint(0)-1 ?
Maybe mpint is enough.

Imho mpint is a very bad idea. No one would use it except when they really need it, because int is nicer and feels more standard. Since the majority of libraries would still use int instead of mpint, integrating two libraries, one of which uses int and the other mpint, would become cumbersome.

Also, it would be entirely unclear whether to use int or mpint in common situations, which contradicts Go's philosophy, which emphasizes clarity and tries to reduce the programmer's mental overhead.

@faiface If we are talking about Go 2, by then we should have a better type system anyway. At the same time, I would be really glad if my arithmetic operations did not suddenly allocate memory just because a combination of my integer inputs resulted in N > 2^64. Overflow I can live with - that is my problem.

Again - currently an instance of something like

type MyIntCombination struct {
    FirstInt  int
    SecondInt int
    ThirdInt  int
}

has a size determined at compile time and good performance because of the alignment. If these fields suddenly become pointers, the cache locality is lost.

All of this can be solved using a machine-dependent int, but I fear that this change would cause fragmentation in our community between those who prefer explicitly sized ints and those who prefer arbitrary precision. For me, it would mean a strict restriction on using int anywhere in my company's code.

But, alas, without a working prototype it's all just speculation. We all remember how the alias proposal went.

While the cost of doing an arithmetic operation on two arbitrary precision ints is important, there is also the cost to the garbage collector that needs to be considered. We probably need an extra bit on each heap word to indicate whether it is a tagged word. I suspect we can be clever and it won't be a full bit per word, but being clever has a way of complicating things. Even if we are memory clever, if we have large arrays of arbitrary precision ints the GC will need to visit each of them to verify that they are immediates and not pointers. In dynamically typed languages like Lisp, JavaScript, and Smalltalk this isn't a big deal, because the GC had to visit the words anyway, so the GC's marginal cost was minimal.

Furthermore, the cost to the GC is somewhat amplified because it happens during the mark phase, when the GC is already using CPU otherwise available to the application. The less time the GC is in the mark phase, the easier it is for the application to get the CPU it needs to meet its deadlines, the less floating garbage, and so forth. This cost is hard to measure but it is real.

Getting a sense of these systemic costs isn't intractable but needs to be done before we can claim we know the full cost of arbitrary precision ints.


Edit: I was confused about whether the new int would be immutable or not. It is immutable.

@champioj It needs to be immutable, otherwise it's going to be impossible to use correctly.

@tgulacsi

For an arbitrary precision uint, what is uint(0) - 1?

A very good question. I would argue that that implies one of two things, and the choice depends on whether the difference between uint and int is one of space-efficiency or program invariants.

If the difference is due to space-efficiency, then uint should not exist. An arbitrary-precision int is capable of representing all of the values of an arbitrary-precision uint without loss, so uint is redundant.

If the difference is due to program invariants — the ability to express "a nonnegative integer" — then subtracting a positive value from uint(0) should panic (and IMO that would argue for also adopting #19624 for consistency).

Being fast and close to the metal is a great advantage of Go1. Many other languages start with hard to optimize semantics and suffer bad performance forever, shackled by backward compatibility.

It's true that _JIT compiled_ languages can get away with complex semantics and still be relatively fast: they speculate, profile, optimize and reoptimize at runtime. Yet they rarely if ever reach C speeds and always at a cost of great complexity.

An offline compiler can't speculate much; unlikely paths still have to be compiled. This extra code hanging off integer arithmetic ops will inhibit optimizations, bloat the binaries, and slow everything down universally.

The hidden cost of int will be especially glaring if a fast and unencumbered int64 is easily available, but unnecessarily ugly (casts).

Other languages in Go's domain don't have this feature because it's too expensive (C), few people need it (Java, JavaScript) and there are _library types with overloaded operators_ (C++, C#, Scala, Rust, Swift, D).

Languages that do have built-in arbitrary-precision arithmetic are very high level, dynamically typed, relatively slow and often math focused (Python, Ruby, Erlang, Haskell, Scheme, Mathematica and other CAS).

Re: cited benefits.

  • Security and correctness: dangerous operations on int (overflow, truncation) should just trap, because they are likely to be bugs or fatal errors. Silently propagating astronomical values throughout the program isn't any better than silently truncating them. The underlying problem will be obscured either way. I'm sure many Python programmers will agree.

  • Representing object sizes: objects larger than half of the address space should simply be forbidden. Maybe heaps larger than that should be forbidden too. It's not a big deal considering other GC limitations.

  • Safety of potentially overflowing operations: I'd reach for int128 or int256 first.

@nnemkin

If I could easily disable this feature at the program, package, and function level, I think arbitrary precision could be added to Go 2.0 and made the default.
Bloating the binaries to 120% is not a big problem for me.

I like this idea.

I also think floats should be decimal and arbitrary precision by default, since so many programmers use binary fixed-precision floats in inappropriate situations, such as for currency values. There should be binfloat64 and binfloat32 for the special-purpose, high-speed, approximate binary floating point numbers of IEEE 754.

@lpar I disagree about the floats. Decimal floats have the same, fundamentally unavoidable problem: you are just going to swap people being confused about f/10 for people being confused about f/3 (or sqrt(f), or…).
Computers just fundamentally can't handle real numbers, and trying harder to pretend they can won't solve this.

@lpar What would it mean for floats to be arbitrary-precision? In particular, how would you indicate the precision of a float literal? (big.NewFloat in today's standard library implicitly uses the precision of float64.)

It's also worth noting that, unlike int, there is no float type in the language today: users of float must already choose float32 or float64.

Some other kind of floating-point representation — interval arithmetic or unums or some similar approach — might be interesting to consider as alternatives, but that belongs in a separate proposal: it's more-or-less independent of the choice of representation of the int type.

Sorry, didn't mean to imply that floats belonged in the same proposal. That was just an aside, I was floating the idea of decimal floats in case anyone else thinks it would be good to knock out another common source of error as a separate proposal.

Why make it Go 2? Would it not be better to have that kind of numeric type today, in Go 1? Select a name ('integer' or 'bigint') that does not collide (often?) with existing Go code and use it to introduce such a type today. This way, if and when Go 2 happens, we will have experience with dealing with such a type :)

@tumdum, because we would only do this if we could do removals and simplifications at the same time to pay for its addition. We don't want to be a language that just keeps accreting features.

@bcmills One choice for arbitrary-precision floats is "Constructive Reals", which have become much harder to demo since browsers quit supporting Java. See http://www.hboehm.info/crcalc/ .

Or also https://android.googlesource.com/platform/external/crcalc/

Basic idea is that a + b returns a function from requested precision to a string of digits. Crcalc does lots more than just addition. One slightly non-intuitive problem with constructive reals is that certain operations take longer than the naive programmer might expect -- if a and b are actually equal but in a way that the built-in hacks cannot determine, (full) comparison is a non-terminating operation.

I have made practical use of these; if you were interested in knowing things about the quality of floating point operations (e.g., transcendental functions) you can use constructive reals to obtain N digits of truth. A real live numerical analysis prof (John Dennis, then at Rice U.) once proposed using them in error term calculations, because checking the error often costs a factor of N fewer operations than computing the "answer", and using exact arithmetic to calculate the error cuts out a layer of estimation, resulting in a tighter error bound. We actually implemented this in a Fortran interpreter, and one amusing and not-too-surprising result is that constructive arithmetic on poorly conditioned matrices costs more (because you have to pull more bits out to prevent cancellation errors).

Please let's keep this focused on the subject at hand (arbitrary precision ints).

I am afraid the compiler could not help much if arbitrary precision integers are used as function parameters. Doesn't that mean the parameters must always be boxed?

That said, I do strongly hope an integer type, besides int and uint, be added to Go1. A lot of code could then be simplified.

@typeless An arbitrary precision int type would always use one word (32 or 64bit) when passed as a parameter. See https://github.com/golang/go/issues/19623#issuecomment-287930352 for a possible implementation.

I think there will be a cost for programs overall, but I also think the fears of huge slow-down are overblown.

Is the following proposal feasible if this feature gets implemented?

make "range aChannel" also return two values, one message index and one message value, so that the forms of all "range aContainer" be consistent.

@Go101 No. The problem with an index over a channel is not overflow; it's just useless. If a channel emitted 1,000,000 messages a second, an int64 counter would overflow only after ~300,000 years.

@faiface

an int64 counter would overflow after ~300,000 years.

Yes, I think this is one theoretical reason why "range aChannel" returns only one value in Go 1.

The type of the index returned by range is int, not int64.
So if this proposal is implemented, the channel message index overflow problem goes away entirely.

The index is not totally useless; it is really useful sometimes.

@Go101 int is equivalent to int64 on most machines. When is it not useless?

For example, there are M senders and N receivers; when one of the receivers receives the K-th message, it will notify all senders and receivers to stop messaging. Or print a log entry every K messages.

@Go101 Questions of overflow notwithstanding, assigning a unique index to every send and receive operation would introduce a sequential bottleneck (similar to the one in #17973). The spec as it stands today only requires a happens-before relationship between sends and the corresponding receives, allowing for looser synchronization and better parallelism.

With a fixed-size int type you could at least hope for making the counter update an atomic instruction, but with arbitrary-precision int even that goes out the window.

@bcmills
I don't quite understand the issue you referred to. But I think there is no need to assign a unique index to every send; just adding a numReceiveds field to the internal channel struct would be enough.

Could you please discuss any range-expression changes for channels in a separate issue/thread? It's at most tangentially related to this proposal and thus adds noise for people who are subscribed because they are interested in the discussion of arbitrary precision ints.

Same as @faiface, I also fall back to Python when dealing with big numbers. Once I had a mind to extend the math/big package with an Eval function:

func Eval(s string) (*Float, error)

not good but at least clearer

// numbers here can be big numbers
n, err := Eval("(1 * 2 * 3 + 4 * 5) + 6")

If this can be done efficiently at the language level, why not?

Why does this new int need to be immutable?

I really like this suggestion, but if this is added I think native ints need to be added as well. Maybe something like nint and nuint or intn and uintn.

I really like this suggestion, but if this is added I think native ints need to be added as well. Maybe something like nint and nuint or intn and uintn.

If it was up to me I would name the arbitrary-precision integer types as:

  • aint
  • auint

Does it make sense to have "unsigned" integer if it's arbitrary precision ?

No. What is auint(0)-1 ?


@pierrre, @tgulacsi, we've been over this already (https://github.com/golang/go/issues/19623#issuecomment-288229046).

@mahdix

Why does this new int need to be immutable?

Because all of the other numeric types are "immutable", in the sense that they do not expose changes through aliased values. For all of the existing numeric types,

x := y
x++

does not change the value of y. It would be surprising for that to be any different for arbitrary-precision ints.

I don't know what kind of software other people unknown to me are writing, but it seems to me that in most software, in most use cases, integer values never exceed 64 bits (or 32 bits). That is, integer values never exceed the bit width of the 64-bit (or 32-bit) address space.

The need for arbitrary precision integers is relatively small, especially in a performance-sensitive systems language compiled to machine code like Go, but it is nice if the language provides arbitrary precision integers.

It's not about need, it's about quality of language.

Presumably arbitrary precision constants would map nicely to ints/uints in Go 2.0 with this proposal. That would certainly make the language more friendly, and do away with this caveat:

There is a caveat to constants of unusual size. Though the Go compiler utilizes the big package for untyped numeric constants, constants and big.Int values aren’t interchangeable. Listing 8.2 displayed a big.Int containing 24 quintillion, but you can’t display the distance constant due to an overflow error.
```
fmt.Println("Andromeda Galaxy is", distance, "km away.") // *
```
* constant 24000000000000000000 overflows int

- from Lesson 8, Get Programming with Go

Being able to pass a numeric constant or literal to a function and not worry about overflows or conversions would be very nice indeed.

The int and uint types were intended to be of the "natural" size for a given platform, typically the register size. If we have true integers we lose that naturally sized type - yet we make use of it in a few places where we want integer algorithms to work well for a given platform - @griesemer, https://github.com/golang/go/issues/19623#issuecomment-287903210

If needed, @wanton7's suggestion sounds good to me:

I really like this suggestion, but if this is added I think native int's need to be added as well. Maybe something like nint and nuint or intn and uintn. - @wanton7, https://github.com/golang/go/issues/19623#issuecomment-306189100

intn for "natural" or "native" and int32, int64, intn

overflow when constants like math.MaxUint64 are automatically promoted to int type - @robpike

Would it be useful/possible to have a math.MaxIntn constant determined at compile time?

Simply leave int as it is, but add a new type natural for explicit usage.

Why not introduce a new type instead?

@omeid Because to get more robust software in general (no overflows), you need it to be the default int type, so that everyone will use it. It's like vaccination: everyone needs to get on board to eradicate the disease.

Since the majority of libraries would still use int instead of mpint, integrating two libraries, one of which uses int and the other one uses mpint would become cumbersome.

@faiface Would #19412 address this concern? An addition to the type-system that lets a library accept both int and mpint could allow:

type integral union {
    int
    mpint
}
func acceptsEither(v integral)

@wanton7 Calling something with inconsistent, or generally slow, performance robust is a bold claim that requires bold justification.

@omeid I'm not a native English speaker, so "robust" to me means more secure and stable software. I did write about overflowing, so I'm not sure how you could have misunderstood me. Also, I didn't write about performance at all and don't know what it has to do with software robustness; maybe you can explain it to me?

Upd.: This is a response to a comment that has been deleted.

No. There is no strict definition that is universally accepted, but in software, ‘robust’ generally means ‘correct in the face of failure or adverse conditions’, ‘hard to break’ [WikiWikiWeb], ‘capable of performing without failure under a wide range of conditions’ [Merriam-Webster], ‘able to cope with errors during execution and cope with erroneous input’ [Wikipedia], ‘able to recover from errors; unlikely to fail, reliable’ [OED].

I don’t think there is any connotation of efficiency or high performance. In fact, it is common to assume that the safety checks required to make a system robust may incur a performance penalty.

I don't like the specifics of the proposal (let's do more), but I like its motivation, which is important in its several aspects:

  • A way to deal with values having precisions or ranges outside the normal hardware ability
  • A way to naturally perform high-precision or arbitrary-precision math

Some of my software does extensive computations using extended precision real and complex numbers (via math/big). Much of my software does extensive computation using extended precision integers (my own assembler). Some of my software does unlimited precision integer computation using math/big. Based on my own experience, it seems that this take on the proposal here could be better:

  1. Add int128 as a supported type. (Super simple: handy CPU instructions with carry/borrow, fixed size, easily passed on the stack, etc.)

In fact... consider int{number of bits = 8*2^n} (as in int8, int32, int128, int256) to have the following meaning: on machines where that's a natural hardware size, the natural code is generated, but on machines where it is not, slower but still fast multiprecision code is generated. (Probably add/sub inline and mul/div via functions.)

  2. Add float128 as a supported type. This is not so simple in terms of how to implement.

You can do a fast but imperfect job with doubled precision. I have libraries for this; several exist. It's about 4x slower for 128 bits using the normal FP hardware and paired doubles. The precision is perfect and all IEEE modes are right, but the exponent range is less than IEEE 128-bit FP.
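For the curious, the error-free "two-sum" step that such doubled-precision schemes build on looks like this (a textbook sketch, not code from any particular library):

// twoSum returns s = round(a+b) and the exact rounding error e,
// so that a+b == s+e holds exactly (Knuth's TwoSum).
func twoSum(a, b float64) (s, e float64) {
    s = a + b
    bv := s - a
    e = (a - (s - bv)) + (b - bv)
    return
}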

Better would be the slower but IEEE-perfect big.Float in the 128-bit precision size. This is notably slower, but still fast, and harder to fit in. Maybe the answer is a special, less-general internal implementation of big.Float that is a fixed structure amenable to passing as a value, works as lvalue or rvalue, etc.

And by extension, consider float{number of bits = 8*2^n} (as in float32, float64, float128, float256, ...) to have the following meaning: on machines where that's a natural hardware size, the natural code is generated, but on machines where it is not, a slower call to big.Float is made, though perhaps reimplemented for this purpose in a fixed-container-of-bytes form.

  3. Instead of reinterpreting int, I propose this: int stays as the natural hardware int size, and new types integer, floating (or real), and rational are added. integer means arbitrary precision integer, floating (or real) means arbitrary precision floating point, and rational means the ratio of arbitrary precision integers.

This is not a fast path. The use of unsized integer, floating, and rational variables is a siren warning that what is happening may not be super fast. However, it is understood that in exchange they will be super general. Such an integer is what I think Rob wanted for generality... it is the typed-variable counterpart of the way numbers are parsed in Go 1.

There are subtleties to this. They are a challenge but surmountable:

First, type conversions might "fail", so it is necessary to consider how "x := int32(integer(1<<800))" should work. The same goes for converting floating (or float128 and above) to float64, since the exponent range may overflow - should they go to zeroes and infinity? Should it fail? Should some form of type switch be used so that you can say "convert if applicable, else do this other thing"?

Second, what do naked numbers mean? Is x := 1<<70 an int128? An integer? A failure? Any can work; integer seems natural, but that's not a proposal, just an example of a question that needs answers as part of the design.

Summary:

This modification to the proposal at issue yields a wonderful programming environment, allowing code with integer, rational, and floating variables to be used naturally; supports fast-path hardware friendly coding with float128 and int128 (and larger sizes, handy for crypto and other purposes), and changes no meaning for the existing variable type int.

OCaml internally uses 63-bit ints, with the remaining bit indicating whether the 64-bit value should be interpreted as an int or as a pointer to some boxed value. This is similar to some of the suggestions regarding how we should represent arbitrary precision ints in Go. Of course, such a choice has performance implications. We can learn from the experience obtained using OCaml; for example, see:

https://blog.janestreet.com/what-is-gained-and-lost-with-63-bit-integers/

These benchmarks show that we can expect a slowdown of at least a factor of 2-3 on regular 64-bit arithmetic. Personally, I think it's worth it.

  1. Should the encoding (2 or 3 bits or something else) to accommodate bigints be left up to the implementation? This is what Scheme/Lisp implementations do.
  2. To avoid surprises it may be worth adding types integer and uinteger for arbitrary precision that any integral type can be converted to "without ceremony". Note that the current {,u}int{,8,16,32,64} essentially do modular arithmetic (even though normally we tend to ignore this fact), so they are fundamentally different from arbitrary precision ints.
  3. Why have an arbitrary precision unsigned integer type at all? Converting /to/ arb. prec. requires no extra work. Converting /from/ arb. prec. requires that such a number can be /narrowed/ to the lesser type, which can include sign checking.
  4. Might as well add rational types (e.g. 355/113)! Point being, rather than just look at arb. prec. int and possibly do piecemeal evolution, why not take another look at all the numeric types (the "numeric tower") and learn/steal from what CL and Scheme have done?

@bakul Regarding your point 1: Different implementations may choose different internal representations, and implementations may evolve. If we were to do anything here, the encoding absolutely should be left to the implementation.

@bakul

Why have arbitrary precision unsigned integer type at all?

See bullet point 3 of the initial comment: Current int types can silently overflow. This is a common source of severe bugs. There are two options for avoiding silent (u)int overflow:

(1) make ints and uints error out on overflows (difficult, creates new problems, e.g. the need for testing for errors after every addition or multiplication), or

(2) make them arbitrary precision (u)ints (easier & more intuitive for the user).

(Algorithms that require overflow as part of their logic should use the sized variants anyway, from (u)int8 to (u)int64.)

@christophberger

make ints and uints error out on overflows … e.g. the need for testing for errors after _every_ addition or multiplication

My current proposal in #19624 would allow you to check for overflow on an entire “expression consisting of arithmetic operators and / or conversions between integer types” at once. You really only need one check for overflow anywhere in the expression, in much the same way that you can do arbitrarily many floating-point operations with only one NaN check at the end.

It's true that arbitrary-precision arithmetic lets you combine checks even more broadly (across function calls, for example), but that flexibility also increases the run-time overhead when values exceed the expected range.

@bcmills

Agreed, a behavior analogous to NaN (that is, allowing to postpone the overflow check until the end of a long calculation) simplifies handling overflow errors.

I also agree on the possible runtime overhead, but when calculations do exceed the range (which should occur pretty rarely in many common scenarios), I guess that most users would prefer observing a slowdown rather than errors.

@christophberger

To clarify, my point was that an arbitrary precision int would be a superset of an arbitrary precision uint, so why have two types? Implementation-wise I can see a difference (particularly if you use a tagged word). But I am coming at this from a different angle: adding bignums just to deal with overflow seems a big change for a little gain. Introducing bignums should be based on their overall usefulness. And from that perspective the rest of the numeric tower should also be considered (e.g. rationals and bigfloats). Finally, from a user perspective an unsigned bignum doesn't bring anything more useful (except one more place where you run into potential type mismatches).

@bakul

Thanks for clarifying, I missed the "unsigned" in your 3rd item somehow. Indeed, as per this comment, unsigned arbitrary precision integers are redundant and also fail when calculating uint(0)-1 (as the only sane result for this is a runtime panic).

Regarding signed arbitrary precision integers, I think the gain is more than just "little" - the initial comment already lists three advantages (the first one of the four-bullet list does not count).

In the end, it depends on whether this can be implemented with acceptable efforts and without complicating the runtime (the GC in particular) too much.

I don’t see this proposal and #19624 as mutually exclusive. Arbitrary precision integers would interop well with constants and avoid overflows as an easy to use numeric type. Meanwhile int64 and friends give more control over memory layout... and could be extended to give control over overflow behaviour (wrap, saturate, panic, error, etc.).

Possibly relevant: v8 just added an experimental BigInt type with first-class language support https://v8project.blogspot.com/2018/05/bigint.html

The announcement doesn't go too deep into implementation details, so it isn't clear whether they are using a tagged representation or not.

Don’t you need to do this for float/double as well then... analogous to BigInteger and BigDecimal in Java? Why does this need to be part of the language... could just copy those classes from the JDK, but sparse structs could be implemented to make them even more efficient.

@robaho IMO math.Log(x) is a natural operation to want to do for floating point numbers but not one that can be done with arbitrary precision. I.e. you won't get around the need to have a fixed precision "natural" floating point type.

No reason why you can't have math.Log(x, prec) to compute a log to specified precision using arbitrary precision values.

That's the same as having x already holding its precision.


For that operation, yes. That's the point. A single operation being limited precision doesn't mean you have to cripple your data type to support it.

You'll have to "cripple your data type", as some numbers cannot be
represented with rational numbers or with finite number of digits / bits.
(Pi, 1/3, √2)


By that argument, since we can't represent pi exactly as a BigDecimal, we should remove BigDecimal from the language and only have floats.

That's a silly argument against BigDecimal.

Let's please move the discussion about arbitrary size floating point types to a different issue. This issue is about changing int to be an arbitrary sized integer type. In particular note that the language has no float type. Thanks.

Since the language doesn't have an integer type either, how about making integer the type of arbitrary-precision ints rather than changing int? That would avoid breaking existing code.

@ianlancetaylor sorry, I wasn't trying to hijack/confuse. I was only trying to point out: why not have double and double64 then, with double being arbitrary sized? It just seems wrong to special-case this for int (rather than just using a BigInteger class). I would think operator overloading would be cleaner and more useful (although I'm not a big fan of most uses of operator overloading...), but that is another can of worms.

What would happen to big.Int if this gets accepted?

@TuomLarsen Its implementation would become trivial. If we ever do a math/big/v2, it might be removed.

@ianlancetaylor On the other hand, it seems to me that one of the advantages of big.Int is that it provides greater control for managing memory. What would be the equivalent of Int.Set? Does = do a deep or shallow copy? I'm asking because perhaps the "dest ← op1, op2" style of big.Int is more flexible, but its functionality would overlap somewhat with this proposal.

This issue was a concern about big in early Go days, but the pool management scheme suggests a new try is in order.


Upon reflection, I think this change would break too far from the language's C/systems heritage. Better handled in libraries, IMO.

@TuomLarsen I'd expect it'd work just like with strings today. Both strings and ints are immutable, so = would do a shallow copy, but all operations (+, +=, etc) would make a new instance by default. Of course, compiler would optimize cases where it's unnecessary to make a new instance and instead rewrite an old one.

@TuomLarsen Its implementation would become trivial. If we ever do a math/big/v2, it might be removed. https://github.com/golang/go/issues/19623#issuecomment-443023550

What is the interplay between this proposal and #20654 math/big: support for constant-time arithmetic?

@nathany #20654 is about specialized large integer arithmetic where operations (such as +, *, etc.) always take the "same" amount of time independent of the input values. For instance, if huge represents a large integer value, huge + huge must take the same time as huge + 0 (even though in the latter the algorithm might be able to tell right away that there's nothing to do because one argument is 0). Making such operations' runtime independent of the arguments makes it impossible to "guess" arguments based on runtime. When such guessing is possible, this "side channel" (the timing of operations) becomes a valuable data source for crypto hackers.

There's no need for such constant-time operations in the language implementation. Just the opposite is true: we don't want constant-time operations because this proposal assumes that we can do all kinds of tricks to avoid the cost of a large integer operation: Most integers will be small, and in those cases the costs of operations should be close to what they are now (and inlined in the generated code).

I see #20654 as a proposal for a specialized library, built from the ground up (and tested as such) for crypto. I think it would be very hard to retrofit big.Int to support both (optimized and constant-time operations) and be sure to get it right in all cases.

I would like to add another data point which I just encountered. Provided there was an exponentiation operator, one could write

-3*x**5*y

instead of

new(big.Int).Mul(big.NewInt(-3), new(big.Int).Mul(new(big.Int).Exp(x, big.NewInt(5), nil), y))

@TuomLarsen I would like to point out that the latter example does not have to be that bad; it could easily be

NewInt(-3).Mul(Exp(x,NewInt(5))).Mul(y)

The required syntax you cite is somewhat a result of not using 'dot imports' for common core numerical classes (which I've discussed before). I think it makes sense; others argue against it. But anyway.

And with method overloading it could easily be (if x and y are already big.Int):

Int(-3).Mul(x.Exp(5)).Mul(y)

Still not great, but making code like this readable needs to be part of the API design from the start.
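For reference, the original expression written step by step against the actual math/big API looks like this (the helper name poly is made up):

func poly(x, y *big.Int) *big.Int {
    t := new(big.Int).Exp(x, big.NewInt(5), nil) // x**5 (nil means no modulus)
    t.Mul(t, y)                                  // x**5 * y
    return t.Mul(t, big.NewInt(-3))              // -3 * x**5 * y
}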

Not clear to me if this proposal is going anywhere, but on rereading I realized I forgot to mention one point in my earlier comments: a separate arbitrary precision integer type (henceforth called integer) would help the GC (at some cost to the programmer). With an explicit integer type and no automatic promotion of int variables to it, the GC knows there are no pointers in an []int. For []integer it would have to check each word if tagging is used to minimize allocation. But whether to use tagging or always allocate storage is up to the implementation and only affects the integer type.

A relevant case: Dart switched from arbitrary precision integers in Dart 1 to fixed 64-bit integers in Dart 2. See their feature spec: https://github.com/dart-lang/sdk/blob/master/docs/language/informal/int64.md.

For precompilation it's not possible to generate the mint/bigint code lazily. Typically, the generated code only contains inlined SMI instructions and falls back to calls when the type is not a SMI. Before being able to use the SMI code, instructions have to be checked for their type (null, or Mint/BigInt) and after most operations they need to be checked for overflows. This blows up the executable and generally slows down the code. As a consequence, Dart 2.0 switches to fixed-size integers.

Seems like their issue was that the JIT could handle the overflow checks, but precompilation resulted in overly bloated code. Anyone want to do a PoC for Go to see whether it's too slow/bloated or not?

I'd like to distinguish between two things that I think would both be useful:

  1. an integer type which can be of arbitrary size.
  2. the ability to declare arbitrarily-sized integer types.

I think these are both useful, and they fill different needs. I have more than once wanted the ability to declare int128 or int256 objects and have them Just Work - and that's much simpler than arbitrary-precision ints. It might allow better performance and work for many cases, but it wouldn't solve some of the other cases.

both? both is good.

Maybe allow every intxx/uintxx where xx satisfies some constraints, so the compiler can still generate code for it without needing a per-type implementation.

I've found myself often needing to do a single operation that overflows int64, and then immediately after, another that brings it back into the scope of int64, and I see this as being a reasonably common use case. I think operations such as

var x, y, z int64
r := int64(bigint(x) * bigint(y) % bigint(z))

could be optimised more than blindly converting to a big int, performing the operation, and then converting back to int64. In this case, it can be assured at compile time that x*y will not overflow 128 bits, and an algorithm using an upper and lower 64-bit integer could be used. I really like this idea, but I would hate to see operations that just slightly overflow int64 reduced to the speed of typical big ints.

Stewart, your statement is a common truth from the FORTH language. There, in the early days with 16-bit integer stacks, it was often desirable and sufficient to compute scales and rotations as:

z := pop()
y := pop()
x := pop()
t := int32(x) * int32(y) // widen to 32 bits for the intermediate product
t /= int32(z)            // scale back down by z (the */ semantics)
push(int16(t))

This is so very useful that it has had an operator from day one, */ ("Star-Slash"), and most memorably, a character (human personification) in the panoply of Forth, Inc. gods. Star-Slash is a Japanese ninja, with katana at hand.


He does the big multiply, then, using his sword, chops the giant number back down to normal size.


Star-Slash has a friend, Star-Slash-Mod, who leaves the chopped-off part on the stack. Read all about it here: https://www.forth.com/starting-forth/5-fixed-point-arithmetic/

Now in Go we have this:

var x, y, z uint64
// ...
hi, lo := bits.Mul64(x, y)
quo, rem := bits.Div64(hi, lo, z)

// quo is Star-Slash
// quo,rem is Star-Slash-Mod

Is this two-step, four-variable solution enough for you, or do you want it to be as clean as in Forth?

func StarSlash(x, y, z uint64) uint64 {
    hi, lo := bits.Mul64(x, y)
    quo, _ := bits.Div64(hi, lo, z)
    return quo
}

func StarSlashMod(x, y, z uint64) (uint64, uint64) {
    hi, lo := bits.Mul64(x, y)
    quo, rem := bits.Div64(hi, lo, z)
    return quo, rem
}


My additional(?) two cents: if we are not going to break stdlib compatibility in Go 2, then one reason to go through with this proposal would vanish: the weird inconsistency between uses of int and int64 when dealing with containers and I/O. Rob brought up containers in the original post when talking about len, so let me add the I/O bit:

Some interfaces of package io return an int, while others return an int64. This means there are type conversions involved when you try to use, for instance, one of the former inside the implementation of one of the latter. Actual example in code I'm working on while writing this: using io.Writer.Write in an implementation of io.WriterTo.WriteTo. I can remediate some of this by using a bytes.Buffer or bytes.Reader, of course, but that's just one example. Harder to remediate is using io.ReadFull in an implementation of io.Reader.ReadFrom, which will most certainly require a conversion somewhere.
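Concretely, the mismatch looks like this (a sketch; the byteBuf type is made up, io is the standard package):

type byteBuf struct{ data []byte }

// io.WriterTo requires an int64 count, but io.Writer.Write returns
// an int, so a conversion is forced on every such implementation today.
func (b *byteBuf) WriteTo(w io.Writer) (int64, error) {
    n, err := w.Write(b.data)
    return int64(n), err
}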

As with the proposal on removing complex types from the language, I would have to again ask whether doing this in the compiler would improve performance compared to keeping things as a library, assuming that math/big.Int takes advantage of special arbitrary-precision-integer CPU instructions where available (Intel ADX is one such for x86, although I will always remember the MC68000's addx family of instructions instead).

If math/big.Int doesn't use these yet, I would encourage experimenting to see how much improvement we can get by using them within the standard library, as a proof of concept of whether this proposal is a good idea. (Of course, for compatibility reasons, actually doing this for real in the standard library is likely not going to happen.)

I still believe the advantages of this change are significant but I admit it's not going to happen.

Although I still think making int a real integer would be a valuable change to make, it's clearly not happening, certainly not as it needs to be to be successful, which is making it really the default integer type. People are terrified of the modest but largely avoidable expense of doing it, and it's just not the direction the language is going.

Maybe in the next language. Go got concurrency and garbage collection accepted in a compiled, efficient language. Maybe real integers will happen in the next one.

Upvote here for the 😢 (crying) reaction to @robpike's comment, since that reaction is not available on GitHub.

Would it be good to use an alternative name, intn (or some similar name), to represent the arbitrary precision int?

@go101 If it's not the default integer type, it loses too much of its merit.
