Right now Crystal has a variety of integer and float types: `Int8`, `Int16`, `Int32`, `Int64`, `UInt8`, `UInt16`, `UInt32`, `UInt64`, `Float32` and `Float64`.

The default integer type when you don't use a suffix is `Int32` and the default float type is `Float64`.
This kind of works but I imagine something better.
Given that `Int32` and `Float64` are the default types it feels a bit redundant to type those 32 and 64 numbers all the time.
So here's an initial idea: what if we name those types `Int` and `Float`? We would of course need to rename the existing base types `Int` and `Float`, but that's not a problem: we can maybe call them `IntBase` and `FloatBase`, or `Integral` and `Floating`; it doesn't matter much because those names won't be used a lot.
Then talking about ints and floats is so much simpler: just use `Int` and `Float` everywhere. In the cases where you do need a specific size, which is rare and usually only relevant in low-level code such as interfacing with C or writing binary protocols, you can still use the names `Int32`, `Int64`, `Float32` or whatever you need.
Now, we could make `Int` be an alias of `Int32` and `Float` an alias of `Float64`, but maybe it's better if we make `Int` depend on the architecture. That means `Int` would be equivalent to `Int64` on 64-bit architectures.
This is also how Go works: they recommend using `int` everywhere unless you have good reasons to use a specific size. It's probably the case that using `Int64` by default instead of `Int32` works equally well (maybe even better, because the range is bigger so overflow is less likely) without a real performance degradation.

Another nice thing is that if 128-bit architectures eventually appear, all programs will automatically start using this bigger range (if we want to) without needing to change any code.
Now, we _could_ make `Int` be an alias of the respective underlying type, but I don't think that's a good idea. The reason is that if you have a program that does:

```crystal
x : Int = 1
y : Int32 = 2
x = y
```

that would compile on 32-bit but would stop compiling on 64-bit. Ideally we'd like our programs to always compile regardless of the architecture.
So, we could make `Int` and `Float` be different types. To assign `Int32` or `Int64` to them you would need to call `to_i` first. Then programs on both 32-bit and 64-bit go through that explicit conversion process.
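To make the conversion rule concrete, here's a minimal sketch of how code would read under this proposal (hypothetical: it assumes `Int` becomes a distinct, architecture-dependent type and that `to_i` converts to it):

```crystal
# Hypothetical: Int as a distinct, architecture-dependent type.
x : Int = 1        # untyped integer literals default to Int
y : Int32 = 2_i32  # an explicitly sized value

# x = y            # would not compile under the proposal: Int32 is not Int
x = y.to_i         # explicit conversion, identical on 32-bit and 64-bit targets
```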
Another benefit is that we could start making collections use `Int` as their size type. This increases their memory footprint a bit but I think it's fine: it's probably not a huge performance/memory penalty (most of the memory is in the actual data). But then their limit becomes the limit of the architecture's memory (well, half of it if we use signed integers, but it's still a lot more than we can do right now). And, like before, this limit will automatically increase when architectures improve (well, if the Amazon burning doesn't mean our imminent doom 😞).
If we need these conversions between `Int` and all other integer types, and the same for `Float`, wouldn't it make it really hard to write programs, having to convert between integer types all the time?

No, I don't think so. Because `Int` will be the default type everywhere, except for the few cases I mentioned before (C bindings and binary protocols) there would be no reason to use another integer type.
Right now when we parse JSON and YAML we use `Int64`, because it would be a shame to parse to `Int32` and possibly lose some precision.

With this change the type would be `Int`, as everywhere else, and this can be assigned to everything else too if we stick to `Int` as the default. I know that on 32-bit the limit will be smaller, but 32-bit machines are starting to become obsolete (for example, I think Mac is dropping support for 32-bit apps).
This is probably a breaking change, but a good one.
In summary, if we do this change we get:

- simpler default names: `Int` and `Float` everywhere
- an architecture-dependent `Int`
Part of this RFC overlaps with an older one, https://github.com/crystal-lang/crystal/issues/6626, about making the integer type depend on the platform.
I'm perfectly happy with the end result of this change, but I wonder how best to stage it into the language. It doesn't seem like there's a way to apply it incrementally; the only way is to have a single release break all existing programs and libraries.
Which I'm fine with, since there doesn't seem to be an alternative.
Yeah... it's even hard to develop, because `Int` and `Float` are baked into the compiler. We'll first have to change their meaning, then compile a compiler with the existing `primitives.cr` file, then change that file to define the new hierarchy (and use `Int` everywhere), and then compile the final new compiler.
In any case I think this can be delayed until after we get parallelism and Windows support. But it's something I would definitely like to have before 1.0 because it's a big change.
It's curious that I'm also repeating myself (#6626) but I'm glad what I wrote here is what we ended up concluding there (though I don't know why I said it's impossible to do so).
I'll happily welcome the change. I grew to really dislike `Int` being the union of all signed integers, and wished it were just some integer (32-bit, 64-bit, or arch-dependent). It will break some programs, though maybe not that many, and it can be quickly fixed by temporarily using an `AnyInt` alias or something.
Swift also has distinct `Int` and `UInt` types that are architecture-dependent, and they are the recommended and default integer types (https://docs.swift.org/swift-book/LanguageGuide/TheBasics.html#ID317). Same for Nim with `int` and `uint`. Even C/C++ have `long` and `unsigned long`.
Yet, I can't find a language with architecture-dependent floats. Swift has `Float` and `Double`; Go has `float32` and `float64`. Nim has a `float` type that used to be platform-dependent but now is merely an alias for `float64` (https://nim-lang.org/docs/manual.html#types-preminusdefined-floating-point-types).
> Yet, I can't find a language with architecture-dependent floats.
Good catch! Yeah, I think for float we should have `Float` be an alias of `Float64`, or even a distinct type. But I'd rather have something short like `Float` instead of having to type and read `Float64` all the time.
If this is going to be a huge breaking change surely it makes sense to get this out the way as soon as possible, not delay it until the language has even more users.
First, we could move to free up the `Int` and `Float` names (rename). Next release, they become aliases for `Int32`/`Int64` and `Float64`. We can then push libraries to move to those aliases, so that when they become distinct types nothing breaks.

We could probably introduce the change behind a flag at the same time as the aliases, so that libraries can test for compliance, but the same code still compiles without the flag.
I like that idea!
Just note that:
So I guess the first thing for me will be to try this out and see how it works.
One thing to think about: when you want to map an integer to a database you usually want `Int32` or `Int64` (or even other integer types). Using `Int` then is a bit confusing, because the DB column type would need to change depending on whether we are on 32-bit or 64-bit. Making it `Int32` in the DB but exposing it to the user as `Int` works for reading but not for writing (if you try to write something bigger than `Int32::MAX` it will fail), and making it `Int64` in the DB works for writing but not for reading on 32-bit.
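A small illustration of that friction with today's types (the column choice is hypothetical; the overflow behaviour is current Crystal):

```crystal
# Suppose the DB column is INTEGER (Int32) but the in-memory value is wider.
value : Int64 = Int32::MAX.to_i64 + 1

value.to_i32 # raises OverflowError: the value doesn't fit in the Int32 column
```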
Another problem: the literal `1` will have the type `Int` and that's fine. But what about `2147483648` (`Int32::MAX + 1`)? It could be `Int`, but then it won't compile on 32-bit, effectively making some programs stop compiling depending on the architecture. In fact I just tried this in Go and that's exactly the behavior you get. So maybe it's fine? 😅
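For reference, a sketch of what that would mean relative to today's literal inference (the `Int` behaviour described in the comments is the proposal, not current Crystal):

```crystal
a = 1          # : Int32 today; : Int under the proposal
b = 2147483648 # : Int64 today (doesn't fit in Int32)

# Under the proposal, if `b` were typed as Int, it would compile on
# 64-bit targets but fail on 32-bit ones, which is the behaviour Go has.
```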
Those are great counter-examples of having architecture-specific `Int` and `UInt`.
**Literals**
If I use something higher than `Int32::MAX` then I actually expect an `Int64`, not an `Int`, and it just happens to work on 64-bit targets. Having a compile-time error for 32-bit targets seems appropriate?

It means Crystal can't infer `2147483648` as an `Int64` or `9223372036854775808` as an `Int128`, and we'll have to type them manually (oh no), but does it happen much? Maybe some explicitness ain't that bad?
**Database**
I believe database columns should be explicit, that is either `Int32` or `Int64`, but if integers are usually an `Int` it may create some friction and require some explicit casts (oh no)...

Another point to consider is to separate the notion of base integers and native integers. Currently, there are some operations and overloads that work only with native ones, but since `BigInt < Int` they match wrongly with `BigInt`.
The current alias to a union of primitives works for overloads but not for definitions in the base class.
I think we could make the whole std work with `Int`, that is, the architecture-dependent type. Then `BigInt` won't match that, nor will `Int32` or `Int64`: you'll have to explicitly convert values from those types.

That seems kind of bad, but if `Int` is the default type everywhere then it's not. And we also reduce the number of method instantiations: right now a method accepting `Int` could get an instance for `Int8`, `Int16`, `Int32`, etc., but with this change it'll always be `Int`.
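A small example of the instantiation point with today's semantics (runnable now; under the proposal the three calls below would share a single instantiation):

```crystal
# Today: a method restricted to Int is specialized per concrete argument type.
def double(x : Int)
  x * 2
end

double(1_i8)  # instantiates double(Int8)
double(1_i32) # instantiates double(Int32)
double(1_i64) # instantiates double(Int64)
```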
How will that work when math shard A uses `Int` (now fixed at `Int32`), math shard B uses `Int64`, and serialized formats (for example protobuf) are a mix of `Int8|16|32|64`, `UInt8|16|32|64`? Will I need to manually convert between types every time a variable crosses a function boundary? Where does over/underflow checking happen? Do I need to check manually with each conversion?
> How will that work when math shard A uses `Int` (now fixed at `Int32`), math shard B uses `Int64`, and serialized formats (for example protobuf) are a mix of `Int8|16|32|64`, `UInt8|16|32|64`?

I think that's also a problem right now, with `Int32` being the default integer type.
> Will I need to manually convert between types every time a variable crosses a function boundary? Where does over/underflow checking happen? Do I need to check manually with each conversion?

The answer is yes, because that's something you also need to do now, with `Int32` being the default integer type.
Looks like I can pass any type of `Int` with the type preserved. With your proposal would this still work, or will it convert to `Int32`?

```crystal
def lib1_add(a : Int, b : Int)
  c = a + b
  lib2_func c
end

def lib2_func(x)
  p typeof(x)
end

x = 1_u64
lib1_add x, x
```

Output:
@didactic-drunk `Int` is currently a union alias (not aliased to `Int32` only):

```crystal
alias Int = Int8 | Int16 | Int32 | Int64
```

We can rename the alias `AnyInt` and keep the same behavior.
> @didactic-drunk `Int` is currently a union alias (not aliased to `Int32` only):
>
> ```crystal
> alias Int = Int8 | Int16 | Int32 | Int64
> ```
>
> We can rename the alias `AnyInt` and keep the same behavior.
Based on my example, doesn't that mean math functions (or most functions) should use `AnyInt`, and we're right back where we started?
A major complaint when working in physics with C++ is integer sizes. Someone writes an algorithm using `Int32` or `Float32` for their problem and it's fine. Someone else attempts to use it with physics data and it over/underflows. Since they're only half programmers they don't use things like version control. Instead they email files back and forth, so things like `Int128` never make it upstream. Each person who gets the file from the original programmer has to change `Int32` to `Int128`.

They probably should have used a template, but that's beyond them. They tend to use the default.
If `Int32`/`Int64` is the default it will be wrong some portion of the time. Should they use `AnyInt`? No. They'll copy and paste from an example they found on Google, which likely uses `Int`. When it's too small they'll change it to `Int128` manually. When the first person refines the algorithm? They email it to a few of the people, who change the types again.

Why? `Int128` doesn't perform as well as `Int32`/`Int64`. It also requires much more memory/storage space. These run on huge clusters with > petabyte data sets. Each person wants the `Int` type for their specific problem space, but the algorithms are generic.

`AnyInt` solves the problem, which is why I think it should remain the default, named as `Int`.
@didactic-drunk Names are exchangeable. I won't go into details about the pros and cons of each name.

The problem isn't names. It's default behaviour. A union type can't be used as the type of an instance variable. But some type must be specified everywhere you need to store integers. Currently, we advocate using `Int32` everywhere by default because that's safe and fits most use cases. It is also the default type of untyped integer literals.

Even your non-programmer algorithm writers need to pick data types for their integers. And it can't always be a union type, no matter whether it's called `Int` or `AnyInt`.
+1, (in my opinion as a novice to Crystal) this would be a good change.
I just asked how to hack Crystal to use `Int` and `Float` everywhere and got a link to this issue.
Clean, readable, compact code is one of the key features of Ruby. It's hard to justify the `32` and `64` noise in a codebase if they don't contribute or mean anything, at least in my projects, as I use only those two everywhere.
+1 for making them the same on all platforms. Less confusion when porting (and debugging somebody else's code). If they want to interface with C... maybe create a new type called `NativeInt` or something, that can be used as the parameter?
I attempted to ask here: https://forum.crystal-lang.org/t/int32-and-float64-why-the-defaults/1797 why `Int32` and `Float64` are the defaults. Curious, since one is "32" and the other "64". Thanks :)
I think it makes sense to have this before 1.0. It's much safer to use `Int64` by default when dealing with native numbers in JSON and databases.
@cyangle This is not going to happen before 1.0. No other major changes are expected before 1.0
Really? I think this and #8872 are just as important as overflow checks. It changes _everything_ about numbers in the language...
The thing is that @waj just showed me a couple of benchmarks. For example this:

```crystal
require "benchmark"

puts 1
a = Array(Int32).new(50_000_000) { rand(Int32) }
puts 2
b = Array(Int64).new(50_000_000) { rand(Int64) }

sa = 0_i32
sb = 0_i64

Benchmark.ips do |ips|
  ips.report("Int32") { sa = a.reduce(0_i32) { |s, i| s &+ i } }
  ips.report("Int64") { sb = b.reduce(0_i64) { |s, i| s &+ i } }
end

puts sa
puts sb
```

It's slower for `Int64`. The reason is that even though the math operations probably take the same time, fewer values fit on a cache line or on the bus, so there's that performance loss with `Int64`.
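To put rough numbers on that (assuming a typical 64-byte cache line; the arithmetic below is just an illustration):

```crystal
# A 64-byte cache line fits 16 Int32 values but only 8 Int64 values,
# so the Int64 array streams half as many elements per line fetched.
puts 64 // sizeof(Int32) # => 16
puts 64 // sizeof(Int64) # => 8
```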
What we are considering, though, is adding a `Size` type that's a different type from `Int32` and `Int64`, and that would be used as the type of `size` in collections. That way you can have bigger collections on 64-bit machines. But the default integer type still stays `Int32`, for performance reasons (the same decision as, for example, Rust).
I'm not sold on that. When 32-bit vs 64-bit performance matters (and 32 bits are big enough to hold the data) you can simply optimize your code by using `Int32` explicitly. But that's actually an edge case for heavy math operations.
For the vast majority of use cases the performance difference is completely negligible. But usability would greatly improve if we just had a simple default integer data type that works for (almost) everything. You would only have to resort to explicit types for binary interfaces, optimizations and maybe some other special cases.
Does the Rust way fit Crystal? I think Crystal is closer to Go and Swift: abstract the details but give access to low-level _when needed_. In that benchmark, if `Int32`s are enough, then you can optimize (cool), though we're talking about 190MB vs 380MB arrays. That's kinda big, and the performance hit ain't so bad (1.28× slower) given that the CPU caches are busted twice as many times.
Having a specific `Size` type for collection sizes introduces friction (or weird type changes/overflows) whenever we want to compute anything with them (not cool). It also requires continuing to type `Int32` instead of a simpler `Int`; using `Size` for integers is weird, and not the recommended way to interact with libraries.
Personally I think discussing new integer types right now is entirely missing the point of 1.0.

The original plan was to release 1.0-pre1 as 0.35.0 plus bugfixes, and now we're discussing this? Even #9357 can be implemented after 1.0 by adding a `long_size` instead of changing `size`, which is originally why I stopped working on it.
I personally wouldn't mind having a default integer type that's `Int64` on 64-bit machines. I think the same way as you, @ysbaddaden. But not everyone thinks the same, so we have to come to some consensus.
We've also been talking about making the `@size` of collections (maybe only `Slice` for now) be `Int32` or `Int64`, exposed as `Int32` with `size` and as `Int64` with `size64`. That's similar to how it's done in C#, where arrays have a LongLength property. This way, if you really need big collections or slices you can still work with them, but for the general case, collections with fewer than `Int32::MAX` elements are probably enough for most use cases.

However, nothing is set in stone yet; this is what we've been discussing so far.
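A rough usage sketch of that idea (`size64` is hypothetical here; it mirrors C#'s LongLength):

```crystal
slice = Slice(UInt8).new(16)

slice.size   # => 16 : Int32, as today
slice.size64 # => 16 : Int64 (proposed), for slices with more than Int32::MAX elements
```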
Ask MIT
> We've also been talking about making the `@size` of collections (maybe only `Slice` for now) be `Int32` or `Int64`, exposed as `Int32` with `size` and as `Int64` with `size64`.

That's even worse :sob:
> Given that `Int32` and `Float64` are the default types it feels a bit redundant to type those 32 and 64 numbers all the time.

I think being specific about the type in a statically typed language is a positive. It shouldn't feel redundant, it should feel good because _it's explicit_. Not against an `Int` or `Float` alias that is platform-dependent, though.