Right now Crystal has a variety of integer and float types: `Int8`, `Int16`, `Int32`, `Int64`, `UInt8`, `UInt16`, `UInt32`, `UInt64`, `Float32` and `Float64`.

The default integer type when you don't use a suffix is `Int32` and the default float type is `Float64`.
This kind of works but I imagine something better.
Given that `Int32` and `Float64` are the default types it feels a bit redundant to type those 32 and 64 numbers all the time.
So here's an initial idea: what if we name those types `Int` and `Float`? We would of course need to rename the existing base types `Int` and `Float`, but that's not a problem: we can maybe call them `IntBase` and `FloatBase`, or `Integral` and `Floating`; it doesn't matter much because those names won't be used a lot.
Then talking about ints and floats is so much simpler: just use `Int` and `Float` everywhere. In the cases where you do need a specific size, which is rare and usually only relevant in low-level code such as interfacing with C or writing binary protocols, you can still use the names `Int32`, `Int64`, `Float32` or whatever you need.
Now, we could make `Int` be an alias of `Int32` and `Float` an alias of `Float64`, but maybe it's better if we make `Int` depend on the architecture. That means `Int` would be equivalent to `Int64` on 64-bit architectures.
This is also how Go works: they recommend using `int` everywhere unless you have good reasons to use a specific size. It's probably the case that using `Int64` by default instead of `Int32` works equally well (maybe even better, because the range is bigger so overflow is less likely) without a real performance degradation.

Another nice thing is that if 128-bit architectures eventually appear, all programs will automatically start using this bigger range (if we want to) without needing to change any code.
Now, we _could_ make `Int` be an alias of the respective underlying type, but I don't think that's a good idea. The reason is that if you have a program that does:

```crystal
x : Int = 1
y : Int32 = 2
x = y
```

that would compile on 32-bit but would stop compiling on 64-bit. Ideally we'd like our programs to always compile regardless of the architecture.
So, we could make `Int` and `Float` be different types. To assign `Int32` or `Int64` to them you would need to call `to_i` first. Then programs on both 32-bit and 64-bit go through that explicit conversion process.
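To make the conversion rule concrete, here's a minimal sketch of how code would read under this proposal (hypothetical: it assumes `Int` becomes a distinct, architecture-dependent type and that `to_i` converts to it):

```crystal
# Hypothetical: Int as a distinct, architecture-dependent type.
x : Int = 1        # untyped integer literals default to Int
y : Int32 = 2_i32  # an explicitly sized value

# x = y            # would not compile under the proposal: Int32 is not Int
x = y.to_i         # explicit conversion, identical on 32-bit and 64-bit targets
```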
Another benefit is that we could start making collections use `Int` as their size type. This increases their memory footprint a bit but I think it's fine: it's probably not a huge performance/memory penalty (most of the memory is in the actual data). But then their limit becomes the limit of the architecture's memory (well, half of it if we use signed integers, but it's still a lot more than we can do right now). And, like before, this limit will automatically increase when architectures improve (well, if the Amazon burning doesn't mean our imminent doom 😞).
If we need these conversions between `Int` and all other integer types, and the same for `Float`, wouldn't it make it really hard to write programs, having to convert between integer types all the time?

No, I don't think so. Because `Int` will be the default type everywhere, except for the few cases I mentioned before (C bindings and binary protocols) there would be no reason to use another integer type.
Right now when we parse JSON and YAML we use `Int64`, because it would be a shame to parse to `Int32` and possibly lose some precision.

With this change the type would be `Int`, as everywhere else, and this can be assigned to everything else too if we stick to `Int` as the default. I know that on 32-bit the limit will be smaller, but 32-bit machines are starting to become obsolete (for example, I think Mac is dropping support for 32-bit apps).
This is probably a breaking change, but a good one.
In summary, if we do this change we get:

- simpler default names: `Int` and `Float` everywhere
- an architecture-dependent `Int`
Part of this RFC overlaps with an older one, https://github.com/crystal-lang/crystal/issues/6626, about making the integer type depend on the platform.
I'm perfectly happy with the end result of this change, but I wonder how best to stage it into the language. It doesn't seem like there's a way to apply it incrementally; the only way is to have a single release break all existing programs and libraries.
Which I'm fine with, since there doesn't seem to be an alternative.
Yeah... it's even hard to develop, because `Int` and `Float` are baked into the compiler. We'll first have to change their meaning, then compile a compiler with the existing `primitives.cr` file, then change that file to define the new hierarchy (and use `Int` everywhere), and then compile the final new compiler.
In any case I think this can be delayed until after we get parallelism and Windows support. But it's something I would definitely like to have before 1.0 because it's a big change.
It's curious that I'm also repeating myself (#6626) but I'm glad what I wrote here is what we ended up concluding there (though I don't know why I said it's impossible to do so).
I'll happily welcome the change. I grew to really dislike `Int` being the union of all signed integers, and wished it were just some integer (32-bit, 64-bit, or arch-dependent). It will break some programs, though maybe not that many, and it can be quickly fixed by temporarily using an `AnyInt` alias or something.
Swift also has distinct `Int` and `UInt` types that are architecture-dependent, and they are the recommended and default integer types (https://docs.swift.org/swift-book/LanguageGuide/TheBasics.html#ID317). Same for Nim with `int` and `uint`. Even C/C++ have `long` and `unsigned long`.
Yet, I can't find a language with architecture-dependent floats. Swift has `Float` and `Double`; Go has `float32` and `float64`. Nim has a `float` type that used to be platform-dependent but now is merely an alias for `float64` (https://nim-lang.org/docs/manual.html#types-preminusdefined-floating-point-types).
> Yet, I can't find a language with architecture-dependent floats.
Good catch! Yeah, I think for float we should have `Float` be an alias of `Float64`, or even a distinct type. But I'd rather have something short like `Float` instead of having to type and read `Float64` all the time.
If this is going to be a huge breaking change surely it makes sense to get this out the way as soon as possible, not delay it until the language has even more users.
First, we could move to free up the `Int` and `Float` names (rename). Next release, they become aliases for `Int32`/`Int64` and `Float64`. We can then push libraries to move to those aliases, so that when they become distinct types nothing breaks.

We could probably introduce the change behind a flag at the same time as the aliases, so that libraries can test for compliance, but the same code still compiles without the flag.
I like that idea!
Just note that:
So I guess the first thing for me will be to try this out and see how it works.
One thing to think about: when you want to map an integer to a database you usually want `Int32` or `Int64` (or even other integer types). Using `Int` then is a bit confusing, because the DB column type would need to change depending on whether we are on 32-bit or 64-bit. Making it `Int32` in the DB but exposing it to the user as `Int` works for reading but not for writing (if you try to write something bigger than `Int32::MAX` it will fail), and making it `Int64` in the DB works for writing but not for reading on 32-bit.
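A small illustration of that friction with today's types (the column choice is hypothetical; the overflow behaviour is current Crystal):

```crystal
# Suppose the DB column is INTEGER (Int32) but the in-memory value is wider.
value : Int64 = Int32::MAX.to_i64 + 1

value.to_i32 # raises OverflowError: the value doesn't fit in the Int32 column
```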
Another problem: the literal `1` will have the type `Int` and that's fine. But what about `2147483648` (`Int32::MAX + 1`)? It could be `Int`, but then it won't compile on 32-bit, effectively making some programs stop compiling depending on the architecture. In fact I just tried this in Go and that's exactly the behavior you get. So maybe it's fine? 😅
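For reference, a sketch of what that would mean relative to today's literal inference (the `Int` behaviour described in the comments is the proposal, not current Crystal):

```crystal
a = 1          # : Int32 today; : Int under the proposal
b = 2147483648 # : Int64 today (doesn't fit in Int32)

# Under the proposal, if `b` were typed as Int, it would compile on
# 64-bit targets but fail on 32-bit ones, which is the behaviour Go has.
```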
Those are great counter-examples of having architecture-specific `Int` and `UInt`.
**Literals**
If I use something higher than `Int32::MAX` then I actually expect an `Int64`, not an `Int`, and it just happens to work on 64-bit targets. Having a compile-time error for 32-bit targets seems appropriate?

It means Crystal can't infer `2147483648` as an `Int64` or `9223372036854775808` as an `Int128`, and we'll have to type them manually (oh no), but does it happen much? Maybe some explicitness ain't that bad?
**Database**
I believe database columns should be explicit, that is either `Int32` or `Int64`, but if integers are usually an `Int` it may create some friction and require some explicit casts (oh no)...

Another point to consider is to separate the notion of base integers and native integers. Currently, there are some operations and overloads that work only with native ones, but since `BigInt < Int` they match wrongly with `BigInt`.
The current alias to a union of primitives works for overloads but not for definitions in the base class.
I think we could make the whole std work with `Int`, that is, the architecture-dependent type. Then `BigInt` won't match that, nor will `Int32` or `Int64`: you'll have to explicitly convert values from those types.

That seems kind of bad, but if `Int` is the default type everywhere then it's not. And we also reduce the number of method instantiations: right now a method accepting `Int` could get an instance for `Int8`, `Int16`, `Int32`, etc., but with this change it'll always be `Int`.
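A small example of the instantiation point with today's semantics (runnable now; under the proposal the three calls below would share a single instantiation):

```crystal
# Today: a method restricted to Int is specialized per concrete argument type.
def double(x : Int)
  x * 2
end

double(1_i8)  # instantiates double(Int8)
double(1_i32) # instantiates double(Int32)
double(1_i64) # instantiates double(Int64)
```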
How will that work when math shard A uses `Int` (now fixed at `Int32`), math shard B uses `Int64`, and serialized formats (for example protobuf) are a mix of `Int8|16|32|64`, `UInt8|16|32|64`? Will I need to manually convert between types every time a variable crosses a function boundary? Where does over/underflow checking happen? Do I need to check manually with each conversion?
> How will that work when math shard A uses `Int` (now fixed at `Int32`), math shard B uses `Int64`, and serialized formats (for example protobuf) are a mix of `Int8|16|32|64`, `UInt8|16|32|64`?

I think that's also a problem right now, with `Int32` being the default integer type.
> Will I need to manually convert between types every time a variable crosses a function boundary? Where does over/underflow checking happen? Do I need to check manually with each conversion?

The answer is yes, because that's something you also need to do now, with `Int32` being the default integer type.
Looks like I can pass any type of `Int` with the type preserved. With your proposal would this still work, or will it convert to `Int32`?

```crystal
def lib1_add(a : Int, b : Int)
  c = a + b
  lib2_func c
end

def lib2_func(x)
  p typeof(x)
end

x = 1_u64
lib1_add x, x
```

Output:
@didactic-drunk `Int` is currently a union alias (not aliased to `Int32` only):

```crystal
alias Int = Int8 | Int16 | Int32 | Int64
```

We can rename the alias `AnyInt` and keep the same behavior.
> @didactic-drunk `Int` is currently a union alias (not aliased to `Int32` only):
>
> ```crystal
> alias Int = Int8 | Int16 | Int32 | Int64
> ```
>
> We can rename the alias `AnyInt` and keep the same behavior.
Based on my example, doesn't that mean math functions (or most functions) should use `AnyInt`, and we're right back where we started?
A major complaint when working in physics with C++ is integer sizes. Someone writes an algorithm using `Int32` or `Float32` for their problem and it's fine. Someone else attempts to use it with physics data and it over/underflows. Since they're only half programmers they don't use things like version control. Instead they email files back and forth, so things like `Int128` never make it upstream. Each person who gets the file from the original programmer has to change `Int32` to `Int128`.

They probably should have used a template, but that's beyond them. They tend to use the default.
If `Int32`/`Int64` is the default it will be wrong some portion of the time. Should they use `AnyInt`? No. They'll copy and paste from an example they found on Google, which likely uses `Int`. When it's too small they'll change it to `Int128` manually. When the first person refines the algorithm? They email it to a few of the people, who change the types again.

Why? `Int128` doesn't perform as well as `Int32`/`Int64`. It also requires much more memory/storage space. These run on huge clusters with > petabyte data sets. Each person wants the `Int` type for their specific problem space, but the algorithms are generic.

`AnyInt` solves the problem, which is why I think it should remain the default, named as `Int`.
@didactic-drunk Names are exchangeable. I won't go into details about the pros and cons of each name.

The problem isn't names. It's default behaviour. A union type can't be used as the type of an instance variable. But some type must be specified everywhere you need to store integers. Currently, we advocate using `Int32` everywhere by default because that's safe and fits most use cases. It is also the default type of untyped integer literals.

Even your non-programmer algorithm writers need to pick data types for their integers. And it can't always be a union type, no matter whether it's called `Int` or `AnyInt`.
+1, (in my opinion as a novice to Crystal) this would be a good change.
I just asked how to hack Crystal to use `Int` and `Float` everywhere and got a link to this issue.
Clean, readable, compact code is one of the key features of Ruby. It's hard to justify the `32` and `64` noise in a codebase if they don't contribute or mean anything, at least in my projects, as I use only those two everywhere.
+1 for making them the same on all platforms. Less confusion when porting (and debugging somebody else's code). If they want to interface with C... maybe create a new type called `NativeInt` or something, that can be used as the parameter?
I attempted to ask here: https://forum.crystal-lang.org/t/int32-and-float64-why-the-defaults/1797 why `Int32` and `Float64` are the defaults. Curious, since one is "32" and the other "64". Thanks :)
I think it makes sense to have this before 1.0. It's much safer to use `Int64` by default when dealing with native numbers in JSON and databases.
@cyangle This is not going to happen before 1.0. No other major changes are expected before 1.0
Really? I think this and #8872 are just as important as overflow checks. It changes _everything_ about numbers in the language...
The thing is that @waj just showed me a couple of benchmarks. For example this:

```crystal
require "benchmark"

puts 1
a = Array(Int32).new(50_000_000) { rand(Int32) }
puts 2
b = Array(Int64).new(50_000_000) { rand(Int64) }

sa = 0_i32
sb = 0_i64

Benchmark.ips do |ips|
  ips.report("Int32") { sa = a.reduce(0_i32) { |s, i| s &+ i } }
  ips.report("Int64") { sb = b.reduce(0_i64) { |s, i| s &+ i } }
end

puts sa
puts sb
```

It's slower for `Int64`. The reason is that even though the math operations probably take the same time, fewer values fit on a cache line or on the bus, so there's that performance loss with `Int64`.
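To put rough numbers on that (assuming a typical 64-byte cache line; the arithmetic below is just an illustration):

```crystal
# A 64-byte cache line fits 16 Int32 values but only 8 Int64 values,
# so the Int64 array streams half as many elements per line fetched.
puts 64 // sizeof(Int32) # => 16
puts 64 // sizeof(Int64) # => 8
```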
What we are considering, though, is adding a `Size` type that's a different type from `Int32` and `Int64`, and that would be used as the type of `size` in collections. That way you can have bigger collections on 64-bit machines. But the default integer type still stays `Int32`, for performance reasons (the same decision as, for example, Rust).
I'm not sold on that. When 32-bit vs 64-bit performance matters (and 32 bits are big enough to hold the data) you can simply optimize your code by using `Int32` explicitly. But that's actually an edge case for heavy math operations.
For the vast majority of use cases the performance difference is completely negligible. But usability would greatly improve if we just had a simple default integer data type that works for (almost) everything. You would only have to resort to explicit types for binary interfaces, optimizations and maybe some other special cases.
Does the Rust way fit Crystal? I think Crystal is closer to Go and Swift: abstract the details but give access to low-level _when needed_. In that benchmark, if `Int32`s are enough, then you can optimize (cool), though we're talking about 190MB vs 380MB arrays. That's kinda big, and the performance hit ain't so bad (1.28× slower) given that the CPU caches are busted twice as many times.
Having a specific `Size` type for collection sizes introduces friction (or weird type changes/overflows) whenever we want to compute anything with them (not cool). It also requires continuing to type `Int32` instead of a simpler `Int`; using `Size` for integers is weird, and not the recommended way to interact with libraries.
Personally I think discussing new integer types right now is entirely missing the point of 1.0.

The original plan was to release 1.0-pre1 as 0.35.0 plus bugfixes, and now we're discussing this? Even #9357 can be implemented after 1.0 by adding a `long_size` instead of changing `size`, which is originally why I stopped working on it.
I personally wouldn't mind having a default integer type that's `Int64` on 64-bit machines. I think the same way as you, @ysbaddaden. But not everyone thinks the same, so we have to come to some consensus.
We've also been talking about making the `@size` of collections (maybe only `Slice` for now) be `Int32` or `Int64`, exposed as `Int32` with `size` and as `Int64` with `size64`. That's similar to how it's done in C#, where arrays have a LongLength property. This way, if you really need big collections or slices you can still work with them, but for the general case, collections with fewer than `Int32::MAX` elements are probably enough for most use cases.

However, nothing is set in stone yet; this is what we've been discussing so far.
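A rough usage sketch of that idea (`size64` is hypothetical here; it mirrors C#'s LongLength):

```crystal
slice = Slice(UInt8).new(16)

slice.size   # => 16 : Int32, as today
slice.size64 # => 16 : Int64 (proposed), for slices with more than Int32::MAX elements
```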
Ask MIT
> We've also been talking about making the `@size` of collections (maybe only `Slice` for now) be `Int32` or `Int64`, exposed as `Int32` with `size` and as `Int64` with `size64`.

That's even worse :sob:
> Given that `Int32` and `Float64` are the default types it feels a bit redundant to type those 32 and 64 numbers all the time.

I think being specific about the type in a statically typed language is a positive. It shouldn't feel redundant, it should feel good because _it's explicit_. Not against an `Int` or `Float` alias that is platform-dependent, though.