Crystal: BUG: multiplication error when switching order of variables

Created on 21 Aug 2018 · 26 Comments · Source: crystal-lang/crystal

I've been converting my primes-utils rubygem into Crystal and have been trying to track down this bug for a loooong time. :( The major problem has been getting variable type casting right so the arithmetic operations produce the correct results. I finally seem to have traced one recurring math bug to the method pcs_to_num. I've reproduced this bug now on my native system and play.crystal-lang.org, using 0.26.0. My development system is a System76 laptop, i7-6700HQ.

def pcs_to_num(nom, residues, lte)

        #k: UInt32 | UInt64
        #num: UInt32 | UInt64
        #modk: UInt32 | UInt64 
        #modpg: UInt32 | UInt64

        num = nom.to_u64

        modpg, rescnt = (residues[-1] - 1).to_u32, residues.size

        puts "in pcs_to_num: num = #{num}, modpg = #{modpg}, rescnt = #{rescnt}"

        num = 2.to_u64 if num < 2
        num -= 1; lte ? (num |= 1; k = num/modpg) : (k = (num - 1)/modpg)

        #num -= 1_i64; lte ? (num |= 1; k = num.abs/modpg) : (k = (num - 1).abs/modpg)

        #puts "in pcs_to_num: k = #{k}, modpg * k = #{(s = 0.to_u64; modpg.times do s += k end; s)}"
        puts "in pcs_to_num: k = #{k}, modpg * k = #{modpg.to_u64 * k}"

        modk = 0_u64; modk = modpg.to_u64 * k.to_u64; r = 0
        #modk = modpg * k; r = 0
        #modk = 0.to_u64; modpg.times do modk += k end; r = 0
        while num >= modk + residues[r]; r += 1 end

        puts "in pcs_to_num: in_num = #{nom}, num = #{num}, modk = #{modk}, r = #{r}"

        #numout = (rescnt * k)  + r
        #numout = (rescnt * k).to_u64  + r
        #numout = (k * rescnt).to_u64  + r
        #numout = 0.to_u64; rescnt.times do numout += k end; numout += r
        #{numout, r, modk} # {num pcs, r index, num modulus}

        {rescnt * k + r, r, modk} # {num pcs, r index, num modulus}
        #{k * rescnt + r, r, modk} # {num pcs, r index, num modulus}
end

puts pcs_to_num(2_145_848_349, [7,11,13,17,19,23,29,31], true)
puts
puts pcs_to_num(11_037_857_400, [7,11,13,17,19,23,29,31], true)

The problem is the multiplication in the first element of the 3 element output tuple.
If the multiplication order is: rescnt * k + r I get this output.
https://play.crystal-lang.org/#/r/4sgk

in pcs_to_num: num = 2145848349, modpg = 30, rescnt = 8
in pcs_to_num: k = 71528278, modpg * k = 2145848340
in pcs_to_num: in_num = 2145848349, num = 2145848349, modk = 2145848340, r = 1
{572226225, 1, 2145848340_u64}

in pcs_to_num: num = 11037857400, modpg = 30, rescnt = 8
in pcs_to_num: k = 367928579, modpg * k = 11037857370
in pcs_to_num: in_num = 11037857400, num = 11037857399, modk = 11037857370, r = 7
{-1351538657, 7, 11037857370_u64}

When the order of multiplication is changed to: k * rescnt + r I get this output.
https://play.crystal-lang.org/#/r/4sgj

in pcs_to_num: num = 2145848349, modpg = 30, rescnt = 8
in pcs_to_num: k = 71528278, modpg * k = 2145848340
in pcs_to_num: in_num = 2145848349, num = 2145848349, modk = 2145848340, r = 1
{572226225_u64, 1, 2145848340_u64}

in pcs_to_num: num = 11037857400, modpg = 30, rescnt = 8
in pcs_to_num: k = 367928579, modpg * k = 11037857370
in pcs_to_num: in_num = 11037857400, num = 11037857399, modk = 11037857370, r = 7
{2943428639_u64, 7, 11037857370_u64}

So when inputs need to be 64-bit (all unsigned values) I get wrong results, which, of course, is wreaking havoc in the code from which this method is called, multiple times.

I had been pulling my hair (figuratively) forever before I decided to just see if changing the order of the variables made a difference, and sure enough it does.

And in the first output the first element's multiplication is done in 32 bits, and in 64 bits for the second case.
And the really, really, really strange thing is that the outputs from the first case give correct answers in the methods they're called from, but the second case produces errors. This has been driving me batty!

This is obviously a bug, as multiplication is supposed to be commutative.


All 26 comments

In Crystal, the type of a + b is the type of a (replace + with any operator, like *).

That means, if a is Int32 and b is Int64, the result is Int32, and of course there's a chance of overflow. If you switch the order, the result is Int64. There's still a chance of overflow but it's less probable.
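The reported value can be reproduced by hand from this rule. Since `rescnt` is an `Int32`, `rescnt * k + r` is evaluated in 32-bit arithmetic and the result wraps. A minimal Python sketch (Python is used here only to simulate signed 32-bit two's-complement truncation explicitly, since its native integers are unbounded):

```python
def to_i32(x):
    """Truncate an unbounded integer to a signed 32-bit two's-complement value."""
    return ((x + 2**31) % 2**32) - 2**31

# values from the second test case in the report
rescnt, k, r = 8, 367928579, 7
exact = rescnt * k + r      # 2943428639: does not fit in Int32
print(to_i32(exact))        # -1351538657, matching the reported wrong output
```

With the operands swapped (`k * rescnt + r`), `k` is a `UInt64`, so the whole expression stays in 64 bits and the exact value 2943428639 survives.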

This will be solved in the future when these operations raise on overflow. Then you will get an overflow exception instead of a wrong, hidden, behavior.

The reason for this type resolution is that:

a += b

is expanded to:

a = a + b

and you want the type of a to remain the same, not change.

(to be clear, this is not a bug, just a missing feature)

And yes, the mathematical definition of multiplication is commutative, but Crystal's * operator is not (at least not completely).

However, a different implementation could be that *, + etc. return the biggest data type of both operands and += etc. would convert the type of the result.
But this is obviously much more complicated and doesn't provide a huge benefit. If you know about this behaviour, it's easy to work with it. Once we have raise-on-overflow this is even less of an issue.

Essentially solved by #6223.

I found the solution to one set of errors -- subtle, oh so subtle. :-(
Changed this as output: k * rescnt + r to this: (k * rescnt).to_i64 + r
Now just one more (I hope) type casting hell problem to solve.

It would have definitely been better to have generated overflow errors.
I've had to literally debug a whole section of calculation code line-by-line to see where the overflow was.
Can you say when #6223 will be incorporated? Next release? In the development branch?

OK, now all my test cases pass for this particular function. :-)

I had to change the order of multiplication in 2 other instances AND addition in another to make the larger value be the first operand so the result was of that type. Now I have to go over all the other methods and create large number test case for them as well.

I wish this had been prominently documented as a gotcha.

Essentially solved by #6223.

The core issue of missing commutativity is not solved. It will just raise an exception at runtime if an invalid result is calculated. That's better than failing silently, of course.

@straight-shoota that's not a bug it's just one of those things that needs to be documented. The behaviour that caused this overflow isn't going to change.

The core issue of missing commutativity is not solved

With that argument you could say that raising on overflow is a bug because in math there's no such thing as overflow, numbers are infinite.

This is not pure math, this is programming and there are some differences and limitations compared to math.

An alternative is to disallow a type with a larger magnitude on the right side. So for example this won't compile:

a = 1_i32
b = 1_i64
a + b # error

That is, we would define Int32#+(Int8), Int32#+(Int16), Int32#+(Int32), but not Int32#+(Int64).

In fact, maybe we could always just define T + T, given that something like a + 1 will usually work because of automatic type casts.

I'd be really interested in seeing the diff for restricting addition to self, actually. Obviously it doesn't avoid overflow at all, so I wonder if it'd just be more annoying, or if it'd end up with people casting to one type sooner, which might result in cleaner code.

@RX14 I know this is not a bug. But it is still unexpected behaviour that the operator + is not commutative.

In math, there is actually the same concept as raising on overflow. Each integer type essentially represents a subset of the integers; for example Int8 is the set { x ∈ Z | -128 ≤ x ≤ 127 }. Because of this, it is not closed under any standard operator. The result of such an operation can fall outside the set and thus be undefined or unrepresentable. That's exactly the reason for an overflow exception.
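The non-closure point can be made concrete with Int8: adding two members of the set can leave the set, and under wrapping semantics the result re-enters from the other end. A small Python sketch simulating signed 8-bit two's-complement truncation:

```python
def to_i8(x):
    """Truncate an unbounded integer to a signed 8-bit two's-complement value."""
    return ((x + 128) % 256) - 128

print(to_i8(127 + 1))    # -128: 128 is outside { -128, ..., 127 }, so it wraps
print(to_i8(-128 - 1))   # 127: wraps in the other direction
```

Raise-on-overflow replaces exactly these silent wraps with an exception.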

As a user coming from Ruby, it was certainly unexpected and non-intuitive, and violated Ruby's principle of least surprise (POLS), to have the fundamental operators +, -, and * not behave as commutative operators. This is such a basic expectation of doing math that, especially for programmers not used to dealing with the artifacts of number types and casting, it is essential to prominently document the consequences for newbie Crystal programmers.

Since Crystal is compiled, would it be possible for the compiler to detect the operand types and do the necessary switching in the AST, and then continue on as normal? With -, maybe you would need to rewrite a - b as a negation plus an addition: (-b) + a. The point being: take the manual burden off the programmer having to hand-audit code, and put that burden on the compiler. Once it's done, it's done for everyone.

Also, in a lot of (most?) cases, a programmer won't know from one iteration to the next which variables will be the bigger|smaller in a process, so no matter how the variables are ordered, at some point an erroneous result will be produced. The only possible workaround for this, but a humongous PITA, is to cast all numbers/variables to the largest possible type (which is probably very inefficient from a memory and performance perspective).

In my case, I started trying to translate my primes-utils gem to Crystal starting in 2015, but it was so frustrating (especially back then with the state of Crystal) I just took a break, and came back to it when I felt motivated to try again. In 2018, I had gotten 90% translated and working, so I just decided recently to drill down into this problem, since it was the last major thing left to get working. In my case I'm lucky, because in the algorithms I'm using I know (mostly) the relative size of the variables|numbers.

And what made the process soooo frustrating was that about 90% of the Ruby code didn't need any syntax changes; it was the unknown unknowns in the other 10% of the code that caused all the problems. :-(

So, please consider things from the humble and naive programmer's perspective going forward, and follow POLS as much as possible; when that's not possible, just tell us what to expect and how to do things the Crystal way. If I know what (all) the rules are, I can follow them. I can't follow what I don't know.

Thanks! :-)

There is actually an implicit type conversion for Int#+(Float) which returns the float type of the argument.

IMO it would make sense to change operators with only integer or only float arguments to return the widest data type as well. AFAIK this is similar in most other programming languages (for example C, Java and derivatives).

Yes, it will mean that a += b may change the data type of a and that might go unnoticed unless it's type-restricted. But honestly, I don't see a huge issue with that; it is at least not as weird as (almost) always returning the type of the first argument.

Let's leave this discussion until after we've implemented overflow detection, and then we can properly evaluate whether this is still an issue.

Well, Crystal isn't Ruby.

Especially numbers. In Ruby they're a union of all possible number types, so an operation can be executed again, on grown types, after an overflow is detected. This is more complex in Crystal because an Int64 is just a 64-bit number; it can't grow (otherwise numbers would be huge, ugly and slow unions).

With raise on overflow on number operators, you'd have those same issues, but it would fail with an exception at runtime, allowing to track issues. I guess it won't be as frustrating as having to deep dive.

Note that using Int64 everywhere in your library could probably avoid the issue altogether —unless you outgrow 64 bit numbers.

@RX14 I'm afraid restricting number operators on self would only end up with manual & laborious casts —especially int32/int64 and sometimes int8/16/32/64 when dealing with hashes, digests, binary protocols... Not very Crystal :-/

@straight-shoota the conversion rules for integer conversions in C are really ugly and complex to predict. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf (page 53). In practice, one should always cast :-/

Another design reason why operators are not commutative is that, in order to avoid generating a union as the result of var *= exp, var * exp needs to keep the type of the lhs.

@ysbaddaden I'm actually not so sure it's as common as you think. But the only way to find out is to try, and i'm not sure it's worth it.

@bcardiff How do any of these expressions generate a union type?

@straight-shoota For example:

a = 1
b = 1_i64
if condition
  a += b
end
# now a is silently Int32 | Int64, which makes all further operations slower

I think restricting operations for T op U to types where U is "less" than T might be a good solution for this. That is, you can do Int32 + Int16, but not Int32 + Int64.

Now this is interesting. I'm trying to restrict all operations to just T op T (no lesser types, like in Go) and while doing it and adding explicit casts, I found this:

registers.address += sequence.minimum_instruction_length *
  ((registers.op_index + operation_advance.to_u32) / sequence.maximum_operations_per_instruction.to_u32)

no overload matches 'UInt8#*' with type UInt32
Overloads are:
 - UInt8#*(other : UInt8)

This is in the code that decodes line numbers. Of course a UInt8 multiplied by a UInt32 is likely to overflow and give an incorrect result. Maybe it's a bug...

Anyway, I'm trying to do it but it's never-ending. There's so much code we'd have to change. I don't know; maybe raising on overflow is simpler: it's more flexible. The only problem is that you'll find out at runtime, not at compile time. However, if we only allow, say, T op T, then Int32 + Int32 can still overflow, and the compiler won't detect/prevent that. It's something that can only be detected at runtime.
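The point that a T op T restriction cannot prevent overflow is easy to demonstrate: two values that each fit comfortably in Int32 can still produce a sum that doesn't. A Python sketch simulating signed 32-bit wrapping:

```python
def to_i32(x):
    """Truncate an unbounded integer to a signed 32-bit two's-complement value."""
    return ((x + 2**31) % 2**32) - 2**31

a = b = 2_000_000_000        # both are valid Int32 values
print(to_i32(a + b))         # -294967296: Int32 + Int32 wraps anyway
```

So the type restriction only rules out mixed-width surprises; runtime overflow checks are still needed for same-type arithmetic.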

We'll see, once we have automatic overflow detection, if the dwarf code fails sometimes, and maybe with that we can fix the line number offset that we have from time to time.

On one of the versions of overflow I needed to change the dwarf parser. Maybe what you point was the issue.

I don't think there is a need to restrict to same types. Runtime overflow checks will always be needed for native ints.

@asterite how much performance are we talking about? I assume the percentage of performance gained is probably not worth the effort, weighed against having to deal with/worry about Int8/16/32/64 and which kind of integer is in use. It also forces applications/libraries to deal with these kinds of edge cases, which IMO is bad. 2 cents

Closing because this will be solved with overflow checks.

@asterite Overflow checks don't provide type safety. They help to discover the problem but only if it happens to overflow (in the specs or other regular run code).
This might be good enough, sure.

But what about changing the allowed types to perform mathematical operations? This would give more safety. Still not 100% because operating in two values of the same type can also overflow.

@straight-shoota also multiplication isn't safe unless you double the width of the largest type, which is extra confusing. It's not 100% on avoiding overflows, it's not even 50%. So it's definitely not worth it to even attempt to enforce this in the type system.

@straight-shoota You can try changing the std/language to do that, send a PR and we can review it.
