Crystal: Behaviour of #[], #first, #last and similar methods

Created on 25 Jul 2015  Â·  21Comments  Â·  Source: crystal-lang/crystal

Problem

Ruby allows us to write programs in _exception-free_ style. For example we can do:

hash = {1 => "one", 2 => "two"}
puts hash[5] || "default"

# or
while item = array.pop
  # do something
end

# or
if item = array.first
  puts "we have an item"
end

We doesn't have to worry about exceptions here.

But in Crystal we have to add ? sign to avoid exceptions and the same code looks less natural and a little dirty:

hash = {1 => "one", 2 => "two"}
puts hash[5]? || "default"

# or
while item = array.pop?
  # do something
end

# or
if item = array.first?
  puts "we have an item"
end

Moreover I have the following arguments against current Crystal behaviour:

  • ? sign mostly using for methods which checks some condition.

    • Object#frozen? checks for an "_object frozen or not_"

    • Array#any? checks for an "_array contains any items or not_"

    • Array#include? checks for an "_array includes specific item or not_"

My initial expectations from Array#first? - it checks for a "_specific item is first in array or not_". But it does totally different thing.

So my opinion - Crystal have great inconsistency about that.

  • Dangerous because of any reason methods historically marked by ! mark.
    Currently in Crystal, methods #[], #first, #last, etc generates exception, but doesn't have exclamation mark. So they looks harmless, but will raise unexpected exceptions if something went wrong. I think this is pretty error-prone.
    Moreover every one newcomer rubrist certainly will try to use array or hash #[] method in one of described above ways. And every new one will stumble again, again... and again.

Proposal

  • Recover default ruby behaviour about #[], #first, #last and similar methods.
  • Get rid of #[]?, #first?, #last? etc methods.
  • Create #[]!, #first!, #last!, ... methods for a cases when we need to raise exceptions.
draft stdlib

All 21 comments

Related (if not duplicate): #406 Some existing discussion in https://github.com/manastech/crystal/issues/483#issuecomment-81084234 (and I think somewhere else but I'm unable to find it).

Yes, we definitely need to normalize all the method names.

The problem with [] is that the case where you are doing an index out of bounds, or fetching a non-existing key, are the less often ones. We chose [] to mean "the index/key exists, raise otherwise" because otherwise most of the code will end up with []! everywhere.

There are other things I'd like to improve. For example there's IO#gets which might return nil, and IO#read_line which is the same but raises EOF instead of returning nil. Maybe one should be called gets and the other gets?, or gets and gets!, etc.

What is the reason to have exceptions take precedent over returning nil? It would be cool to have the same functionality as ruby here unless there a performance implication I'm not thinking of. I love ruby and crystal is rapidly becoming my new favorite thing. I completely understand that somethings have to be different because of the compiled nature of crystal. I'm just trying to figure this one out though. Thanks a lot.

@isaacsloan

Lets assume, that #[] may return nil. Its type is T|Nil then. The code like this will no longer work:

array[index] * 2
# Compile Error: method `*(Int32)` is not found on Nil

You would have to do:

if value = array[index]
  value * 2
else
  # handle nil case
end

@isaacsloan It's just a decision. If we change the meaning I'm sure someone will come and complain about this:

a = [1, 2, 3]
a[0] + a[2] # doesn't compile

Nothing is set in stone yet, we could change this to be nil and raise with a syntax like a[0]!.

But check some samples:

There are many [] accesses, but no []?. We think the more common case is when you know a key/array-index is there and you want to access it, and accessing incorrectly is a program bug. Filling the code with []! won't help for that. And using a[0] || raise "BUG" is equally bad.

But those are just some samples, we have to see some real code and do some stats.

@waterlink, @asterite Thanks for the explanation. I guess after 10 years of ruby I just expect and deal with that in the few cases I would actually try to access elements of an array via index instead of each.
If I were to try doing something like your above example I'd do it like (a[i] || 0) * 2 or [a[i].to_i * 2.

I don't think that everyone using [] instead of []? really means a lot since in most cases you use [] to return a value and aren't going to be calling something out of range. In all of my code the only time this has been a problem is in calling ARGV.first or ARGV[0]. The rest of the time it behaved exactly as I would have expected so I didn't even notice that it didn't return nil.

Here is the code example I ran into and issue with:

n = ARGV.size > 0 ? ARGV[0].to_i : 100
# n = (ARGV.first || 100) #works in ruby and when first exists in this case.
sieve = Array.new(n + 1, true) 
sieve[0] = sieve[1] = false
2.step(Math.sqrt(n)) do |i|
  next unless sieve[i]
  (i*i).step(n, i){|ii| sieve[ii] = false}
end
last = 2
sieve.each_with_index do |a, i| 
  last = i if a
end
puts last

Perhaps a good solution would be to leave [] how it is but change first, last etc to call []? instead of [].

@isaacsloan You could use #first? and #last?, they will return T|Nil and not raise.

In all of my code the only time this has been a problem is in calling ARGV.first or ARGV[0].

This makes me think, that behavior of #first and #last should stay the same way, and in some special cases, yes, you will have to add ? at the end.

@isaacsloan You could use #first? and #last?, they will return T|Nil and not raise.

I mean:

ARGV.first? || 100

I understand that, but as OP mentioned first? and last? imply different functionality like ary.first?(1) => true

I don't really see any useful reason that someone would ever need first or last to raise an exception, where as I do see your point that [] should exception if out of bounds. Anyway regardless crystal awesome, it would just be cool if it stayed as similar to ruby as it could unless there's a performance reason it can't. I really appreciate everything you guys do.

first/first? are as is for consistency with []/[]?

No performance issue in this side for one or the other.

As the std come together we are discussing which alias to keep, use ! Or ?
at the end. We even come and go with some decisions.

On dom, 10 de abr de 2016 at 00:07 Isaac Sloan [email protected]
wrote:

I understand that, but as the OP mentioned first? and last? imply
different functionality like ary.first?(1) => true

I don't really see any reason useful reason that someone would ever need
first or last to raise an exception, where as I do see your point that []
should exception if out of bounds. Anyway regardless crystal awesome, it
would just be cool if it stayed as similar to ruby as it could unless
there's a performance reason it can't. I really appreciate everything you
guys do.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
https://github.com/crystal-lang/crystal/issues/1034#issuecomment-207854021

Brian J. Cardiff - Manas Technology Solutions
[ar.phone] 4796.0232 #BCR(227)
[us.phone] 312.612.1050 #BCR(227)
[email] [email protected]
[web] www.manas.com.ar
[weblog] http://weblogs.manas.com.ar/bcardiff/

Personally I'd like to see:

  • [] => raise if out of bounds
  • []? => nil if out of bounds
  • []! => no safety belts! crash if out of bounds (check and raise in devel only)

@ozra Just curious: Do you have a concrete use case for #[]! ?

@waterlink - Just for low level optimizing in low level modules working with big chunks of data, where you have checks done at an earlier, "colder", stage. Perhaps LLVM is smart enough to optimize away unnecessary checks in most scenarios though - I'll be the first to admit, I haven't done deep research.

Simplest possible example:

list = get_some_list()
i = 0
while i < list.size
   do_stuff_with list[i]!
   i += 1
end

For an array, in those cases I usually do array.to_unsafe[i] so I avoid bound checks and take a risk of being wrong. An option is to have something like unsafe_at(i) that would do bound checks in non-release mode... but I usually don't like code that behaves different when compiled in release mode.

This works perfectly in ruby, and compiles in crystal but when I try running it, I get index out of bounds with no indication of what line is causing it. How would I go about debugging this?

def eratosthenes2(n)
  # For odd i, if i is prime, nums[i >> 1] is true.
  # Set false for all multiples of 3.
  nums = [true, false, true] * ((n + 5) / 6)
  nums[0] = false  # 1 is not prime.
  nums[1] = true   # 3 is prime.

  # Outer loop and both inner loops are skipping multiples of 2 and 3.
  # Outer loop checks i * i > n, same as i > Math.sqrt(n).
  i = 5
  until (m = i * i) > n 
    if nums[i >> 1] 
      i_times_2 = i << 1
      i_times_4 = i << 2
      while m <= n
        nums[m >> 1] = false
        m += i_times_2
        nums[m >> 1] = false
        m += i_times_4  # When i = 5, skip 45, 75, 105, ...
      end
    end
    i += 2
    if nums[i >> 1] 
      m = i * i
      i_times_2 = i << 1
      i_times_4 = i << 2
      while m <= n
        nums[m >> 1] = false
        m += i_times_4  # When i = 7, skip 63, 105, 147, ...
        nums[m >> 1] = false
        m += i_times_2
      end
    end
    i += 4
  end
  #
  primes = [2]
  nums.each_index {|i| primes << (i * 2 + 1) if nums[i]}
  while primes.last > n 
    primes.pop 
  end
  primes.last
end 

p eratosthenes2(1000)

If fails on the last line that looks like nums[m >> 1] = false. To find that line I put puts 1; ...; puts 2 between code that I though would fail (I'm a puts debugger too 😊). This will be easier to find out once we have correct line numbers for raised exceptions.

The problem here is a different one, though, you are assigning to an array with an index that is out of bounds:

a = [1, 2, 3]
a[10] = 0 # raises IndexError

In Ruby the above would make the array implicitly grow to be of size 11, putting nil in every position before that.

Since eratosthenes2 is supposed to be an efficient algorithm, you wouldn't want an array resize happening without you wanting that to happen, so that's why in Crystal we raise an error in this case.

The easy fix is to do:

nums[m >> 1] = false if m >> 1 < nums.size

With that it works. We could maybe add a method to Array that would set a value in a given index, but silently ignore it if it's out of bounds.

Thanks so much for your fast response. That's what I ended up doing. You mentioned that we'll have correct line numbers for errors at some point? That will be great.

Slightly unrelated but I found these results interesting. This found primes under 100 million in 0.53 seconds and 7 seconds in ruby. Interesting that the optimizations helped a ton with ruby but not so much with crystal. I can benchmark the unoptimized sieve in 0.98 seconds in crystal and 54 seconds in ruby.

@asterite - yes differing functioning isn't really something to strive for, I was thinking mostly for more helpful debugging (given a fix to line-nums for exceptions), but you're right: if doing such unsafe things, one should debug it heavily and spend the time required to get i right. An exception could in a bad situation go unnoticed in some rescue. Going the unsafe_at-route and letting it crash is probably the right thing rather than making it deviously accessible via []!.

@isaacsloan - It's almost twice as fast - not a good speed up? :-)

@ozra - You missunderstand me. It's totally a good speed up. I just found it interesting that it took ruby from 54 seconds to 7 seconds but crystal only went from 0.98 to 0.53 seconds.

@isaacsloan - was tongue in cheek. Yes, dynamic languages often have some extra weak spots, so changes can be drastic in those :-)

I recommend reading http://joeduffyblog.com/2016/02/07/the-error-model/ to understand why index out of bounds and missing keys are usually non-recoverable ("abandonments" in the article, and panics in other languages (Go and Rust, for example)).

Was this page helpful?
0 / 5 - 0 ratings