Crystal: Thoughts on instance variables wrt nil-able variables

Created on 9 Nov 2016 · 14 Comments · Source: crystal-lang/crystal

I have a class where an instance-variable is basically defined as @locationKey: String | Nil (with a getter also defined), and then in a method I happened to write:

    lsize = locationKey.nil?  ? 0 : locationKey.size

That generates an error message of undefined method 'size' for Nil (compile-time type is (String | Nil)). I looked at that error for a few seconds, and then remembered that crystal complains about the second reference to locationKey because it is an instance-variable. I realize that the goals for crystal include future support for excellent parallel processing, and when that happens then the value of @locationKey could turn into a nil between the first and second reference. And I also remembered that for this specific example, there's a standard-crystal way to handle that:

    lsize = locationKey.try &.size || 0
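For reference, here is a minimal sketch of the usual ways to read a nilable instance variable safely. The class name `Entry`, the snake_case name `location_key`, and the three method names are made up for illustration and are not from my real code:

```crystal
# Hypothetical class, for illustration only.
class Entry
  getter location_key : String?

  def initialize(@location_key : String? = nil)
  end

  # Copy the instance variable into a local first. The compiler can
  # narrow the type of a local, so `key.size` is accepted here.
  def size_via_local : Int32
    key = @location_key
    key.nil? ? 0 : key.size
  end

  # Same idea, with the assignment done inside the condition.
  def size_via_if_var : Int32
    if key = @location_key
      key.size
    else
      0
    end
  end

  # The `try` form: the block only runs when the value is not nil.
  def size_via_try : Int32
    @location_key.try(&.size) || 0
  end
end

puts Entry.new("abc").size_via_local
puts Entry.new.size_via_if_var
puts Entry.new("hello").size_via_try
```

All three read the instance variable exactly once, which is what lets the compiler trust the type afterwards.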

Okay, so I understand the reasons for that. But then I looked at all the other code I have in the same method. I have a lot of code which does something like:

   if location == otherEntry.location
      #   Do a lot of processing using the values
      #   of "location" and "otherEntry.location".
   end

But then I got to thinking: If the compiler believes that location could be changed into a nil between an if and the code inside that if, then it must be true that location could also be changed to some new string value. That new value will be a valid string, so there won't be any runtime errors due to it being nil. However, if the value did change then it may not be true that location == otherEntry.location. Both location and otherEntry.location could change to new values, and the processing inside the if might run into run-time problems because that code assumes the two variables have the same value.

And if that's true, then doesn't it mean that instance-variables are going to be very awkward to work with in a totally safe manner? Will we always have to copy instance-variables into method-local variables if the values are referenced multiple times inside the method?

All 14 comments

> Will we always have to copy instance-variables into method-local variables if the values are referenced multiple times inside the method?

In multi-thread environments, you always need to obtain a local copy of the value you want to work with as the shared reference might be changed by another thread.
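As a rough sketch of what "obtain a local copy" looks like for the comparison in the question: both values are read once into locals, and everything after that uses only the locals. The `Entry` class and the `describe_match` method are hypothetical names chosen for this example:

```crystal
# Hypothetical class, for illustration only.
class Entry
  getter location : String?

  def initialize(@location : String? = nil)
  end

  def describe_match(other : Entry) : String
    mine = @location        # snapshot of this object's value
    theirs = other.location # snapshot of the other object's value

    # From here on only the snapshots are used, so the comparison and
    # the processing are guaranteed to see the same two values.
    if mine && mine == theirs
      "both at #{mine} (#{mine.size} chars)"
    else
      "different"
    end
  end
end

puts Entry.new("Paris").describe_match(Entry.new("Paris"))
puts Entry.new("Paris").describe_match(Entry.new("Rome"))
puts Entry.new.describe_match(Entry.new)
```

Because `mine` is a local variable, the compiler also narrows it to `String` inside the branch, so `mine.size` needs no extra nil check.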

While Crystal is _not yet multi-threaded_, it also has no GIL (Global Interpreter Lock) like the one in Ruby and some other languages, which would guarantee that the value obtained in the if condition is the same one used or passed along in the then block.

You can find more about this reasoning in the documentation:

https://crystal-lang.org/docs/syntax_and_semantics/if_var.html

In more system-level languages (like C) you need to combine a local variable with atomic read/write operations to keep parallel threads from affecting the same value.

I guess it could be possible for the compiler to detect an if @a and hold a local copy of the value when it is referenced inside the block; however, it might get tricky if your code also assigns a new value to it.

Example:

if @a
  puts @a
end

Might be able to be converted by the compiler:

if __temp1 = @a
  puts __temp1
end

But it might be tricky on _assign_:

if @a
  @a = new_value
end

As another answer to the question: yes, it can happen that between reading location for the first time and reading it for the second time it could be a different string value. Or even something like this:

@location = "Hello"
# Here we maybe switch to another thread,
# which sets @location to "Bye"
puts @location # => "Bye"

So yes, you can get wrong behaviour if you don't synchronize your code well. But in the previous snippet, at most we got a different value than what we expected. However, if we don't do this for types, this might happen:

@location = "Hello"
# Now another thread sets @location to nil
# Let's assume that the compiler still thinks that
# @location is a string
puts @location # ...???

If the compiler assumes that @location holds a string, which is basically represented as a pointer, it will say "Great! This isn't nil, I can directly dereference this". Well, if it's nil, the pointer will be the null pointer, and dereferencing it will give you a segmentation fault. And we can't really let a program crash just like that. This is why the compiler can't allow types to change under the hood. But it's OK if values change within the same type.

If you assign an instance variable to a local variable, no other thread can change it, and the compiler can assume and know that the type won't change, and do optimizations like not having to check for null to dereference a pointer.

Heh. In re-reading this today, I realize that I merely stated the obvious, and stated it very emphatically. In my defense, when I was about halfway through writing that, I started thinking about how I should be checking the election results...

I'd argue that if the program operates on a different value than the code "believes" that it has, then you can still get run-time failures. For instance, in the above I was getting the length of that string because I expected to operate on subsets of the string. And those operations were going to key off of "lsize". So what happens if the code references subsets of what-it-thinks is a 1000-character string, but that is now a 10-character string?

Index out of bounds (IndexError)
[4401291906] *CallStack::unwind:Array(Pointer(Void)) +82
[4401291809] *CallStack#initialize:Array(Pointer(Void)) +17

Note that I agree (almost) completely with how the compiler treats composite types. My claim is that it isn't doing enough. It's saving the programmer from this one case, but there's an infinite number of other cases where problems will come up for the exact same reason.

So... What I'm wondering is whether the compiler could do something to point out the general problem, instead of the one special case of nil. I didn't write that up last night, because I realized that I did not have a good idea of how the compiler could do that.

In the method I was writing yesterday, I'm comparing two objects which were created by reading in some JSON strings, and I want to compare the individual values of each object. So I'm doing a lot of code similar to if location == otherEntry.location. The first location is an instance variable in the current object, and the otherEntry.location is a method call into the other JSON object.

What I realized last night is that the only really safe way to do that is to copy all of those instance variables and method-calls into local variables, and then do the actual processing using just those copies. When I look at the crystal code that I'm writing, it seems kinda ugly due to all those copies. I might need to create 50 or more local variables, and then keep track of which local-copies go with which original variables, and when the copies needed to be made.

So I guess what I'm asking is if there is any way the compiler could handle that for me. While I have some vague ideas, I have come up with no idea which I really like when I look at it in detail.

Here's one vague idea, consider a 'localize' function which was supported and implemented by the compiler itself. Not by macros. Not by the programmer remembering to make local copies.

   def someMeth
      localize @location, otherEntry.location

      # lots and lots of code.  Somewhere in that code, I
      # reference both variables:
      if @location == otherEntry.location
         #   Do a lot of processing using the values
         #   of "location" and "otherEntry.location".
      end

      # and maybe later on I have more code which
      # references those same variables.
   end

My vague idea is that once I tag a variable as "localized", then the first time that I reference that variable, the compiler makes a local copy of the value. And all subsequent references in that same method will see that local copy, without me having to create dummy variables to do that work.

I want that done as a statement at the top of the method, and not have to remember where in the method I need to make the local copies. In the method that I was writing yesterday, I had written code which was already compiling and working fine, without the compiler warning me that in some cases those variables might be nil even though my code assumed they were not nil. It was only when I added an earlier statement for lsize = locationKey.size that the compiler forced me to realize that locationKey could have been nil at that point in the method.

I can think of some possible problems with this (*), but that's the kind of thing I'm thinking about. I don't want the compiler to stop doing the checking that it already does!

(* - in fact, I had to edit this example to avoid one such problem! 😄 )

_[ Sorry for the length of these comments, but I tend to be a wordy person once I get focused in on some topic. ]_

Note that it may be that the compiler would also have to add a second invisible variable to keep track of when the copy had been made. Such as:

   if someThing == true
      first8 = location[0...8]
   else
      first8 = otherEntry.location
   end
   last8 = location[-8..-1]
   puts first8, location

The thing I'm thinking about here is not the values of first8 or last8. It's the issue where the copy of location might need to be done when first8 is created, but if the copy isn't done there then the copy would need to be made when last8 is created.

I realize that this "solution" may create as many problems as it solves, but I wanted to suggest some way that the crystal compiler could be even more helpful than it currently is.

Maybe this is the kind of problem that you would like to solve at a higher level with strong standard library support for concurrency services (like Actors in Erlang/OTP). Crystal will let you blow your leg if you use concurrency primitives and provide much saner guarantees if you stick to higher level constructs. Just a thought!

Nothing actionable here, so I'm closing this.

Is it only me who finds this "multi threading context" insane? I thought this is why fibers exist in Crystal, so that we do not have to deal with this. Two different threads should NOT be allowed to share memory by default.

@jsaak the samples shown in this issue can be extrapolated to fibers also. If a fiber performs I/O between reads of the same ivar, it might be paused and another fiber might change the ivar.

Yes, it is true. But you have much more control when your current fiber yields. There is no way that after a simple if statement the data will change.

I hope you know what you are doing. I feel, it is a recipe for disaster.

@jsaak it is not always obvious whether a method call will trigger a pause of the current fiber, unless Crystal flags the methods that might do that. In that case compilation time would increase, because that information is sensitive to the actual method being called. If the method is called on an object that might be of different types, you might even find that code which compiles at one point later does not.

So, it is safer this way.

> Yes, it is true. But you have much more control when your current fiber yields. There is no way that after a simple if statement the data will change.

It's possible. Consider:

@in_var : String | Nil
def meth1
   if @in_var == "Some String" && other_condition?
      printf "%s\n", @in_var[5..-1]
   end
end

def other_condition?
   #  Think of all the things which could happen
   #  in this method, and which the compiler won't
   #  know about when it is compiling meth1.  This
   #  method can do anything it wants with the value
   #  of @in_var, and in a world of fibers it could
   #  do anything it wanted with switching fibers.
end
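To make that concrete, here is a small single-threaded sketch with no fibers involved at all. The class `Example` and the two `tail_from_*` methods are hypothetical, and `other_condition?` is a deliberately misbehaving stand-in that changes `@in_var` as a side effect:

```crystal
# Hypothetical class, for illustration only.
class Example
  @in_var : String? = "Some String"

  # Stand-in for a method with a side effect the caller does not expect.
  def other_condition? : Bool
    @in_var = "Hi"
    true
  end

  # Reads @in_var twice: once in the condition, once in the body.
  # By the second read the value is "Hi", so the slice is out of range.
  def tail_from_ivar : String?
    if @in_var == "Some String" && other_condition?
      @in_var.try { |s| s[5..-1] }
    end
  end

  # Reads @in_var once into a local; the later change is not seen.
  def tail_from_snapshot : String?
    value = @in_var
    return nil unless value
    if value == "Some String" && other_condition?
      value[5..-1]
    end
  end
end

puts Example.new.tail_from_snapshot

begin
  Example.new.tail_from_ivar
rescue ex : IndexError
  puts "second read failed: #{ex.message}"
end
```

The snapshot version should print the tail of the original string, while the version that re-reads the instance variable should hit an IndexError, even though the type was a valid String the whole time.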

I consider the compiler's concern about "multi threading context" to be a good idea. My only complaint is that it should notice more than just when the type of an instance variable could change. And I think that it should be possible to add features to the compiler which would make it easier for the programmer to address these issues.

Perhaps I've just written more crazy code in Ruby than many people, but in thinking over some of the code that I've written, I do understand and appreciate what crystal is trying to do here. Let me add that all of my code is perfectly valid Ruby code if run in a single thread, but some things I've written would be extremely fragile (i.e., loaded with intermittent and obscure bugs) if it ran in a multi-threaded environment!
