Consider this:
class Foo
  @a : Int32
  @b : Int32

  def initialize()
    @a = 1
    reset
  end

  def reset()
    @b = 2
  end
end

f = Foo.new
It will fail with "Error in line 3: instance variable '@b' of Foo was not initialized in all of the 'initialize' methods, rendering it nilable". Which is of course incorrect.
The type guesser should not try to enforce things it doesn't have the capability to analyze.
If an ivar is declared as Int32, it should be taken at face value. Next please...
I also need this many times. Usually I just repeat the assignments in both initialize and the reset method, but reset can touch many variables, so the code repeats itself.
Exactly my use-case too! Often when doing instance-pooling, I commonly use the "reset-pattern". The non-DRY-ness is sickening (already had a sinister bug because of it [missing to duplicate change into reset-func that is])
@ozra, did you create a macro for reset, or find any other workaround to keep the DRY-ness?
@raydf : No, there was some snag _as I remember it_ causing it to fail; unfortunately I don't remember it right now. In any event the implications are further-reaching than just that: if typing all instance vars (which I personally usually always do), it would mean "pretty much anything" goes in the initializer, as long as it's correctly typed code _according to the common main inference_.
It's not ambition, but safety. Maybe it's not unsafe to have an integer value uninitialized, but it is for other types, like references. Exactly the same happens in Swift, just with a different error. I think we should just change the error message here. The rule is, if I remember correctly, that self can't be passed around or have methods called on it before all the variables have been initialized.
In your example you can solve the problem by adding a default value along with the instance var declaration.
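For reference, a sketch of that suggestion applied to the original example, assuming a default of 2 is acceptable for @b:

```crystal
class Foo
  @a : Int32
  @b : Int32 = 2 # default value satisfies the initializer check

  def initialize
    @a = 1
    reset
  end

  def reset
    @b = 2
  end
end

f = Foo.new
```

The trade-off is that the value 2 now appears in two places, which is exactly the DRY concern raised above.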
Yes, @waj, of course we want to ensure that the demands are fulfilled. I only propose that validation is postponed until main inference. And that the guess-type semantic phase only guesses types. It's not about _if_, but about _when_.
In the most general sense: for any code called after an allocate, up until the end of that method (which by convention and automatic generation will almost invariably be a new class method), the bound flow type of all ivars starts as Nil. If at any point in the initialize a non-matching type is assigned, or if, on reaching the end of it, any ivar is still bound to a nil value (where Nil isn't allowed by the type, naturally), it _then_ errors. This is a fuzzy description of it; no doubt you and @asterite are far better apt at shaping the exact algorithm.
As I understand it, the important aspect of the guessing phase is simply to establish the types of all members of a class before further phases.
I might still be completely in the blue, but I don't think so :-)
@ozra this is the same that was asked/answered in https://github.com/crystal-lang/crystal/pull/2443#issuecomment-216690733
If anything other than initialize is considered for the type inference of the ivars, it boils down to analyzing the whole type hierarchy.
Hmm, maybe I've gotten it all wrong - but doesn't every single little millimeter of the _entire final program_ go through the main inference?
The initializer methods just happen to go through the _additional_ type guesser _first_ - in order to ensure that all classes / structs have completely known, typed structures before the _following phases_. No? And when a type is declared on an ivar - it's set in stone - and main inference should be able to pick it up from there and handle erroring on misuse - in a much more elaborate way than in the guesser. No? I'm shamefully aware that I'm still faaaar from well acquainted with those semantic stages of the compiler; I will improve my knowledge with time. This is just the impression I've gotten.
I just suggest leaving _typed_ ivars alone, and letting it error at the main stage _instead_. _Untyped_ ivars _obviously_ have to be guessed, and errored on, already when the guesser can't figure them out.
I think this _could_ be done, but it's better if we keep it simple. For example right now an initialize is analyzed without taking care of calls or other methods, in the type guesser. In the main logic it's true that we do analyze a bit more, but this is only a result of the way the compiler worked in the past. Imagine if we want to have incremental compilation, we'd have to store the whole tree of methods invoked by initialize, and if those change, even if their type remain the same, we'd still have to re-analyze the initialize methods to see if those other methods stopped initializing an instance variable.
I have this _feeling_ (ain't worth much in computing ;-) ) that incremental compiling _can be solved well_, given some thought, _without infringing on_ the quality of _the language_. I'm trying, day by day, to improve my understanding of the flow of the semantic internals, but I still have other matters on the table (today I spent hours chasing three bulls all over the neighbourhood after they escaped on my watch while I was getting cigarettes :facepalm:), haha. Obviously the matters I refer to are primarily work and too many coding endeavours. I'm gonna try to get into it enough so that I _might_ be of help sorting out perhaps some performance issues or the like - that's something I could do even without a grok-level understanding.
I'd _really_ like to see the language shine without limitations imposed by _implementation_ limitations and performance gripes. Primarily the above-mentioned, and also Precise Unions - which I find to be a superior way to handle typing - from a _language_ perspective.
So, well, I'll just have to put up, or shut up - because I know you guys have a heavy load on your shoulders already, and are all making a fantastically impressive job on this project! I must applaud you on that!
I just wanted to say this - in hope that you don't _hastily or prematurely_ dismiss great language capabilities.
The initial example is very short and simple for a human to look at and think "this should work". In larger and more complicated programs it isn't so simple. I can say that I had one case where crystal insisted that I initialize some instance variable when I did not think it needed to be, but it turned out that it was possible for the variable to be accessed before I set it.
In ruby I've created reset-like methods which included a parameter to indicate how much should be reset, and then called that reset method from initialize. What does the compiler do when the method is:
def initialize()
  reset_vars(0)
end

def reset_vars(how_much)
  if how_much < 3
    @b = 2
  end
end
I share your desire to follow DRY principles, and to also avoid the compiler claiming some instance variable is nilable when in fact I never want that variable to hold a nil value. However, I think that for now it's best for the compiler to follow a strategy which is simple and consistent.
Let me also give a thumbs-up 👍 to the comment of "_Imagine if we want to have incremental compilation_...". I certainly would like that to be supported, at least eventually, once the developers run out of other things to do. 😄
@ozra As I see it: the fewer type annotations needed and the more the type inference can do, the better.
But incremental compilation is a must. Otherwise it will eventually be a pain for large projects.
Some constraints help us feel confident that incremental compilation will be possible, because the rules try to limit how adding more and more code could force throwing away intermediate type inference deductions.
But, as soon as we get there and we can relax some constraints it will be done.
This way, if incremental compilation comes after 1.0, we avoid breaking changes.
@ozra I think it's OK to try to push the limits, and you and others are always forcing us to do that, and we like that!
In this precise case, @drosehn gave the example I was going to write that might be hard to deal with: an if or while inside an invoked method. Of course we _could_ just apply the rule only to instance variables that are assigned outside if/while, and that might work, but then this rule, and its exceptions, must be documented, and users eventually have to learn them so they can understand what the compiler means when they get an error.
Right now the rule is pretty simple for a human and the compiler: the instance variable must be initialized directly in the initialize method. In a way, this is similar to how we simplified type inference for instance variables: the errors the compiler gave you were sometimes pretty cryptic, and simplifying the rules, although sometimes a bit more tedious to write for, led to simpler errors and faster coding in the end.
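Under that rule, the original example compiles if each ivar is also assigned directly in initialize - the very duplication lamented above, shown here purely to illustrate what the rule demands:

```crystal
class Foo
  @a : Int32
  @b : Int32

  def initialize
    @a = 1
    @b = 2 # direct assignment here satisfies the rule...
    reset  # ...even though reset assigns @b again
  end

  def reset
    @b = 2
  end
end
```

The compiler never has to look inside reset: every ivar is visibly assigned in the initialize body itself.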
Allowing arbitrary method calls is a hard problem. Maybe it can be solved in a good way while still having incremental compilation, but I believe it is far better to have simple rules and incremental compilation working than to have a very complex flow algorithm and only have incremental compilation in a far future because it got too complex to do. Release early, release often, and improve. I consider this more complex flow analysis (if it is possible at all) to be a post-1.0 feature (as far as I can tell, it can be made without breaking existing code).
The practical arguments put forth here regarding actually getting to 1.0 in a timely manner are good.
I fully agree that incremental compilation is important. I believe the best of both can be combined; however, I'll close this particular issue on the subject since, obviously, as mentioned, it is easier to relax constraints later than to introduce them should it prove impossible to solve.
Hmm. Okay @ozra ... Here's an idea based on some code I noticed going by on the gitter channel. Look at your code, and then consider a slight twist on that code:
class Foo
  @a : Int32
  @b : Int32
  @c : Int32

  def initialize()
    initialize(true)
    @a = 1
  end

  def printvals
    printf " a=%d b=%d c=%d\n", @a, @b, @c
  end

  def reset()
    initialize(true)
  end

  private def initialize(fakeflag : Bool)
    @a = 9
    @b = 2
    @c = 3
  end
end

f = Foo.new
f.printvals
f.reset
f.printvals
result:
  a=1 b=2 c=3
  a=9 b=2 c=3
Note that I'm using the fakeflag just to indicate the alternate version of initialize to call. There is no boolean logic in that initialize(). This trick wouldn't work for all the classes where I use a common 'reset' method, but it might be useful for some of them. And some slightly different approaches might handle some more of those classes. For instance, imagine if the initialize methods were:
def initialize()
  initialize(true, a = 1)
end

private def initialize(fakeflag : Bool, @a : Int32 = 9)
  @b = 2
  @c = 3
  if a != 9
    # do some other stuff
  end
end
It might also be that crystal developers will be horrified when they see this. 😃
@drosehn I've actually used dummy parameters before, though not for this particular scenario, and I think it's an acceptable trick. I usually do:
private def initialize(param1, param2, *, dummy = true)
end

initialize(1, 2, dummy: true)
so dummy can never conflict with other overloads.
@drosehn - Yes I've gone this route too :-) but with a value-less "marker type" instead of bool for the "alternative signature".
@asterite - Thanks for that tip! Your solution is a lot simpler and cleaner than the "marker type"-variant.
It's still lacking for resets, but you can't have everything (yet ;-) ).
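For completeness, a hedged sketch of the macro route asked about earlier (def_reset is an invented name, not an existing API): one macro expands the same assignments into both initialize and reset, so the duplication lives in generated code rather than in the source:

```crystal
class Foo
  @a : Int32
  @b : Int32

  # Hypothetical helper, invented for illustration: expands the given
  # defaults into both `initialize` and `reset`, so every ivar is
  # assigned directly in `initialize` (satisfying the compiler's rule)
  # while the values are written down only once.
  macro def_reset(**defaults)
    def initialize
      {% for name, value in defaults %}
        @{{name.id}} = {{value}}
      {% end %}
    end

    def reset
      {% for name, value in defaults %}
        @{{name.id}} = {{value}}
      {% end %}
    end
  end

  def_reset a: 1, b: 2
end

f = Foo.new
f.reset
```

This only covers constant-like defaults passed as named arguments; anything needing constructor parameters or logic would require a more elaborate macro.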