Crystal: [RFC] Let symbols and enums be comparable

Created on 22 Mar 2020 · 19 comments · Source: crystal-lang/crystal

Right now we have autocasting of enums to symbols, and that's great!

enum Color
  Red
  Green
  Blue
end

def show(color : Color)
  puts color
end

# This works!
show :red

However, it can lead to confusion, because you can't actually compare enums to symbols:

Color::Red == :red # => false

Also, you have to remember to use the question method, otherwise you'll have bugs:

color = Color::Red
case color
when :red # Oops!
end

You need to do:

color = Color::Red
case color
when .red? # Okay!
end

What if we make Enum and Symbol comparable?

Pros:

  • You can use a symbol or the question method interchangeably
  • It will be used in the exhaustiveness check if we implement it

Cons:

  • It _might_ lead to unions of enums and symbols, but I don't expect that to happen because a symbol can't be assigned to an enum, it can only be compared against it

A naive implementation can be:

struct Enum
  # Compare an enum member against a symbol by their underscored
  # string representations, e.g. Color::Red == :red
  def ==(other : Symbol)
    to_s.underscore == other.to_s.underscore
  end
end

struct Symbol
  # Delegate to Enum#== so the comparison works in both directions.
  def ==(other : Enum)
    other == self
  end
end
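Given those overloads and the Color enum from above, comparisons would work in both directions (the expected results below follow from the underscored to_s comparison):

Color::Red == :red  # => true
:red == Color::Red  # => true
Color::Red == :blue # => false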

In a benchmark that's roughly 150 times more expensive than comparing two enum values directly.

However, we can easily generate the underscored representation of each enum member with macros, avoiding one runtime "underscore" call. Then we could have a table of symbol => underscored name, similar to the one we already have for symbol => to_s, which would make the comparison really fast. I quickly benchmarked comparing just the to_s representations, and that's only about 1.6x slower. We could go further and put all the literal strings in the same string pool and compare them with same?, which gives the same performance as comparing enums against enums.
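As an illustration, here is a hand-written sketch of what such a macro could expand to for the Color enum above. This exact expansion is hypothetical, not the proposed implementation; the point is that the underscored member names become compile-time symbol literals, so no String#underscore runs at runtime:

enum Color
  Red
  Green
  Blue

  # Hypothetical expansion: the underscored names are baked in as
  # symbol literals, and the auto-generated question methods
  # (red?, green?, blue?) do the actual member check.
  def ==(other : Symbol)
    case other
    when :red   then red?
    when :green then green?
    when :blue  then blue?
    else             false
    end
  end
end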

We can, however, start with the naive implementation and later, even after 1.0, make it work much faster.

I'm still not entirely convinced this is a good idea.

What do you think?

Labels: discussion, stdlib


All 19 comments

I'd still prefer removing symbols, we've already done some pre-work on removing usages from the stdlib.

If the use case is to allow them in when clauses, I would say we could add autocasting there when the type of the cond is an enum. Same rule as in method dispatch.

I guess, if the condition is a symbol literal, we can rewrite it like this:

case cond
when :symbol
end

would be rewritten to:

if (cond.is_a?(Enum) && cond.is?(:symbol)) || cond == :symbol
end

We'll need to add Enum#is?, but that's it.

That's at least the simplest way to make it work right now, and it's also efficient.

Eventually we can delay expanding case until the type of cond is stable, and then we can do a simpler (but same efficiency) check. But it can be done after 1.0.

However, allowing symbols to be compared in case means it's even more likely that someone will use == to compare enums against symbols. But maybe that's fine? Or we could still provide that == overload, just in case?

We can translate it as:

if (cond.is_a?(Enum) && typeof(cond).new(:symbol) == cond) || cond == :symbol
end

I think it's fine (i.e., no worse) to allow autocasting in when, given that we already allow it in method calls & assignments...

I'm fine with this solution but I was under the impression that wanting symbols gone was a fairly popular proposal, was I wrong?

I think that impression might be valid, but removing symbols before 1.0 is probably a big breaking change that we want to avoid.

It seems 50/50 so far XD.

I like symbols, and I hope I can keep using them. But we will see what the future has in store.

I think that impression might be valid, but removing symbols before 1.0 is probably a big breaking change that we want to avoid.

Wouldn't you want a big breaking change _BEFORE_ 1.0? The reality is there isn't really a reason to use a symbol over an enum, or a string for that matter. So I'm definitely in favor of removing symbols and making them just shortcuts to enum members.

Wouldn't you want a big breaking change BEFORE 1.0?

Not if, presumably, there's a big project that needs to be released soon and breaking changes would delay it.

Let's stay focused on the topic of this issue.
We can discuss the enum/symbol question in another issue or on the forum.

Another benefit is that in a spec you will now be able to do:

color.should eq(:red)

That's not type-safe (you could have a typo in the symbol), but since this is a test, type-checking is less important and it will fail anyway (either because it doesn't compile or because the test fails at runtime).

Remove the autocasting? :)

I think there are a couple of possibilities. Some of us were just discussing this in Gitter.

1) Remove symbols as a type, but allow the syntax they use to stay on as an alias for enums in certain cases. I'm thinking specifically of testing an enum value:

Foo::BAR == :bar # true

and the current symbol auto-casting

def foo(bar : Foo)
end

foo(:bar)

2) Keep symbols as a type, but make them equivalent to Strings. This way they can be used interchangeably with Strings, thus avoiding their pitfalls.

Option 1 would be my preferred method, as it makes things a lot less complicated in the compiler. In Ruby, symbols kind of have a purpose: identical symbols share the same memory while identical strings do not. But Crystal doesn't seem to do this, so they're mostly pointless outside of being a nice syntax for keys.

I wouldn't mind this as long as the existing use cases for symbols overwhelmingly still work without any syntax changes @watzon.

That would require option 2. The main issue I can see with option 2 is compiler bloat: suddenly there are a lot of cases the compiler needs to cover to differentiate when a symbol should be considered a String vs. an Enum. Symbol auto-casting already suffers from some issues in this area.

My personal wish would be something like option 2, but where Symbol is essentially a union type between a symbol ID (if the string was added to the symbol table at compile time) and a String (if we call .to_sym on something that isn't in the compile-time symbol table). So something like:

bar = :something
"something".to_sym # => :something, stored as only a symbol ID since this was present at compile time
"something_else".to_sym # => :something_else, stored as a String because this wasn't present at compile time

This would probably be crazy hard to do compiler wise, but syntax wise this would be my preference.

An even better version would be a heap-allocated symbol table that we add to at runtime -- so Symbol is still globally compact in that it's just an index into a table, but you can dynamically instantiate symbols. Best of both worlds.

Following up on the discussion in Gitter, I would advocate for a redesign of Symbol where there is a heap-allocated global hash table for symbol names that gets added to at runtime, giving us a working "something".to_sym. Storage-wise, a variable of type Symbol would still just be a symbol ID that is an index into this hash table. This is similar to what Ruby ended up doing, and it has the advantage of opening up additional functionality without breaking tons and tons of existing API code. Symbols look sexy; I'd hate to lose them!
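A rough sketch of such a runtime-growable table (the SymbolTable class and its API here are invented for illustration; under this design a Symbol value would presumably store just the Int32 ID):

class SymbolTable
  @names = [] of String
  @ids = {} of String => Int32

  # Return the existing ID for name, or append it and hand out a new ID.
  def intern(name : String) : Int32
    @ids.fetch(name) do
      id = @names.size
      @names << name
      @ids[name] = id
      id
    end
  end

  # Recover the name for a previously interned ID.
  def name_for(id : Int32) : String
    @names[id]
  end
end

table = SymbolTable.new
a = table.intern("something")
b = table.intern("something")
a == b # same name, same ID: equality stays a cheap integer compare

Because equal names always map to the same ID, comparing two symbols never touches the strings themselves, which is the property that makes this compatible with today's compile-time symbols.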

Alternatively, I'd be OK with still using the same compile-time symbol table, but adding .to_sym on String and having it check the compile-time symbol table (made available as a constant at runtime) and raise if the String you have isn't in there. That wouldn't break any existing libraries and still adds a lot of useful functionality, IMO.

As a word of caution: if somebody calls #to_sym on user input, it can sometimes be viewed as a denial-of-service vector (since the global hash table only grows...), just something to be aware of. Then again, Ruby survived with this "flaw" for quite a while...


Yeah, personally I like the idea of symbols continuing to be known at compile time rather than having an in-memory hash table. Each symbol could reference a position in memory that's unique to that symbol value. That avoids the DoS problem, as well as actually making symbols a more performant key type than a String.

Granted like @straight-shoota mentioned in gitter, none of this is likely to change pre v1.0
