Language: Enums are too heavy for large-scale use, but are the only feature affording easy completion.

Created on 17 Oct 2018  路  15Comments  路  Source: dart-lang/language

The Dart IDE integration allows completing EnumClass.name where something of type EnumClass is required. However, Dart enums are objects, and a large system using complex protobufs might need to allocate ~10000 such enum objects (this is actually happening). That is a serious memory and start-up time impact on the application.

The currently available alternative is to just use constant integers, like:

abstract class EnumClass {
  factory EnumClass._() => null;  // prevents extension.
  static const int foo = 1;
  static const int bar= 2;
  static const int baz = 3;
}

However, that pattern is less readable, less writable and doesn't complete well in the editor. You have to write EnumClass before it lets you complete to .foo.

We could introduce a special kind of type-aliased enum, say:

enum EnumClass on int {
  foo = 1,
  bar = 2,
  baz = 3;
}

That is equivalent to the above class, and lets EnumClass be used as an alias for int.
It also lets the code completion recognize it as an enum.

We could generalize that and introduce type aliases for any type, and allow static declarations on
those types:

typedef EnumClass = int  {
  static const int foo = 1;
  static const int bar= 2;
  static const int baz = 3;
}

and improve code completion to include all static members returning something of the same type as the containing type. So, if the context type is EnumClass, it would propose completing to EnumClass.foo, EnumClass.bar and EnumClass.baz, just as for an enum, and we would also complete static factory methods, not just constructors.

Or maybe we can use static extension types (#42) to get the type alias, and maybe even cast functions to/from int on EnumClass. If we plan to get static extension types,, we should make sure that whetever we do here will work with that syntax too.

request

All 15 comments

Would these approaches allow to add static methods like

List<T> values = [foo, bar, baz];
List<T> get evenValues => values.where((e) => e.value % 2 == 0).toList(); 

to be used like

print(EnumClass.values);
print(EnumClass.evenValues);

or be used with extension methods to make such additional methods available?

Would it not be better to make the more powerful and flexible old-style enums

The currently available alternative

more convenient to write and use instead of adding different limited ways to define enums?

Since enums are all sealed classes, couldn't we continue to pretend they are as they are now, but just compile them down to more-or-less-just integers?

That is to say, _why_ are enums necessarily heavy, as currently specified?

That is to say, why are enums necessarily heavy, as currently specified?

In large part because the following program must print "true" and "false", which implies that in general enums must support dynamic dispatch (which de facto means they pretty must be represented as heap allocated values with vtables).

enum Color { red, green, blue }

class NotAColor {
  String index = "hello";
}

void foo(dynamic a) {
  print(a.index is String);
}

void main() {
  foo(new NotAColor());
  foo(Color.red);
}

Can we use the same strategy as we use to make this work?:

class NotAnInt {
 String sign = "hello";
}

void foo(dynamic a) {
  print(a.sign is String);
}

void main() {
  foo(new NotAnInt());
  foo(2);
}

@Hixie ints are special. It is an interesting idea to make other values special like this. Implementations of LISP often have multiple kinds of special values.

On the VM ints have two representations, one is a boxed value (an allocated object), the other uses a bit in the pointer representation to distinguish it from a real pointer (Smi).
This makes the call a.sign more complex that it would be otherwise, even when a is typed as an int, and makes a.index more complicated because a might be a Smi, so the VM needs to generate code to check the bit even though .index is not defined on int.

There could be other values shoe-horned into the pointer bits, it will add complexity everywhere, just like the current Smi representation makes a.index more complex.

When compiling to JavaScript, there is no kind of value we can play this kind of trick with. If we can come up with a scheme that works well when compiling to JavaScript, that would be preferable to one that where JavaScript is significantly worse.

It is important to note that the 'Enums' from protobufs are not the same type as declared by the enum declaration. protobufs have some validation and reflective operations, so the reduction will not be O(10000) to O(1), but to a lesser O(10000).

Enums are ... the only feature affording easy completion.

I don't believe that's true.

The currently available alternative is to just use constant integers ... However, that pattern is less readable, less writable and doesn't complete well in the editor. You have to write EnumClass before it lets you complete to .foo.

The completion issue could be fixed. We could easily add all static fields (or some subset of static fields) to the list of suggestions, but what we'd need to validate is that doing so actually improved the UX.

On the VM ints have two representations, one is a boxed value (an allocated object), the other uses a bit in the pointer representation to distinguish it from a real pointer (Smi).

FWIW, I think SMI is one of Dart's vestigial features that should be removed. To be fair, it's a very clever trick, and it made a lot of sense back when Dart was a dynamic programming language, where it actually had a huge performance benefit.

In the statically typed AOT world I think all it does is make performance worse and hinders new language features. ints should be non-nullable by default (if fact all types should be non-nullable by default), it should be represented by 64-bit integers embedded in the object representation and in the stack frames, and it should be no slower than C++'s 64-bit integers. The same applies to doubles and booleans.

The language should also allow defining _custom_ unboxed types of known size with the same level of efficiency, including enums, grapheme clusters, Size, Offset, Array<UNBOXED_TYPE>, various "measure" values such as length in pixels, time, temperature, currency, percentage. These simple things are boxed in Dart which leads to higher memory usage, higher GC pressure, and likely also bloated native code (I believe @mraleph's Vipunen experiment confirmed that).

As a litmus test Dart should be powerful enough that you could implement your own String or https://github.com/dart-lang/language/issues/34 very efficiently.

@yjbanov
It's not Smis that are the problem, it's boxed integers in general (MInt is also a boxed integer).

Dart can already do many of the optimizations you mention. We can unbox integers in many places, but without whole-program analysis and optimization, we can't do so on API boundaries.
Being nullable is one of the problems, but not the only one. Integers are objects, and int is a sub-type of Object.
Generic classes will need to be specialized for unboxed values. If I make a Queue<int>, and I only want to compile the Queue code once, then the elements of all instances need to be homomorphically represented (aka, boxed). To get around that, we can specialize the class, using a completely different implementation and internal representation for Queue<int> than we do for Queue<Object>. That means generating twice the code, though, or having some extra abstraction layer converting between internal and external representation. And since a Queue<int> is assignable to Queue<Object>, any use at the latter type will need to box and unbox at the API level.
For AOT compilation, you need to know all the specializations ahead of time.

And then there are dynamic invocations. We can probably do something clever, but at the cost of making dynamic invocations even more expensive - they will basically have to match the function signature to the argument list signature at run-time to see which arguments can/must be passed unboxed, and then do boxing/unboxing as necessary.

FWIW, I think SMI is one of Dart's vestigial features that should be removed.

Smi is not a feature. It's an optimization of boxed integers. If there would be no need to box integers - there would be no need for smi's.

As @lrhn points out boxing originates from Dart's dynamic features - if you want to eliminate boxing altogether you need to eliminate dynamism.

FWIW, I think SMI is one of Dart's vestigial features that should be removed.

Smi is not a feature. It's an optimization of boxed integers. If there would be no need to box integers - there would be no need for smi's.

Performance is a feature :)

As @lrhn points out boxing originates from Dart's dynamic features - if you want to eliminate boxing altogether you need to eliminate dynamism.

The goal is not to eliminate dynamism. The goal is to have predictable performance in code that requires it, which is usually _very small_ portion of the code, and therefore the cost of having to deal with some "advanced" language features (if we have them) is relatively low, and at the same time it's where performance really really matters.

As @lrhn points out boxing originates from Dart's dynamic features - if you want to eliminate boxing altogether you need to eliminate dynamism.

Even if we eliminated dynamic entirely, we'd still have to box because int is a subtype of Object. Removing dynamic would force you to cast an Object to int before you use it, but it's still possible to override a method that takes an int with one that takes Object.

That in turn means a callsite that's passing an integer needs to use a calling convention that can handle that being stored as an Object parameter. That's also a source of trouble for us, isn't it?

We could potentially do what Java and C# do and move numbers out of the object hierarchy. int wouldn't be a subtype of Object, but you could do a conversion to it to explicitly box.

That has its pros and cons, but might be a better trade-off than Dart's current approach which is to force all users to pay the cost of flexibility all the time, everywhere. I'd be really interested to see some data on how often real code actually uses the fact that numbers are a subtype of Object.

That in turn means a callsite that's passing an integer needs to use a calling convention that can handle that being stored as an Object parameter. That's also a source of trouble for us, isn't it?

Let's take this example:

class A {
  foo(int a);
}

class B extends A {
  @override
  foo(Object a) {
    print(a);
  }
}

main() {
  var b = B();
  b.foo(5);
  b.foo('hello');
}

Can we compile it to the following?

class B extends A {
  foo(int a) {
    foo$Object(box<a>);
  }

  foo$Object(Object a) {
    print(a);
  }
}

main() {
  var b = B();
  b.foo(5);
  b.foo$Object('hello');
}

We could potentially do what Java and C# do and move numbers out of the object hierarchy. int wouldn't be a _subtype_ of Object, but you could do a conversion to it to explicitly box.

This would be the cleanest approach IMHO. An alternative to boxing is to encode Object and dynamic pointers as (type, value) tuples, where value is either the value itself or a pointer to an object on the heap. This might actually be better for performance because of data locality, and if values are unique enough (particularly for doubles), they should be strictly better than boxing because each tuple is 2x64 bits long vs each boxed object that requires a _minimum_ of 3x64 bits: (pointer) -> (header, value). You do get the benefit of being able to share values, but something tells me that's not very effective.

That has its pros and cons, but might be a better trade-off than Dart's current approach which is to force all users to pay the cost of flexibility all the time, everywhere. I'd be really interested to see some data on how often real code actually _uses_ the fact that numbers are a subtype of Object.

+馃挴

Can we compile it to the following?

Maybe. If you have a method like foo(int x, int y, int z), you would have to generate eight versions. That's a very large size overhead for ahead-of-time compilation (like JavaScript compiled programs, which you would want to be small).

If you have whole-program analyses, you might be able to cut down on some of those cases.

Doing auto-boxing is what Java did. It means that generics only work for objects, so a List<Integer> will contain boxed integers. C# went the other direction, and did generic specialization for value types, so a List<int> is running different code than List<Object>.
You get more code duplication and faster code

Both Java and C# are just-in-time compiled, so you only pay for what you use.
Dart is ahead-of-time compiled to JavaScript, so we don't get that luxury.

As for using tuples, I don't see how Object and dynamic values differ from other types of values.
We have first class functions in Dart. A Object Function(String) f; variable can contain an String Function(Object) value. Doing Object o = f("42") would not statically know that the function expects an Object value, nor that it returns an String. You would need to make all values tuples, and then you have effectively just introduced tagged values. For smis, which is the overwhelming majority of numbers, that's a doubling in size.

That's a very large size overhead for ahead-of-time compilation (like JavaScript compiled programs, which you would want to be small).

A couple of options:

  • It's actually not a very large size overhead because widening unboxed types to Object is rare
  • JavaScript doesn't need to change its calling convention because everything is dynamic anyway, you cannot unbox in JS

If you have whole-program analyses, you might be able to cut down on some of those cases.

I cannot rely on whole-program analysis when optimizing performance-sensitive parts of my code. Whole program analysis optimizations are useful as a last-minute vacuum sealer. I need language primitives that are predictable. For example, I should be able to make performance assumptions that cannot be invalidated by an unrelated part of the code in a large application. Unfortunately, _by definition_, using non-local information for optimizations is precisely what whole program analysis does.

Doing auto-boxing is what Java did. It means that generics only work for objects, so a List<Integer> will contain boxed integers. C# went the other direction, and did generic specialization for value types, so a List<int> is running different code than List<Object>.
You get more code duplication and faster code

I'm totally fine paying up to maybe 4x code duplication to gain performance of the 5% of code where performance matters.

Both Java and C# are just-in-time compiled, so you only pay for what you use.
Dart is ahead-of-time compiled to JavaScript, so we don't get that luxury.

My gut feeling is that it won't impact JS much, but it will likely not have any benefits either. To get benefits from unboxing in JS we should start thinking about WebAssembly (high time we did tbh).

As for using tuples, I don't see how Object and dynamic values differ from other types of values.

This is not observable by the developer, so there's no difference. I was only suggesting an alternative in-memory representation. Instead of boxing the value, you can use fat pointers, like Go does.

We have first class functions in Dart. A Object Function(String) f; variable can contain an String Function(Object) value. Doing Object o = f("42") would not statically know that the function expects an Object value, nor that it returns an String. You would need to make _all_ values tuples, and then you have effectively just introduced _tagged_ values. For smis, which is the overwhelming majority of numbers, that's a doubling in size.

The size overhead is only proportional to the amount of numerics that are boxed. My guess is that most of them do not need to be boxed. I think Golang is a good source of stats for stuff like that.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

lrhn picture lrhn  路  4Comments

Cat-sushi picture Cat-sushi  路  3Comments

eernstg picture eernstg  路  5Comments

wytesk133 picture wytesk133  路  4Comments

mit-mit picture mit-mit  路  3Comments