afaca9f7cfdf0a49bcba8985afec59e97e3019ba speeds up a hot map by using a field hidden behind a Map interface.
The implementation of the map indexer is:
ClassHierarchyNode operator [](Object cls) {
if (cls is ClassHierarchyNodesMapKey) {
return cls._classHierarchyNode;
}
throw new UnimplementedError('ClassHierarchyNodesMap for $cls');
}
I'd rather use the simpler formulation:
ClassHierarchyNode operator [](covariant ClassHierarchyNodesMapKey cls) {
return cls._classHierarchyNode;
}
However, the performance of the 'better' covariant form is bad enough to negate the benefit of using a field.
ClassHierarchyNode operator [](Object cls) checks for the two ClassID for the concrete classes that have the field, and then loads field:
0x7ff763ab50a0 movq r13,[r12+0x17]
0x7ff763ab50a5 push rbp
0x7ff763ab50a6 movq rbp,rsp
0x7ff763ab50a9 push r12
0x7ff763ab50ab push pp
0x7ff763ab50ad movq pp,r13
0x7ff763ab50b0 movq rcx,[rbp+0x10]
0x7ff763ab50b4 testb rcx,1
0x7ff763ab50b7 jz 0x00007ff763ab50e4
0x7ff763ab50bd movzxwq rdx,[rcx+0x1]
0x7ff763ab50c2 cmpl rdx,0x40a
0x7ff763ab50c8 jz 0x00007ff763ab50d6
0x7ff763ab50ca cmpl rdx,0x40c
0x7ff763ab50d0 jnz 0x00007ff763ab50e4
0x7ff763ab50d6 movq rax,[rcx+0xf]
0x7ff763ab50da movq pp,[rbp-0x10]
0x7ff763ab50de movq rsp,rbp
0x7ff763ab50e1 pop rbp
0x7ff763ab50e2 ret
0x7ff763ab50e4 <deopt>
ClassHierarchyNode operator [](covariant ClassHierarchyNodesMapKey cls) also tests for the two ClassIDs, but then does a lot more work, including calling Subtype1TestCache (according to profiles).
All this work is unnecessary, since both tested ClassIDs implement the declared covariant type.
0x7fe6e58d71c0 movq r13,[r12+0x17]
0x7fe6e58d71c5 push rbp
0x7fe6e58d71c6 movq rbp,rsp
0x7fe6e58d71c9 push r12
0x7fe6e58d71cb push pp
0x7fe6e58d71cd movq pp,r13
0x7fe6e58d71d0 movq rbx,[rbp+0x10]
0x7fe6e58d71d4 testb rbx,1
0x7fe6e58d71d7 jz 0x00007fe6e58d72c2
0x7fe6e58d71dd movzxwq rax,[rbx+0x1]
0x7fe6e58d71e2 cmpl rax,0x40a
0x7fe6e58d71e7 jz 0x00007fe6e58d71f4
0x7fe6e58d71e9 cmpl rax,0x40c
0x7fe6e58d71ee jnz 0x00007fe6e58d72c2
0x7fe6e58d71f4 movq rax,rbx
0x7fe6e58d71f7 movq rdx,[thr+0x68] null
0x7fe6e58d71fb movq rcx,[thr+0x68] null
0x7fe6e58d71ff cmpq rax,[thr+0x68] null
0x7fe6e58d7203 jz 0x00007fe6e58d72b0
0x7fe6e58d7209 test al,1
0x7fe6e58d720b jz 0x00007fe6e58d726a
0x7fe6e58d7211 movzxwq r10,[rax+0x1]
0x7fe6e58d7216 movzxwq r11,[rax+0x1]
0x7fe6e58d721b movq r10,[thr+0x18]
0x7fe6e58d721f movq r10,[r10+0x50]
0x7fe6e58d7223 addq r11,r11
0x7fe6e58d7226 movq r10,[r10+r11*8]
0x7fe6e58d722a movq r13,[r10+0x57]
0x7fe6e58d722e movq r13,[r13+0xf]
0x7fe6e58d7232 cmpq r13,0x874
0x7fe6e58d7239 jz 0x00007fe6e58d72b0
0x7fe6e58d723f movq r9,[pp+0xf] SubtypeTestCache
0x7fe6e58d7246 movq r12,[pp+0x17] [Stub] Subtype1TestCache
0x7fe6e58d724d movq r11,[r12+0x7]
0x7fe6e58d7252 call r11
0x7fe6e58d7255 cmpq r8,[thr+0x68] null
0x7fe6e58d7259 jz 0x00007fe6e58d726a
0x7fe6e58d725b cmpq r8,[thr+0x70] true
0x7fe6e58d725f jz 0x00007fe6e58d72b0
0x7fe6e58d7265 jmp 0x00007fe6e58d726a
0x7fe6e58d726a push [thr+0x68] null
0x7fe6e58d726e push rax
0x7fe6e58d726f movq r11,[pp+0x1f] ClassHierarchyNodesMapKey
0x7fe6e58d7276 push r11
0x7fe6e58d7278 push rdx
0x7fe6e58d7279 push rcx
0x7fe6e58d727a movq r11,[pp+0x27] 'cls'
0x7fe6e58d7281 push r11
0x7fe6e58d7283 movq rax,[pp+0x2f] SubtypeTestCache
0x7fe6e58d728a push rax
0x7fe6e58d728b push 4
0x7fe6e58d728d movq rbx,[thr+0x200]
0x7fe6e58d7294 movl r10,7
0x7fe6e58d729a movq r11,[thr+0x108]
0x7fe6e58d72a1 movq r12,[thr+0xa8] [Stub] CallToRuntime
0x7fe6e58d72a8 call r11
0x7fe6e58d72ab addq rsp,0x38
0x7fe6e58d72af pop rax
0x7fe6e58d72b0 movq rcx,[rbp+0x10]
0x7fe6e58d72b4 movq rax,[rcx+0xf]
0x7fe6e58d72b8 movq pp,[rbp-0x10]
0x7fe6e58d72bc movq rsp,rbp
0x7fe6e58d72bf pop rbp
0x7fe6e58d72c0 ret
0x7fe6e58d72c2 <deopt>
/cc @mraleph @aartbik
Explicit is checks and implicit casts are presented in two different ways in IR. is checks do type profiling and are specialized based on that. Implicit casts don't do type profiling.
0x7fe6e58d71e2 cmpl rax,0x40a
0x7fe6e58d71e7 jz 0x00007fe6e58d71f4
0x7fe6e58d71e9 cmpl rax,0x40c
0x7fe6e58d71ee jnz 0x00007fe6e58d72c2
This chunk of code comes from ClassCheck that we insert above AssertAssignable (this is done by FlowGraphTypePropagator::StrengthenAsserts).
The expectation is that AssertAssignable would be then eliminated by type propagation - however our type propagation lattice is not smart enough to represent the type we need here.
StrengthenAsserts work as expected for polymorphic class checks. is and as.I made a standalone Dart program that illustrates the issues reported in this bug. Hopefully helpful while analyzing and fixing this further.
Expected output:
Instance of 'B'
Instance of 'F'
passed
Program:
import 'dart:collection';
class A {}
class B {}
abstract class C implements A { B _b; }
class D extends C { }
class E extends C { }
class F extends B { }
class X extends MapBase<A, B> {
// Pick this one:
//B operator [](Object c) {
// if (c is C) {
// return c._b;
// }
// throw new UnimplementedError('no [] C: $c');
//}
// Or this one:
B operator [](covariant C c) {
return c._b;
}
operator []=(Object c, B b) {
if (c is C) {
c._b = b;
return;
}
throw new UnimplementedError('no []= C: $c');
}
// Don't care.
Iterable<A> get keys { return null; }
B remove(Object key) { return null; }
void clear() {}
}
main() {
A a = new A();
B b = new B();
D d = new D();
E e = new E();
F f = new F();
X x = new X();
x[d] = b;
x[e] = f;
print("${x[d]}");
print("${x[e]}");
try {
print("${x[a]}");
} catch (e, s) {
print("passed");
}
}
As observed above, under JIT, X vs. Y both do the check-class, but Y does not stop there:
2: B3[target]:0 ParallelMove rcx <- S+2
4: CheckClass:12(v3 Cids[1: D etc. cid 1026-1027])
6: v18 <- LoadField(v3, 8 {_b@17180520} [nullable dynamic]) T{Type: class 'B'?}
7: ParallelMove rax <- rax
8: Return:26(v18)
2: B1[target]:0 ParallelMove rbx <- S+2
4: CheckClass:12(v3 Cids[1: D etc. cid 1026-1027])
5: ParallelMove rax <- rbx, rdx <- C, rcx <- C
6: AssertAssignable:12(v3, Type: class 'C', 'c', instantiator_type_args(v0), function_type_args(v0)) T{Type: class 'C'?}
7: ParallelMove rcx <- S+2
8: v4 <- LoadField(v3 T{Type: class 'C'?}, 8 {_b@17180520} [nullable dynamic]) T{Type: class 'B'?}
9: ParallelMove rax <- rax
10: Return:18(v4)
With a better lattice, type for v3 is improved, and the code for Y becomes the same as for X (generated CheckClass removes the AssetAssignable):
2: B1[target]:0 ParallelMove rcx <- S+2
4: CheckClass:12(v3 Cids[1: D etc. cid 1026-1027])
6: v4 <- LoadField(v3 T{Type: class 'C'}, 8 {_b@17180520} [nullable dynamic]) T{Type: class 'B'?}
7: ParallelMove rax <- rax
8: Return:18(v4)
I made two improvements:
type propagation now uses class hierarchy during union operation, which provides a better type lattice, e.g. taking the union of B,C now yields A rather than resorting to "dynamic", which immediately benefits propagation over phis, but also helps the next improvement
A
/ \
B C
check classes with cid ranges now take union too rather than just dealing with singletons
Overall this yields a 50-80% improvement on the co-variant benchmarks, bringing co-variant on par with explicit is-check (and both improved).
Slava, are your points 2 and 3 still worth pursuing?
armv8 improvements on benchmark (note that both improve, and both are "on par" afterwards):
TypeChecks.IsCheck (Odroid-C2) 22.61%
TypeChecks.CovariantCheck (Odroid-C2) 88.61%
@aartbik I think it might be worth investigating - but probably in the larger context of unifying how assert-assignable-s and is checks work in general. Right now I'd put this on back burner unless we discover more cases where it matters.
(I am considering to keep this as a starter project for the next member of the VM team)
Over to Slava in case he wants to re-assign this as starter.
Most helpful comment
I made two improvements:
type propagation now uses class hierarchy during union operation, which provides a better type lattice, e.g. taking the union of B,C now yields A rather than resorting to "dynamic", which immediately benefits propagation over phis, but also helps the next improvement
A
/ \
B C
check classes with cid ranges now take union too rather than just dealing with singletons
Overall this yields a 50-80% improvement on the co-variant benchmarks, bringing co-variant on par with explicit is-check (and both improved).