Presto: Improve Type Equality and Hashcode

Created on 2 Nov 2020  路  8Comments  路  Source: prestosql/presto

As discussed in #5768, hashCode/equals methods in Type subclasses should be implemented and optimized for improving performance of HashSet and HashMap related operations for classes like ConnectorExpression that rely on types for comparison.

All 8 comments

Maybe ConnectorExpression classes should precalculate hash code? Would that be sufficient?

Maybe we should make equals and hashCode abstract in AbstractType to force implementations to implement it.

I like the simplicity of how AbstractType delegates to TypeSignature for equality.
Would be enough if we e.g. precompute hashCode in TypeSignature constructor?
(or rather, lazy compute, maintaining "effectively immutable" behavior)

Maybe ConnectorExpression classes should precalculate hash code? Would that be sufficient?

I think that _might_ work, but doing it for types seems useful for current/future usecases where types are involved.

Maybe we should make equals and hashCode abstract in AbstractType to force implementations to implement it.

This is a good way of letting people know that not implementing it can degrade performance, however it can break plugins too.

precompute hashCode in TypeSignature constructor? (or rather, lazy compute, maintaining "effectively immutable" behavior)

the lazy compute idea seems interesting. However, I feel that we should also have type-specific lazily computed implementations for RowType, ArrayType etc rather than falling back to TypeSignature every time. We already do it for some types. It's not "forced", so the fallback can be useful for types installed through the plugins. This would be a hybrid of what @dain and @findepi suggested, but not sure how clean it'd look. Thoughts?

However, I feel that we should also have type-specific lazily computed implementations for RowType, ArrayType etc rather than falling back to TypeSignature every time

Where would you store lazily computed hash?

Where would you store lazily computed hash?

@sopel39
In a non-volatile private int hashCode field, which technically would be mutable, but would not have apparent effects of mutations.

However, I feel that we should also have type-specific lazily computed implementations for RowType, ArrayType etc rather than falling back to TypeSignature every time.

@phd3 what would be the benefit?
fwiw RowType holds onto to final TypeSignature instance, so it doesn't really matter where you keep the value, does it?

@findepi If I understand correctly, the hashcode implementation you suggested means that for every type object, we calculate hashCode at most once and store it in a private field in TypeSignature object, which can be retrieved with hashCode() invocation.

Computing a hashcode naively for a type like row(a bigint,b array(bigint),c row(a bigint)) triggers computing a hash on the tree of TypeSignature and TypeSignatureParameter objects. Whereas, if we implement field-based hashcode for RowType, and element-type based hashcode for ArrayType, it's much simpler.

With precalculation, this doesn't matter much in terms of performance. So for the problem at hand, keeping this limited to TypeSignature class may be okay. But I feel that it might still be unnecessarily indirect for complex types.

I see where you're coming from. Yes, doing this in Type has the benefit it cuts the calculation earlier -- if i did ask for RowType.hashCode(), the elements hashes have been calculated by now and if i ask for one of them, i have the answer ready. Doing this in TypeSignature doesn't provide this benefit. Yet, I think it probably doesn't matter. I am guessing that the problem we're solving _is a problem_ only when we do those calculation over and over again, which won't be the case regardless which path we choose. Thus, I would prefer the simpler solution of two, unless we see a case where additional complexity brings some additional (not purely theoretical) benefit.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

anismiles picture anismiles  路  3Comments

yourmain picture yourmain  路  4Comments

ChethanUK picture ChethanUK  路  4Comments

findepi picture findepi  路  4Comments

JamesRTaylor picture JamesRTaylor  路  5Comments