Quarkus: Add equals/ hashcode to PanacheMongoEntity

Created on 6 Feb 2020 · 16Comments · Source: quarkusio/quarkus

Description

I would like it if the PanacheMongoEntity would have its equals and hashcode methods implemented, in order to more easily compare Mongo entity objects that utilize this class. Without the methods defined by the panache entity class, it makes using lombok to simplify the code written for extending classes more difficult.

Implementation ideas

I wrote my own wrapper class for PanacheMongoEntity as a stopgap, and the following seems to work: (the equals implementation is pulled from a delombok, and I now realize is missing a null check on the id, but it's essentially what I would say we want)

public abstract class OurMongoEntity extends PanacheMongoEntity {
    @Override
    public int hashCode() {
        return Objects.hash(this.id);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof User)) return false;
        final User other = (User) o;
        if (!other.canEqual((Object) this)) return false;
        if (!this.id.equals(other.id)) return false;
        return true;
    }
}

It seems like a fairly trivial thing to add to me, having not actually looked at the panache entity class. I can probably add it in if it is determined to be something we want to add.

arepanache kinenhancement triagwontfix

Source

GregJohnStewart

Most helpful comment

So what would the consensus be here? (Just a bump, and curious to see where this goes)

I don't what this applied to JPA entities, but I have no opinion regarding DTO's going into MongoDB - mainly as I have no experience with it, I can't speak about the implications in that case.

If this is only about PanacheMongoEntity as the title states, I think it should be @loicmathieu to decide.

@nimo23 I agree in some situations using UUID generators is adviseable, but again I wouldn't make it an absolute rule: generating secure ids has a cost, and if applied to all entities on a system it would quickly lead to exhaustion of the random pool.
Sequences are commonly supported (now even MysQL has them!); but they need a DB roundtrip to the DB - a blocking operation for each object being stored - this is usually not a problem when one of Hibernate's sequence generation optimisers can be applied (hi/lo) but their semantics breaks absolute ordering...

_just mentioning some more reasons why such details are better left to the user to make an explicit choice - in the case of Hibernate._

Sanne on 9 Feb 2020

👍2

All 16 comments

@FroMage WDYT ?
If we do it, we will do it also for Hibernate with Panache ...

According to be, providing default hashcode/equals for PanacheEntity and PanacheMongoEntity is easy but as the ID are only provided after the entity was persisted to the database, this means the hashcode and equals method will always return false when the entity is created but not yet persisted. This can be very surprising in some cases ...

loicmathieu on 7 Feb 2020

I think @Sanne had an opinion about hashCode/equals for entities, but I don't remember it. I've never ever needed it myself because I've never had to deal with collections of unpersisted entities, or multiple instances of the same persisted entity.

FroMage on 7 Feb 2020

👍1

indeed, the question of how to properly implement entity equality is a subject of strong debates.

Personally I strongly believe that entities managed by Hibernate ORM are best compared via object identity: having no overrides for equals/hashcode is the only solution that gives no strong headaches to users - provided they have no "special needs".

The problem with those users who do have "special needs" most often will also require a customization of such equality contracts, so I'd say it's best to not generate any at all.

Identity is correctly managed automatically for managed entities; it also so happens that the default identity is the most efficient.

However my thoughts mostly apply to JPA entities, as they are statuful and actually triggering lazy loading is not something you want from such a method. I guess PanacheMongo might be a different context.

Question: does PanacheMongo deal with de-duplication of "same instance" in complex object graphs?

Also: the missing null check on the ID is actually not that trivial - in JPA at least. Suppose you have multiple objects into some collection, it's unclear if multiple objects having id set to null are referring to "the same" or not, and if they are "the same", which state is the right one to be written?

Sanne on 7 Feb 2020

Also like @FroMage said, it's all really a lot easier when dealing exclusively with bags of persisted entities, so that one can stop being bothered by the devilish details of getting the equality contract wrong.

Sanne on 7 Feb 2020

Question: does PanacheMongo deal with de-duplication of "same instance" in complex object graphs?

@Sanne The answer is no.

MongoDB with Panache is stateless, so identity equals is less usable.

loicmathieu on 7 Feb 2020

the missing null check on the ID is actually not that trivial - in JPA at least. Suppose you have multiple objects into some collection, it's unclear if multiple objects having id set to null are referring to "the same" or not, and if they are "the same", which state is the right one to be written

I always provide equals/hashCode in jpa entities because I always use Set for my relationships instead of Lists. Sets does not allow duplicates, but with lists you must care of that for example with something like this:

@OneToMany
@Fetch(FetchMode.SELECT) // avoids duplicates
private List<User> users = new ArrayList<>();

But with Sets you need a propert equals/hashCode-contract.

And if the entity has natural identifiers (for example, all unique fields or IDs) then I put them into my equals/hashCode. The point is: equals/hashCode-contract of an entity should only rely on non-changing data (non-changing properties or "immutable business keys").

If the ID is generated by the database, then ID is null before commiting - so the only solution is to provide those natural identifiers in hashCode/equals. If you dont have those identifiers then the solution is to use the one described in

https://vladmihalcea.com/the-best-way-to-implement-equals-hashcode-and-tostring-with-jpa-and-hibernate/ and https://vladmihalcea.com/how-to-implement-equals-and-hashcode-using-the-jpa-entity-identifier/ with something like:

// use this only if you cannot provide any immutable business keys or if the id is generated from the database
@Override
    public boolean equals(Object o) {
        if (this == obj) return true;
    if (obj == null || getClass() != obj.getClass()) return false;
        final User other = (User) u;
        return id != null && id.equals(other.getId());
    }
@Override
public int hashCode() {
        return 31;
}

If PanacheEntity should simplify this then it should maybe consider to have another ID-generation plan by default. Actually, I think its @Sequence to the IDs are generated by the database. If PanacheEntity provides id generation within application, for example:

@Id
// or other id generatings based on SecureRandom
private Integer id = UUID.randomUUID().toString()

then it makes sense to add equals/hashCode for those entities by default. So user does not need to add it manually. This would save a lot of code for users needing those equals/hashCode-things..

nimo23 on 7 Feb 2020

as I said, I realize that for JPA to discuss what is "the right way" of doing equals/hashcode is always controversial.

I strongly believe it's a recipe for disaster to work with non-persisted and non-managed entities, but sure there's various ways to use it. Basing things on Sequence like @nimo23 mentions is a good idea, but it doesn't apply to all databases.

Which is precisely why I believe the framework _should not_ generate such methods, as it's important for the users to understand the implications - while the default methods are great for many use cases as they are safe.

@loicmathieu

Question: does PanacheMongo deal with de-duplication of "same instance" in complex object graphs?

@Sanne The answer is no.

MongoDB with Panache is stateless, so identity equals is less usable.

Ok so this might be less important for MongoDB. It's not clear to me what would be best in this case, I just don't want those methods generated by default for JPA :)

I'd be cautious though, it doesn't seem to bring much value. Isn't it the point to allow people to use any object they want?

If we are to generate the full code of any DTO, one might as well expose a different language for people to define their structures, and then generate the whole POJO+properties+equals noise

Sanne on 7 Feb 2020

Isn't it the point to allow people to use any object they want

Yes. I also think that this makes more sense.

@Sanne one subjective question: The user does not need to use the database ID generation which is the default for PanacheEntity. Do you think that client ID generators (for example, by UUID) makes more sense? With client ID generation, the user does only need to provide the id-property within equals/hashCode, nothing else. I am thinking of removing all my equals/hashCode and Sequence-ID-generators and only use a custom client side ID generator with the help of SecureRandom. Do you think that this is a good idea?

nimo23 on 7 Feb 2020

So what would the consensus be here? (Just a bump, and curious to see where this goes)

GregJohnStewart on 9 Feb 2020

So what would the consensus be here? (Just a bump, and curious to see where this goes)

I don't what this applied to JPA entities, but I have no opinion regarding DTO's going into MongoDB - mainly as I have no experience with it, I can't speak about the implications in that case.

If this is only about PanacheMongoEntity as the title states, I think it should be @loicmathieu to decide.

_just mentioning some more reasons why such details are better left to the user to make an explicit choice - in the case of Hibernate._

Sanne on 9 Feb 2020

👍2

First, a digression on UUIDs. UUID are good identifiers because they are client generated (so no risk of contention on INSERT on the database as with database generated sequence) and random so safer to use (a sequence is predictable so a user can fetch the entire database easily).
Concern about performance and entropy needs to be taken seriously if you plan to generate a lot of them (on modern JVM/OS this should not be an issue ... or just google the issue and you will find workarounds).

Regarding default equals/hascode on PanacheMongoEntity I have serious concerns as even if it can be useful for 80% use cases I still think, as @Sanne said, that it's better to let the user define equals/hashcode when needed.

Default equals/hascode in Java are identity based (based on the reference identifier of the object), when a user defines its own, it needs to implement it based on the meaning of equality for this particular type.
I agree that in 80% of cases it could be the ID field as defined in PanacheMongoEntity but sometimes not (you can have duplicates documents in a collection for example).
And as the ID field is only populated at insertion time, the equals/hascode will be incorrect for not yet persisted entity.

That's why I'm not in favor of defining default equals/hascode of PanacheMongoEntity as I didn't see a way to define it that will be ubiquitous.

But I propose to let this issue open for discussion, maybe I'm wrong :)

loicmathieu on 10 Feb 2020

👍1

Concern about performance and entropy needs to be taken seriously if you plan to generate a lot of them

Yes, this is one of the most important things when using cliend id generation. It does not need to be an UUID, look at https://neilmadden.blog/2018/08/30/moving-away-from-uuids/. Combining this with pooling (such as the pooling which does hibernate with sequence and its allocationSize?) or by using ThreadLocal on a static class level, for example:

// could introduce a thread local memory leak
private static final ThreadLocal<SecureRandom> RANDOM = new ThreadLocal<>();

the performance can be improved. (I dont know if quarkus makes internally use of any UUID for identifying instances or the like.)

And as the ID field is only populated at insertion time

Only if you use sequence generators. The ID is available before database insertion time, if you use something like this:

// custom performane wise generation of ID
@Id
private String id = SecureRandomString.generate();

@Id
private Integer id = UUID.randomUUID().toString();

the id-value is available immediatly after object creation or when user calls persist().

If user overrides the id-generation provided by PanacheEntity, then the user should also override equals/hashCode. If user takes the id-generation from its parent PanacheEntity and PanacheEntity makes use of such client id generation, then PanacheEntity can provide correct default equals/hashCode-implementation based only on id-property.

Btw, I think the best to avoid the id-generation is to provide composite Ids based on business keys. And this can only be done by the user.

nimo23 on 10 Feb 2020

right, ids being assigned explicitly by the user are the best :)
But sometimes it's not practical.

In conclusion, looks like we all agree that this is an area in which one solution doesn't fit all needs.

I hope you don't mind @GregJohnStewart , but I'd close this. Honestly, might be great to completely abstract how the "user domain" maps to objects, but we're not ready for it yet :)
Sometimes I wonder if high-producitivity frameworks like Quarkus shouldn't steer away from Java POJOs for domain representation and use some meta-description from which to generate the "right details" based on all details we can gather; but that's probably a discussion for a different issue as there's many more such details that need to be _universally_ solved.

Sanne on 10 Feb 2020

All good! The issue was deeper than I imagined. Not a big deal, thought I would suggest. Thanks for all your time!

GregJohnStewart on 11 Feb 2020

👍1

Hi @all,
I'm sorry, I didn't know about this issue. I know it is already closed, but I think it good to have the equals and hashcode methods on the PanacheMongoEntity, because it can be cached or worked with another feature that does not guarantee the same memory address to the same entity. Now, If we use cache and read an entity from cache, the equals and hashcode will be different to the same entity, as the memory address are different. I identify it in the PanacheEntity and I sent a PR as well.

I agree that in 80% of cases it could be the ID field as defined in PanacheMongoEntity but sometimes not (you can have duplicates documents in a collection for example).

Then 80% will not need implement it and can use the default, and when needed, the 20% can override it. It turns the PanacheMongoEntity to be cached as default.

The PR: #8241

rhuan080 on 29 Mar 2020

hi @rhuan080 sorry but I disagree, it's not merely about looking at "helping the majority of use cases": adding a default equals / hashcode is at most a _convenience_ for _some_ people, while providing a wrong implementation would be cause of serious bugs.

Such bugs are also hard to identify, while when encouraging people to implement their own, people have at least to think about what they are doing.

What are you doing with such a cache? It doesn't sound sensible to use such a mutable object as a key for any map - especially not if it's a map meant to share state with other threads. For any cache usage, use an immutable identifier - the ID of the object could be a good candidate (assuming it's immutable), otherwise you'll have the framework extract an immutable tuple.

Sanne on 30 Mar 2020

Was this page helpful?

0 / 5 - 0 ratings