Generator-jhipster: All hashcode methods return 31

Created on 16 Jul 2019  ·  19 comments  ·  Source: jhipster/generator-jhipster

Overview of the issue

All hashCode() methods on generated entities return 31.
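
For reference, the hashCode()/equals() pair on a generated entity looks roughly like this (a sketch; the exact template varies by JHipster version, and Book is the entity from the JDL below):

@Override
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (!(o instanceof Book)) {
        return false;
    }
    return id != null && id.equals(((Book) o).id);
}

@Override
public int hashCode() {
    // Every instance of every entity hashes to the same value.
    return 31;
}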

Motivation for or Use Case

yeah...

Reproduce the error

Just generate a project.

Related issues
Suggest a Fix

JHipster Version(s)
[email protected] /Users/orirabi/git-repos/dummy/clean-gen
└── [email protected] 

JHipster configuration, a .yo-rc.json file generated in the root folder
{
  "generator-jhipster": {
    "promptValues": {
      "packageName": "com.mycompany.myapp"
    },
    "jhipsterVersion": "6.1.2",
    "applicationType": "monolith",
    "baseName": "jhipster",
    "packageName": "com.mycompany.myapp",
    "packageFolder": "com/mycompany/myapp",
    "serverPort": "8080",
    "authenticationType": "jwt",
    "cacheProvider": "no",
    "websocket": false,
    "databaseType": "sql",
    "devDatabaseType": "mysql",
    "prodDatabaseType": "mysql",
    "searchEngine": false,
    "messageBroker": false,
    "serviceDiscoveryType": false,
    "buildTool": "gradle",
    "enableSwaggerCodegen": false,
    "jwtSecretKey": "bXktc2VjcmV0LXRva2VuLXRvLWNoYW5nZS1pbi1wcm9kdWN0aW9uLWFuZC10by1rZWVwLWluLWEtc2VjdXJlLXBsYWNl",
    "useSass": true,
    "clientPackageManager": "npm",
    "skipClient": true,
    "testFrameworks": [],
    "jhiPrefix": "jhi",
    "entitySuffix": "",
    "dtoSuffix": "DTO",
    "otherModules": [],
    "enableTranslation": false,
    "enableHibernateCache": false
  }
}

JDL for the Entity configuration(s) entityName.json files generated in the .jhipster directory


JDL entity definitions

entity Book {
  name String
}
paginate Book with pagination
service Book with serviceClass
filter Book

Environment and Tools

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

git version 2.21.0 (Apple Git-120)

node: v8.12.0

npm: 6.4.1

Docker version 18.09.2, build 6247962

docker-compose version 1.23.2, build 1110ad01

Browsers and Operating System

Irrelevant.

  • [x] Checking this box is mandatory (this is just to show you read everything)

All 19 comments

Sorry, just saw the conversation here: https://github.com/jhipster/generator-jhipster/issues/8656

I see that we put a fixed hash code also for Cassandra. Since with Cassandra the id is not generated by the DB, it would probably be better to use Objects::hashCode there.
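
A minimal sketch of what that could look like, assuming a client-assigned UUID id that is set at construction time (class and field names are illustrative):

import java.util.Objects;
import java.util.UUID;

public class CassandraBook {

    // With Cassandra the id is assigned by the application, not generated
    // by the DB, so it is stable for the entity's whole lifetime.
    private UUID id = UUID.randomUUID();

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CassandraBook)) {
            return false;
        }
        return Objects.equals(id, ((CassandraBook) o).id);
    }

    @Override
    public int hashCode() {
        // Safe here because the id never changes after construction.
        return Objects.hashCode(id);
    }
}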

Two things:

  1. I just had a case where I used a lot of JHipster entities (50k, async processing of large datasets), first as Map keys and later in a Set. Implementing a useful hashCode() method reduced the Map step from 5 min to 0.3 s! Returning a constant reduces every hash collection to a linked list with linear search. That should be a no-go.

  2. Using Objects::hashCode for entities is a very bad idea, since two instances representing the same DB row will have different hash codes. Don't do that! HashSets and HashMaps will not be slow, but they will stop working as intended. (The sketch after this list illustrates both failure modes.)
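
A self-contained sketch of both problems; the two entity classes are hypothetical stand-ins for generated entities:

import java.util.HashSet;
import java.util.Set;

public class HashCodeDemo {

    // Stand-in for a generated entity with the constant hashCode.
    static class ConstantHashEntity {
        final Long id;
        ConstantHashEntity(Long id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof ConstantHashEntity
                && id.equals(((ConstantHashEntity) o).id);
        }
        @Override public int hashCode() { return 31; }
    }

    // Stand-in for an entity relying on the default identity hashCode.
    static class IdentityHashEntity {
        final Long id;
        IdentityHashEntity(Long id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof IdentityHashEntity
                && id.equals(((IdentityHashEntity) o).id);
        }
        // No hashCode() override: the equals/hashCode contract is broken.
    }

    public static void main(String[] args) {
        // 1. Constant hashCode is correct but slow: every entry lands in
        // the same bucket, so lookups degrade to a scan of that bucket.
        Set<ConstantHashEntity> constant = new HashSet<>();
        for (long i = 0; i < 50_000; i++) {
            constant.add(new ConstantHashEntity(i));
        }
        long start = System.nanoTime();
        constant.contains(new ConstantHashEntity(49_999L));
        System.out.println("lookup took " + (System.nanoTime() - start) + " ns");

        // 2. Identity hashCode is fast but wrong: two instances
        // representing the same DB row hash differently, so this lookup
        // almost certainly misses.
        Set<IdentityHashEntity> identity = new HashSet<>();
        identity.add(new IdentityHashEntity(1L));
        System.out.println(identity.contains(new IdentityHashEntity(1L))); // false
    }
}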

I just had a case where I used a lot of JHipster entities (50k, async processing of large datasets), first as Map keys and later in a Set. Implementing a useful hashCode() method reduced the Map step from 5 min to 0.3 s! Returning a constant reduces every hash collection to a linked list with linear search. That should be a no-go.

Processing large data sets in memory rather than in the DB is often a code smell. Why would you want to fetch 50k rows only to aggregate them in the application, when almost all RDBMS (e.g., Oracle, SQL Server, PostgreSQL, MySQL) support window functions, CTEs, recursive queries, and many other features that can help you with analytics tasks?

Could you elaborate on your use case? I'm curious because I've never seen an OLTP or batch processing task where you need to fetch and change more than 50k records in a single database transaction. Usually, you do that in smaller batches to avoid long-running transactions or saturating the I/O resources (disk, network).

Hi @vladmihalcea & @DerJochen,

I was waiting for such a remark.

"31" is hard to understand at first sight and makes people wonder "why", so I should have commented this line with a link to @vladmihalcea's post.

For rare use cases where lots of entities are processed in collections, I think you can write specific classes that override hashCode and achieve your goals, while keeping a constant hashCode on the base entity class.

In all my projects, I'm now using the class hashCode, so that each entity type has a different hashCode and there is no hard-coded random integer:

@Override
public int hashCode() {
    return getClass().hashCode();
}

What do you think about that?

You can surely use the associated Java Class hashCode, as that's going to be a constant as well.

I think your approach is more elegant than the hard-coded value.

I also updated my GH repository and my articles to include this change, so that it's consistent with your change.

It's only going to actually help with performance if the collection consists of objects of mostly varying types, though. Performance will still be linear, at best, in the maximum frequency of any concrete type in the collection.

So it only helps in a very specific use case.

Beyond that, it just eliminates a magic constant ;-)

@jwgmeligmeyling People have been constantly asking me "why 31?" on my blog, so I can relate to @pblanchardie. This change will make the implementation more idiomatic and raise fewer eyebrows.

Fewer raised eyebrows also means that people will be less aware of the hashCode side effects and what the workaround is there for. Even with the getClass().hashCode() implementation, you probably still want to put a comment near it. People might otherwise replace the implementation with a "smarter"/"faster" one, only to run into the same old bugs ;-) People should just read the article about it and, if still unclear, ask one or two questions.

And in fact, when dealing with very large collections, you may want to use an id-based hashCode function instead. Just be sure never to add an unmanaged/unpersisted entity to a hash collection when you do so.

That's a good observation. My articles also contain the explanation, so a comment above the return statement is a good idea.
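
Concretely, the commented implementation could look like this (a sketch):

@Override
public int hashCode() {
    // Intentionally constant per entity class: the id is assigned by the
    // DB on insert, so an id-based hash would change after persist and
    // break HashSet/HashMap lookups. See
    // https://github.com/jhipster/generator-jhipster/issues/8656
    return getClass().hashCode();
}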

@jwgmeligmeyling About this comment:

Just be sure never to add an unmanaged/unpersisted entity to a hash collection when you do so.

I think this is a common scenario when doing merging. Someone loads the entity with a collection. Then, while the entity is detached, they add new child entities (so their id is null), update some child entities, remove others, and then call merge on the parent entity. So it's not bad if unmanaged entities end up in a collection.

I never expected this to pop up again in my mailbox...

IMHO the best generic hashCode() method for an entity class template is one that uses getClass().hashCode() and combines it with the id (if not null) or Object.hashCode() (if the id is null).

This is quite fast and will give a nice spread of hash codes. It might be a problem if multiple unpersisted instances of the same entity exist, but I cannot think of a scenario where that would be useful.
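
A sketch of that hybrid approach (the id field name is illustrative):

@Override
public int hashCode() {
    // Combine the class hash with the id hash once the id exists; fall
    // back to the identity hash while the entity is still transient.
    int idHash = (id != null) ? id.hashCode() : System.identityHashCode(this);
    return 31 * getClass().hashCode() + idHash;
}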

IMHO the best generic hashCode() method for an entity class template is one that uses getClass().hashCode() and combines it with the id (if not null) or Object.hashCode() (if the id is null).

The situation where this will not work is when you have a managed entity with a Set collection whose child entities have a @GeneratedValue identifier.

Now, when you add a child entity to it and call flush, you won't be able to find the child entity, as the hash of the id has changed. The only way to make it work is to call persist on the child entity before adding it to the collection.
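
This failure can be reproduced without JPA at all; the sketch below simulates the flush assigning a generated id (class and field names are hypothetical):

import java.util.HashSet;
import java.util.Set;

public class GeneratedIdHashDemo {

    // Hypothetical child entity with the id-based hashCode proposed above.
    static class Child {
        Long id; // simulates a @GeneratedValue id: null until "flush"

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            return o instanceof Child && id != null && id.equals(((Child) o).id);
        }

        @Override
        public int hashCode() {
            // Id-based hash: its value changes once the id is assigned.
            return (id != null) ? id.hashCode() : System.identityHashCode(this);
        }
    }

    public static void main(String[] args) {
        Set<Child> children = new HashSet<>();
        Child child = new Child();
        children.add(child); // stored under the identity-hash bucket

        child.id = 42L; // a flush would assign the generated id here

        // The set now computes the new, id-based hash and looks in the
        // wrong bucket, so the entity is effectively lost:
        System.out.println(children.contains(child)); // almost certainly false
    }
}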

Processing large data sets in memory rather than in the DB is often a code smell. Why would you want to fetch 50k rows only to aggregate them in the application, when almost all RDBMS (e.g., Oracle, SQL Server, PostgreSQL, MySQL) support window functions, CTEs, recursive queries, and many other features that can help you with analytics tasks?

Could you elaborate on your use case? I'm curious because I've never seen an OLTP or batch processing task where you need to fetch and change more than 50k records in a single database transaction. Usually, you do that in smaller batches to avoid long-running transactions or saturating the I/O resources (disk, network).

Sorry I did not answer this when you asked. I agree that normally you should not load a large number of entities and put them in a hash collection. In that case (as far as I remember now), I was processing a very large number of entities for a reporting system. The 50k entities were not the "very large amount" themselves, but basically keys for even more child entities. So loading them into memory and using them to aggregate the data below them was a sound solution.

And with a useful hashCode() method it also was quite fast. ;-)

IMHO the best generic hashCode() method for an entity class template is one that uses getClass().hashCode() and combines it with the id (if not null) or Object.hashCode() (if the id is null).

The situation where this will not work is when you have a managed entity with a Set collection whose child entities have a @GeneratedValue identifier.

Now, when you add a child entity to it and call flush, you won't be able to find the child entity, as the hash of the id has changed. The only way to make it work is to call persist on the child entity before adding it to the collection.

True. But there is a lower limit to what developers should know about the collections they use. And if you change the hash-relevant values while the entity is in a hash collection, it will basically be lost. And, as you said, if you know what you are doing, there is a non-hacky way to handle that correctly.

@DerJochen 50k might be just fine for a DB with a lot of hardware resources available. The root cause comes from not having a fixed limit on the query result set: if data keeps accumulating, what worked fine some time ago now works slower and slower. And, unfortunately, this is a very common performance issue.

If the report always fetches the same amount of data and the performance is good, then, of course, there's nothing to worry about. However, I've also witnessed an SQL query doing the processing in the DB outperform my Java aggregation when the data was fairly large, so I think it's good for people to take both options into consideration and, based on their requirements, pick the right one for their particular system.

@vladmihalcea Correct again. At the time, we weighed our options and got an estimate from our customer of how many entities would be added per month/year. Going by that, the in-memory solution would work just fine for a long time. I am a bit vague, since I have not worked on that project for over a year now.

Also, the report logic "grew" over a decade or two, so it was quite complicated, with a lot of exceptions. Implementing that in SQL would have been a bad idea. I like to select and filter in SQL (with Spring repositories, I even try to skip writing SQL completely), but IMHO (pre-)processing in SQL leads to code that is harder to test and maintain.
