Sometimes, when starting instances simultaneously, services fail to discover each other, so several Hazelcast clusters are created and never merge. One piece of evidence is that every instance's Hazelcast log includes:
Members {size:1, ver:1} [
Member [10.68.6.109]:5701 - 693df0fa-63a9-4015-97ec-48feea1cf483 this
]
All of the other instances show a similar single-member cluster, and the size never gets updated. Another hint that this happens at discovery time: if I restart 3 out of 4 instances, after about a minute they all suddenly join the older instance (the cluster size goes to 4).
I have done a bit of investigation, and the critical code path seems to be in CacheConfiguration, where all the discovery happens at startup:
config.getNetworkConfig().setPort(5701);
config.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(true);
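// the member list below is built only once, from whatever the discovery client can see at this exact moment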
for (ServiceInstance instance : discoveryClient.getInstances(serviceId)) {
String clusterMember = instance.getHost() + ":5701";
log.debug("Adding Hazelcast (prod) cluster member {}", clusterMember);
config.getNetworkConfig().getJoin().getTcpIpConfig().addMember(clusterMember);
}
This code sets the instance's own Hazelcast network configuration and then iterates over the instances visible to the discovery client. Unfortunately, since this runs quite early in the microservice's lifecycle, the other services *may* not have registered yet, so the clusters never join.
##### **Motivation for or Use Case**
For those using a second-level cache, instances that do not join mean inconsistencies, possibly even security-critical ones. For instance, a user's role may be removed, but the eviction of the user's cached authorities only happens on that 'cluster', so the other clusters keep serving the stale authorities.
##### **Reproduce the error**
For me (perhaps it is specific to my configuration) this happens any time I launch many instances simultaneously. I am using Kubernetes, so starting a deployment with several replicas is simple.
If just one instance is visible when the other instances are started, the problem goes away (e.g. start one instance, wait for it to be properly started, then increase replica count).
##### **Related issues**
##### **Suggest a Fix**
If others agree that this is an issue, we should decide on a way forward. There are several considerations:
- Hazelcast supports a Kubernetes discovery mode, which uses the Kubernetes API for service discovery. Implementing it may be the cleanest option for users on Kubernetes; see https://github.com/hazelcast/hazelcast-kubernetes for details.
- Perhaps the JHipster Registry itself needs to guarantee that registration is only confirmed to a service once it is visible on all instances of the registry (I'm not sure how that works, to be honest!).
- As a workaround, a scheduled runner could regularly probe to ensure that all instances visible to the Discovery Client are merged into a single Hazelcast cluster (a rough sketch of this idea follows after this list).
- My own temporary workaround: I added an Init Container that waits until at least one instance is live, then starts all the other instances. An ugly solution that works 90% of the time... but not always! I think the JHipster Registry's responses are not always reliable.
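As a rough illustration of the scheduled-runner idea above (an untested sketch, not a definitive fix): a Spring component could periodically re-read the discovery client and feed any newly visible hosts into the running member's TCP/IP join configuration. The class name and schedule are hypothetical, @EnableScheduling is assumed to be active, and whether Hazelcast's split-brain handler then actually merges the clusters may depend on the Hazelcast version.

import com.hazelcast.config.TcpIpConfig;
import com.hazelcast.core.HazelcastInstance;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.core.env.Environment;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical workaround sketch: every minute, add any newly visible service instances
// to the running member's TCP/IP join configuration so late starters can still be reached.
@Component
public class HazelcastMemberListRefresher {

    private final HazelcastInstance hazelcastInstance;
    private final DiscoveryClient discoveryClient;
    private final String serviceId;

    public HazelcastMemberListRefresher(HazelcastInstance hazelcastInstance,
                                        DiscoveryClient discoveryClient,
                                        Environment env) {
        this.hazelcastInstance = hazelcastInstance;
        this.discoveryClient = discoveryClient;
        this.serviceId = env.getProperty("spring.application.name");
    }

    @Scheduled(fixedDelay = 60000)
    public void refreshMembers() {
        TcpIpConfig tcpIpConfig = hazelcastInstance.getConfig()
            .getNetworkConfig().getJoin().getTcpIpConfig();
        for (ServiceInstance instance : discoveryClient.getInstances(serviceId)) {
            // same hard-coded port as in CacheConfiguration
            String member = instance.getHost() + ":5701";
            if (!tcpIpConfig.getMembers().contains(member)) {
                tcpIpConfig.addMember(member);
            }
        }
    }
}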
Once a decision is made, I'll be happy to help implement any changes.
##### **JHipster Version(s)**
v6.5.1
##### **JHipster configuration**
{ "generator-jhipster": { "promptValues": { "packageName": "redacted" }, "jhipsterVersion": "6.5.1", "applicationType": "microservice", "baseName": "simplifier", "packageName": "redacted", "packageFolder": "redacted", "serverPort": "9050", "authenticationType": "uaa", "uaaBaseName": "uaa", "cacheProvider": "hazelcast", "enableHibernateCache": true, "websocket": false, "databaseType": "no", "devDatabaseType": "no", "prodDatabaseType": "no", "searchEngine": false, "messageBroker": "kafka", "serviceDiscoveryType": "eureka", "buildTool": "gradle", "enableSwaggerCodegen": false, "testFrameworks": [], "jhiPrefix": "jhi", "entitySuffix": "", "dtoSuffix": "DTO", "otherModules": [], "enableTranslation": false, "clientPackageManager": "npm", "skipClient": true, "skipUserManagement": true, "blueprints": [], "embeddableLaunchScript": false, "directoryPath": "../", "dockerRepositoryName": "", "dockerPushCommand": "docker push", "kubernetesNamespace": "test", "kubernetesServiceType": "LoadBalancer", "ingressDomain": "", "monitoring": "no", "istio": false } }
There is a related issue https://github.com/jhipster/generator-jhipster/issues/9521 but not enough info was provided there.
Here are steps to reproduce locally through Docker with just the Registry and a Monolith (no database or client). I wrote these up a while ago but then suddenly couldn't reproduce it; it seems to happen more often with a new Registry container.
.yo-rc.json file
{ "generator-jhipster": { "promptValues": { "packageName": "com.mycompany.myapp" }, "jhipsterVersion": "6.5.1", "applicationType": "monolith", "baseName": "jhipster", "packageName": "com.mycompany.myapp", "packageFolder": "com/mycompany/myapp", "serverPort": "8080", "authenticationType": "jwt", "cacheProvider": "hazelcast", "enableHibernateCache": false, "websocket": false, "databaseType": "no", "devDatabaseType": "no", "prodDatabaseType": "no", "searchEngine": false, "messageBroker": false, "serviceDiscoveryType": "eureka", "buildTool": "maven", "enableSwaggerCodegen": false, "jwtSecretKey": "Y2I3MzA3ZjIyM2MzOTI1YzZjNGUwODYxY2U5NjkzNGQ3NDc5YTUyNjBiYzlmZTY3YmRmMDVkMWVhNjg1NjM3ZGMzZWY1MWJhYmU2YzEzZjg3ZTgyZmI3NDYzYTEwMWRiZGQyN2JiYWUxMTE3MThhMDUxODM5YmU2NTU1MDY1ZGE=", "embeddableLaunchScript": false, "useSass": true, "clientPackageManager": "npm", "skipClient": true, "creationTimestamp": 1576104597610, "testFrameworks": [], "jhiPrefix": "jhi", "entitySuffix": "", "dtoSuffix": "DTO", "otherModules": [], "enableTranslation": false, "blueprints": [] } }
1. Generate an app using the above .yo-rc.json
2. Remove the ports key and value from src/main/docker/app.yml (so multiple replicas can start)
3. ./mvnw package -Pprod verify jib:dockerBuild -DskipTests
4. docker-compose -f src/main/docker/app.yml up -d --scale jhipster-app=2
5. docker logs -f docker_jhipster-app_1
Members {size:1, ver:1} [
Member [172.18.0.4]:5701 - 07a5e6c6-0c9c-48cb-a8a2-bac717295c56 this
]
@ruddell Thanks for the minimal example. When you say it happens more often with a new Registry container, what do you mean?
If I recreate the entire stack this issue happens more consistently. When restarting just the mono containers, sometimes they find each other on startup.
Interesting. I haven't noticed that; for me it's consistent even without touching the registries.
I experienced a similar issue with Hazelcast and Eureka service discovery. I got it working by replacing JHipster's code, which uses Eureka to join the Hazelcast cluster, with code based on hazelcast-eureka-one.
So, using hazelcast-kubernetes might help in your case too.
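For reference, a minimal sketch of what the hazelcast-kubernetes option could look like inside CacheConfiguration, assuming Hazelcast 3.x with the com.hazelcast:hazelcast-kubernetes dependency on the classpath; the namespace and service-name values are only illustrative, and this is not the tested JHipster configuration.

// Sketch: replace the static TCP/IP member loop with Hazelcast's Kubernetes discovery strategy.
config.setProperty("hazelcast.discovery.enabled", "true");

JoinConfig join = config.getNetworkConfig().getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().setEnabled(false);

DiscoveryStrategyConfig k8sDiscovery =
    new DiscoveryStrategyConfig("com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy");
k8sDiscovery.addProperty("namespace", "test");          // Kubernetes namespace of the pods (example value)
k8sDiscovery.addProperty("service-name", "my-service"); // service fronting the application pods (example value)
join.getDiscoveryConfig().addDiscoveryStrategyConfig(k8sDiscovery);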
I'm adding a bounty on this, as it seems to be a real bug.
Yes, this is important, and it's logical that we have this kind of issue.
I'm currently working with the Hazelcast team (for my "real" work!!) - for the record, I'm having a look at https://github.com/hazelcast/hazelcast-azure, so maybe I can find some ideas there.
I'm not sure we can easily find a generic solution.
In the case of Eureka, I am using metadata for the host and port, and I think the main difference is in the way the cluster join is hooked into Spring events. I followed the Hazelcast sample that injects an EurekaClient, while JHipster's code uses a generic DiscoveryClient; it's supposed to make no difference, but maybe the issue comes from Spring Cloud bootstrapping. My project is still on Spring Boot 1.5, so this may no longer apply.
I tend to think that JHipster's configuration code should follow the code samples provided by Hazelcast, even though that requires adding a specific dependency on hazelcast-eureka, hazelcast-azure or hazelcast-kubernetes. This configuration could easily be isolated in specific classes.
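To illustrate the approach, here is a minimal sketch of a hazelcast-eureka-one based join, assuming Hazelcast 3.x with the com.hazelcast:hazelcast-eureka-one plugin on the classpath; the property values are illustrative examples of the plugin's documented options, not JHipster's generated code.

// Sketch: let the Eureka discovery plugin resolve cluster members instead of a one-shot member list.
config.setProperty("hazelcast.discovery.enabled", "true");

JoinConfig join = config.getNetworkConfig().getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().setEnabled(false);

DiscoveryStrategyConfig eurekaDiscovery =
    new DiscoveryStrategyConfig(new EurekaOneDiscoveryStrategyFactory());
eurekaDiscovery.addProperty("self-registration", "true");
eurekaDiscovery.addProperty("namespace", "hazelcast");
// read the Hazelcast host and port from the Eureka instance metadata, as described above
eurekaDiscovery.addProperty("use-metadata-for-host-and-port", "true");
join.getDiscoveryConfig().addDiscoveryStrategyConfig(eurekaDiscovery);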
For me this looks like a potentially huge bug that thankfully happens rarely (I have never seen it). Indeed, there is no failover for cluster discovery if the initial registration fails at startup time. We should definitely change our code to use the Hazelcast-recommended way, even if it requires technology-specific dependencies. Those can be conditionally enabled on the application by combining a Maven/Gradle profile with a Spring profile, like we did for Zipkin support.
Sadly, it is easy to reproduce for me - in fact, it happens anytime I launch all instances simultaneously (e.g. kill all instances, wait, and start again).
Perhaps my configuration is slightly different though.
I have the same problem with Kubernetes, even after adding the hazelcast-kubernetes dependency and configuring it. My current 'solution': I launch one replica and, once it has started, I scale up...
I plan to test with Redis.
This issue is stale because it has been open 30 days with no activity.
Our core developers tend to be more verbose on denying. If there is no negative comment, possibly this feature will be accepted.
We are accepting PRs :smiley:.
Comment or this will be closed in 7 days
I believe the issue is related to the readiness probes. Let's revisit this when the Spring Boot 2.3 release of JHipster is published.
This issue is stale because it has been open 30 days with no activity.
Our core developers tend to be more verbose on denying. If there is no negative comment, possibly this feature will be accepted.
We are accepting PRs :smiley:.
Comment or this will be closed in 7 days