Generator-jhipster: ngx-couchbase intermittent Test Failures

Created on 10 Oct 2019 · 48 Comments · Source: jhipster/generator-jhipster

Overview of the issue

This issue was created from the discussion at https://github.com/jhipster/generator-jhipster/issues/10589. The ngx-couchbase tests on our CI seem to hang and fail intermittently. Here are some examples:

https://github.com/jhipster/generator-jhipster/pull/10498/checks?check_run_id=254018117

https://dev.azure.com/jhipster/jhipster/_build/results?buildId=6316

Motivation for or Use Case

Getting the CI tests in working condition without problems.

Reproduce the error

Submitting a PR or pushing a commit that triggers the CI may reproduce the issue.

Related issues

#10589

Suggest a Fix

Currently not known.

  • [x] Checking this box is mandatory (this is just to show you read everything)
$$ bug-bounty $$ $100 area couchbase


All 48 comments

Thanks.
I'm adding a bounty on this as it is important to be fixed quickly.

@vishal423 @pascalgrimaud : I've just noticed that couchbase runs on the prod profile. Is there a specific reason to do so? :smile:

Ideally we would test both: dev and prod.
But that would mean too many configurations to test.

There is no specific reason that Couchbase is tested with the prod profile.
It could be in dev, and Cassandra could be in prod.

@pascalgrimaud : That is indeed interesting, because when you run it in dev it works: https://github.com/SudharakaP/generator-jhipster/runs/255327289 :thinking:

I have to think more about why it only runs correctly in dev. :thinking: :thinking:

No idea. Never tested Couchbase, lol!

I can do some more tests this evening to find the root cause, but if someone wants to take this and fix it before then, please feel free to go ahead. 😄

@pascalgrimaud @vishal423 : I believe I found the problem here. It seems that the javadoc step (via the Maven javadoc plugin) runs on Java 1.8.0_22, because we manually set JAVA_HOME in 00-init-env.sh. Notice that on GitHub Actions, when Java 11 is installed (via the initial setup-java@v1 step), JAVA_HOME is already set to /opt/hostedtoolcache/Java/11.0.3/x64. Our 00-init-env.sh script overwrites this value, so the javadoc step ultimately uses the default Java version installed on the build machine, which seems to be 1.8.0_22.

What we need is for the GitHub CI not to change the JAVA_HOME value. I've done some tests with this and it works. I will make this modification to the current PR soon. :smile:
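The intended behaviour ("keep the JAVA_HOME the CI already set, only fall back when it is missing") can be sketched in a few lines. This is an illustration in Python, not the actual change: the real fix lives in the 00-init-env.sh shell script, the helper name `resolve_java_home` is hypothetical, and the fallback path is a placeholder rather than the value that script really uses.

```python
import os


def resolve_java_home(env, fallback):
    """Pick the JAVA_HOME for the build (hypothetical helper).

    If the CI already exported a non-empty JAVA_HOME (for example the one
    set by setup-java on GitHub Actions), keep it unchanged; otherwise
    use the fallback.
    """
    current = env.get("JAVA_HOME", "")
    return current if current else fallback


if __name__ == "__main__":
    # Placeholder fallback path; the real init script's default may differ.
    print(resolve_java_home(os.environ, "/usr/lib/jvm/placeholder-jdk"))
```

With `{"JAVA_HOME": "/opt/hostedtoolcache/Java/11.0.3/x64"}` the helper returns the hostedtoolcache path untouched, so the javadoc step would run on the JDK that setup-java installed instead of the machine default.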

Okay, I've completed the changes. Let me know if you guys see any issues. :smile:

I am not very sure about this solution. However, if it solves the current issue, there is no harm in trying it out.

@pascalgrimaud : I've also created a pull request for the daily builds to use OpenJDK instead of the default Zulu JDK. It seems to work in my Azure builds.

@vishal423 : I don't think highly of this solution either, but it does seem to work. My theory is that JAVA_HOME needs to be set correctly for Testcontainers. We shall see... :thinking:

@SudharakaP, this theory doesn't explain why it worked intermittently :smile:

@vishal423 : It does not. Then again, it is also difficult to predict exactly what will happen when we set JAVA_HOME to one JDK and use another JDK in the CI. Therefore, regardless of whether the change fixes this issue completely, I think it should be made, just so we use the same JDK for everything and JAVA_HOME is set properly. I am waiting to see more PRs and what happens... :smile:

I looked into these Couchbase issues further to find the exact underlying problem, mainly because I felt I was missing something fundamental that is common to both our daily builds problem and this one.

I found that we handle the Docker image startup differently for Couchbase. I think the main problem lies in 20-docker-compose.sh, where we use different logic for Couchbase: the image is only built, not started. This results in the following stack trace, both here and in the daily builds:

(screenshot of the stack trace)

As seen, there's no Docker container up and running. I believe this is one of the things we should address in both of these CI pipelines.

@pascalgrimaud : Is there a specific reason that we can't use the Couchbase Docker container like the others? I see the comment in 20-docker-compose.sh which suggests that the image is only built, not started, due to problems with the tests. However, in my GitHub CI I don't see any problems with running it: https://github.com/SudharakaP/generator-jhipster/runs/257408009

If I remember correctly, the problem with Couchbase was:

  • if you started the Couchbase database with the docker-compose file and then launched the backend tests, they would fail, as Testcontainers would use the same port and couldn't start
  • it's not nice for the developer experience, but it was like this
  • it was probably 1 year ago, when Couchbase support was introduced; I don't know whether it has changed since
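To make the port conflict concrete, here is a minimal Python sketch of a pre-flight check a CI script could run before the backend tests, to see whether a port is already taken (for example by a Couchbase container started from docker-compose). The `port_in_use` helper is hypothetical and not part of the JHipster scripts; 8091 is Couchbase Server's default web console port, and which ports actually collide in a given setup is an assumption.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8091: Couchbase Server's default web console / REST port.
    if port_in_use(8091):
        print("Port 8091 is busy: a Couchbase container may already be running")
    else:
        print("Port 8091 is free")
```

A check like this only detects the clash; it does not decide whether the compose-started container or the Testcontainers one should win.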

@pascalgrimaud : Thanks for the clarification. So as you might have seen, with my recent change to OpenJDK the Java 8 build is now successful.

So I believe we've found a workaround that doesn't require starting the Couchbase database with docker-compose. Is this good enough? Alternatively, we could try reintroducing the docker-compose method of starting Couchbase, since I don't see any failures with that method (at least in my GitHub CI: https://github.com/SudharakaP/generator-jhipster/runs/257408009). Maybe, as you said, Couchbase has changed since then. This solution would be more elegant and, if we could make it work, it should work for both the daily builds and here. However, if we plan to do this we need to run some tests, as I've noticed that sometimes things work on my GitHub CI but not in the JHipster CI pipeline.

My goal was to start all Docker containers before the e2e tests, to speed up the builds.
About Couchbase: if we can start the container before the tests, then yes, we should be consistent with the other builds.

@pascalgrimaud : Okay, how about this? I will open a PR that just puts back the code to start the Couchbase database with docker-compose. I think this should work, since I don't see why it wouldn't if it works on my fork's GitHub CI pipeline. We'll give it some time to see if it creates any problems, and if it doesn't work we'll revert that commit and fall back to the current approach. WDYT? :smile:

Yes, let's try!

@pascalgrimaud : I've opened three PRs. In the first two, I've made OpenJDK 8 the default on the Azure pipelines. Given our experience with Couchbase, I don't think the Azul Zulu JDK that ships with Azure can be trusted, and we should test everything with OpenJDK.

My third PR reintroduces the Couchbase Docker start. We'll merge this and see how it goes. In the worst case we can just revert it so that everything works as usual :smile:

@pascalgrimaud : As you can see, the Couchbase problem is recurring. However, my investigation suggests that it's not related to the Docker container start but rather to PR #10608. Could you please revert #10608? There are differences between Azure CI and GitHub CI, and I believe Testcontainers tends to be finicky about them.

@pascalgrimaud : I've opened the revert pull request. :smile:

I should explain a bit more here. That PR (#10608) was actually a mistake: it is fine for the Azure builds, but, as I mentioned earlier, GitHub CI uses JAVA_HOME differently, and that PR points to the wrong JAVA_HOME in the case of Java 8. My mistake; sorry about that. :smile:

I'm reopening this, as we still don't know the real reason for the failures.

See https://twitter.com/whichrich/status/1183348440803221504?s=19

I'll prioritize it and investigate.

@pascalgrimaud : Yes, it seems that #10609 doesn't work either, since the intermittent problem has recurred. So we'll revert this too, if you don't mind; I've created the revert, #10614.

In my opinion, this will give a stable build with my workaround for the time being. Somehow the end-to-end tests start failing here as soon as we introduce the Docker start. This is very strange, given that it works on my GitHub CI.

And thanks for linking that issue, I've put my comment there as well linking to this issue. :smile:

I just tried tonight and managed to reproduce the issue only once.
Then, in the Azure daily builds:

cc @tchlyah, if you can help here.
As for me, I've given up, since I'm not a Couchbase user.

If it's not fixed, another solution would be to simply remove the Couchbase builds from the official CI here.

@pascalgrimaud : I think the one that failed yesterday (which you quoted) doesn't count, since it was caused by our experiment in #10609. Just to make it absolutely clear, can we restart that build several times to see if it happens again?

Ok, restarting the Official.openjdk8 right now

@pascalgrimaud : So it seems my fix for the Azure pipeline with JDK 8 didn't work; it's still flaky. I will try to recreate this issue with the Testcontainers Couchbase module itself to see if we can find the problem. :smile:

@pascalgrimaud : I have done a number of tests on Couchbase with JDK 8, and it seems to me that the culprit here is Couchbase failing to start because the container doesn't have enough memory.

I've created the PR above. If you aren't too tired of my countless requests to restart the daily builds, could you please merge it and restart them? :smile:

I have a very strong feeling this will work. :smile_cat:

@SudharakaP : Do you want me to merge #10628 and then restart the daily builds for OpenJDK8.Official? Am I correct?

@pascalgrimaud : Correct. :+1:

Also, if the first run is successful, I would like to restart it several times to see whether it's consistent.

@SudharakaP, I have noticed a few things while looking into this failure (maybe these can help):

  1. Couchbase added Ubuntu 18.04 support in v6.0.1. It would be good to try upgrading the Couchbase image (considering we already have 18.04 in our build infrastructure). I do see some Couchbase image upgrade attempts by @DanielFran, but I'm not sure why they were reverted.
  2. Couchbase has recently released v6.0.3. We can see if that helps improve the situation here.
  3. Every time the build stalled, I observed the message below:

{"app_name":"travisCouchbaseNoCache","app_port":"-1","level":"WARN","logger_name":"c.c.c.c.e.Endpoint","message":"[localhost:32775][KeyValueEndpoint]: Authentication Failure: Bucket not found on Select Bucket command","thread_name":"cb-io-1-2","timestamp":"2019-10-16T03:40:25.596Z"}

  4. I believe the Testcontainers support was tested with Couchbase v5.5.1. I don't see any changes in that part, so I'm not very sure whether that code will work flawlessly with the latest Couchbase images.
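Since this warning lined up with every stalled build, it can help to flag it automatically in CI logs. A minimal Python sketch, assuming the application writes one JSON object per line, as in the sample above; the helper name `find_bucket_failures` is hypothetical and not part of the JHipster tooling.

```python
import json

# Text taken from the WARN message observed when the build stalled.
BUCKET_NOT_FOUND = "Bucket not found on Select Bucket command"


def find_bucket_failures(lines):
    """Return the messages of JSON log lines that report the bucket-not-found warning."""
    hits = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output such as Maven progress lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed lines
        if entry.get("level") == "WARN" and BUCKET_NOT_FOUND in entry.get("message", ""):
            hits.append(entry["message"])
    return hits
```

Passing an open log file works too, since iterating a file yields its lines; a non-empty result would mark a run as hitting this failure mode.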

@SudharakaP : the first run failed :-(

@vishal423 I tried to upgrade to this version, but it didn't work...

@vishal423 : Thanks for the information. :smile: Let me try those, but I am baffled, because I could recreate this scenario on my CI, where Java 8 fails intermittently. It used to fail for Java 11 as well, but the JAVA_HOME change seems to have fixed that. Then I increased the memoryQuota and it started running successfully, not just once but almost every time (I tried about 10 times and all runs were successful). As soon as I made a pull request, it failed. It's quite an interesting problem. Let me try your suggestions to see if we can do something.

@pascalgrimaud : We'll make some incremental changes.

As @vishal423 suggested;

  1. Couchbase added Ubuntu 18.04 support in the v6.0.1. It would be good to try to upgrade the couchbase image (considering we already have 18.04 in our build infrastructure). I do see some couchbase image upgrade attempts by @DanielFran, but, not sure on why they were reverted.

However, I see a dependency here: on GitHub CI we run Couchbase on ubuntu-latest, but on the Azure Java 8 build we run it on Ubuntu 16.04. I've created a pull request to update Azure to match our GitHub CI. If this doesn't work, I'll look into upgrading the Couchbase images.

@pascalgrimaud : I have a more refined fix after thoroughly investigating what @vishal423 mentioned in his third point, which I hadn't noticed before. Thanks @vishal423 😄 😄

Every time the build stalled, I have observed below message:
{"app_name":"travisCouchbaseNoCache","app_port":"-1","level":"WARN","logger_name":"c.c.c.c.e.Endpoint","message":"[localhost:32775][KeyValueEndpoint]: Authentication Failure: Bucket not found on Select Bucket command","thread_name":"cb-io-1-2","timestamp":"2019-10-16T03:40:25.596Z"}

Searching the internet, I found this: https://forums.couchbase.com/t/java-sdk-can-not-access-query-node-in-couchbase-5-0/14855, which I believe is our issue (or at least part of it). It leads to the following fix: https://issues.couchbase.com/browse/JVMCBC-564.

Notice that we use com.couchbase.client version 2.5.9, but the above fix only went live in version 2.6.2, as per https://docs.couchbase.com/java-sdk/current/sdk-release-notes.html#version-2-6-2-4-september-2018

Therefore I've created the above pull request updating our Couchbase java-client version. 😄

Please feel free to merge this when you have time, and we can do some testing as before. 😄 😄

@SudharakaP Are you downgrading the Couchbase java-client version?
The dependency is already defined in the JHipster BOM project: see https://github.com/jhipster/jhipster/blob/master/jhipster-dependencies/pom.xml#L274
The currently defined version is 2.7.9: https://github.com/jhipster/jhipster/blob/master/jhipster-dependencies/pom.xml#L46
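Whether a given java-client version already contains the JVMCBC-564 fix comes down to a version comparison against 2.6.2. A minimal Python sketch, assuming plain numeric major.minor.patch strings (no qualifiers such as -SNAPSHOT); the helper names are hypothetical.

```python
# Version in which the JVMCBC-564 fix shipped, per the Couchbase Java SDK release notes.
FIX_VERSION = (2, 6, 2)


def parse_version(text):
    """Turn a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def includes_bucket_fix(client_version):
    """Return True if this java-client version is at or above the fix version."""
    # Tuples compare element by element, so (2, 7, 9) >= (2, 6, 2).
    return parse_version(client_version) >= FIX_VERSION
```

With the versions from this thread, `includes_bucket_fix("2.5.9")` is False while `includes_bucket_fix("2.7.9")`, the version pinned in the JHipster BOM, is True, which is consistent with @DanielFran's point that the BOM already provides a version newer than the fix.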

@DanielFran : Thanks for pointing that out; you are correct. I didn't even know we had a separate repository for the dependency versions. I've closed the pull request. Sorry about that. 😕

Hello, sorry guys, I couldn't reproduce the problem on my end. I've updated the Couchmove and Couchbase image versions, but I don't think it will change anything.

@tchlyah : Thanks for trying, though; we'll know for sure when the daily builds have run several times. :smile: Did you try with Java 8? That is where it fails; Java 11 works almost all the time with our recent changes.

I also have a feeling this might be related to a timeout similar to https://github.com/testcontainers/testcontainers-java/issues/715, which I've pointed out in https://github.com/testcontainers/testcontainers-java/issues/1453#issuecomment-541930826. Do you think this might be worth trying? I could open a pull request on that repository if you think it's worthwhile. :smile:

I'm reopening this, and will wait a few days for our daily builds before closing it for good.

@pascalgrimaud : At the risk of being nit-picky, I noticed that yesterday the Official.JDK8 scheduled build didn't run for some reason. :thinking: :thinking:

I'm restarting it, as you need to see the result, right?

@pascalgrimaud : Thanks for restarting. I was just curious why it didn't start in the first place. :smile:

After our fixes and the upgrade to Ubuntu 18.04, I believe the Java 8 builds are fairly stable now, although you might still see a failure once in a while (this also happens with the Java 11 Couchbase build). Notice that in the last 8 runs of our daily builds, the Couchbase tests timed out only once; this matches our GitHub CI builds, which also timed out once during the last 7 or 8 runs.

Also, I am not sure that the linked issue (https://github.com/testcontainers/testcontainers-java/issues/1453) addresses our exact problem; I think there are multiple points of failure in the Testcontainers Couchbase module.

Given that the builds are now reasonably stable, maybe we should close this issue? Or is there anything more we could do? 😄

Ok, let's close this.
Please, @SudharakaP, claim the bounty; it's well deserved!
