Testcontainers-java: CouchbaseContainer high CPU usage when more than one bucket is created

Created on 28 May 2020  ·  17 Comments  ·  Source: testcontainers/testcontainers-java

We are looking to replace our custom Couchbase docker image (which handles node setup, bucket creation, user provisioning, etc.) used in our internal testing with the stock image and CouchbaseContainer from this project.

However, we've noticed a blocker: when creating more than one bucket, CPU usage in the container is consistently high. On my setup (MBP 2018, 32 GB, 6 cores), two buckets cause memcached to use 75% CPU and three buckets cause it to use 100+% CPU.

This does not happen with our custom container, which uses the exact same base image plus a custom startup script that configures everything via the CLI.

I have tried to dig through how CouchbaseContainer sets everything up via REST APIs, but the relationship to the CLI tools is not always obvious.

Something in the configuration of Couchbase is obviously different and is the culprit here.

With three buckets, if you try to create some indices, you'll also get strange errors about the indexer running out of memory, even though in our environment there are only about 10 small documents in each bucket.

It should be noted that these indices work fine on our custom image, where we have the indexer memory set to 256 MB vs. the default here of 512 MB.

You might say that this is an issue with Couchbase and not the Testcontainers wrapper, but if creating more than a single bucket out of the box causes this issue, then I think something about how the Testcontainers wrapper does it needs to be fixed.

[user:info,2020-05-28T02:59:42.942Z,[email protected]:<0.27074.1>:menelaus_web_alerts_srv:global_alert:119]Warning: approaching max index RAM. Indexer RAM on node "192.168.48.3" is 100%, which is at or above the threshold of 75%.
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                           
  348 couchba+  20   0 2534520 123848  12540 S 115.7  1.2  34:41.65 memcached                                                                                         
  394 couchba+  20   0 1830936 229840  21812 S  27.3  2.3   7:30.09 indexer                                                                                           
  167 couchba+  20   0 4525784 465760   6940 S  27.0  4.6  11:55.94 beam.smp 

Code to reproduce:

        final CouchbaseContainer couchbase = new CouchbaseContainer()
                .withReuse(true)
                .withNetwork(Network.SHARED)
                .withBucket(new BucketDefinition("test1")
                        .withPrimaryIndex(false))
                .withBucket(new BucketDefinition("test2")
                        .withPrimaryIndex(false))
                .withBucket(new BucketDefinition("test3")
                        .withPrimaryIndex(false));
        couchbase.getNetworkAliases().clear();
        couchbase.start();

All 17 comments

cc @daschl

Indeed, if this is causing issues we need to fix it. Would it be possible for you to share your custom script with the couchbase-cli invocations, so we can compare them to the REST equivalents?

We've recently updated our custom image from 6.0.0 to 6.5.0, so I'm going to go back and double-check we don't see the issue with 6.5.0 there either, in case it is a Couchbase issue. But then again, all our upgraded stuff has worked without issue so far.

Dockerfile.txt
setup.sh.txt

@daschl ok, so I just double-checked: 6.0.0 definitely uses less CPU than 6.5.0. However, the indexer running out of memory does not happen with our Dockerfile+script; that seems to be a Testcontainers-specific thing.

We have couchbase support so I might raise the CPU issue with them too.

Note that with your code I can also see that the creation of the three buckets worked, but memcached CPU is at 100% without any actual ops going on. One other difference is that you created the buckets with 0 replicas; I'll see if tweaking the default settings changes the behavior on the server side.

@aaronjwhiteside quick question: what is your Docker host OS?

To clarify: I guess OSX from the MBP, but do you also see it on, e.g., Linux as a host?

@aaronjwhiteside if I switch back to 6.0.3 I do not see the cpu increase (memached at 13%)

        final CouchbaseContainer couchbase = new CouchbaseContainer("couchbase/server:6.0.3")
            .withReuse(true)
            .withNetwork(Network.SHARED)
            .withBucket(new BucketDefinition("test1")
                .withPrimaryIndex(false))
            .withBucket(new BucketDefinition("test2")
                .withPrimaryIndex(false))
            .withBucket(new BucketDefinition("test3")
                .withPrimaryIndex(false));
        couchbase.getNetworkAliases().clear();
        couchbase.start();

If you run it like this, do you also see high CPU?

Yep, OSX. I haven't tried on Linux; I'd have to spin up a VM for that, which may take me a little while. Company laptop restrictions and all..

memached

Best. Typo. Ever. 😂 😂 😂

Confirmed lower CPU usage with 6.0.3

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                           
  157 couchba+  20   0 1886692 186172   5856 S  31.2  1.8   0:29.30 beam.smp                                                                                          
  259 couchba+  20   0 2420612 338080  20960 S  15.0  3.3   0:09.95 memcached  

I was definitely comparing our custom Dockerfile+scripts on 6.0.0 to the default 6.5.0 from Testcontainers. I think I'm happy to leave that out of this GH issue and raise a Couchbase support ticket for it instead.

I'll retest with 6.0.3, see if we get the indexer issue with Testcontainers, and report back here.

Thanks for your time guys!

After initial troubleshooting this seems to have to do with a new background job that times out / cleans up SyncWrite jobs (for the new sync durability feature in 6.5). On native linux the overhead has been tuned to less than 1%, but something is odd here inside the docker environment. Definitely raise a support ticket if this is important to you so it will be looked at.

I'll also try to figure out if there is a workaround we can apply in the meantime, but I'm less sure about that.

OK, no indexer issues on 6.0.3 either; this seems to be entirely related to 6.5.0.

Closing the issue now, thank you @daschl for investigating!

@daschl FYI, opened CB support request #34433

@aaronjwhiteside I've been digging into this further and for the sake of other people seeing this too: check out this thread on docker for mac and postgres: https://twitter.com/felixge/status/1221512507690496001

Looks like there is an issue with Docker for Mac and time drift. The suggested fix (https://twitter.com/felixge/status/1221512685008883712) is to restart Docker for Mac. I did that, and indeed CPU is at 10%, not 100%. It looks like we call clock_gettime more in 6.5 than in 6.0 because of the sync-durability timers.

If people read this and are interested in a "fix": Couchbase Server 6.6 and later switches the operations in question to a coarse clock, which should bring CPU back down even if the clock source switches to HPET (https://issues.couchbase.com/browse/MB-39618).

@rnorth I know you don't like to bump container DB versions, but it might make sense to bump once 6.6 is released to avoid this papercut. The workaround is to restart Docker on Mac, but if some virtualized environments run on HPET, bumping might be the only way to work around it properly. Note that 6.6 is a bit out at this point; I can make a PR once it is released (and 6.6 is fully backwards-compatible with 6.5.1).
