Below is one of the etcd cluster nodes, run as a Docker container:
b24afab33c18 quay.io/coreos/etcd:v3.4.0 "/usr/local/bin/etcd…" 15 hours ago Up 15 hours 0.0.0.0:2381->2381/tcp, 0.0.0.0:2482->2482/tcp, 2379-2380/tcp etcd2
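(For reference, a node like this would have been started with something roughly like the following; the exact flags, IPs, and cluster string here are illustrative guesses, not the actual deployment:)
docker run -d --name etcd2 \
  -p 2381:2381 -p 2482:2482 \
  quay.io/coreos/etcd:v3.4.0 \
  /usr/local/bin/etcd --name etcd2 \
  --listen-client-urls http://0.0.0.0:2381 \
  --advertise-client-urls http://xxx.xxx.xxx.xxx:2381 \
  --listen-peer-urls http://0.0.0.0:2482 \
  --initial-advertise-peer-urls http://xxx.xxx.xxx.xxx:2482 \
  --initial-cluster 'etcd0=http://...,etcd1=http://...,etcd2=http://xxx.xxx.xxx.xxx:2482'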
At the beginning we used version 0.3.0, but found that watcher.close() didn't work,
so we changed to 0.3.1-SNAPSHOT (after the change, watcher.close() works well):
compile ("io.etcd:jetcd-core:0.3.1-SNAPSHOT")
The watch or close API seems to have a memory leak problem, so I used the Scala code below to demonstrate it.
(I deployed an etcd cluster of three nodes; the third node's memory kept increasing the whole time the code below was running, while the other nodes' memory stayed stable.)
import java.net.URI

import io.etcd.jetcd.{ByteSequence, Client, Watch}
import io.etcd.jetcd.options.WatchOption
import io.etcd.jetcd.watch.WatchResponse

import scala.collection.JavaConverters._
import scala.collection.immutable.Queue

object WatchLeakTest {

  // Placeholder event handler; the application logic is irrelevant to the leak.
  def onKeyChange(res: WatchResponse): Unit = ()

  def main(args: Array[String]): Unit = {
    val hostAndPorts = "xxx.xxx.xxx.xxx:2379,xxx.xxx.xxx.xxx:2380,xxx.xxx.xxx.xxx:2381"
    val addresses: List[URI] = hostAndPorts
      .split(",")
      .toList
      .map { hp =>
        val host :: port :: Nil = hp.split(":").toList
        URI.create(s"http://$host:$port")
      }
    val client = Client.builder().endpoints(addresses.asJava).build()
    val watchClient = client.getWatchClient
    for (count1 <- 1 to 100) {
      var watcherQueue = Queue.empty[Watch.Watcher]
      for (count2 <- 1 to 5000) {
        // jetcd takes keys as ByteSequence, not String
        val key = ByteSequence.from(s"namespace/$count1/$count2".getBytes)
        val option = WatchOption
          .newBuilder()
          .withPrevKV(true)
          .withPrefix(key)
          .build()
        val watcher = watchClient.watch(key, option, Watch.listener((res: WatchResponse) => onKeyChange(res)))
        watcherQueue = watcherQueue.enqueue(watcher)
      }
      Thread.sleep(1000 * 10)
      // Close all watchers
      for (watcher <- watcherQueue) {
        watcher.close()
      }
    }
  }
}
The code above creates 5000 watchers per outer-loop iteration, sleeps 10 seconds, then closes those 5000 watchers; the outer loop executes 100 times in total.
During testing, despite the watchers being closed as above, the memory was not released: usage kept increasing (observed via docker stats) until docker ps and docker stats could no longer execute (etcd2 seems to have crashed at that point), and free -m showed the memory fully consumed.
During testing I also used pprof to check the memory (the URL must be quoted, otherwise the shell treats the & as a background operator):
go tool pprof "http://xxx.xxx.xxx.xxx:2381/debug/pprof/heap?debug=1&seconds=10"
I found that the memory usage of go.etcd.io/etcd/mvcc.(*watchableStore).NewWatchStream kept increasing.
So we can draw a preliminary conclusion: the watch API leads to the memory leak; perhaps watcher.close() does not release the memory.
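(As a server-side sanity check independent of pprof, etcd also exposes watcher counts on its /metrics endpoint. Assuming the same client port as above, this gauge should stay flat across loops if close() really released the server-side watchers:)
curl -s http://xxx.xxx.xxx.xxx:2381/metrics | grep etcd_debugging_mvcc_watcher_total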
PS: I also ran another test to approach the problem from a different angle:
I removed all etcdclient.watch/close logic from our own application, tested the application again, and monitored etcd's memory usage; it stayed stable.
I am not sure whether the root cause is on the client side or the server side, so as a first step I just upgraded the client version.
I upgraded the client version to 0.4.1 (https://mvnrepository.com/artifact/io.etcd/jetcd-core/0.4.1):
compile ("io.etcd:jetcd-core:0.4.1")
Executing the main function above, the problem still exists :(
I also upgraded the etcd server to v3.4.3 and downgraded it to v3.2.28 (released on 2019.11.10); the problem still exists.
@xiang90 @jpbetz @hexfusion @fanminshi @gyuho
I see that you have a lot of experience with etcd and have handled similar issues, e.g. https://github.com/etcd-io/etcd/issues/9103.
Any opinion?
@xiang90 @jpbetz @hexfusion @fanminshi @gyuho I also hit the same problem. Can you help check this issue and suggest a workaround if there is one? Thanks!
+1 on this.
Facing the same issue.
I also ran a benchmark against etcdClient.put, e.g. this demo code:
// Put a key-value pair under a 10-second lease.
// (lease and kvClient come from the jetcd client:
//  lease = client.getLeaseClient(); kvClient = client.getKVClient();)
long currentTime = System.currentTimeMillis();
ByteSequence key = ByteSequence.from(("ns/" + currentTime).getBytes());
ByteSequence value = ByteSequence.from("dummy".getBytes());
CompletableFuture<LeaseGrantResponse> leaseGrantResponse = lease.grant(10);
try {
    PutOption putOption = PutOption.newBuilder()
            .withLeaseId(leaseGrantResponse.get().getID())
            .build();
    kvClient.put(key, value, putOption);
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
etcd's memory usage kept increasing during the benchmark (monitored with docker stats).
After stopping the benchmark and monitoring again, the memory usage never comes back down.
@xiang90 @jpbetz @hexfusion @fanminshi @gyuho Do you know the reason, or have any ideas if this is indeed a problem?
Then I used the Go client API to test again. Code below:
package main
import (
    "context"
    "fmt"
    "time"

    "github.com/coreos/etcd/clientv3"
)

func main() {
    cli, _ := clientv3.New(clientv3.Config{
        Endpoints:   []string{"localhost:2379"},
        DialTimeout: 5 * time.Second,
    })
    defer cli.Close()
    for j := 1; j <= 1000; j++ {
        var watchers []clientv3.Watcher
        for i := 1; i <= 5000; i++ {
            println("starting watcher: ", i)
            watcher := clientv3.NewWatcher(cli)
            key := fmt.Sprintf("foo-%d-%d", j, i)
            _ = watcher.Watch(context.Background(), key, clientv3.WithPrefix())
            watchers = append(watchers, watcher)
        }
        time.Sleep(10 * time.Second)
        for _, watcher := range watchers {
            println("closing watcher: ", watcher)
            watcher.Close()
        }
        println("done: ", j)
    }
}
I deployed a three-node etcd cluster:
etcd0's memory usage stays stable (< 200M)
etcd1's memory usage stays stable (< 200M)
etcd2's memory usage stays stable (< 700M)
So obviously the etcd server itself and the Go client API are fine.
The problem is in jetcd.
Also facing the same issue.
/area bug
@wgliang
We solved this problem by switching from jetcd to IBM's etcd-java (https://github.com/IBM/etcd-java):
import com.google.protobuf.ByteString;
import com.ibm.etcd.client.EtcdClient;
import com.ibm.etcd.client.kv.KvClient;
import com.ibm.etcd.client.kv.WatchUpdate;
import io.grpc.stub.StreamObserver;
import java.util.ArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class WatchTest {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService threadpool = Executors.newFixedThreadPool(10);
        final StreamObserver<WatchUpdate> observer = new StreamObserver<WatchUpdate>() {
            @Override
            public void onNext(WatchUpdate value) {
                System.out.println("watch event: " + value);
            }

            @Override
            public void onError(Throwable t) {
                System.out.println("watch error: " + t);
            }

            @Override
            public void onCompleted() {
                System.out.println("watch completed");
            }
        };
        EtcdClient client = EtcdClient.forEndpoints("xxx.xxx.xxx.xxx:2379,xxx.xxx.xxx.xxx:2380,xxx.xxx.xxx.xxx:2381").withPlainText().build();
        KvClient kvclient = client.getKvClient();
        for (int i = 0; i < 1000; i++) {
            // Pre-size for the 5000 watchers created per iteration.
            ArrayList<KvClient.Watch> watchers = new ArrayList<>(5000);
            for (int j = 0; j < 5000; j++) {
                System.out.println("create watcher: " + j);
                String key = "foo-" + i + "-" + j;
                KvClient.Watch watcher = kvclient.watch(ByteString.copyFromUtf8(key)).asPrefix().executor(threadpool).start(observer);
                watchers.add(watcher);
            }
            System.out.println("iteration done: " + i);
            Thread.sleep(10 * 1000);
            for (KvClient.Watch watcher : watchers) {
                System.out.println("closing watchers");
                watcher.cancel(true);
                System.out.println("watcher canceled");
                watcher.close();
            }
            System.out.println("done: " + i);
        }
    }
}
So obviously jetcd's watcher has a memory/performance problem, while etcd-java is fine.
Reduce application's watcher number
it seems even if use etcd-java, if have a lot watcher which watches the same key, the memory will increase also. (if have a lot of watcher which watches different key, the memory keeps stable),
Anyway, it is not good when application has a lot of watcher, so it makes sense to do this(Reduce application's watcher number)
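For the record, here is roughly what that workaround can look like with the same etcd-java API used above: register a single prefix watcher on a shared prefix and fan events out to per-key callbacks inside the application. This is only a sketch; the "namespace/" prefix, the listener map, and the com.ibm.etcd.api.Event accessors are my assumptions, not code from the actual application.

import com.google.protobuf.ByteString;
import com.ibm.etcd.api.Event;
import com.ibm.etcd.client.EtcdClient;
import com.ibm.etcd.client.kv.KvClient;
import com.ibm.etcd.client.kv.WatchUpdate;
import io.grpc.stub.StreamObserver;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class SharedPrefixWatchTest {
    // One local callback per logical key, instead of one server-side watcher per key.
    static final Map<ByteString, Consumer<Event>> listeners = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        ExecutorService threadpool = Executors.newFixedThreadPool(10);
        EtcdClient client = EtcdClient.forEndpoints("xxx.xxx.xxx.xxx:2379").withPlainText().build();
        KvClient kvClient = client.getKvClient();

        // Registering interest in a key is now a local map insert; no extra watch RPC.
        listeners.put(ByteString.copyFromUtf8("namespace/foo"),
                event -> System.out.println("foo changed: " + event));

        // A single server-side watcher covers every key under "namespace/".
        KvClient.Watch watch = kvClient.watch(ByteString.copyFromUtf8("namespace/"))
                .asPrefix()
                .executor(threadpool)
                .start(new StreamObserver<WatchUpdate>() {
                    @Override
                    public void onNext(WatchUpdate update) {
                        for (Event event : update.getEvents()) {
                            // Fan the event out to whoever registered this exact key.
                            Consumer<Event> listener = listeners.get(event.getKv().getKey());
                            if (listener != null) {
                                listener.accept(event);
                            }
                        }
                    }

                    @Override
                    public void onError(Throwable t) {
                        System.out.println("watch error: " + t);
                    }

                    @Override
                    public void onCompleted() {
                        System.out.println("watch completed");
                    }
                });
        // ... application runs; on shutdown, close the single watcher and the client:
        // watch.close();
        // client.close();
    }
}

With this pattern the server sees one watcher no matter how many keys the application tracks, which sidesteps both the jetcd close() leak and the many-watchers-on-one-key growth mentioned above.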
Hopefully https://github.com/etcd-io/etcd/pull/11438 gets merged and a new etcd version is released that includes the patch.
It would be worthwhile to test the scenarios below with it.
I think #11438 is more relevant to fixing the memory leak.
Please note the difference between the two test programs.
In https://github.com/etcd-io/etcd/issues/11350#issuecomment-563939647 :
watcher.cancel(true);
The above clears the victims continuously.
In the test code at https://github.com/etcd-io/etcd/issues/11350#issuecomment-554858136 , by contrast, there is no cancel call;
that is what showed the accumulated victims, leading to the memory leak.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/reopen