Below is one of the etcd cluster nodes, run as a Docker container:
b24afab33c18 quay.io/coreos/etcd:v3.4.0 "/usr/local/bin/etcd…" 15 hours ago Up 15 hours 0.0.0.0:2381->2381/tcp, 0.0.0.0:2482->2482/tcp, 2379-2380/tcp etcd2
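(For reference, a node like this would have been started with something roughly like the following; the exact flags, IPs, and cluster string here are illustrative guesses, not the actual deployment:)
docker run -d --name etcd2 \
  -p 2381:2381 -p 2482:2482 \
  quay.io/coreos/etcd:v3.4.0 \
  /usr/local/bin/etcd --name etcd2 \
  --listen-client-urls http://0.0.0.0:2381 \
  --advertise-client-urls http://xxx.xxx.xxx.xxx:2381 \
  --listen-peer-urls http://0.0.0.0:2482 \
  --initial-advertise-peer-urls http://xxx.xxx.xxx.xxx:2482 \
  --initial-cluster 'etcd0=http://...,etcd1=http://...,etcd2=http://xxx.xxx.xxx.xxx:2482'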
At the beginning we used version 0.3.0, but found that watcher.close() didn't work,
so we changed to 0.3.1-SNAPSHOT (after the change, watcher.close() works well):
compile ("io.etcd:jetcd-core:0.3.1-SNAPSHOT")
The watch or close API seems to have a memory leak problem, so I used the Scala code below to demonstrate it.
(I deployed an etcd cluster of three nodes; the third node's memory kept increasing the whole time the code below was running, while the other nodes' memory stayed stable.)
import java.net.URI

import io.etcd.jetcd.{ByteSequence, Client, Watch}
import io.etcd.jetcd.options.WatchOption
import io.etcd.jetcd.watch.WatchResponse

import scala.collection.JavaConverters._
import scala.collection.immutable.Queue

object WatchLeakTest {

  // Placeholder event handler; the application logic is irrelevant to the leak.
  def onKeyChange(res: WatchResponse): Unit = ()

  def main(args: Array[String]): Unit = {
    val hostAndPorts = "xxx.xxx.xxx.xxx:2379,xxx.xxx.xxx.xxx:2380,xxx.xxx.xxx.xxx:2381"
    val addresses: List[URI] = hostAndPorts
      .split(",")
      .toList
      .map { hp =>
        val host :: port :: Nil = hp.split(":").toList
        URI.create(s"http://$host:$port")
      }
    val client = Client.builder().endpoints(addresses.asJava).build()
    val watchClient = client.getWatchClient
    for (count1 <- 1 to 100) {
      var watcherQueue = Queue.empty[Watch.Watcher]
      for (count2 <- 1 to 5000) {
        // jetcd takes keys as ByteSequence, not String
        val key = ByteSequence.from(s"namespace/$count1/$count2".getBytes)
        val option = WatchOption
          .newBuilder()
          .withPrevKV(true)
          .withPrefix(key)
          .build()
        val watcher = watchClient.watch(key, option, Watch.listener((res: WatchResponse) => onKeyChange(res)))
        watcherQueue = watcherQueue.enqueue(watcher)
      }
      Thread.sleep(1000 * 10)
      // Close all watchers
      for (watcher <- watcherQueue) {
        watcher.close()
      }
    }
  }
}
The code above creates 5000 watchers per outer-loop iteration, sleeps 10 seconds, then closes those 5000 watchers; the outer loop executes 100 times in total.
During testing, despite the watchers being closed as above, the memory was not released: usage kept increasing (observed via docker stats) until docker ps and docker stats could no longer execute (etcd2 seems to have crashed at that point), and free -m showed the memory fully consumed.
During testing I also used pprof to check the memory (the URL must be quoted, otherwise the shell treats the & as a background operator):
go tool pprof "http://xxx.xxx.xxx.xxx:2381/debug/pprof/heap?debug=1&seconds=10"
I found that the memory usage of go.etcd.io/etcd/mvcc.(*watchableStore).NewWatchStream kept increasing.
So we can draw a preliminary conclusion: the watch API leads to the memory leak; perhaps watcher.close() does not release the memory.
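(As a server-side sanity check independent of pprof, etcd also exposes watcher counts on its /metrics endpoint. Assuming the same client port as above, this gauge should stay flat across loops if close() really released the server-side watchers:)
curl -s http://xxx.xxx.xxx.xxx:2381/metrics | grep etcd_debugging_mvcc_watcher_total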
PS: I also ran another test to approach the problem from a different angle:
I removed all etcdclient.watch/close logic from our own application, tested the application again, and monitored etcd's memory usage; it stayed stable.
I am not sure whether the root cause is on the client side or the server side, so as a first step I just upgraded the client version.
I upgraded the client version to 0.4.1 (https://mvnrepository.com/artifact/io.etcd/jetcd-core/0.4.1):
compile ("io.etcd:jetcd-core:0.4.1")
Executing the main function above, the problem still exists :(
I also upgraded the etcd server to v3.4.3 and downgraded it to v3.2.28 (released on 2019.11.10); the problem still exists.
@xiang90 @jpbetz @hexfusion @fanminshi @gyuho
I see that you have a lot of experience with etcd and have handled similar issues, e.g. https://github.com/etcd-io/etcd/issues/9103.
Any opinion?
@xiang90 @jpbetz @hexfusion @fanminshi @gyuho I also hit the same problem. Can you help check this issue and suggest a workaround if there is one? Thanks!
+1 on this.
Facing the same issue.
I also ran a benchmark against etcdClient.put, e.g. this demo code:
// Put a key-value pair under a 10-second lease.
// (lease and kvClient come from the jetcd client:
//  lease = client.getLeaseClient(); kvClient = client.getKVClient();)
long currentTime = System.currentTimeMillis();
ByteSequence key = ByteSequence.from(("ns/" + currentTime).getBytes());
ByteSequence value = ByteSequence.from("dummy".getBytes());
CompletableFuture<LeaseGrantResponse> leaseGrantResponse = lease.grant(10);
try {
    PutOption putOption = PutOption.newBuilder()
            .withLeaseId(leaseGrantResponse.get().getID())
            .build();
    kvClient.put(key, value, putOption);
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
etcd's memory usage kept increasing during the benchmark (monitored with docker stats).
After stopping the benchmark and monitoring again, the memory usage never comes back down.
@xiang90 @jpbetz @hexfusion @fanminshi @gyuho Do you know the reason, or have any ideas if this is indeed a problem?
Then I used the Go client API to test again. Code below:
package main
import (
    "context"
    "fmt"
    "time"

    "github.com/coreos/etcd/clientv3"
)

func main() {
    cli, _ := clientv3.New(clientv3.Config{
        Endpoints:   []string{"localhost:2379"},
        DialTimeout: 5 * time.Second,
    })
    defer cli.Close()
    for j := 1; j <= 1000; j++ {
        var watchers []clientv3.Watcher
        for i := 1; i <= 5000; i++ {
            println("starting watcher: ", i)
            watcher := clientv3.NewWatcher(cli)
            key := fmt.Sprintf("foo-%d-%d", j, i)
            _ = watcher.Watch(context.Background(), key, clientv3.WithPrefix())
            watchers = append(watchers, watcher)
        }
        time.Sleep(10 * time.Second)
        for _, watcher := range watchers {
            println("closing watcher: ", watcher)
            watcher.Close()
        }
        println("done: ", j)
    }
}
I deployed a three-node etcd cluster:
etcd0's memory usage stays stable (< 200M)
etcd1's memory usage stays stable (< 200M)
etcd2's memory usage stays stable (< 700M)
So obviously the etcd server itself and the Go client API are fine.
The problem is in jetcd.
Also facing the same issue.
/area bug
@wgliang
We solved this problem by switching from jetcd to IBM's etcd-java (https://github.com/IBM/etcd-java):
import com.google.protobuf.ByteString;
import com.ibm.etcd.client.EtcdClient;
import com.ibm.etcd.client.kv.KvClient;
import com.ibm.etcd.client.kv.WatchUpdate;
import io.grpc.stub.StreamObserver;
import java.util.ArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class WatchTest {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService threadpool = Executors.newFixedThreadPool(10);
        final StreamObserver<WatchUpdate> observer = new StreamObserver<WatchUpdate>() {
            @Override
            public void onNext(WatchUpdate value) {
                System.out.println("watch event: " + value);
            }

            @Override
            public void onError(Throwable t) {
                System.out.println("watch error: " + t);
            }

            @Override
            public void onCompleted() {
                System.out.println("watch completed");
            }
        };
        EtcdClient client = EtcdClient.forEndpoints("xxx.xxx.xxx.xxx:2379,xxx.xxx.xxx.xxx:2380,xxx.xxx.xxx.xxx:2381").withPlainText().build();
        KvClient kvclient = client.getKvClient();
        for (int i = 0; i < 1000; i++) {
            // Pre-size for the 5000 watchers created per iteration.
            ArrayList<KvClient.Watch> watchers = new ArrayList<>(5000);
            for (int j = 0; j < 5000; j++) {
                System.out.println("create watcher: " + j);
                String key = "foo-" + i + "-" + j;
                KvClient.Watch watcher = kvclient.watch(ByteString.copyFromUtf8(key)).asPrefix().executor(threadpool).start(observer);
                watchers.add(watcher);
            }
            System.out.println("iteration done: " + i);
            Thread.sleep(10 * 1000);
            for (KvClient.Watch watcher : watchers) {
                System.out.println("closing watchers");
                watcher.cancel(true);
                System.out.println("watcher canceled");
                watcher.close();
            }
            System.out.println("done: " + i);
        }
    }
}
So obviously jetcd's watcher has a memory/performance problem, while etcd-java is fine.
Reduce application's watcher number
it seems even if use etcd-java, if have a lot watcher which watches the same key, the memory will increase also. (if have a lot of watcher which watches different key, the memory keeps stable),
Anyway, it is not good when application has a lot of watcher, so it makes sense to do this(Reduce application's watcher number)
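For the record, here is roughly what that workaround can look like with the same etcd-java API used above: register a single prefix watcher on a shared prefix and fan events out to per-key callbacks inside the application. This is only a sketch; the "namespace/" prefix, the listener map, and the com.ibm.etcd.api.Event accessors are my assumptions, not code from the actual application.

import com.google.protobuf.ByteString;
import com.ibm.etcd.api.Event;
import com.ibm.etcd.client.EtcdClient;
import com.ibm.etcd.client.kv.KvClient;
import com.ibm.etcd.client.kv.WatchUpdate;
import io.grpc.stub.StreamObserver;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class SharedPrefixWatchTest {
    // One local callback per logical key, instead of one server-side watcher per key.
    static final Map<ByteString, Consumer<Event>> listeners = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        ExecutorService threadpool = Executors.newFixedThreadPool(10);
        EtcdClient client = EtcdClient.forEndpoints("xxx.xxx.xxx.xxx:2379").withPlainText().build();
        KvClient kvClient = client.getKvClient();

        // Registering interest in a key is now a local map insert; no extra watch RPC.
        listeners.put(ByteString.copyFromUtf8("namespace/foo"),
                event -> System.out.println("foo changed: " + event));

        // A single server-side watcher covers every key under "namespace/".
        KvClient.Watch watch = kvClient.watch(ByteString.copyFromUtf8("namespace/"))
                .asPrefix()
                .executor(threadpool)
                .start(new StreamObserver<WatchUpdate>() {
                    @Override
                    public void onNext(WatchUpdate update) {
                        for (Event event : update.getEvents()) {
                            // Fan the event out to whoever registered this exact key.
                            Consumer<Event> listener = listeners.get(event.getKv().getKey());
                            if (listener != null) {
                                listener.accept(event);
                            }
                        }
                    }

                    @Override
                    public void onError(Throwable t) {
                        System.out.println("watch error: " + t);
                    }

                    @Override
                    public void onCompleted() {
                        System.out.println("watch completed");
                    }
                });
        // ... application runs; on shutdown, close the single watcher and the client:
        // watch.close();
        // client.close();
    }
}

With this pattern the server sees one watcher no matter how many keys the application tracks, which sidesteps both the jetcd close() leak and the many-watchers-on-one-key growth mentioned above.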
Hopefully https://github.com/etcd-io/etcd/pull/11438 gets merged and a new etcd version is released that includes the patch.
It would be worthwhile to test the scenarios below with it.
I think #11438 is more relevant to fixing the memory leak.
Please note the difference between the two test programs.
In https://github.com/etcd-io/etcd/issues/11350#issuecomment-563939647 :
watcher.cancel(true);
The above clears the victims continuously.
In the test code at https://github.com/etcd-io/etcd/issues/11350#issuecomment-554858136 , by contrast, there is no cancel call;
that is what showed the accumulated victims, leading to the memory leak.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/reopen