Clickhouse: ZooKeeper::waitForDisappear leak question

Created on 2 Sep 2020  路  8Comments  路  Source: ClickHouse/ClickHouse

I noted there is one comment in this method.
/// NOTE: if the node doesn't exist, the watch will leak.

Is this issue already been fixed or still need to fix this potential leak issue?

comp-zookeeper development question

Most helpful comment

@alexey-milovidov Sure, I will submit a PR and you can help review.

All 8 comments

I checked the exists response logic, seems the watch clear logic is already implemented.

```
else if (xid == watch_xid)
{
ProfileEvents::increment(ProfileEvents::ZooKeeperWatchResponse);
response = std::make_shared();

request_info.callback = [this](const Response & response_)
{
    const WatchResponse & watch_response = dynamic_cast<const WatchResponse &>(response_);

    std::lock_guard lock(watches_mutex);

    auto it = watches.find(watch_response.path);
    if (it == watches.end())
    {
        /// This is Ok.
        /// Because watches are identified by path.
        /// And there may exist many watches for single path.
        /// And watch is added to the list of watches on client side
        ///  slightly before than it is registered by the server.
        /// And that's why new watch may be already fired by old event,
        ///  but then the server will actually register new watch
        ///  and will send event again later.
    }
    else
    {
        for (auto & callback : it->second)
            if (callback)
                callback(watch_response);   /// NOTE We may process callbacks not under mutex.

        CurrentMetrics::sub(CurrentMetrics::ZooKeeperWatch, it->second.size());
        watches.erase(it);
    }
};

}

@alexey-milovidov outdated comment?

If the node does not exist and never appear, the watch event will not happen and we will keep the watch.

PS. I don't remember - should ZooKeeper send us a watch event when non-existing node appear?

@alexey-milovidov I found a thread about this url.

Is getData the safer way?

Yes, it looks like the solution!

I searched through the code and found the following places to fix:

ZooKeeper.cpp:675        impl->exists(path, callback, watch);
LeaderElection.h:127            if (!zookeeper.existsWatch(path + "/" + *my_node_it, nullptr, task->getWatchCallback()))
ZooKeeperNodeCache:94    result.exists = zookeeper->existsWatch(path, &result.stat, watch_callback);
StorageReplicatedMergeTree:2145        if (zookeeper->exists(source_path + "/columns", nullptr, event))

We need to rewrite them to tryGet and tryGetWatch,
then remove the watch parameter from exists method from ZooKeeper class
and also from IKeeper interface and its implementations: ZooKeeperImpl, TestKeeper
to avoid watch leaks in future.

In all places the size of the node is small, so using getData won't introduce any significant overhead.

Do you want to implement this change?

@alexey-milovidov Sure, I will submit a PR and you can help review.

@alexey-milovidov I created a PR https://github.com/ClickHouse/ClickHouse/pull/14626 by following our discussion.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bseng picture bseng  路  3Comments

jangorecki picture jangorecki  路  3Comments

goranc picture goranc  路  3Comments

igor-sh8 picture igor-sh8  路  3Comments

jangorecki picture jangorecki  路  3Comments