I noted there is one comment in this method.
/// NOTE: if the node doesn't exist, the watch will leak.
Is this issue already been fixed or still need to fix this potential leak issue?
I checked the exists response logic, seems the watch clear logic is already implemented.
```
else if (xid == watch_xid)
{
ProfileEvents::increment(ProfileEvents::ZooKeeperWatchResponse);
response = std::make_shared
request_info.callback = [this](const Response & response_)
{
const WatchResponse & watch_response = dynamic_cast<const WatchResponse &>(response_);
std::lock_guard lock(watches_mutex);
auto it = watches.find(watch_response.path);
if (it == watches.end())
{
/// This is Ok.
/// Because watches are identified by path.
/// And there may exist many watches for single path.
/// And watch is added to the list of watches on client side
/// slightly before than it is registered by the server.
/// And that's why new watch may be already fired by old event,
/// but then the server will actually register new watch
/// and will send event again later.
}
else
{
for (auto & callback : it->second)
if (callback)
callback(watch_response); /// NOTE We may process callbacks not under mutex.
CurrentMetrics::sub(CurrentMetrics::ZooKeeperWatch, it->second.size());
watches.erase(it);
}
};
}
@alexey-milovidov outdated comment?
If the node does not exist and never appear, the watch event will not happen and we will keep the watch.
PS. I don't remember - should ZooKeeper send us a watch event when non-existing node appear?
@alexey-milovidov I found a thread about this url.
Is getData the safer way?
Yes, it looks like the solution!
I searched through the code and found the following places to fix:
ZooKeeper.cpp:675 impl->exists(path, callback, watch);
LeaderElection.h:127 if (!zookeeper.existsWatch(path + "/" + *my_node_it, nullptr, task->getWatchCallback()))
ZooKeeperNodeCache:94 result.exists = zookeeper->existsWatch(path, &result.stat, watch_callback);
StorageReplicatedMergeTree:2145 if (zookeeper->exists(source_path + "/columns", nullptr, event))
We need to rewrite them to tryGet and tryGetWatch,
then remove the watch parameter from exists method from ZooKeeper class
and also from IKeeper interface and its implementations: ZooKeeperImpl, TestKeeper
to avoid watch leaks in future.
In all places the size of the node is small, so using getData won't introduce any significant overhead.
Do you want to implement this change?
@alexey-milovidov Sure, I will submit a PR and you can help review.
@alexey-milovidov I created a PR https://github.com/ClickHouse/ClickHouse/pull/14626 by following our discussion.
Most helpful comment
@alexey-milovidov Sure, I will submit a PR and you can help review.