The command dvc push --run-cache takes a very long time (more than an hour in my case) if there are a lot of files in dvc_cache/runs. The reason is that a new ssh connection is made for each file to check whether it exists on the remote.
Output of dvc version:
$ dvc version
1.8.4+21a6be
Additional Information (if any):
$ dvc push --run-cache --verbose
2020-10-16 14:36:14,695 DEBUG: Check for update is enabled.
2020-10-16 14:36:14,697 DEBUG: fetched: [(3,)]
2020-10-16 14:36:14,818 DEBUG: Establishing ssh connection with 'xxx' through port '22' as user 'dvc'
2020-10-16 14:36:15,457 DEBUG: Establishing ssh connection with 'xxx' through port '22' as user 'dvc'
2020-10-16 14:36:16,070 DEBUG: Establishing ssh connection with 'xxx' through port '22' as user 'dvc'
and so on, hundreds of times.
It is caused by the ssh connection being dropped from the pool in get_connection in dvc/tree/pool.py: a GeneratorExit exception is raised inside the generator SSHTree.walk_files when the generator is closed prematurely, and the pool treats that as a failure. This is the case in dvc/stage/cache.py:210 in the statement first(to_remote.walk_files(key)).
See my PR.
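The mechanism can be reproduced outside DVC with a small stand-alone sketch. This is not the actual DVC code: Pool, FakeConnection and count_connections are hypothetical stand-ins, and the get_connection and walk_files below only imitate the behaviour described above, on the assumption that the pool discards a connection whenever any exception, including GeneratorExit, escapes the with block. CPython closes an abandoned generator when it is garbage collected; the sketch calls close() explicitly to make that deterministic.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an ssh connection."""

    def __init__(self):
        self.closed = False

    def listdir(self, key):
        return [f"{key}/file1", f"{key}/file2"]

    def close(self):
        self.closed = True


class Pool:
    """Hypothetical pool that counts how many connections it had to open."""

    def __init__(self):
        self.idle = []
        self.opened = 0

    def acquire(self):
        if self.idle:
            return self.idle.pop()
        self.opened += 1
        return FakeConnection()

    def release(self, conn):
        self.idle.append(conn)


@contextmanager
def get_connection(pool):
    conn = pool.acquire()
    try:
        yield conn
    except BaseException:
        # Assumed behaviour: any exception, GeneratorExit included,
        # makes the pool discard the connection instead of reusing it.
        conn.close()
        raise
    else:
        pool.release(conn)


def walk_files(pool, key):
    # Generator that holds a pooled connection while it yields.
    with get_connection(pool) as conn:
        for name in conn.listdir(key):
            yield name


def count_connections(consume_fully, keys=("a", "b", "c")):
    pool = Pool()
    for key in keys:
        gen = walk_files(pool, key)
        if consume_fully:
            for _ in gen:
                pass
        else:
            next(gen)    # take only the first item, like first(...)
            gen.close()  # raises GeneratorExit at the suspended yield
    return pool.opened


if __name__ == "__main__":
    print("fully consumed:", count_connections(True))
    print("closed early:  ", count_connections(False))
```

When each generator is exhausted, the with block exits normally and the single connection is reused for every key. When each generator is closed after its first item, the connection is discarded every time and a new one has to be opened per key, which matches the repeated "Establishing ssh connection" lines in the log.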