Currently rke can be used with any number of nodes. However, running on a large number of nodes raises some issues, including too many open socket files and slowness of the process.
The following is a list of optimizations that can be implemented to speed up the process and handle this number of nodes:
This should be part of the rke optimization and refactoring issue https://github.com/rancher/rancher/issues/15975
Fixes done for this issue are mostly performance tweaks to handle a large number of nodes. Several areas were changed internally, so the common use cases should be verified with the latest build.
Notable changes in UX:
The sync labels and taints phase is faster and more stable with a large number of nodes.
@moelsayed Will these improvements also help rke remove? On a ~800 node cluster, it takes over 30 minutes.
My observation is that rke remove is slow on large clusters and runs tasks sequentially. Filed another issue to address this: https://github.com/rancher/rke/issues/981
Tested yesterday with a master server build (b3249730) on an environment with 793 nodes. Total time for rke up to run was 13 minutes, 24 seconds. With rke v0.1.10, it took 35 minutes, 5 seconds on the same environment.