I can start 100 workspaces on one che service. When I do the same thing on a cluster of two che services, seventy percent of the workspaces fail to start. How can I achieve high concurrency in a che cluster?
@ScrewTSW you might be interested in providing input regarding load tests, configuration properties, and the number of simultaneously running workspaces per che server instance.
@ibuziuk Do you mean load balancing? We deployed the che service on Kubernetes. When we scaled out to two nodes, the problem I described appeared. With one node everything is fine.
Can you tell me more?
@Hailongli88 Have you deployed 2 che masters on different nodes? Also, what is your goal regarding concurrency, e.g. how many simultaneously running workspaces are you targeting?
@ibuziuk Our goal is to support multiple users. What we currently see is that, for the workspaces that fail to start, the workspace status changes directly from starting to stopped.
Hello @Hailongli88. At the moment che-server is not certified to run as multiple parallel instances.
Our goal is to achieve multiple users.
Can you elaborate? Are you sure you deployed multi-user che? How did you start Che? Which infrastructure are you using, Kubernetes or OpenShift?
If you scaled up Che master to 2 instances, that is most likely the reason for the failed workspaces. Che is not capable of running in multi-instance mode at the moment. @sleshchenko correct me if I'm mistaken.
@ibuziuk Our goal is to achieve multiple users.
Yeah, but why did you decide to scale up the che server? @ScrewTSW correct me if I'm wrong, but it has been shown that a single che server can handle 500 simultaneously running workspaces, which is quite a lot.
@Hailongli88 are your expectations above this figure?
@ibuziuk single che server can handle 500 simultaneously running workspaces.
How did you achieve it? 500 is quite good. Right now we can reach up to 100 concurrent workspaces. What do I need to do to increase the concurrency?
@ibuziuk What were the results of your test, and under what resource configuration?
@skabashnyuk
Kubernetes! Two che server pods.
@garagatyi
I got it! Thanks.
Hello. My findings are theoretical, but I was able to demonstrate this case by capturing real communication data between a workspace and che-master and creating a test scenario in which websocket connections are opened and messages are sent with realistic delays based on the captured data.
I was able to get che-master to run stably, without OOM or restarts, for over 6 hours under sustained load.
My test settings were 30 jsonrpc processing threads, a queue size of 7500, and pod memory capped at 750 MB. Around 60 messages were dropped per minute with ~1650 requests per second being sent to the endpoint from 500 clients. Memory usage was stable and did not grow beyond roughly 600 MB.
If you want to be safe and avoid dropped messages, I would recommend doubling these minimal requirements: set the queue to 15k, the thread pool size to 60, and give the pod 2 GB of RAM.
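As a rough sanity check on these figures, the short Python sketch below derives the average per-client request rate and the fraction of messages dropped. It only uses the numbers quoted above; the variable names are illustrative and not part of any Che tooling.

```python
# Figures quoted from the load test described above.
clients = 500
requests_per_second = 1650
dropped_per_minute = 60

# Average request rate each simulated client contributes.
per_client_rps = requests_per_second / clients

# Fraction of all messages that were dropped, as a percentage.
dropped_per_second = dropped_per_minute / 60
drop_rate_percent = 100 * dropped_per_second / requests_per_second

print(f"per-client rate: {per_client_rps:.2f} req/s")  # 3.30 req/s
print(f"drop rate: {drop_rate_percent:.3f}%")          # 0.061%
```

In other words, roughly one message in 1650 was lost at this load, which is small but not zero.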
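To illustrate why queue size and thread count matter here, the following is a toy Python model of a fixed pool of workers fed by a bounded queue. It is not Che's actual implementation, and it assumes that the dropped messages are the ones rejected when the queue is full; the class `BoundedProcessor` and its methods are hypothetical names made up for this sketch.

```python
import queue
import threading


class BoundedProcessor:
    """Toy model (not Che code): fixed worker threads plus a bounded queue.

    Assumption: a message is "dropped" when the queue is already full.
    """

    def __init__(self, workers, queue_size, handler):
        self._queue = queue.Queue(maxsize=queue_size)
        self._handler = handler
        self._lock = threading.Lock()
        self.processed = 0
        self.dropped = 0
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]

    def start(self):
        for thread in self._threads:
            thread.start()

    def submit(self, message):
        """Enqueue without blocking; return False if the message is dropped."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def drain(self):
        """Block until every accepted message has been handled."""
        self._queue.join()

    def _run(self):
        while True:
            message = self._queue.get()
            try:
                self._handler(message)
                with self._lock:
                    self.processed += 1
            finally:
                self._queue.task_done()


# Small demo: workers are not started yet, so only queue_size messages fit.
demo = BoundedProcessor(workers=2, queue_size=5, handler=lambda m: None)
accepted = sum(demo.submit(i) for i in range(8))
demo.start()
demo.drain()
print(accepted, demo.dropped, demo.processed)
```

In this model, a larger queue absorbs longer bursts and more workers empty the queue faster, which is the intuition behind raising both settings together.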
https://github.com/redhat-developer/rh-che/tree/master/load-tests
Here is a link to the tool I was using for my testing observations.
Please keep in mind that these are not real running workspaces. These are just theoretical limits based on the communication capacity of the che-master.
I have never experimentally confirmed that 500 real workspaces are able to be created and started at the same time.
Right now we can reach up to 100 concurrent workspaces. What do I need to do to increase the concurrency?
@Hailongli88 this is quite interesting info. Could you please clarify what your config is, e.g. how you deployed the che server on k8s (the infrastructure is a k8s cluster, as I understand)?