Che: High concurrency in a cluster of Che servers

Created on 24 Mar 2019 · 15 comments · Source: eclipse/che

I can start 100 workspaces on one Che service. When I do the same thing on a cluster of two Che services, seventy percent of the workspaces fail to start. How can I achieve high concurrency in a Che cluster?

Labels: kind/question, lifecycle/stale

Hailongli88

All 15 comments

@ScrewTSW you might be interested in providing input regarding load tests, configuration properties, and the number of simultaneously running workspaces per Che server instance.

ibuziuk on 24 Mar 2019

@ibuziuk Do you mean load balancing? We deployed the Che service on Kubernetes. When we scale out to two nodes, the problem I described appears, but with one node everything is OK.
Can you tell me more?

Hailongli88 on 24 Mar 2019

@Hailongli88 have you deployed 2 Che masters on different nodes? Also, what is your goal regarding concurrency, e.g. how many simultaneously running workspaces are you targeting?

ibuziuk on 24 Mar 2019

@ibuziuk Our goal is to achieve multiple users. What we currently see is that the status of the workspaces that fail to start changes directly from STARTING to STOPPED.

Hailongli88 on 25 Mar 2019

Hello @Hailongli88. At the moment, che-server instances are not certified to work in parallel.

> Our goal is to achieve multiple users.

Can you elaborate? Are you sure you deployed multi-user Che? How did you start Che? What infrastructure are you using, Kubernetes or OpenShift?

skabashnyuk on 25 Mar 2019

If you scaled the Che master up to 2 instances, that is most likely the reason the workspaces failed. Che is not capable of running in multi-instance mode at the moment. @sleshchenko correct me if I'm mistaken.

garagatyi on 25 Mar 2019

> @ibuziuk Our goal is to achieve multiple users.

Yeah, but why have you decided to scale up the Che server? @ScrewTSW correct me if I'm wrong, but it has been shown that a single Che server can handle 500 simultaneously running workspaces, which is quite a lot.
@Hailongli88 are your expectations above this figure?

ibuziuk on 25 Mar 2019

@ibuziuk
> single che server can handle 500 simultaneously running workspaces.

How did you achieve it? 500 is quite good. Now, we can achieve up to 100 concurrent workspaces. What do I need to do to increase the amount of concurrency?

Hailongli88 on 25 Mar 2019

@ibuziuk What are the results of your test, and under what resource configuration?

Hailongli88 on 25 Mar 2019

@skabashnyuk
Kubernetes, with two Che server pods.

Hailongli88 on 25 Mar 2019

@garagatyi
I got it! Thanks.

Hailongli88 on 25 Mar 2019

> @ibuziuk single che server can handle 500 simultaneously running workspaces.
> How did you achieve it? 500 is quite good. Now, we can achieve up to 100 concurrent workspaces. What do I need to do to increase the amount of concurrency?

Hello. My findings are theoretical, but I was able to back up this figure by capturing real communication data between a workspace and che-master, and then building a test scenario that opens websocket connections and sends messages with the real delays observed in the captured data.
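The replay approach described above can be sketched roughly as follows. This is a hypothetical Java 16+ illustration, not the code of the rh-che load-test tool: the `CapturedMessage` record, the `replay` and `replayForClients` helpers, and the in-memory sink that stands in for an open websocket connection are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class TraceReplay {

    /** Hypothetical trace entry: delay (ms) since the previous message, plus its payload. */
    record CapturedMessage(long delayMillis, String payload) {}

    /** Replays the trace for one simulated client, honoring the recorded delays. */
    static void replay(List<CapturedMessage> trace, Consumer<String> sink) throws InterruptedException {
        for (CapturedMessage m : trace) {
            Thread.sleep(m.delayMillis()); // reproduce the captured gap before this message
            sink.accept(m.payload());      // a real test would write to an open websocket here
        }
    }

    /** Runs the same trace for N simulated clients in parallel; the sink must be thread-safe. */
    static void replayForClients(int clients, List<CapturedMessage> trace, Consumer<String> sink)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int i = 0; i < clients; i++) {
            pool.submit(() -> {
                replay(trace, sink);
                return null; // Callable form, so the checked InterruptedException is allowed
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        List<CapturedMessage> trace = List.of(
                new CapturedMessage(0, "{\"jsonrpc\":\"2.0\",\"method\":\"ping\"}"),
                new CapturedMessage(20, "{\"jsonrpc\":\"2.0\",\"method\":\"status\"}"));
        AtomicInteger sent = new AtomicInteger();
        replayForClients(10, trace, payload -> sent.incrementAndGet());
        System.out.println(sent.get() + " messages sent"); // 10 clients x 2 messages = 20
    }
}
```

Keeping the per-message delay in the trace, rather than sending as fast as possible, is what makes the load resemble real workspace traffic instead of a synthetic flood.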

I was able to get che-master to run stably, without OOM errors or restarts, for over 6 hours under sustained load.

ScrewTSW on 25 Mar 2019

> @ibuziuk What are the results of your test, and under what resource configuration?

My test settings were 30 JSON-RPC processing threads, a queue size of 7500, and pod memory capped at 750 MB. I saw around 60 dropped messages per minute with ~1650 requests per second sent to the endpoint from 500 clients. Memory usage was stable and did not grow beyond about 600 MB.
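As a rough reading of these figures, assuming each of the ~1650 requests per second counts as one message that could be dropped (the comment does not say whether "messages" and "requests" are the same unit):

```latex
\begin{aligned}
\text{per-client rate} &= \frac{1650\ \text{req/s}}{500\ \text{clients}} = 3.3\ \text{req/s per client} \\
\text{drop rate} &= \frac{60\ \text{msg/min}}{60\ \text{s/min}} = 1\ \text{msg/s} \\
\text{drop fraction} &\approx \frac{1\ \text{msg/s}}{1650\ \text{req/s}} \approx 6.1 \times 10^{-4} \approx 0.06\%
\end{aligned}
```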

If you want to stay on the safe side with no dropped messages, I would recommend doubling these minimal requirements: set the queue size to 15k, the thread pool size to 60, and give the pod 2 GB of RAM.
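To illustrate why the thread count and queue size determine whether messages get dropped, here is a toy Java model of a processing stage with a fixed worker pool and a bounded queue, where overflow is rejected and counted as dropped. It assumes, without confirmation from this thread, that che-master's JSON-RPC processing behaves like a `ThreadPoolExecutor` with a bounded queue; the `BoundedProcessor` class and `droppedInBurst` method are hypothetical, and the memory recommendation is not modeled.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedProcessor {

    /**
     * Submits a burst of messages to a fixed pool with a bounded queue and returns how
     * many were rejected (dropped). Workers are held on a latch until the whole burst
     * is submitted, modeling a burst that arrives faster than the pool can drain it.
     */
    static int droppedInBurst(int threads, int queueCapacity, int burst) throws InterruptedException {
        AtomicInteger dropped = new AtomicInteger();
        CountDownLatch burstSubmitted = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,                               // fixed-size worker pool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),        // bounded message backlog
                (task, executor) -> dropped.incrementAndGet()); // queue full: drop and count
        for (int i = 0; i < burst; i++) {
            pool.execute(() -> {
                try {
                    burstSubmitted.await();                      // stand-in for slow processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        burstSubmitted.countDown();                              // let the workers drain the queue
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return dropped.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int burst = 10_000;
        // prints "30 threads, queue 7500:  dropped 2470"
        System.out.println("30 threads, queue 7500:  dropped " + droppedInBurst(30, 7_500, burst));
        // prints "60 threads, queue 15000: dropped 0"
        System.out.println("60 threads, queue 15000: dropped " + droppedInBurst(60, 15_000, burst));
    }
}
```

Because the workers are held until the whole burst is submitted, exactly `threads + queueCapacity` messages are absorbed and the rest are rejected: a 10,000-message burst drops 10,000 - 30 - 7,500 = 2,470 messages with the minimal settings and none with the doubled ones.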

Here is a link to the tool I used for my testing observations:
https://github.com/redhat-developer/rh-che/tree/master/load-tests

Please keep in mind that these were not real running workspaces. These are only theoretical limits based on the communication capacity of che-master.
I have never experimentally confirmed that 500 real workspaces can be created and started at the same time.

ScrewTSW on 25 Mar 2019

> Now, we can achieve up to 100 concurrent workspaces. What do I need to do to increase the amount of concurrency?

@Hailongli88 this is quite interesting info. Could you please clarify your configuration, e.g. how you deployed the Che server on k8s (the infrastructure is a k8s cluster, as I understand it)?

ibuziuk on 25 Mar 2019

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.