Parse-server: Parse Server periodically gets slow

Created on 5 Sep 2018 · 25 comments · Source: parse-community/parse-server

  • parse-server version 2.7.4 (Azure on a Standard_B4ms)
  • mongoDB-server version 3.4.14 (Azure on a separate Standard_B4ms)

I have an iOS & Android app, with LiveQuery (set on the parse-server's VM) that's being used a lot for chatting, where usually there are ±50 simultaneous users. The thing is, after a few hours of continuous usage, the server's cloud code responses are getting REALLY slow! Not just from a specific one... all cloud functions!

I'm using screen to run the parse server, and I found that if I restart the parse server (not the VM), the app goes back to normal.

I also have logs enabled at all times. (just mentioning it in case it could be the issue)

I can't understand why this is happening!
Any ideas?

All 25 comments

I also have logs enabled at all times. (just mentioning it in case it could be the issue)

Depending on the log level set, it could cause problems.
Do you mean verbose is always on ?

@georgesjamous I have it on default... It shows every cloud call's params and response!
I haven't added the verbose option at all...

Alright, forget logging for a second. How about the cache. Are you using Redis ? or the default memory cache?

Just trying to narrow down the problem for myself and other people who stumble upon this issue.
It doesn't necessarily have to be a Live Query problem, but it could.

I'm not using any caching mechanism!

I'm thinking that LiveQuery may be the issue, because the Parse docs propose setting the LiveQuery server up on a different machine and I haven't! Although 50 users chatting at the same time shouldn't cause this problem! I don't think they're that many!

You should probably run a profiler attached to your process or an APM that will help you quantify how slow.
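Before reaching for a full APM, a quick way to quantify "how slow" is a timing middleware mounted in front of parse-server's Express app. This is a minimal sketch, not part of parse-server itself; the logger and mount point are assumptions.

```javascript
// Sketch: an Express-style middleware that logs each request's
// duration, so you can watch response times drift over hours.
// Mount it before the ParseServer mount, e.g. app.use(timingMiddleware()).
function timingMiddleware(log = console.log) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    // 'finish' fires once the response has been handed to the OS.
    res.on('finish', () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      log(`${req.method} ${req.url} -> ${ms.toFixed(1)} ms`);
    });
    next();
  };
}

module.exports = timingMiddleware;
```

Piping those lines into a file and plotting them over a day would show whether latency really grows monotonically until restart.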

@Samigos i think you should try what flovilmart suggested.

Some additional notes:
There's a lot that should be taken into consideration before treating this as an issue.

First off, yes, it's far better to have LiveQuery on a separate server; try that.
Also, I am not sure how much RAM or how many CPUs your Azure instance has, but if you are running everything (custom cloud logic + live query + API access via the SDKs) on a single Node.js instance, then I am sure the problem lies there, since most of these operations will be queued up/throttled in Node.
Secondly, the amount of data in/out of your instance needs to be considered too. For example: what does your custom cloud code logic do? How many calls per second is each user making in addition to LiveQuery? How many objects are you returning in a single query? (Tip: keep it low, cache on user devices)
In other words, if you are doing a bunch of additional requests concurrently with LiveQuery all running on the same instance, then yes, your problem is to be expected :)

Here's how I think your debugging should go:
1- Split LiveQ from your main server
(test if this speeds things up a bit)
2- Run Parse clustered, with N-1 instances, where N is the number of cores your CPU has. _(2 cores or fewer: one instance, like you are doing now; 3 cores: run 2, and so on)_
(test if this speeds things up a bit)
3- Start using Redis Cache; this will free up some memory if you have lots of Roles or role nesting 5+ levels deep.
(test if this speeds things up a bit)
4- Debug your code: how many requests are you doing per user? What does your cloud code do?
Plus: I turn off console logging in production. I use a custom logging adapter, but I guess you can edit this somehow here Line44

Many different things could cause your server to slow down. If the problem remains after doing 1, 2 and 3, then I think this could be considered an issue with parse-server.

@georgesjamous Thanks for the answer! It'll really help me a lot!

Quick question though, about your second suggestion. I'm not sure how it's done, but I think pm2 is a way to do it, right? If so, I think I've read @flovilmart write somewhere that we shouldn't run parse server on pm2...

@flovilmart Could you please elaborate on that?

Well pm2 is a process manager. If you're using anything other than EC2, you should not use that.

Also, you don't need pm2 to run parse in cluster mode. There's a `--cluster` option on the parse-server CLI.
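For reference, a hedged configuration sketch of that CLI option; the exact flags can vary by parse-server version, so verify against `parse-server --help` before relying on them:

```shell
# Assumed invocation (check `parse-server --help` on your version):
parse-server --appId APP_ID \
             --masterKey MASTER_KEY \
             --databaseURI mongodb://localhost:27017/mydb \
             --cluster 3   # run 3 worker processes
```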

I know that pm2 is one of the best!! Is there a specific reason that we shouldn't use it with parse server?

@samigos PM2 is one of the best what?

As a process manager and a clustering solution... I've seen people recommending it many times! (not for parse servers specifically)

So what's the reason I shouldn't use it?

So what's the reason I shouldn't use it?

Read again what I posted above. UNLESS you are deploying to your own EC2 / Google Cloud Compute instance etc., you don't need PM2. For example: Elastic Beanstalk, Docker (most cases), Heroku, Google App Engine, Kubernetes and more.

So again, unless you know why you're using PM2, you should probably not use it. Also this is getting off topic, so I'll be closing this.

Hi again! It's been a few months! :D
Since you both seem to know your stuff, could you help me out with the clustering part?

I don't know if a Parse Server is stateless or not! The _Session_ and _Installation_ collections for example, will they have any issues?

I'm also using LiveQuery for chatting. Could that have any issues?

The Session and Installation collections for example, will they have any issues?

Why those 2 collections?

The server is stateless, unless you have live query. In that case it needs to be aware of the live query instances. The best way is to read the docs, so you are familiar with scaling parse-server horizontally:

https://docs.parseplatform.org/parse-server/guide/#configuring-the-server
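As a rough illustration of what the docs describe for horizontal scaling, here is a hypothetical configuration sketch. It assumes the `parse-server` package; the hostnames, class names and keys are placeholders, and the exact option names should be checked against the documentation for your version.

```javascript
// Sketch: a parse-server config suited to multiple instances.
// A shared Redis cache keeps sessions/roles consistent across
// instances, and LiveQuery events are published through Redis
// so every instance (and a separate live-query server) sees them.
const { ParseServer } = require('parse-server');
const RedisCacheAdapter = require('parse-server').RedisCacheAdapter;

const api = new ParseServer({
  databaseURI: 'mongodb://mongo-host:27017/app',
  appId: 'APP_ID',
  masterKey: 'MASTER_KEY',
  serverURL: 'http://localhost:1337/parse',
  // Shared cache instead of the default in-memory one:
  cacheAdapter: new RedisCacheAdapter({ url: 'redis://redis-host:6379' }),
  liveQuery: {
    classNames: ['Message'],               // classes to watch
    redisURL: 'redis://redis-host:6379',   // pub/sub between instances
  },
});
```

With this shape, the Session/Installation caching concern raised below is handled by the shared cache adapter rather than per-process memory.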

Why those 2 collections?

A few months ago I read somewhere that those 2 are being cached and that if we want to cluster the server, it could be a potential issue... Maybe I remember it wrong? It's not a thing?

I happened to find this issue I opened a while back and I have a question @georgesjamous... you said "Run Parse clustered. To the N-1." Why not N?

I have seen the issue described here. Parse Server response times were continuously increasing, a server restart would reset the response time. How fast the response time was increasing depended on the request load, more load meant faster increase.

I was not able to figure out what caused this, even with transaction tracing and instance resource monitoring. Clustering did not help.

I would also be curious to find out what the reason for that was.

@Samigos I believe you should always leave at least one CPU core (of `os.cpus()`, or more depending on your needs) free for the system processes.

I have seen the issue described here. Parse Server response times were continuously increasing, a server restart would reset the response time. How fast the response time was increasing depended on the request load, more load meant faster increase.

I was not able to figure out what caused this, even with transaction tracing and instance resource monitoring. Clustering did not help.

I would also be curious to find out what the reason for that was.

I have actually done everything @georgesjamous suggested! Our chat is now powered by Firestore (he didn't suggest to go there, but to separate the live server from the app server), we just started using Redis to cache pretty much anything and the app is now clustered!

I can say with much confidence that the server feels noticeably lighter and faster! I don't know though how much traffic we can handle!

And you don't see an increase in response time anymore? How long are the parse server instances running continuously without getting restarted?

We keep restarting them every 15 minutes but we are going to decrease the frequency soon. Even with these regular restarts we had issues before! Now, our processes occupy 10-25% of the CPU and very frequently we see below 10%! Before the clustering & caching, we would see it spiking to over 100%, with an average of 40%. The cache helped a lot, since the server stays occupied for shorter periods per request!

We keep restarting them every 15 minutes

That sounds like a very short, almost impractical interval. I would expect it to run for at least a few days without issue. It would be interesting to see how the response time changes if you do not restart the server.

I'll post an update when we do it and have some results
