Meilisearch: Sudden spike in CPU usage in production causes server crash

Created on 27 Nov 2020 · 3 comments · Source: meilisearch/MeiliSearch

Describe the bug

I deployed a MeiliSearch one-click app to DigitalOcean (2 vCPU + 2 GB RAM + 25 GB SSD + 3 TB transfer) three days ago. In the last two days, CPU usage has suddenly spiked three times, each time crashing the droplet with an nginx 502 error. I had to restart the droplet every time to get it up and running again.

I followed the steps in the meilisearch-digital-ocean documentation to host the application, and I am not sure what causes these sudden spikes. The hosted application has only a single index containing 7k small documents.

I am not sure whether the DigitalOcean one-click app deploys MeiliSearch as a standalone or as a Docker application, but any guidance on how to debug such issues would be a great help!

Screenshot

The highlighted spikes in CPU usage caused the crash.
[image: droplet monitoring graphs with the CPU spikes highlighted]

Additional context
I contacted the DO support team, and they responded with the message below:

While looking into the history of the physical host the Droplet is on, there doesn't appear to be any recent cases that would clearly outline a source for any issues. However, reviewing your Droplet graphs, I see that CPU usage spiked to 100%.

Please note that as a self-managed provider we do not directly access customer deployments within droplets. We are happy to offer guidance and any insights we might have to help you move forward even though we may not be able to provide a direct solution.

That said, typically, high CPU usage would be a result of either an application/process or the kernel process running within the Droplet itself. While we can and do monitor the CPU usage on the physical host, the usage on the hypervisor itself doesn't translate to the CPU usage within the Droplet.

Regards,
Bhavya

All 3 comments

Hi!

I will try to help you find the problem. Please make sure you can SSH into your droplet for the following steps and ideas :)

  1. MeiliSearch runs as a systemd service. This means that all the application logs should be available to you by running:
journalctl -u meilisearch.service

Remember that you can use --since and --until to narrow your search. Check the MeiliSearch logs for any unusual requests or activity around the time of the crash.
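
As a rough sketch, narrowing the logs to a window around one crash could look like this. The function name and the timestamps in the usage line are placeholders for illustration, not values from this issue; replace them with the times of the spikes shown on your droplet graphs.

```shell
# Show MeiliSearch service logs between two points in time.
# $1 = start of the window, $2 = end of the window (any format journalctl accepts).
meili_logs_between() {
  journalctl -u meilisearch.service --since "$1" --until "$2" --no-pager
}

# Usage on the droplet (placeholder timestamps):
# meili_logs_between "2020-11-26 10:00" "2020-11-26 11:00"
```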

  2. It is also possible that the MeiliSearch process is using too much memory and gets killed by the OS. In that case, you won't find anything useful in the MeiliSearch logs: when the OS kills the process, MeiliSearch cannot log anything at all before its sad death. You can, however, probably find an out-of-memory error logged by Linux in your journalctl output (without filtering for MeiliSearch).

I suggest you enable memory tracking on DigitalOcean by installing their Metrics agent (and also set up some alerts if you want to monitor it easily).
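
To look for an out-of-memory kill, one option is to search the whole journal for the kernel's OOM messages. This is only a sketch: the patterns below match wording the Linux kernel commonly uses for these events, but the exact text varies between kernel versions, and the function name is made up for illustration. Also note that entries from before a reboot are only available if the journal is stored persistently.

```shell
# Keep only the lines on stdin that look like kernel OOM-killer messages.
# The patterns are assumptions about common kernel wording, not an exhaustive list.
find_oom_events() {
  grep -iE 'out of memory|oom-kill|killed process'
}

# Usage on the droplet: search the whole journal, not just the MeiliSearch unit.
# journalctl --no-pager | find_oom_events
```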

  3. Normally, systemd restarts your program when it crashes or an internal error occurs. But if the OS kills MeiliSearch (as in the out-of-memory example), it may not restart. You can change the restart policy by opening the service file located at /etc/systemd/system/meilisearch.service and adding a Restart=always policy to the [Service] section. Note that systemd only treats # as a comment at the start of a line, so keep the comment on its own line. Your file would look like this:
...
[Service]
Type=simple
# add this line!
Restart=always
ExecStart=/usr/bin/meilisearch
...

Save and exit. Now let's reload the systemd daemon and restart meilisearch.service:

systemctl daemon-reload
systemctl restart meilisearch
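
To confirm that the new policy was picked up after the reload, you can ask systemd for the unit's effective Restart setting. A minimal check, assuming the unit is named meilisearch.service as above (the function name is just for illustration):

```shell
# Print the effective restart policy of the unit.
# With the change applied, the output should be "Restart=always".
show_restart_policy() {
  systemctl show meilisearch.service -p Restart
}

# Usage on the droplet:
# show_restart_policy
```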

PS: Restart=always will bring your service back up automatically, but it can also hide problems, so it is better to combine it with monitoring (metrics and alerts). If the service restarts too often because it doesn't have enough memory, you might not even notice, but still get service interruptions.
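
One way to notice such silent restart loops is to check how many times systemd has restarted the service. The sketch below assumes a reasonably recent systemd that exposes the NRestarts property; on older versions the property may be missing and the output will be empty. The function name is hypothetical.

```shell
# Print how many times systemd has automatically restarted the unit.
# Output has the form "NRestarts=<count>".
show_restart_count() {
  systemctl show meilisearch.service -p NRestarts
}

# Usage on the droplet:
# show_restart_count
```

A count that keeps growing between checks suggests the service is crashing repeatedly, which is worth cross-checking against the memory graphs.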

Hope this helps!

Hi @eskombro,

Thank you for the detailed help!

I updated my droplet settings with some of the suggestions above:

  • Set the restart policy to Restart=always
  • Upgraded the DO metrics agent from the legacy version to the recent one

I also took a look at the application logs using journalctl -u meilisearch.service, but found no useful app logs there.

I am also a little concerned about the spike in disk I/O operations (visible in the screenshot mentioned above), which could also be a potential cause of the spike in CPU usage. Does MeiliSearch perform any kind of periodic background I/O jobs?

Regards,
Bhavya.

I am also a little concerned about the spike in disk I/O operations (visible in the screenshot mentioned above), which could also be a potential cause of the spike in CPU usage. Does MeiliSearch perform any kind of periodic background I/O jobs?

I think maybe @MarinPostma could give you a little more insight on this? :)
