It has already happened to me twice that the server, which runs nothing but mailcow, had some process go to 100% CPU and overload the machine until it became inaccessible by any means (including SSH).
The first time it happened around 23:30 one day; the second time was this morning at 5:07.
It seems to have no apparent reason and currently it is unknown to me which process does it. There is no cron task that should be run at these times so I am guessing it could be one of the containers. I am investigating the logs and will post them later on when I may have some clue.
My question is whether this has happened to anyone already? It is a production server and I have not done any modifications whatsoever.
Hi,
I have not seen this happen... ClamAV can use some CPU time when updating, but not _that_ much.
Couldn't you see the process? This will ultimately help. :-)
I saw something happening there; unfortunately I don't have the previous log as I assumed it could have been some cron task. This is now not the case.
It made the server inaccessible, and I saw errors in various containers, which are hard to debug because some of them do not log timestamps. I am posting what I found interesting in the logs, with the unimportant older and newer entries cut out:
mailcow-clamd.log.txt
mailcow-dovecot.log.txt
mailcow-fail2ban.log.txt
mailcow-mysql.log.txt
mailcow-php-fpm.log.txt
mailcow-postfix.log.txt
mailcow-redis.log.txt
mailcow-rspamd.log.txt
mailcow-unbound.log.txt
Oh and no, I could not see the process. There was no way for me to enter the server.
I don't see anything indicating the start of a problem. Though there was a problem - obviously.
Maybe the server ran out of memory? It looks like it took Redis way too long to write to memory.
As for the resources, I am posting CPU and memory usage.


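I am running Proxmox and the server is virtualized, so it is simple to see these resources from the host, but as with any other locked-up server, the guest itself stays unreachable while it is locked.
I also found it interesting that the logs show various errors around that time:
clamav started an update with some dn_expand errors (which I have not figured out what they mean)
redis memory usage
mysql crashed
So it may be either clamav or redis.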
dn_expand seems to be a DNS test, by the way.
Hm, you shouldn't assume this from the logs. Processes can die or get stuck and stop logging. Other, innocent processes then log strange errors because they are low on resources (CPU, memory).
Can you install some kind of process logging?
On 14.07.2017 at 09:42, kunago notifications@github.com wrote:
I also found interesting in the log there are various errors around that time:
clamav started an update with some dn_expand errors (which I have not figured out what they mean)
redis memory usage
mysql crashed
So it may be either clamav or redis.
I tried looking for some monitoring software. The question is what to monitor. I found a nice command, "docker stats". Would that be a good place to start?
I set up my own monitoring script so once I have some news about the CPU usage, I will post more details.
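For anyone who wants to do the same, here is a minimal sketch of such a process log. It is not the script used above, just an illustration: it appends one timestamped snapshot of the top CPU consumers (and, if the docker CLI is present, one round of "docker stats") to a file. The log path and the function name are made up for this example, and the "ps --sort" option assumes the procps version of ps that Debian ships.

```shell
#!/bin/sh
# Hypothetical log location (an assumption for this sketch).
# /var/tmp is used so that the file is still there after a forced reboot.
LOG_FILE="${LOG_FILE:-/var/tmp/resource-snapshot.log}"

snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    # Header line plus the five processes using the most CPU on the host.
    ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n 6
    # One-shot per-container usage, only if the docker CLI is installed.
    if command -v docker >/dev/null 2>&1; then
        docker stats --no-stream 2>&1 || echo "docker stats failed"
    fi
}

# Append a single snapshot; schedule this script to get a history.
snapshot >> "$LOG_FILE"
```

Run from cron every minute (or in a loop with a short sleep), this leaves a trail on disk that can be read after the machine has been reset, which helps when the server cannot be entered while it is locked up.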
Do you use Debian 9, and do you have any snapshots on your Proxmox?
So here it goes.

It seems to be a kernel issue. It was also happening on another server of mine, but I thought that was a problem with the VPS provider. Apparently it was not.
I am using Debian 8 by the way.
Kernel version? Run uname -a.
It turns out this is an already documented issue and the solution is to upgrade the kernel.
If it happens to somebody else, please see this page in the docs: Prepare Your System.
I must have missed this notice because I have been updating mailcow in place for several versions, and even though it is quite an important one, I did not see it anywhere.
@elvirdz : It must be the kernel; I am working on the update right now.
Since the healthchecks were an on-the-fly change, and the kernel upgrade means changing the storage driver to anything but _aufs_ (it is not supported in kernel 4.9), could anyone point me to a safe way to migrate everything?
OK, you have to switch Docker to overlay2 on the 4.9 kernel:
docker-compose down
service docker stop
rm -r /var/lib/docker/aufs
nano /etc/docker/daemon.json
{
"storage-driver": "overlay2"
}
service docker start
docker-compose up -d
Will that affect any data, such as email storage?
No, just run docker-compose down first.
That's right, the driver is used by the containers. Just run "down" to drop all containers (and every other container that may be running on this host, too).
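After the reboot and the switch, a quick check like the one below can confirm the result. This is only a sketch: the helper name kernel_at_least is made up for this example, and the 4.9 threshold is taken from the kernel version mentioned in this thread, not from an official minimum. It compares the major.minor part of the kernel release and then asks Docker which storage driver is active.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel release is at least major.minor.
# $1 = required "major.minor", $2 = release string (defaults to `uname -r`).
kernel_at_least() {
    want_major=${1%%.*}
    want_minor=${1#*.}
    rel=${2:-$(uname -r)}
    have_major=${rel%%.*}
    rest=${rel#*.}
    have_minor=${rest%%[!0-9]*}   # keep only the leading digits of the minor part
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

echo "kernel: $(uname -r)"
# 4.9 is the version discussed in this thread (an assumption, see above).
if kernel_at_least 4.9; then
    echo "kernel is 4.9 or newer"
else
    echo "kernel is older than 4.9"
fi

# Show the active storage driver if the docker CLI is installed.
if command -v docker >/dev/null 2>&1; then
    docker info --format '{{.Driver}}' 2>&1 || echo "docker info failed"
fi
```

If the last line prints overlay2, the daemon.json change has taken effect.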
Nice, guys. Thank you very much for your help. It went just as @elvirdz wrote, fairly easy. It would be good to leave this topic here for reference.