Server: slow trashbin expire run

Created on 17 Jan 2020 · 3 comments · Source: nextcloud/server

The database in use holds some 2 million file entries and had 66,000 files in the trashbin, which had piled up because expiry had purposely been disabled for a long time.

After expiry was enabled ('trashbin_retention_obligation' => '7,30'), the manual run took a very long time. In parallel, the cron job tried to expire the files as well, but it also took too long, so the next cron run attempted the same work again, and so on.

I fixed this by starting the expiry manually (occ trashbin:expire) and disabling the automatic expiry in config.php in the meantime to prevent cron from intervening. This ultimately succeeded, but took about 30 hours on decent hardware. While it ran, the PostgreSQL 12 database showed constantly high CPU load, caused by many queries of the form "SELECT ... FROM oc_filecache WHERE storage=$ AND name ILIKE $".
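For reference, the statements the expiry run keeps issuing can be watched from psql while it is going; a minimal sketch, assuming the database is called nextcloud (adjust to your setup). Bound parameters of prepared statements are not shown, which matches what the activity view displayed:

    -- Show statements currently executing against the Nextcloud database.
    -- Parameters of prepared statements appear only as $1, $2, ...
    SELECT pid, state, now() - query_start AS runtime, query
      FROM pg_stat_activity
     WHERE datname = 'nextcloud'   -- assumption: database name
       AND state = 'active';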

I had a look at files_trashbin's Commands/ExpireTrash.php, but couldn't see why this query should be issued over and over again. The PostgreSQL activity view doesn't show the bound parameters, so I can't tell whether the name parameter is more selective than '%'. If it starts with literal characters, the Cache::select method, which seems to be called by trashbin:expire, could be improved by using lower(name) LIKE lower($) and by creating an index on (storage, lower(name)). Still, I couldn't spot why expiry should use this method at all.
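As a sketch of that suggestion (the index name is made up, and it only helps if the pattern has a literal prefix rather than a leading '%'), the expression index could look like this:

    -- Hypothetical expression index supporting:
    --   WHERE storage = $1 AND lower(name) LIKE lower($2)
    -- text_pattern_ops lets LIKE with a literal prefix use the index
    -- regardless of the database collation.
    CREATE INDEX fc_storage_name_lower
        ON oc_filecache (storage, lower(name) text_pattern_ops);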

NC 17 on Debian 10 / Apache 2.4, PostgreSQL 12 with plenty of RAM on a dedicated server; no encryption or other file-related apps.

Labels: 0. Needs triage, bug

All 3 comments

This is likely a database configuration issue. If I recall correctly, binlog_commit_wait_use in particular can speed things up when set lower, but it will increase disk I/O significantly.

It has to do with how nextcloud polls the database.

Depending on how Nextcloud processes deletions, you may need to increase your PHP-FPM children/servers to speed things up as well.

That binlog_commit_wait_use seems to be MySQL stuff; I'm on PostgreSQL. And there is no shortage of PHP-FPM instances.
The issue seems to be Nextcloud code hammering the server with full table scans again and again (maybe trying to update its cache?). This can't be fixed with database server configuration.
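A minimal way to check the full-table-scan suspicion is an EXPLAIN of the observed query shape; the storage id and pattern below are placeholders for the values the real query uses:

    -- Placeholder values; substitute a real storage id and name pattern
    -- taken from pg_stat_activity or the application logs.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT fileid
      FROM oc_filecache
     WHERE storage = 42 AND name ILIKE 'example%';

A Seq Scan node over oc_filecache in the resulting plan would confirm that the ILIKE filter cannot use any existing index.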

Yes, it's a polling issue with the database, iirc. Setting that option, among other things, speeds up processing time.

There is a GitHub issue about slow preview generation which was also caused by the polling. You're likely looking at the same issue.
