Horizon: High memory usage in redis

Created on 3 Dec 2019 · 17 Comments · Source: laravel/horizon

Horizon Version: v3.4.3
Laravel Version: v6.6.0
PHP Version: 7.3.11
Redis Driver & Version: predis/phpredis 1.1.1 or php-redis extension 5.1.1, same result
Database Driver & Version:

Description:

After we upgraded from Laravel 5.8 to 6.6 and Horizon 3.2.2 to 3.4.3, the resource consumption of our Redis 4 server started to grow exponentially; memory usage is now sitting at around 5 GB.

Horizon Dashboard:
[screenshot: horizon-dashboard]

Redis instance dedicated to Horizon:
[screenshot: redis-usage]

Previous version vs new (the release was pushed Nov 25):
[screenshot: Screenshot_1575402597]

Steps To Reproduce:

We have no idea what is causing this behavior. Our number of jobs is almost the same as before; the only difference is the framework and Horizon versions.

needs more info

mauri870

👍1

Most helpful comment

I'm not understanding why #720 was closed? It seems to me the current situation with very high, possibly runaway resource utilization by Redis is a larger issue than possibly wonky pagination? Am I missing something here?

@mauri870 it's been a couple of months since your forked solution. How is it holding up, and are you experiencing issues with pagination as discussed in #720?

TheOneDaveYoung on 13 Feb 2020

👍3

All 17 comments

I have the same issue: finished jobs do not appear to be removed from Redis, resulting in high memory usage. We run about 500k jobs per day.

What are your trim values in horizon.php?

SDekkers on 4 Dec 2019

    'trim' => [
        'recent' => 30,
        'recent_failed' => 30,
        'failed' => 60,
        'monitored' => 0
    ],

I can't find a trim option for completed jobs, though.

mauri870 on 4 Dec 2019

The problem seems to be that trim.recent is used to expire both jobs that have been pushed (but not processed) and jobs that are already completed. Maybe a new trim.completed option should be added to expire completed jobs without losing jobs that are still on the queue.

mauri870 on 4 Dec 2019

👍3

~~For now our solution was to simply expire/delete the completed job to free up some memory:~~

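// Expire the completed job's Redis key one minute after it finishes.
// Assumes the Queue and Redis facades, JobProcessed, and Carbon are imported.
Queue::after(function (JobProcessed $event) {
    Redis::expireat(config('horizon.prefix') . $event->job->getJobId(), Carbon::now()->addMinute()->timestamp);
});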

mauri870 on 4 Dec 2019

🎉1

That's our fix in a Horizon fork, with trim.completed set to 1:

[screenshot: Screenshot_1575487248]

diff --git a/config/horizon.php b/config/horizon.php
index b9803a8..318a945 100644
--- a/config/horizon.php
+++ b/config/horizon.php
@@ -98,6 +98,7 @@ return [
         'recent_failed' => 10080,
         'failed' => 10080,
         'monitored' => 10080,
+        'completed' => 60,
     ],

     /*
diff --git a/src/Repositories/RedisJobRepository.php b/src/Repositories/RedisJobRepository.php
index 171b040..dfb2cac 100644
--- a/src/Repositories/RedisJobRepository.php
+++ b/src/Repositories/RedisJobRepository.php
@@ -66,6 +66,7 @@ class RedisJobRepository implements JobRepository
     {
         $this->redis = $redis;
         $this->recentJobExpires = config('horizon.trim.recent', 60);
+        $this->completedJobExpires = config('horizon.trim.completed', 60);
         $this->failedJobExpires = config('horizon.trim.failed', 10080);
         $this->recentFailedJobExpires = config('horizon.trim.recent_failed', $this->failedJobExpires);
         $this->monitoredJobExpires = config('horizon.trim.monitored', 10080);
@@ -405,7 +406,7 @@ class RedisJobRepository implements JobRepository
             ? $pipe->hmset($id, ['status' => 'failed'])
             : $pipe->hmset($id, ['status' => 'completed', 'completed_at' => str_replace(',', '.', microtime(true))]);

-        $pipe->expireat($id, Chronos::now()->addMinutes($this->recentJobExpires)->getTimestamp());
+        $pipe->expireat($id, Chronos::now()->addMinutes($this->completedJobExpires)->getTimestamp());
     }

     /**

mauri870 on 4 Dec 2019

I have similar issues. I was able to get the job to expire correctly by setting horizon.trim.recent in my config. (Check the __construct() function in RedisJobRepository.php to see where it's accessed).

That said, I still have an issue with memory slowly filling up, and I think it's because keys are left behind in the horizon:recent:TAGNAME Redis keys. Right now, for example, I only have about 100 recent jobs listed, but the horizon:recent:TAGNAME keys contain over 2,000,000 entries, all referencing IDs of long-expired jobs.

travisaustin on 5 Dec 2019

Please see https://github.com/laravel/horizon/issues/625

Are you all monitoring tags?

driesvints on 5 Dec 2019

No, I’m not monitoring any tags at all.

travisaustin on 5 Dec 2019

👍3

Neither am I. The problem is indeed that Horizon does not clean up completed jobs until trim.recent expires. In our case, with 440k jobs every 30 minutes, the completed jobs cause Redis to fill up memory quickly and also increase CPU usage due to the number of keys. Please refer to the diff above, which introduces a mechanism to control how long completed jobs are persisted.

https://github.com/laravel/horizon/issues/715#issuecomment-561796434

mauri870 on 5 Dec 2019
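To put the numbers from this comment in perspective, here is a rough back-of-the-envelope sketch. It assumes a constant job rate and that every completed job's key stays in Redis for exactly the configured retention period; `retainedCompletedJobs()` is a hypothetical helper written for this illustration, not part of Horizon. The retention values are the ones mentioned in this thread (30 and 60 minutes for trim.recent, 1 minute for the fork's trim.completed).

```php
<?php

/**
 * Approximate number of completed-job keys alive in Redis at steady state.
 *
 * Assumes a constant rate of $jobsPerWindow jobs every $windowMinutes and
 * that each key lives for exactly $retentionMinutes after completion.
 */
function retainedCompletedJobs(int $jobsPerWindow, int $windowMinutes, int $retentionMinutes): int
{
    return (int) round($jobsPerWindow * $retentionMinutes / $windowMinutes);
}

$jobsPer30Minutes = 440000; // rate quoted in the comment above

echo retainedCompletedJobs($jobsPer30Minutes, 30, 60), PHP_EOL; // 880000 keys with a 60-minute retention
echo retainedCompletedJobs($jobsPer30Minutes, 30, 30), PHP_EOL; // 440000 keys with a 30-minute retention
echo retainedCompletedJobs($jobsPer30Minutes, 30, 1), PHP_EOL;  // 14667 keys with a 1-minute retention
```

Under these assumptions, the number of live keys scales linearly with the retention period, which is why a separate, shorter expiry for completed jobs reduces the key count so sharply.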

I think there are two issues here.

First, as reported by @mauri870, is that completed jobs are retained for 1 week by default. This is easily solved by using the undocumented configuration option of horizon.trim.recent. @mauri870 - I don't think your diff is necessary if you set the configuration item horizon.trim.recent to a low value. Is that correct?

Second is that all new job IDs are added to the key horizon:recent:TAGNAMEHERE (where TAGNAMEHERE is the name of a tag). Even if these tags are not monitored, these keys fill with the Job ID of every job that is dispatched with that tag. Horizon never cleans out this list of Job IDs, and these keys continue to fill up until they are manually cleared.

Edit: there are two places that fill up. horizon:recent:TAGNAMEHERE and horizon:failed:TAGNAMEHERE

Edit again: I just realized that the configuration option horizon.trim.recent sets the TTL on the Redis job payload when it's created. If the job isn't dispatched before the horizon.trim.recent expires, the job payload will disappear from Redis before it can be dispatched. Am I understanding that right?

travisaustin on 5 Dec 2019

👍2
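For anyone who wants to clear the leftover tag entries by hand, the following is a minimal sketch rather than a tested fix. It assumes the horizon:recent:TAGNAME and horizon:failed:TAGNAME keys are Redis sorted sets whose scores are the Unix timestamp at which each job ID was added (not confirmed in this thread), and that `$redis` is a phpredis-style client exposing `zCard()` and `zRemRangeByScore()`. `pruneTagSet()` is a hypothetical helper, not part of Horizon.

```php
<?php

/**
 * Remove members of a tag sorted set that are older than $maxAgeMinutes.
 *
 * Assumes each member's score is the Unix timestamp at which it was added.
 * Returns [entries before pruning, entries removed].
 */
function pruneTagSet(object $redis, string $key, int $maxAgeMinutes, ?float $now = null): array
{
    $now = $now ?? microtime(true);
    $cutoff = $now - $maxAgeMinutes * 60;

    $before = (int) $redis->zCard($key);

    // '-inf' .. cutoff selects every member scored at or before the cutoff.
    $removed = (int) $redis->zRemRangeByScore($key, '-inf', (string) $cutoff);

    return [$before, $removed];
}

// Example usage (requires the phpredis extension and a reachable server):
//
//   $redis = new Redis();
//   $redis->connect('127.0.0.1', 6379);
//   [$before, $removed] = pruneTagSet($redis, 'horizon:recent:TAGNAME', 60);
```

Check the key type first (for example with the Redis TYPE command) and try this on a copy of the data, since removing entries that Horizon still expects could affect the dashboard.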

@travisaustin I think you are, at least from reading the source code. That's why I added trim.completed in my fork; it's working as expected now.

https://github.com/laravel/horizon/issues/715#issuecomment-561660801

mauri870 on 6 Dec 2019

A solution is proposed in https://github.com/laravel/horizon/pull/720

themsaid on 9 Dec 2019

👀1

I have the same issue, my config is

    'waits' => [
        'redis:default' => 60,
    ],

    /*
    |--------------------------------------------------------------------------
    | Job Trimming Times
    |--------------------------------------------------------------------------
    |
    | Here you can configure for how long (in minutes) you desire Horizon to
    | persist the recent and failed jobs. Typically, recent jobs are kept
    | for one hour while all failed jobs are stored for an entire week.
    |
    */

    'trim' => [
        'recent'        => 60,
        'recent_failed' => 10080,
        'failed'        => 10080,
        'monitored'     => 10080,
    ],

I used gdb to dump the memory of a horizon:work process, and I see that many queue items are not released from memory even after they have been finished for an hour. It seems to be an issue with the JobMetrics feature.

My horizon version: v3.4.3
My Laravel version: v6.6

KevinHN on 13 Jan 2020

@mauri870 it's been a couple of months since your forked solution. How is it holding up, and are you experiencing issues with pagination as discussed in #720?

TheOneDaveYoung on 13 Feb 2020

👍3

@TheOneDaveYoung IDK why Taylor closed that PR. He mentioned that something was not right with pagination, but after we switched to my fork the OOM problems ceased and everything seems to be working great. More than 2 months now without problems.

mauri870 on 13 Feb 2020

👍1

https://github.com/laravel/horizon/pull/720 was merged. This will unfortunately break pagination. We're currently considering separating the different types of jobs into separate screens to solve this problem.

driesvints on 14 Feb 2020

👍1

Still kind of experiencing this issue. I noticed Horizon uses zsets and hashes. Maybe there are some sane tunings one can use to improve memory consumption and performance, since by default, if you're running millions of jobs, the generic data types use too much memory.