Horizon: Queue Job just sat pending until the timeout duration passed

Created on 12 Jun 2019  路  22Comments  路  Source: laravel/horizon

  • Horizon Version: 3.2.1
  • Laravel Version: 5.8.19
  • PHP Version: 7.2.17
  • Redis Driver & Version: predis/phpredis 1.1.1
  • Database Driver & Version:

Description:

So I'm not entirely sure as to what's causing this, but I have run into an issue where a queue job will stall in the queue, then after the defined timeout passes, the job is run, almost as if there is a delay configured, but I don't have one set.

config/queue.php:

    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => env('REDIS_QUEUE', 'default'),
        'retry_after' => 90,
        'block_for' => null,
    ],

    'redis_long' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => env('REDIS_QUEUE', 'default') . '_long',
        'retry_after' => 1290,
        'block_for' => null,
    ],

config/horizon.php:

    'production' => [
        'supervisor-short' => [
            'connection' => 'redis',
            'queue' => [
                'queue1',
                'queue2',
            ],
            'balance' => 'auto',
            'processes' => 15,
            'tries' => 3,
        ],
        'supervisor-long' => [
            'connection' => 'redis_long',
            'queue' => [
                'queue3',
                'queue4',
            ],
            'balance' => 'auto',
            'processes' => 10,
            'tries' => 3,
            'timeout' => 1200,
        ],
    ],

And the job is really not doing anything special but I don't think that's a part of the issue due to the symptoms.
So the issue is that a job is queued on queue3, horizon can see there is one, and that it's pending, but the job doesn't run until the timeout of 1200 is reached for some reason. In looking at the api request that horizon is making to update the status on screen, the job isn't reserved, or anything....it's just stuck in pending.

The details are:

completed_at: "1560350620.4479" // 06/12/2019 @ 2:43pm (UTC)
connection: "redis"
exception: null
failed_at: null
id: "20"
index: 0
name: "App\Jobs\ImportExcel"
payload: {type: "job", timeout: null, tags: ["import"], id: "20", data: {,鈥,鈥
attempts: 2
data: {,鈥
delay: null
displayName: "App\Jobs\ImportExcel"
id: "20"
job: "Illuminate\Queue\CallQueuedHandler@call"
maxTries: null
pushedAt: "1560349325.1486" // 06/12/2019 @ 2:22pm (UTC)
tags: ["import"]
timeout: null
timeoutAt: null
type: "job"
queue: "queue3"
reserved_at: "1560350616.0919" // 06/12/2019 @ 2:43pm (UTC)
retried_by: null
status: "completed"

UPDATE: Additionally horizon logs never shows an attempt at all which i just noticed this is marked as 2 attempts, so I'm not sure what could cause that as well. and in looking at the couple of horizon job status requests, it showed attempts: 0, then suddenly was 2 attempts but the job is configured for 1 try.

At this point I'm stumped, because I can't reproduce on demand, it only happens sometimes and at least right now far and few between, but this will become more critical for this application as it's utilized more than it is now. I was hoping that perhaps someone else has this issue and greater minds can come to the solution.

needs more info

Most helpful comment

@driesvints I can confirm with 2 other colleagues where I have tested it with, it seems like a serious problem.

tl;dr: it seems like the timeout setting you put inside the job (e.g. public $timeout = 10) is not taken in consideration by horizon. (It'll run endlessly for some reason)

I have created a repository for you to simulate the problem easily, its just a clean laravel installation.

https://github.com/Cannonb4ll/laravel-horizon-timeout-problem

Now if you visit /dispatch, it will dispatch a job that takes 10 seconds to finish, but has a timeout of 5 seconds setup in it. Problem is - it will never fail that job after the 5 seconds, it just keeps going 'silently?'.

Whenever you change the retry_after in config/queue.php to 5, it will fail properly after 5 seconds.

Would really appreciate if you can take a look into this and run it to see what we are talking about.

All 22 comments

similar issue here here.

It works fine in one server I deployed long time ago, a brand new server on Laravel Forge running PHP 7.3 is not doing anything. The job is started and terminated after each deployment, the handle never gets processed.

1gb droplet in digital ocea, php 7.3 and maria db.

Can you post the actual job? More code could help us identify the problem.

Here is the job and a little background on what it handles.
The activity passed in is a spatie activity log model, the other params are either strings or arrays so nothing in there I can see should be an issue.
Additionally, I wrote it so I would know when the job was at least attempted with the activity log update to add in the 'Attempting Import' line. It's also worth noting that this job ran for days without any modifications, and has had so far 2 blips where it didn't get handled (no additional activity log on the model or in the horizon logging saying 'Processing' to denote that it even gets attempted at all. Like I said it shows up in the horizon recent jobs as soon as it's queued....just never run until the timeout of 1200 seconds passes, but only infrequently.

<?php

namespace App\Jobs;

use Exception;
use Illuminate\Bus\Queueable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Storage;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Maatwebsite\Excel\Validators\ValidationException;
use Maatwebsite\Excel\Exceptions\NoTypeDetectedException;

class ImportExcel implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $importerClass;

    public $importFilePath;

    public $options;

    public $activity;

    public $tries = 1;

    public function __construct($activity, $importerClass, $importFilePath, $options = [])
    {
        $this->activity = $activity;
        $this->importerClass = $importerClass;
        $this->importFilePath = $importFilePath;
        $this->options = $options;
    }

    public function handle()
    {
        $this->logActivity('Attempting Import...', 'running');
        $importer = new $this->importerClass();
        Storage::put($this->importFilePath, Storage::disk('temp')->get($this->importFilePath));

        try {
            $importer->import($this->importFilePath);
            $this->logActivity('Import Completed', 'finished');
        } catch (NoTypeDetectedException $e) {
            $this->logActivity('Invalid file type', 'failed');
        } catch (ValidationException $e) {
            $this->logActivity($this->parseValidationException($e), 'failed');
        } catch (Exception $e) {
            $this->logActivity($e->getMessage(), 'failed');
        } finally {
            Storage::delete($this->importFilePath);
            Storage::disk('temp')->delete($this->importFilePath);
        }
    }

    protected function logActivity($message, $status = null)
    {
        $this->activity->refresh();
        $this->activity->properties = $this->activity->properties->put(now()->toDateTimeString(), $message);

        if ($status) {
            $this->activity->properties = $this->activity->properties->put('status', $status);
        }

        $this->activity->save();
    }

    public function tags()
    {
        return [
            'import', $this->importerClass::$name
        ];
    }

    public function failed(Exception $exception)
    {
        $this->logActivity($exception->getMessage(), 'failed');
    }

    protected function parseValidationException(ValidationException $ex)
    {
        return collect($ex->errors())
            ->flatten()
            ->prepend($ex->getMessage())
            ->toJson();
    }
}

A bit of an update here as well, had another very strange occurance:
Job was queued at 2019-06-13 16:35:56
Horizon logs show it started to process but them immediately failed the job:

Jun 13 16:57:27 app01 php[7310]: [2019-06-13 16:57:26][22] Processing: App\Jobs\ImportExcel
Jun 13 16:57:27 app01 php[7310]: [2019-06-13 16:57:26][22] Failed:     App\Jobs\ImportExcel

With the exception being:

Illuminate\Queue\MaxAttemptsExceededException: App\Jobs\ImportExcel has been attempted too many times or run too long. The job may have previously timed out.

Anyone have any thoughts, had another very strange issue where a job got queued, got reserved 20 minutes later, shows 2 attempts but max_tries is set to 1, and now seems just stuck for another 20 minutes.

Is there any sort of issue with running supervisors on multiple machines? That's the only thing I can think of that could be causing something like this, but I also experienced the same thing during development (albeit only once) when only a single horizon instance running. Additionally, this time in checking the logs for both horizon instances, I didn't even see the normal "Processing" statement when it would normally start processing, and I have 3 jobs stuck in redis (the prefixed job id hashes) that are just sitting there. If this is truly the issue of not being able to run multiple instances (even though the UI clearly was built to handle that), that should be very clearly noted as this is incredibly unexpected.

So in further debugging and log tracing, I might have gotten lucky....
On the latest queue job, (queued at 2019-06-20 13:28:09) there was a log entry at the same time:

[2019-06-20 17:28:09] production.ERROR: Error while reading line from the server. [tcp://redis01:16379] {"exception":"[object] (Predis\\Connection\\ConnectionException(code: 0): Error while reading line from the server. [tcp://redis01:16379] at /var/www/html/project1/vendor/predis/predis/src/Connection/AbstractConnection.php:155)
[stacktrace]
#0 /var/www/html/project1/vendor/predis/predis/src/Connection/StreamConnection.php(314): Predis\\Connection\\AbstractConnection->onConnectionError('Error while rea...')
#1 /var/www/html/project1/vendor/predis/predis/src/Connection/AbstractConnection.php(120): Predis\\Connection\\StreamConnection->read()
#2 /var/www/html/project1/vendor/predis/predis/src/Connection/AbstractConnection.php(112): Predis\\Connection\\AbstractConnection->readResponse(Object(Predis\\Command\\HashSetMultiple))
#3 /var/www/html/project1/vendor/predis/predis/src/Client.php(331): Predis\\Connection\\AbstractConnection->executeCommand(Object(Predis\\Command\\HashSetMultiple))
#4 /var/www/html/project1/vendor/predis/predis/src/Client.php(314): Predis\\Client->executeCommand(Object(Predis\\Command\\HashSetMultiple))
#5 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php(114): Predis\\Client->__call('hmset', Array)
#6 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php(214): Illuminate\\Redis\\Connections\\Connection->command('hmset', Array)
#7 /var/www/html/project1/vendor/laravel/horizon/src/Repositories/RedisJobRepository.php(271): Illuminate\\Redis\\Connections\\Connection->__call('hmset', Array)
#8 /var/www/html/project1/vendor/laravel/horizon/src/Listeners/MarkJobAsReserved.php(36): Laravel\\Horizon\\Repositories\\RedisJobRepository->reserved('redis_long', 'products', Object(Laravel\\Horizon\\JobPayload))
#9 [internal function]: Laravel\\Horizon\\Listeners\\MarkJobAsReserved->handle(Object(Laravel\\Horizon\\Events\\JobReserved))
#10 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php(366): call_user_func_array(Array, Array)
#11 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php(196): Illuminate\\Events\\Dispatcher->Illuminate\\Events\\{closure}('Laravel\\\\Horizon...', Array)
#12 /var/www/html/project1/vendor/laravel/horizon/src/RedisQueue.php(155): Illuminate\\Events\\Dispatcher->dispatch('Laravel\\\\Horizon...')
#13 /var/www/html/project1/vendor/laravel/horizon/src/RedisQueue.php(94): Laravel\\Horizon\\RedisQueue->event('products', Object(Laravel\\Horizon\\Events\\JobReserved))
#14 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Support/helpers.php(1123): Laravel\\Horizon\\RedisQueue->Laravel\\Horizon\\{closure}(Object(Illuminate\\Queue\\Jobs\\RedisJob))
#15 /var/www/html/project1/vendor/laravel/horizon/src/RedisQueue.php(96): tap(Object(Illuminate\\Queue\\Jobs\\RedisJob), Object(Closure))
#16 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(247): Laravel\\Horizon\\RedisQueue->pop('products')
#17 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(107): Illuminate\\Queue\\Worker->getNextJob(Object(Laravel\\Horizon\\RedisQueue), 'products')
#18 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(102): Illuminate\\Queue\\Worker->daemon('redis_long', 'products', Object(Illuminate\\Queue\\WorkerOptions))
#19 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(86): Illuminate\\Queue\\Console\\WorkCommand->runWorker('redis_long', 'products')
#20 /var/www/html/project1/vendor/laravel/horizon/src/Console/WorkCommand.php(46): Illuminate\\Queue\\Console\\WorkCommand->handle()
#21 [internal function]: Laravel\\Horizon\\Console\\WorkCommand->handle()
#22 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(32): call_user_func_array(Array, Array)
#23 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(90): Illuminate\\Container\\BoundMethod::Illuminate\\Container\\{closure}()
#24 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(34): Illuminate\\Container\\BoundMethod::callBoundMethod(Object(Illuminate\\Foundation\\Application), Array, Object(Closure))
#25 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Container/Container.php(576): Illuminate\\Container\\BoundMethod::call(Object(Illuminate\\Foundation\\Application), Array, Array, NULL)
#26 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Console/Command.php(183): Illuminate\\Container\\Container->call(Array)
#27 /var/www/html/project1/vendor/symfony/console/Command/Command.php(255): Illuminate\\Console\\Command->execute(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Illuminate\\Console\\OutputStyle))
#28 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Console/Command.php(170): Symfony\\Component\\Console\\Command\\Command->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Illuminate\\Console\\OutputStyle))
#29 /var/www/html/project1/vendor/symfony/console/Application.php(921): Illuminate\\Console\\Command->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#30 /var/www/html/project1/vendor/symfony/console/Application.php(273): Symfony\\Component\\Console\\Application->doRunCommand(Object(Laravel\\Horizon\\Console\\WorkCommand), Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#31 /var/www/html/project1/vendor/symfony/console/Application.php(149): Symfony\\Component\\Console\\Application->doRun(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#32 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Console/Application.php(90): Symfony\\Component\\Console\\Application->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#33 /var/www/html/project1/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(133): Illuminate\\Console\\Application->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#34 /var/www/html/project1/artisan(37): Illuminate\\Foundation\\Console\\Kernel->handle(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#35 {main}
"}

This looks to me that the connection timed out during the attempt to mark the job reserved, so it was added to the queue reserved zset, but never updated as being reserved, so it seemingly gets stuck in a blindspot due to this. I know there is some amount of connection loss handling but it seems this is somewhere it's not being accounted for.

@ragingdave sorry man. Bit swamped myself atm so probably don't have time to deep-dive into this soon..

@driesvints I can confirm with 2 other colleagues where I have tested it with, it seems like a serious problem.

tl;dr: it seems like the timeout setting you put inside the job (e.g. public $timeout = 10) is not taken in consideration by horizon. (It'll run endlessly for some reason)

I have created a repository for you to simulate the problem easily, its just a clean laravel installation.

https://github.com/Cannonb4ll/laravel-horizon-timeout-problem

Now if you visit /dispatch, it will dispatch a job that takes 10 seconds to finish, but has a timeout of 5 seconds setup in it. Problem is - it will never fail that job after the 5 seconds, it just keeps going 'silently?'.

Whenever you change the retry_after in config/queue.php to 5, it will fail properly after 5 seconds.

Would really appreciate if you can take a look into this and run it to see what we are talking about.

@Cannonb4ll Yeah right, same issue here. We have some job that doing external change by the client. When they don't give response back the job will run some days. We must kill the processes then by manuel.

Maybe @SDekkers can help us, he have fix a scaleup problem some days ago! :-) !

Hey everyone, I can't seem to replicate the original issue reported here but I can see that the comments are discussing a completely different matter so let me clarify.

A job will only be reported as failed after the retry_after period passes, but the process is actually terminated once the timeout is reached.

@themsaid thnx for the answer.

I have some timeout of 60 seconds, and a retry_after of 90 seconds.

Check the screen and see the runtime of the process...

https://cdn.discordapp.com/attachments/491412519123157002/618046614971613184/unknown.png

@tomswinkels on my machine the horizon:work process terminates right after the timeout passes, not sure why it isn't on yours.

So to get back to the original issue I think I may have left out an important piece of information that the connection between horizon and redis was going over haproxy which if the connection timed out at just the right time, the job wouldn't get processed but would have been picked up like it had been and that then after the timeout would be treated as an attempt. The temporary fix I have been using (and I hopefully stress temporary) is just to reload horizon sooner than the haproxy timeout as it seems that horizon doesn't handle connection drops while looping at this time. This has solved the issue for me but it certainly isn't ideal and a connection drop should be something handled internally. I also can't imagine I'm the only one sending traffic to redis over haproxy so I'm surprised this hasn't come up before

In my situation i don't have HaProxy. Every day there are more jobs that have a runtime more than the timeout and retry_after. When the jobs are running too long my server get a high CPU load. When the load are high a do a restart:

php artisan queue:restart
php artisan horizon:terminate

Than the jobs are killed.

image

I think that this is only when you do a request to a database, or other external sites and they don't give a response back. I hope that i have time the next weeks to reproduce this better.

I am experiencing a similar issue where excel import processing job gets stuck and eventually times out (queue job will stall in the queue, then after the defined timeout passes, the job is run, almost as if there is a delay configured, but I don't have one set).

Here, Maatwebsite/Laravel-Excel processes the excel by creating multiple readChunk jobs to read and process excel data. When I tested this on LOCAL, multiple readChunk jobs were automatically created but when I did it on production, it didn't work.


Horizon Version: 3.2
Laravel Version: 5.7
PHP Version: 7.2
Redis Driver & Version: predis/phpredis 1.1.1

I'm still unable to replicate the original issue and can't think of an explanation to what some of you see :/

Basically @themsaid it narrows down to this;

People expect when they define a $timeout in their jobs that once the timeout is reached - the job will fail. This is not the case, the process is killed but in the UI it looks like it is still processing forever. (Until it reaches the retry_after)

However it also seems that when the timeout is reached, the process is not actually killed. I have not tested this though, @tomswinkels knows more about this.

Yes this is by design:

A job will only be reported as failed after the retry_after period passes, but the process is actually terminated once the timeout is reached.

But the process is indeed terminated, you can test it with a job that logs every second for example and has timeout of 5 seconds. You'll see the log appearing in your laravel.log file 5 times and then no more.

Before moving to a job, I was running the commands from a controller and hit an nginx timeout (504 I think). I am guessing increasing this would resolve the issues. But then again Horizon runs against redis, not nginx? Might explain why it works on one server and not another if they are configured differently.

https://divinglaravel.com/always-set-a-timeout-for-guzzle-requests-inside-a-queued-job

Exactly what my problem was, only not a Guzzle connection but a Bolt connection. (https://github.com/graphaware/neo4j-bolt-php)

@tomswinkels interested to know what was blocking the code for you!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mahdiahmadi73 picture mahdiahmadi73  路  3Comments

tillkruss picture tillkruss  路  6Comments

mikeminckler picture mikeminckler  路  3Comments

elynnaie picture elynnaie  路  3Comments

meathanjay picture meathanjay  路  3Comments