I use QUEUE_DRIVER=database
When I run more than 10 workers, I get the following error:
'PDOException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
[2015-01-19 16:45:00] production.ERROR: exception 'PDOException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction' in /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php:380
Stack trace:
#0 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(380): PDOStatement->execute(Array)
#1 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(606): Illuminate\Database\Connection->Illuminate\Database\{closure}(Object(Illuminate\Database\MySqlConnection), 'update `jobs` s...', Array)
#2 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(570): Illuminate\Database\Connection->runQueryCallback('update `jobs` s...', Array, Object(Closure))
#3 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(383): Illuminate\Database\Connection->run('update `jobs` s...', Array, Object(Closure))
#4 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(328): Illuminate\Database\Connection->affectingStatement('update `jobs` s...', Array)
#5 /home/www/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1747): Illuminate\Database\Connection->update('update `jobs` s...', Array)
#6 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(181): Illuminate\Database\Query\Builder->update(Array)
#7 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(146): Illuminate\Queue\DatabaseQueue->releaseJobsThatHaveBeenReservedTooLong('queuename1')
#8 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(180): Illuminate\Queue\DatabaseQueue->pop('queuename1')
#9 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(150): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename1')
#10 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(113): Illuminate\Queue\Worker->pop(NULL, 'queuename1', '0', '3', '3')
#11 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon(NULL, 'queuename1', '0', '3', '3')
#12 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(101): Illuminate\Queue\Worker->daemon(NULL, 'queuename1', '0', '256', '3', '3')
#13 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(67): Illuminate\Queue\Console\WorkCommand->runWorker(NULL, 'queuename1', '0', '256', true)
#14 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#15 /home/www/vendor/laravel/framework/src/Illuminate/Container/Container.php(523): call_user_func_array(Array, Array)
#16 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(114): Illuminate\Container\Container->call(Array)
#17 /home/www/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(253): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#18 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(100): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 /home/www/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(91): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#23 /home/www/artisan(34): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 {main}
Next exception 'Illuminate\Database\QueryException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: update `jobs` set `reserved` = 0, `reserved_at` = , `attempts` = attempts + 1 where `queue` = queuename1 and `reserved` = 1 and `reserved_at` <= 1421664300)' in /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php:614
Stack trace:
#0 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(570): Illuminate\Database\Connection->runQueryCallback('update `jobs` s...', Array, Object(Closure))
#1 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(383): Illuminate\Database\Connection->run('update `jobs` s...', Array, Object(Closure))
#2 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(328): Illuminate\Database\Connection->affectingStatement('update `jobs` s...', Array)
#3 /home/www/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1747): Illuminate\Database\Connection->update('update `jobs` s...', Array)
#4 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(181): Illuminate\Database\Query\Builder->update(Array)
#5 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(146): Illuminate\Queue\DatabaseQueue->releaseJobsThatHaveBeenReservedTooLong('queuename1')
#6 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(180): Illuminate\Queue\DatabaseQueue->pop('queuename1')
#7 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(150): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename1')
#8 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(113): Illuminate\Queue\Worker->pop(NULL, 'queuename1', '0', '3', '3')
#9 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon(NULL, 'queuename1', '0', '3', '3')
#10 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(101): Illuminate\Queue\Worker->daemon(NULL, 'queuename1', '0', '256', '3', '3')
#11 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(67): Illuminate\Queue\Console\WorkCommand->runWorker(NULL, 'queuename1', '0', '256', true)
#12 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#13 /home/www/vendor/laravel/framework/src/Illuminate/Container/Container.php(523): call_user_func_array(Array, Array)
#14 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(114): Illuminate\Container\Container->call(Array)
#15 /home/www/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(253): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#16 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(100): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#17 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#18 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/www/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(91): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/www/artisan(34): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 {main}
If you're running 10 workers, then the database driver isn't right for you. I'd suggest using a driver actually designed for queuing, like beanstalkd. The database driver is there for people who want a simple, quick queuing setup for a low number of jobs.
@GrahamCampbell I've got deadlocks with just 5 workers. Is that too many for the db driver too?
I get deadlocks even with 3 workers running for different queues each :-/
You can only use 1 worker if you want to use the database driver. If you need anything more, you should probably check out a real solution that's designed for queuing, like beanstalkd, or something that supports first in, first out, like redis. Both are fine choices for queue drivers.
@GrahamCampbell Maybe you just need to add a process ID to the jobs table, for example, and have the artisan worker read its value to avoid deadlocking?
Hey @GrahamCampbell and @taylorotwell, I am getting the same error here. Up until recently I've been using the Beanstalkd driver (works like a charm, really), but I would love to take advantage of this new database driver... it's so much cleaner to manage jobs in my opinion, and easier to display what's 'in the queue' to users if you wanted to do that.
First off, I have extended the database driver to also put a 'key' on the queue jobs while pushing them in, and this has been super helpful for me. Does anyone have an idea on how we could use this in a production environment that absolutely needs multiple queue runners? Think of scaling out on this one; it has to support any number of runners that we need to keep our services running smoothly.
Took a few days of attempted debugging before coming across this post; it may be worth mentioning in the documentation to only use one worker with the database queue? Would have definitely saved me a lot of time.
Also getting deadlocks with multiple database workers. It seems strange that the documentation suggests creating 8 processes within the supervisor config, yet it's recommended to only use one (or am I missing the point here?).
Same here! Also, CPU usage hits 100%.
I ran into this as well (the documentation should be fixed). It seems that using `queue:listen` instead of `queue:work --daemon` removes the issues caused by the deadlocks. They still occur, but don't cause any problems for me as the listener automatically causes a retry. I am running 8 queue listeners.
You shouldn't be limited to 1 worker with the database driver as @GrahamCampbell said. Don't know why he said that.
Anyways, hopefully we can figure this out. I've seen it before myself when running quite a few database workers.
The issue is in the query that migrates jobs that have been reserved too long to being available. We have a bunch of workers trying to execute that query simultaneously. We could simply ignore problems with that query and assume another worker is already doing it. I'm curious if even when you get deadlocks your queues keep processing as normal?
@taylorotwell Databases aren't built for this sort of application, which is why we'll get these sorts of issues at scale. I think you're totally correct that the queue will still work just fine, even with these errors. They'll just get progressively worse the more workers you add, and eventually everything will just totally lock up - say, if you pushed a million jobs to the queue and had 1000 DB workers.
@GrahamCampbell is correct in that DBs (specifically for me, MySQL) do not handle task queues well, due to the nature of row/table locks. When running `queue:listen` (backed by supervisord), Laravel was able to continue processing the queue, even with the deadlocks. When using `queue:work --daemon`, the entire DB locked up; stopping the workers was not enough to clear the deadlocks in the DB, a full MySQL restart was needed. I'm not sure why `queue:work --daemon` locked up the DB, but it might have had to do with the speed at which it was able to process tasks.
Ignoring the deadlock errors would help. Additionally, it would be useful to reduce the frequency at which expired reservations are checked, in a similar fashion to PHP & Laravel's session garbage collector. At the very least, the behavior should be documented.
I agree that there should perhaps be some sort of "lottery" or a different way to migrate the reserved jobs that need to be made available again.
Would definitely be interested in hacking on ideas with people if anyone is interested. A few thoughts come to mind. When using the DB driver, definitely set the `--sleep` option to something reasonable so it's not just pinging your database constantly, especially if you don't have a ton of jobs.
Most likely we need to rethink how jobs are migrated entirely. I would be curious to just comment out that code and see if people still have problems with the DB locking up, so that we can isolate that as the actual problem. Can anyone try this?
I have pushed code to the 5.2 branch (dev stability) to just swallow those deadlock exceptions. If we get an exception on the job migration query, we'll just assume another queue listener is handling it and move along.
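For readers following along, a rough sketch of what "swallowing" the deadlock amounts to (illustrative only, not the exact framework commit; the `$this->expire` / `getTime()` names are assumptions based on the 5.2-era DatabaseQueue):
use Illuminate\Database\QueryException;
use Illuminate\Database\Query\Expression;

protected function releaseJobsThatHaveBeenReservedTooLong($queue)
{
    $expired = $this->getTime() - $this->expire; // assumed property/method names

    try {
        $this->database->table($this->table)
            ->where('queue', $this->getQueue($queue))
            ->where('reserved', 1)
            ->where('reserved_at', '<=', $expired)
            ->update([
                'reserved' => 0,
                'reserved_at' => null,
                'attempts' => new Expression('attempts + 1'),
            ]);
    } catch (QueryException $e) {
        // Assume another worker hit the same rows first and is already
        // migrating them; skip this iteration instead of killing the worker.
    }
}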
Here is a rough draft of what I'm thinking about changing the migration of old jobs code to:
$this->database->beginTransaction();

// Lock the expired reserved jobs for this queue so no other worker touches them.
$ready = $this->database->table($this->table)
    ->lockForUpdate()
    ->where('queue', $this->getQueue($queue))
    ->where('reserved', 1)
    ->where('reserved_at', '<=', $expired)
    ->get();

$ids = \Illuminate\Support\Collection::make($ready)->map(function ($row) {
    return $row->id;
})->all();

// Release the locked jobs back onto the queue and bump their attempt count.
$this->database->table($this->table)
    ->whereIn('id', $ids)
    ->update([
        'reserved' => 0,
        'reserved_at' => null,
        'attempts' => new Expression('attempts + 1'),
    ]);

$this->database->commit();
Thoughts? I think this will avoid any deadlock problems with this query, and in addition I think we should have a small lottery so we don't need to run it on every poll of the queue. Something like maybe a 10% chance of running it on any given iteration?
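For concreteness, a sketch of that lottery (illustrative only; it reuses the existing method name from DatabaseQueue):
// Roughly a 10% chance per iteration of migrating expired reservations,
// instead of hitting that query on every single poll.
if (mt_rand(1, 10) === 1) {
    $this->releaseJobsThatHaveBeenReservedTooLong($queue);
}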
So far I have processed 3,500 jobs with 8 workers running at the same time and have had no deadlock issues with this logic.
I've pushed my logic to the 5.2 branch if anyone wants to use "dev" stability and try it out. Would really appreciate feedback! :)
Currently running just short of 400k jobs with 11 daemon workers using the database driver. 40k complete so far and only 3 exceptions thrown with the message:
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: delete from `jobs` where `id` = 30990)
Those all completed, and I only ever received 4 of the exceptions that I mentioned above.
Nice. Do you have the full stack trace on those exceptions? How many workers were you running?
Failed Queue Job: App\Jobs\CreateOrder SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: delete from `jobs` where `id` = 30990)
vendor/laravel/framework/src/Illuminate/Support/Facades/Facade.php:224 Illuminate\Support\Facades\Facade::__callStatic
app/Listeners/ReportFailedQueueJob.php:61 App\Listeners\ReportFailedQueueJob::handle
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:347 Illuminate\Events\Dispatcher::Illuminate\Events\{closure}
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:221 Illuminate\Events\Dispatcher::fire
app/Libraries/CustomLaravelComponents/Queue/Worker.php:100 App\Libraries\CustomLaravelComponents\Queue\Worker::raiseFailedJobEvent
app/Libraries/CustomLaravelComponents/Queue/Worker.php:82 App\Libraries\CustomLaravelComponents\Queue\Worker::logFailedJob
app/Libraries/CustomLaravelComponents/Queue/Worker.php:36 App\Libraries\CustomLaravelComponents\Queue\Worker::process
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:155 Illuminate\Queue\Worker::pop
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:111 Illuminate\Queue\Worker::runNextJobForDaemon
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:85 Illuminate\Queue\Worker::daemon
vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:103 Illuminate\Queue\Console\WorkCommand::runWorker
vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:71 Illuminate\Queue\Console\WorkCommand::fire
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Container/Container.php:507 Illuminate\Container\Container::call
vendor/laravel/framework/src/Illuminate/Console/Command.php:169 Illuminate\Console\Command::execute
vendor/symfony/console/Command/Command.php:256 Symfony\Component\Console\Command\Command::run
vendor/laravel/framework/src/Illuminate/Console/Command.php:155 Illuminate\Console\Command::run
vendor/symfony/console/Application.php:791 Symfony\Component\Console\Application::doRunCommand
vendor/symfony/console/Application.php:186 Symfony\Component\Console\Application::doRun
vendor/symfony/console/Application.php:117 Symfony\Component\Console\Application::run
vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:107 Illuminate\Foundation\Console\Kernel::handle
artisan:35 [main]
Running 11 workers
Interesting. So it was actually on deleting a failed job. Do you know why the jobs failed? Were you randomly causing jobs to fail just to test it?
Duplicate primary key entry when inserting; genuine error in the job.
PDOException·SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1525452' for key 'order_addresses_order_id_unique'
OK Thanks.
No probs. Btw, could you take a brief look at the latest on #6368 when you get a moment? I'm not sure if you get notifications for closed issues, but I think it's worth a glance.
Oh btw, worth noting, the error in the job occurred 1276 times, so it seems to be intermittent that it fails to delete the job from the queue. I've included a bugsnag screenshot that might illustrate (https://cloud.githubusercontent.com/assets/5886624/15359753/acabe4fc-1d02-11e6-846c-71c8734632ce.png); the remaining thousands of errors didn't cause any locking problems.
OK just pushed another commit to 5.2. Can you pull in that and see if that solves the deadlocks on the deletes?
Thanks!
Will some (or all) of the fixes be backported to 5.1?
@20TRIES have you had a chance to try my latest commit and see if it resolves the `delete` deadlock?
Hey, still getting the same thing.
Is the change below the change that you were trying?
public function deleteReserved($queue, $id)
{
$this->database->beginTransaction();
if ($this->database->table($this->table)->lockForUpdate()->find($id)) {
$this->database->table($this->table)->where('id', $id)->delete();
}
$this->database->commit();
}
Stack trace:
Could the job record be locked from the last time it was updated?
Would the code below lock more than just the one record?
protected function getNextAvailableJob($queue)
{
$this->database->beginTransaction();
$job = $this->database->table($this->table)
->lockForUpdate()
->where('queue', $this->getQueue($queue))
->where('reserved', 0)
->where('available_at', '<=', $this->getTime())
->orderBy('id', 'asc')
->first();
return $job ? (object) $job : null;
}
I think MySQL, in some situations, locks all records that are scanned; I had a similar issue when deleting records.
For example, if I deleted WHERE id IN (1, 5), it would lock 2, 3 and 4 as well.
To get around this I had to specifically delete where id is 1 and then where id is 5, in two separate statements.
Not sure if the statement above is similar and locks more than expected; meaning that other records are locked, thus causing issues when other workers attempt to lock / perform updates on them.
http://stackoverflow.com/a/2051370/6140759
http://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html
If this is the case then you would likely have to get the id and then specifically lock that record, catching any exceptions when performing the lock. If the record is already locked when you perform the lock statement, move on to the next record...?
Not sure. Feel free to play with it a bit and see if you can figure something out?
I was thinking something like this (I still need to check the exception code):
protected function getNextAvailableJob($queue)
{
    $this->database->beginTransaction();

    // Fetch a candidate job in random order to reduce the chance that two
    // workers go for the same row.
    $job = $this->database
        ->table($this->table)
        ->where('queue', $this->getQueue($queue))
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->inRandomOrder()
        ->first();

    if (! is_null($job)) {
        try {
            // Now lock just that one record by primary key.
            $job = $this->database
                ->table($this->table)
                ->lockForUpdate()
                ->where('id', $job->id)
                ->first();
        } catch (\PDOException $ex) {
            // 1213 = deadlock; anything else should still bubble up.
            if ($ex->getCode() != '1213') {
                throw $ex;
            }

            return null;
        }
    }

    return $job;
}
Do the jobs have to be processed in order of id? Fetching them randomly would reduce the likelihood of workers trying to get the same one...
Will give it a test tomorrow.
Sounds like, from my reading around online, it's really not going to be feasible to avoid deadlocks in some way when using a relational database as a queue. It's just not really intended for that purpose. If you're only running a couple of workers it's probably fine, but too many workers is going to start locking things up.
Hmm possibly, I have a couple more things I want to try first before accepting that though :P
My theory above turned out to be wrong; your previous query to lock the next available job for update works fine and doesn't lock multiple records.
My next theory is:
When you lock the next available job, this doesn't stop another worker from seeing the job as available; it merely makes them wait to lock it themselves. It's not until the second query (an update) is made that the job will be seen as reserved. However slight, there is a window between these two queries for a second worker to attempt to lock the job and thus be left waiting for 50 seconds until the lock timeout error is thrown. The fact that this window is very short would also explain why I only get a couple of the exceptions when running lots of jobs.
So I propose the following:
protected function getNextAvailableJob($queue)
{
    // No locking here; just find a candidate job.
    $job = $this->database
        ->table($this->table)
        ->where('queue', $this->getQueue($queue))
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->orderBy('id', 'asc')
        ->first();

    // If another worker reserved it first, simply try again with the next one.
    return is_null($job) || $this->markJobAsReserved($job->id) ? $job : $this->getNextAvailableJob($queue);
}

protected function markJobAsReserved($id)
{
    // Conditional update: it only succeeds if the job is still unreserved,
    // so exactly one worker "wins" the job.
    $updates = $this->database
        ->table($this->table)
        ->where('id', $id)
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->update([
            'reserved' => 1,
            'reserved_at' => $this->getTime(),
        ]);

    return $updates > 0;
}
The above functions do not lock records, but instead gracefully handle the fact that another worker beat them to reserving the record by going and getting the next available job and starting again.
This could however cause race conditions where a worker is constantly blocked by other workers that are beating them to the punch, so to speak. In that case, I guess the best way forward would be to randomise the order in which jobs are fetched from the queue; this would drastically reduce the likelihood of a worker being blocked many times.
I can't test this out until Monday though as I'm on 4G and don't have the necessary data allowance :P
Will get back to you Monday with an update; let me know what you think :)
Seems to be running well with those changes and a couple of other small bits. Will make a pull request tomorrow; changes are a bit big for pasting into a comment.
Thank you everyone! :heart:
Great changes!
But what are your thoughts on using both the _reserved_at_ and _available_at_ columns to find the next available job? Also, the _reserved_ column seems to not be used at all?
This is the current SQL:
select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` asc limit 1 for update
If we modify the usage of the _available_at_ column a little bit we could end up with:
select * from `jobs` where `queue` = ? and `available_at` <= ? order by `id` asc limit 1 for update
The idea is to increase the _available_at_ timestamp every time the row is reserved. It also gives the possibility to implement a touch function which increases the timestamp while reserved, giving the script more time to process the job.
If the index is changed to ['queue', 'available_at'], the MySQL EXPLAIN output for this SQL gives "Using index condition; Using filesort" with 100% filtered, instead of "Using index condition; Using where; Using filesort" with less filtered.
The _reserved_ column can be removed entirely.
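For anyone who wants to try the suggested index swap, a migration sketch might look roughly like this (purely illustrative; the existing index name depends on your migration history, so adjust before running):
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('jobs', function (Blueprint $table) {
    // The old index name may differ per install; check SHOW INDEXES first.
    $table->dropIndex('jobs_queue_index');
    $table->index(['queue', 'available_at']);
});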
I haven't done any further tests or benchmark. I just wanted to share my thoughts. :)
Edit:
I'm not sure how GitHub notifications works for closed issues. Sorry for pinging you directly if you have already seen this comment. @GrahamCampbell @taylorotwell
As @rabbitfang wondered, could the solution (if it works) be backported to 5.1?
I have 7 named queues with work daemons (1 worker per queue) running on the database driver with supervisor. Laravel version 5.2.45.
Params: --delay=1 --sleep=3 --tries=3 (Btw, does that delay param even apply at all, as a delay between jobs...?)
Just crunched about 2,500 jobs quickly and got about 110 deadlock errors. This particular batch was split so that about 2,400 jobs were on a single queue and about 100 were on another queue, both running fast in parallel.
[2016-12-04 01:06:28] production.ERROR: PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction in /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php:335
Stack trace:
#0 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(335): PDOStatement->execute(Array)
#1 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(722): Illuminate\Database\Connection->Illuminate\Database\{closure}(Object(Illuminate\Database\MySqlConnection), 'select * from `...', Array)
#2 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(685): Illuminate\Database\Connection->runQueryCallback('select * from `...', Array, Object(Closure))
#3 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(349): Illuminate\Database\Connection->run('select * from `...', Array, Object(Closure))
#4 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1610): Illuminate\Database\Connection->select('select * from `...', Array, false)
#5 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1596): Illuminate\Database\Query\Builder->runSelect()
#6 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1577): Illuminate\Database\Query\Builder->get(Array)
#7 /home/project/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(193): Illuminate\Database\Query\Builder->first()
#8 /home/project/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(164): Illuminate\Queue\DatabaseQueue->getNextAvailableJob('queuename3')
#9 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(184): Illuminate\Queue\DatabaseQueue->pop('queuename3')
#10 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(149): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename3')
#11 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(111): Illuminate\Queue\Worker->pop('database', 'queuename3', '5', '3', '3')
#12 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon('database', 'queuename3', '5', '3', '3')
#13 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(119): Illuminate\Queue\Worker->daemon('database', 'queuename3', '5', 128, '3', '3')
#14 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(78): Illuminate\Queue\Console\WorkCommand->runWorker('database', 'queuename3', '5', 128, true)
#15 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#16 /home/project/bootstrap/cache/compiled.php(1257): call_user_func_array(Array, Array)
#17 /home/project/vendor/laravel/framework/src/Illuminate/Console/Command.php(169): Illuminate\Container\Container->call(Array)
#18 /home/project/vendor/symfony/console/Command/Command.php(256): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/project/vendor/laravel/framework/src/Illuminate/Console/Command.php(155): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/project/vendor/symfony/console/Application.php(794): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/project/vendor/symfony/console/Application.php(186): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 /home/project/vendor/symfony/console/Application.php(117): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#23 /home/project/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(107): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 /home/project/artisan(35): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#25 {main}
Don't think this was ever fully resolved.
Need to combine the query to get the next job and the query to reserve that job; however, the solution seems to be platform specific.
See #14231. This makes the workers "block" to avoid the exceptions, but there would be some wasted process time every time a worker blocks, and these blocks would still increase with more than 10-15 workers per queue.
I switched to SQS queue driver in order to avoid this.
If nothing else it might be worth checking whether you get any lock issues when running the jobs on a different driver like SQS; it might be that your jobs contain queries which cause some of the locks. Might not be the case, but it's definitely worth checking, as the number of lock errors is quite high, and if it is the case you might find you can switch back to the database driver once the jobs have been optimised.
I just faced it with Laravel 5.3.26, having 8 workers and the database driver.
We are switching to Redis to remove even the possibility of such a thing on our production server.
I am keeping an eye on this thread to know if this problem was/will be resolved for the database driver.
I've run into this as well, testing on a local machine with HTTP requests being handled by the queues, and this is locking the database up completely. The threads get killed by supervisord, but they just restart and get locked again. It seems that the UPDATEs aren't what is locking up the database though; I can't put my finger on what specifically is locking it up, but something is causing the UPDATE requests to hang and then get killed by supervisor and start again, same issue every time.
Edit: Please please please make the documentation reflect this oddity; it's obviously quite prevalent and it's very frustrating running into this when testing...
The database driver should not be used in a heavy production type environment. It's not the right tool for the job. Heavily suggest any of the other drivers. The database driver is primarily useful in local development environments or in small production environments where you just have 1-2 workers.
Whilst I appreciate that now, I think myself and many of the other commenters here weren't aware that this would be such an issue, especially not at low volume (3 workers). It would be great if the documentation noted that the DB driver isn't good for queuing at any kind of volume.
I've now switched to Redis, which isn't as nice for testing, but it'll be better than sticking with the MySQL workers and falling into these kinds of issues.
Still have this in Laravel 5.4 with 8 workers and database as the driver. Moving away to redis to avoid this. It would definitely help if the documentation mentioned this.
Didn't know that a micro instance on AWS with 4 workers is considered high volume. The docs should mention that the DB driver is designed for only 1 worker.
@taylorotwell Any chance we could get the docs updated (and/or force only a single DB worker) to avoid these issues? I've had discussions with a lot of users who have been hitting these issues with the DB driver.
I've come across this issue in 5.4, but sadly I don't have the luxury of moving away from the DB driver.
I'm getting a deadlock on the delete query but I don't understand why. By logging to a file at the start and end of job processing, I can see that only one queue worker is picking up the job. Using `show engine innodb status` I can see that there is a lock on the row (as expected) and not on the table (also as expected), so I'm at a loss as to why the deadlock is occurring, as there doesn't appear to be anything conflicting with the delete statement.
When running `queue:work` it's been very easy to reproduce an issue where the deadlock occurs and then the queue hangs on 3 of the 4 workers. The other worker continues processing the queue, but when cancelled, 1 of the 3 others becomes active and carries on processing. Likewise, when you cancel the new active worker, another inactive worker becomes active.
I've been able to work around this issue by adding retries to the `deleteReserved()` function:
public function deleteReserved($queue, $id)
{
    // The second argument retries the transaction up to 5 times if it
    // fails (e.g. because of a deadlock).
    $this->database->transaction(function () use ($queue, $id) {
        if ($this->database->table($this->table)->lockForUpdate()->find($id)) {
            $this->database->table($this->table)->where('id', $id)->delete();
        }
    }, 5);
}
I was able to process 200k jobs across 6 workers in 1 hour without any problems, whereas previously I'd receive lock issues with 4 workers, and that would make the queues hang after only a couple of seconds.
@taylorotwell do you think it's worth me putting this into a PR?
I'm having the same issue.
PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction in /home/forge/default/releases/20171113162416/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOConnection.php:77
I do run on a cluster with 3 MySQL servers, if that has anything to do with it.
Dan
@daniel-farina does the workaround I posted solve your problem?
@micmania1 I tried your workaround and it seems to work, thanks!
Edit: not entirely, see below...
@micmania1 Did you make the changes in the vendor folder?
@micmania1 if that works I would love a pull request to fix deadlock issues. I have to process 23K jobs.
Hi guys,
I've got another deadlock on the system recently even with this new fix.
After running `innotop` to inspect the deadlock I got this:
_____________________________________________________________________________________________ Deadlock Transactions _____________________________________________________________________________________________
ID Timestring User Host Victim Time Undo LStrcts Query Text
4 2017-12-09 09:54:02 keychest localhost Yes 00:00 0 3 select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` as
5 2017-12-09 09:54:02 keychest localhost No 00:00 1 3 delete from `jobs` where `id` = ?
______________________________________ Deadlock Locks _______________________________________
ID Waiting Mode DB Table Index Special Ins Intent
4 1 X keychest jobs PRIMARY rec but not gap 0
5 0 X keychest jobs PRIMARY rec but not gap 0
5 1 X keychest jobs jobs_queue_reserved_at_index rec but not gap 0
So apparently the delete transaction got the lock on the `PRIMARY` index and later tried to acquire another index, `jobs_queue_reserved_at_index`. Maybe due to `lockForUpdate()` on the previous query.
I am wondering why the select itself is needed there. So I changed this:
if ($this->database->table($this->table)->lockForUpdate()->find($id)) {
$this->database->table($this->table)->where('id', $id)->delete();
}
To this:
$this->database->table($this->table)->where('id', $id)->delete();
And testing it on the dev system.
UPDATE: I am getting a different kind of deadlock now:
______________________________________________________________________________________________________ Deadlock Transactions ______________________________________________________________________________________________________
ID Timestring User Host Victim Time Undo LStrcts Query Text
8767 2017-12-11 18:08:03 keychest localhost No 00:00 1 5 update `jobs` set `reserved_at` = ?, `attempts` = ? where `id` = ?
8768 2017-12-11 18:08:03 keychest localhost Yes 00:00 0 4 select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` asc limit 1 for up
_______________________________________ Deadlock Locks ________________________________________
ID Waiting Mode DB Table Index Special Ins Intent
8767 0 X keychest jobs PRIMARY 0
8767 1 X keychest jobs jobs_queue_reserved_at_index gap before rec 1
8768 1 X keychest jobs PRIMARY rec but not gap 0
So two concurrent job selections are locked; one tries to update the selected job while the other one blocks it.
`DatabaseQueue.php` uses manual transaction management (`beginTransaction()`, `commit()`). I refactored `DatabaseQueue::pop` so it uses closure-style transactions with retry: `$this->database->transaction(closure)`. It does a correct `rollback()` when an exception occurs (e.g. the database detects a deadlock).
Deadlock still happens obviously but workers are able to recover.
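In case the shape of that refactor is useful to others, it is roughly the following (a simplified sketch, not the exact code from my branch; `marshalJob()` stands in for the reserve-and-wrap step):
public function pop($queue = null)
{
    $queue = $this->getQueue($queue);

    // transaction() rolls back automatically when the closure throws (e.g. a
    // detected deadlock), so the worker never gets stuck inside an open
    // transaction; the second argument retries the whole block a few times.
    return $this->database->transaction(function () use ($queue) {
        if ($job = $this->getNextAvailableJob($queue)) {
            return $this->marshalJob($queue, $job); // reserve the row and wrap it as a DatabaseJob
        }
    }, 5);
}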
For those following this issue: @ph4r05 is a fantastic provider of debugging information. This is the first time someone has checked where the deadlock actually occurs. In his case it's a gap lock on the jobs_queue_reserved_at_index index. This index was removed in the default 5.5 jobs table in https://github.com/laravel/framework/commit/acd933fd6b793b0c79a2ba4eb462892cd6c460e3
So, if anyone still has deadlock issues, check your `show engine innodb status`, check whether it is a gap lock on the index, and if so, try removing it.
I should probably say something like "make sure you know your database before randomly removing indexes" or "take backups before drinking" or something. This is an internet forum, I'm talking about changes in important system tables. You know the deal. ;)
Thanks to @sisve for the discussion. I didn't notice the index was removed, as I started the project on an older Laravel and the database migrations did not remove the index during the Laravel upgrade (as far as I am aware) - it may be the same case for others in this discussion. After removing the index there are no more deadlocks.
@taylorotwell @20TRIES following up on the discussion about lock contention in DB workers: for research and testing purposes I reimplemented the pessimistic locking of the database queue as optimistic locking. Take a look at my commit if interested: https://github.com/ph4r05/framework/commit/60fcddd9057af5f534fe19eed6336d51bf2c9257
There are no explicit transactions involved and no DB-level locking. DB workers compete for jobs via their `version` columns, using the affected-rows information from the update call.
I don't think this is production-ready code (don't use it without proper testing), and some benchmarking of pessimistic vs optimistic locking would be nice (I can get to that in a few days).
I can submit a PR later (once we have benchmarks) if you deem it interesting.
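The heart of the optimistic approach is a compare-and-set style update; a minimal sketch, assuming an added `version` column (this is not the exact code from the commit above):
// Read a candidate job without taking any row locks.
$job = $this->database->table($this->table)
    ->where('queue', $this->getQueue($queue))
    ->whereNull('reserved_at')
    ->where('available_at', '<=', $this->currentTime())
    ->orderBy('id', 'asc')
    ->first();

if ($job) {
    // The update only matches if no other worker bumped the version in the
    // meantime, so exactly one worker wins the job; losers simply try again.
    $won = $this->database->table($this->table)
        ->where('id', $job->id)
        ->where('version', $job->version)
        ->update([
            'reserved_at' => $this->currentTime(),
            'attempts'    => $job->attempts + 1,
            'version'     => $job->version + 1,
        ]);

    if ($won === 0) {
        // Lost the race; fetch the next candidate instead.
    }
}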
This is only a theory, but this is what I think is causing the deadlock on delete.
     Queue Worker 1                    Queue Worker 2
1    Select (lock)
2    Update
3    Process Job                       Select (lock)
4    Delete Job (locked - fail)        Update
5    Delete Job (success)              Process Job
6                                      Delete Job (success - 0 rows affected)
This would mean that a job could get processed twice. I am running this with the assumption of a transaction retry (i.e. the workaround I provided).
Whilst the delete retry solved my problem locally, when we began testing in what will be our prod environment we ran into further issues. We're using a Galera cluster with a master-master DB setup. Locking doesn't work the same way, and the default queue simply can't be run concurrently with that setup.
I've solved this in our project by separating the queue into multiple streams, where each worker is responsible for a single stream. When a job is saved it will pick one of the available streams at random, as in the sketch after this comment. You could potentially do this by using different queues too (one per environment), but I never tried that. This introduces other potential issues (mostly operational), but essentially avoids locking/conflicts completely, which was our biggest concern, and provided good speed and some scalability.
If you do have the luxury of moving to a different queue system, you should do that.
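To make the stream idea concrete, dispatching could look something like the sketch below (illustrative only; `ProcessOrder` and the queue names are hypothetical, and each stream gets exactly one dedicated worker process):
// Pick one of N streams at random; worker N runs `queue:work --queue=orders-N`,
// so no two workers ever compete for the same rows.
$stream = mt_rand(1, 4);

dispatch((new ProcessOrder($order))->onQueue("orders-{$stream}"));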
There was a bug in the database queue transaction handling (no rollback) which caused the freezing of the workers. Take a look at pull request #22394 (and maybe check whether you have the `jobs_queue_reserved_at_index` index).
The second worker should not pick up the job while it is reserved. If a job is being processed twice, consider tuning the retryAfter parameter.
I am currently using the queue with the PR merged and it works perfectly for me.
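On the retryAfter point: for the database driver that is the `retry_after` value in `config/queue.php`, and keeping it comfortably above your longest-running job avoids a second worker picking up a job that is still in progress. A sketch of the relevant config (values are just examples):
// config/queue.php
'connections' => [
    'database' => [
        'driver' => 'database',
        'table' => 'jobs',
        'queue' => 'default',
        // Seconds before a reserved job is considered expired and released
        // back onto the queue; keep this above your slowest job's runtime.
        'retry_after' => 90,
    ],
],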
@micmania1 Use `show engine innodb status` to debug your deadlock / lock timeout issues. Your theory does not contain the scope of the transactions, and that is important information. There is a SELECT+UPDATE and a SELECT+DELETE that have their own transactions. Activities outside transactions are probably not causing locking issues. (There are technically implicit transactions with auto-commit, so locks held by others are still respected, but the transaction's own locks are released after every statement.)
@sisve @micmania1 I have a new example of deadlocks on the `DatabaseQueue`.
With #22394 and #22433 merged the DatabaseQueue workers will recover from the deadlock (no freeze) but some jobs will get processed more than once. Here is why and how:
I created a benchmark project for Laravel queueing for testing of various job processing modifications. The project is fresh Laravel 5.5 (created 1 day ago with composer). I generate a synthetic load and measure correctness and performance. The scenario:
- Workers: `--queue=high,default,low --sleep=1 --tries=3`
- Jobs are pushed to the `default` queue with `available_at` set to the future.
- Then `UPDATE jobs SET available_at=0;` is run, which causes job processing by the workers.
- `innodb_deadlock_detect=1`, MySQL 5.7.20, InnoDB
, MySQL 5.7.20, InnoDBCREATE TABLE `jobs` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`queue` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`payload` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`attempts` tinyint(3) unsigned NOT NULL,
`reserved_at` int(10) unsigned DEFAULT NULL,
`available_at` int(10) unsigned NOT NULL,
`created_at` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `jobs_queue_index` (`queue`)
) ENGINE=InnoDB AUTO_INCREMENT=353855 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Deadlock:
LATEST DETECTED DEADLOCK
------------------------
2017-12-15 02:57:39 0x7efd583f3700
*** (1) TRANSACTION:
TRANSACTION 3183282, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 8 row lock(s)
MySQL thread id 1426, OS thread handle 139626575173376, query id 15379859 localhost 127.0.0.1 laravel Sending data
select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` asc limit 1 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 27 page no 3630 n bits 104 index PRIMARY of table `laravel`.`jobs` trx id 3183282 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 9; compact format; info bits 32
0: len 8; hex 00000000000575bb; asc u ;;
*** (2) TRANSACTION:
TRANSACTION 3183286, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 1437, OS thread handle 139626572363520, query id 15379876 localhost 127.0.0.1 laravel updating
delete from `jobs` where `id` = ?
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 27 page no 3630 n bits 104 index PRIMARY of table `laravel`.`jobs` trx id 3183286 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 9; compact format; info bits 32
0: len 8; hex 00000000000575bb; asc u ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 27 page no 4 n bits 824 index jobs_queue_index of table `laravel`.`jobs` trx id 3183286 lock_mode X locks rec but not gap waiting
Record lock, heap no 719 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 7; hex 64656661756c74; asc default;;
1: len 8; hex 00000000000575bb; asc u ;;
*** WE ROLL BACK TRANSACTION (2)
Both transactions are locking the same record. The problem is that the `delete` transaction is rolled back. The exception propagates to the worker, which continues to the next task. The culprit is an index, this time `jobs_queue_index`, I guess.
The job was already processed but not removed from the database so it will be picked up after it expires. This can happen multiple times.
It would be preferable if transaction 1 (the select) were rolled back instead of the delete, but I am not aware that this is possible to configure.
The funny thing is the following hack solves the problem:
public function deleteReserved($queue, $id)
{
    // Retry the transaction up to 5 times when it is rolled back by a deadlock.
    $this->database->transaction(function () use ($queue, $id) {
        $this->database->table($this->table)->where('id', $id)->delete();
    }, 5);
}
As the worker really insists on deleting that job. With the magic constant 5 I had no deadlocks anymore.
Some kind of back-off mechanism could be implemented to try to remove the job several times and eventually fail; this would significantly decrease the error rate, but it is not a clean approach (200 jobs per second).
Another way out is adding a new flag field `delete_mark` - soft delete indicator / 2-stage delete:
public function deleteReserved($queue, $id)
{
    // Stage 1: mark the job as deleted inside a short transaction.
    $this->database->transaction(function () use ($queue, $id) {
        $this->database->table($this->table)->where('id', $id)->update(['delete_mark' => 1]);
    });

    // Stage 2: physically delete the row outside the transaction.
    $this->database->table($this->table)->where('id', $id)->delete();
}
If a worker picks up a job with `delete_mark > 0` it deletes/ignores it, so the job will be processed only once. The update will succeed as it does not modify the index (the delete modifies the index). I will test this approach more, and if it works correctly I will submit a PR (170 jobs per second).
Another solution is optimistic locking which I am testing right now - no deadlocks. I will publish a blog with benchmarks soon but the performance is not very shiny with the first implementation (slower than pessimistic, 100 jobs per second).
Another solution is removing the index altogether (slow, 90 jobs per second).
If `innodb_deadlock_detect=0` then the only options are a) optimistic locking, b) removing the index, it seems.
Just for comparison, a `beanstalkd` queue with the same parameters gives 1080 jobs per second.
I'm looking at the innodb status you've provided, but I do not see the actual culprit.
Your status does not tell us who has already acquired a conflicting lock on jobs_queue_index.
These locks are not gap locks, so it's not the same problem as with the jobs_queue_reserved_at_index. Do you perhaps have an open connection from your desktop where you've accidentally started a long-running transaction and caused some locks?
@sisve thanks for your time! If you want to take a look at this complete report: https://github.com/ph4r05/laravel-queueing-benchmark/blob/master/.github/deadlock07.txt
I think the index is held by select queries.
I've disabled deadlock detection and increased timeouts so it's easier to debug:
set global innodb_deadlock_detect=0;
set global innodb_lock_wait_timeout=1000;
set innodb_lock_wait_timeout=1000;
We have two different transactions going on.
Fetch a new job (DatabaseQueue::pop)
SELECT * FROM jobs WHERE queue = ? AND (...) ORDER BY id ASC LIMIT 1 FOR UPDATE
UPDATE jobs SET reserved_at = ? WHERE id = ?
Delete a job after it is processed (DatabaseQueue::deleteReserved)
SELECT * FROM jobs WHERE id = ? FOR UPDATE
DELETE FROM jobs WHERE id = ?
_I'll be guessing more than usual from here on._
It seems like the select that fetches the jobs issues locks on both the jobs_queue_index and the primary key (guessing that it does it in that order), while the select that exists in deleteReserved only locks the primary key. This allows the fetch-select to grab half (the index) of the locks they require and block waiting for the primary key lock, while the deletion transaction has grabbed the primary key lock and is now blocking on the index lock.
So, while I am still guessing wildly, can we trick the database into issuing index locks? Can we modify the locking in deleteReserved so it's based on `SELECT * FROM jobs WHERE queue = ? AND id = ? FOR UPDATE`, which would hopefully issue identical locks to the fetch-select?
All this is extremely dependent on the table structure. Any other indexes would probably also add to the chaos and madness. Mmm madness...
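A minimal sketch of that idea, assuming the current `deleteReserved($queue, $id)` signature (untested, purely to illustrate the lock-ordering point):
public function deleteReserved($queue, $id)
{
    $this->database->transaction(function () use ($queue, $id) {
        // Filter on the queue column as well, in the hope that this SELECT
        // takes the same secondary-index lock as the job-fetching SELECT,
        // so both transactions acquire their locks in the same order.
        $locked = $this->database->table($this->table)
            ->lockForUpdate()
            ->where('queue', $this->getQueue($queue))
            ->where('id', $id)
            ->first();

        if ($locked) {
            $this->database->table($this->table)->where('id', $id)->delete();
        }
    });
}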
I made some benchmarks on an Amazon c4.large of the Laravel queueing methods (redis, beanstalkd, database):
I also implemented an optimistic locking strategy for the database driver (no deadlocks), blog post + repo:
Hey guys,
I've managed to test running 500,000 jobs with 15 workers without any deadlocks with the PR (https://github.com/laravel/framework/pull/31287). Thoughts please :)