I use QUEUE_DRIVER=database
When I run more than 10 workers, I get the following error:
'PDOException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
[2015-01-19 16:45:00] production.ERROR: exception 'PDOException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction' in /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php:380
Stack trace:
#0 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(380): PDOStatement->execute(Array)
#1 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(606): Illuminate\Database\Connection->Illuminate\Database\{closure}(Object(Illuminate\Database\MySqlConnection), 'update `jobs` s...', Array)
#2 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(570): Illuminate\Database\Connection->runQueryCallback('update `jobs` s...', Array, Object(Closure))
#3 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(383): Illuminate\Database\Connection->run('update `jobs` s...', Array, Object(Closure))
#4 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(328): Illuminate\Database\Connection->affectingStatement('update `jobs` s...', Array)
#5 /home/www/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1747): Illuminate\Database\Connection->update('update `jobs` s...', Array)
#6 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(181): Illuminate\Database\Query\Builder->update(Array)
#7 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(146): Illuminate\Queue\DatabaseQueue->releaseJobsThatHaveBeenReservedTooLong('queuename1')
#8 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(180): Illuminate\Queue\DatabaseQueue->pop('queuename1')
#9 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(150): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename1')
#10 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(113): Illuminate\Queue\Worker->pop(NULL, 'queuename1', '0', '3', '3')
#11 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon(NULL, 'queuename1', '0', '3', '3')
#12 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(101): Illuminate\Queue\Worker->daemon(NULL, 'queuename1', '0', '256', '3', '3')
#13 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(67): Illuminate\Queue\Console\WorkCommand->runWorker(NULL, 'queuename1', '0', '256', true)
#14 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#15 /home/www/vendor/laravel/framework/src/Illuminate/Container/Container.php(523): call_user_func_array(Array, Array)
#16 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(114): Illuminate\Container\Container->call(Array)
#17 /home/www/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(253): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#18 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(100): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 /home/www/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(91): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#23 /home/www/artisan(34): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 {main}
Next exception 'Illuminate\Database\QueryException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: update `jobs` set `reserved` = 0, `reserved_at` = , `attempts` = attempts + 1 where `queue` = queuename1 and `reserved` = 1 and `reserved_at` <= 1421664300)' in /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php:614
Stack trace:
#0 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(570): Illuminate\Database\Connection->runQueryCallback('update `jobs` s...', Array, Object(Closure))
#1 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(383): Illuminate\Database\Connection->run('update `jobs` s...', Array, Object(Closure))
#2 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(328): Illuminate\Database\Connection->affectingStatement('update `jobs` s...', Array)
#3 /home/www/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1747): Illuminate\Database\Connection->update('update `jobs` s...', Array)
#4 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(181): Illuminate\Database\Query\Builder->update(Array)
#5 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(146): Illuminate\Queue\DatabaseQueue->releaseJobsThatHaveBeenReservedTooLong('queuename1')
#6 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(180): Illuminate\Queue\DatabaseQueue->pop('queuename1')
#7 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(150): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename1')
#8 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(113): Illuminate\Queue\Worker->pop(NULL, 'queuename1', '0', '3', '3')
#9 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon(NULL, 'queuename1', '0', '3', '3')
#10 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(101): Illuminate\Queue\Worker->daemon(NULL, 'queuename1', '0', '256', '3', '3')
#11 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(67): Illuminate\Queue\Console\WorkCommand->runWorker(NULL, 'queuename1', '0', '256', true)
#12 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#13 /home/www/vendor/laravel/framework/src/Illuminate/Container/Container.php(523): call_user_func_array(Array, Array)
#14 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(114): Illuminate\Container\Container->call(Array)
#15 /home/www/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(253): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#16 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(100): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#17 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#18 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/www/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(91): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/www/artisan(34): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 {main}
If you're running 10 workers, then the database driver isn't right for you. I'd suggest using a driver actually designed for queuing, like beanstalkd. The database driver is there for people who want a simple, quick queuing setup for a low number of jobs.
@GrahamCampbell I've got deadlocks with just 5 workers. Is that too many for the db driver too?
I get deadlocks even with 3 workers running for different queues each :-/
You can only use 1 worker if you want to use the database driver. If you need anything more, you should probably check out a real solution that's designed for queuing, like beanstalkd, or something that supports first in, first out, like redis. Both are fine choices for queue drivers.
@GrahamCampbell Maybe you just need to add a process ID to the jobs table, for example, and have the artisan worker read its value to avoid deadlocking?
Hey @GrahamCampbell and @taylorotwell, I am getting the same error here. Up until recently I've been using the Beanstalkd driver (works like a charm, really), but I would love to take advantage of this new database driver... it's so much cleaner to manage jobs in my opinion, and easier to display what's 'in the queue' to users if you wanted to do that.
First off, I have extended the database driver to also put a 'key' on the queue jobs while pushing them in, and this has been super helpful for me. Does anyone have an idea on how we could use this in a production environment that absolutely needs multiple queue runners? Think of scaling out on this one; it has to support any number of runners that we need to keep our services running smoothly.
Took a few days of attempted debugging before coming across this post; it may be worth mentioning in the documentation to only use one worker with the database queue? Would have definitely saved me a lot of time.
Also getting deadlocks with multiple database workers. It seems strange that the documentation suggests creating 8 processes within the supervisor config, yet it's recommended to only use one (or am I missing the point here?).
Same here! Also, CPU usage hits 100%.
I ran into this as well (the documentation should be fixed). It seems that using `queue:listen` instead of `queue:work --daemon` removes the issues caused by the deadlocks. They still occur, but don't cause any problems for me as the listener automatically causes a retry. I am running 8 queue listeners.
You shouldn't be limited to 1 worker with the database driver as @GrahamCampbell said. Don't know why he said that.
Anyways, hopefully we can figure this out. I've seen it before myself when running quite a few database workers.
The issue is in the query that migrates jobs that have been reserved too long to being available. We have a bunch of workers trying to execute that query simultaneously. We could simply ignore problems with that query and assume another worker is already doing it. I'm curious if even when you get deadlocks your queues keep processing as normal?
@taylorotwell Databases aren't built for this sort of application, which is why we'll get these sorts of issues at scale. I think you're totally correct that the queue will still work just fine, even with these errors. They'll just get progressively worse the more workers you add, and eventually everything will just totally lock up - say, if you pushed a million jobs to the queue and had 1000 DB workers.
@GrahamCampbell is correct in that DBs (specifically for me, MySQL) do not handle task queues well, due to the nature of row/table locks. When running `queue:listen` (backed by supervisord), Laravel was able to continue processing the queue, even with the deadlocks. When using `queue:work --daemon`, the entire DB locked up; stopping the workers was not enough to clear the deadlocks in the DB, a full MySQL restart was needed. I'm not sure why `queue:work --daemon` locked up the DB, but it might have had to do with the speed at which it was able to process tasks.
Ignoring the deadlock errors would help. Additionally, it would be useful to reduce the frequency at which expired reservations are checked, in a similar fashion to PHP & Laravel's session garbage collector. At the very least, the behavior should be documented.
I agree that there should perhaps be some sort of "lottery" or a different way to migrate the reserved jobs that need to be made available again.
Would definitely be interested in hacking on ideas with people if anyone is interested. A few thoughts come to mind. When using the DB driver, definitely set the `--sleep` option to something reasonable so it's not just pinging your database constantly, especially if you don't have a ton of jobs.
Most likely we need to rethink how jobs are migrated entirely. I would be curious to just comment out that code and see if people still have problems with the DB locking up, so that we can isolate that as the actual problem. Can anyone try this?
I have pushed code to the 5.2 branch (dev stability) to just swallow those deadlock exceptions. If we get an exception on the job migration query, we'll just assume another queue listener is handling it and move along.
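For readers following along, a rough sketch of what "swallowing" the deadlock amounts to (illustrative only, not the exact framework commit; the `$this->expire` / `getTime()` names are assumptions based on the 5.2-era DatabaseQueue):
use Illuminate\Database\QueryException;
use Illuminate\Database\Query\Expression;

protected function releaseJobsThatHaveBeenReservedTooLong($queue)
{
    $expired = $this->getTime() - $this->expire; // assumed property/method names

    try {
        $this->database->table($this->table)
            ->where('queue', $this->getQueue($queue))
            ->where('reserved', 1)
            ->where('reserved_at', '<=', $expired)
            ->update([
                'reserved' => 0,
                'reserved_at' => null,
                'attempts' => new Expression('attempts + 1'),
            ]);
    } catch (QueryException $e) {
        // Assume another worker hit the same rows first and is already
        // migrating them; skip this iteration instead of killing the worker.
    }
}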
Here is a rough draft of what I'm thinking about changing the migration of old jobs code to:
$this->database->beginTransaction();

// Lock the expired reserved jobs for this queue so no other worker touches them.
$ready = $this->database->table($this->table)
    ->lockForUpdate()
    ->where('queue', $this->getQueue($queue))
    ->where('reserved', 1)
    ->where('reserved_at', '<=', $expired)
    ->get();

$ids = \Illuminate\Support\Collection::make($ready)->map(function ($row) {
    return $row->id;
})->all();

// Release the locked jobs back onto the queue and bump their attempt count.
$this->database->table($this->table)
    ->whereIn('id', $ids)
    ->update([
        'reserved' => 0,
        'reserved_at' => null,
        'attempts' => new Expression('attempts + 1'),
    ]);

$this->database->commit();
Thoughts? I think this will avoid any deadlock problems with this query, and in addition I think we should have a small lottery so we don't need to run it on every poll of the queue. Something like maybe a 10% chance of running it on any given iteration?
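For concreteness, a sketch of that lottery (illustrative only; it reuses the existing method name from DatabaseQueue):
// Roughly a 10% chance per iteration of migrating expired reservations,
// instead of hitting that query on every single poll.
if (mt_rand(1, 10) === 1) {
    $this->releaseJobsThatHaveBeenReservedTooLong($queue);
}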
So far I have processed 3,500 jobs with 8 workers running at the same time and have had no deadlock issues with this logic.
I've pushed my logic to the 5.2 branch if anyone wants to use "dev" stability and try it out. Would really appreciate feedback! :)
Currently running just short of 400k jobs with 11 daemon workers using the database driver. 40k complete so far and only 3 exceptions thrown with the message:
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: delete from `jobs` where `id` = 30990)
Those all completed, and I only ever received 4 of the exceptions that I mentioned above.
Nice. Do you have the full stack trace on those exceptions? How many workers were you running?
Failed Queue Job: App\Jobs\CreateOrder SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: delete from `jobs` where `id` = 30990)
vendor/laravel/framework/src/Illuminate/Support/Facades/Facade.php:224 Illuminate\Support\Facades\Facade::__callStatic
app/Listeners/ReportFailedQueueJob.php:61 App\Listeners\ReportFailedQueueJob::handle
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:347 Illuminate\Events\Dispatcher::Illuminate\Events\{closure}
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:221 Illuminate\Events\Dispatcher::fire
app/Libraries/CustomLaravelComponents/Queue/Worker.php:100 App\Libraries\CustomLaravelComponents\Queue\Worker::raiseFailedJobEvent
app/Libraries/CustomLaravelComponents/Queue/Worker.php:82 App\Libraries\CustomLaravelComponents\Queue\Worker::logFailedJob
app/Libraries/CustomLaravelComponents/Queue/Worker.php:36 App\Libraries\CustomLaravelComponents\Queue\Worker::process
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:155 Illuminate\Queue\Worker::pop
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:111 Illuminate\Queue\Worker::runNextJobForDaemon
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:85 Illuminate\Queue\Worker::daemon
vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:103 Illuminate\Queue\Console\WorkCommand::runWorker
vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:71 Illuminate\Queue\Console\WorkCommand::fire
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Container/Container.php:507 Illuminate\Container\Container::call
vendor/laravel/framework/src/Illuminate/Console/Command.php:169 Illuminate\Console\Command::execute
vendor/symfony/console/Command/Command.php:256 Symfony\Component\Console\Command\Command::run
vendor/laravel/framework/src/Illuminate/Console/Command.php:155 Illuminate\Console\Command::run
vendor/symfony/console/Application.php:791 Symfony\Component\Console\Application::doRunCommand
vendor/symfony/console/Application.php:186 Symfony\Component\Console\Application::doRun
vendor/symfony/console/Application.php:117 Symfony\Component\Console\Application::run
vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:107 Illuminate\Foundation\Console\Kernel::handle
artisan:35 [main]
Running 11 workers
Interesting. So it was actually on deleting a failed job. Do you know why the jobs failed? Were you randomly causing jobs to fail just to test it?
Duplicate primary key entry when inserting; genuine error in the job.
PDOException·SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1525452' for key 'order_addresses_order_id_unique'
OK Thanks.
No probs. Btw, could you take a brief look at the latest on #6368 when you get a moment? I'm not sure if you get notifications for closed issues, but I think it's worth a glance.
Oh btw, worth noting, the error in the job occurred 1276 times, so it seems to be intermittent that it fails to delete the job from the queue. I've included a bugsnag screenshot that might illustrate (https://cloud.githubusercontent.com/assets/5886624/15359753/acabe4fc-1d02-11e6-846c-71c8734632ce.png); the remaining thousands of errors didn't cause any locking problems.
OK just pushed another commit to 5.2. Can you pull in that and see if that solves the deadlocks on the deletes?
Thanks!
Will some (or all) of the fixes be backported to 5.1?
@20TRIES have you had a chance to try my latest commit and see if it resolves the `delete` deadlock?
Hey, still getting the same thing.
Is the change below the change that you were trying?
public function deleteReserved($queue, $id)
{
$this->database->beginTransaction();
if ($this->database->table($this->table)->lockForUpdate()->find($id)) {
$this->database->table($this->table)->where('id', $id)->delete();
}
$this->database->commit();
}
Stack trace:
Could the job record be locked from the last time it was updated?
Would the code below lock more than just the one record?
protected function getNextAvailableJob($queue)
{
$this->database->beginTransaction();
$job = $this->database->table($this->table)
->lockForUpdate()
->where('queue', $this->getQueue($queue))
->where('reserved', 0)
->where('available_at', '<=', $this->getTime())
->orderBy('id', 'asc')
->first();
return $job ? (object) $job : null;
}
I think MySQL, in some situations, locks all records that are scanned; I had a similar issue when deleting records.
For example, if I deleted WHERE id IN (1, 5), it would lock 2, 3 and 4 as well.
To get around this I had to specifically delete where id is 1 and then where id is 5, in two separate statements.
Not sure if the statement above is similar and locks more than expected; meaning that other records are locked, thus causing issues when other workers attempt to lock / perform updates on them.
http://stackoverflow.com/a/2051370/6140759
http://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html
If this is the case then you would likely have to get the id and then specifically lock that record, catching any exceptions when performing the lock. If the record is already locked when you perform the lock statement, move on to the next record...?
Not sure. Feel free to play with it a bit and see if you can figure something out?
I was thinking something like this (I still need to check the exception code):
protected function getNextAvailableJob($queue)
{
    $this->database->beginTransaction();

    // Fetch a candidate job in random order to reduce the chance that two
    // workers go for the same row.
    $job = $this->database
        ->table($this->table)
        ->where('queue', $this->getQueue($queue))
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->inRandomOrder()
        ->first();

    if (! is_null($job)) {
        try {
            // Now lock just that one record by primary key.
            $job = $this->database
                ->table($this->table)
                ->lockForUpdate()
                ->where('id', $job->id)
                ->first();
        } catch (\PDOException $ex) {
            // 1213 = deadlock; anything else should still bubble up.
            if ($ex->getCode() != '1213') {
                throw $ex;
            }

            return null;
        }
    }

    return $job;
}
Do the jobs have to be processed in order of id? Fetching them randomly would reduce the likelihood of workers trying to get the same one...
Will give it a test tomorrow.
Sounds like, from my reading around online, it's really not going to be feasible to avoid deadlocks in some way when using a relational database as a queue. It's just not really intended for that purpose. If you're only running a couple of workers it's probably fine, but too many workers is going to start locking things up.
Hmm possibly, I have a couple more things I want to try first before accepting that though :P
My theory above turned out to be wrong; your previous query to lock the next available job for update works fine and doesn't lock multiple records.
My next theory is:
When you lock the next available job, this doesn't stop another worker from seeing the job as available; it merely makes them wait to lock it themselves. It's not until the second query (an update) is made that the job will be seen as reserved. However slight, there is a window between these two queries for a second worker to attempt to lock the job and thus be left waiting for 50 seconds until the lock timeout error is thrown. The fact that this window is very short would also explain why I only get a couple of the exceptions when running lots of jobs.
So I propose the following:
protected function getNextAvailableJob($queue)
{
    // No locking here; just find a candidate job.
    $job = $this->database
        ->table($this->table)
        ->where('queue', $this->getQueue($queue))
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->orderBy('id', 'asc')
        ->first();

    // If another worker reserved it first, simply try again with the next one.
    return is_null($job) || $this->markJobAsReserved($job->id) ? $job : $this->getNextAvailableJob($queue);
}

protected function markJobAsReserved($id)
{
    // Conditional update: it only succeeds if the job is still unreserved,
    // so exactly one worker "wins" the job.
    $updates = $this->database
        ->table($this->table)
        ->where('id', $id)
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->update([
            'reserved' => 1,
            'reserved_at' => $this->getTime(),
        ]);

    return $updates > 0;
}
The above functions do not lock records, but instead gracefully handle the fact that another worker beat them to reserving the record by going and getting the next available job and starting again.
This could however cause race conditions where a worker is constantly blocked by other workers that are beating them to the punch, so to speak. In that case, I guess the best way forward would be to randomise the order in which jobs are fetched from the queue; this would drastically reduce the likelihood of a worker being blocked many times.
I can't test this out until Monday though as I'm on 4G and don't have the necessary data allowance :P
Will get back to you Monday with an update; let me know what you think :)
Seems to be running well with those changes and a couple of other small bits. Will make a pull request tomorrow; changes are a bit big for pasting into a comment.
Thank you everyone! :heart:
Great changes!
But what are your thoughts on using both the _reserved_at_ and _available_at_ columns to find the next available job? Also, the _reserved_ column seems to not be used at all?
This is the current SQL:
select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` asc limit 1 for update
If we modify the usage of the _available_at_ column a little bit we could end up with:
select * from `jobs` where `queue` = ? and `available_at` <= ? order by `id` asc limit 1 for update
The idea is to increase the _available_at_ timestamp every time the row is reserved. It also gives the possibility to implement a touch function which increases the timestamp while reserved, giving the script more time to process the job.
If the index is changed to ['queue', 'available_at'], the MySQL EXPLAIN output for this SQL gives "Using index condition; Using filesort" with 100% filtered, instead of "Using index condition; Using where; Using filesort" with less filtered.
The _reserved_ column can be removed entirely.
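For anyone who wants to try the suggested index swap, a migration sketch might look roughly like this (purely illustrative; the existing index name depends on your migration history, so adjust before running):
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('jobs', function (Blueprint $table) {
    // The old index name may differ per install; check SHOW INDEXES first.
    $table->dropIndex('jobs_queue_index');
    $table->index(['queue', 'available_at']);
});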
I haven't done any further tests or benchmark. I just wanted to share my thoughts. :)
Edit:
I'm not sure how GitHub notifications works for closed issues. Sorry for pinging you directly if you have already seen this comment. @GrahamCampbell @taylorotwell
As @rabbitfang wondered, could the solution (if it works) be backported to 5.1?
I have 7 named queues with work daemons (1 worker per queue) running on the database driver with supervisor. Laravel version 5.2.45.
Params: --delay=1 --sleep=3 --tries=3 (Btw, does that delay param even apply at all, as a delay between jobs...?)
Just crunched about 2,500 jobs quickly and got about 110 deadlock errors. This particular batch was split so that about 2,400 jobs were on a single queue and about 100 were on another queue, both running fast in parallel.
[2016-12-04 01:06:28] production.ERROR: PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction in /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php:335
Stack trace:
#0 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(335): PDOStatement->execute(Array)
#1 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(722): Illuminate\Database\Connection->Illuminate\Database\{closure}(Object(Illuminate\Database\MySqlConnection), 'select * from `...', Array)
#2 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(685): Illuminate\Database\Connection->runQueryCallback('select * from `...', Array, Object(Closure))
#3 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(349): Illuminate\Database\Connection->run('select * from `...', Array, Object(Closure))
#4 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1610): Illuminate\Database\Connection->select('select * from `...', Array, false)
#5 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1596): Illuminate\Database\Query\Builder->runSelect()
#6 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1577): Illuminate\Database\Query\Builder->get(Array)
#7 /home/project/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(193): Illuminate\Database\Query\Builder->first()
#8 /home/project/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(164): Illuminate\Queue\DatabaseQueue->getNextAvailableJob('queuename3')
#9 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(184): Illuminate\Queue\DatabaseQueue->pop('queuename3')
#10 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(149): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename3')
#11 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(111): Illuminate\Queue\Worker->pop('database', 'queuename3', '5', '3', '3')
#12 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon('database', 'queuename3', '5', '3', '3')
#13 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(119): Illuminate\Queue\Worker->daemon('database', 'queuename3', '5', 128, '3', '3')
#14 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(78): Illuminate\Queue\Console\WorkCommand->runWorker('database', 'queuename3', '5', 128, true)
#15 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#16 /home/project/bootstrap/cache/compiled.php(1257): call_user_func_array(Array, Array)
#17 /home/project/vendor/laravel/framework/src/Illuminate/Console/Command.php(169): Illuminate\Container\Container->call(Array)
#18 /home/project/vendor/symfony/console/Command/Command.php(256): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/project/vendor/laravel/framework/src/Illuminate/Console/Command.php(155): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/project/vendor/symfony/console/Application.php(794): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/project/vendor/symfony/console/Application.php(186): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 /home/project/vendor/symfony/console/Application.php(117): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#23 /home/project/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(107): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 /home/project/artisan(35): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#25 {main}
Don't think this was ever fully resolved.
Need to combine the query to get the next job and the query to reserve that job; however, the solution seems to be platform specific.
See #14231. This makes the workers "block" to avoid the exceptions, but there would be some wasted process time every time a worker blocks, and these blocks would still increase with more than 10-15 workers per queue.
I switched to SQS queue driver in order to avoid this.
If nothing else it might be worth checking whether you get any lock issues when running the jobs on a different driver like SQS; it might be that your jobs contain queries which cause some of the locks. Might not be the case, but it's definitely worth checking, as the number of lock errors is quite high, and if it is the case you might find you can switch back to the database driver once the jobs have been optimised.
I just faced it with Laravel 5.3.26, having 8 workers and the database driver.
We are switching to Redis to remove even the possibility of such a thing on our production server.
I am keeping an eye on this thread to know if this problem was/will be resolved for the database driver.
I've run into this as well, testing on a local machine with HTTP requests being handled by the queues, and this is locking the database up completely. The threads get killed by supervisord, but they just restart and get locked again. It seems that the UPDATEs aren't what is locking up the database though; I can't put my finger on what specifically is locking it up, but something is causing the UPDATE requests to hang and then get killed by supervisor and start again, same issue every time.
Edit: Please please please make the documentation reflect this oddity; it's obviously quite prevalent and it's very frustrating running into this when testing...
The database driver should not be used in a heavy production type environment. It's not the right tool for the job. Heavily suggest any of the other drivers. The database driver is primarily useful in local development environments or in small production environments where you just have 1-2 workers.
Whilst I appreciate that now, I think myself and many of the other commenters here weren't aware that this would be such an issue, especially not at low volume (3 workers). It would be great if the documentation noted that the DB driver isn't good for queuing at any kind of volume.
I've now switched to Redis, which isn't as nice for testing, but it'll be better than sticking with the MySQL workers and falling into these kinds of issues.
Still have this in Laravel 5.4 with 8 workers and database as the driver. Moving away to redis to avoid this. It would definitely help if the documentation mentioned this.
Didn't know that a micro instance on AWS with 4 workers is considered high volume. The docs should mention that the DB driver is designed for only 1 worker.
@taylorotwell Any chance we could get the docs updated (and/or force only a single DB worker) to avoid these issues? I've had discussions with a lot of users who have been hitting these issues with the DB driver.
I've come across this issue in 5.4, but sadly I don't have the luxury of moving away from the DB driver.
I'm getting a deadlock on the delete query but I don't understand why. By logging to a file at the start and end of job processing, I can see that only one queue worker is picking up the job. Using `show engine innodb status` I can see that there is a lock on the row (as expected) and not on the table (also as expected), so I'm at a loss as to why the deadlock is occurring, as there doesn't appear to be anything conflicting with the delete statement.
When running `queue:work` it's been very easy to reproduce an issue where the deadlock occurs and then the queue hangs on 3 of the 4 workers. The other worker continues processing the queue, but when cancelled, 1 of the 3 others becomes active and carries on processing. Likewise, when you cancel the new active worker, another inactive worker becomes active.
I've been able to work around this issue by adding retries to the `deleteReserved()` function:
public function deleteReserved($queue, $id)
{
    // The second argument retries the transaction up to 5 times if it
    // fails (e.g. because of a deadlock).
    $this->database->transaction(function () use ($queue, $id) {
        if ($this->database->table($this->table)->lockForUpdate()->find($id)) {
            $this->database->table($this->table)->where('id', $id)->delete();
        }
    }, 5);
}
I was able to process 200k jobs across 6 workers in 1 hour without any problems, whereas previously I'd receive lock issues with 4 workers, and that would make the queues hang after only a couple of seconds.
@taylorotwell do you think it's worth me putting this into a PR?
I'm having the same issue.
PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction in /home/forge/default/releases/20171113162416/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOConnection.php:77
I do run on a cluster with 3 MySQL servers, if that has anything to do with it.
Dan
@daniel-farina does the workaround I posted solve your problem?
@micmania1 I tried your workaround and it seems to work, thanks!
Edit: not entirely, see below...
@micmania1 Did you make the changes in the vendor folder?
@micmania1 if that works I would love a pull request to fix deadlock issues. I have to process 23K jobs.
Hi guys,
I've got another deadlock on the system recently even with this new fix.
After running `innotop` to inspect the deadlock I got this:
_____________________________________________________________________________________________ Deadlock Transactions _____________________________________________________________________________________________
ID Timestring User Host Victim Time Undo LStrcts Query Text
4 2017-12-09 09:54:02 keychest localhost Yes 00:00 0 3 select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` as
5 2017-12-09 09:54:02 keychest localhost No 00:00 1 3 delete from `jobs` where `id` = ?
______________________________________ Deadlock Locks _______________________________________
ID Waiting Mode DB Table Index Special Ins Intent
4 1 X keychest jobs PRIMARY rec but not gap 0
5 0 X keychest jobs PRIMARY rec but not gap 0
5 1 X keychest jobs jobs_queue_reserved_at_index rec but not gap 0
So apparently the delete transaction got the lock on the `PRIMARY` index and later tried to acquire another index, `jobs_queue_reserved_at_index`. Maybe due to `lockForUpdate()` on the previous query.
I am wondering why the select itself is needed there. So I changed this:
if ($this->database->table($this->table)->lockForUpdate()->find($id)) {
$this->database->table($this->table)->where('id', $id)->delete();
}
To this:
$this->database->table($this->table)->where('id', $id)->delete();
And testing it on the dev system.
UPDATE: I am getting a different kind of deadlock now:
______________________________________________________________________________________________________ Deadlock Transactions ______________________________________________________________________________________________________
ID Timestring User Host Victim Time Undo LStrcts Query Text
8767 2017-12-11 18:08:03 keychest localhost No 00:00 1 5 update `jobs` set `reserved_at` = ?, `attempts` = ? where `id` = ?
8768 2017-12-11 18:08:03 keychest localhost Yes 00:00 0 4 select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` asc limit 1 for up
_______________________________________ Deadlock Locks ________________________________________
ID Waiting Mode DB Table Index Special Ins Intent
8767 0 X keychest jobs PRIMARY 0
8767 1 X keychest jobs jobs_queue_reserved_at_index gap before rec 1
8768 1 X keychest jobs PRIMARY rec but not gap 0
So two concurrent job selections are locked; one tries to update the selected job while the other one blocks it.
`DatabaseQueue.php` uses manual transaction management (`beginTransaction()`, `commit()`). I refactored `DatabaseQueue::pop` so it uses closure-style transactions with retry: `$this->database->transaction(closure)`. It does a correct `rollback()` when an exception occurs (e.g. the database detects a deadlock).
Deadlock still happens obviously but workers are able to recover.
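In case the shape of that refactor is useful to others, it is roughly the following (a simplified sketch, not the exact code from my branch; `marshalJob()` stands in for the reserve-and-wrap step):
public function pop($queue = null)
{
    $queue = $this->getQueue($queue);

    // transaction() rolls back automatically when the closure throws (e.g. a
    // detected deadlock), so the worker never gets stuck inside an open
    // transaction; the second argument retries the whole block a few times.
    return $this->database->transaction(function () use ($queue) {
        if ($job = $this->getNextAvailableJob($queue)) {
            return $this->marshalJob($queue, $job); // reserve the row and wrap it as a DatabaseJob
        }
    }, 5);
}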
For those following this issue: @ph4r05 is a fantastic provider of debugging information. This is the first time someone has checked where the deadlock actually occurs. In his case it's a gap lock on the jobs_queue_reserved_at_index index. This index was removed in the default 5.5 jobs table in https://github.com/laravel/framework/commit/acd933fd6b793b0c79a2ba4eb462892cd6c460e3
So, if anyone still has deadlock issues, check your `show engine innodb status`, check whether it is a gap lock on the index, and if so, try removing it.
I should probably say something like "make sure you know your database before randomly removing indexes" or "take backups before drinking" or something. This is an internet forum, I'm talking about changes in important system tables. You know the deal. ;)
Thanks to @sisve for the discussion. I didn't notice the index was removed, as I started the project on an older Laravel and the database migrations did not remove the index during the Laravel upgrade (as far as I am aware) - it may be the same case for others in this discussion. After removing the index there are no more deadlocks.
@taylorotwell @20TRIES following up on the discussion about lock contention in DB workers: for research and testing purposes I reimplemented the pessimistic locking of the database queue as optimistic locking. Take a look at my commit if interested: https://github.com/ph4r05/framework/commit/60fcddd9057af5f534fe19eed6336d51bf2c9257
There are no explicit transactions involved and no DB-level locking. DB workers compete for jobs via their `version` columns, using the affected-rows information from the update call.
I don't think this is production-ready code (don't use it without proper testing), and some benchmarking of pessimistic vs optimistic locking would be nice (I can get to that in a few days).
I can submit a PR later (once we have benchmarks) if you deem it interesting.
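The heart of the optimistic approach is a compare-and-set style update; a minimal sketch, assuming an added `version` column (this is not the exact code from the commit above):
// Read a candidate job without taking any row locks.
$job = $this->database->table($this->table)
    ->where('queue', $this->getQueue($queue))
    ->whereNull('reserved_at')
    ->where('available_at', '<=', $this->currentTime())
    ->orderBy('id', 'asc')
    ->first();

if ($job) {
    // The update only matches if no other worker bumped the version in the
    // meantime, so exactly one worker wins the job; losers simply try again.
    $won = $this->database->table($this->table)
        ->where('id', $job->id)
        ->where('version', $job->version)
        ->update([
            'reserved_at' => $this->currentTime(),
            'attempts'    => $job->attempts + 1,
            'version'     => $job->version + 1,
        ]);

    if ($won === 0) {
        // Lost the race; fetch the next candidate instead.
    }
}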
This is only a theory, but this is what I think is causing the deadlock on delete.
     Queue Worker 1                    Queue Worker 2
1    Select (lock)
2    Update
3    Process Job                       Select (lock)
4    Delete Job (locked - fail)        Update
5    Delete Job (success)              Process Job
6                                      Delete Job (success - 0 rows affected)
This would mean that a job could get processed twice. I am running this with the assumption of a transaction retry (i.e. the workaround I provided).
Whilst the delete retry solved my problem locally, when we began testing in what will be our prod environment we ran into further issues. We're using a Galera cluster with a master-master DB setup. Locking doesn't work the same way, and the default queue simply can't be run concurrently with that setup.
I've solved this in our project by separating the queue into multiple streams, where each worker is responsible for a single stream. When a job is saved it will pick one of the available streams at random, as in the sketch after this comment. You could potentially do this by using different queues too (one per environment), but I never tried that. This introduces other potential issues (mostly operational), but essentially avoids locking/conflicts completely, which was our biggest concern, and provided good speed and some scalability.
If you do have the luxury of moving to a different queue system, you should do that.
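To make the stream idea concrete, dispatching could look something like the sketch below (illustrative only; `ProcessOrder` and the queue names are hypothetical, and each stream gets exactly one dedicated worker process):
// Pick one of N streams at random; worker N runs `queue:work --queue=orders-N`,
// so no two workers ever compete for the same rows.
$stream = mt_rand(1, 4);

dispatch((new ProcessOrder($order))->onQueue("orders-{$stream}"));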
There was a bug in the database queue transaction handling (no rollback) which caused the freezing of the workers. Take a look at pull request #22394 (and maybe check whether you have the `jobs_queue_reserved_at_index` index).
The second worker should not pick up the job while it is reserved. If a job is being processed twice, consider tuning the retryAfter parameter.
I am currently using the queue with the PR merged and it works perfectly for me.
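On the retryAfter point: for the database driver that is the `retry_after` value in `config/queue.php`, and keeping it comfortably above your longest-running job avoids a second worker picking up a job that is still in progress. A sketch of the relevant config (values are just examples):
// config/queue.php
'connections' => [
    'database' => [
        'driver' => 'database',
        'table' => 'jobs',
        'queue' => 'default',
        // Seconds before a reserved job is considered expired and released
        // back onto the queue; keep this above your slowest job's runtime.
        'retry_after' => 90,
    ],
],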
@micmania1 Use `show engine innodb status` to debug your deadlock / lock timeout issues. Your theory does not contain the scope of the transactions, and that is important information. There is a SELECT+UPDATE and a SELECT+DELETE that have their own transactions. Activities outside transactions are probably not causing locking issues. (There are technically implicit transactions with auto-commit, so locks held by others are still respected, but the transaction's own locks are released after every statement.)
@sisve @micmania1 I have a new example of deadlocks on the `DatabaseQueue`.
With #22394 and #22433 merged the DatabaseQueue workers will recover from the deadlock (no freeze) but some jobs will get processed more than once. Here is why and how:
I created a benchmark project for Laravel queueing for testing of various job processing modifications. The project is fresh Laravel 5.5 (created 1 day ago with composer). I generate a synthetic load and measure correctness and performance. The scenario:
- Workers: `--queue=high,default,low --sleep=1 --tries=3`
- Jobs are pushed to the `default` queue with `available_at` set to the future.
- Then `UPDATE jobs SET available_at=0;` is run, which causes job processing by the workers.
- `innodb_deadlock_detect=1`, MySQL 5.7.20, InnoDB
, MySQL 5.7.20, InnoDBCREATE TABLE `jobs` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`queue` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`payload` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`attempts` tinyint(3) unsigned NOT NULL,
`reserved_at` int(10) unsigned DEFAULT NULL,
`available_at` int(10) unsigned NOT NULL,
`created_at` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `jobs_queue_index` (`queue`)
) ENGINE=InnoDB AUTO_INCREMENT=353855 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Deadlock:
LATEST DETECTED DEADLOCK
------------------------
2017-12-15 02:57:39 0x7efd583f3700
*** (1) TRANSACTION:
TRANSACTION 3183282, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 8 row lock(s)
MySQL thread id 1426, OS thread handle 139626575173376, query id 15379859 localhost 127.0.0.1 laravel Sending data
select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` asc limit 1 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 27 page no 3630 n bits 104 index PRIMARY of table `laravel`.`jobs` trx id 3183282 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 9; compact format; info bits 32
0: len 8; hex 00000000000575bb; asc u ;;
*** (2) TRANSACTION:
TRANSACTION 3183286, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 1437, OS thread handle 139626572363520, query id 15379876 localhost 127.0.0.1 laravel updating
delete from `jobs` where `id` = ?
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 27 page no 3630 n bits 104 index PRIMARY of table `laravel`.`jobs` trx id 3183286 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 9; compact format; info bits 32
0: len 8; hex 00000000000575bb; asc u ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 27 page no 4 n bits 824 index jobs_queue_index of table `laravel`.`jobs` trx id 3183286 lock_mode X locks rec but not gap waiting
Record lock, heap no 719 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 7; hex 64656661756c74; asc default;;
1: len 8; hex 00000000000575bb; asc u ;;
*** WE ROLL BACK TRANSACTION (2)
Both transactions are locking the same record. The problem is that the `delete` transaction is rolled back. The exception propagates to the worker, which continues to the next task. The culprit is an index, this time `jobs_queue_index`, I guess.
The job was already processed but not removed from the database so it will be picked up after it expires. This can happen multiple times.
It would be preferable if transaction 1 (the select) were rolled back instead of the delete, but I am not aware that this is possible to configure.
The funny thing is the following hack solves the problem:
public function deleteReserved($queue, $id)
{
    // Retry the transaction up to 5 times when it is rolled back by a deadlock.
    $this->database->transaction(function () use ($queue, $id) {
        $this->database->table($this->table)->where('id', $id)->delete();
    }, 5);
}
As the worker really insists on deleting that job. With the magic constant 5 I had no deadlocks anymore.
Some kind of back-off mechanism could be implemented to try to remove the job several times and eventually fail; this would significantly decrease the error rate, but it is not a clean approach (200 jobs per second).
Another way out is adding a new flag field `delete_mark` - soft delete indicator / 2-stage delete:
public function deleteReserved($queue, $id)
{
    // Stage 1: mark the job as deleted inside a short transaction.
    $this->database->transaction(function () use ($queue, $id) {
        $this->database->table($this->table)->where('id', $id)->update(['delete_mark' => 1]);
    });

    // Stage 2: physically delete the row outside the transaction.
    $this->database->table($this->table)->where('id', $id)->delete();
}
If a worker picks up a job with `delete_mark > 0` it deletes/ignores it, so the job will be processed only once. The update will succeed as it does not modify the index (the delete modifies the index). I will test this approach more, and if it works correctly I will submit a PR (170 jobs per second).
Another solution is optimistic locking which I am testing right now - no deadlocks. I will publish a blog with benchmarks soon but the performance is not very shiny with the first implementation (slower than pessimistic, 100 jobs per second).
Another solution is removing the index altogether (slow, 90 jobs per second).
If `innodb_deadlock_detect=0` then the only options are a) optimistic locking, b) removing the index, it seems.
Just for comparison, a `beanstalkd` queue with the same parameters gives 1080 jobs per second.
I'm looking at the innodb status you've provided, but I do not see the actual culprit.
Your status does not tell us who has already acquired a conflicting lock on jobs_queue_index.
These locks are not gap locks, so it's not the same problem as with the jobs_queue_reserved_at_index. Do you perhaps have an open connection from your desktop where you've accidentally started a long-running transaction and caused some locks?
@sisve thanks for your time! If you want to take a look at this complete report: https://github.com/ph4r05/laravel-queueing-benchmark/blob/master/.github/deadlock07.txt
I think the index is held by select queries.
I've disabled deadlock detection and increased timeouts so it's easier to debug:
set global innodb_deadlock_detect=0;
set global innodb_lock_wait_timeout=1000;
set innodb_lock_wait_timeout=1000;
We have two different transactions going on.
Fetch a new job (DatabaseQueue::pop)
SELECT * FROM jobs WHERE queue = ? AND (...) ORDER BY id ASC LIMIT 1 FOR UPDATE
UPDATE jobs SET reserved_at = ? WHERE id = ?
Delete a job after it is processed (DatabaseQueue::deleteReserved)
SELECT * FROM jobs WHERE id = ? FOR UPDATE
DELETE FROM jobs WHERE id = ?
_I'll be guessing more than usual from here on._
It seems like the select that fetches the jobs issues locks on both the jobs_queue_index and the primary key (guessing that it does it in that order), while the select that exists in deleteReserved only locks the primary key. This allows the fetch-select to grab half (the index) of the locks they require and block waiting for the primary key lock, while the deletion transaction has grabbed the primary key lock and is now blocking on the index lock.
So, while I am still guessing wildly, can we trick the database into issuing index locks? Can we modify the locking in deleteReserved so it's based on `SELECT * FROM jobs WHERE queue = ? AND id = ? FOR UPDATE`, which would hopefully issue identical locks to the fetch-select?
All this is extremely dependent on the table structure. Any other indexes would probably also add to the chaos and madness. Mmm madness...
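A minimal sketch of that idea, assuming the current `deleteReserved($queue, $id)` signature (untested, purely to illustrate the lock-ordering point):
public function deleteReserved($queue, $id)
{
    $this->database->transaction(function () use ($queue, $id) {
        // Filter on the queue column as well, in the hope that this SELECT
        // takes the same secondary-index lock as the job-fetching SELECT,
        // so both transactions acquire their locks in the same order.
        $locked = $this->database->table($this->table)
            ->lockForUpdate()
            ->where('queue', $this->getQueue($queue))
            ->where('id', $id)
            ->first();

        if ($locked) {
            $this->database->table($this->table)->where('id', $id)->delete();
        }
    });
}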
I made some benchmarks on an Amazon c4.large of the Laravel queueing methods (redis, beanstalkd, database):
I also implemented an optimistic locking strategy for the database driver (no deadlocks), blog post + repo:
Hey guys,
I've managed to test running 500,000 jobs with 15 workers without any deadlocks with the PR (https://github.com/laravel/framework/pull/31287). Thoughts please :)