Magento2: Cron starts when it's already running

Created on 24 Aug 2017  路  26Comments  路  Source: magento/magento2

I noticed that when I start a cron that's already running it will run again. For example: I have a task that takes 15 minutes, but the cron executes it's every 5 minutes. In the database the cron gets the status running and the other scheduled jobs have pending. But after 5 minutes, the other job (with the same code) also starts to run. So now I have the same job running twice!

Preconditions

  1. Magento 2.1.7

Steps to reproduce

  1. Create a cron that takes a while (for example, sleep 15 minutes)
  2. Schedule it to run every 5 minutes

Expected result

The other jobs should wait for the first job to complete, even though they are scheduled to run every 5 minutes.

Actual result

The new job starts, ignoring the fact that it's already running...

I'm not sure if this is intended behavior or a bug. Am I expected to create my own lock / flag for this?

Fixed in 2.2.x Clear Description Confirmed Format is valid Ready for Work Reproduced on 2.1.x Reproduced on 2.2.x Reproduced on 2.3.x bug report

Most helpful comment

@ihor-sviziev

  1. We're a hosting provider with many live customers on M2 with this problem. We need a fix right now and one that works for every version.

  2. We don't know what Magento's own plans are for a fix. This cron issue has been open for a very long time. It has not been properly addressed. It is impacting customers severely. See 1.

  3. We submitted a PR with fixes initially. No response. See 1.

  4. PR merges are very slow. Releases with fixes are very slow. See 1.

  5. M2 fixes are not being back ported. Customers on live M2 stores are not able to upgrade to the latest version. See 1.

We're happy to work with the core team to get this merged into the core. We will be at Imagine in April at the hackathon and will discuss it further with the comm eng team. See 1.

All 26 comments

Note to anybody working on this: ticket #10279 was closed as a dup of this but contains additional details about the root cause that I had already tracked down.

@kanduvisla, thank you for your report.
We've created internal ticket(s) MAGETWO-80790 to track progress on the issue.

Hey there. I'm working on it #SQUASHTOBERFEST

@gabrielqs-redstage Sorry I was already working on it yesterday and sent the PR. It was not marked bc I forgot it while the ticket was open on my side.

@ishakhsuvarov can you please assign this one to @diglin instead?

@gabrielqs-redstage reassigned.

Hi,

Does anyone know why concurrent tasks with same job_code is not prevented by the following:

Magento\Cron\Observer\ProcessCronQueueObserver => if ($schedule->tryLockJob()) {
Magento\Cron\Model\Model\Schedule => if ($this->_getResource()->trySetJobUniqueStatusAtomic(
            $this->getId(),
            self::STATUS_RUNNING,
            self::STATUS_PENDING
        )) 
Magento\Cron\Model\ResourceModel\Schedule => trySetJobUniqueStatusAtomic

Query that was generated by the test was this

UPDATE `cron_schedule` AS `current`
 LEFT JOIN `cron_schedule` AS `existing` ON existing.job_code = current.job_code AND existing.status = 'running'
SET `current`.`status` = 'running'
WHERE (current.schedule_id = '111') AND (current.status = 'pending') AND (existing.schedule_id IS NULL)

It seems designed to prevent two concurrently running tasks with the same job code.
(It is also covered by tests at Magento\Cron\Model\ScheduleTest in the integration test-suite)

@s-hoffman this is in progress in https://github.com/magento/magento2/pull/12497

@elvinristi, as I understand that ticket, it is adding MySql based locking around the cron group's execution and does not explain why trySetJobUniqueStatusAtomic is failing to catch this error.
(It may prevent concurrent execution by definition but does not directly address trySetJobUniqueStatusAtomic's failure).

@s-hoffman I think what Magento does in _$schedule->tryLockJob()_ is locking individual scheduled execution run of a job, so that specific job run is not triggered again. What it does not prevent is running the same job code (but different scheduled execution run) at the same time, and it was never intended to do so. _$schedule_ is referring to a row in _cron_schedule_ table, where there are multiple rows per job code.

@s-hoffman random thought but does your database contain more than 1 schedule for the given status? I wonder if the code fails if there is already more than one entry in the database.

@paveq, Not [running the same job code (but different scheduled execution run) at the same time]?
What can I do when I do not see where in the code that is the intention?
As far as I can see, the method that is currently called, is named, trySetJobUniqueStatusAtomic.

The Sql that is generated by the method is:

UPDATE `cron_schedule` AS `current`
 LEFT JOIN `cron_schedule` AS `existing` ON existing.job_code = current.job_code AND existing.status = 'running'
SET `current`.`status` = 'running'
WHERE (current.schedule_id = '111') AND (current.status = 'pending') AND (existing.schedule_id IS NULL)

This clearly shows intent to check for currently running schedules, without checking scheduled time or any other timestamp field.

Sets schedule status only if no existing schedules with the same job code
     * have that status.  This is used to implement locking for cron jobs.

The comment also indicates an intent to implement locking.

Also, there are integration tests in .dev\tests\integration\testsuite\Magento\Cron\Model\ScheduleTest.php
These tests testTryLockJobNoLockedJobsSucceeds & testTryLockJobAlreadyLockedFails & testTryLockJobOtherLockedFails & testTryLockJobDifferentJobLocked, try to cover the basic scenario of, no existing running jobs and yes existing running jobs.

The above description does include a temporal component. (But after 5 minutes, the other job). Otherwise the tests seem to cover the scenario described above.

@dmanners, I can some tests on my local using the SQL statement above, and tried to add more schedules. I was unable to replicate it by adding extra jobs, though it was guesswork testing. (low cost low time investment).

I have added a back-port pull request for the 2.1-develop branch so that both 2.1 and 2.2 should behave in the same way with regards to cron scheduling. (See https://github.com/magento/magento2/pull/12805). We are also getting the core team involved with https://github.com/magento/magento2/pull/12497 to try to work on cron scheduling with the community.

@s-hoffman Indeed, it looks like this was introduced in 2.2 here https://github.com/magento/magento2/commit/eacb7024e2640f71199b0bf943a4409d5d883548

Either it does not work, or if it does it causes even larger issues. If we blindly check for "running" status, since crashed job will cause that job to never run again until the row is removed from schedule table.

In my opinion MySQL based locking which gets released when connection closes is better solution here, as it actually depends on the running process.

@paveq Actually I just had an issue, where some cronjobs didn't start after updating from 2.1 to 2.2, because there where old entrys of failed jobs with running status in the schedule table.

@paveq, This ticket is _currently_ labeled_ 'Reproduced on 2.2.x'.
Is that label correct?

@s-hoffman Yes, I can confirm we can reproduce the issue on a 2.2.2 EE.

This issue also exists in CE 2.2.x. Its hilariously server breaking.

What's the state of this? We have 2.2.2. CE running and cron is not running correctly and needs to be fixed asap.

We wrote an extension to fix these bugs, speed up performance, and control the execution of tasks: https://github.com/magemojo/m2-ce-cron

@ericvhileman why have you decided to create separate module instead creating PR with all your improvements?

@ihor-sviziev , I can just assume but I saw tickets and problems with already created Pull Requests being ignored for months. And as we know, major bugfixes often just get into the next major release, which happens once a year.

As users working with production environments, we need solutions asap.
Same happend when I found out paypal is no longer working starting with 2.2.0 - needed a community module like this to fix this.

@johnny-longneck thank you for clarifying. Specially I'd like to add these fixes directly into magento, it will fix issues for a much more people than separate module. Is there any list of issues and how they were fixed in your module?

PS: specially I'm creating PR with all needed changes and use following article to apply these changes into our projects: https://twitter.com/s3lf/status/957993636058288128. As result - these changes will be merged at some point of time and we're getting issue fixed on production right now.

Also if you have any issues with long process of merging - you can contact community maintainers in slack or just mention in comment, it will help to speedup this process.

Yes most people agree @johnny-longneck , waiting on Magento to release a fix is a complete waste of time when problems exist in the present, not the next release so we move on.

Instead we get more 3rd party stuff added in releases which is basically an advertisement for a given vendor.

@ihor-sviziev

  1. We're a hosting provider with many live customers on M2 with this problem. We need a fix right now and one that works for every version.

  2. We don't know what Magento's own plans are for a fix. This cron issue has been open for a very long time. It has not been properly addressed. It is impacting customers severely. See 1.

  3. We submitted a PR with fixes initially. No response. See 1.

  4. PR merges are very slow. Releases with fixes are very slow. See 1.

  5. M2 fixes are not being back ported. Customers on live M2 stores are not able to upgrade to the latest version. See 1.

We're happy to work with the core team to get this merged into the core. We will be at Imagine in April at the hackathon and will discuss it further with the comm eng team. See 1.

Was this page helpful?
0 / 5 - 0 ratings