At least one job has gotten into a state where it cannot be deleted. I've tried restarting the beanstalkd server but it doesn't seem to free up. Also, notice how the releases count is 0 for the job.
Server stats:
OK 972
---
current-jobs-urgent: 0
current-jobs-ready: 2
current-jobs-reserved: 1
current-jobs-delayed: 6690
current-jobs-buried: 1480463
cmd-put: 500348
cmd-peek: 1
cmd-peek-ready: 0
cmd-peek-delayed: 0
cmd-peek-buried: 0
cmd-reserve: 0
cmd-reserve-with-timeout: 546523
cmd-delete: 500378
cmd-release: 74
cmd-use: 821
cmd-watch: 393
cmd-ignore: 393
cmd-bury: 3
cmd-kick: 0
cmd-touch: 0
cmd-stats: 30
cmd-stats-job: 462
cmd-stats-tube: 73
cmd-list-tubes: 1
cmd-list-tube-used: 0
cmd-list-tubes-watched: 12
cmd-pause-tube: 0
job-timeouts: 4
total-jobs: 500348
max-job-size: 65535
current-tubes: 17
current-connections: 425
current-producers: 71
current-workers: 197
current-waiting: 142
total-connections: 1101
pid: 11708
version: 1.9
rusage-utime: 36.818301
rusage-stime: 143.336958
uptime: 721
binlog-oldest-index: 433964
binlog-current-index: 434066
binlog-records-migrated: 711683
binlog-records-written: 1712481
binlog-max-size: 10485760
id: cf281b94ce75771f
hostname: queue3
Job stats:
stats-job 9786693342
OK 196
---
id: 9786693342
tube: raw_item_indexer
state: reserved
pri: 2147483648
age: 3138203
delay: 0
ttr: 120
time-left: 103
file: 434013
reserves: 26135
timeouts: 26100
releases: 0
buries: 0
kicks: 0
Job data:
peek 9786693342
FOUND 9786693342 113
{"item_id": 291725218, "occurrences": 7, "event": "repeat_item", "new_occurrences": 8, "raw_item_id": 1862703885}
Attempt to delete the job:
delete 9786693342
NOT_FOUND
Just wanted to say that I have been having issues with job deletion as well. Currently still debugging the issue but just wanted to confirm the same problem.
Edit:
So I did a bit of testing and found that I too get sporadic job timeouts. Just for reference, my application is written in CakePHP and I'm using one of the beanstalkd plugins to handle the workers. But I want to make it clear that this _exact same code_ did work correctly when we initially wrote this feature back in June/July.
Anyway, here is the timeline of events. I started a fresh instance of the beanstalkd server with a number of workers via supervisord.
Time: 27 seconds
I manually cause a single event in my application that creates a job. The job gets executed immediately after.
current-jobs-urgent: 0
current-jobs-ready: 0
current-jobs-reserved: 1
current-jobs-delayed: 0
current-jobs-buried: 0
cmd-put: 1
cmd-peek: 0
cmd-peek-ready: 0
cmd-peek-delayed: 0
cmd-peek-buried: 0
cmd-reserve: 8
cmd-reserve-with-timeout: 0
cmd-delete: 0
cmd-release: 0
cmd-use: 1
cmd-watch: 8
cmd-ignore: 0
cmd-bury: 0
cmd-kick: 0
cmd-touch: 0
cmd-stats: 3
cmd-stats-job: 0
cmd-stats-tube: 0
cmd-list-tubes: 0
cmd-list-tube-used: 0
cmd-list-tubes-watched: 0
cmd-pause-tube: 0
job-timeouts: 0
total-jobs: 1
max-job-size: 65535
current-tubes: 2
current-connections: 9
current-producers: 0
current-workers: 8
current-waiting: 7
total-connections: 11
pid: 8790
version: 1.9+9+g157d88b
rusage-utime: 0.000000
rusage-stime: 0.004000
uptime: 27
binlog-oldest-index: 0
binlog-current-index: 0
binlog-records-migrated: 0
binlog-records-written: 0
binlog-max-size: 10485760
id: 7e5b9557d53ecef4
Time: 182 seconds
I am not finding the issue so I trigger the event within the application again. A second job gets created and executed immediately.
current-jobs-urgent: 0
current-jobs-ready: 0
current-jobs-reserved: 1
current-jobs-delayed: 0
current-jobs-buried: 0
cmd-put: 2
cmd-peek: 1
cmd-peek-ready: 0
cmd-peek-delayed: 0
cmd-peek-buried: 0
cmd-reserve: 9
cmd-reserve-with-timeout: 0
cmd-delete: 1
cmd-release: 0
cmd-use: 2
cmd-watch: 9
cmd-ignore: 0
cmd-bury: 0
cmd-kick: 0
cmd-touch: 0
cmd-stats: 34
cmd-stats-job: 0
cmd-stats-tube: 0
cmd-list-tubes: 0
cmd-list-tube-used: 0
cmd-list-tubes-watched: 0
cmd-pause-tube: 0
job-timeouts: 0
total-jobs: 2
max-job-size: 65535
current-tubes: 2
current-connections: 10
current-producers: 1
current-workers: 8
current-waiting: 7
total-connections: 12
pid: 8790
version: 1.9+9+g157d88b
rusage-utime: 0.004000
rusage-stime: 0.008000
uptime: 182
binlog-oldest-index: 0
binlog-current-index: 0
binlog-records-migrated: 0
binlog-records-written: 0
binlog-max-size: 10485760
id: 7e5b9557d53ecef4
Time: 302 seconds
The second job gets executed again. We can see that there has been one job timeout.
current-jobs-urgent: 0
current-jobs-ready: 0
current-jobs-reserved: 1
current-jobs-delayed: 0
current-jobs-buried: 0
cmd-put: 2
cmd-peek: 4
cmd-peek-ready: 0
cmd-peek-delayed: 0
cmd-peek-buried: 0
cmd-reserve: 12
cmd-reserve-with-timeout: 0
cmd-delete: 1
cmd-release: 0
cmd-use: 2
cmd-watch: 12
cmd-ignore: 0
cmd-bury: 0
cmd-kick: 0
cmd-touch: 0
cmd-stats: 58
cmd-stats-job: 0
cmd-stats-tube: 0
cmd-list-tubes: 0
cmd-list-tube-used: 0
cmd-list-tubes-watched: 0
cmd-pause-tube: 0
job-timeouts: 1
total-jobs: 2
max-job-size: 65535
current-tubes: 2
current-connections: 9
current-producers: 0
current-workers: 8
current-waiting: 6
total-connections: 12
pid: 8790
version: 1.9+9+g157d88b
rusage-utime: 0.004000
rusage-stime: 0.008000
uptime: 302
binlog-oldest-index: 0
binlog-current-index: 0
binlog-records-migrated: 0
binlog-records-written: 0
binlog-max-size: 10485760
id: 7e5b9557d53ecef4
I ended the test shortly after this. At this point I am not at all sure what is causing the issue.
Has anyone else experienced this and/or have a solution to the deletion bug? cc @kr
ping?
I've found that the job needs to be in ready state to be deletable.
use $queueName
kick 10
delete $jobId
I've not found anything else that reliably deletes the job.
@kr Any chance of addressing this?
I think @coryvirok's NOT_FOUND may be explained by protocol.txt. It states that the response to a delete command will be "NOT_FOUND\r\n" if the job does not exist or is not either reserved by the client, ready, or buried. This could happen if the job timed out before the client sent the delete command. In @coryvirok's case it appears that he hasn't reserved the job with the client (and it's still reserved).
@jtreminio: That makes sense if the client you're issuing the delete from hasn't reserved the job. protocol.txt says:
A client can delete jobs that it has reserved, ready jobs, delayed jobs, and jobs that are buried.
I've experienced with the same problem. The reason is the ttr, as @contentfree said. Take a example, worker A reserved a job X with ttr 120s. The job X will cost A for 300 seconds, but at 120s, X was expired into ready state and soon later reserved by another worker B. Then when A released the job X, it will throw a NOT FOUND exception for A don't acquire X anymore.
Enlarge you ttr for a quick fix. Or is it a bug? @kr
Not a bug, but could IMHO maybe give a nicer error message if possible.
We have just experienced the same problem, and the reason was the number of jobs in the queue: 65 536.
Jobs were generated with TTR 3600, but when these jobs were over the limit 65 535, their TTR was lowered to 60! ( visible via https://github.com/schickling/dockerfiles/tree/master/beanstalkd-console ).
We had to lower number of generated jobs.
@MartinMikita Would you mind creating a separate issue for this? Also, would it be possible for you to write a small script to recreate this?
Is there any reason you cannot delete a delayed job? You can't even seem to be able to update it once delayed either. IE you can't reduce the delayed time once saved.
@m4tthumphrey Please file a new issue. Your comment is not related to this issue.
I could not replicate anything mentioned in the comments. Also I see that beanstalkd 1.9 was used by @charukiewicz.
Please comment if you can reproduce the problem.
Most helpful comment
I've experienced with the same problem. The reason is the ttr, as @contentfree said. Take a example, worker A reserved a job X with ttr 120s. The job X will cost A for 300 seconds, but at 120s, X was expired into ready state and soon later reserved by another worker B. Then when A released the job X, it will throw a NOT FOUND exception for A don't acquire X anymore.
Enlarge you ttr for a quick fix. Or is it a bug? @kr