Beanstalkd: Allow for custom job ID

Created on 20 May 2015 · 13 comments · Source: beanstalkd/beanstalkd

put <pri> <delay> <ttr> <bytes> <id>\r\n
<data>\r\n

This would make it possible to implement failover.
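To make the proposal concrete, here is a minimal sketch of how a client library might serialize both forms of the command. The helper name `build_put` and its default values are assumptions made for illustration. The form without an ID is the existing `put` wire format; the trailing `<id>` field exists only in this proposal.

```python
from typing import Optional


def build_put(data: bytes, pri: int = 1024, delay: int = 0, ttr: int = 60,
              job_id: Optional[int] = None) -> bytes:
    """Serialize a put command (hypothetical helper, not part of any client).

    With job_id=None this is the existing format:
        put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n
    With job_id set, the extra <id> field proposed in this issue is appended.
    """
    fields = ["put", str(pri), str(delay), str(ttr), str(len(data))]
    if job_id is not None:
        # Proposed extension only; it is not part of protocol v1.x.
        fields.append(str(job_id))
    header = " ".join(fields).encode("ascii")
    return header + b"\r\n" + data + b"\r\n"
```

For example, `build_put(b"hello", job_id=42)` yields `b"put 1024 0 60 5 42\r\nhello\r\n"`.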

rzajac

👍1

Most helpful comment

@rzajac @JensRantil I've also run into this.
The way I do it right now is to put the external ID in Redis, as @JensRantil suggested, and save a mapping to the Beanstalkd-generated ID. Then I use it later to cancel the job, query it, etc. In a way, having Beanstalkd accept a custom ID would eliminate the need for an extra piece of infrastructure.
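The mapping workaround described here could look roughly like the sketch below. It is not the commenter's actual code: `FakeQueue` stands in for a beanstalkd connection, a plain dict stands in for Redis, and the class and method names are invented for illustration.

```python
import itertools
from typing import Dict, MutableMapping


class FakeQueue:
    """Stand-in for a beanstalkd connection; the 'server' assigns job IDs."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.jobs: Dict[int, bytes] = {}

    def put(self, body: bytes) -> int:
        job_id = next(self._next_id)
        self.jobs[job_id] = body
        return job_id

    def delete(self, job_id: int) -> None:
        # Unknown IDs are silently ignored in this sketch.
        self.jobs.pop(job_id, None)


class ExternalIdIndex:
    """Keeps an external-ID -> server-job-ID mapping next to the queue."""

    def __init__(self, queue: FakeQueue, store: MutableMapping[str, int]) -> None:
        self.queue = queue
        self.store = store  # a dict here; Redis in the setup described above

    def put(self, external_id: str, body: bytes) -> int:
        job_id = self.queue.put(body)
        self.store[external_id] = job_id
        return job_id

    def cancel(self, external_id: str) -> bool:
        job_id = self.store.pop(external_id, None)
        if job_id is None:
            return False
        self.queue.delete(job_id)
        return True
```

Note that the queue write and the store write are two separate steps, so a crash between them leaves a job with no mapping. That is part of the cost of the extra piece of infrastructure.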

urjitbhatia on 18 Oct 2016

👍4

All 13 comments

I can't see the reason for specifying a custom job ID within a queue.

emanuelecasadio on 21 May 2015

I really do see a reason, and I really hope this will be implemented!
Here is an example:

- You have a number of repeatable jobs that you want to execute once every day, but you want them handled in the queue. Today you first have to empty the whole tube, which is ugly (you iterate over the queue items and delete them). With this solution it would be a lot easier and more flexible!
- You can check a specific job's state
- and so on...

Please implement this!! 👍

aight8 on 21 May 2015

👍1

You have a number of repeatable jobs that you want to execute once every day, but you want them handled in the queue. Today you first have to empty the whole tube, which is ugly (you iterate over the queue items and delete them). With this solution it would be a lot easier and more flexible!

This can be done easily by checking whether the daily job has already been executed, either before putting it into beanstalkd or after reserving it from beanstalkd.

And, why not /etc/cron.daily/?

ifduyue on 21 May 2015

Being able to specify custom job IDs would let me implement a sort of HA and failover at the library level:

- adding/deleting... jobs on two or more beanstalkd servers
- if one server fails, workers/producers could connect to another server from a pool

Unless I'm missing something and the current protocol allows for better solutions.

PS. Adding a job with an ID that already exists should trigger an error.

rzajac on 22 May 2015

How do you maintain your job IDs without a central point of failure? In general, in a distributed system you cannot have a job that runs exactly once (http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/). Beanstalkd provides at-most-once delivery, and we do not want to change that. You can have at-least-once delivery with more than one beanstalkd and some distributed locking or similar (some people use memcached or a DB for that purpose).

jabdoa2 on 27 May 2015

Maintaining job IDs is out of the scope of this ticket. What I need is to be able to specify my own job ID.

rzajac on 29 May 2015

This would make it possible to implement failover.

@rzajac Could you elaborate a little on this? I'm not exactly sure about your use-case.

JensRantil on 3 Apr 2016

@JensRantil I explained it a little bit here: https://github.com/kr/beanstalkd/issues/264#issuecomment-104544535

rzajac on 3 Apr 2016

This feature request breaks backward compatibility (BC) for protocol v1.x.

sergeyklay on 3 Apr 2016

@rzajac Ah, sorry. Missed that. Thanks!

I'm going to be the devil's advocate here and shoot down some of the use cases :-)

@aight8 wrote:

You have a number of repeatable jobs that you want to execute once every day, but you want them handled in the queue. Today you first have to empty the whole tube, which is ugly (you iterate over the queue items and delete them). With this solution it would be a lot easier and more flexible!

There are various approaches to regular cronjobs:

- Simply having an /etc/cron.daily script put your daily job on the queue.
- Having a permanent job that is delayed every day until the next midnight.
- For multiple jobs executed at the same time every day, you could use either of the two approaches above and simply have your daily job put smaller tasks on the queue. That is, the task would split into smaller tasks.
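The "permanent job" approach needs a delay that ends at the next midnight. A minimal sketch of that arithmetic follows. The function names are invented for illustration, and the calculation uses naive local datetimes, so it ignores DST shifts. The `release <id> <pri> <delay>` command it builds is the existing protocol command for putting a reserved job back with a delay.

```python
import math
from datetime import datetime, timedelta


def seconds_until_next_midnight(now: datetime) -> int:
    """Whole seconds from `now` until the next 00:00:00, rounded up."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Round up so the job never becomes ready just before midnight.
    return math.ceil((next_midnight - now).total_seconds())


def build_release(job_id: int, pri: int, delay: int) -> bytes:
    """Serialize: release <id> <pri> <delay>\r\n"""
    return f"release {job_id} {pri} {delay}\r\n".encode("ascii")
```

A worker that has reserved the permanent job would do its daily work and then send `build_release(job_id, pri, seconds_until_next_midnight(datetime.now()))` instead of deleting the job.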

You can check a specific job's state

Valid point. A workaround is to store the job ID in another datastore.

and so on...

Not an argument. Carry on. ;)

@rzajac wrote:

Being able to specify custom job IDs would let me implement a sort of HA and failover at the library level:

I really don't think this is a good idea. Doing double writes independently to two queues is bound to eventually make them diverge and end up with different state. There are all sorts of race conditions. For example, a TTR could time out on one queue and not on the other. Another problem is that you currently can't reserve a specific job. You can delete a specific job, but then you can't be sure that no other consumer has reserved it, etc.

The _real_ solution here would be to use something like ZooKeeper's ZAB or, probably even better, the Raft algorithm. All writes would go through a master, and a majority would need to acknowledge each state change. This would obviously introduce complexity, new failure modes and additional latency to every operation.

JensRantil on 3 Apr 2016

This would also help with self-throttling jobs on the client side :) Simply checking whether the job is already there would allow us to avoid sending another one, or to just increase the delay time.

yellow1912 on 28 Dec 2016

👍1

Okay, I'm going to close this issue as a no-go. Reasons are as follows:

Allowing a custom ID also implies that we break the uniqueness of job IDs. There is also potential for concurrency confusion, since a very recently deleted job might seem to "pop up" again if a job with the same ID is added, adding lots of confusion for both clients and developers.
There are lots of ways this can be solved without expanding the scope of beanstalkd:
- submitting multiple identical cronjob tasks and making the task processing idempotent (such as gracefully ignoring recently processed messages or similar).
- only running a single process pushing a cronjob task to beanstalkd.
- taking a distributed lock to make sure only a single process pushes a cronjob task to beanstalkd. See for example https://dkron.io.
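The first option, idempotent processing, can be sketched as below. This is only an illustration: the class name, the idea of a per-task `key` (for example a task name plus a date), and the in-memory set are all assumptions. With several workers the set of seen keys would have to live in a shared store.

```python
from typing import Callable, Set


class IdempotentWorker:
    """Runs each logical task at most once per key, ignoring duplicates."""

    def __init__(self, handler: Callable[[bytes], None]) -> None:
        self.handler = handler
        self.seen: Set[str] = set()  # assumption: in-memory, single worker

    def process(self, key: str, payload: bytes) -> bool:
        """Return True if the task ran, False if it was a duplicate."""
        if key in self.seen:
            return False  # recently processed: gracefully ignore
        self.handler(payload)
        # Record the key only after the handler succeeds, so a failed
        # run can be retried when the job is delivered again.
        self.seen.add(key)
        return True
```

Several producers can then push the same daily task without coordinating: the first copy a worker sees is executed and later copies are dropped.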

Please open a new issue describing your _use-case_ if you believe it can't be worked around using the approaches above.

JensRantil on 19 Aug 2018
