Beanstalkd: clear tube command

Created on 4 Nov 2009 · 16 Comments · Source: beanstalkd/beanstalkd

http://groups.google.com/group/beanstalk-talk/t/4ed1d368d7b3a5a8

FeatureRequest NeedsFix

Most helpful comment

@JensRantil really? I can think of at least a few, actually:

atomicity of the operation
minimizing round-trip time (see the pipelining feature of Redis)
a large speedup of the operation itself, since the server could clear a tube in one pass instead of deleting items one by one
better tolerance of errors such as network downtime, lost packets, etc.

It's easy to say that beanstalk is fast when it's installed on localhost and you benchmark it there, but c'mon, not all of us run beanstalk on the same machine as the queuing/executor scripts. Imagine having 20k queued messages and an RTT as low as 1 ms. You're wasting 20 seconds on RTT alone! Now think of the same situation with an RTT of 10 ms (setting aside why that might happen). That's 200 seconds on RTT alone. Think of the probability of error doing 20k requests instead of just 1.

prgTW on 17 Apr 2018

👍5
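
The round-trip arithmetic in the comment above can be checked with a small sketch. The function name and the `round_trips_per_job` parameter are illustrative assumptions, not part of any beanstalkd client. The comment counts one round trip per job; a reserve-then-delete loop would need two.

```python
def rtt_overhead_seconds(jobs, rtt_ms, round_trips_per_job=1):
    """Time spent purely waiting on the network, ignoring server-side work.

    Assumes strictly sequential requests (no pipelining).
    """
    return jobs * round_trips_per_job * rtt_ms / 1000.0


# The two cases from the comment: 20k jobs at 1 ms and at 10 ms RTT.
print(rtt_overhead_seconds(20000, 1))
print(rtt_overhead_seconds(20000, 10))

# A reserve + delete loop costs two round trips per job (assumption).
print(rtt_overhead_seconds(20000, 10, round_trips_per_job=2))
```

Under these assumptions the network wait grows linearly with both the job count and the RTT, which is the cost a single server-side clear command would avoid.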

All 16 comments

Was any work ever done on this issue? I know I could definitely use this on a project that extensively uses Beanstalk.

jherdman on 27 Mar 2012

There have been a few pull requests, but nothing merged in yet.
The latest is #87.

kr on 21 Apr 2012

Any progress on this clear tube command? I have a tube with 711769 jobs that I need to clear. It has been running for hours already, doing individual deletes, but is advancing very slowly. I am looking for a more efficient way.

pentium10 on 22 Sep 2013

Yea, flushing a tube would be nice. The whole queue while we're at it.

geudrik on 12 Jan 2015

👍 +1 to this idea

prgTW on 12 Jan 2015

@pentium10 Interesting that it takes such a long time for you to simply delete your jobs. My little test script does that pretty quickly:

import sys
import beanstalkc  # third-party beanstalkd client

beanstalk = beanstalkc.Connection(host='localhost', port=11300)

if sys.argv[1] == "fill":
  print("Filling")
  # Enqueue as many small jobs as the tube in question held.
  for i in range(711769):
    beanstalk.put("hello")
elif sys.argv[1] == "clear":
  # Reserve without blocking; reserve() returns None once no ready job is left.
  while True:
    job = beanstalk.reserve(timeout=0)
    if job is None:
      break
    job.delete()

Micro-benchmark:

$ time ./testpy/bin/python test.py fill
ERROR:root:Failed to load PyYAML, will not parse YAML
Filling
./testpy/bin/python test.py fill  18.73s user 11.42s system 57% cpu 52.188 total
$ time ./testpy/bin/python test.py clear
ERROR:root:Failed to load PyYAML, will not parse YAML
./testpy/bin/python test.py clear  41.44s user 22.28s system 57% cpu 1:49.88 total

Any idea why clearing the tube is so slow for you?

JensRantil on 3 Apr 2016

👀1

My jobs are around 800 kbytes each; maybe try with that size.

pentium10 on 3 Apr 2016

@pentium10 711769*800 kbytes = ~543 GBytes. Just making sure here, are you sure you had that many items in memory? Pretty beefy server, huh? ;) Also, are you sure you weren't swapping memory to disk?

JensRantil on 3 Apr 2016
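
The memory estimate above can be reproduced with a short calculation. This sketch assumes "800 kbytes" means 800 KiB per job and ignores any per-job overhead in the server; the constant names are illustrative.

```python
JOBS = 711769
JOB_SIZE_KIB = 800  # assumption: "800 kbytes" read as 800 KiB

total_kib = JOBS * JOB_SIZE_KIB
total_gib = total_kib / 1024.0 ** 2      # binary gigabytes (GiB)
total_gb = total_kib * 1024 / 1e9        # decimal gigabytes (GB)

print("%.1f GiB (%.1f GB)" % (total_gib, total_gb))
```

The "~543 GBytes" figure matches the binary (GiB) reading; either way the total is far beyond a 64 GB machine, so the job count, the job size, or both were likely smaller in practice.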

I will double check, we have only 64GB.

pentium10 on 4 Apr 2016

absolutely needed feature

terion-name on 12 Apr 2018

Could everyone asking for this feature explain why you guys simply can't do what my script in https://github.com/kr/beanstalkd/issues/25#issuecomment-205023390 does? That is, simply reserve and delete all jobs one by one. Beanstalkd is very fast.

JensRantil on 17 Apr 2018

@prgTW Thanks for taking the time to answer. Please consider your tone. Stuff like "really?" and "c'mon" isn't going to help you get your point across.

I agree with you that the operation would clearly be faster than emptying the queue one by one. That said, one benefit of emptying the queue one by one is that it follows the same process as general consumption, so other consumers are not surprised to find that the task they are currently working on was deleted by someone else.

Regarding RTT, you are assuming your client executes a single operation at a time, but you should be able to do pipelining with beanstalkd just like you do in Redis. It should give you something like a 100x speedup, depending on the batch size of your operations. https://redis.io/topics/pipelining might be a nice reference if you want to read up on it.

> Think of the probability of error doing 20k requests instead of just 1.

Yeah, but that's what we have TCP for, right? :-)

JensRantil on 18 Apr 2018
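
A rough sketch of what pipelining deletes against beanstalkd could look like over a raw socket. It assumes the job ids are already known (for example, recorded when the jobs were put), and the function name `pipelined_delete` is hypothetical. The wire format used here, `delete <id>\r\n` answered by `DELETED\r\n` or `NOT_FOUND\r\n`, follows the beanstalkd protocol.

```python
import socket


def pipelined_delete(sock, job_ids):
    """Send every delete command first, then read all the replies.

    Only one network round trip is paid per batch instead of one per job.
    """
    request = b"".join(b"delete %d\r\n" % job_id for job_id in job_ids)
    sock.sendall(request)

    replies = []
    buf = b""
    while len(replies) < len(job_ids):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
        # Each reply is a single line terminated by CRLF.
        while b"\r\n" in buf and len(replies) < len(job_ids):
            line, buf = buf.split(b"\r\n", 1)
            replies.append(line.decode("ascii"))
    return replies
```

Usage would be along the lines of `pipelined_delete(socket.create_connection(("localhost", 11300)), ids)`. Keeping batches modest (hundreds to a few thousand ids) avoids filling socket buffers before any reply is read. This speeds up the loop but does not make the clear atomic.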

> Yeah, but that's what we have TCP for, right? :-)

Consider also web server timeouts, application timeouts, keepalive timeouts, or any other system/application error that might occur unexpectedly during this long-running process. I personally would appreciate the atomicity of a "clear tube" command.

Sorry for the tone, but showing benchmarks on localhost (an idyllic environment) with every message only a few bytes in size was a bit naive. I apologize anyway :)

prgTW on 19 Apr 2018

This feature looks good to me. From a performance point of view, it's a must.

ysmolsky on 5 Sep 2019

👍3

I'd like to have this feature too.
For now I am using reserve(0) + delete(job_id) to simulate clear tube (method taken from @JensRantil), but it cannot delete buried jobs or jobs that are not ready.
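
One possible workaround for the buried and delayed jobs, sketched under assumptions: the beanstalkd protocol lets a client delete ready, delayed, and buried jobs by id, and the peek commands act on the currently used tube. The helper `clear_tube` is hypothetical; it expects a connection object exposing `use`, `peek_ready`, `peek_delayed`, and `peek_buried` (as `beanstalkc.Connection` does), with peeked jobs offering `delete()`.

```python
def clear_tube(conn, tube):
    """Delete ready, delayed and buried jobs from one tube; return the count.

    Not atomic: jobs put during the loop may survive, and jobs currently
    reserved by other workers are not visible to peek.
    """
    conn.use(tube)  # peek-* commands operate on the currently used tube
    deleted = 0
    for peek in (conn.peek_ready, conn.peek_delayed, conn.peek_buried):
        while True:
            job = peek()
            if job is None:  # nothing left in this state
                break
            job.delete()
            deleted += 1
    return deleted
```

With beanstalkc this might be called as `clear_tube(beanstalkc.Connection(host='localhost', port=11300), 'mytube')`, where the tube name is a placeholder. It still costs two round trips per job, so it addresses coverage rather than speed.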