Beanstalkd: clear tube command

Created on 4 Nov 2009  ·  16 Comments  ·  Source: beanstalkd/beanstalkd

All 16 comments

Was any work ever done on this issue? I know I could definitely use this on a project that extensively uses Beanstalk.

There have been a few pull requests, but nothing merged in yet.
The latest is #87.

Any progress on this clear tube command? I have a tube with 711769 jobs that I need to clear. It has already been running for hours doing individual deletes, but it is advancing very slowly. I am looking for a more efficient way.

Yeah, flushing a tube would be nice. Or the whole queue, while we're at it.

:+1: +1 to this idea

@pentium10 Interesting that it takes such a long time for you to simply delete your jobs. My little test script does that pretty quickly:

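import sys
import beanstalkc

# Connect to a beanstalkd server on the local machine (default port).
beanstalk = beanstalkc.Connection(host='localhost', port=11300)

if sys.argv[1] == "fill":
  print("Filling")
  # Enqueue the same number of small jobs as in the report above.
  for i in range(711769):
    beanstalk.put("hello")
elif sys.argv[1] == "clear":
  # Reserve and delete jobs one by one until the tube is empty.
  while True:
    job = beanstalk.reserve(timeout=0)  # returns None when no job is ready
    if job is None:
      break
    job.delete()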

Micro-benchmark:

$ time ./testpy/bin/python test.py fill
ERROR:root:Failed to load PyYAML, will not parse YAML
Filling
./testpy/bin/python test.py fill  18.73s user 11.42s system 57% cpu 52.188 total
$ time ./testpy/bin/python test.py clear
ERROR:root:Failed to load PyYAML, will not parse YAML
./testpy/bin/python test.py clear  41.44s user 22.28s system 57% cpu 1:49.88 total

Any idea why clearing the tube is so slow for you?

My jobs are around 800 kbytes each; maybe try with that size.

@pentium10 711769*800 kbytes = ~543 GBytes. Just making sure here, are you sure you had that many items in memory? Pretty beefy server, huh? ;) Also, are you sure you weren't swapping memory to disk?
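The estimate above can be checked in a few lines of Python. This is only a sketch: it assumes "kbytes" means 1024 bytes and "GBytes" means 1024^3 bytes (the reading that reproduces the ~543 figure), it ignores any per-job overhead inside beanstalkd, and the function name is illustrative.

```python
JOB_COUNT = 711769
JOB_SIZE_KIB = 800  # assumption: "800kbytes" read as 800 KiB per job


def backlog_size_gib(job_count, job_size_kib):
    """Return the total payload size in GiB (1 GiB = 1024 * 1024 KiB)."""
    return job_count * job_size_kib / (1024 * 1024)


if __name__ == "__main__":
    # Payload only; the server's own bookkeeping is not counted.
    print(round(backlog_size_gib(JOB_COUNT, JOB_SIZE_KIB)))
```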

I will double check, we have only 64GB.

absolutely needed feature

Could everyone asking for this feature explain why you guys simply can't do what my script in https://github.com/kr/beanstalkd/issues/25#issuecomment-205023390 does? That is, simply reserve and delete all jobs one by one. Beanstalkd is very fast.

@JensRantil really? I can think of at least a few, actually:

  • atomicity of the operation
  • minimizing round-trip time (see the pipelining feature of Redis)
  • a huge speed improvement of the operation itself, since clearing a whole tube can be optimized in ways that deleting items one by one cannot
  • better tolerance for errors such as network downtime, lost packets, etc.

It's easy to say that beanstalk is fast when it is installed on localhost and benchmarked there, but c'mon, not all of us install beanstalk on the same machine as the queuing/executor scripts. Imagine having 20k queued messages and an RTT as low as 1ms. You're wasting 20 seconds just on RTT! Now think of the same situation when the RTT is 10ms (I'll leave aside the reasons for that). That's 200 seconds just on RTT. Think of the probability of error when doing 20k requests instead of just 1.
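The arithmetic in this comment can be reproduced with a small helper. The function name and the `round_trips_per_job` parameter are illustrative assumptions: the comment counts one round trip per message, whereas a reserve-then-delete loop issues two commands per job, which would double the figures.

```python
def rtt_overhead_seconds(jobs, rtt_ms, round_trips_per_job=1):
    """Time spent purely waiting on the network, in seconds."""
    return jobs * round_trips_per_job * rtt_ms / 1000.0


if __name__ == "__main__":
    for rtt_ms in (1, 10):
        # One round trip per job, as counted in the comment.
        print(rtt_ms, rtt_overhead_seconds(20000, rtt_ms))
```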

@prgTW Thanks for taking the time to answer. Please consider your tone. Stuff like "really?" and "c'mon" isn't gonna help you get your point across.

I agree with you that the operation would clearly be faster than emptying the queue one by one. That said, one benefit of emptying the queue one by one is that it follows the same process as general consumption, so other consumers will not be surprised to find that their current task was deleted by someone else.

Regarding your RTT, you are assuming your client will execute a single operation at a time, but you should be able to do pipelining with beanstalkd just like you do in Redis. It should give you something like a 100x speedup, depending on your batch size of operations. https://redis.io/topics/pipelining might be a nice reference if you want to read up on it.
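A rough sketch of what pipelined deletes could look like over a raw socket follows. It assumes the server answers pipelined commands in order on one connection, as the comment suggests, and that each `delete <id>` is answered with a single `DELETED` or `NOT_FOUND` line. The helper names and the `batch_size` parameter are hypothetical, and the job IDs are assumed to have been collected beforehand (for example from earlier reserve replies).

```python
import socket


def build_delete_batch(job_ids):
    """Encode one `delete <id>` command per job into a single payload."""
    return b"".join(b"delete %d\r\n" % job_id for job_id in job_ids)


def read_replies(sock, expected):
    """Read until `expected` CRLF-terminated reply lines have arrived."""
    buf = b""
    while buf.count(b"\r\n") < expected:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf += chunk
    return buf.split(b"\r\n")[:expected]


def pipelined_delete(sock, job_ids, batch_size=100):
    """Delete jobs in batches, paying roughly one round trip per batch."""
    deleted = 0
    for start in range(0, len(job_ids), batch_size):
        batch = job_ids[start:start + batch_size]
        sock.sendall(build_delete_batch(batch))  # send the whole batch at once
        replies = read_replies(sock, len(batch))  # then collect all replies
        deleted += sum(1 for reply in replies if reply == b"DELETED")
    return deleted
```

With a connection opened via `socket.create_connection(("localhost", 11300))`, 20k deletes at a batch size of 100 would cost about 200 round trips instead of 20k. This only reduces latency; it is still not atomic.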

"Think of the probability of error doing 20k requests instead of just 1."

Yeah, but that's what we have TCP for, right? :-)

"Yeah, but that's what we have TCP for, right? :-)"

Consider also web server timeouts, application timeouts, keepalive timeouts, or any other system/application error that might occur unexpectedly during this long-running process. I personally would appreciate the atomicity benefit of a "clear tube" command.

Sorry for the tone, but showing benchmarks on localhost (an idyllic environment) with every message only a few bytes in size was a bit naive. I apologize anyway :)

This feature looks good to me. From a performance point of view, it's a must.

I'd like to have this feature too.
For now I am using reserve(0) + delete(job_id) to simulate clear tube (method taken from @JensRantil), but it cannot delete buried or not-ready jobs.
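One possible workaround for the limitation described above is to peek at each job state and delete what is found, since the beanstalkd protocol allows `delete` on ready, delayed, and buried jobs. The sketch below assumes a beanstalkc-style connection object exposing `use`, `peek_ready`, `peek_delayed`, and `peek_buried`; the function name `clear_tube` is hypothetical, and this is not the feature requested in this issue.

```python
def clear_tube(conn, tube):
    """Delete ready, delayed, and buried jobs from `tube`; return the count."""
    conn.use(tube)  # peek commands act on the currently used tube
    deleted = 0
    for peek in (conn.peek_ready, conn.peek_delayed, conn.peek_buried):
        job = peek()  # returns None once no job is left in that state
        while job is not None:
            job.delete()
            deleted += 1
            job = peek()
    return deleted
```

It would be called as `clear_tube(beanstalkc.Connection(host='localhost', port=11300), 'default')`. The loop is not atomic, costs two round trips per job, and cannot remove jobs currently reserved by other clients. A job grabbed by another consumer between the peek and the delete would make the delete fail.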
