The Resque README mentions that there are no memory leaks, but while hunting a memory leak in our code base I found Resque showing at least 250 open threads in htop. I have another project that also uses Resque, and there htop reports just 12 threads/processes for 12 workers.
Confused, I came here after asking on Stack Overflow.
Gemfile
$> ag resq Gemfile.lock
274: resque (1.25.2)
280: resque_mailer (2.2.6)
432: resque (~> 1.25.2)
433: resque_mailer
$ ps aux | grep resq
root 876 0.0 0.0 61676 0 ? Ss Aug08 0:00 su - deploy -c cd /var/www/releases/20150806222346; export PORT=5000; export RAILS_ENV=production; export QUEUES=critical,high,low; export COUNT=2; bundle exec rake environment resque:work >> /var/www/shared/upstart_log/worker-1.log 2>&1
root 878 0.0 0.0 61676 0 ? Ss Aug08 0:00 su - deploy -c cd /var/www/releases/20150806222346; export PORT=5001; export RAILS_ENV=production; export QUEUES=critical,high,low; export COUNT=2; bundle exec rake environment resque:work >> /var/www/shared/upstart_log/worker-2.log 2>&1
root 879 0.0 0.0 61676 0 ? Ss Aug08 0:00 su - deploy -c cd /var/www/releases/20150806222346; export PORT=5002; export RAILS_ENV=production; export QUEUES=critical,high,low; export COUNT=2; bundle exec rake environment resque:work >> /var/www/shared/upstart_log/worker-3.log 2>&1
deploy 880 0.0 0.0 20864 4 ? Ss Aug08 0:00 -su -c cd /var/www/releases/20150806222346; export PORT=5000; export RAILS_ENV=production; export QUEUES=critical,high,low; export COUNT=2; bundle exec rake environment resque:work >> /var/www/shared/upstart_log/worker-1.log 2>&1
root 883 0.0 0.0 61676 0 ? Ss Aug08 0:00 su - deploy -c cd /var/www/releases/20150806222346; export PORT=5003; export RAILS_ENV=production; export QUEUES=critical,high,low; export COUNT=2; bundle exec rake environment resque:work >> /var/www/shared/upstart_log/worker-4.log 2>&1
deploy 887 0.0 0.0 20864 4 ? Ss Aug08 0:00 -su -c cd /var/www/releases/20150806222346; export PORT=5001; export RAILS_ENV=production; export QUEUES=critical,high,low; export COUNT=2; bundle exec rake environment resque:work >> /var/www/shared/upstart_log/worker-2.log 2>&1
deploy 892 0.0 0.0 20864 4 ? Ss Aug08 0:00 -su -c cd /var/www/releases/20150806222346; export PORT=5002; export RAILS_ENV=production; export QUEUES=critical,high,low; export COUNT=2; bundle exec rake environment resque:work >> /var/www/shared/upstart_log/worker-3.log 2>&1
deploy 896 0.0 0.0 20864 4 ? Ss Aug08 0:00 -su -c cd /var/www/releases/20150806222346; export PORT=5003; export RAILS_ENV=production; export QUEUES=critical,high,low; export COUNT=2; bundle exec rake environment resque:work >> /var/www/shared/upstart_log/worker-4.log 2>&1
deploy 1087 0.9 8.9 4812772 183464 ? Sl Aug08 22:00 resque-1.25.2: Waiting for critical,high,low
deploy 1113 0.9 8.9 4845800 182512 ? Sl Aug08 21:45 resque-1.25.2: Waiting for critical,high,low
deploy 1186 0.9 10.1 4785708 207892 ? Sl Aug08 22:04 resque-1.25.2: Waiting for critical,high,low
deploy 1223 0.8 9.5 4559812 195652 ? Sl Aug08 20:36 resque-1.25.2: Waiting for critical,high,low
On htop, there are hundreds of lines like the following sample:
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
22013 deploy 20 0 4700M 179M 2776 S 0.0 9.0 0:00.93 resque-1.25.2: Waiting for critical,high,low
22366 deploy 20 0 4700M 179M 2776 S 0.0 9.0 0:01.00 resque-1.25.2: Waiting for critical,high,low
22487 deploy 20 0 4700M 179M 2776 S 0.0 9.0 0:00.94 resque-1.25.2: Waiting for critical,high,low
22523 deploy 20 0 4700M 179M 2776 S 0.0 9.0 0:01.09 resque-1.25.2: Waiting for critical,high,low
..... and on.....
While I know each of these threads doesn't count towards memory usage on its own, the rough total of the `Sl` lines in the `ps aux` output above does: when each of those processes reaches about 20% of memory, that's 80% of the RAM in use. Then one of the bigger jobs gets triggered on the site, pushing the box into swap and eventually killing the job midway through.
resque_mailer
is not wired up properly in our app, so it's really just there for show. I was hoping it was the cause of this behaviour, but alas it is not.
netstat -ntp
shows only about 4 connections to the Redis DB on another box.
ruby -v
ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-linux]
[Memory usage graph] Each sharp decline is a restart, either by a human or by the system.
We use that same version of resque in production and do not experience this issue.
This may be helpful in determining where the memory consumption is happening in your code: http://blog.skylight.io/hunting-for-leaks-in-ruby/
Resque's model for processing jobs is deliberately, if crudely, designed to prevent memory leaks by running each job in a _fork_ of the main worker _process_: the parent process acquires a job, forks, and waits for the newly created child to complete, relying on the operating system to reclaim the child's memory. Because of this model, memory allocated by one job _cannot_ leak into the parent (long-lived worker) process.
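To make the model above concrete, here is a minimal sketch of the fork-per-job loop, not Resque's actual source; `reserve_job`, `job.perform`, and the 5-second poll interval are illustrative assumptions:

```ruby
loop do
  job = reserve_job            # hypothetical helper: pop the next job off Redis
  unless job
    sleep 5                    # nothing queued: wait and poll again
    next
  end

  child_pid = fork do
    job.perform                # all allocations for this job happen in the child
    exit!                      # child exits; the OS reclaims its memory
  end

  Process.wait(child_pid)      # parent blocks until the child finishes
end
```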
That said, since your application is fully loaded in the parent process, it is entirely possible that something in your code is getting activated _before_ the worker enters its work loop (e.g., during app initialization), and that _that_ code is spawning threads unreasonably. To figure out where those threads are coming from, it may be helpful to attach gdb to your long-lived worker process and do a thread dump.
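As an alternative to gdb, a hedged sketch of dumping live Ruby threads from inside the worker: the initializer path and the choice of TTIN are assumptions (Resque already traps TERM/QUIT/USR1/USR2 for worker control, so an otherwise-unused signal avoids clashing with its handlers):

```ruby
# config/initializers/thread_dump.rb (file name is an assumption)
Signal.trap("TTIN") do
  Thread.list.each do |thread|
    STDOUT.puts "Thread #{thread.object_id} #{thread.status.inspect}"
    STDOUT.puts (thread.backtrace || ["<no backtrace>"]).join("\n")
    STDOUT.puts "-" * 60
  end
end
```

Then `kill -TTIN <worker pid>` prints a backtrace for every live thread, which usually reveals what spawned them.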
Yes, this seems like a bug. But it might not be in Resque. Is there any way we can get a reproduction?
Without a way to reproduce this, I will not be able to investigate. If you can come up with one, please let me know, and I would be happy to re-open. Thank you!
@pjammer did you ever find a solution to this?
OK, for me this was down to still having a call to ActiveRecord::Base.establish_connection
in a Resque.before_fork
hook after upgrading from Rails 4 to Rails 5.2.x
(see https://github.com/rails/rails/pull/31241 and https://github.com/rails/rails/pull/33094). After removing the before_fork
hook, htop
shows one thread/process per worker, and my Errno::ENOMEM Cannot allocate memory - fork(2)
error is gone.
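For reference, a minimal sketch of the change described above; the initializer path is an assumption, and simply deleting the hook, as described, is all that was needed in this report:

```ruby
# config/initializers/resque.rb (path is an assumption)
#
# The hook that caused the trouble looked roughly like this and can be
# removed outright on Rails 5.2+:
#
#   Resque.before_fork do
#     ActiveRecord::Base.establish_connection
#   end
#
# If the forked children still need a fresh database connection, a common
# pattern is to reconnect after the fork instead:
Resque.after_fork do
  ActiveRecord::Base.establish_connection
end
```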