Puma: Puma not restarting (or at least not loading new current) when notified with SIGUSR2

Created on 24 Nov 2013  路  32Comments  路  Source: puma/puma

When my cap deploys, it sends a SIGUSR2 to Puma - tried manually with kill and with pumactl.

pumactl says the server will restart, but puma doesn't run the new code (the message is ' Command restart sent success').

To get the new code served, puma needs an hard restart (i.e. need to send it a SIGINT and then start).

I can't confirm if with SIGUSR2 puma doesn't restart or simply doesn't change working directory - as there is a new deploy. What is clear is that it keeps running the old code.

Setup is ubuntu 12.04, rails 4 and Ruby 1.9.3. Puma is 2.6.0.

Thanks.

bug restart

Most helpful comment

This issue seems to recur in 3.9.0 and 3.9.1.

3.8.2 works totally fine.

All 32 comments

I am having the same problem across different servers. Sometimes it doesn't restart, other times it seems that the process is stopped but never started again. This is on 2.7.1.

Edit: Actually, this may be that I deployed between the update from 2.6 -> 2.7.1. I'll deploy a bunch more today so I'll see if the problem continues with 2.7.1.

I've experienced something like this.

bundle exec pumactl -F config/puma.rb restart wherein my puma.rb contains the pidfile, working directory, binding, etc.

It reports "Command restart sent success", and then it just kills the puma workers and the master process ends. Here's what the log says:

[17058] - Gracefully shutting down workers...
[17058] * Restarting...
Available commands: halt, restart, phased-restart, start, stats, status, stop

Same here, I have to stop then start puma server sometimes. I've noticed the PID actually changed when running restart or phased-restart, but still applying the old code.

What version of ruby are you using?

@evanphx The development side is 2.0.0-p353 and the production is 2.0.0-p195, and puma version is 2.7.1, is that help? Thanks.

ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]

I'm seeing the same thing. Running "pumactl -S /my_app/path/shared/pids/puma.state restart" (or phased-restart) keeps the master process pid, and the 2 clustered workers restart with new pids, but the master and clustered workers continue to work out of the previous release path.

I'm running puma (2.7.1) on ruby 1.9.3p448 on ubuntu.

Excerpt from ps:

$ sudo ps -ef |grep 29171
puma     27146 29171  0 18:32 ?        00:00:02 puma: cluster worker: 29171
puma     27150 29171  0 18:32 ?        00:00:06 puma: cluster worker: 29171
puma     29171     1  0 Jan13 ?        00:05:11 /usr/local/ruby/1.9.3-p448/bin/ruby /my_app/path/current/bin/puma -C /my_app/path/shared/config/puma.rb

Excerpt from lsof: (note the current working directory shown for the process is correct but gems are running out of the old path)

$ sudo lsof |grep releases
ruby      27146       puma  cwd       DIR              202,1      4096     401852 /my_app/path/releases/f732c7...
ruby      27146       puma  mem       REG              202,1     85394     279105 /my_app/path/releases/c81ea3.../vendor/bundle/ruby/1.9.1/gems/bcrypt-ruby-3.1.2/lib/bcrypt_ext.so
ruby      27146       puma  mem       REG              202,1   1013965     289051 /my_app/path/releases/c81ea3.../vendor/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/rubyeventmachine.so
...

ruby      27150       puma  cwd       DIR              202,1      4096     401852 /my_app/path/releases/f732c7...
ruby      27150       puma  mem       REG              202,1     85394     279105 /my_app/path/releases/c81ea3.../vendor/bundle/ruby/1.9.1/gems/bcrypt-ruby-3.1.2/lib/bcrypt_ext.so
ruby      27150       puma  mem       REG              202,1   1013965     289051 /my_app/path/releases/c81ea3.../vendor/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/rubyeventmachine.so
...

ruby      29171       puma  cwd       DIR              202,1      4096     401852 /my_app/path/releases/f732c7...
ruby      29171       puma  mem       REG              202,1     85394     279105 /my_app/path/releases/c81ea3.../vendor/bundle/ruby/1.9.1/gems/bcrypt-ruby-3.1.2/lib/bcrypt_ext.so
ruby      29171       puma  mem       REG              202,1   1013965     289051 /my_app/path/releases/c81ea3.../vendor/bundle/ruby/1.9.1/gems/eventmachine-1.0.3/lib/rubyeventmachine.so
...

Any ideas??

Update: We did an strace on the puma pid and watched it during a pumactl restart. We noticed that everything was running out of the correct path until the standard rails boot.rb file (/my_app/path/[new_release]/config/boot.rb) fired and loaded the Gemfile from the previous release's path. When looking at boot.rb it was clear that it if the environment variable BUNDLE_GEMFILE already exists, it would use that rather than re-evaluating and setting it. That env would have already been set in the puma session by /my_app/path/[old_release]/bin/puma (the binstub created by bundler) and rather than using "/my_app/path/current/Gemfile" it would have used the ".realpath". In short, the final value of BUNDLE_GEMFILE would be '/my_app/path/[old_release]/Gemfile' rather than the desired '/my_app/path/current/Gemfile' and so all gemfiles would run out of that path. Our solution is to set the BUNDLE_GEMFILE environment variable in our puma upstart script to "/my_app/path/current/Gemfile".

I can confirm @paustin01's findings because my Puma process's proc directory enviro file holds the previous revision's BUNDLE_GEMFILE and not the new release directory which Puma is running from (as mentioned previously, observable via lsof).

cat /proc/cat /your/puma.pid/enviro

Can anyone provide a simple script or program that can be used to reproduce the behavior?

Is there a reason this got closed? This is still very much an issue, and it's trivial to reproduce: all you need is any project running puma from within a symlinked directory (e.g., deployed by capistrano), install new gems in a new revision, and reload puma with SIGUSR2 with re-use the BUNDLE_GEMFILE env var in Rail's boot.rb thus not loading new gems.

This is definitely still an issue. It's the only reason our apps are still using Unicorn right now.

@sorentwo I worked around this issue by adding BUNDLE_GEMFILE=/deploy_path/current/Gemfile to my init.d script for Puma. If you're using Capistrano, you could similarly add that env var to the commands it runs.

@jcoleman: thanks, that is really helpful to know about. I wasn't aware of the specific point of failure here (the gems/Gemfile), it just seemed like we had random restart failures.

@sorentwo Are you using the prune_bundler option? It's documented here: https://github.com/puma/puma/blob/master/DEPLOYMENT.md#restarting

@evanphx: Nope, I wasn't. In fact, I'd never seen that wiki page. The last time I was setting things up was in November 2013, a little before the page was available. That's an excellent tip though. I'll try an experiment next week and let you know how things go.

@evanphx I didn't realize that the prune_bundler option existed either. My test case isn't easily reproducible now though because I had to restart puma with the updated config.

We're seeing this issue also using Puma 2.9.0 deployed via Chef (it uses the same current symlink strategy as Capistrano). Puma is not using newly deployed code after being issued a SIGUSR2. Doing a stop/start gets Puma to use the new code.

Phased restart still doesn't work even with @jcoleman's solution. Running puma 2.9.1.

But with hot restart, it seemed to work without setting the env variable.

My solution ended up not fixing it for me either. The prune_bundler option frankly just doesn't work.

prune_bundler had a bad bug related to restarts that was fixed in 2.9.2. Did you try it?

hot restart stops loading the new context in puma 2.9.2

Tell me more: what to you mean by new context? What does your config look like? Could you provide some output from puma?

What I mean by context is the line "Gemfile in context: /path/to/Gemfile" in puma.log

threads 0, 16
workers 2

bind "unix:///var/run/wwm.sock"
daemonize false
activate_control_app 'unix:///var/run/pumactl.sock'
stdout_redirect 'log/puma.log', 'log/puma_error.log', true
pidfile 'tmp/pids/puma.pid'
state_path 'tmp/pids/puma.state'
prune_bundler

I'm seeing this same issue as well (phased restart doesn't load new code)

Running Puma 2.11.2 on Ruby 2.2.2

We're deploying with Capistrano.

Here's my puma.rb:

workers 2
threads 5, 5

prune_bundler

rackup      DefaultRackup
environment production

directory '/www/my-app/current'
bind "unix:///var/run/puma.sock"
state_path "/var/run/puma.state"
pidfile "/var/run/puma.pid"

We're issuing this command to start the phased restart:

sudo pumactl --config-file /www/my-app/current/config/puma.rb phased-restart

When we issue the phased-restart the worker processes do restart (new PIDs), but they don't get the new code.

Where should I look for more information to figure out what's happening?

After coming back to this on a fresh day, I'm really not sure what I was experiencing. At one point in time my restarts definitely weren't working properly (as in, the new code definitely wasn't being loaded). This could have easily been a misconfiguration on my part.

After I noticed the problem I started testing by looking at the output from ps aux which included the directory it had loaded in the tag. Ex:

puma: cluster worker 1: 918 [20150429172950]

When I was deploying new code I could see that worker restart, but the content loaded (20150429172950) wasn't changing, so I assumed it still wasn't loading the new code. I have no idea what I did to fix it, but it's loading the new code properly (with the configuration in my previous comment).

Perhaps I should open a new issue about updating the Process tag on the workers to match the new directory.

At the moment I'm not even sure where it's pulling the current release from since i'm running this out of the /www/my-app/current directory.

To my surprise, I got this to work.

Using puma v2.15.3

# in /config/puma.rb
app = "my_app"
root = "/home/deployer/apps/#{app}"

workers  4
threads  1, 2

rackup      DefaultRackup
environment ENV['RAILS_ENV']
daemonize   true

directory "#{root}/current"
pidfile "#{root}/shared/tmp/pids/puma.pid"
stdout_redirect "#{root}/shared/log/#{Time.now.to_i}_production.log", "#{root}/shared/log/#{Time.now.to_i}_production_errors.log"
bind "unix:/tmp/puma.socket"

prune_bundler

It is crucial to stop and start puma with these settings to have the master process pick up :directory and, possibly, :prune_bundler directives.

Then you can bundle exec pumactl -p {pid_path} phased-restart

NB, that it may take a few moments for workers to pick up the new code, and also ps aux will incorrectly output workers' directory as the master processes' original directory, when the new code will actually be taken from the new symlinked current folder.

This issue seems to recur in 3.9.0 and 3.9.1.

3.8.2 works totally fine.

In our case [ruby 2.3.3p222]

puma (3.10.0)
puma_worker_killer (0.1.0)

Puma reloads ok on SIGUSR2 and doesn't reload on SIGUSR1.

It feels like preload_app gets automatically turned on somehow.

This may be related, using prune_bundler and SIGUSR2 via foreman

Just observed something interesting using puma's phased restarts, old code was running against new code in the same process.... I think it has to do with how long it takes for our app code to load, about 3-4 seconds, then boot, about 6-7 seconds

03/27/2017 11:17:19.000 -0500    
  [32187] - Starting phased worker restart, phase: 4
  [32187] + Changing to /srv/rails_server
03/27/2017 11:17:20.000 -0500
  [32187] - Stopping 22094 for phased upgrade...
  [32187] - TERM sent to 22094...
  [1292] + Gemfile in context: /srv/rails_server/Gemfile

timestamp=2017-03-27T16:17:23Z env=production frontend=www level=error thread_id=9400300 
  got a  NoMethodError on a model that a had that method added in the new code

timestamp=2017-03-27T16:17:23Z env=production frontend=www level=info thread_id=9400300 
  next request is the same controller and method, but works

03/27/2017 11:17:33.000 -0500   
[32187] - Worker 0 (pid: 1292) booted, phase: 4

timestamp=2017-03-27T16:17:33Z env=production frontend=www level=info thread_id=7420240  tag=rails message={"app_info":{"pid":1292,"load_time":"4.446515 seconds","boot_time":"7.682576 seconds","config":{"mysql":{"pool":64},"pg":{"pool":64},"dm":{"pool":64},"puma":{"workers":"2","threads":{"min":"64","max":"64"}},"haproxy":{"max_puma_connections":128},"sidekiq":{"concurrency":25,"timeout":8},"machine":{"processor_count":2,"physical_processor_count":1}}


[32187] - Worker 0 (pid: 1292) booted, phase: 4
[32187] - Stopping 22148 for phased upgrade...
[32187] - TERM sent to 22148...


03/27/2017 11:17:43.000 -0500   
[32187] - Worker 1 (pid: 1397) booted, phase: 4

timestamp=2017-03-27T16:17:43Z env=production frontend=www level=info thread_id=7420240  tag=rails message={"app_info":{"pid":1397,"load_time":"3.273302 seconds","boot_time":"6.384623 seconds","config":{"mysql":{"pool":64},"pg":{"pool":64},"dm":{"pool":64},"puma":{"workers":"2","threads":{"min":"64","max":"64"}},"haproxy":{"max_puma_connections":128},"sidekiq":{"concurrency":25,"timeout":8},"machine":{"processor_count":2,"physical_processor_count":1}},

[32187] - Worker 1 (pid: 1397) booted, phase: 4

IN our config/environment.rb tl;dr we

require 'benchmark'
require 'logger'

app_stats = {pid: Process.pid}

time = Benchmark.realtime {
  # Load the Rails application.
  require File.expand_path('../application', __FILE__)
}
app_stats[:load_time] = "#{format('%f', time)} seconds"

time = Benchmark.realtime {
  # Initialize the Rails application.
  Rails.application.initialize!
}
app_stats[:boot_time] = "#{format('%f', time)} seconds"

require File.expand_path('../../lib/app_formatter', __FILE__)
logger = Logger.new(STDOUT)
logger.formatter = AppFormatter.new
logger.info({app_info: app_stats}.to_json)

In our config/puma.rb, we have, in part:

workers 2
threads 32, 32
# AWS ELB "Idle timeout" is set to 30s. In order to prevent ELB 500s from unexpected connection closes, ensure that the backend persistent connection timeout is greater than this idle timeout. https://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/config-idle-timeout.html
worker_shutdown_timeout 32
prune_bundler

If I flatten out the timeline:

03/27/2017 11:17:19.000 -0500 Starting phased worker restart
~~ 11:17:22 app code is being loaded
03/27/2017 11:17:23.000 -0500 got exception
~~ 11:17:26 app starts booting
03/27/2017 11:17:33.000 -0500  app booted, new worker booted, old worker sent TERM

https://github.com/puma/puma/blob/master/DEPLOYMENT.md#restarting

Use phased-restart (SIGUSR1 or pumactl phased-restart). 
This tells the master to kill off one worker at a time and restart them in your new code. 
This minimizes downtime and staggers the restart nicely. 

WARNING This means that both your old code and your new code will be running concurrently. 
Most deployment solutions already cause that, but it's worth warning you about it again. 
Be careful with your migrations, etc!

I experienced a similar issue with puma-3.8.2, where it fails to restart after adding more gems. But with puma >= 10.0.0 it works.

https://gist.github.com/x-yuri/52024f512bed39a2a0bd4c6e82d04c9f

P.S. Well, not that simple a script, but what did you expect?...

Yeah. I'm closing this one and let people create separate ones. This is so old it probably has a 1m long beard...

Was this page helpful?
0 / 5 - 0 ratings