Puma: Socket removed after restart

Created on 24 Sep 2019  路  30Comments  路  Source: puma/puma

Using systemd and hot reloading as described(https://github.com/puma/puma/blob/master/docs/systemd.md#socket-activation) the puma socket file is removed when systemctl restart puma.service is issues. This worked until Puma version > 3.12.

ruby 2.6.3
puma 4.2.0

Puma config:

directory '/var/app/current'
environment 'staging'

pidfile '/var/app/shared/tmp/pids/puma.pid'
state_path '/var/app/shared/tmp/pids/puma.state'
stdout_redirect '/var/app/shared/log/puma_access.log', '/var/app/shared/log/puma_error.log', true

workers 2
threads 1, 2

bind 'unix:///var/app/shared/tmp/sockets/puma.sock'
activate_control_app 'tcp://127.0.0.1:9293', { auth_token: '...' }

preload_app!

on_restart do
  puts 'Refreshing Gemfile'
  ENV["BUNDLE_GEMFILE"] = '/var/app/current/Gemfile'
end

before_fork do
  ActiveRecord::Base.connection_pool.disconnect!
end

on_worker_boot do
  SemanticLogger.reopen if defined?(SemanticLogger)

  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
end

puma.service:

[Unit]
Description=Puma HTTP Server
After=network.target

Requires=puma.socket

[Service]
Type=simple

User=appuser
Group=appuser

WorkingDirectory=/var/app/current

Environment=PUMA_DEBUG=1

ExecStart=/var/app/rvm1scripts/rvm-auto.sh . bundle exec --keep-file-descriptors puma -C /var/app/shared/config/puma.rb

Restart=always

TimeoutStopSec=180

[Install]
WantedBy=multi-user.target

puma.socket:

[Unit]
Description=Puma HTTP Server Accept Sockets

[Socket]
SocketUser=appuser
SocketGroup=appuser
ListenStream=/var/app/shared/tmp/sockets/puma.sock

NoDelay=true
ReusePort=true
Backlog=1024

[Install]
WantedBy=sockets.target

To Reproduce

  • With Puma running under systemd with socket activation, confirm that the configured socket file exists.
  • Issue a systemctl restart puma.service
  • Confirm that the configured socket file is now missing
bug

Most helpful comment

@mcg

I think https://github.com/puma/puma/pull/2007 should fix this issue. Once again, if you can test it, that would be much appreciated...

Previously, Puma unlinked all UNIXSocket files regardless of whether they were pre-existing. Now, pre-existing files are not unlinked. I'm hoping that's what was causing the issue with systemd.

All 30 comments

@mcg,

Trying to tie all this together. I'm looking at https://github.com/puma/puma/issues/1842#issuecomment-511000104, and confused, as that's Puma 4.0.1, and it seems to be working.

Hence, what am I missing? If you have any logs, they may be helpful...

@MSP-Greg https://github.com/puma/puma/issues/1842#issuecomment-511000104 isn't working for me, that's my non-working config.

Logs look "normal" tho the socket gets removed:

puma.service log

-- Unit puma.service has finished starting up.
--
-- The start-up result is done.
Sep 24 18:39:18 staging-appserver-0 rvm-auto.sh[8174]: % Registered unix:/var/app/shared/tmp/sockets/puma.sock for activation from LISTEN_FDS
Sep 24 18:39:18 staging-appserver-0 rvm-auto.sh[8174]: [8174] Puma starting in cluster mode...
Sep 24 18:39:18 staging-appserver-0 rvm-auto.sh[8174]: [8174] * Version 4.2.0 (ruby 2.6.3-p62), codename: Distant Airhorns
Sep 24 18:39:18 staging-appserver-0 rvm-auto.sh[8174]: [8174] * Min threads: 1, max threads: 2
Sep 24 18:39:18 staging-appserver-0 rvm-auto.sh[8174]: [8174] * Environment: staging
Sep 24 18:39:18 staging-appserver-0 rvm-auto.sh[8174]: [8174] * Process workers: 2
Sep 24 18:39:18 staging-appserver-0 rvm-auto.sh[8174]: [8174] * Preloading application
Sep 24 18:39:29 staging-appserver-0 rvm-auto.sh[8174]: [8174] * Activated unix:///var/app/shared/tmp/sockets/puma.sock
Sep 24 18:39:29 staging-appserver-0 rvm-auto.sh[8174]: [8174] ! WARNING: Detected 4 Thread(s) started in app boot:
Sep 24 18:39:29 staging-appserver-0 rvm-auto.sh[8174]: [8174] Use Ctrl-C to stop

puma_access.log

[12489] - Gracefully shutting down workers...
=== puma startup: 2019-09-24 18:39:29 +0000 ===
[8174] * Starting control server on tcp://127.0.0.1:9293
[8174] - Worker 0 (pid: 9222) booted, phase: 0
[8174] - Worker 1 (pid: 9232) booted, phase: 0

puma_error.log

=== puma startup: 2019-09-24 18:39:29 +0000 ===

@gingerime @mcg

A decision was made to remove UNIXSocket files, and the change took effect in 4.2.0. I believe this may affect systemd when (and only when) a stop/start is done.

I think, but I'm not sure, that it should not affect puma 'phased-restart' or puma 'restart'.

JFYI:

Puma 'phased-restart' leaves all the listener sockets open.

Puma 'restart' replaces itself via Kernel.exec, and doesn't remove the UNIXSocket files.

If the UNIXSocket files exists, but is not connected to a UNIXServer, Puma will delete it and start as it should.

This has been broken since, 4.0.0. not 4.2.0. This effects systemd when used as documented with socket activation and hot restarts. If this change is intentional, it needs to be documented and marked as a breaking change.

@MSP-Greg It's tricky for me to reproduce this, but I believe this also affects phased-restart systemctl reload puma.service, and I see that log file handles aren't being updated with logrotate... I mentioned this on #1962 and also https://github.com/puma/puma/issues/1842#issuecomment-534886672

EDIT: to clarify, my systemd script runs pumactl ... phased-restart when you issue systemctl reload puma.service

EDIT2: my repro was focusing on the restart (not phased-restart) issue so far, which was easier to reproduce... I can try to also reproduce the phased-restart issue, but this might be trickier.

Referencing @gingerlime repo that seems to reproduce this here so it's not lost. https://github.com/gingerlime/puma-test-socket

May be an unrelated code change but while trying to create a reproducible VM for this I notice that the puma_access.log is different for 4.2.0 on restarts.
For 4.2.0

=== puma startup: 2019-09-25 15:36:02 +0000 ===
[17048] * Starting control server on tcp://127.0.0.1:9293
[17048] - Worker 0 (pid: 17059) booted, phase: 0
[17048] - Worker 1 (pid: 17061) booted, phase: 0
[17059] ! Detected parent died, dying
[17061] ! Detected parent died, dying

For 3.21

=== puma startup: 2019-09-25 15:33:40 +0000 ===
[16513] * Starting control server on tcp://127.0.0.1:9293
[16513] - Worker 0 (pid: 16524) booted, phase: 0
[16513] - Worker 1 (pid: 16526) booted, phase: 0
[16513] - Gracefully shutting down workers...

@mcg

Thanks for the info. All this is helpful.

But, I'm only windows locally, and using systemd is something that hasn't been implemented in CI yet. Maybe another Actions workflow can test it, not sure...

From above:

ExecStart=/var/app/rvm1scripts/rvm-auto.sh . bundle exec --keep-file-descriptors puma -C /var/app/shared/config/puma.rb

Restart=always

From @gingerlime's https://github.com/gingerlime/puma-test-socket

ExecStart=/usr/bin/bundle exec --keep-file-descriptors puma -C /app/puma.rb
ExecReload=/usr/bin/bundle exec pumactl -F /app/puma.rb phased-restart

Those are different. As I tried to mention above, the code paths for start/stop/start, start/restart, and start/phased-restart are all different.

Also, Puma's existing CI is mostly 'signal', but people seem to be using pumactl socket control with systemd. I mention this because I've been working in my Puma fork on an updated test framework, and there are some differences between control behavior, dependent on whether signal or socket used.

So, at least from my perspective, it's very important to know what commands/protocl were used with logs showing problems. IOW, start/stop/start, start/restart, or start/phased-restart, and whether signal or socket.

Small point, I've seen 3.12 mentioned, but 3.12.0 and 3.12.1 were released about 9 months apart. I haven't really reviewed the differences, but the exact version may be helpful to know....

I can't speak for @nateberkopec, but I suspect he feels the same way I do. I strongly want to get Puma working and stable in all configurations. For instance, I've got a lot of test changes (improvements?) , but I don't want to push them until Puma is bug free and stable...

Finally, what you've shown above I'll see if I can repo without systemd...

The documented systemd config is using signals. It seems almost like a waste of time to try and reproduce without systemd, at least for my issue as it's using systemd's socket activation.

@mcg

Thanks. I've looked at the systemd files a bit more, and found some systemd man pages (haven't read yet).

A recent change affected closing UNIXSocket files, PR https://github.com/puma/puma/pull/1992 reverts the recent change.

There is one more place where this is done, it was added between 3.12.1 and 4.0.0. I haven't reverted that, but if this doesn't change anything, we can try that next.

I'm assuming that systemd will consider UNIXSockets 'handled' if the file exists, and that the issue now is that because the files have been removed, it's trying to 'hold' them?

@MSP-Greg No change for me using #1992. Socket is still removed.

@mcg

Thanks for checking. I know this is a PITA.

I mentioned I'm working on revising some of the test framework, and I've added a lot of tests just because. Puma is stable, this issue is particular to using Puma with systemd. Excepting the log issue that I haven't looked at.

The other place where files are unlinked only happens when SIGTERM is sent to the process. Could that be the case with your setup?

Otherwise, the remaining unlink statements happen in the code involved with inheriting listeners...

Sanity check. You're issue is with start(1)/stop/start(2) and start(2) is missing the socket?

Sanity check. You're issue is with start(1)/stop/start(2) and start(2) is missing the socket?

@MSP-Greg Correct.

Starting with a system that Puma is not running on. If I systemctl start puma.socket puma.service. The socket is created by systemd, and Puma starts and works. Then if a systemctl restart puma.service is issued, puma is stopped and restarted by systemd. Puma comes back up, but the socket is gone.

@mcg

Thanks. Not going crazy.

=== puma startup: 2019-09-25 15:36:02 +0000 ===
[17048] * Starting control server on tcp://127.0.0.1:9293
[17048] - Worker 0 (pid: 17059) booted, phase: 0
[17048] - Worker 1 (pid: 17061) booted, phase: 0
[17059] ! Detected parent died, dying
[17061] ! Detected parent died, dying

The above seems to indicate that Puma is being killed by maybe SIGKILL?

I don't know if you can try a SIGINT (stop) or SIGTERM (graceful stop)?

JFYI, I'm looking at
https://www.freedesktop.org/software/systemd/man/systemd.service.html
and
https://www.freedesktop.org/software/systemd/man/systemd.kill.html
but not being able to actually use it is not good...

EDIT: Original systemd PR: https://github.com/puma/puma/pull/231 commit: https://github.com/puma/puma/commit/32088ec6ed30dd09069e42c10a73e9fc3ed93688

I also meet this issue without using systemd
https://github.com/puma/puma/issues/1860#issuecomment-534875306

@MSP-Greg

  • Tried sending 2 - SIGINT, Puma restarted, socket removed
  • Tried sending 15 - SIGTERM, Puma restarted, socket removed
  • Tried sending 12 - SIGUSR2, Initially appeared to restart workers as it should, then the Puma primary process died and was restarted by systemd. Socket also removed.

@mcg

Thanks for checking. After looking at both Puma & systemd docs last night, I may have an idea where the issue is, but that's based on some assumptions that may be incorrect.

I need to check some things with Puma first, as I'm going to see if I can mimic the LISTEN_FDS used by systemd, as that's where the info re activated sockets is contained.

Overall, there is one issue that I am confused about regarding start/stop/start operations not working. The implication is that systemd is providing a different environment on the second start than it is on the first. Why is that?

JFYI,

  1. 'inherited sockets' exists when (and only when) Puma is restarted. They do not exist with stop/start, or with phased-restart. These are created based on ENV[/PUMA_INHERIT_\d+/] values.

  2. 'activated sockets' only exist when 'imported' from the systemd environment. These are created based on ENV['LISTEN_FDS'] and ENV['LISTEN_PID'] values.

BTW, sorry for all the info. but months from now, if there are issues with systemd, maybe I or someone else may find them helpful...

Found a bad workaround. chmod the directory the socket is stored in such that the user Puma is running under cannot delete the socket. Restarts work fine.

Perhaps this helps find where the socket is being deleted. With socket deletion blocked by chmod, errors in the puma_error.log.

Errno::EACCES: Permission denied @ apply2files - /var/www/app/shared/tmp/sockets/puma.sock
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/binder.rb:427:in `unlink'
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/binder.rb:427:in `block in close_unix_paths'
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/binder.rb:427:in `each'
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/binder.rb:427:in `close_unix_paths'
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/launcher.rb:208:in `close_binder_unix_paths'
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/cluster.rb:532:in `ensure in run'
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/cluster.rb:532:in `run'
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/launcher.rb:172:in `run'
/ruby-2.6.3@app/gems/puma-4.2.0/lib/puma/cli.rb:80:in `run'
/ruby-2.6.3@app/gems/puma-4.2.0/bin/puma:10:in `<top (required)>'
/ruby-2.6.3@app/bin/puma:23:in `load'
/ruby-2.6.3@app/bin/puma:23:in `<top (required)>'

@mcg

Thanks.

Hate to ask, are you sure you tested off of https://github.com/puma/puma/pull/1992? The first diff shown in the first commit removes cluster.rb:532 (you may have to scroll to it, shown in 'orange'), which is shown above? IOW, https://github.com/puma/puma/pull/1992 should have worked?

I tested restarts with #1992 but reverted back to released 4.2.0. I'll go back to #1992 and try this again.

With #1992

Errno::EACCES: Permission denied @ apply2files - /var/www/app/shared/tmp/sockets/puma.sock
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/binder.rb:422:in `unlink'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/binder.rb:422:in `block in close_listeners'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/binder.rb:418:in `each'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/binder.rb:418:in `close_listeners'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/launcher.rb:205:in `close_binder_listeners'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/cluster.rb:391:in `block in setup_signals'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/cluster.rb:491:in `select'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/cluster.rb:491:in `run'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/launcher.rb:172:in `run'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/lib/puma/cli.rb:80:in `run'
/ruby-2.6.3@app/bundler/gems/puma-23a61ad827e5/bin/puma:10:in `<top (required)>'
/ruby-2.6.3@app/bin/puma:23:in `load'
/ruby-2.6.3@app/bin/puma:23:in `<top (required)>'

@mcg

Thank you. Remember how I said in https://github.com/puma/puma/issues/1988#issuecomment-535181160 that there was a 2nd instance where the files were unlinked, but I thought it didn't affect this use case?

I was wrong.

EDIT:

You've been very helpful and patient. Thank you. Personally, I'd like to try to mimic some of the environment settings used by systemd in the test suite, I may have some questions.

@nateberkopec

I think I know what changes are needed for this. Puma has always worked fine if the UNIXSocket files were not removed, so I think we should remove all code that does so. Maybe even test for that.

But, I think there are some duplicate test method names, so maybe we should use file names that also include the test class...

You're working on some refactoring, I'm doing the same with the tests and also puma/control_cli.rb.

Maybe get this, https://github.com/puma/puma/issues/1986, and whatever else patched, do a point release, let the dust settle, and start back on the refactoring work?

Also, maybe set JRuby back to 'allow-failure' until it can be made stable? After all the work on tests, always having to check logs, etc, I'd really like to see a few green checks...

@mcg

I think https://github.com/puma/puma/pull/2007 should fix this issue. Once again, if you can test it, that would be much appreciated...

Previously, Puma unlinked all UNIXSocket files regardless of whether they were pre-existing. Now, pre-existing files are not unlinked. I'm hoping that's what was causing the issue with systemd.

@MSP-Greg Looks like #2007 works. Have tried many restarts and the socket is not removed.

Woo hoo! Great work @MSP-Greg!

Sent with GitHawk

Thanks a lot everyone!! I also tested it and restart works just fine.

However, I'm still seeing issues with logrotate (see #1962) keeping writing to rotated file... but perhaps it's a misconfiguration on my part?

Could it be that there is a regression?

I'm running 4.3.5 and see following again:

Errno::ENOENT: No such file or directory @ apply2files - /home/user/apps/myapp/shared/sockets/puma.myapp.sock
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/binder.rb:368:in `unlink'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/binder.rb:368:in `block in close_listeners'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/binder.rb:363:in `each'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/binder.rb:363:in `close_listeners'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/launcher.rb:205:in `close_binder_listeners'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/cluster.rb:391:in `block in setup_signals'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/cluster.rb:521:in `rescue in run'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/cluster.rb:481:in `run'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/launcher.rb:172:in `run'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/lib/puma/cli.rb:80:in `run'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/gems/puma-4.3.5/bin/puma:10:in `<top (required)>'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/bin/puma:23:in `load'
  /home/user/apps/myapp/shared/bundle/ruby/2.6.0/bin/puma:23:in `<top (required)>'

I have set:

ExecStart=/home/user/.rbenv/bin/rbenv exec bundle exec --keep-file-descriptors puma -C ..puma_config_path..
ExecStop=/home/user/.rbenv/bin/rbenv exec bundle exec --keep-file-descriptors pumactl -F ..puma_config_path.. stop
ExecReload=/home/user/.rbenv/bin/rbenv exec bundle exec --keep-file-descriptors pumactl -F ..puma_config_path.. restart

but also with removing ExecStop and ExecReload my call to sudo systemctl restart puma.service deletes the socket?

@h0jeZvgoxFepBQ2C Running 4.3.5 and not seeing this issue here. We are using systemd as configured in the Puma docs. Type=simple and only ExecStart=/<FULLPATH>/bin/puma -C <YOUR_APP_PATH>/puma.rb. No ExecStop or ExecReload needed.

Was this page helpful?
0 / 5 - 0 ratings