Puma: phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)

Created on 8 Oct 2019  ·  47Comments  ·  Source: puma/puma

Describe the bug

This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.

We use a folder structure similar to this:

/app/app.d/{version (int)}
/app/app -> /app/app.d/{version} # symlink
/app/prev -> /app/app.d/{version-1} # symlink

When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.

All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log

[12357] - Starting phased worker restart, phase: 4
[12357] - Starting phased worker restart, phase: 4
[12357] + Changing to /var/local/app/app
[12357] + Changing to /var/local/app/app
[12357] - Stopping 28107 for phased upgrade...
[12357] - Stopping 28107 for phased upgrade...
[12357] - TERM sent to 28107...
[12357] - TERM sent to 28107...
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] Early termination of worker
[7886] Early termination of worker
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
/usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound/usr/lib/ruby/site_ruby/2.5.0
/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)
)
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile

❗️ Note that the app now lives in /app/app.d/909 but the errors are pointing to puma in /app/app.d/906 ❗️

Puma config:

#!/usr/bin/env puma

# The directory to operate out of.
#
# The default is the current directory.
#
directory "/var/local/app/app"

# Set the environment in which the rack's app will run. The value must be a string.
#
# The default is "development".
#
environment ENV["RAILS_ENV"] || "development"

# Store the pid of the server in the file at "path".
#
pidfile "/var/run/puma/puma.pid"

# Use "path" as the file to store the server info state. This is
# used by "pumactl" to query and control the server.
#
state_path "/var/run/puma/puma.state"

# Redirect STDOUT and STDERR to files specified. The 3rd parameter
# ("append") specifies whether the output is appended, the default is
# "false".
#
stdout_redirect "/var/log/puma.log", "/var/log/puma.log", true

# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
# threads 0, 16
threads 8, 8


# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.
#
# The default is "tcp://0.0.0.0:9292".
#
bind 'unix:///var/run/puma/puma.sock'


# === Cluster mode ===

# How many worker processes to run.  Typically this is set to
# to the number of available cores.
#
# The default is "0".
#
workers ENV["PUMA_WORKERS"] || 16

# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)

# see https://github.com/puma/puma/blob/master/docs/deployment.md#restarting
prune_bundler

If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys...)

To Reproduce

I was trying to reproduce this in a clean environment, but I can't manage to do it so far... :-/

Expected behavior
phased-restart should reload everything from the new folder... It does seem to point to the right folder, but there's some mysterious left-overs pointing to something old somewhere??

Server:

  • OS: Ubuntu 18.04 with systemd
  • Puma Version 4.2.1 (it doesn't happen with 3.12)
bug waiting-for-review

Most helpful comment

I dug into what's happening on my repro repo. I've got some notes to share and some thoughts on how to address the problem.

What causes the Could not find nio4r errors?

During the Bundler.setup process that is kicked off on a puma worker when it boots, Bundler tries to find all of the gems referenced by the Gemfile.lock. It searches for gems using an instance of Bundler::Index that is constructed by Bundler::Source::Rubygems#installed_specs. In Bundler::Source::Rubygems#installed_specs, Bundler iterates over the collection of specs (instances of Gem::Specification) corresponding to installed gems by calling Bundler::RubygemsIntegration#all_specs, which in turn calls Gem::Specification.stubs. Gem::Specification.stubs returns a result that includes the return value from Gem.loaded_specs, which is a collection of instances of Gem::Specification that correspond to already-activated gems (this includes nio4r since that gem was activated by the puma master--by puma-wild directly when you're using prune_bundler).

When Bundler::Source::Rubygems#installed_specs iterates over the collection of specs, it rejects specs for which native extensions are missing. It determines whether or not native extensions are missing by executing the method #missing_extensions? on the corresponding Gem::Specification. Gem::Specification#missing_extensions relies on the result from Gem::BasicSpecification#extension_dir. When this method is called on the nio4r spec in the worker, it's called on the same instance that was first created when the puma master activated the gem and returns the extension directory of the gem at the time that the puma master activated it. So, if you remove that directory after the puma master boots up, workers will still try to find files in that directory when they boot. If they're not there, Bundler::Source::Rubygems#installed_specs excludes the gem from its index and Bundler::SpecSet#materialize is, in turn, unable to find the gem and raises GemNotFound.

How to fix it

We're fortunate that the puma master doesn't actually need to have a runtime dependency on nio4r. Basically, we can make it so that puma-wild never activates the nio4r gem in the master process at all. As long as we never require 'nio' in the master process (basically what @MSP-Greg was getting at), we're good. I've tested that change in my repro repo (by patching puma version 4.3.1) and everything works. But if I apply the same patch to puma master (after #1893 was merged), I get a different error that's preventing me from verifying the fix against master. I think that might be an unrelated regression introduced by #1893

Opening a PR for now: #2125

All 47 comments

@mcg @MSP-Greg maybe you guys have some suggestions at least what to test or how to try to reproduce this? I tried it with a toy app and using symlinks in a similar structure, but somehow can't reproduce it in this environment... But it does seem to happen fairly consistently in our staging environment so far.

One small thing I saw in our staging logs, but I don't see in my toy environment is:

Early termination of worker

Maybe it's related to the problem?

Possibly related to the changes we made to prune_bundler lately.

I have a problem similar to this. When I update a gem in Gemfile that puma uses like puma itself or as original poster said nio4r and try to do phased restart (SIGUSR1), My app does not get reload due to mismatched gem versions.

See here: #705

So, Is this issue related to this prune_bundler changes made lately?


In development, I got same error. Early termination of worker

When I update a gem in Gemfile that puma uses like puma itself or as original poster said nio4r and try to do phased restart

Makes sense if you look at the code for how prune_bundler works. We'll have to figure out how to change/reload Puma's spec.

Slightly OT, but is it worth switching from prune_bundler to the more "traditional"(??) before_fork / on_worker_boot? btw, I find the README section about clustered mode conflicting with Deployment restarting docs e.g.

(README)

If you're using a configuration file, use the preload_app! method:

(Deployment)

Don't use preload!. This dirties the master process and means it will have to shutdown all the workers and re-exec itself to get your new code. It is not compatible with phased-restart and prune_bundler as well.

(also the deployment doc mentions using prune_bundler but there's no mention of it at all on the main README)

Which advice should I follow? :)

If you need phased restarts or prune_bundler, you cannot preload_app. If you don't need either of those things, use preload_app.

Sorry if this is a dumb question, but how do I know if I need prune_bundler?

I do need phased restarts for zero-downtime deploys.

We'll have to figure out how to change/reload Puma's spec.

This will be awesome if we can have a solution for this. At least a workaround until a permanent solution comes up. Since, updating those specs requires full restart and this means downtime.

@gingerlime I thinks this problem occurs because you vendor the ruby gems in /var/local/app/app.d/{version (int)}. See:

/var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'

If this version gets deleted the vendored gems also get deleted.
You probably want to install gems in a shared persistent directory like: /var/local/app/app.d/shared

Thanks for the suggestion! But this is working 100% fine with puma 3 for a number of years now. How come it stopped working with 4??

Puma 3 had no dependent gems. Puma 4 added nio4r as a dependency.
I'm not sure but maybe bundler tries to reload nio4r? Where as before it didn't need to reload anything?

Good point! However isn’t puma 4 potentially breaking prune_bundler then ?

@gingerlime The bug might have been there since before Puma 4, see https://github.com/puma/puma/issues/1593 and https://github.com/bundler/bundler/issues/6667, looks like the same problem you have but under different circumstances

There's a PR trying to fix it https://github.com/puma/puma/pull/1893, perhaps you could try it out?

Puma v4.3.0 is still using Bundler.with_clean_env

https://github.com/puma/puma/blob/395337df4a3b27cc14eeab048016fb1ee85d2f83/lib/puma/launcher.rb#L267

https://github.com/puma/puma/pull/1893 switches to Bundler.with_original_env which reportedly solved the issue for one user in https://github.com/bundler/bundler/issues/6667

Thanks @dentarg very interesting. I'll try to find some time to try out #1893 and see. Reproducing the problem in itself was a challenge, and I was only seeing it in our staging environment (which is shared, so I can't experiment there for too long). Hope I can still reproduce it, and then try again with #1893 and report back here.

Otherwise, is there a workaround perhaps? maybe moving away from prune_bundler? what's the recommended approach for zero-downtime deploys with the now "old-school" symlinked structure?

As @gingerlime pointed out, I am also interested in any workarounds for zero-downtime deployments.

My team was experiencing this same issue. For us, the problem had to do with how our release process worked, not necessarily with Puma itself. Like @gingerlime, this error happened on our staging environment much more reliably and more frequently than it did in production.

In a nutshell: If your release process removes old releases (specifically, the compiled native extensions on which the puma master relies), new puma workers will check to see if the native extensions are compiled, fail to find that evidence, and bundler will ignore the installed gem, eventually raising Bundler::GemNotFound.

I don't have a ready-to-go public repro repo, but this is the setup (forgive my sloppiness):

  • Create "release_1", a directory containing your application code as well as your gems (bundle install --path .bundle). Point your current release symlink to release_1.
  • Start a puma master with a cluster (workers 2 or something). bundle exec puma -C puma.rb). You must have prune_bundler in there, too.
  • Create "release_2". Point your current release symlink to release_2. Issue a phased_restart. This will succeed since although bundler/setup on the release_2 workers will check for existence of the compiled native extensions in the directory for release_1, they're still present on disk, so everything's good.
  • rm -rf release_1. If you CD tools delete old releases to save space, that might be happening for you automatically.
  • Create "release_3". Point your current release symlink to release_3. Issue a phased restart. This will fail with the error Bundler::GemNotFound, since Gem::Specification#missing_extensions? returns true after failing to find the file at the path returned by the method gem_build_complete_path (that will encode a path to release_1, which was removed already). The workers will die and the master will restart new workers, but those workers will fail with the same error again and again and again.

This problem, I imagine, happens much more frequently with puma 4.x than with puma 3.x because puma 4.x has a runtime dependency on a gem that has native extensions. Any time that the puma master has "activated" a gem that has native extensions, workers can encounter this error if they try to activate the same gems.

My team still experienced a similar issue with puma 3.x, not with the nio4r gem, but instead with ffi, since we were loading PumaWorkerKiller (which has a runtime dependency on ffi) in the puma master. New workers would start, fail to find native extensions for ffi, and exit with the error Bunlder::GemNotFound.

For a quick workaround, just keep your old releases around on your servers (or at the very least the release directory for the version of your application where you first started the puma master). You can safely perform phased restarts as you'd expect.

My team is investigating other solutions, but we're not far along in that process yet. We'll make sure to share what we find.

Everyone having simlar issues should try master branch, as https://github.com/puma/puma/pull/1893 was merged and may help.

I created a repo that demonstrates that this error occurs reliably if you're deleting old releases

https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors

The master branch of that repo gets the error described in this thread. It's using puma 4.3.1.

Could not find nio4r-2.5.2 in any of the sources
Run `bundle install` to install missing gems.
[292] Early termination of worker

The puma-master branch of the repro repo is using puma at commit 8c9b3ebc7a86bdc5488704a0225b76409b9f555f, which includes changes from #1893.

Workers still fail on startup, but with a different error

/usr/src/releases/1582131641325997600/Gemfile not found

From the repro repo

This project reproduces an issue related to phased restarts in Puma. Specifically, it demonstates that if you delete the compiled native extensions that were used when the puma master started, the next phased restart will fail.

Is that a problem Puma should solve? 🤔

@dentarg, don't you think so? So you mean that we will update puma for getting not just for better features but for better security in the future but in fact we will not be able to gracefully deploy due to this issue because it crashes.

I believe this should be solved by Puma as soon as possible. At least a workaround.

For now, I just forcefully reject pull requests for Puma.

Update: I realized that previous quote seems mentioning about 3rd party native extension _that is not related to puma._ If this is the case, I believe if we can do phased-restart on unicorn and/or other web servers (and we do) we should be able to do so with puma too.

Is that a problem Puma should solve?

@dentarg I'm not certain about this either. The bug might not even be with Puma itself--it might be a bug with bundler, for example. It just so happens that puma phased restarts combined with a deployment strategy that removes old releases exposes the bug. It's also a little frustrating, too, because if you were running puma 3.x (before the introduction of nio4r in 4.x), and you were removing old releases, that was probably fine since puma 3.x had no runtime dependencies on gems that had native extensions.

I believe this should be solved by Puma as soon as possible. At least a workaround.

@gencer Puma doesn't owe you anything. Maintainers are often unpaid volunteers. If you can do phased restarts using other web servers, then you should use those other web servers.

@cjlarose Neither puma nor its developers and also volunteers doesn't owe me anything. I am deeply sorry if I offended anyone. I didn't mean that way. You misunderstood me. I was just trying to say that it should be fixed on puma-gem (that's what i believe because other web servers doesnt have this issue and puma 3.x also don't have this.) not by puma developers. This could be of course "me" or someone who knows these things better than me.

Please accept my apologies on this matter.

This issue is very messy.

because other web servers don't have this issue and puma 3.x doesn't have this.

Yes, but @cjlarose noticed the issue with extension files, whether they be Puma dependencies (4.x only), or app dependencies (noticed with 3.x). So, it may not really be a '3.x is ok, 4.x is bad' problem. It may just appear that way because 4.x has an extension gem dependency of nio4r.

At various times, there have issues in Ruby with symlinks and the absolute paths of symlinks, where code incorrectly expected the two to be equal. That's just Ruby, who knows about Bundler/RubyGems, especially when changing a symlink to affect what files a running Ruby instance uses.

Two people have shown a log message of 'Could not find nio4r-2.5.1 in any of the sources'. That's a Bundler message, so, either Puma is misconfiguring Bundler, or Bundler is not functioning as expected.

One thing that could be tried is changing:
https://github.com/puma/puma/blob/afb27d5552b4d7b191655b3e96937c96fd4ef12c/lib/puma/reactor.rb#L6
to:

require 'nio' unless Kernel.const_defined?(:NIO)

the above isn't a solution, just trying to bypass Bundler's resolution to see if that's the issue.

* submitted by a Puma volunteer

Seen on CI

More than one job failed with the same error on Travis. And, jobs using the same Rubies pass on Actions. Good find.

I feel guilty, being one who often says "CI is more than green checks & red X's, look at the logs"...

I feel guilty, being one who often says "CI is more than green checks & red X's, look at the logs"...

I think, CI is good, but Travis CI I feel has some unclear patches and phantom behavior.

I think the problem is that some users have the same issue as Travis has, so 'we' need to determine what that is...

I dug into what's happening on my repro repo. I've got some notes to share and some thoughts on how to address the problem.

What causes the Could not find nio4r errors?

During the Bundler.setup process that is kicked off on a puma worker when it boots, Bundler tries to find all of the gems referenced by the Gemfile.lock. It searches for gems using an instance of Bundler::Index that is constructed by Bundler::Source::Rubygems#installed_specs. In Bundler::Source::Rubygems#installed_specs, Bundler iterates over the collection of specs (instances of Gem::Specification) corresponding to installed gems by calling Bundler::RubygemsIntegration#all_specs, which in turn calls Gem::Specification.stubs. Gem::Specification.stubs returns a result that includes the return value from Gem.loaded_specs, which is a collection of instances of Gem::Specification that correspond to already-activated gems (this includes nio4r since that gem was activated by the puma master--by puma-wild directly when you're using prune_bundler).

When Bundler::Source::Rubygems#installed_specs iterates over the collection of specs, it rejects specs for which native extensions are missing. It determines whether or not native extensions are missing by executing the method #missing_extensions? on the corresponding Gem::Specification. Gem::Specification#missing_extensions relies on the result from Gem::BasicSpecification#extension_dir. When this method is called on the nio4r spec in the worker, it's called on the same instance that was first created when the puma master activated the gem and returns the extension directory of the gem at the time that the puma master activated it. So, if you remove that directory after the puma master boots up, workers will still try to find files in that directory when they boot. If they're not there, Bundler::Source::Rubygems#installed_specs excludes the gem from its index and Bundler::SpecSet#materialize is, in turn, unable to find the gem and raises GemNotFound.

How to fix it

We're fortunate that the puma master doesn't actually need to have a runtime dependency on nio4r. Basically, we can make it so that puma-wild never activates the nio4r gem in the master process at all. As long as we never require 'nio' in the master process (basically what @MSP-Greg was getting at), we're good. I've tested that change in my repro repo (by patching puma version 4.3.1) and everything works. But if I apply the same patch to puma master (after #1893 was merged), I get a different error that's preventing me from verifying the fix against master. I think that might be an unrelated regression introduced by #1893

Opening a PR for now: #2125

That's awesome! Thanks so much for everyone for investigating, reproducing, investigating some more and looks like we have a potential solution in the pipeline. 🍾

This is probably a stupid question, but is there a way to "tell" bundler to load from the symlinked path rather than the actual path?

I should mention that the solution I propose basically only works if your puma master process has no dependencies on gems with native extensions. I think that's true for most people, especially if you're not using before_fork. But if you're using puma-worker-killer or another gem that's loaded in the puma master process and also has dependencies on native extensions, the solution I proposed won't be a fix.

A more robust solution is to make it possible that puma workers use independent installations of native extensions from the master. In particular, the current worker-boot process looks like this (assuming prune_bundler):

  1. puma-wild activates nio4r and other dependencies of the master process.
  2. Workers are forked from the master process
  3. Workers execute Bundler.setup
  4. Workers start their main server loop

Instead, we'd do this:

  1. puma-wild activates nio4r and other dependencies of the master process.
  2. Workers are forked, but immediately exec into a brand new ruby process (similar to how prune_bundler works today), thereby leaving them in a state without an activated nio4r or other gems with native extensions.
  3. Workers execute Bundler.setup (activating nio4r and other gems for the first time)
  4. Workers start their main server loop

I understand that this is a pretty dramatic architectural change. Also, I understand that forking from the master to form workers is useful when you're trying to leverage copy-on-write memory (and especially when you're using preload_app!). None of this "fork-and-exec" stuff makes sense if you're trying to leverage COW, but preload_app! is necessarily incompatible with phased restarts anyway. So the new fork-and-exec strategy only makes sense if you want to support phased restarts.

The proposed architecture makes it so that gems activated by the puma master have no effect on activating gems in the workers. One potential downside of this is that workers can technically load a different version of puma than the master, but that problem can be addressed separately.

This is probably a stupid question, but is there a way to "tell" bundler to load from the symlinked path rather than the actual path?

@gingerlime This might be possible, but I think it's undesirable. Gem.loaded_specs represents a collection of gems that have already been activated. If we clear it in the worker,

# don't do this
on_worker_boot do
  Gem.instance_variable_set(:@loaded_specs, {})
end

we get the behavior you're talking about: worker processes end up using the new gem installation path (since the symlink gets re-resolved). Since the files all exist for the native extensions, workers will actually boot up after a phased restart. However, I think registering gems with rubygems in the puma master provides a lot of value. It represents some "truth" about which gems have already been activated. We want Bundler/rubygems to know about this in order to prevent us from doing stuff like activating different versions of nio4r in the workers than we have in the master. Since workers are forked from the master, this would basically make it possible for workers to double-require gems, which could lead to other problems.

@cjlarose Thanks for the writeup. I'll take a more detailed look this week.

https://github.com/puma/puma/issues/1875 tracks the progress of a new architecture in puma that'll hopefully resolve this issue.

2374 is merged, which was a prerequisite for the changes I want to introduce to fix this issue.

I've got a branch that fixes this issue. I'm working on fixing my tests for it.

I opened #2407 which is a robust solution to the original problem described in this issue. It makes it so that the puma master process can use totally separate gems from the puma worker processes, so it's fine to remove nio4r's compiled native extensions on disk and perform phased restarts. It also supports doing the same for _any_ gem with native extensions loaded by the puma master process, such as ffi if you're using puma_worker_killer. The problem with that PR, though, is that it increases the overall memory footprint of a puma cluster since worker processes can't leverage COW memory by forking from the master process. We ultimately decided to close it.

I opened #2427 which fixes the nio4r issue with regard to phased restarts. It does not allow one to remove from disk compiled native extensions for _other_ gems that are loaded by the puma master process (like ffi), but I suspect this doesn't matter to most people. That PR will fix this problem for most folks.

For anyone else, I opened rubygems/rubygems#4004 which I think is the root cause of the issue in Bundler. If that issue is resolved, then it will be possible to perform a phased restart after removing compiled native extensions for _any_ gem loaded by the puma master process, including ffi.

@cjlarose you're my hero 🦸‍♂️ thank you so much. The bundler report was impeccable.

I have only one question however: if bundler resolves this (rather than Puma works around it), would a phased restart also be able to update ffi or nio4r if these gems are updated? I guess it's not a very common scenario, but I wasn't sure what happens then...

@cjlarose you're my hero 🦸‍♂️ thank you so much. The bundler report was impeccable.

Happy to help!

I have only one question however: if bundler resolves this (rather than Puma works around it), would a phased restart also be able to update ffi or nio4r if these gems are updated? I guess it's not a very common scenario, but I wasn't sure what happens then...

Excellent question. I can provide additional clarification.

Without a fix in Bundler for rubygems/rubygems#4004, but with the workaround in puma #2427, it'll be possible to perform phased restarts in a way that upgrades nio4r without a problem. This works because nio4r won't be loaded into the puma master process at all. Workers all independently load their own copy of nio4r.

With a fix in Bundler for rubygems/rubygems#4004 and with the workaround in puma #2427 applied, of course that won't make the situation any different for nio4r: phased restarts will work, even if you're upgrading nio4r. But the story is different for ffi (or really any gem that has native extensions and is loaded into the puma master process via extra_runtime_dependencies or dependencies thereof). This is because the same version of ffi will be loaded into the worker processes when they fork from the master process. Phased restarts will not support upgrading ffi. The value of the Bundler fix is just that you'll be able to freely delete the compiled native extensions on disk without worry that the subsequent phased restart will fail.

To sum it all up, this my advice for running puma after #2427 is merged: if you perform phased restarts to upgrade your application served by puma (assuming you have prune_bundler on, and regardless of whether or not you use a deployment strategy that involves deleting the gems of old releases):

  • Do not try to upgrade puma itself. Workers will fail on boot.
  • Do not try to upgrade any gem that is loaded into the puma master process via extra_runtime_dependencies or dependencies thereof (such as ffi if you're using puma_worker_killer). Workers will fail on boot.
  • If you want to do either of these things, perform a hot restart instead.

If you _do_ have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is not fixed):

  • Do not put any gems into extra_runtime_dependencies that have native extensions or has dependencies that have native extensions (like puma_worker_killer and ffi) at all. Workers will fail on boot during a phased restart. Hot restarts are your only option here if you need these dependencies.

If you have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 _is_ fixed):

  • Feel free to put gems like puma_worker_killer and ffi into extra_runtime_dependencies. Phased restarts will work so long as you're not changing versions of those gems.

I think I'll go ahead and add this to the restart documentation for puma since it's important to operators to know when it's safe to perform a phased restart. Longer term, another potential idea is for puma to automatically perform a hot restart instead if it detects that it cannot safely perform a phased restart.

Hi

Hey @cjlarose thanks so much for the input, and sorry for the delay. Was on holidays. I'm really keen to try to upgrade Puma now that #2427 is merged. Thanks for your awesome work and patience.

I still have one question however about upgrading puma in future. Our setup does not involve any extra_runtime_dependencies and only using prune_bundler. So if we need to upgrade puma/ffi in future, we cannot rely on phased restarts but rather need to full-restart puma after an upgrade. Right? can you suggest some kind of an automated check during deployment to anticipate this? i.e. see if the version of puma/ffi differs and then issue a full restart instead of the regular phased restart?

Hope I got things right, but want to fool-proof it in future as much as possible to avoid nasty surprises :)

Released in 5.0.3.

I still have one question however about upgrading puma in future. Our setup does not involve any extra_runtime_dependencies and only using prune_bundler. So if we need to upgrade puma/ffi in future, we cannot rely on phased restarts but rather need to full-restart puma after an upgrade. Right?

Because you're _not_ using extra_runtime_dependencies, you don't need to worry about ffi upgrades. Those will work just fine. For puma upgrades, a "hot restart" is the most graceful way to restart.

can you suggest some kind of an automated check during deployment to anticipate this? i.e. see if the version of puma/ffi differs and then issue a full restart instead of the regular phased restart?

This is definitely somewhere where puma is lacking. It'd be great if there was an option to pumactl phased-restart that either checked first to see if a phased restart is possible (and use a hot restart if not) or attempted to perform a phased-restart, but automatically fall back to a hot restart if it fails for whatever reason.

On (proprietary) applications that I maintain, we have a kinda complicated start/stop script that basically does this by utilizing the stats command offered by the Puma control/status server through pumactl stats. With stats, we can monitor the progress of a phased restart (by inspecting the phase of each worker). If the phased restart doesn't make progress within a certain timeout, we fall back to a hot restart.

I think pumactl phased-restart should just do this automatically. I opened a separate issue to address this. #2465

Thanks again @cjlarose ! I'll check out pumactl and see if we can script something. And great to see #2465 ! hope there's some way I can help with this.

I'm having trouble after upgrading to puma 5.0.4 unfortunately... I did a hot restart to make sure I'm running puma 5 but after a few deploys (with phased-restarts) I see these errors in my puma.log file and I think the paths are pointing to folders that were deleted... The gem it complains about wasn't actually updated recently.

/usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:86:in `block in materialize': Could not find json-2.3.1 in any of the sources (Bundler::GemNotFound)
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `map!'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `materialize'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:101:in `block in definition_method'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler.rb:149:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `block in <top (required)>'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:136:in `with_level'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:88:in `silence'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:312:in `rack_builder'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:331:in `load_rackup'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:257:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:142:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:146:in `start_server'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster/worker.rb:57:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:195:in `worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `block in spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `fork'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:78:in `block in spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `times'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:142:in `check_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:410:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/launcher.rb:171:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/bin/puma-wild:25:in `<main>'
[12769] + Gemfile in context: /var/local/app/app.d/1391/Gemfile
[12769] Early termination of worker

Any tips on how to further troubleshot / investigate this?

please let me know if I should open a separate issue for this, but it seems to keep happening for us after a hot restart and a few phased restarts with puma 5.0.4

@gingerlime I think you can open up a new issue as the issue is now about json and not nio4r

Thanks @dentarg will do (it's related to phased-restarts though, but definitely looks like a different thing)

Was this page helpful?
0 / 5 - 0 ratings