Sidekiq: ActiveRecord::ConnectionTimeoutError could not obtain a database connection within 5.000 seconds

Created on 28 Mar 2016  路  9Comments  路  Source: mperham/sidekiq

I am intermittently observing the error ActiveRecord::ConnectionTimeoutError could not obtain a database connection within 5.000 seconds in our production environment. It'll start occurring and not cease until Sidekiq is restarted. Interestingly enough, the error is not happening in the Phusion Passenger/Rails portion of the app, only within Sidekiq.

I checked the number of connections allowed to the DB and all seems to be fine. my.cnf max_connections is not specified, and I believe it defaults to 100 or 151 out of the box. My rails app has a pool of 20 set (with a reaping_frequency of 60, if this matters). Sidekiq is set to a concurrency of 10. There are no other applications running on the server, consuming database connections.

I checked the processlist within MySQL and see a ton of connections in the sleep state. Maybe this is normal.

mysql> SHOW PROCESSLIST;
+--------+--------------+-----------+--------------------+---------+------+-------+------------------+
| Id     | User         | Host      | db                 | Command | Time | State | Info             |
+--------+--------------+-----------+--------------------+---------+------+-------+------------------+
| 214955 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 214956 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 214959 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 214960 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 214961 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 214963 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 214964 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 224332 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 225845 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 233156 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 233157 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 238825 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 238882 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 238889 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 239100 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 241117 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 241252 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 242777 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 246312 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 246313 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   13 |       | NULL             |
| 246614 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   | 6773 |       | NULL             |
| 246615 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   15 |       | NULL             |
| 246616 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   23 |       | NULL             |
| 246617 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   | 4080 |       | NULL             |
| 246618 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   | 7070 |       | NULL             |
| 246841 | [REMOVED]_prod | localhost | [REMOVED]_production | Sleep   |   26 |       | NULL             |
| 246860 | [REMOVED]_prod | localhost | NULL                 | Query   |    0 | init  | SHOW PROCESSLIST |
+--------+--------------+-----------+--------------------+---------+------+-------+------------------+

Any help or guidance is greatly appreciated.

Software versions

Rails 4.1.14.1
MySQL 5.6.19-0ubuntu0.14.04.1
mysql2 gem 0.3.20
Sidekiq 3.5.4
Ubuntu 14.04.02 LTS

Sidekiq Initializer

:concurrency: 5
:pidfile: ./tmp/pids/sidekiq.pid
:logfile: ./log/sidekiq.log
development:
  :queues:
    - [REMOVED]_development
staging:
  :concurrency: 1
  :queues:
    - [REMOVED]_staging
production:
  :concurrency: 10
  :queues:
    - [REMOVED]_production

Full Backtrace

2016-03-28T15:32:56.167Z 2839 TID-ow54a682o ChangeAwayStatusWorker JID-462f59a3632454ab7061741e INFO: fail: 5.001 sec
2016-03-28T15:32:56.168Z 2839 TID-ow54a682o WARN: {"class"=>"ChangeAwayStatusWorker", "args"=>[1209606, "[REMOVED]", "[REMOVED]""], "retry"=>5, "queue"=>"[REMOVED]", "dead"=>false, "jid"=>"462f59a3632454ab7061741e", "created_at"=>1459179171.1646957, "enqueued_at"=>1459179171.1647549, "error_message"=>"could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)", "error_class"=>"ActiveRecord::ConnectionTimeoutError", "failed_at"=>1459179176.1673794, "retry_count"=>0}
2016-03-28T15:32:56.168Z 2839 TID-ow54a682o WARN: ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)
2016-03-28T15:32:56.168Z 2839 TID-ow54a682o WARN: /var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `block in wait_poll'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:181:in `loop'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:181:in `wait_poll'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:136:in `block in poll'
/usr/local/rvm/rubies/ruby-2.0.0-p451/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:146:in `synchronize'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:134:in `poll'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:418:in `acquire_connection'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:351:in `block in checkout'
/usr/local/rvm/rubies/ruby-2.0.0-p451/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:350:in `checkout'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:265:in `block in connection'
/usr/local/rvm/rubies/ruby-2.0.0-p451/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:264:in `connection'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:541:in `retrieve_connection'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_handling.rb:113:in `retrieve_connection'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/connection_handling.rb:87:in `connection'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/relation/delegation.rb:48:in `connection'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/relation/finder_methods.rb:424:in `find_one'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/relation/finder_methods.rb:413:in `find_with_ids'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/relation/finder_methods.rb:68:in `find'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.14.1/lib/active_record/querying.rb:3:in `find'
/var/www/[REMOVED]-production/releases/20160316164820/app/workers/change_away_status_worker.rb:34:in `perform'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/processor.rb:80:in `execute_job'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/processor.rb:56:in `block (2 levels) in process'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/chain.rb:127:in `block in invoke'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/newrelic_rpm-3.14.3.313/lib/new_relic/agent/instrumentation/sidekiq.rb:33:in `block in call'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/newrelic_rpm-3.14.3.313/lib/new_relic/agent/instrumentation/controller_instrumentation.rb:362:in `perform_action_with_newrelic_trace'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/newrelic_rpm-3.14.3.313/lib/new_relic/agent/instrumentation/sidekiq.rb:29:in `call'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/server/active_record.rb:6:in `call'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/server/retry_jobs.rb:74:in `call'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/server/logging.rb:11:in `block in call'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/logging.rb:30:in `with_context'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/server/logging.rb:7:in `call'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/chain.rb:132:in `call'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/middleware/chain.rb:132:in `invoke'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/processor.rb:51:in `block in process'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/processor.rb:104:in `stats'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/sidekiq-3.5.4/lib/sidekiq/processor.rb:50:in `process'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/celluloid-0.17.3/lib/celluloid/call/async.rb:7:in `dispatch'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
/var/www/[REMOVED]-production/shared/bundle/ruby/2.0.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
2016-03-28T15:32:56.170Z 2839 TID-ow53q50l4 ChangeAwayStatusWorker JID-a4008a769d95bdd33d41ffa8 INFO: start

Most helpful comment

This type of issue has been dealt with dozens of times in the issues. It's not a Sidekiq bug but typically the application or a gem is spinning up its own threads and not properly handling the AR connection.

All 9 comments

This type of issue has been dealt with dozens of times in the issues. It's not a Sidekiq bug but typically the application or a gem is spinning up its own threads and not properly handling the AR connection.

I saw the previous issues relating to this error, however, I am not spinning up any threads of my own. Just simple ActiveRecord activities and a couple web requests.

What steps can I take to debug the problem?

I don't know of an easy way to debug it.

Are you convinced that this cannot be an issue with Sidekiq, despite that I'm not creating threads?

I won't say it's impossible but everyone reporting the issue in the last year has found something in their application causing the problem. Sidekiq has thousands of users; if there was a basic connection handling bug, the issue tracker would show it.

Well that's too bad. Leaves me in a tough spot. There is absolutely no threading in the project codebase itself. The only gems used are httparty and savon, which don't have any threading calls in them either.

If you think of any steps that can be taken to track the database connection handling, please let me know.

An additional thought: is it a little strange that I'm receiving these errors when it doesn't appear that Sidekiq has reached a maximum of 10 connections to the DB yet? If you look at the processlist, there are 20 connections that have been inactive for about the same amount of time, so perhaps these are the Passenger connections. There are 6 more odd connections, some idle for much longer than I'd expect considering the level of activity on this service.

Some debugging thoughts:

  • Send a TTIN signal to your SK worker process to get the current backtrace from all the threads. https://github.com/mperham/sidekiq/wiki/Signals#ttin
  • Verify the AR pool size is set as you expect in the running SK process with ActiveRecord::Base.connection_config[:pool]

You would likely get more help from StackOverflow at this point, though. Wanna open the question there and drop the link here?

Our project was suffering from this and reducing the number of named queues seemed to fix it. Not sure why this might have caused it but hopefully it might help someone else.

EDIT: Keeping queues to a minimum is recommended here: https://github.com/mperham/sidekiq/wiki/Advanced-Options#queues

Was this page helpful?
0 / 5 - 0 ratings

Related issues

paul-ylz picture paul-ylz  路  4Comments

jlecour picture jlecour  路  4Comments

michaeldiscala picture michaeldiscala  路  4Comments

bartimaeus picture bartimaeus  路  3Comments

mperham picture mperham  路  4Comments