Rails: PostgreSQL prepared statement failure inside transaction results in a permanently cached prepared statement that's never cleaned up.

Created on 24 Sep 2013 · 77 comments · Source: rails/rails

_ActiveRecord 4.0. Verified with PostgreSQL 9.2 and 9.3 and pg gem 0.16 and 0.17_

Assume you have some basic table; we'll use User for this example's purposes.

In a Rails console:

User.find(1)
=> .....

At this point, the PostgreSQL adapter has created a prepared statement for selecting a User by ID and cached it in the connection. To verify:

ActiveRecord::Base.connection.instance_variable_get(:@statements).to_a
=> [
  [0] [
    [0] "\"$user\",public-SELECT  \"users\".* FROM \"users\"  WHERE \"users\".\"id\" = $1 LIMIT 1",
    [1] "a1"
  ]
]
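For context, here are the mechanics behind that cache entry, sketched with the pg gem directly (illustrative only, against a fresh connection; in reality the adapter generates and manages the statement names itself):

raw = ActiveRecord::Base.connection.raw_connection  # the underlying PG::Connection
raw.prepare("a1", 'SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1')
raw.exec_prepared("a1", [1])   # => PG::Result with the matching row
raw.exec("DEALLOCATE a1")      # what eviction from the statement pool runs later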

Now from a separate session, let's modify the schema for the User table. The use case here is adding a column to a live system without causing downtime to users:

ALTER TABLE users ADD new_metric_column integer;

Back in the previous Rails console, we then run this:

User.transaction { User.find(1) }
=> [2013-09-23 17:53:03] DEBUG ActiveRecord::Base        :    (0.2ms)  BEGIN
[2013-09-23 17:53:03] DEBUG ActiveRecord::Base        :   User Load (0.6ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1  [["id", 1]]
[2013-09-23 17:53:03] ERROR ActiveRecord::Base        : PG::InFailedSqlTransaction: ERROR:  current transaction is aborted, commands ignored until end of transaction block
: SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1
[2013-09-23 17:53:03] DEBUG ActiveRecord::Base        :    (0.5ms)  ROLLBACK
ActiveRecord::StatementInvalid: PG::InFailedSqlTransaction: ERROR:  current transaction is aborted, commands ignored until end of transaction block
: SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1
from /Users/nixme/Projects/ML/flabongo/vendor/bundle/ruby/2.0.0/gems/activerecord-4.0.0/lib/active_record/connection_adapters/postgresql_adapter.rb:512:in `exec'

What's happening is that since the shape of users has changed, the a1 prepared statement is no longer valid. The AR PostgreSQL adapter correctly sees this and tries to run DEALLOCATE a1. But the transaction's already invalid at that point, and so even the DEALLOCATE fails, leaving that connection's @statements pool with a permanently bad prepared statement.
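A sketch of the resulting stuck state (same console, output abridged and illustrative): every later transaction that hits the same query re-raises, because the stale entry is still in @statements.

User.transaction { User.find(1) }
# => ActiveRecord::StatementInvalid: PG::InFailedSqlTransaction ... (raised again)
ActiveRecord::Base.connection.instance_variable_get(:@statements).to_a
# => the stale "a1" entry is still present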

For reference, on the PostgreSQL side, the logs show:

ERROR:  cached plan must not change result type
STATEMENT:  SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1
ERROR:  current transaction is aborted, commands ignored until end of transaction block
STATEMENT:  DEALLOCATE a1

Now that might seem like an arbitrary transaction, but all AR saves are wrapped in a transaction, so a validation or callback might be selecting a User. Imagine a production system: all old app servers are now permanently broken until a restart. This makes live migrations without downtime basically impossible, unless I'm missing something.

I think ideally, the transaction is aborted, then DEALLOCATE is run _outside_ a transaction and thus successfully removes that prepared statement for the connection. Then the original transaction is retried automatically and the necessary prepared statements are re-created automatically.

Another option is just to blow up this one transaction but at the very least get that DEALLOCATE to succeed, so that any follow-up transactions will work and the fewest possible requests fail.
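Sketched very roughly, with hypothetical helper and variable names rather than the adapter's actual code, that second option could look like:

def exec_cache(sql, binds)
  key = @statements[sql]
  @connection.exec_prepared(key, binds)
rescue PG::Error => e
  if e.message.include?("cached plan must not change result type")
    if transaction_open?              # hypothetical predicate
      @pending_deallocations << key   # DEALLOCATE would fail in the aborted txn
    else
      @statements.delete(sql)         # eviction DEALLOCATEs on the server
    end
  end
  raise
end

# ...and once the adapter has issued the ROLLBACK:
def rollback_db_transaction
  super
  @pending_deallocations.each { |key| @connection.exec("DEALLOCATE #{key}") }
  @pending_deallocations.clear
end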

For now, we've disabled prepared_statements after monkey-patching some fixes from 4-0-stable (see #12023).

/cc @tenderlove since I _think_ you've done some of the PG prepared statement support?

Labels: PostgreSQL, activerecord


All 77 comments

Hi @nixme, the bug makes sense, but I'm not sure that we should fix this. ActiveRecord keeps other internal caches about the schema of your tables. If you don't restart your server after modifying the schema, these caches will not expire and your app will break in other strange ways.

Here is an example showing that the caches are not expired correctly:
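(The snippet below is a minimal sketch of that kind of staleness, reusing the users table from the report; output is illustrative.)

User.columns.map(&:name)
# => ["id", ...]                # cached column list; new_metric_column is missing
User.new.respond_to?(:new_metric_column)
# => false                      # attribute methods are built from the stale cache
User.reset_column_information   # the manual cache-expiry escape hatch
User.columns.map(&:name)
# => ["id", ..., "new_metric_column"]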

oh no

Well then I guess my question is what is the best practice for AR and no-downtime schema migrations? Without prepared statements, adding columns is actually painless as long as you keep them nullable.

Example:

  • Old code (and yes, old AR caches about column info) is running against the new schema. The new columns are NULLABLE, maybe have a DEFAULT, etc. So the old AR caches on things like INSERT are fine and everything just works.
  • We do a rolling deploy of new code which can handle records from both shapes.
  • Once completely on new code, re-deploy with the old code removed. Run DB script backfills to fix up interim data.

I understand that it's not wise for AR to keep rechecking the schema in production. And removing columns is definitely going to kill old caches, and then you have to do tricky things like first deploying with http://www.codinginthecrease.com/news_article/show/85364 and then running the migration.

What about reducing the scope of this issue to just failures around the statement pool's DEALLOCATE calls? Those should work, no? Otherwise that StatementPool is in a stuck state.

Actually this from @pedro (http://pedro.herokuapp.com/past/2011/7/13/rails_migrations_with_no_downtime/) is more helpful for those column removal edge cases.

But anyway, to stay on topic: DEALLOCATE. Is there an easy way for the statement pool to retry a failed DEALLOCATE outside the transaction? Looking at the code, I don't think the pool knows whether it's currently inside a transaction, or whether it can run something after the ROLLBACK.

This issue has been automatically marked as stale because it has not been commented on for at least three months.

The resources of the Rails team are limited, and so we are asking for your help.

If you can still reproduce this error on the 4-1-stable, 4-0-stable branches or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

We're facing similar issues (but when dropping columns).

Before dropping columns, we're ensuring that AR is not writing to any columns by overriding AR::Base.columns method.

But dropping columns as part of a migration is causing issues because of cached prepared statements. See https://gist.github.com/ketan/577ca47774364d8e4add for a reproducible bug (uses PG).
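For reference, a sketch of the columns override described above (the column name is hypothetical; newer Rails versions have self.ignored_columns for the same purpose):

class User < ActiveRecord::Base
  # Hide the column from AR before the migration drops it, so AR stops
  # reading from or writing to it.
  def self.columns
    super.reject { |column| column.name == "column_being_dropped" }
  end
end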

I'm seeing the following error in the postgres log when this happens.

ERROR:  cached plan must not change result type
STATEMENT:  SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1
ERROR:  current transaction is aborted, commands ignored until end of transaction block
STATEMENT:  DEALLOCATE a2

/cc @tenderlove since he _probably_ remembers something from #3335.

This issue has been automatically closed because of inactivity.

If you can still reproduce this error on the 4-1-stable, 4-0-stable branches or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

This issue is reproducible on 4.1 and 4.0; the test case in comment https://github.com/rails/rails/issues/12330#issuecomment-43273344 still holds.

@ketan I missed your comment earlier. Reopened. Thank you for reporting back.

FYI, we are seeing this exact same issue on Rails 4.1.4 and PostgreSQL 9.3.3.

I experience this with a simple select query and some parameters. Is there a workaround for this, other than just disabling prepared statements?

Thanks!

We are also experiencing this on 4.1.6.

Are there any ideas on how this could be fixed? It makes using prepared statements not that useful if they can cause downtime when running migrations :(

Confirmed that we're experiencing this in Rails 4.1.7, pg 0.17.0, postgres 9.3.4.

This migration:

class AddNewFieldToContainers < ActiveRecord::Migration
  def change
    add_column :containers, :new_field, :text, default: 'percent_with_average'
  end
end

...led to this error:

ActiveRecord::StatementInvalid: PG::FeatureNotSupported: ERROR: cached plan must not change result type : SELECT "containers".* FROM "containers" WHERE "containers"."id" = $1 LIMIT 1

Bumped into this during a production deployment.

Another potential solution to this problem would be to use an explicit column list in selects. Table schema information should be available, since it is used by the insert code path.

Do any rails devs have an idea of whether this is feasible/desirable?
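For illustration, here is what an explicit column list amounts to in plain AR terms (a sketch):

# Name every column instead of using "users".* in the generated SELECT:
User.select(*User.column_names).find(1)
# A newly added column is absent from the cached list, so the prepared
# statement's result type stays stable when a column is added; drops and
# renames would still break until the cached list is refreshed.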

I'm facing the same issue with Rails 4.1.9 and Postgres 9.4.

We just had this issue with a deployment today. Is the solution to disable prepared statements?

@mockdeep - yes the solution is to disable prepared statements :(
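For anyone hunting for the knob, prepared statements are a per-connection setting; a sketch of both ways to turn them off:

# In config/database.yml:
#
#   production:
#     prepared_statements: false
#
# or, equivalently, when establishing the connection by hand:
ActiveRecord::Base.establish_connection(
  ActiveRecord::Base.connection_config.merge(prepared_statements: false)
)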

We just ran into this problem today as well.

Had this problem today after running some migrations.

+1 Just hit this today in production.

+1. As a workaround, we are currently selecting all columns explicitly.

This issue has been automatically marked as stale because it has not been commented on for at least three months.

The resources of the Rails team are limited, and so we are asking for your help.

If you can still reproduce this error on the 4-2-stable, 4-1-stable branches or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

Unlocking since it became stale. Please do not comment on this issue with +1. We know people are having the problem, and it will not cause it to be solved any faster.

@tenderlove We hit this literally just now by adding a column... I might be missing context, but the issue seems to be that prepared statements use *, as in "SELECT * FROM some_table WHERE foo = ?". Can we change the prepared statements to list the columns explicitly instead of using *? Would that solve the issue?

So our solution is to disable prepared statements for now, which in turn turns off Adequate Record...

Not sure if this helps. I've run into this issue 3 times in production. It's always the same thing.

I have two Rails apps both hitting the same database.
I run a migration that adds columns from the first app.
I go to use the second app and I get the error.
It's always when doing an insert or update in the second app.

I've never seen it happen to the app (the first app) that controls the migrations.

I've just got into the habit of when I run migrations I restart the app.

I've just got into the habit of when I run migrations I restart the app.

That has always been what you're meant to do.

Restarting is great in theory, but when you're doing gradual deploys it gets trickier. We've seen a migration run while requests were still being handled by a server that hadn't been rebooted yet, and those started throwing errors. One strategy I think I'll try is temporarily disabling prepared statements while deploying, then re-enabling them after the migrations have run. That'll involve restarting 3 times, though...

@sgrif In theory you shouldn't have to restart other Rails apps talking to the same database. Right? I guess it makes sense though.

@tenderlove We hit this literally just now by adding a column... I might be missing context, but the issue seems to be that prepared statements use *, as in "SELECT * FROM some_table WHERE foo = ?". Can we change the prepared statements to list the columns explicitly instead of using *? Would that solve the issue?

I think this would solve it, but I'm not sure if there are any disadvantages to this method?

@sgrif:

I've just got into the habit of when I run migrations I restart the app.
That has always been what you're meant to do.

It's not possible to restart all servers without causing any errors or downtime... What happened was we ran a migration to add a column and all the servers started raising the error. Even if we restart servers immediately, there's a small window where the error happens and our users suffer. Our service is mission-critical and that is not affordable. We ended up disabling the prepared statement cache: we'd rather be error-free than faster.

For further info and potential solution, please see my conversation with @tenderlove on Twitter.

@jingweno I tried digging into this today and could not for the life of me reproduce the issue.

Do you have a gist where you can reliably reproduce it?

I have some tests in the PR I sent that reproduce it. From the notes:

The tests in hot_compatibility_test.rb demonstrate the problem. "select in transaction after add_column" fails on postgres without any changes. (I expected "association preload with conditions after add column" to also fail without changes on the postgres adapter, but it did not, and I'm not sure why; I kind of punted on a full investigation.) All added tests fail on the mysql adapter (but not mysql2).

https://github.com/tmgardner/rails/blob/c0e1141e1a197101fc778cabcb46f878ca786f98/activerecord/test/cases/hot_compatibility_test.rb

@tmgardner this looks like you already wrote a fix... any reason why this wasn't approved for merge into Rails master?

@tmgardner I took a look at your test case and it seemed to pass fine on most occasions, except about 20% of the time it would fail. It took me a while, but I narrowed it down to this test case, which reliably reproduces a cached prepared statement failure: https://gist.github.com/samphilipd/8ba5d425b797650bbf3d

However, this only happens after dropping and recreating a table. I could not reproduce the original problem, where merely adding or dropping a column causes prepared statements to fail. If you have a test that I can run directly on Rails master that reproduces a failure from adding or removing a column, please post it here.

@samphilipd Dusted off my Rails dev environment and spent some quality time with byebug. Here's an updated reproducer against current master that fails on an up-to-date rails-dev-box: https://github.com/tmgardner/rails/pull/2/files

I stripped out all the other stuff.

The old one was busted because the DDL statement causes the connection to drop its prepared statements; I suspect this is new behavior since the initial reproducer.

@tmgardner thanks, this is excellent.

Here is the gist I am using to test now: https://gist.github.com/samphilipd/61e3844f4f89b20b1dc4

The interesting thing is that outside of transactions, the prepared statement cache entry is automatically dumped and the individual query is retried successfully on failure:

2015-11-01 21:13:56 UTC LOG:  execute a1: SELECT  "owner_hot_compatibilities".* FROM "owner_hot_compatibilities" WHERE "owner_hot_compatibilities"."id" = $1 LIMIT 1
2015-11-01 21:13:56 UTC DETAIL:  parameters: $1 = '1'
2015-11-01 21:13:56 UTC LOG:  statement: ALTER TABLE "owner_hot_compatibilities" ADD "baz" character varying
2015-11-01 21:13:56 UTC ERROR:  cached plan must not change result type
2015-11-01 21:13:56 UTC STATEMENT:  SELECT  "owner_hot_compatibilities".* FROM "owner_hot_compatibilities" WHERE "owner_hot_compatibilities"."id" = $1 LIMIT 1
2015-11-01 21:13:56 UTC LOG:  statement: DEALLOCATE a1
2015-11-01 21:13:56 UTC LOG:  execute a2: SELECT  "owner_hot_compatibilities".* FROM "owner_hot_compatibilities" WHERE "owner_hot_compatibilities"."id" = $1 LIMIT 1
2015-11-01 21:13:56 UTC DETAIL:  parameters: $1 = '1'
2015-11-01 21:13:56 UTC LOG:  statement: SELECT 1
2015-11-01 21:13:56 UTC LOG:  statement: DEALLOCATE a2

The problem is that within a transaction, we can't retry naively because Postgres 'poisons' the transaction so that all queries fail until it is rolled back.

2015-11-01 21:13:56 UTC LOG:  execute a1: SELECT  "owner_hot_compatibilities".* FROM "owner_hot_compatibilities" WHERE "owner_hot_compatibilities"."id" = $1 LIMIT 1
2015-11-01 21:13:56 UTC DETAIL:  parameters: $1 = '1'
2015-11-01 21:13:56 UTC LOG:  statement: COMMIT
2015-11-01 21:13:56 UTC LOG:  statement: ALTER TABLE "owner_hot_compatibilities" ADD "baz" character varying
2015-11-01 21:13:56 UTC LOG:  statement: BEGIN
2015-11-01 21:13:56 UTC ERROR:  cached plan must not change result type
2015-11-01 21:13:56 UTC STATEMENT:  SELECT  "owner_hot_compatibilities".* FROM "owner_hot_compatibilities" WHERE "owner_hot_compatibilities"."id" = $1 LIMIT 1
2015-11-01 21:13:56 UTC LOG:  statement: DEALLOCATE a1
2015-11-01 21:13:56 UTC ERROR:  current transaction is aborted, commands ignored until end of transaction block
2015-11-01 21:13:56 UTC STATEMENT:  DEALLOCATE a1
2015-11-01 21:13:56 UTC LOG:  statement: ROLLBACK
2015-11-01 21:13:56 UTC LOG:  statement: SELECT 1
2015-11-01 21:13:56 UTC LOG:  statement: DEALLOCATE a1

In this case ActiveRecord attempts to DEALLOCATE the prepared statement before creating a new one. It does this (at least for postgresql) by rescuing in PostgreSQLAdapter#exec_cache and calling StatementPool#delete to flush the prepared statement for this query from the cache.

Within transactions this fails because all commands in the transaction are now poisoned, including the DEALLOCATE command.

I see three potential solutions to this:

Solution 1

  1. Change PostgreSQLAdapter#exec_query so that we detect if we are currently inside a transaction.
  2. In the case that the cached prepared statement has failed within a transaction

    • ROLLBACK the transaction first

    • DEALLOCATE the cached prepared statement as usual

    • Raise some sort of exception that causes the transaction block to retry from the start.

The problem with this approach:

  • In the case of a long transaction we will retry on each sequential failure. This means if there are many failing prepared statements we could spend quite a bit of time retrying transactions.

Solution 2

Same as the above but if we are inside a transaction then instead of just clearing each prepared statement in turn, we flush the entire prepared statement cache and then retry. I like this approach better because it means we only retry a transaction a maximum of one time.

Solution 3

Disable prepared statement caching inside of transactions.

I prefer solution 2, but it hinges on automatically retrying transactions without the user's knowledge. I'm not sure if this is done in any other parts of Rails, or if it's an acceptable property of transactions.

@tenderlove or @sgrif I'm willing to work on a patch for this to implement either 2 or 3, do you have any thoughts on this?

EDIT: Aaron agreed that #2 is a reasonable approach (see https://twitter.com/tenderlove/status/660945869861027840). Will start work on a patch for this.
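For concreteness, the rough shape of solution 2 (a sketch, not the eventual patch; the monkey patches further down this thread are variations on it):

# Flush the whole statement cache on a cached-plan failure and retry the
# transaction at most once. By the time StatementInvalid surfaces, AR has
# already issued the ROLLBACK, so clear_cache! can DEALLOCATE safely.
def transaction_with_cache_recovery(&block)
  retried = false
  begin
    ActiveRecord::Base.transaction(&block)
  rescue ActiveRecord::StatementInvalid => e
    raise if retried || !e.cause.is_a?(PG::InFailedSqlTransaction)
    ActiveRecord::Base.connection.clear_cache!  # drops every cached statement
    retried = true
    retry
  end
end

# Usage: transaction_with_cache_recovery { User.find(1) }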

Solution 2 +1

@samphilipd Thanks for looking into this! +1 to solution 2.

I have some concerns about solution 2; transaction blocks were never retried before, and the Ruby code may have side effects which make the retry dangerous. I was curious whether we'd have problems with them, so I poked around in our code base for what, if anything, would break; as expected, it's generally fine, with a few notable exceptions (like our use of the ar_after_transaction gem).

So I think it's workable, but it might cause a lot of downstream bugs, and especially hard-to-find ones, since it'll only retry during migrations. My personal preference is still avoiding SELECT * in favor of listing the columns in the schema cache, and leaving it up to the user to make sure the schema cache is correct-ish.

Yeah, we're definitely not going to automatically retry anything.

@matthewd if we can't automatically retry transactions, the other obvious option I can see is to disable prepared statements completely inside of transactions. This has a performance penalty but I can't think of any other way to avoid hitting this bug.

@tmgardner's change would fix part of the problem - adding columns - but not removing, changing the type or renaming them.

@tmgardner's change would fix part of the problem - adding columns - but not removing, changing the type or renaming them.

I'm fine with that. We might ultimately consider a change that helps with those situations too, but that's a separate matter: we already have other situations (eager_load) where we explicitly enumerate the columns in a table.

We could also more conservatively improve things by holding the deallocate until after we rollback -- so the first request still fails, but we correctly fix our statement cache for next time.

@samphilipd Right, so the "full solution" with the no-select-* plan is that you have to follow the patterns in http://pedro.herokuapp.com/past/2011/7/13/rails_migrations_with_no_downtime/

If you're removing a column, you have to push code that removes it from the schema cache before the migration. Type changes and renames have to happen in multiple steps.

What if we offered a Rails configuration option, e.g. Rails.configuration.on_schema_change_in_transaction, set to either fail or retry? Defaulting to fail would mirror the current outcome, and users comfortable with retrying could opt in.

We could also more conservatively improve things by holding the deallocate until after we rollback -- so the first request still fails, but we correctly fix our statement cache for next time.

@matthewd actually, this isn't a problem. It looks like we already do deallocate after rolling back. In the SQL logs above, we attempt a deallocation inside the transaction, which fails. Then we deallocate again after the transaction rolls back.

@samphilipd I think that's the disconnect that's issuing the dealloc; in the normal case, it stays in the prepared query cache and keeps failing. I changed the reproducer to show this: https://github.com/tmgardner/rails/pull/3

@tmgardner aah you are quite right. OK, I will go ahead with properly deallocating outside of a transaction and re-implementing your previous PR of explicitly named columns.

This isn't a complete fix but it's an improvement for sure.

FYI: here is a rough draft of what it looks like to auto-retry https://github.com/samphilipd/rails/commit/408557aaee7c6a9dd14a4056a77fe146c58816a9

@matthewd here is a PR that holds the deallocate until after rollback https://github.com/rails/rails/pull/22170

This actually fixes the original issue described right at the top of this thread.

This issue appears to be getting a little stale again, but I'm seeing this as well on Rails 4.2.5 with pg gem 0.18.4 and PostgreSQL server 9.4.4.

We're running into this problem trying to maintain no-downtime deploys, and are having trouble determining something: if prepared statements are turned off, does it expose queries to SQL injection if not mitigated some other way?

No


Any news about this? The problem becomes more urgent with the rise of microservices (and several using the same schema... yeah, I know). We've hit this several times running on DBCP, Flyway, and the latest Postgres, and it's really annoying.

Yo, this should be fixed in master... my PR was merged in March.

See: https://github.com/rails/rails/commit/50c53340824de2a8000fd2d5551cbce2603dc34a

I reckon this issue can be closed now.

@samphilipd — thank you for fixing this.

Does anyone know if a fix for Rails 4 is available, or whether one is coming soon?

@samphilipd Which version of AR includes/will include your patch?

@jingweno whichever version includes 50c5334 - I don't know off the top of my head.

FWIW to those looking for a solution now, we've used the outstanding work in this PR to temporarily work around the issue in Rails 4 while we wait to see where this lands. It will be a great feeling to drop our little snippet in favor of this work. Thanks again @samphilipd.

module TransactionRecoverable
  module ClassMethods
    if Rails::VERSION::MAJOR >= 5
      Rails.logger.warn "WARNING: PostGres Transaction concern for failed SQL may not be needed.\n" \
        "Please visit issue: https://github.com/rails/rails/issues/12330\n" \
        "Please visit PR: https://github.com/rails/rails/pull/22170"
    end

    ##
    # Rails will normally automatically heal from PG::FeatureNotSupported: ERROR:  cached plan must not change result type.
    # Except for when it is within a transaction. Let's heal the connection for it and re-attempt to execute.
    # https://github.com/rails/rails/issues/12330
    # https://github.com/samphilipd/rails/commit/408557aaee7c6a9dd14a4056a77fe146c58816a9
    #
    # The monkey patch must match the current signature for transaction.
    # https://github.com/rails/rails/blob/a246a69e92a8ed54eb569a662dbaba16d743b2b7/activerecord/lib/active_record/transactions.rb#L210
    #
    # Providing a class method will automatically handle the instance implementation.
    # https://github.com/rails/rails/blob/a246a69e92a8ed54eb569a662dbaba16d743b2b7/activerecord/lib/active_record/transactions.rb#L309
    def transaction(*args)
      super(*args) do
        yield
      end
    rescue PG::InFailedSqlTransaction => e
      connection.rollback_db_transaction
      connection.clear_cache!

      super(*args) do
        yield
      end
    end
  end

  def self.prepended(base)
    class << base
      prepend ClassMethods
    end
  end
end

@sgrif request that we close this issue?

In the future you can mention the issue in your PR to close it automatically when your PR is merged.

Fixed by #22170

Thanks for the backport, @HoyaBoya! Took me a while to figure out how to plug it in.

I ended up sticking this in config/initializers/rails_recoverable_transactions.rb:

# https://github.com/rails/rails/issues/12330#issuecomment-215203613

module TransactionRecoverable
  module ClassMethods
    raise "We may no longer need the monkeypatch #{__FILE__}!" if Rails::VERSION::MAJOR > 4

    def transaction(*args)
      super(*args) do
        yield
      end
    rescue PG::InFailedSqlTransaction => e
      connection.rollback_db_transaction
      connection.clear_cache!

      super(*args) do
        yield
      end
    end
  end
end

class << ActiveRecord::Base
  prepend TransactionRecoverable::ClassMethods
end

If you want to verify that it's plugged in correctly, do this on your local machine:

  • Add puts "hello" on the line before the first super(*args) do.
  • Run something like this in a console (replacing Item with some model in your app):

rails runner 'Item.transaction { puts Item.count }'

Make sure it outputs "hello" before the count.

  • Remove your puts "hello" line again.

@henrik I know it's been a while since your post, but I'm bumping into a related issue. I added your code and it works better: basically, we now see at most one error per running container, so it's good that it doesn't keep erroring. But the few errors we do get are PG::FeatureNotSupported: ERROR: cached plan must not change result type.

Did you run into this?

@ryanwjackson Hm. I searched through our app logs and I don't see any errors like that in the last few days. So if we do get them, we don't get them often. I don't think I ever really spent the time trying to understand this problem in depth, though – my contribution to this thread is just taking someone else's solution and explaining in more detail how to add it to your app.

@ryanwjackson what version of Rails are you running? Are you sure it includes this PR? (#22170)

I would not expect to see PG::FeatureNotSupported errors. I would expect instead to see PreparedStatementCacheExpired. If you are sure your transactions are side-effect free, you can safely rescue this and retry the transactions.
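The retry pattern being described, sketched against a Rails version that includes #22170 (model name illustrative; only safe if the transaction block is side-effect free):

tries = 0
begin
  Item.transaction { Item.find(1).touch }
rescue ActiveRecord::PreparedStatementCacheExpired
  # Post-#22170, Rails raises this instead of leaving the cache poisoned;
  # the statement cache has already been cleared, so one retry suffices.
  tries += 1
  retry if tries == 1
  raise
end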

@samphilipd I am on 4.2.0, so I don't think that PR is in here as it seems to be in 5+. I was operating off the patch posted by @henrik.

So I added some debug statements:

module TransactionRecoverable
  module ClassMethods
    raise "We may no longer need the monkeypatch #{__FILE__}!" if Rails::VERSION::MAJOR > 4

    def transaction(*args)
      puts "PDB: in_transaction.before"
      super(*args) do
        yield
      end
      puts "PDB: in_transaction.after"
    rescue PG::InFailedSqlTransaction => e
      puts "PDB: transaction_rescue.error #{e}"
      puts "PDB: transaction_rescue.rolling_back"
      connection.rollback_db_transaction
      connection.clear_cache!

      puts "PDB: transaction_rescue.before"
      super(*args) do
        yield
      end
      puts "PDB: transaction_rescue.after"
    end
  end
end

class << ActiveRecord::Base
  prepend TransactionRecoverable::ClassMethods
end

And got this output:

Apr 25 22:01:20 Staging web:RFUULYAEHZQ info [api] [d223adcf-f54c-497f-ac3d-99967f4cdf74] Parameters: {"amount"=>"123", "customer_external_id"=>"testing", "description"=>"testing6", "subdomain"=>"api", "format"=>"json", "controller"=>"api/v0/transactions", "action"=>"create"}
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.before
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.before
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.before
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.before
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.before
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.after
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.after
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.before
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.after
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: in_transaction.before
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info [api] [d223adcf-f54c-497f-ac3d-99967f4cdf74] PG::FeatureNotSupported: ERROR:  cached plan must not change result type
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info : SELECT  "transactions".* FROM "transactions" WHERE "transactions"."id" = $1 LIMIT 1 FOR UPDATE
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: transaction_rescue.error ERROR:  current transaction is aborted, commands ignored until end of transaction block
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: transaction_rescue.rolling_back
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info PDB: transaction_rescue.before
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info WARNING:  there is no transaction in progress
Apr 25 22:01:20 Staging web:RFUULYAEHZQ info [api] [d223adcf-f54c-497f-ac3d-99967f4cdf74] method=POST path=/v0/transactions format=json controller=api/v0/transactions action=create status=500 duration=159.42 view=0.15 db=37.47 request_id=d223adcf-f54c-497f-ac3d-99967f4cdf74 params={"amount"=>"123", "customer_external_id"=>"testing", "description"=>"testing6", "subdomain"=>"api", "format"=>"json"} account=acct_1234567890

Yes, this is the bug fixed in the PR. The solution is upgrading to Rails 5.

@henrik we're getting the error @ryanwjackson describes on Rails 4.2.0... Are you using the patch with that same version?

@henrik @samphilipd We found the issue(s). Basically, the patch above skips the in_transaction? check from the PR: https://github.com/rails/rails/pull/22170/files#diff-cc31fc1dd1e78db64638200f710b2f59R596

So we pulled something together that appears to be working for us:

module TransactionRecoverable
  module ClassMethods
    raise "We may no longer need the monkeypatch #{__FILE__}!" if Rails::VERSION::MAJOR > 4

    def transaction(*args)
      begin
        super(*args) do
          yield
        end
      rescue PG::InFailedSqlTransaction => e
        # Raise the error to the parent transaction if we're in a child transaction.
        # This prevents dependency errors in the case where a transaction spawns two
        # child transactions, the second depending on the result of the first: it
        # happens because rollback_db_transaction rolls back everything under the
        # parent transaction, NOT just the current transaction.
        raise e if connection.open_transactions > 0

        connection.rollback_db_transaction
        connection.clear_cache!

        super(*args) do
          yield
        end
      end
    end
  end
end

class << ActiveRecord::Base
  prepend TransactionRecoverable::ClassMethods
end

@amedrz No, we're currently on 4.1 for the app in question.

@ryanwjackson We are facing a similar issue to the one you fixed with your raise e if connection.open_transactions > 0.
Thank you for this fix, btw.

I spent some time analyzing this bug on our side before getting to your comment, and I noticed that commenting out the rollback_db_transaction line seemed to fix the issue...

I did a few tests and I couldn't see any duplicate records or issues with the data committed in my Postgres DB. I might be missing something here, but why do we need to roll back the DB transaction at this point?

OK, after a few experiments I think I understand my issue a bit more clearly.

On my side, the error is raised during an after_create hook.
So the ROLLBACK is triggered and it cancels the complete DB transaction (the create and the after_create SQL statements).

But unfortunately, the only thing that gets retried is the code (and therefore the SQL) in the after_create hook. So basically there is no crash, but my object is not created at all...

For now I've fixed it by replacing the after_create hook with an after_commit hook, but this is not very satisfactory.

Any idea how to fix this?

I'm having a similar issue in my Rails 6 app. I had to disable prepared_statements to get pg out of gridlock.

Here is the migration that triggered the issue. My model has a before_save hook:

class AddDateToSubmission < ActiveRecord::Migration[6.0]
  def change
    add_column :submissions, :date, :date

    reversible do |change|
      change.up do
        Submission.all.each { |sub| sub.touch; sub.save }
      end
    end
  end
end
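One standard companion mitigation when a migration iterates over models like this is to refresh the model's cached schema after the DDL. A sketch (note this refreshes AR's column cache, which is separate from the prepared statement cache):

reversible do |change|
  change.up do
    Submission.reset_column_information  # pick up the newly added date column
    Submission.find_each { |sub| sub.touch; sub.save }
  end
end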

