Active_model_serializers: Thread unsafe behavior

Created on 3 Aug 2018  Â·  10Comments  Â·  Source: rails-api/active_model_serializers

Expected behavior vs actual behavior

ActiveModelSerializer should properly serialize nested resources, but it does not under multi-threading environment. When multiple thread are serializing concurrently, a resource to be serialized can be replaced with another resource by another thread. Is this a known issue?

Steps to reproduce

Given the following data,

$ rails c
irb(main):001:0> Post.first
=> #<Post id: 1, title: "post1", body: "This is post1!", created_at: "2018-08-03 09:00:52", updated_at: "2018-08-03 09:00:52">

irb(main):002:0> Post.first.comments.to_a
=> [#<Comment id: 1, body: "This is a comment of post1", post_id: 1, created_at: "2018-08-03 09:00:52", updated_at: "2018-08-03 09:00:52">]

irb(main):003:0> Post.second             
=> #<Post id: 2, title: "post2", body: "This is post2!", created_at: "2018-08-03 09:00:52", updated_at: "2018-08-03 09:00:52">

irb(main):004:0> Post.second.comments.to_a
=> [#<Comment id: 2, body: "This is a comment of post2", post_id: 2, created_at: "2018-08-03 09:00:52", updated_at: "2018-08-03 09:00:52">]

Example1: with Puma (a thread-based web server)

Puma concurrently processes multiple requests using multi-threading. This example shows that a request replaces a resource which is being used by another concurrent request.

Given these two models; Post and Comment,

# app/models/post.rb
class Post < ApplicationRecord
  has_many :comments
end

# app/models/comment.rb
class Comment < ApplicationRecord                                                                                         
  belongs_to :post                                                                                                         
end

and these two serializers; PostSerializer and CommentSerializer.

# app/serializers/post_serializer.rb
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :body
  has_many :comments, serializer: CommentSerializer do
    sleep 3
    object.comments
  end
end

# app/serializers/comment_serializer.rb
class CommentSerializer < ActiveModel::Serializer
  attributes :id, :body
  has_one :post
end 

Then the thread unsafe behavior can be reproduced like below.

  1. Launch Puma with bundle exec rails s
  2. Open your browser
  3. Open http://localhost:3000/posts/1 in a tab
  4. Open http://localhost:3000/posts/2 in another tab
  5. Reload both tabs within 3 seconds, and you'll get the following result
tab1

tab1

tab2

tab2

Example2: with rails runner

The behavior also can be reproduced without browser.

Adding a class method like below,

class Post < ApplicationRecord
  has_many :comments

  class << self
    def concurrent_serialization
      posts = {}
      t1 = Thread.new{posts[:first] =  PostSerializer.new(Post.first, {}).to_json}
      t2 = Thread.new{posts[:second] = PostSerializer.new(Post.second, {}).to_json}
      t1.join
      t2.join
      pp posts
    end
  end
end

running the method results that serialized two posts both have a same comment.

$ bundle exec rails runner "Post.concurrent_serialization"
{:first=>
  "{\"id\":1,\"title\":\"post1\",\"body\":\"This is post1!\",\"comments\":[{\"id\":1,\"body\":\"This is a comment of post1\"}]}",
 :second=>
  "{\"id\":2,\"title\":\"post2\",\"body\":\"This is post2!\",\"comments\":[{\"id\":1,\"body\":\"This is a comment of post1\"}]}"}

Environment

ActiveModelSerializers Version (commit ref if not on tag):

0.10.7

Output of ruby -e "puts RUBY_DESCRIPTION":

ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin17]

OS Type & Version:

MacOS X 10.13.6

Integrated application and version (e.g., Rails, Grape, etc):

  • Rails 5.1.6
  • Puma 3.12.0

Additonal helpful information

I uploaded all needed to reproduce on local.
https://github.com/johnny-miyake/ams_with_thread

My understanding

Below is my understanding based on my code reading.

ActiveModelSerializers stores reflection instances in a class variable of a serializer class. Each reflection instance holds a resource instance in @object. Because of this, the resource instance can be referred or be overwritten by another thread.

First, ActiveModel::Serializer#to_h() is an alias for serializable_hash() in the same class. It calls attributes_hash() and associations_hash().
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer.rb#L357-L365

# active_model_serializers/lib/active_model/serializer.rb
def serializable_hash(adapter_options = nil, options = {}, adapter_instance = self.class.serialization_adapter_instance)
  adapter_options ||= {}
  options[:include_directive] ||= ActiveModel::Serializer.include_directive_from_options(adapter_options)
  resource = attributes_hash(adapter_options, options, adapter_instance)
  relationships = associations_hash(adapter_options, options, adapter_instance)
  resource.merge(relationships)
end
alias to_hash serializable_hash
alias to_h serializable_hash

attributes_hash() calls attributes().
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer.rb#L386-L394

# active_model_serializers/lib/active_model/serializer.rb
def attributes_hash(_adapter_options, options, adapter_instance)
  if self.class.cache_enabled?
    fetch_attributes(options[:fields], options[:cached_attributes] || {}, adapter_instance)
  elsif self.class.fragment_cache_enabled?
    fetch_attributes_fragment(adapter_instance, options[:cached_attributes] || {})
  else
    attributes(options[:fields], true)
  end
end

attributes() calls ActiveModel::Serializer::Refrection#value() of each reflection instance.
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer.rb#L339-L352

# active_model_serializers/lib/active_model/serializer.rb
def attributes(requested_attrs = nil, reload = false)
  @attributes = nil if reload
  @attributes ||= self.class._attributes_data.each_with_object({}) do |(key, attr), hash|
    next if attr.excluded?(self)
    next unless requested_attrs.nil? || requested_attrs.include?(key)
    hash[key] = attr.value(self) # This calls `ActiveModel::Serializer::Refrection#value()`
  end
end

ActiveModel::Serializer::Refrection#value() sets a resource instance to @object. This is where the resource instance is overwritten by another thread
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer/reflection.rb

# active_model_serializers/lib/active_model/serializer/reflection.rb 
def value(serializer, include_slice)
  @object = serializer.object # This line sets a resource to @object which is referred as `object` in a block which is given to `has_many` association
  @scope = serializer.scope

  block_value = instance_exec(serializer, &block) if block
  return unless include_data?(include_slice)

  if block && block_value != :nil
    block_value
  else
    serializer.read_attribute_for_serialization(name)
  end
end

Then ActiveModel::Serializer#associations(), which is called in serializable_hash() above, evaluates a block which is given to an association definition (e.g. has_many :comments in my PostSerializer). The block evaluated in reflection instance's context.
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer.rb#L344-L349

# active_model_serializers/lib/active_model/serializer.rb
def associations(include_directive = ActiveModelSerializers.default_include_directive, include_slice = nil)
  include_slice ||= include_directive
  return Enumerator.new {} unless object

  Enumerator.new do |y|
    self.class._reflections.each do |key, reflection|
      next if reflection.excluded?(self)
      next unless include_directive.key?(key)

      association = reflection.build_association(self, instance_options, include_slice)

      # This evaluates a block which is given to an association.
      y.yield association
    end
  end
end

Because of above codes, my CommentSerializer serialized a wrong Comment because Post id:1 which was held by a reflection instance had been replaced with Post id:2 by the subsequent request which was processed in another thread.

Most helpful comment

@johnny-miyake
Thanks to your effort in finding out this bug. Base on your repository to reproduce the bug, I tried to figure out what actually happens and fix it.
Let's take a look on following file and observe that _reflections is a class-level variable
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer.rb#L176

    with_options instance_writer: false, instance_reader: true do |serializer|
      serializer.class_attribute :_reflections
      self._reflections ||= {}
      ...
  end

When rails server boots and load fille app/serializers/post_serializer.rb,

class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :body
  has_many :comments, serializer: CommentSerializer do
    sleep 3
    object.comments
  end
end

has_many will introduces a reflection in class_attributes _reflections

    def self.has_many(name, options = {}, &block) # rubocop:disable Style/PredicateName
      associate(HasManyReflection.new(name, options, block))
    end

Unfortunately, since this is class level variable, puma server multi-thread access the same variable on all concurrent requests.
The root cause is, when first request has not finished their job, the second request override object, which leads to first request can access resource that belongs to second request
More deeper, object inside your PostSerializer access attr_accessor :object in reflection
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer/reflection.rb#L204

# used in instance exec
attr_accessor :object, :scope

Whenever, a new request comes to server, definitely a new instance of active serializer were created. However, calling serializable_hash() will replace object value by a new one. In case of the old request still have not finished yet and try to access object, bug will happens obviously.

Because of that, I have made a small change to clone _reflections on each request and make object cannot be overridden by another request anymore

@reflections ||= self.class._reflections.deep_dup

I have tested and it actually works
http://localhost:3000/posts/1
{"id":1,"title":"post1","body":"This is post1!","comments":[{"id":1,"body":"This is a comment of post1"}]}
http://localhost:3000/posts/2
{"id":2,"title":"post2","body":"This is post2!","comments":[{"id":2,"body":"This is a comment of post2"}]}

$ bundle exec rails runner "Post.concurrent_serialization"                                                                
{:second=>
  "{\"id\":2,\"title\":\"post2\",\"body\":\"This is post2!\",\"comments\":[{\"id\":2,\"body\":\"This is a comment of post2\"}]}",
 :first=>
  "{\"id\":1,\"title\":\"post1\",\"body\":\"This is post1!\",\"comments\":[{\"id\":1,\"body\":\"This is a comment of post1\"}]}"}

I have pull request https://github.com/rails-api/active_model_serializers/pull/2276
While waiting for AMS team response on it, we can change app server to unicorn or prevent using object inside our serializers

All 10 comments

Wow, great code spelunking!

I'm guessing this only happens with block-style associations.

I wonder how jsonapi-rb handles this.

I've been able to reproduce this with has_one with a block, too.

This sounds like a problematic and high-risk bug – surprising it hasn't come up before (I'm assuming it hasn't). I guess two (or more) requests would have to be processed by the same process, and using the same serialiser, at the same time (in different threads)… which is unlikely, but certainly possible in high traffic backends, and for resources that are fetched frequently.

Any known workarounds, or anything that could be done to mitigate this?

As it stands, this is also a security risk as it has the ability to leak data.

@johnny-miyake
Thanks to your effort in finding out this bug. Base on your repository to reproduce the bug, I tried to figure out what actually happens and fix it.
Let's take a look on following file and observe that _reflections is a class-level variable
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer.rb#L176

    with_options instance_writer: false, instance_reader: true do |serializer|
      serializer.class_attribute :_reflections
      self._reflections ||= {}
      ...
  end

When rails server boots and load fille app/serializers/post_serializer.rb,

class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :body
  has_many :comments, serializer: CommentSerializer do
    sleep 3
    object.comments
  end
end

has_many will introduces a reflection in class_attributes _reflections

    def self.has_many(name, options = {}, &block) # rubocop:disable Style/PredicateName
      associate(HasManyReflection.new(name, options, block))
    end

Unfortunately, since this is class level variable, puma server multi-thread access the same variable on all concurrent requests.
The root cause is, when first request has not finished their job, the second request override object, which leads to first request can access resource that belongs to second request
More deeper, object inside your PostSerializer access attr_accessor :object in reflection
https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer/reflection.rb#L204

# used in instance exec
attr_accessor :object, :scope

Whenever, a new request comes to server, definitely a new instance of active serializer were created. However, calling serializable_hash() will replace object value by a new one. In case of the old request still have not finished yet and try to access object, bug will happens obviously.

Because of that, I have made a small change to clone _reflections on each request and make object cannot be overridden by another request anymore

@reflections ||= self.class._reflections.deep_dup

I have tested and it actually works
http://localhost:3000/posts/1
{"id":1,"title":"post1","body":"This is post1!","comments":[{"id":1,"body":"This is a comment of post1"}]}
http://localhost:3000/posts/2
{"id":2,"title":"post2","body":"This is post2!","comments":[{"id":2,"body":"This is a comment of post2"}]}

$ bundle exec rails runner "Post.concurrent_serialization"                                                                
{:second=>
  "{\"id\":2,\"title\":\"post2\",\"body\":\"This is post2!\",\"comments\":[{\"id\":2,\"body\":\"This is a comment of post2\"}]}",
 :first=>
  "{\"id\":1,\"title\":\"post1\",\"body\":\"This is post1!\",\"comments\":[{\"id\":1,\"body\":\"This is a comment of post1\"}]}"}

I have pull request https://github.com/rails-api/active_model_serializers/pull/2276
While waiting for AMS team response on it, we can change app server to unicorn or prevent using object inside our serializers

Any updates on this issue? It looks pretty critical, when AMS is used inside multi-threaded env, e.g: Amazon SQS handler.

Wrote a test case as described by @johnny-miyake so that we can test more cases of failure and also test the fix.

https://gist.github.com/GeminPatel/4093be512b220b13ce3e46660e7df918

I'd like to confirm the fix.

ActiveModel::Serializer::Reflection#value() sets a resource instance to @object. This is where the resource instance is overwritten by another thread

I never really liked the way this is handled in AMS for this very reason. It's really surprising (and hence a bug) that the reflection instance should have any state related to serializer instance state.

If we were to add another object, we'd have something like

  • a Reflection, an instance describe the serializer's reflection configuration
  • A ReflectionInstance or (ReflectionValue ?), a reflection as being evaluated on an an instantiated serializer

I think the fix in #2276 basically gets around this issue by creating a 'throw away' copy of the Reflection instance for use when the serializer's reflection is being evaluated.

I'm not excited to add another object, but that's essentially what the fix in https://github.com/rails-api/active_model_serializers/pull/2276/commits/d93777283e171628c3cc916c3dedcdb0aed63fbc is doing so maybe let's be explicit about it.

looking at

# https://github.com/rails-api/active_model_serializers/blob/0-10-stable/lib/active_model/serializer/reflection.rb
def value(serializer, include_slice)
  @object = serializer.object # This line sets a resource to @object which is referred as `object` in a block which is given to `has_many` association
  @scope = serializer.scope

  block_value = instance_exec(serializer, &block) if block
  return unless include_data?(include_slice)

  if block && block_value != :nil
    block_value
  else
    serializer.read_attribute_for_serialization(name)
  end
end

      protected

      # used in instance exec
      attr_accessor :object, :scope

````

what would happen if we changed

```diff
-  @object = serializer.object # This line sets a resource to @object which is referred as `object` in a block which is given to `has_many` association
-  @scope = serializer.scope

-  block_value = instance_exec(serializer, &block) if block
+  block_value = serializer.instance_exec(serializer, &block) if block

I find instance exec/eval hard to reason about, but since we already do this in Field#evaluate_condition I suspect it would be fine and remove the need for adding another object

          if block.arity.zero?
            serializer.instance_exec(&block)
          else
            serializer.instance_exec(serializer, &block)
          end

On the other hand, I've definitely struggled with reflections being mutated by the evaluation of blocks, such as all the work I did to introducing lazy association when I wrote the first (!) reflection tests

test/serializers/reflection_test.rb 1ef7c7d35ba8a8822511329d75370f15cf41bada 876190440f18a0bb7004a5e401b5e5516adf36fc

@johnny-miyake based on your excellent work I've made a PR with a failing test https://github.com/rails-api/active_model_serializers/pull/2299

Closed by #2299

Was this page helpful?
0 / 5 - 0 ratings