Oj: Oj::StreamWriter inconsistently handles Hashes with nil values

Created on 16 Oct 2018  路  23Comments  路  Source: ohler55/oj

I am getting inconsistent results when running the following snippet of code within a irb console:

require 'oj'

io = StringIO.new
w = Oj::StreamWriter.new(io)
w.push_array
child = {"id"=>"1", "name"=>"foo", "nilElement"=>nil}
w.push_value(child)
puts io.string

specifically, the nilElement key/value pair is sometimes added and sometimes not.

# Output 1
"[{\"id\":\"1\",\"name\":\"foo\",\"nilElement\":null}"

# Output 2
"[{\"id\":\"1\",\"name\":\"foo\"}"

Within the same session, the behaviour stays consistent, i.e. nilElement is either always added or never added if w.push_value(child) is called multiple times.

All 23 comments

I'll look at it tonight.

Thank you @ohler55, for the quick answer and for the awesome work 馃

Fixed. Will release after travis tests complete.

and... released

@ohler55 thanks again, unfortunately I now get a hard crash when running my test suite:

json_stream_buffer.rb:8: [BUG] Segmentation fault at 0x0000000000000018 ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]

It's coming straight out of a line where sc_parse is called. I'm using sc_parse to parse a JSON stream and subsequently call the methods above through Oj::StreamWriter.

I am upgrading from 3.6.10.

The error trace is quite massive, I think this is the part which might help:

-- Control frame information -----------------------------------------------
c:0028 p:---- s:0151 e:000150 CFUNC  :sc_parse
c:0027 p:0016 s:0145 e:000144 METHOD .../json_stream_buffer.rb:8
c:0026 p:0012 s:0140 e:000139 METHOD .../base_stream_buffer.rb:69
c:0025 p:0048 s:0134 E:0011b8 METHOD .../base_stream_buffer.rb:57
c:0024 p:0018 s:0129 E:000d78 BLOCK  .../base_stream_buffer_test.rb:41 [FINISH]
c:0023 p:0024 s:0125 e:000124 BLOCK  /usr/local/bundle/gems/minitest-5.11.3/lib/minitest/test.rb:98
c:0022 p:0002 s:0122 e:000121 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest/test.rb:195
c:0021 p:0006 s:0117 e:000116 BLOCK  /usr/local/bundle/gems/minitest-5.11.3/lib/minitest/test.rb:95
c:0020 p:0014 s:0114 e:000113 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:265
c:0019 p:0006 s:0109 e:000108 BLOCK  /usr/local/bundle/gems/minitest-5.11.3/lib/minitest/test.rb:94
c:0018 p:0029 s:0106 e:000105 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:360
c:0017 p:0044 s:0098 E:0025d8 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest/test.rb:211
c:0016 p:0005 s:0091 E:001370 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest/test.rb:93
c:0015 p:0010 s:0087 e:000086 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:960
c:0014 p:0026 s:0080 e:000078 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:334
c:0013 p:0011 s:0072 e:000071 BLOCK  /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:321 [FINISH]
c:0012 p:---- s:0068 e:000067 CFUNC  :each
c:0011 p:0007 s:0064 e:000063 BLOCK  /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:320
c:0010 p:0029 s:0061 e:000060 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:360
c:0009 p:0029 s:0053 E:002138 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:347
c:0008 p:0109 s:0046 E:001b98 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:319
c:0007 p:0044 s:0037 e:000036 METHOD /usr/local/bundle/gems/railties-5.2.0/lib/rails/test_unit/line_filtering.rb:10
c:0006 p:0010 s:0031 e:000030 BLOCK  /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:159 [FINISH]
c:0005 p:---- s:0027 e:000026 CFUNC  :map
c:0004 p:0038 s:0023 e:000022 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:159
c:0003 p:0154 s:0014 e:000013 METHOD /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:136
c:0002 p:0059 s:0007 E:0019a8 BLOCK  /usr/local/bundle/gems/minitest-5.11.3/lib/minitest.rb:63 [FINISH]
c:0001 p:0000 s:0003 E:000110 (none) [FINISH]

EDIT: The error occurs in 3.6.11 too, but not in 3.6.10 or lower. I can open a separate issue if you deem it more appropriate.

Some additional info as well: in the test which is failing, I am calling break within an each loop called onto an Enumerator that wraps sc_parse. Simplifying, this is what I'm doing:

class MyParser
  def initialize(io)
    @io = io
    @writer = Oj::StreamWriter.new(@io)
    @enum = Enumerator.new do |yielder|
      @yielder = yielder
      Oj.sc_parse(self, io)
    end
  end

  # Stream parsing methods
  def push_object
    @writer.push_object
    @yielder << @io.string
  end

  ...
end

MyParser.new(io).enum.each.with_index do |chunk, i|
  break if i > 0
end

The full version implements a buffer that checks whether the JSON can be returned (e.g. a full object has been received) and yields through the enumerator accordingly. The test is there to see that a break is possible when consuming the enumerator.

It probably is a separate issue but I will look at it in any case.

Can you complete the example with the value you are using for io and the other method on MyParser if they are relevant. With a few additions I was able to run the sample but did not see a crash so I must be missing something.

@ohler55 here's a minimal example I put together which leads to the crash:

class MyParser
  attr_accessor :enum

  def initialize
    @io = StringIO.new
    @writer = Oj::StreamWriter.new(@io)

    json_string = Oj.dump({ records: 1.upto(25).map{|i| { id: i, name: "record_#{i}" }} })
    @test_json = StringIO.new(json_string)

    @enum = Enumerator.new do |yielder|
      @yielder = yielder
      Oj.sc_parse(self, @test_json)
    end
  end

  # Stream parsing methods
  def hash_start
    @writer.push_object
  end

  def hash_end
    @writer.pop unless @io.eof
  end

  def hash_key(key)
    @writer.push_key(key)
  end

  def hash_set(h, key, value)
    @writer.push_value(value)
  end

  def array_start
    @writer.push_array
  end

  def array_end
    @writer.pop
  end

  def array_append(a, value)
    yield_data
  end

  def add_value(value);end

  def yield_data
    @writer.pop_all
    @yielder << @io.string
    @io.reopen("")
    array_start
  end
end

Given the module above, the following works:

MyParser.new.enum.to_a

while the following crashes:

MyParser.new.enum.each.with_index do |r, i|
  break if i == 0
end

Finally, as stated previously, the same code works without errors using 3.6.10

Okay, partly fixed. The fix is in the clear-error branch. It no longer crashes but it does seem to have a problem with the parsing that only shows up with the approach you are taking. I'll dig some more tomorrow. Other commitments tonight.

I looked at it again. I was thrown off by the break in the with_index loop. The break causes a parse failure but that is expected. I think it is all good in the branch I pushed last night.

@ohler55 I checked out that branch and my tests now pass, thank you 馃憤

Thanks. I've merge the branch to master and will release in the next day or two.

Awesome, thank you!

@ohler55 the last version doesn't seem to be released yet

@ohler55 I have a follow up question on the last issue we discussed here. With the released 3.6.13 version (and above), the code snippet I provided raises an Oj::ParseError when breaking out of the stream parsing:

Oj::ParseError (Array not terminated (:records) at line 1, column 41 [sparse.c:851])

Is this the intended behaviour? Is there any other recommended way of interrupting the stream parsing without Oj raising an exception?

My use case is to limit a potentially large HTTP response to only the first X records. I am using Oj's stream parsing up until the limit and I want to break out of it once this has been reached.

Of course I could add some logic to swallow that error on my end. I think, though, that interrupting the parsing of a stream might be a common use case - in which case I would expect a break to not raise an exception. What are your thoughts on this?

So you have two component in play, a parser running and code using StreamWriter to generate that code. If you close the stream the writer is using a parse exception is raised. If you raise your own exception to terminate the parsing I think (I have to double check) your exception will show up instead of the parse error. The only other way to break out of the parser would involve changing in the API to somehow signal back to the parser that the parsing should stop. At this point I think raising an exception to abort will impact other users less than changing the API. Have you tried raising your own exception?

@ohler55 thank you as always for the speedy response. I like the idea of a custom exception as an alternative solution. At the same time, the same code is working as I would have expected up until 3.6.10. Not sure at this point if this was intended or just incidental?

It was not intentional. I'm surprised it worked in the past without a parse error.

ok @ohler55 thanks for the input, I will go with a custom exception then.

@ohler55 I did as suggested and I can build a working solution. There is something fishy going on though and I feel like reporting it. Somehow the parser seems to be hijacking the exception I'm throwing and writing its own error message into it.

Given the same MyParser class from above (https://github.com/ohler55/oj/issues/508#issuecomment-430967915)

MyParser.new.enum.each.with_index do |r, i|
  raise StopIteration if i == 0
end

leads to the following:

StopIteration (Array not terminated (:records) at line 1, column 41 [sparse.c:851])

The same happens with any other exception which is raised while parsing, which imho obfuscates the code and complicates troubleshooting issues. If I would do a genuine mistake, for example:

MyParser.new.enum.each.with_index do |r, i|
  self.some_non_existing_method
end

... I would end up with this exception being raised:

NoMethodError (Array not terminated (:records) at line 1, column 41 [sparse.c:851])

Ok, that not correct behaviour. Lets reopen this.

Trying out the error-pass branch now.

Or now the released gem.

@ohler55 sorry for the late follow up, got caught up in other topics. I tested with 3.7.4 and everything works including breaking out of the iteration. I don't have to raise an exception, the original approach I posted now works again:

MyParser.new.enum.each.with_index do |r, i|
  break if i == 0
end

Thank you for your support! Closing this now

Was this page helpful?
0 / 5 - 0 ratings

Related issues

orien picture orien  路  18Comments

kmasuda-aiming picture kmasuda-aiming  路  8Comments

Asmoddym picture Asmoddym  路  6Comments

werleo picture werleo  路  7Comments

ohler55 picture ohler55  路  21Comments