Logstash: [discuss] Event field value mutability

Created on 25 Jul 2017  路  11Comments  路  Source: elastic/logstash

I just became aware of issues #7662 and #7663.

In #7662 it was demonstrated that mutating a String field value extracted from the event using Event.get did not have a consistent behaviour on both the Ruby and Java sides when re-reading the same field value but without updating it using Event.set after the mutation.

When we introduced the new Event getter and setter, it was in part to emphasize on the undefined behaviour of in-place mutating an Event field value.

Currently the only way to make sure the Event has a consistent field value is to set it back in the Event. So for me #7662 does not look like a bug but maybe more a lack of documentation?

The spec in #7662 should be instead:

    it "should propagate changes using setter" do
      e = LogStash::Event.new()
      e.to_java.setField("foo", "bar")
      expect(e.get("foo")).to eq("bar")
      s = e.get("foo")
      s.gsub!(/bar/, 'pff')
      e.set("foo", s)
      expect(e.get("foo")).to eq("pff")
      expect(e.to_java.getField("foo")).to eq("pff")
    end

The undefined behaviour of in-place mutation of a field value has been discussed and agreed upon some time ago. I would like us to make sure we are in sync on this before potentially engaging in more work in that area.

@original-brownbear @guyboertje @jsvd @jordansissel

discuss

All 11 comments

@colinsurprenant do you have the issue in which the undefined behavior was decided upon to link here for reference (or whatever paper trail exists of that decision) for me to read up on?

I think #5035 #5140 #5141 should cover this. Otherwise @guyboertje @jordansissel @ph @suyograo were also part of these discussions and should be able to explain context/history.

We also know we lack a good Event API public documentation.

We have pretty extensive Event API docs here: https://www.elastic.co/guide/en/logstash/current/event-api.html

Here are my thoughts:

Long term: We should make get return values that, if mutated (or even if allowed to mutate), do not and are not expected to modify the value in the Event. This goes for Scalars (String, etc) and Containers (array, map, etc)

Short term: Perhaps document that this behavior is undefined and strongly discourage inplace mutation of strings, lists, or map types?

Research: Determine if mutation of get()'d values is something useful for users and also something we want to support. My thoughts are that we don't, but I'm open to ideas.

@suyograo I stand corrected but wouldn't call it extensive either. I will propose changes to improve.

@jordansissel

Long term: We should make get return values that, if mutated (or even if allowed to mutate), do not and are not expected to modify the value in the Event. This goes for Scalars (String, etc) and Containers (array, map, etc)

I agree that having an undefined behaviour, even if documented, is not ideal. I guess the question is if we want/need to protect the Event from potential mutations outside the set() operations vs the performance cost? In other word, do we need to add code (with a potential performance cost) around this if for practical matters it hasn't been real problem (or has it?)

Maybe in logstash 7.x we can just move toward using java.lang.String, unmodifiableMap/List ?

For the most part java strings act like ruby strings in jruby, same with the map / list types. Where they don't it would only require relatively small patches to plugins to fix any issues.

We would still have to convert from ruby -> java, but we wouldn't ever have cause to do the reverse in this setup.

Personally I think we should:

  • Not support Event field value mutation outside the Event: any modifications to fields should be done using the set() method. This will protect us from implementation changes and make it much easier to support other plugins languages that requires type conversion/proxying.
  • Not try to prevent the mutation of a returned field value outside the Event. I don't think it adds much value, might add complexity and might have some performance cost.
  • Document that mutating a filed value has undefined behaviours and should be avoided and updating an Event field should be done using set().

I think that this is very practical and will not break or change anything in the short term and has impact to only plugins developers which can rely on the API documentation for this.

You make some good points @colinsurprenant . I'm +1 on your proposal.

I am +1 with your proposal @colinsurprenant (and also on the original proposal of using setters)

Was this page helpful?
0 / 5 - 0 ratings