I've given a lot of thought to some past discussions WRT the new Event API and I think there's a solution below that will make everyone happy.
We have a bad API/user-experience problem in Logstash 5.0. event['foo']['bar'] = 'blah' and event['foo'].downcase! should either set the value (as the syntax suggests) or raise an error if they can't. This follows the principle of least astonishment, a key tool in UX design that definitely applies here. It doesn't matter which of the two it does, but it should do one of them. Setting the value is the obvious behavior; if the API can't do the obvious thing, it should raise an error.
An API where code can easily look like it does something (and in fact used to be the recommended way of doing things, going by our own code) but now does nothing, with no feedback, is a bad API.
Silent failure is the worst possible experience for a user. We should not make that choice intentionally. Even if users eventually learn and understand this API, it doesn't mean they will (or should) like it.
Regardless of how well we document it, or how many people we train not to do it, for a developer new or old that's still a bad experience. People can easily make an innocent mistake.
What's the fix?
Preserving the old behavior is a controversial topic. Doing so would require a Java event that could operate equally well on Ruby objects as on Java ones, switching between Ruby and Java internal representations depending on whether it was being used in a Java plugin.
I believe we can all agree, as a lowest common denominator, that the following would be better than what's currently in 5.0.0: we should simply invoke .freeze recursively on any Ruby objects returned. This would be only a handful of lines of code, and would clearly warn and educate users not to mutate the data.
All this means is that:
event['foo'].downcase! # doesn't work with current java event, but fails silently
would raise an exception forcing the user to rewrite it correctly as
event['foo'] = event['foo'].downcase # does the right thing
We could even add extra info to the frozen-object errors letting users know about Event immutability.
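A recursive freeze really can be a handful of lines. Here's a minimal, self-contained sketch of the idea; deep_freeze is a hypothetical helper written for illustration, not existing Logstash code:

```ruby
# Recursively freeze a value and everything reachable inside it, so any
# in-place mutation raises instead of silently doing nothing.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each_value { |v| deep_freeze(v) }
  when Array
    obj.each { |v| deep_freeze(v) }
  end
  obj.freeze
end

value = deep_freeze('foo' => { 'bar' => 'baz' })
value['foo']['bar'].frozen?  # => true; upcase! on it would raise
```

Any attempt to mutate a nested value then fails loudly with a frozen-object error instead of being lost.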
Thank you all for considering this proposal! I think it may still be worth talking about a 'full' fix where nothing would change, but I'm not certain that perspective is very popular ATM, though I think it is technically achievable.
event['foo'].downcase! # doesn't work with current java event, but fails silently
Is this true? In MRI, calling String#downcase! on a frozen String raises an exception:
>> a = "hello".freeze
>> a.downcase!
RuntimeError: can't modify frozen String
from (irb):4:in `downcase!'
from (irb):4
from /home/jls/.rvm/rubies/ruby-2.2.1/bin/irb:11:in `<main>'
(I like the exception behavior here)
Assuming I'm reading you right ---
I am +1 on moving towards a smaller interface (not allowing event[key][key2], for example, if possible). I am also +1 on having these exceptions thrown when a user tries an unsupported mutation.
@jordansissel yes, that is what I'm proposing.
I think we should add a top-level rescue checking for "can't modify frozen string" and wrap the error with a new Error class saying something like:
Whoa there cowboy!
It looks like you're trying to lasso a frozen object with that ruby rope!
Try invoking .dup to let that buckin' steer out of the corral proper.
Then, see if you can get that ornery bull back in with a direct assignment.
Or we could do something more boring without the rich western imagery. But my point is that we can enhance the default error message.
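As a sketch of what that wrapping could look like (EventMutationError, the helper name, and the message text are all invented for illustration; on modern Rubies the underlying error is FrozenError):

```ruby
# Hypothetical wrapper: rescue the frozen-object error and re-raise it
# with a hint about Event value immutability.
class EventMutationError < StandardError; end

def guarding_event_access
  yield
rescue FrozenError => e
  raise EventMutationError,
        "#{e.message} -- values read from an Event are frozen; " \
        "dup the value, change the copy, then assign it back with event[...]="
end

begin
  guarding_event_access { 'FOO'.freeze.downcase! }
rescue EventMutationError => e
  puts e.message  # the enhanced, Event-aware message
end
```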
@jordansissel just to clarify, we've already moved to the smaller interface. It's just a question of whether we throw exceptions or not. Or, whether we DO continue to support the old interface (which is extra work to achieve)
In principle, I am a big +1 on throwing exceptions to users when they wrongly use an API in Ruby Filter (for 5.0). IMO fail fast is the right thing here, and should be done at the code level in addition to docs/breaking change announcements.
I think we all agree we should minimize backward incompatibilities, but having them, documented, in a major version bump is OK.
Personally, I don't see this API change as being as dramatic as @andrewvc does. As long as we properly document it and warn about the change, that should be sufficient. This change gets rid of an implementation detail, and there will certainly be more such changes where relying on preconceptions about the nature of objects returned from the Event will not be possible. I do agree we should aim for the principle of least surprise, but frankly, assuming that the returned objects are references to the Event's inner objects and are mutable in place was just plain wrong to start with; it just happened to be the case, and some (not all) developers used that _trick_.
I am +1 on finding ways to warn or prevent these mis-usages. But I don't think returning immutable objects here is necessarily the best approach. Since the returned objects are copies, you should really be free to do what you want with them.
With the new API, one way to modify an event is to do something like this:
# given this event
event["foo"]["bar"] = "baz"
# I want to populate event["foo"] using an intermediate Hash
h = event["foo"]
h["bar2"] = "baz2" # this would fail because h is immutable
h["bar3"] = "baz3" # this would fail because h is immutable
# set that back into event
event["foo"] = h
IMO this approach should be valid.
Also, recursively setting returned objects as immutable seems like a waste too.
And for the event["foo"]["bar"].downcase! situation, well, this is why these are "bang" methods: you should generally avoid using them unless you completely understand the context they are used in. Again, the following would not work either:
# given this event
event["foo"]["bar"] = "BAZ"
s = event["foo"]["bar"]
s.downcase! # this would fail because s is immutable
event["foo"]["bar"] = s
Which IMO should also be valid.
@colinsurprenant I think that I'd agree with you if this was a closed source program that only people internal to the company worked on.
However, Logstash being an open source project I think that having an easy to use API is a feature with significant value. An API that _looks_ like it is doing work that it is in fact not doing is frustrating.
I don't believe that a good API requires people to be high priests who have spent sufficient time studying the sacred Logstash user manual to understand the nuances of object references in events in order to get something useful done. This is the principle of least astonishment. We can document any behavior we wish; the ideal API is one that needs as little documentation as possible and still works well.
With regard to your example, I think it's good that the code sample doesn't work with frozen objects. The exception will encourage users to rewrite it as:
event['[foo][bar]'] = event['[foo][bar]'].downcase
which is what we want, right? We want the API to help the user understand, not to befuddle and confuse.
Perhaps we could even write a convenience method (inspired by Clojure) that makes it look like this:
event.mutate('[foo][bar]') {|v| v.downcase}
# or even more concisely
event.mutate('[foo][bar]', &:downcase)
So, setting the returned values as immutable does have the benefit of providing advance warning of _potential_ compatibility issues. Maybe instead of doing this systematically we could provide a runtime option to _turn on_ object immutability? Or maybe have it on by default in the first release, letting users turn it off, and eventually flip the default so it's off and users can turn it on for sanity checks.
Taking it one step further, we COULD expose mutable objects in the same way. We could say that mutate returns a mutable object that will be properly saved when the block exits. So:
event.mutate('[foo][bar]') do |value|
value['baz'] = 'bot'
value['blah'].downcase!
value
end
Would work correctly, and we'd have an explicit area where mutable operations could work consistently.
@andrewvc
I don't believe that a good API requires people to be high priests who have spent sufficient time studying the sacred Logstash user manual to understand the nuances of object references in events in order to get something useful done.
No, these are basic sanity principles of software engineering, and properly documenting how to use an API and what to expect from it is totally OK.
This is the principle of least astonishment.
Respecting this principle for the sake of backward compatibility is not a valid point, IMO. Major version releases give us the opportunity to break APIs and correct previous design mistakes.
@colinsurprenant well, we're definitely giving up backwards compatibility here. I'm not arguing that's relevant here. I'm saying the API should be unsurprising. The less it behaves in unanticipated ways, the less we have to document and the more enjoyable it will be to use.
WRT your position on documentation we'll just have to agree to disagree.
@andrewvc
event.mutate('[foo][bar]') do |value|
value['baz'] = 'bot'
value['blah'].downcase!
end
and
event.mutate('[foo][bar]') {|v| v.downcase}
# or even more concisely
event.mutate('[foo][bar]', &:downcase)
are completely over-designed. We don't need to go that far.
The principles are super easy to understand and respect, very healthy, and also very common: you can't assume that what an "opaque" object's methods/getters/accessors return are references you can use to modify the original object in place; that completely violates basic sanity principles. To update an object you have to use its _setter_ methods.
@andrewvc again, breaking compatibility in a major version release is OK and this is the whole idea with the semantic versioning stuff, which we don't respect to the letter but in its principles.
@colinsurprenant I want to be clear. This isn't about compatibility. This is about having an ergonomic API and a good developer experience. Even if this was v0.0.1-alpha of the logstash API I would have the same concerns.
Well, if «ergonomic API and a good developer experience» means relying on implementation-specific behaviours and bad software engineering practices, then I am not following. Quality is likely to increase as we use better practices, which in turn are easy for developers to understand.
Let me know if my suggestion of having immutability as a runtime option makes sense.
Hi,
I know this has spawned many discussions; I will try to share how I see the situation. First, I want to say that I think we're all trying to achieve the best user experience here, and all the arguments are valid and acceptable.
For me, the best path to the principle of least astonishment is to let users understand that the new way of working is better and encouraged, but since we're introducing a change we can:
To my understanding, 1. is a must, but it won't necessarily reach everyone, especially given our constant best effort not to break stuff.
To make users more aware of these probable issues we should move towards something like 2., or even better 3.; together with 1. this will make the change very explicit and will make people aware they are entering breaking-change territory.
On the other side (credit to @jsvd for this proposal), we should strongly consider an experimental feature flag; this would help educate people about breaking changes.
What do you think?
@colinsurprenant
I am just trying to understand this.
Let me know if my suggestion of having immutability as a runtime option make sense?
If we make @andrewvc's proposal a flag, turn it on by default, and have the event return immutable objects: when the user's configuration works, why would they want to turn this immutability feature off? What is the benefit of doing so? Performance?
I'll just add that the mutate function I'm proposing (which I stole from update-in, one of the most commonly used functions in Clojure) makes a ton of sense whether we use immutable data types or not. We could even use varargs for the path to make it cleaner.
Some examples
event['foo'] = event['foo'].downcase
# is less concise and requires more repetition than
event.mutate('foo', &:downcase)
tmp = event['[foo][bar]'].dup # returns a Hash
tmp['blah'] = 'bar'
tmp['bot'] = 'baz'
event['[foo][bar]'] = tmp
# is less concise than
event.mutate('foo', 'bar') do |v|
v['blah'] = 'bar'
v['bot'] = 'baz'
v
end
Additionally, mutate can be more efficient since we don't need to do the path lookup twice; we just hold on to the object reference. It's also less error-prone and more DRY, since we only specify the path once.
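A sketch of what such a mutate helper could look like. MiniEvent here is a stand-in with simplified varargs path handling, written purely for illustration; none of this is existing Logstash code:

```ruby
# Minimal stand-in for the Event class, just enough to sketch #mutate.
class MiniEvent
  def initialize(data = {})
    @data = data
  end

  # Walk the path of keys down the nested hash.
  def get(*path)
    path.reduce(@data) { |acc, key| acc && acc[key] }
  end

  # Create intermediate hashes as needed, then set the leaf value.
  def set(*path, value)
    *parents, leaf = path
    target = parents.reduce(@data) { |acc, key| acc[key] ||= {} }
    target[leaf] = value
  end

  # The proposed helper: one lookup, apply the block, write back its
  # return value (like Clojure's update-in).
  def mutate(*path, &block)
    set(*path, block.call(get(*path)))
  end
end

event = MiniEvent.new
event.set('foo', 'bar', 'BAZ')
event.mutate('foo', 'bar', &:downcase)
event.get('foo', 'bar')  # => "baz"
```

Because the block's return value is what gets written back, the same call works for in-place-style edits, type conversions, and full replacement of the value.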
@andrewvc I find this mutate method similar to what #tap does.
@ph, it is a lot like tap. Now that you mention it, it'd be good to have both! The difference is that my mutate function changes the value to the return value of the block. That's really useful because you can do things like:
event.mutate('foo','bar') {|v| v.to_s } # Change the type of the object
event.mutate('foo','bar') {|v| v.downcase } # use the return value instead of mutating the actual value
However, tap is nice because you can do
event.tap('foo','bar') {|v| v['foo'] = 'bar'}
# where with mutate you need to do
event.mutate('foo','bar') {|v| v['foo'] = 'bar'; v}
In Clojure all data-structures are immutable (in 99% of situations), so the return value is necessary. That's why it popped into my mind here. Even if we don't use immutable data-structures you still often have the case where you want to replace the data, not mutate it.
Also, thinking it through, I think Clojure has it right with update; it's a better word than mutate.
@ph because having immutable objects _can_ help identify _potential_ event assignment problems, but prevents you from assigning to an intermediate inner hash in a totally correct way, if that is what you want to do. Also, as I said:
recursively setting returned objects as immutable seems like a waste
This is an unnecessary overhead when your plugin is working correctly.
For the mutate function, I wouldn't mind introducing it as a new feature; it's not a bad idea and could be a nice addition, but it's neither necessary nor urgent at this point, because we will still support the current fieldref-style getters and setters: a = event["[foo]"] and event["[foo]"] = 1.
Let's not forget here that the vast majority of users are not impacted by any of this. Only the small subset who use Ruby filters and/or custom plugins could be impacted. Also, only a few of our ~200 plugins had to be modified to respect the new semantics.
@colinsurprenant thanks for clarifying this.
Maybe we should keep all the good Clojure ideas for when we introduce a Clojure API? I mean, if we start in that direction we will yak-shave ad nauseam defining new API paradigms that bring very little value to the table.
@colinsurprenant these are just programming ideas. Clojure is functional and Ruby is mixed functional/OO. A lot of core Clojure ideas (map/reduce, etc.) are Ruby natives, so they make sense here.
When you say that these things bring very little value, that's your opinion, not an empirical fact. I respect that, but I disagree. There's FP stuff all over Ruby (and Matz cited Lisp as a major influence on Ruby).
Good healthy discussion here, but it should have taken place in #5035.
Bold, unabashed, draw-a-line-in-the-sand statement: this relaxed approach to the Event API must end. No looking back.
Here is why.
First, some arguable assertions on the Java Event implementation.
Before reading this thread I thought about freeze too, but I rejected it on the grounds that it would need to be done on intermediate collections only, which is very opaque to users.
Mutate operations on collections have bad side effects.
Example from Ruby Event
> event = LogStash::Event.new
=> #<LogStash::Event:0x25ce435 @metadata_accessors=#<LogStash::Util::Accessors:0x7ea71fc2 @store={}, @lut={}>, @cancelled=false, @data={"@version"=>"1", "@timestamp"=>"2016-04-15T14:19:31.561Z"}, @metadata={}, @accessors=#<LogStash::Util::Accessors:0x7cd5fcf4 @store={"@version"=>"1", "@timestamp"=>"2016-04-15T14:19:31.561Z"}, @lut={}>>
> event["[a][0]"]
=> nil
> event["[a][0]"] = 42
=> 42
> event["[a]"]
=> {"0"=>42}
> event["[a]"] = [42,24]
=> [42, 24]
> event["[a]"]
=> [42, 24]
> event["[a][0]"]
=> 42
> event["[a]"] = [24,42]
=> [24, 42]
> event["[a][0]"]
=> 42
> event["[a][0]"] = {"a"=> 99, "b" => 98}
=> {"a"=>99, "b"=>98}
> event["[a][0]"]
=> {"a"=>99, "b"=>98}
> event["[a]"]
=> [24, 42]
> event["[a][0]"]
=> {"a"=>99, "b"=>98}
> event["[a][0][b]"]
TypeError: can't convert String into Integer
from org/jruby/RubyFixnum.java:1143:in `[]'
It seems to me that there is a list of troublesome operations on intermediate collections, including []= and others.
Let's remove them.
In the Java Event we control exactly which classes we convert Java Map and List to.
I propose that we convert these to EventHash and EventArray (wrappers to their Ruby equivalents)
These classes would not expose any setters or mutation methods, e.g. merge!, keep_if, []=.
Further we can introduce methods that allow some changes to occur in the intermediate collection but also invalidate the LUT and PathCache.
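A sketch of the wrapper idea. EventHash here is a hypothetical illustration built on Ruby's stdlib SimpleDelegator, not the actual proposed implementation: read methods pass through to the wrapped Hash, while the mutators named above raise:

```ruby
require 'delegate'

# Hypothetical read-only Hash wrapper: delegates reads to the wrapped
# Hash and raises on the common mutating methods.
class EventHash < SimpleDelegator
  [:[]=, :merge!, :keep_if, :delete, :clear, :update, :store].each do |name|
    define_method(name) do |*_args, &_blk|
      raise NoMethodError,
            "#{name} is not supported on Event collections; " \
            "use the Event setter instead"
    end
  end
end

h = EventHash.new('a' => 1)
h['a']       # reads still work => 1
h.key?('a')  # => true
```

The same pattern would apply to an EventArray wrapper around Array, and the override list could be extended to any method that mutates the receiver.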
When you say that these things bring very little value that's your opinion, not an empirical fact.
It brings little value because it is absolutely not required; it is perfectly possible to continue using the current API correctly. Whether that is an empirical fact or not is totally irrelevant to the discussion.
@guyboertje I think that returning different collection types makes a lot of sense. Thanks for the examples demonstrating the confusing behavior of the current impl.
Question: How would this work with values? How would this fix the downcase! problem?
It'd be useful to create a list of test cases for use of the Event API and figure out how we'd want these to work. I think we've wound up going about the process from the internals out instead of the outside in.
@andrewvc
The downcase! problem...
It depends on whether my Dual Value for leaf objects PR gets merged; the downcase! problem is not fixed without it.
If it is merged, then it works, and is fixed with or without EventHash-like wrappers.
So
Internally it is a collection of collections (Java), but each leaf is a Dual Value [Ruby value, Java value | null].
If event["[a][b][0]"] points to a leaf, then before the value is returned by the Event#[] method the JRuby extension unpacks the Ruby value from the Dual Value and returns it; this is the original Ruby value, so you can downcase! it. Internally that Ruby value still exists in the Dual Value in the collection of collections. event["[a][b]"][0].downcase! works too, because event["[a][b]"] returns a new Array that contains the original leaf Ruby values.
But actually I want to discourage this because it's a 'set up for a fall'...
Assume event["[a][b][0]"] -> 42; then event["[a][b][0]"] += 4 works, because internally Ruby uses Event#[] to get the value (i.e. the original value), adds 4, and uses Event#[]= to put it back.
However, event["[a][b]"][0] += 4 fails because Array#[]= is called on a copy.
This issue has become a melting pot of many different discussions. On my side, I will conclude with this last comment.
Setting returned objects as immutable only when a Logstash runtime/command-line option is given, to help users spot potential problems, seems reasonable, but I wouldn't enable it by default.
Other proposals for new Event API methods/paradigms/philosophies should probably be made and discussed in their own issues.
@andrewvc can we close this now in favor of #5140?
@suyograo yup! Closed.