I just wanted to log this as it caused me much frustration and isn't very consistent for the end user trying to write scripts...
1) To get the document as part of a pipeline inline script you would use ctx. Previously in transforms it was ctx._source.
2) ctx._source. This hasn't changed since the previous versions.
3) Somewhat related, if you want to access source doc and properties in a pipeline processor you just use _source or the property name.
4) total += doc['goals'][i]; (https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-painless.html#_accessing_doc_values_from_painless).

For the love of PAINLESS, can we please standardize on one common name to get the source doc? This is out of hand and very unintuitive. I would prefer to have doc be the common variable across all of these.
Hi @niemyjski
Re point 4, the doc[] indicates a completely different access pattern, so I think that should remain distinct from the other 3.
To get the document as part of a pipeline inline script you would use ctx.
Not sure what you mean here?
Somewhat related, if you want to access source doc and properties in a pipeline processor you just use _source or the property name.
This seems OK to me, where you can leave out the _source as a commonly used shortcut.
But I agree that in scripts we should try to make things more consistent.
@talevy @martijnvg what are your thoughts?
- To get the document as part of a pipeline inline script you would use ctx. Previously in transforms it was ctx._source.
The case here is that you are not necessarily working with the same type of document as the lucene context. Here, we are dealing with something known as a sourceAndMetadata object, which contains both the original _source fields, as well as any other ingest metadata like _id, _type, etc. The reason for this skewed view into the originally sent source document is for a unified retrieval strategy within the mustache scripts of ingest's field templating (independent to painless). We can definitely revisit this to keep a more uniform view into the document, more similar to how the indexing stage sees it.
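To make the two shapes concrete, here is a minimal sketch in plain Java (Painless is Java-like). It is only a model under stated assumptions: CtxShapes, ingestView, and updateView are hypothetical names for illustration, not Elasticsearch classes, and the real sourceAndMetadata object may carry more than is shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the two document views discussed above.
// These are NOT Elasticsearch classes; plain maps stand in for ctx.
class CtxShapes {
    // Ingest-style view: source fields and metadata share one flat map,
    // so a field is read as ctx.some_field and the id as ctx._id.
    static Map<String, Object> ingestView(Map<String, Object> source, String id) {
        Map<String, Object> ctx = new HashMap<>(source);
        ctx.put("_id", id);
        return ctx;
    }

    // Update-style view: the source is nested under the "_source" key,
    // so a field is read as ctx._source.some_field and the id as ctx._id.
    static Map<String, Object> updateView(Map<String, Object> source, String id) {
        Map<String, Object> ctx = new HashMap<>();
        ctx.put("_source", source);
        ctx.put("_id", id);
        return ctx;
    }
}
```

In the flat view a source field literally named _id would collide with the metadata entry, which is one reason the nested shape can be easier to reason about.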
- Somewhat related, if you want to access source doc and properties in a pipeline processor you just use _source or the property name.
Right, that is just a convenience scheme. Also, in this case we are not using Painless; this is a custom ingest scheme. I don't mean that it is exempt from consistency, just that it is not a Painless context.
Do you have anything to add about this, @martijnvg? I am not sure how we can change things within 5.x and stay backwards compatible, but I totally agree we should revisit this for 6.0, maybe?
@clintongormley point 4 is still accessing the raw document so I think it should be included. In a pipeline inline script you can't do ctx._source; ctx itself is the document object. In a pipeline processor (not a script) you can use _source or the property name (e.g., {{name}}).
point 4 is still accessing the raw document so I think it should be included.
No it is not accessing the raw document, it is accessing the value from doc-values. So doc[] wouldn't be available in a pipeline or update script.
In a pipeline inline script you can't do ctx._source; ctx itself is the document object.
Ah right, that is unfortunate. It would be good to make this consistent.
The case here is that you are not necessarily working with the same type of document as the lucene context. Here, we are dealing with something known as a sourceAndMetadata object, which contains both the original _source fields, as well as any other ingest metadata like _id, _type, etc. The reason for this skewed view into the originally sent source document is for a unified retrieval strategy within the mustache scripts of ingest's field templating (independent to painless).
I think it'd be clearer to be able to access ctx._source.some_field and ctx._id etc, while today it looks like it'd be ctx.some_field vs ctx._id? That seems wrong.
We can definitely revisit this to keep a more uniform view into the document, more similar to how the indexing stage sees it.
Yeah, although I don't see a clear path to changing this without breaking bwc.
Discussed in FixitFriday: agreed with @clintongormley's last comment that we should try to make ingest more consistent with other APIs. The bw compat looks challenging, however.
In my 5.1.1 inline painless script for terms agg values I had to use params._source. I was hoping for _source ('Variable [_source] is not defined.'), and the docs seemed to indicate ctx._source ('null_pointer_exception' at ctx.); the only reference I found was https://www.elastic.co/guide/en/elasticsearch/reference/5.1/modules-scripting-painless.html#_updating_fields_with_painless. I eventually pieced together params._source from the mailing list.
@nezda Thanks for pointing this out. It's super confusing that inside inline scripts _source has to be accessed from the params object and not directly as _source.
This still needs to be documented now that contexts are done.
Painless is very painful. I am looking to do something very simple: change every array-typed element in my JSON so that its key has "_nested_" prepended (so that I can do dynamic nested type mapping).
So: { arr: [1,2] } -> { _nested_arr: [1,2] }
In javascript, this was as simple as:
(function transform(c) {
  for (var keys = Object.keys(c), i = 0; i < keys.length; i++) {
    var key = keys[i],
        val = c[key];
    if (Array.isArray(val)) {
      // Move the array under the prefixed key, then recurse into its elements.
      c["_nested_" + key] = val;
      delete c[key];
      transform(val);
    } else if (val !== null && typeof val === "object") {
      // typeof null is "object", so guard before recursing.
      transform(val);
    }
  }
})(ctx);
But the painless documentation is very confusing!
a. What is the type of ctx/ctx._source (sounds like map), and which one should I be referring to?
b. How does one iterate through ctx/ctx._source?
c. How does one delete a key on ctx/ctx._source?
d. Is painless pass by reference, or pass by value (will updating an element in a map update its parent)?
In javascript this would be much easier.
I would like to discuss creating consistency for input variables in contexts related to source, doc, and params.
There are a few things in this issue:
1) Consistent naming of source, moved to #52593.
2) source does not have good documentation, moved to #52600.
3) source vs doc is confusing: source is nested, doc is not. doc is an accessor for fields in Lucene; fields are flat, and the . characters are just part of the field name. source is a JSON object, which is nested, and its keys may or may not contain a . themselves. So we could not flatten source without introducing ambiguities. The best option we have is to document this, as will happen in #52600.
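The flattening ambiguity can be sketched in plain Java, assuming source is modeled as nested java.util maps. FlattenAmbiguity and flatten are hypothetical names for illustration; nothing here is an Elasticsearch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that flattens a nested source map into dotted keys,
// to show why two different sources can become indistinguishable.
class FlattenAmbiguity {
    // Flatten nested maps into dotted keys, e.g. {a={b=1}} becomes {a.b=1}.
    static Map<String, Object> flatten(Map<String, Object> source) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", source, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // Nested object: keep descending and extend the dotted path.
                flattenInto(key, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(key, e.getValue());
            }
        }
    }
}
```

A source with an object a containing a key b and a source with a single top-level key named a.b are different documents, yet both flatten to the same single entry a.b, so the original structure cannot be recovered from the flat form.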
If there are any other thoughts related to above, we'd love to hear them in the issues I mentioned.
Regarding the questions posed in this issue:
a. What is the type of ctx/ctx._source (sounds like map), and which one should I be referring to?
ctx._source is a Map at the top level that represents a JSON blob using Maps, Lists, and primitives. For an update script you need to use ctx._source; for ingest you'd use ctx directly.
b. How does one iterate through ctx/ctx._source?
If you know what your source is, you can use a for loop. Otherwise, iterate through the top-level map and use instanceof to determine the types of values, e.g. if (ctx._source['foo'] instanceof Map)...
c. How does one delete a key on ctx/ctx._source?
Use the remove method for Map or List.
d. Is painless pass by reference, or pass by value (will updating an element in a map update its parent)?
Pass by reference except for primitive types.
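Putting answers a through d together, the "_nested_" rename from the question could look roughly like the sketch below. It is plain Java rather than Painless, and it assumes the source is a mutable tree of java.util Maps and Lists, as described in answer a. NestedRename and transform are hypothetical names, and a real Painless script would need adapting (for example, starting from ctx._source or ctx depending on the context).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rename every key whose value is a list by
// prepending "_nested_", recursing through maps and lists.
class NestedRename {
    @SuppressWarnings("unchecked")
    static void transform(Object node) {
        if (node instanceof Map) {
            Map<String, Object> map = (Map<String, Object>) node;
            // Copy the keys first so the map can be modified while looping.
            for (String key : new ArrayList<>(map.keySet())) {
                Object val = map.get(key);
                if (val instanceof List) {
                    // Answer c: remove the old key, then re-add under the new name.
                    map.remove(key);
                    map.put("_nested_" + key, val);
                }
                // Answer d: val refers to the same object held by the parent map,
                // so changes made while recursing are visible from the parent.
                transform(val);
            }
        } else if (node instanceof List) {
            for (Object item : (List<Object>) node) {
                transform(item);
            }
        }
    }
}
```

Copying the key set before looping avoids modifying the map while iterating over it, which would otherwise throw a ConcurrentModificationException for a HashMap.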
Please direct further usage questions to https://discuss.elastic.co/.