Elasticsearch version: 2.3
JVM version: jdk1.8.0_74
OS version: Mac 10.11.4
Description of the problem including expected versus actual behavior: When using _source_exclude, it appears that the returned JSON for _source is being modified and fields are being returned out of order.
Steps to reproduce:
PUT /testindex/doc/1
{ "fielda": "one", "fieldb": "two", "fieldc": "three" }
Use _source_exclude with both a missing and an existing field, and notice that the order of the returned JSON has changed:
Request
GET /testindex/_search?_source_exclude=nonexistantField
Response
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "testindex",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"fielda": "one",
"fieldc": "three",
"fieldb": "two"
}
}
]
}
}
Request
GET /testindex/_search?_source_exclude=fielda
Response
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "testindex",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"fieldc": "three",
"fieldb": "two"
}
}
]
}
}
Field order does not matter in JSON.
And yes with source filtering we have to generate on the fly a new source.
So I don't see the issue here TBH.
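To illustrate what "generate on the fly a new source" implies, here is a minimal sketch in Python. It is not Elasticsearch's implementation; `filter_source` is a hypothetical helper that parses the stored string, drops the excluded keys, and serializes the result again. Python dicts happen to preserve insertion order, so this sketch does not reproduce the reordering seen above. It only shows that the returned string is a re-serialization rather than the original bytes, even when the excluded field does not exist.

```python
import json
from typing import Iterable


def filter_source(source: str, excludes: Iterable[str]) -> str:
    """Hypothetical helper (not Elasticsearch code): parse, drop keys, re-serialize."""
    doc = json.loads(source)
    for key in excludes:
        doc.pop(key, None)  # a field that does not exist is simply ignored
    return json.dumps(doc)


# The document body from the reproduction steps, exactly as it was sent.
original = '{ "fielda": "one", "fieldb": "two", "fieldc": "three" }'

regenerated = filter_source(original, ["nonexistantField"])

print(regenerated == original)                          # False: the bytes differ
print(json.loads(regenerated) == json.loads(original))  # True: same JSON value
```

A parser backed by an unordered map could additionally change the key order, which would be consistent with the responses shown in the report.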
While it doesn't make a difference from a JSON spec standpoint, we have always advertised that _source is the exact JSON string you sent in, saved and returned to you. Our docs even state this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
The _source field contains the original JSON document body that was passed at index time.
Since we've always advertised this (and, to be honest, in training I even say you can "curl up" a JSON document you have in a file, then request the _source, write it to a file, and do a diff, and they should be equal), I feel it's important that we either adhere to it or update the documentation to reflect that, in situations where Elasticsearch needs to parse the _source field and generate a new one (as in the case of filtering that you mention), the ordering can be different.
I'd be happy to help with documentation updates if we decide to go that route.
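For readers who rely on the "write it to a file and diff" check, a comparison by value rather than by text avoids false alarms when _source has been regenerated. The following is a sketch in Python, assuming a hypothetical helper named `same_json`; the two strings are the document sent in the reproduction steps and the _source returned by the first search.

```python
import json


def same_json(a: str, b: str) -> bool:
    """Compare two JSON documents by value, ignoring key order and whitespace."""
    return json.loads(a) == json.loads(b)


sent = '{ "fielda": "one", "fieldb": "two", "fieldc": "three" }'
returned = '{"fielda":"one","fieldc":"three","fieldb":"two"}'

print(sent == returned)           # False: a textual diff reports a change
print(same_json(sent, returned))  # True: the documents are equal as JSON
```

This works because Python compares dicts without regard to key order, which matches the JSON view that object members are unordered.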
Yes. We also modify the source, IIRC, when you use exclude in the mapping.
Those are the only features, IMO, where we do that.
As a user, I'm expecting a modification of the source because I explicitly ask to modify the source :)
But I agree that we should probably add this in documentation.
I added notes where it seemed applicable in the docs. See referenced PR.
Closing. See https://github.com/elastic/elasticsearch/pull/17640#issuecomment-209037691
Why close this? I'm happy to update the documentation to be more accurate based on the discussion in this thread. I feel it is very important to be clear with folks about this, as opposed to stating only that "The _source field contains the original JSON document body that was passed at index time." Can we add a clarification that the _source, when returned, may be different from what was passed in?
How can the extra clarification hurt?
How can the extra clarification hurt?
Extra clarification can hurt when it obscures more important and more relevant information. Too much information is as bad as not enough, so I'd rather not clutter the docs with something that I don't think needs clarifying. We use JSON, so the assumption is that keys are unordered. Why do we need to repeat this statement? We don't state that JSON has to be UTF8 encoded, because the JSON spec already states this.