'http://localhost:9200/twitter/tweet/_search?q=user:kimchy' returns:
{
"_shards":{
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits":{
"total" : 1,
"hits" : [
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_source" : {
"user" : "kimchy",
"postDate" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}
}
]
}
}
But sometimes it would be more useful to get a plain "dump" of the _source data instead:
{
...
"hits":{
"total" : 1,
"hits" : [
{
"user" : "kimchy",
"postDate" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}
]
}
}
This would be really useful to have. In my case I'm trying to do HTTP response caching but "took" in the results obviously can change on each query even though the results are the same.
Hey,
you can do this with the current elasticsearch release for single documents (but not for searches):
curl -X PUT localhost:9200/foo/bar/1 -d '{ "name":"foo", "f":"a" }'
{"ok":true,"_index":"foo","_type":"bar","_id":"1","_version":2}
curl localhost:9200/foo/bar/1/_source
{ "name":"foo", "f":"a" }
@xstevens If you really need this for searches, putting a Varnish proxy (or something similar) in front makes more sense.
@ejain Can you tell me what the big difference is between having only the source and having the source plus the metadata in a search response? Maybe I didn't get your request completely right.
Well this wasn't really my request but it would work for what I want. I'm looking to remove the "took" variable from search results because that's what blows out an HTTP response cache. What I mean by that is, I end up with an entry per took="response time" even though the rest of the data stays the same.
Hey,
I am still not sure whether these are the right approaches to the problem, as I am still unsure about the problem itself. Maybe you can elaborate on what you want to do. If you simply want to cache the response, is it really important whether the took value is included in the response? I mean, does it matter? If an old took value is sent because the search response is cached, what does this mean for you? Is that bad?
I am not sure how your caching works either. Is that configurable? Or do you simply cache the result of a certain request with a certain body? Maybe you can use the X-Unique-Id header for this (it can be specified in the request and is included in the response as well), but I cannot really tell until I understand your caching strategy (and why you are so focused on some fields :-)
I'm just trying to do basic HTTP response caching with no knowledge that it is even ElasticSearch that I'm talking to. I'm using the caching that comes built into Apache HttpClient. The "took" field is a problem because the caching mechanism checks whether the payload (the search result in this case) has changed in the background, so it invalidates the cache more often than it needs to. I can work around this of course by doing my own caching, but I was going to try to avoid that since HttpClient has some other nice checks around Cache-Control headers, etc. for services that give that kind of feedback.
As for how HttpClient detects a payload change, I believe their implementation is using SHA256(payload).
My use case is that I need to let users download their documents in bulk; this would be a lot more efficient if I didn't have to parse the response and strip out elasticsearch-specific properties.
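Until something like this exists, the client-side stripping described here can be sketched roughly as follows. This is only an illustration in Python: the helper name `extract_sources` is made up for the example, and the sample response is the one from the top of this issue.

```python
import json


def extract_sources(search_response):
    """Return only the _source object of every hit in a parsed search response."""
    hits = search_response.get("hits", {}).get("hits", [])
    # Skip hits that carry no _source at all.
    return [hit["_source"] for hit in hits if "_source" in hit]


# Sample response, copied from the example at the top of this issue.
raw = """
{
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "twitter",
        "_type": "tweet",
        "_id": "1",
        "_source": {
          "user": "kimchy",
          "postDate": "2009-11-15T14:12:12",
          "message": "trying out Elastic Search"
        }
      }
    ]
  }
}
"""

sources = extract_sources(json.loads(raw))
print(json.dumps(sources, indent=2))
```

This still parses the whole response on the client, which is exactly the overhead the request is trying to avoid; it only shows what the "dump" format would contain.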
@spinscale What version introduced _source? I get "No handler found for uri /index/type/NNN/_source" on 0.26.
This feature would be really useful for me as well (I'd like to be able to download documents in bulk and then update them in bulk without having to do surgery).
@therealdpk Judging by the commit/issue, the feature will be available in elasticsearch 1.0. Someone please correct me if I am wrong, but I am curious as well and I do not see it in the 0.90 branch.
@therealdpk it was introduced in 0.90.1
@brusic the issue you referred to is for more fine-grained access control to the source without changing the data structure layout when requesting the data (which can happen in a few cases)
Sorry for the misinformation. I assumed the _source param would be part of the normal RestGetAction.
Just a correction, the correct header is X-Opaque-Id, not X-Unique-Id:
curl -i -H "X-Opaque-Id: foobar" localhost:9200/_search | grep foobar
Is this feature implemented in the latest beta version?
Shouldn't the _source-only option be a part of _search & _msearch, similar to _get & _mget?
Given that this isn't a common use case, and can be solved easily on the application side (by extracting the hits only and sha'ing just those), we've decided against making any changes here.
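A minimal sketch of that application-side approach, assuming the response has already been parsed into a dict. The function name `hits_digest` and the sample responses are hypothetical, and this is not how HttpClient computes its own cache validation; it only shows that hashing the "hits" section alone gives a digest that does not depend on "took".

```python
import hashlib
import json


def hits_digest(search_response):
    """SHA-256 over the "hits" section only, ignoring "took" and other top-level fields."""
    hits = search_response.get("hits", {})
    # Serialize with sorted keys and fixed separators so equal content
    # always produces the same bytes.
    canonical = json.dumps(hits, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical responses to the same query: only "took" differs.
first = {
    "took": 3,
    "hits": {"total": 1, "hits": [{"_id": "1", "_source": {"user": "kimchy"}}]},
}
second = dict(first, took=17)

print(hits_digest(first) == hits_digest(second))
```

The digest can then serve as the application's own cache key or ETag-style validator in place of a hash over the full payload.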
Could this be opened again and reconsidered?
Solving this on the application side is not an option for us because then it's too late.
We have certain queries that only request a small amount of data from each document, so on the whole 80-90% of the response is just metadata, which is garbage to us and slows down the response times.
Being able to exclude the metadata would be awesome.
Take a look at Jörg's plugin: https://github.com/jprante/elasticsearch-arrayformat
We're keen to provide a more generic solution to this problem, so I'm going to close this issue in favour of #7401