Elasticsearch: URI Request that returns just the _source, without metadata

Created on 8 Aug 2012 · 18 comments · Source: elastic/elasticsearch

'http://localhost:9200/twitter/tweet/_search?q=user:kimchy' returns:

{
    "_shards":{
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits":{
        "total" : 1,
        "hits" : [
            {
                "_index" : "twitter",
                "_type" : "tweet",
                "_id" : "1", 
                "_source" : {
                    "user" : "kimchy",
                    "postDate" : "2009-11-15T14:12:12",
                    "message" : "trying out Elastic Search"
                }
            }
        ]
    }
}

But sometimes it would be more useful to get a plain "dump" of the _source data instead:

{
    ...
    "hits":{
        "total" : 1,
        "hits" : [
            {
                "user" : "kimchy",
                "postDate" : "2009-11-15T14:12:12",
                "message" : "trying out Elastic Search"
            }
        ]
    }
}

ejain
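Until such an option exists, the flattening can be done on the client. The sketch below assumes the search response has already been fetched and parsed. `strip_hit_metadata` is a hypothetical helper, not an Elasticsearch API. It returns a copy whose `hits.hits` entries are the bare `_source` documents, matching the shape requested above. Hits without a `_source` become empty objects.

```python
import json


def strip_hit_metadata(response: dict) -> dict:
    """Return a copy of a search response in which hits.hits holds only
    each hit's _source document (hypothetical client-side helper)."""
    result = dict(response)  # shallow copy keeps _shards and other top-level keys
    hits = dict(response.get("hits", {}))
    # Replace each hit (_index, _type, _id, ...) with its _source alone.
    hits["hits"] = [hit.get("_source", {}) for hit in hits.get("hits", [])]
    result["hits"] = hits
    return result


if __name__ == "__main__":
    raw = """{
      "_shards": {"total": 5, "successful": 5, "failed": 0},
      "hits": {"total": 1, "hits": [{
        "_index": "twitter", "_type": "tweet", "_id": "1",
        "_source": {"user": "kimchy",
                    "postDate": "2009-11-15T14:12:12",
                    "message": "trying out Elastic Search"}}]}
    }"""
    print(json.dumps(strip_hit_metadata(json.loads(raw)), indent=4))
```

This still costs a full parse on the client, which is exactly the overhead the request wants to avoid. It only changes the shape of what the application passes on.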

Most helpful comment

Could this be opened again and reconsidered?

Solving this on the application side is not an option for us, because by then it's too late.
Some of our queries request only a small amount of data from each document, so 80-90% of the whole response is metadata that is useless to us and slows down response times.

Being able to exclude the metadata would be awesome.

kuseman on 13 Aug 2014
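One rough way to check a figure like the 80-90% above is to compare the serialized size of the `_source` documents with the size of the whole response. `metadata_share` is a hypothetical helper. It re-serializes with Python's `json` module, so the result only approximates the bytes Elasticsearch actually sends.

```python
import json


def metadata_share(response: dict) -> float:
    """Approximate fraction of a serialized search response that is not
    _source data (hypothetical helper; sizes come from json.dumps, not
    from the bytes on the wire)."""
    total = len(json.dumps(response).encode("utf-8"))
    hits = response.get("hits", {}).get("hits", [])
    # Count only the _source payloads that are actually present.
    source_bytes = sum(
        len(json.dumps(hit["_source"]).encode("utf-8"))
        for hit in hits
        if "_source" in hit
    )
    return 1 - source_bytes / total


if __name__ == "__main__":
    sample = {
        "took": 3,
        "_shards": {"total": 5, "successful": 5, "failed": 0},
        "hits": {"total": 1, "hits": [
            {"_index": "twitter", "_type": "tweet", "_id": "1",
             "_source": {"user": "kimchy"}}]},
    }
    print(f"{metadata_share(sample):.0%} of this response is metadata")
```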

All 18 comments

This would be really useful to have. In my case I'm trying to do HTTP response caching, but "took" in the results can obviously change on each query even though the results are the same.

xstevens on 20 Jun 2013

Hey,

you can do this with the current Elasticsearch release for single documents (but not for searches):

curl -X PUT localhost:9200/foo/bar/1 -d '{ "name":"foo", "f":"a" }'
{"ok":true,"_index":"foo","_type":"bar","_id":"1","_version":2}                                                                                                                                          

curl localhost:9200/foo/bar/1/_source
{ "name":"foo", "f":"a" }

@xstevens If you really need to do this for searches, putting a Varnish proxy (or something similar) in front makes more sense.

@ejain Can you explain what the big difference is, in a search response, between getting only the source and getting the source plus its metadata? Maybe I didn't understand your request correctly.

spinscale on 24 Jun 2013
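Here is the same single-document call from Python, as a minimal standard-library sketch. It assumes a node at `http://localhost:9200` and the typed URL layout shown in the curl example (`/{index}/{type}/{id}/_source`). Later Elasticsearch versions moved to `/{index}/_source/{id}`. `source_url` and `fetch_source` are hypothetical helper names.

```python
from urllib.parse import quote
from urllib.request import urlopen


def source_url(base: str, index: str, doc_type: str, doc_id: str) -> str:
    """Build the /{index}/{type}/{id}/_source URL from the curl example,
    percent-encoding each path segment."""
    parts = (quote(p, safe="") for p in (index, doc_type, doc_id))
    return base.rstrip("/") + "/" + "/".join(parts) + "/_source"


def fetch_source(base: str, index: str, doc_type: str, doc_id: str) -> bytes:
    """GET the bare _source of one document (requires a running node)."""
    with urlopen(source_url(base, index, doc_type, doc_id)) as resp:
        return resp.read()


if __name__ == "__main__":
    print(source_url("http://localhost:9200", "foo", "bar", "1"))
```

Calling `fetch_source("http://localhost:9200", "foo", "bar", "1")` against a node holding the document indexed above should return just its JSON body, without `_index`, `_type` or `_id`.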

Well, this wasn't really my request, but it would work for what I want. I'm looking to remove the "took" field from search results, because that's what blows out an HTTP response cache: I end up with one cache entry per "took" value (the response time), even though the rest of the data stays the same.

xstevens on 24 Jun 2013

Hey,

I am still not sure whether these are the right approaches, as I am still unsure about the problem itself. Maybe you can elaborate on what you want to do. If you simply want to cache the response, is it really important whether the took value is included? If an old took value is sent because the search response is cached, what does that mean for you? Is that bad?

I am not sure how your caching works, either. Is it configurable, or do you simply cache the result of a certain request with a certain body? Maybe you can use the X-Unique-Id header for this (it can be specified in the request and is included in the response as well), but I cannot really tell until I understand your caching strategy (and why you are so focused on some fields :-)

spinscale on 24 Jun 2013

I'm just trying to do basic HTTP response caching, with no knowledge that it is even Elasticsearch I'm talking to. I'm using the caching that comes built into Apache HttpClient. The "took" field is a problem because the caching mechanism checks in the background whether the payload (the search result, in this case) has changed, so it invalidates the cache more often than it needs to. I can work around this, of course, by doing my own caching, but I was trying to avoid that, since HttpClient has some other nice checks around Cache-Control headers, etc. for services that give that kind of feedback.

xstevens on 25 Jun 2013

As for how HttpClient detects a payload change, I believe their implementation uses SHA256(payload).

xstevens on 25 Jun 2013

My use case is that I need to let users download their documents in bulk; this would be a lot more efficient if I didn't have to parse the response and strip out Elasticsearch-specific properties.

ejain on 25 Jun 2013
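For an export like this, the stripping amounts to writing each hit's `_source` on its own line. This is a minimal sketch under a few assumptions. The hits have already been fetched; a large export would typically need paging through results, which is not shown. Newline-delimited JSON is assumed to be an acceptable download format. `sources_as_ndjson` is a hypothetical name.

```python
import json
import sys
from typing import Iterable, Iterator


def sources_as_ndjson(hits: Iterable[dict]) -> Iterator[str]:
    """Yield one JSON line per hit containing only its _source, dropping
    _index, _type, _id and other metadata (assumes _source is present)."""
    for hit in hits:
        yield json.dumps(hit["_source"], ensure_ascii=False) + "\n"


if __name__ == "__main__":
    hits = [
        {"_index": "foo", "_type": "bar", "_id": "1",
         "_source": {"name": "foo", "f": "a"}},
    ]
    sys.stdout.writelines(sources_as_ndjson(hits))
```

A generator keeps memory flat when the hits come from a paged source, since each document is serialized only as it is written.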

@spinscale What version introduced _source? I get "No handler found for uri /index/type/NNN/_source" on 0.26.

This feature would be really useful for me as well (I'd like to be able to download documents in bulk and then update them in bulk without having to do surgery).

dpkirchner on 14 Oct 2013
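One way to do that round trip is to rebuild a `_bulk` request body from the search hits. The per-document metadata moves into the bulk action line and the (possibly edited) `_source` follows on the next line. This sketch assumes the typed `_index`/`_type`/`_id` metadata shown in this thread; newer versions drop `_type`. It also assumes documents are re-indexed whole with the `index` action. `hits_to_bulk_body` is a hypothetical helper.

```python
import json
from typing import Iterable


def hits_to_bulk_body(hits: Iterable[dict]) -> str:
    """Turn search hits into a newline-delimited _bulk body that re-indexes
    each document: an action line followed by its _source line."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": hit["_index"],
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk format requires every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    hits = [{"_index": "foo", "_type": "bar", "_id": "1",
             "_source": {"name": "foo", "f": "a"}}]
    hits[0]["_source"]["f"] = "b"  # edit the document before writing it back
    print(hits_to_bulk_body(hits), end="")
```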

@therealdpk Judging by the commit/issue, the feature will be available in elasticsearch 1.0. Someone please correct me if I am wrong, but I am curious as well and I do not see it in the 0.90 branch.

https://github.com/elasticsearch/elasticsearch/issues/3301

brusic on 14 Oct 2013

@therealdpk it was introduced in 0.90.1

@brusic the issue you referred to is about more fine-grained access to the source, without changing the layout of the data structure when requesting the data (which can happen in a few cases)

https://github.com/elasticsearch/elasticsearch/blob/0.90/src/main/java/org/elasticsearch/rest/action/get/RestGetSourceAction.java

spinscale on 15 Oct 2013

Sorry for the misinformation. I assumed the _source param would be part of the normal RestGetAction.

brusic on 15 Oct 2013

Just a correction: the correct header is X-Opaque-Id, not X-Unique-Id:

curl -i -H "X-Opaque-Id: foobar" localhost:9200/_search | grep foobar

karmi on 23 Oct 2013
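Here is the same check from Python's standard library, as a sketch. The header is set on the request and read back from the response headers, as in the `curl -i ... | grep foobar` line. The URL, the helper names and the use of `urllib` are illustrative assumptions.

```python
from typing import Optional
from urllib.request import Request, urlopen


def tagged_search_request(url: str, opaque_id: str) -> Request:
    """Build a GET request carrying an X-Opaque-Id header."""
    return Request(url, headers={"X-Opaque-Id": opaque_id})


def echoed_opaque_id(url: str, opaque_id: str) -> Optional[str]:
    """Send the request and return the X-Opaque-Id value found in the
    response headers, or None if absent (requires a running node)."""
    with urlopen(tagged_search_request(url, opaque_id)) as resp:
        return resp.headers.get("X-Opaque-Id")


if __name__ == "__main__":
    req = tagged_search_request("http://localhost:9200/_search", "foobar")
    # urllib normalizes stored header names with str.capitalize().
    print(req.get_method(), req.full_url, req.get_header("X-opaque-id"))
```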

Is this feature implemented in the latest beta version?
Shouldn't a _source-only option be part of _search and _msearch, similar to _get and _mget?

abhijitiitr on 15 Dec 2013

Given that this isn't a common use case, and can be solved easily on the application side (by extracting the hits only and sha'ing just those), we've decided against making any changes here.
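The application-side approach can be sketched by hashing only `hits.hits`, serialized with sorted keys, so that per-request fields such as `took` no longer affect the digest. This assumes the client computes its own cache key. Whether HttpClient's built-in cache can be configured to use such a key is not covered here. `hits_digest` is a hypothetical name.

```python
import hashlib
import json


def hits_digest(response: dict) -> str:
    """SHA-256 over only hits.hits, serialized canonically (sorted keys,
    compact separators), ignoring top-level fields such as "took"."""
    hits = response.get("hits", {}).get("hits", [])
    canonical = json.dumps(hits, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = {"took": 3, "hits": {"total": 1,
             "hits": [{"_id": "1", "_source": {"user": "kimchy"}}]}}
    second = dict(first, took=9)  # same results, different response time
    full = [hashlib.sha256(json.dumps(r).encode("utf-8")).hexdigest()
            for r in (first, second)]
    print(full[0] == full[1])                         # False: "took" differs
    print(hits_digest(first) == hits_digest(second))  # True: same hits
```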