Hopefully this is an easy one.
I'm scanning and scrolling like this:
```csharp
var scanResults = esClient.Search<DeliveryEvent>(s => s
    .From(0)
    .Size(10)
    .MatchAll()
    .Query(q => q.Term(f => f.UserID, user)
        && q.Range(r => r.OnField(f => f.ReceivedAt).From("2013-03-06T00:00:00.000-05:00").To("2013-03-25T00:00:00.000-05:00")))
    .SearchType(Nest.SearchType.Scan)
    .Scroll("60m")
);
var results = esClient.Scroll<DeliveryEvent>("1m", scanResults.ScrollId);
while (results.Documents.Any())
{
    results = esClient.Scroll<DeliveryEvent>("1m", results.ScrollId);
    // do some more stuff
}
```
I'd like to scroll in batches larger than 10, but if I set the size higher, the first scroll ( var results = esClient.Scroll
Thx!
Hey JP,
As you know, I was out of the country, so sorry for not replying sooner. Philly did not like me one bit: delays both going and coming on my layovers :-1:
A scan-type search returns SIZE results per shard on each scroll, so with more than one shard you might actually get back more results than the SIZE you specified (up to SIZE times the number of shards per batch). The default tests in NEST run on 1 shard with no replicas, so the counts there are a bit saner. Is this where the missing documents went?
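To make the per-shard sizing concrete, here is a small Python simulation of how a scan-type scroll fills each batch. It is not NEST and does not talk to Elasticsearch, and the shard layout and document counts are made up. Each shard contributes up to `size` hits per round, so with 10 shards and `Size(10)` a single batch can hold up to 100 documents.

```python
from typing import List


def scan_batches(shards: List[List[str]], size: int) -> List[List[str]]:
    """Simulate scan-type scrolling: each round takes up to `size` docs from EVERY shard.

    Hypothetical model for illustration only; it does not talk to Elasticsearch.
    """
    cursors = [0] * len(shards)
    batches: List[List[str]] = []
    while True:
        batch: List[str] = []
        for i, shard in enumerate(shards):
            # Each shard contributes at most `size` hits per scroll round.
            chunk = shard[cursors[i]:cursors[i] + size]
            cursors[i] += len(chunk)
            batch.extend(chunk)
        if not batch:
            # Every shard is exhausted: this is the empty batch that ends the loop.
            break
        batches.append(batch)
    return batches


# 10 hypothetical shards with 25 docs each, size=10.
shards = [[f"s{s}-d{d}" for d in range(25)] for s in range(10)]
batches = scan_batches(shards, size=10)
print([len(b) for b in batches])  # [100, 100, 50]
```

The total is still every document exactly once. Only the batch size grows with the shard count.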
@jptoto any update on this?
Hey Martijn!
Sorry this took so damn long. Bummer that you had a bad experience in Philly! Our airport is not the best :-/
I think I have a poor understanding of how scan and scroll work, but I'll give it my best shot. Our index has 10 shards. I THINK it's returning the correct size when I specify one.
I just realized my problem. I moved the "results = esClient.Scroll
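The end of the line above is cut off, so the exact change isn't recoverable here. However, one ordering problem is visible in the loop from the first post. Inside `while`, the next `Scroll` call overwrites `results` before anything is done with it, so the first batch returned by the initial `esClient.Scroll` is never processed. Below is a minimal Python sketch of the two orderings. `FakeScroll` is a hypothetical stand-in for a scroll cursor, not a real client.

```python
from typing import List


class FakeScroll:
    """Hypothetical stand-in for a scroll cursor: each call returns the next batch."""

    def __init__(self, batches: List[List[str]]) -> None:
        self._batches = iter(batches)

    def next_batch(self) -> List[str]:
        # An empty list signals that the scroll is exhausted (like an empty Documents collection).
        return next(self._batches, [])


def buggy_loop(scroll: FakeScroll) -> List[str]:
    seen: List[str] = []
    results = scroll.next_batch()       # first batch
    while results:
        results = scroll.next_batch()   # overwritten before it is processed
        seen.extend(results)            # "do some more stuff"
    return seen


def fixed_loop(scroll: FakeScroll) -> List[str]:
    seen: List[str] = []
    results = scroll.next_batch()
    while results:
        seen.extend(results)            # process the current batch first
        results = scroll.next_batch()   # then fetch the next one
    return seen


batches = [["a", "b"], ["c"], ["d"]]
print(buggy_loop(FakeScroll(batches)))  # ['c', 'd']
print(fixed_loop(FakeScroll(batches)))  # ['a', 'b', 'c', 'd']
```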
@jptoto
I have a few documents in a folder, and I want to check whether all the documents in this folder are indexed. To do so, for each document name in the folder, I would like to loop through the documents indexed in ES and compare. So I want to retrieve all the documents that are already indexed on the ES server.
To do that, I'm trying to use the scroll API (I'm using the NEST client).
Once I get a scrollId, how do I loop through the search results using the scrollId until I have all the documents?
For example, `var response = client.Search<Document>(s => s.Scroll("2m"));` retrieves only 10 documents.
From this response I got response.ScrollId: "Ascsjdbgkjabgkakjfgsadhvkjag"
How do I proceed from here? I just want to add the filenames of all the indexed documents to a list.
TIA
@nasreekar you should be able to use something like the code below. This way you can loop through all the results from the original scan and eventually check all the documents. I used "Model" as the type and Property1, Property2 as pretend properties for the example. You should change those to whatever your object is.
```csharp
var scanResults = esClient.Search<Model>(s => s
    .From(0)
    .Size(2000)
    .MatchAll()
    .Query(q => q.Term(f => f.Property1, "1")
        && q.Term(k => k.Property2, "0"))
    .SearchType(Nest.SearchType.Scan)
    .Scroll("90m")
);
var results = esClient.Scroll<Model>("20m", scanResults.ScrollId);
while (results.Documents.Any())
{
    foreach (var doc in results.Documents)
    {
        // Do something here in the loop
    }
    results = esClient.Scroll<Model>("20m", results.ScrollId);
}
```
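For the folder check described earlier, one approach is to collect the indexed filenames into a set while scrolling and then look up each file from the folder. The Python sketch below uses a hypothetical in-memory `FakeClient` and a made-up `file_name` field. It models a plain scroll, where the initial search already returns the first page of hits. With `SearchType.Scan` the initial response contains no hits, which is why the C# code above calls `Scroll` once before entering the loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Hit = Dict[str, str]


@dataclass
class FakeClient:
    """Hypothetical in-memory stand-in for an ES client; pages are served in order."""
    pages: List[List[Hit]]
    _pos: int = 0

    def search(self, keep_alive: str) -> Tuple[str, List[Hit]]:
        # Initial request: returns a scroll id plus the first page of hits.
        self._pos = 1
        return "scroll-1", (self.pages[0] if self.pages else [])

    def scroll(self, scroll_id: str, keep_alive: str) -> Tuple[str, List[Hit]]:
        # Subsequent requests: next page, or [] once everything has been returned.
        page = self.pages[self._pos] if self._pos < len(self.pages) else []
        self._pos += 1
        return f"scroll-{self._pos}", page


def indexed_file_names(client: FakeClient) -> Set[str]:
    names: Set[str] = set()
    scroll_id, hits = client.search(keep_alive="2m")
    while hits:
        # "file_name" is a hypothetical field holding the document's filename.
        names.update(hit["file_name"] for hit in hits)
        scroll_id, hits = client.scroll(scroll_id, keep_alive="2m")
    return names


def unindexed(folder_files: List[str], client: FakeClient) -> List[str]:
    # A set makes each membership check O(1) instead of re-scanning every indexed doc.
    indexed = indexed_file_names(client)
    return sorted(name for name in folder_files if name not in indexed)


client = FakeClient(pages=[[{"file_name": "a.pdf"}, {"file_name": "b.pdf"}],
                           [{"file_name": "c.pdf"}]])
print(unindexed(["a.pdf", "c.pdf", "d.pdf"], client))  # ['d.pdf']
```

In a real program, `folder_files` could come from listing the directory (for example with `pathlib.Path(...).iterdir()`).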
@jptoto Thanks a ton, JP!! It's working. But SearchType(Nest.SearchType.Scan) doesn't seem to work; I had to use SearchType(Elasticsearch.Net.SearchType.Scan) instead.
@nasreekar Glad it's working! That code was a bit old, so perhaps the namespaces are a bit different now. Glad I could help.
@jptoto To delete the scrolls after the work is done, can I just collect all the scrollIds in a list and delete them all at once, or is there a better way?
They should expire on their own.
@jptoto From this page (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-search-context):
> Search context are automatically removed when the scroll timeout has been exceeded. However keeping scrolls open has a cost, more file handlers are required, so scrolls should be explicitly cleared as soon as the scroll is not being used anymore using the clear-scroll API:

That's why I asked this question.
I haven't used the clear scroll API before. As long as the file-descriptor limits on your ES servers are set high (and they should be), it shouldn't be a problem. You can also make the scrolls expire sooner; in the example I gave, the timeout is 20 minutes.
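On the clear-scroll question: according to the Elasticsearch scroll documentation, the `_scroll_id` may change between requests and only the most recently received one should be used. The clear-scroll API also accepts several IDs at once. So collecting every ID shouldn't be necessary. Clearing the latest one in a `finally` block once the loop ends should be enough. Below is a Python sketch with a hypothetical `FakeScrollClient` that only tracks which contexts are still open. It is not a real client API.

```python
from typing import List, Set, Tuple


class FakeScrollClient:
    """Hypothetical stand-in that tracks open search contexts on the 'server'."""

    def __init__(self, pages: List[List[str]]) -> None:
        self.pages = pages
        self.open_contexts: Set[str] = set()
        self._pos = 0

    def search(self) -> Tuple[str, List[str]]:
        # The server-side context stays alive until it times out or is cleared.
        self.open_contexts.add("ctx-1")
        self._pos = 1
        return "ctx-1", (self.pages[0] if self.pages else [])

    def scroll(self, scroll_id: str) -> Tuple[str, List[str]]:
        page = self.pages[self._pos] if self._pos < len(self.pages) else []
        self._pos += 1
        return scroll_id, page

    def clear_scroll(self, scroll_ids: List[str]) -> None:
        for sid in scroll_ids:
            self.open_contexts.discard(sid)


def drain_and_clear(client: FakeScrollClient) -> List[str]:
    docs: List[str] = []
    scroll_id, hits = client.search()
    try:
        while hits:
            docs.extend(hits)
            scroll_id, hits = client.scroll(scroll_id)  # keep only the latest id
    finally:
        # Release the context as soon as we are done instead of waiting for the timeout.
        client.clear_scroll([scroll_id])
    return docs


client = FakeScrollClient([["a", "b"], ["c"]])
docs = drain_and_clear(client)
print(docs, client.open_contexts)  # ['a', 'b', 'c'] set()
```

Putting the clear in `finally` also releases the context if processing a batch throws part-way through.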
@nasreekar - Would you mind using either Discuss or Stack Overflow for questions, and keeping GitHub issues for bugs, missing features and issues with the client? It'll help others who may have similar questions :+1:
@russcam Sure, Russ. I saw JP's take on scrolling in this issue, and since he had already answered it, I asked here first. Then I thought it would be better to ask on SO since you might be busy, so I posted it there too. Anyway, from now on I will keep it to SO or Discuss. Sorry about that, and thanks again. :+1: