Hi
I am querying a DynamoDB GSI that has a HashKey and a RangeKey defined. I would like to limit the results to a small number (let's say 5), and I need to add a FilterExpression so that when a user queries this index they do not get their own records.
The problem I am facing is that the limit appears to be applied before either the queryFilter or the filterExpression. So if one of the 5 records is excluded by the queryFilter or the filterExpression, it gets discarded and I end up with fewer than 5 elements from the getResults method.
Is this the expected behaviour from the library? A DynamoDB restriction? Is there any way to have the limit applied after the filter (apart from obviously checking the size and querying again, or setting a higher limit)?
Any help would be much appreciated
Hi @ricardclau,
Yes, IIUC this is the expected behavior from the library. The service API currently only exposes a single "Limit" parameter that can be specified to restrict the size of the results. See the "Limit" section of:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
In particular, the Limit is always applied before filtering on the server side, not after.
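To make the ordering concrete, here is a minimal sketch in plain Java that imitates this behaviour without calling DynamoDB at all. The class name, the queryOnce helper, and the sample owner names are all made up for illustration; the only point is that the limit caps how many items are evaluated, and the filter then runs on that capped set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: no AWS SDK involved. All names here are hypothetical.
final class LimitBeforeFilterDemo {

    // Imitates a single Query call: first cap the items that are read (Limit),
    // then drop the ones that fail the filter (FilterExpression).
    static List<String> queryOnce(List<String> keyMatches, int limit, Predicate<String> filter) {
        List<String> evaluated = keyMatches.subList(0, Math.min(limit, keyMatches.size()));
        List<String> returned = new ArrayList<>();
        for (String owner : evaluated) {
            if (filter.test(owner)) {
                returned.add(owner);
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        // Six items match the key condition; one of the first five belongs to the caller.
        List<String> owners = Arrays.asList("alice", "bob", "me", "carol", "dave", "erin");
        List<String> page = queryOnce(owners, 5, owner -> !owner.equals("me"));
        System.out.println(page); // prints [alice, bob, carol, dave]
    }
}
```

Only four items come back even though a fifth match ("erin") exists, because it was never evaluated within the limit of 5.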
Hope this helps.
One quick way to handle this is to use query() instead of queryPage() and only read the first 5 items from the returned list. query() handles pagination for you, so it'll automatically make another call to DynamoDB if needed to fetch the additional records.
// A Limit of 5 is passed through to the service on each query; each call may return fewer than 5 results.
List<MyObject> objects = mapper.query(query.withLimit(5));
for (int i = 0; i < 5; ++i) {
    // get() will make another call to DynamoDB if the pages loaded so far didn't include enough results.
    // Note: if fewer than 5 results exist in total, get(i) throws an IndexOutOfBoundsException.
    MyObject obj = objects.get(i);
    ...
}
Thanks for your replies @hansonchar & @david-at-aws
Yes, this is what I suspected. I had some minimal hope that queryPage might behave slightly differently from query in terms of when the filter is applied, but this is not the case :)
Regarding using query and then looping, please correct me if I am wrong but query would usually get more elements than my needed 5 (and this is why queryPage was introduced in the API), so it would increase the read throughput consumption, right?
I have done some further research by activating full debug logs for org.apache.http.wire, and this is what I found (again, comments and corrections are very much appreciated).
Would a try / catch for that situation be the best way to handle the edge case, or is there a cleaner way of doing it without having to do a full scan?
And related to all this, what would be the difference / advantages of queryPage over query? The PaginatedQueryList seems to be much smarter, although you need to be careful not to scan the full table. Is that the main reason for them both to exist?
Anyway, I strongly recommend that everyone activate this debug log to see what is actually going on between the app and DynamoDB. It is very enlightening and helps make sense of everything :)
As you've discovered with the wire logs (which I agree are a great way to understand what's going on under the hood!), PaginatedQueryList is lazy: it'll (usually) make individual Query calls to DynamoDB one at a time as you ask it for more data. size() is the exception; it needs to keep calling Query until it has seen all of the query results to report an accurate size.
If there may be fewer than 5 total results and you want to avoid the try/catch, you can use an Iterator, whose hasNext method lets you test whether you're at the end of the list without having to get the size of the list up front:
Iterator<MyObject> iterator = mapper.query(query.withLimit(5)).iterator();
for (int i = 0; iterator.hasNext() && i < 5; ++i) {
    MyObject object = iterator.next();
    ...
}
Yep, the main advantage of scanPage/queryPage is that it's harder to shoot yourself in the foot by accidentally performing a query that consumes a LOT of capacity. With scanPage/queryPage, the 'Limit' parameter caps the number of items a single call evaluates, and therefore the capacity that call can consume, and you have to explicitly write the loop to make multiple calls if you want to use more capacity to get more results. query/scan are easier to use since they handle this pagination for you automatically, but you may end up accidentally consuming a lot more capacity than you meant to if you don't think through the implications.
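For anyone who wants the explicit loop spelled out, below is a rough sketch of its shape in plain Java with a toy in-memory data source, so it runs without the SDK. The Page class, fetchPage, and collect are hypothetical stand-ins. With DynamoDBMapper, the equivalent pieces would presumably be queryPage, the returned page's getResults() and getLastEvaluatedKey(), and withExclusiveStartKey on the query expression, but treat that mapping as an assumption to check against the SDK docs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: no AWS SDK involved. All names here are hypothetical.
final class PagedCollectDemo {

    // Stand-in for one page of results. In this toy, the "key" is simply the
    // index of the next unread item, or null once everything has been read.
    static final class Page {
        final List<String> results;
        final Integer lastEvaluatedKey;

        Page(List<String> results, Integer lastEvaluatedKey) {
            this.results = results;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    // Imitates one queryPage-style call: evaluate at most `limit` items, then filter.
    static Page fetchPage(List<String> items, Integer startKey, int limit, Predicate<String> filter) {
        int from = (startKey == null) ? 0 : startKey;
        int to = Math.min(from + limit, items.size());
        List<String> results = new ArrayList<>();
        for (String item : items.subList(from, to)) {
            if (filter.test(item)) {
                results.add(item);
            }
        }
        return new Page(results, to < items.size() ? Integer.valueOf(to) : null);
    }

    // Keeps requesting pages until `wanted` results are collected or the data runs out.
    static List<String> collect(List<String> items, int wanted, int limit, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        Integer startKey = null;
        do {
            Page page = fetchPage(items, startKey, limit, filter);
            for (String result : page.results) {
                if (out.size() < wanted) {
                    out.add(result);
                }
            }
            startKey = page.lastEvaluatedKey;
        } while (out.size() < wanted && startKey != null);
        return out;
    }

    public static void main(String[] args) {
        List<String> owners = Arrays.asList("alice", "bob", "me", "carol", "dave", "erin", "frank");
        System.out.println(collect(owners, 5, 5, owner -> !owner.equals("me")));
        // prints [alice, bob, carol, dave, erin]
    }
}
```

Each extra pass through the loop is a deliberate extra call, so the capacity used stays visible in your own code rather than hidden behind automatic pagination.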
Thank you very much @david-at-aws for the confirmations and detailed explanations!
I think we can close the issue and hopefully this thread will be useful for anyone having similar doubts like the ones I had.
Thanks!
@david-at-aws Hi, I am using the new AWS SDK for PHP and I have the same problem. I want to apply a QueryFilter together with a query and a limit. You said that with the Java SDK we can use query() for this instead of queryPage().
In the PHP AWS SDK we only have the query() method, and it isn't called again automatically if the result set is smaller than the limit. How can I do this in the PHP SDK?
Great deep dive. Thanks!