Postgrest: Pagination of nested documents

Created on 14 Mar 2016 · 11Comments · Source: PostgREST/postgrest

There is currently no way to limit the number of nested documents returned by a GET. (The Range header only applies to the number of documents at the top level).

e.g. in the blog example: if a post has thousands of comments; I don't want to return them all at the top level; I might just want 10 posts and the top 5 comments from each post.

A solution was tangentially discussed as part of https://github.com/begriffs/postgrest/pull/446, but it was closed. So I'm opening a new issue for tracking the feature.

enhancement

Source

daurnimator

Most helpful comment

:+1: from me for implementing this feature.
In order to have everything in one place, i am going to paste here the things i said in the old issue:

While i personally do not immediately see the usefulness of being able to paginate the embedded collections, there are probably many use-cases for limiting the result set. Also being able to order the embedded stuff is probably very useful. I look at limiting/ordering as two features that are very similar when it comes to how they are implemented and how they are expressed in a http request so try to think about both of them at the same time.
The Range header is a great idea and it worked well while the thing being returned was "flat", but now i don't see how one can express with range headers the need to limit stuff for each node in the tree. There is also be this "disconnect" that the limit is expressed in a header and order as a parameter.

I propose that we switch to expressing order/limit/offset in GET parameters while also maintaining backward compatibility for expressing the limit with ranges for the root node of the response, like so

&limit=10&offset=20 --for the top node
&posts.comments.limit=10 -- for a deeper node
&order=.... -- for the top node
&posts.comments.order=.. -- for a deeper node

The semantics of the http request start to resemble SQL (select,limit,offset,order) so i don't think "page" support should be included, it can be modeled by the (js) client.

ruslantalpa on 14 Mar 2016

👍3

All 11 comments

:+1: from me for implementing this feature.
In order to have everything in one place, i am going to paste here the things i said in the old issue:

&limit=10&offset=20 --for the top node
&posts.comments.limit=10 -- for a deeper node
&order=.... -- for the top node
&posts.comments.order=.. -- for a deeper node

The semantics of the http request start to resemble SQL (select,limit,offset,order) so i don't think "page" support should be included, it can be modeled by the (js) client.

ruslantalpa on 14 Mar 2016

👍3

I have partially implemented this feature on my computer and i am looking for some feedback.
Just like the order feature i changed iRange in ApiRequest to be [(String,NonnegRange)].
This way i can specify the range for every level of the tree.
And you do it like so range=0-4&tasks.range=0-2
In addition, this allows to enforce maxRows on every level of the tree.
I have this question:
Range implements the concepts of limit/offset and it works for the top level but does it make sense to implement the offset concept (which comes embedded in range concept) for the embedded entities?
I don't see a way you could use offset to paginate the embedded entities since when you say
&tasks.range=10-20 you are actually paginating within the tasks of each project.
Not sure if i am making myself clear.
Another way to go about it is instead of range=0-5 to have limit=5 parameter. This way there is no confusion about the offset but in this case there is a bit of disconnect between the implementation for the top level and the lower levels (it has to be treated separately).

The best i can think of is this way:
Keep the range concept in the data type but allow specification of offset only for the top level of the tree (either with header of parameters), for example
projects?limit=2&offset=2&tasks.limit=5 (specifying tasks.offset would produce an error since there would not be a offset column)

ruslantalpa on 23 May 2016

this is the branch https://github.com/ruslantalpa/postgrest/tree/multilevel_limit

ruslantalpa on 23 May 2016

Good idea. Could you rise discussion in FHIR.org zulip about this?

niquola on 23 May 2016

Question- how do you know that you're going to get the most recent comments (or equivalent)when you do this? Isn't there no guarantee that you would do that? In which case the limit/offset feature would be kind of useless in many cases. Mightn't it be better to implement some form of subquery expressioning inside of embedded entities? Because as soon as you do the limit/offset bit, someone's going to want to be able to order, someone's going to want to be able to use a where-why not just support filtering with a subset of the same query parameters we use for top level items? I'm not sure if it's possible to just recursively apply the query filter to a string inside the brackets of the foreign-entity embed but if it is, that would be the best way to do it.

It would also make it more obvious what the limit and offset terms were referring to, as I think that the projects?select=*,tasks{*}&limit=2&offset=2&tasks.limit=5 is a bit awkward. It also strikes me as a bit clunky to have to go back to the top level url to find something about the embedded entity, seems like it would be much cleaner if all query parameters pertaining to the embedded entity were contained within the brackets of that entity. i.e. It would make more sense to do something like projects?select=*,tasks{select=*&limit=5&order=due_date.desc}&limit=2&offset=2 or, even better projects?select=*,tasks{select=*&limit=5&order=due_date.desc&overdue=eq.true}&limit=2&offset=2 the implementation could still require that the join be performed first and results filtered so that you wouldn't get unexpected result sets or badly performing queries.

One could still support simple lists of fields to select from the embedded entity so foreign_entity{field1,field2,field3} would be equivalent to foreign_entity{select=field1,field2,field3} but no other query parameters could be set if you used the simple list based approach.

Anyway, I am probably imagining that this is simpler than it really is, sorry that I don't know Haskell and can't help out with the actual coding. Hoping to learn it sometime so I can! (Also, this is the same David as @davidkuep, I just am using my personal github acct).

djk447 on 23 May 2016

@djk447 I think the proposed query strings don't make sense - every ampersand (&) ends the current parameter and begins a new one, on the top level. You can't group them by curly braces. I.e., your query string
select=*,tasks{select=*&limit=5&order=due_date.desc}&limit=2&offset=2
must be read like
select=*,tasks{select=*
limit=5
order=due_date.desc}
limit=2
offset=2

You would have to add a level of URI escaping (masking the characters & and =) for every level of subquerying to do this in a "correct" way, which would seem a bit clumsy to me.

SebAlbert on 23 May 2016

@djk447 about most recent comments.
how do you know you get them here
select * from comments limit 5 ... you don't, you need an order for the limit to make sense ... the same here, you need to also specify the order (which is already implemented)

the way of specifying parameters is a thing that evolved from the original code, you had the ability to query only one table and apply filters to it, so each filter being a separate query parameter made sense, only after that came &select parameter with the embedding and we needed to be backward compatible ... thus tree.path.to.column=... style.

Eventually it could evolve to be something closer to graphQL like so
&select=id,name,tasks(limit: 5, order="asc.name", id: "gt.10"){id, name} or something like that, but that's not in the immediate future

oh, and just reading your post again, i think you are not aware of all the features, you CAN filter/order the embedded items, here is an example

/projects?select=id,name,tasks{id,name,completed}&id=gt.2&order=asc.id&tasks.completed=false&tasks.order=asc.name

ruslantalpa on 23 May 2016

@ruslantalpa @SebAlbert The graphQL thing looks interesting and now totally see the problem you mentioned Seb, but seems like that would fix it. Clearly I don't know what's up with all the features there. It would help if it were mentioned in the docs! Glad that it exists, but would be nice if people knew about it! I'm happy to help update that if that's useful...even if my main contribution is continuing to ask stupid questions...

The thing that happens with the way were doing it now is that it's rather unclear, for instance, what level the filter would act on, ie would the last query return only projects with at least one task where completed=false or would it return all projects and only filter those tasks where completed=false, returning an empty task array for the others? Similarly for ordering (though in the general case it's relatively obvious) I could imagine people being confused if they have a foreign entity that returns a single value, would the top level output then be ordered by the single value? It could be that this basically becomes a feature eventually (especially for the where bit) so that you could select only the tasks where completed=false with the graphQL style syntax and you could select only the projects with at least one task where completed=false with the task.completed in the top level url approach.

djk447 on 24 May 2016

Contributing to the docs would be very much appreciated, it's a hard job that not everybody can do.
As for the query i mentioned it reads like this (it makes more sense if you think of the graphql way of expressing it), i'll write it both ways

/projects?
select=id,name,tasks{id,name,completed}
&id=gt.2
&order=asc.id
&tasks.completed=false
&tasks.order=asc.name

{
  projects(id: "gt.2", order: "asc.id"){
    id
    name
    tasks(completed: false, order: "asc.name"){
    id
    name
    completed
    }
  }
}

notice in graphql projects(...) and tasks(...) look like a function call, that's not by accident, think of them as functions taking parameters and returning something (lists of projects/tasks). tasks(...) function will be called for each of the projects returned by the projects(...) call.
So the query reads like this
Give me all the projects with id greater then 2 and order them by id
For each project i am interested only in id and name fields and for each of those projects (which match the above condition) give me the list of tasks (for that particular project) that are not yet completed and order them by name.
If you imagine the system doing a single step/query at a time, things would look like this

select id, name from projects where id > 2 order by id;
-- result is projects with id (3,4,5)
-- then or each of those projects we call this
select id, name, completed from tasks where project_id = 3 and completed = false order by name;
select id, name, completed from tasks where project_id = 4 and completed = false order by name;
select id, name, completed from tasks where project_id = 5 and completed = false order by name;
-- then we gather all the results and created the response json

ruslantalpa on 24 May 2016

So the query is something like: SELECT p.id, p.name, json_agg(to_json(t.id, t.name, t.completed) )FROM projects p LEFT JOIN LATERAL (SELECT id, name,completed FROM tasks ts WHERE ts.project_id=p.id AND ts.completed=false ORDER BY ts.name) t ON true WHERE p.id>2 GROUP BY p.id,p.name ORDER BY p.id;

(without the last rollup of the jsons)

Just mentioned in the gitter that it would be nice to have a header to return the query postgrest constructs for a call.

And I will gladly contribute to the docs-is there a place where someone's listed some of the features that aren't in there...this one would definitely be useful but would probably be good to have a single place for things that need to be added not sure if I'm missing where that already is, or what.

djk447 on 24 May 2016

This is now implemented on the master branch. Can be closed

ruslantalpa on 1 Jun 2016

Was this page helpful?

0 / 5 - 0 ratings