Got: Pagination support

Created on 3 Feb 2019  ·  20Comments  ·  Source: sindresorhus/got


Issuehunt badges

What would you like to discuss?

With the recent enhancements such as create/extend and hooks, is there any clean way to implement pagination with the current approach?

Note: there exists a discussion in gh-got already but I want to discuss it generically here.

Checklist

  • [x] I have read the documentation.




IssueHunt Summary

szmarczak szmarczak has been rewarded.

Backers (Total: $80.00)

Submitted pull Requests

- #833 Experimental pagination feature

Tips

IssueHunt has been backed by the following sponsors. Become a sponsor


Rewarded on Issuehunt enhancement ✭ help wanted ✭

Most helpful comment

@issuehunt has funded $80.00 to this issue.


All 20 comments

What about:

const options = {
    hooks: {
        pagination: (response, options) => {
            if (!response) {
                // First request
                options.searchParams = {page: 0};
            }

            options.searchParams.page++;
        }
    }
};

for await (const response of got.paginate(url, options)) {
    // ...
}

Shouldn't the hook be called beforePaginate for consistency? And I think the the signature should be (options, isFirstPage) => {}, unless you have other use-cases for the previous response? Should it have some built-in automatic behavior for the Link header?

Shouldn't the hook be called beforePaginate for consistency?

I have no idea. Maybe it should be an option (paginate)... And the hook should be called beforePaginate. I think that way would be better.

unless you have other use-cases for the previous response

The response could give a link to the next page.

Should it have some built-in automatic behavior for the Link header?

Sure, that would be nice.

The response could give a link to the next page.

It should at least be called previousResponse, not response.

I have no idea. Maybe it should be an option (paginate)... And the hook should be called beforePaginate. I think that way would be better.

Yeah, maybe drop the hook aspect from it altogether? I wonder if we could make it got(…, {paginate: 'link'}) for automatic Link-header support and got(…, {paginate: (options, previousResponse) => {}}) for custom handling.

That sounds good. Let's do it that way :)

Some other use-cases:

  1. I want the first 4 pages of a request. Would be nice if we could make got() work normally in that case and just gather up the first 4 pages for us instead of requiring the verbose async iterator coe.
  2. I want to get new pages until I say I'm done. For example, I want to get pages, filter results, and end it when I have enough filtered items. Would be nice if this too was possible without the verbose async iterator syntax.

I would also include a currentPage argument in the paginate method with the page number.

  1. I want the first 4 pages of a request. Would be nice if we could make got() work normally in that case and just gather up the first 4 pages for us instead of requiring the verbose async iterator coe.
const arrayOfResponses = await got.pagination(url, {
    ...options,
    pagination: {
        paginate: (previousResponse, options) => {
            // By default this would be the Link-header thing...
        },
        from: 1, // Default = 1
        to: 4 // If specified, it will return an array.
    }
});

I'm not sure if it should be implemented. You can achieve the same in a few lines of code:

const arrayOfResponses = await Promise.all([
    got(url, {searchParams: {page: 1}}),
    got(url, {searchParams: {page: 2}}),
    got(url, {searchParams: {page: 3}}),
    got(url, {searchParams: {page: 4}})
]);
  1. I want to get new pages until I say I'm done. For example, I want to get pages, filter results, and end it when I have enough filtered items. Would be nice if this too was possible without the verbose async iterator syntax.

Not possible. The Got logic is async.

For my primary use case, I simply want all paginated results with a single got command, e.g. something like await got(url, { paginate: true }).

For my secondary use case, I would like to add a limit to how many results are returned, e.g. maximum 1000. If you prefer that the caller calculate this instead using number of pages returned, it's also OK. But I think users usually think in terms of number of results, not number of pages. The number of returned entries per page is usually an endpoint restriction and users ultimately care about total entries, not total pages. The only downside might be if the number of entries per page is not divisible by the total and hence you need to crop the results.

The number of returned entries per page is usually an endpoint restriction and users ultimately care about total entries, not total pages.

👍 This is my experience too. I take back wanting to specify pages. What I actually want is to be able to specify item count.

I'm not sure if it should be implemented. You can achieve the same in a few lines of code:

Sure, but you can achieve pagination in Got today too, it's just verbose. The point of this issue is figuring out the most common use-case, and making those super easy, while still giving user's the power to do custom logic.

  1. I want to get new pages until I say I'm done. For example, I want to get pages, filter results, and end it when I have enough filtered items. Would be nice if this too was possible without the verbose async iterator syntax.

Not possible. The Got logic is async.

I don't understand this statement. Got being async is exactly what makes this possible. Got would handle getting the required pages by using got.paginate internally and then gather up the results for me, so I only had to call:

got(…, {
    pagination: {
        itemCount: 100,
        filter: item => item.isUnicorn
    }
});

In the above example, Got would fetch items and paginate until it had gathered 100 items with item.isUnicorn === true.

Can you elaborate on:

without the verbose async iterator syntax

What do you mean specifically?

What do you mean specifically?

Instead of:

const results = [];
for await (const response of got.paginate(…)) {
    results.concat(response.items);
}

I could just do something like this:

const results = await got(…, {
    pagination: true
});

And it would gather all the items for me.

Oh. I thought that by without the verbose async iterator syntax you meant the sync iterator :P

That just returns an array, right?

That just returns an array, right?

Yes

@issuehunt has funded $80.00 to this issue.


Unfortunately my use cases typically revolve around querying slow services, so I benefit from processing each paged result as they're received instead of waiting for got to aggregate all pages together into a single result.

I strongly agree with the switch to thinking about results in terms of items and not pages. All I care about is receiving all results that match my URL query. Whether the server uses pagination or not, doesn't really matter to me.

@hutson We could have a transform callback or something so you could do processing. I think this is a common need.

@sindresorhus has rewarded $72.00 to @szmarczak. See it on IssueHunt

  • :moneybag: Total deposit: $80.00
  • :tada: Repository reward(0%): $0.00
  • :wrench: Service fee(10%): $8.00
Was this page helpful?
0 / 5 - 0 ratings

Related issues

jamestalmage picture jamestalmage  ·  3Comments

khizarsonu picture khizarsonu  ·  3Comments

alanzhaonys picture alanzhaonys  ·  4Comments

dominusmars picture dominusmars  ·  3Comments

carvallegro picture carvallegro  ·  4Comments