Python-slack-sdk: [Discussion] Automatic API request pagination in Python 2 and 3

Created on 21 May 2018 · 7 Comments · Source: slackapi/python-slack-sdk

Description

More and more Slack API endpoints are supporting pagination. As such, I'd like to add pagination support to the API request handler in this library.

Over the last week and a half, I’ve been thinking _a lot_ about how we would support pagination for API calls in this client in a way that won’t cause the client to lock up the host application while it traverses the cursors on large workspaces…

In Python 3, we can spin it off as an async process; in 2.7 we _could_ spin off a thread, but is that good practice? I’m debating only supporting auto-pagination in 2.x of slackclient, which may be Python 3 only, so we can use 3's async magic.

What type of issue is this? (place an x in one of the [ ])

  • [x] discussion

All 7 comments

Personally, I think putting control over when pagination happens into the hands of developers using the library is best. We should encourage best practices by setting limit automatically so that large data sets return next_cursor. We can even make the interface for taking that cursor and using it in a subsequent request simple. But automatically following paginated collections seems overboard. Often a developer's intent can be just to get the first page of a set, to paginate only until xyz record is found, etc.
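To make the developer-controlled approach concrete, here is a minimal sketch of what such an interface might look like. `fake_users_list` and `fetch_page` are hypothetical names invented for illustration, not part of the library; the only detail taken from the Slack Web API is that cursor-paginated responses carry `response_metadata.next_cursor`, which is empty on the last page.

```python
# Hypothetical stand-in for a cursor-paginated Web API method (not a real library call).
FAKE_USERS = ["U%03d" % i for i in range(1, 8)]  # U001 .. U007


def fake_users_list(cursor=None, limit=3):
    """Return one page shaped like a cursor-paginated Slack response."""
    start = int(cursor) if cursor else 0
    end = start + limit
    next_cursor = str(end) if end < len(FAKE_USERS) else ""
    return {
        "ok": True,
        "members": FAKE_USERS[start:end],
        "response_metadata": {"next_cursor": next_cursor},
    }


def fetch_page(method, cursor=None, limit=3):
    """Fetch exactly one page and return (items, next_cursor).

    A limit is always sent so that large collections come back paginated.
    next_cursor is None once the collection is exhausted.
    """
    response = method(cursor=cursor, limit=limit)
    next_cursor = response.get("response_metadata", {}).get("next_cursor") or None
    return response["members"], next_cursor


# The caller decides whether and when to ask for another page.
first_page, cursor = fetch_page(fake_users_list)
second_page, cursor = fetch_page(fake_users_list, cursor=cursor)
```

With this shape, a caller who only wants the first page makes a single request, and one who wants more passes the returned cursor back in.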

I think iterating over the pages of the list methods as long as no pagination related arguments were explicitly passed in is the way to go.


I don't think we should be concerned with how long the method takes to complete. The alternative is for either (a) the response to return an error related to the timeout, or (b) the app developer to do the exact same iteration, and possibly terminate earlier than the library's iteration in the case they were just calling list to do a search. In (a), I think that taking a longer time to return is a clear win. In (b), as long as we skip automatic iteration when the developer passes in pagination-related arguments, the outcome is identical.

We may be over-thinking it when it comes to concurrency in Python. Python has a global interpreter lock (GIL), and this is not news for developers. If a sophisticated developer chose to optimize their application by handling long-running API methods in the background, they are still capable of doing so and the library will be just as helpful. They would just create the thread on their own, call the method, and join the thread back to the main thread - all the same code we could write, minus creating the thread. This at least gives the sophisticated developer a chance to use whichever concurrency/threading/event loop model they want.
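The two ideas in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's behavior: `fake_list`, `list_call`, and the `"items"` key are all hypothetical, and the set of "pagination-related arguments" is assumed to be `cursor` and `limit`.

```python
import threading

# Hypothetical stand-in for a cursor-paginated list method (not a real library call).
FAKE_CHANNELS = ["C%02d" % i for i in range(1, 6)]  # C01 .. C05


def fake_list(cursor=None, limit=2):
    start = int(cursor) if cursor else 0
    end = start + limit
    next_cursor = str(end) if end < len(FAKE_CHANNELS) else ""
    return {
        "ok": True,
        "items": FAKE_CHANNELS[start:end],
        "response_metadata": {"next_cursor": next_cursor},
    }


PAGINATION_ARGS = ("cursor", "limit")  # assumed set of pagination-related arguments


def list_call(method, **kwargs):
    """Follow every cursor unless the caller passed pagination arguments."""
    if any(name in kwargs for name in PAGINATION_ARGS):
        # The caller is paginating by hand: return exactly one page, untouched.
        return method(**kwargs)
    items, cursor = [], None
    while True:
        page = method(cursor=cursor, **kwargs)
        items.extend(page["items"])
        cursor = page["response_metadata"]["next_cursor"]
        if not cursor:
            return {"ok": True, "items": items,
                    "response_metadata": {"next_cursor": ""}}


everything = list_call(fake_list)         # blocks until all pages are fetched
one_page = list_call(fake_list, limit=2)  # explicit limit: a single page only

# A developer who does not want to block creates and joins their own thread.
results = {}


def worker():
    results["channels"] = list_call(fake_list)


thread = threading.Thread(target=worker)
thread.start()
# ... the main thread is free to do other work here ...
thread.join()
```

Because the library call itself stays synchronous, the same `list_call` could just as well be handed to a thread pool or an event loop executor of the developer's choosing.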

I wonder whether using generators might be an easy, but not completely transparent way to handle pagination in a way that still gives developers lots of flexibility and can stay efficient. These generators would yield pages of data.

This would also address the concerns of @episod above:

Often a developers intent can be just to get the first page of a set, to paginate only until xyz record is found

For example:

def find_interesting_user():
  # Only need to page through as many users as it takes to find the user we're looking for.
  for page_num, users in enumerate(sc.api_call("users.list")):
    update_indeterminate_progress_bar(page_num)
    for user in users:
      if is_interesting(user):
        return user

Or more simply, but still just as efficient in terms of not making more API calls than needed:

import itertools

def find_interesting_user():
  for user in itertools.chain.from_iterable(sc.api_call("users.list")):
    if is_interesting(user):
      return user

@wiseman I was thinking something similar. If a workspace has 10,000 users, there's no reason we should traverse the entire user set if the user's in the first result set, but I wasn't sure how we'd implement it.

We'll have some time to play around with this soon.
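As a rough illustration of how a lazy, page-yielding generator could be implemented, consider the sketch below. It is not the library's code: `fake_users_list`, `iter_pages`, and `find_user` are hypothetical names, and the `api_calls` list exists only to show how many requests were made.

```python
import itertools

# Hypothetical stand-in for users.list; records each request so calls can be counted.
FAKE_USERS = [{"id": "U%d" % i, "name": "user%d" % i} for i in range(1, 11)]
api_calls = []


def fake_users_list(cursor=None, limit=3):
    api_calls.append(cursor)
    start = int(cursor) if cursor else 0
    end = start + limit
    next_cursor = str(end) if end < len(FAKE_USERS) else ""
    return {
        "ok": True,
        "members": FAKE_USERS[start:end],
        "response_metadata": {"next_cursor": next_cursor},
    }


def iter_pages(method, **kwargs):
    """Lazily yield one page of members per API call, following next_cursor."""
    cursor = None
    while True:
        response = method(cursor=cursor, **kwargs)
        yield response["members"]
        cursor = response.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            return  # last page reached


def find_user(name):
    """Stop requesting pages as soon as the user is found."""
    for user in itertools.chain.from_iterable(iter_pages(fake_users_list)):
        if user["name"] == name:
            return user
    return None


found = find_user("user2")  # user2 is on the first page, so only one request is made
```

The next request is only issued when the consumer asks for another page, so returning early from the loop means the remaining pages are never fetched.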

Similar to the suggestion by @wiseman, we've implemented pagination via generators with the SlackResponse object in v2. I'm closing this issue as I have no intention of backporting this to v1.

@RodneyU215 do you have some example code you can share where this pagination implementation is being used?

@alanwill there is some example code in the docstring of the SlackResponse object. Here it is: https://github.com/slackapi/python-slackclient/blob/ff073cf74994adc6022e8296e702012ef5b662b4/slack/web/slack_response.py#L24-L41

If you run into any issues with using it, feel free to open a new issue.
