Kedro: How to work with long-lived refresh tokens and short-lived access tokens?

Created on 26 May 2020 · 3 Comments · Source: quantumblacklabs/kedro

Description

In my scenario I have a service that works with short-lived access tokens and long-lived refresh tokens. I need to use the long-lived refresh token to obtain a short-lived access token, and then I can fetch the data with that access token. When the access token expires, I need to request a new one.

I saw this example in the docs, where I can pass an arbitrary number of parameters to the underlying library (requests):

us_corn_yield_data:
  type: api.APIDataSet
  url: https://quickstats.nass.usda.gov
  params:
    key: SOME_TOKEN
    format: JSON
    commodity_desc: CORN
    statisticcat_des: YIELD
    agg_level_desc: STATE
    year: 2000

In this case, SOME_TOKEN is the short-lived token used to get the data from the API. What I want is to check whether the short-lived token is still valid and, if it is not, request a new one with the long-lived token.

The question is: Is there built-in support for such a use case in api.APIDataSet or in any other Kedro dataset?

Thank you!

f-istvan

Most helpful comment

Hi @f-istvan:

If you really want to do this at the dataset level, you can use Transformers to wrap the _load and _save methods with custom logic. However, since you are already using a custom dataset, why can't you update the watermark inside the _save method? Where is this watermark being stored?
If you want to do it at the node level, in 0.16 we introduced the concept of Hooks, which allow you to add custom logic to lifecycle touch-points in the pipeline. You can use the after_node_run Hook to run your custom logic after the third node finishes. One of our users created this guide on using a Hook for callback functionality, which you might find helpful: https://www.youtube.com/watch?v=QVEgdJnUUsQ
Lastly, it seems that the problem you are trying to solve is one of loading your data incrementally, so you might also want to check out our IncrementalDataSet.
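The "update the watermark once the save has succeeded" idea can be sketched in plain Python. This is only an illustration under stated assumptions: the JSON file used for storage, the names WatermarkStore and save_and_advance, and the "timestamp" field on each record are all hypothetical and do not come from Kedro or from this thread.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List


class WatermarkStore:
    """Keeps one timestamp in a small JSON file (an illustrative storage choice)."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def read(self, default: str = "1970-01-01T00:00:00") -> str:
        # Fall back to a default when no watermark has been written yet.
        if not self._path.exists():
            return default
        return json.loads(self._path.read_text())["watermark"]

    def write(self, value: str) -> None:
        self._path.write_text(json.dumps({"watermark": value}))


def save_and_advance(
    records: Iterable[Dict[str, str]],
    insert: Callable[[List[Dict[str, str]]], None],
    store: WatermarkStore,
) -> None:
    """Insert the records, then move the watermark forward.

    `insert` stands in for the real database write. If it raises, the
    watermark is left untouched, so the same data is fetched again next run.
    """
    batch = list(records)
    if not batch:
        return
    insert(batch)
    # Assumes every record carries an ISO-8601 "timestamp" string in one
    # consistent format, so that string comparison matches time order.
    store.write(max(record["timestamp"] for record in batch))
```

The body of save_and_advance could live inside the custom dataset's _save method, or be called from an after_node_run Hook once the third node has finished. Either way, the next run would read the stored value and hand it to the first dataset.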

In any case, I'm glad your original question was solved. Let me close this issue and if you still need help with the second question, please open another issue. This is to help us track progress on which questions still need answering.

limdauto on 27 May 2020

All 3 comments

You can use the TemplatedConfigLoader to solve this issue. When you create your config loader, you can call the utility function that holds the refresh logic for the short-lived access token and add its result to the globals_dict parameter of the TemplatedConfigLoader.

Alternatively, if the utility you use to refresh your access token isn't a Python utility, e.g. a binary, you can save your access token to an environment variable and follow the same approach with TemplatedConfigLoader as described above.

The rest of your pipeline code doesn't need to change.
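A minimal sketch of the refresh utility mentioned above, assuming the token service can be wrapped in a function that takes the refresh token and returns an access token together with its lifetime in seconds. The names AccessTokenCache, build_globals and the "api_token" key are hypothetical, chosen only for illustration, and the real HTTP call is left to the injected fetch function.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical contract: refresh token in, (access_token, lifetime_seconds) out.
FetchFn = Callable[[str], Tuple[str, float]]


class AccessTokenCache:
    """Returns a cached access token and renews it shortly before it expires."""

    def __init__(
        self,
        refresh_token: str,
        fetch: FetchFn,
        margin: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._refresh_token = refresh_token
        self._fetch = fetch
        self._margin = margin  # renew this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Renew when no token is cached or the cached one is about to expire.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch(self._refresh_token)
            self._token = token
            self._expires_at = now + lifetime
        return self._token


def build_globals(cache: AccessTokenCache) -> Dict[str, str]:
    """Builds the globals dictionary to hand to TemplatedConfigLoader.

    "api_token" is an illustrative key; a catalog entry would then
    reference it with a ${api_token} placeholder instead of SOME_TOKEN.
    """
    return {"api_token": cache.get()}
```

With this in place, the dictionary returned by build_globals would be passed to the TemplatedConfigLoader when the config loader is created. Note that the token is resolved once, at config-loading time, so this fits runs that finish within the lifetime of one access token.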

limdauto on 26 May 2020

Hi @limdauto,

This answers my question, thanks a lot.

Is it also possible to yield values from a dataset to an external service? Here is my use case for this:

I have 3 datasets with 3 nodes in my pipeline:

The first dataset gets raw data from an external API and saves it locally. It uses a watermark (a timestamp) to get only the relevant data from the API. I can inject this watermark into the dataset externally with TemplatedConfigLoader, and that's great.
The second node cleans the raw data and saves it locally.
The last dataset reads the cleaned local data and inserts it into MongoDB. This is a custom dataset. After a successful database insert I want to update the watermark, so when the _save method has finished the insert I would like to call a callback function, or something similar, to update an external variable.

In general: Is it possible to change an external variable from a dataset?

Thank you so much!

f-istvan on 27 May 2020
