Prefect: Option to include item string in mapped task name

Created on 28 Feb 2020 · 10 Comments · Source: PrefectHQ/prefect

Use Case

Minor, non-urgent feature request: mapped tasks are currently referenced in logs and the Cloud UI using their numeric index, e.g. my_task[0], my_task[1]. These numeric values aren't particularly meaningful and, in some cases, the string value of the mapped item would be more useful during log examination or when viewing Flow runs in the Cloud UI.

Solution

Having an opt-in way to enable use of the string version of a mapped item in the task name would be helpful, e.g.

    @task(mapped_item_in_task_name=True)
    def my_task(item):
        ...

    my_task.map(["apple", "banana", "orange"])

This would yield task names of my_task[apple], my_task[banana], and my_task[orange] rather than my_task[0], my_task[1], and my_task[2].

For tasks mapping over multiple parameters, this option could simply choose to use the first parameter. Elements with long string representations could be truncated to a reasonable size for display purposes. This should definitely be opt-in to avoid logging sensitive data.
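As a rough illustration of the truncation idea, here is a minimal sketch in plain Python. The helper name `mapped_label` and the `max_len` parameter are hypothetical and not part of Prefect; the sketch only shows how a long string representation might be shortened before being placed in the task name.

```python
def mapped_label(task_name: str, item: object, max_len: int = 20) -> str:
    """Hypothetical helper: build a display name such as my_task[apple].

    The item's string form is cut to at most max_len characters
    (max_len is assumed to be at least 3), ending in "..." when shortened.
    """
    text = str(item)
    if len(text) > max_len:
        text = text[: max_len - 3] + "..."
    return f"{task_name}[{text}]"


# Short items pass through unchanged; long ones are truncated.
labels = [mapped_label("my_task", fruit) for fruit in ["apple", "banana", "orange"]]
long_label = mapped_label("my_task", "a" * 30)
```

Here `labels` holds my_task[apple], my_task[banana], and my_task[orange], while `long_label` keeps only the first 17 characters of the item followed by "...".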

A better implementation might be to pass an iterable/list to be used for the task names (as a specific kwarg) which might or might not match the parameter to map over (but would need to have the same length):

    fruit = {
        0: "apple",
        1: "banana",
        15: "orange"
    }
    my_task.map(fruit.keys(), mapped_task_names=fruit.values())

This would also yield task names of my_task[apple], my_task[banana], and my_task[orange].

Alternatives

None available currently.

feature request

All 10 comments

This is an interesting enhancement that I want to think about; it might help with some other enhancement requests, though including it would require a change to Cloud, which currently types the map_index strictly as an int.

We also need to carefully review where (if anywhere) we have implicitly depended on ordered, numerical indices.

My second-step proposal would actually be to generalize map_index to a broader key or dynamic_key that could be a stringified int by default, or taken from the item value (or a hash of the item value). The important thing is that this key must be stable across retries, so if the item value can change, users shouldn't depend on it as the key (for example, if the value contained a timestamp, it or its hash would change on retries, and Core would fail to recover the original state). We would use this key to track a variety of dynamic tasks, including mapping today and some... other... things in the future ;)
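The stability requirement can be illustrated with a small sketch in plain Python. The function name `dynamic_key` is borrowed from the wording above, but the implementation is an assumption for illustration only and does not reflect how Prefect Core or Cloud compute keys. It uses a SHA-256 digest rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process and so would not survive a retry in a new process.

```python
import hashlib
import json


def dynamic_key(item, index: int) -> str:
    """Hypothetical: derive a key from the item's value.

    Falls back to the stringified index when the item cannot be
    serialized to JSON.
    """
    try:
        payload = json.dumps(item, sort_keys=True)
    except TypeError:
        return str(index)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# A value that does not change yields the same key on every attempt.
first_attempt = dynamic_key({"file": "a.csv"}, 0)
retry_attempt = dynamic_key({"file": "a.csv"}, 0)

# A value carrying a timestamp yields a different key on retry,
# so the original state could no longer be looked up by key.
with_ts_first = dynamic_key({"file": "a.csv", "ts": 1582848000}, 0)
with_ts_retry = dynamic_key({"file": "a.csv", "ts": 1582848060}, 0)
```

In this sketch `first_attempt` equals `retry_attempt`, while `with_ts_first` and `with_ts_retry` differ, which is the failure mode described above.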

Alternative proposal that might be lighter and achieve your purpose: mapped tasks can supply a callable function that returns a "display key" - the backend continues to use ints for ordered simplicity, but you can provide a user-defined display key. That replaces the [0], [1] for display purposes only.

This would be far easier to implement.
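A minimal sketch of the display-key idea, in plain Python with no Prefect imports. The function `display_names` and its `display_key` and `max_len` parameters are hypothetical names chosen for illustration; the point is only that the integer index stays the real identifier while the callable affects the label alone.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def display_names(
    task_name: str,
    items: Iterable[Any],
    display_key: Optional[Callable[[Any], Any]] = None,
    max_len: int = 30,
) -> List[Tuple[int, str]]:
    """Hypothetical: pair each integer map index with a display name.

    The index remains the identifier; display_key only changes the label.
    If display_key is missing or raises, the label falls back to the index.
    """
    names = []
    for index, item in enumerate(items):
        label = str(index)
        if display_key is not None:
            try:
                label = str(display_key(item))[:max_len]
            except Exception:
                label = str(index)  # never let a bad key break the run
        names.append((index, f"{task_name}[{label}]"))
    return names


default_names = display_names("my_task", ["apple", "banana", "orange"])
fruit_names = display_names("my_task", ["apple", "banana", "orange"], display_key=str)
app_names = display_names(
    "fetch_app",
    [{"id": "app-1"}, {}],  # the second item has no "id"
    display_key=lambda app: app["id"],
)
```

With no callable, `default_names` uses my_task[0], my_task[1], and my_task[2]; with `display_key=str`, the labels become the fruit names. In `app_names`, the second item falls back to its index because the callable raises a KeyError.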

The "callable function to return a display key" approach should work well. I don't see any reason to pursue other options mentioned. Thanks!

Just recording another +1 for this feature from another Cloud customer

This feature would be very useful in a pipeline I am working on.

I need to download a set of files provided by an API. Each file is named after a dimension column in the data but the URL of each file is different depending on the date range it contains and other factors.

Sometimes the mapped task fails but it's not immediately obvious which file failed to download.

It would be useful to be able to identify the mapped task with something more informative than the index.

Another use case is a pipeline that gets data for a list of mobile apps. If each mapped task could be named after the app ID it deals with, the UI would be more helpful when something goes wrong.

+1 for this feature. I have a use case where there are multiple classes with the same interface but different functionality; it would be great to be able to see the status and run time of each class when mapping over them.

+1. I am reading tables from a relational db. Mapping over a list of tables. It would be great to know which table failed.

+10. I'm executing a bunch of (60+) SQL statements and scripts from several directories. It's a pain to find out that the 47th SQL statement has gone bad.

Update: see this comment I posted on the draft PR: https://github.com/PrefectHQ/prefect/pull/2974#issuecomment-660339280

Hurray!
