MachineLearningNotebooks: PipelineData: name vs output_name

Created on 6 Sep 2019 · 6 Comments · Source: Azure/MachineLearningNotebooks

The run environment is set up incorrectly when PipelineData.name and PipelineData.output_name differ.

For example, the following pipeline definition:

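```python
from azureml.core import Workspace
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

# Assumed setup: workspace from config.json, default datastore.
workspace = Workspace.from_config()
data_store = workspace.get_default_datastore()
# Input file on the datastore, exposed under the reference name 'helloworld_data'.
inp_data = data_store.path('hello_world/greeting.txt', data_reference_name='helloworld_data')
# name ('helloworld_output') differs from output_name ('pipeline_output'): this triggers the bug.
out_data = PipelineData('helloworld_output',
                        datastore=data_store,
                        output_name='pipeline_output')
step = PythonScriptStep(
    script_name='runner.py',
    source_directory='helloworld',
    arguments=[inp_data, out_data],
    inputs=[inp_data],
    outputs=[out_data],
    compute_target=workspace.compute_targets['cpucluster'],
)
```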

generates an environment in which the data reference variable is named after pipeline_output, the output_name of the PipelineData, while the script argument references a variable named after helloworld_output, the name of the PipelineData:

```
$AZUREML_DATAREFERENCE_helloworld_data=/mnt/batch/tasks/.../hello_world/greeting.txt
$AZUREML_DATAREFERENCE_pipeline_output=/mnt/batch/tasks/.../pipeline_output
$AZ_BATCHAI_TASKLET_CMD=... "runner.py" "$AZUREML_DATAREFERENCE_helloworld_data" "$AZUREML_DATAREFERENCE_helloworld_output"
```

As a result, the second argument expands to an empty string instead of the expected data reference path.
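The mismatch can be reproduced without AzureML. Below is a minimal sketch that models the environment dump in this report; the helper names and the dictionary layout are illustrative assumptions, not SDK internals. The exported variable is keyed by output_name, the command-line placeholder is keyed by name, and, as in a POSIX shell, an unset variable expands to an empty string.

```python
import re
from typing import Dict, List

PREFIX = "AZUREML_DATAREFERENCE_"

# Hypothetical model of the two references in the report: the input uses a single
# name, while the PipelineData output has a distinct name and output_name.
REFERENCES = [
    {"name": "helloworld_data", "output_name": "helloworld_data",
     "path": "/mnt/batch/tasks/.../hello_world/greeting.txt"},
    {"name": "helloworld_output", "output_name": "pipeline_output",
     "path": "/mnt/batch/tasks/.../pipeline_output"},
]


def build_env(refs) -> Dict[str, str]:
    # As observed in the report, the exported variable is keyed by output_name.
    return {PREFIX + r["output_name"]: r["path"] for r in refs}


def build_args(refs) -> List[str]:
    # The command-line placeholder, however, is keyed by name.
    return ["$" + PREFIX + r["name"] for r in refs]


def shell_expand(arg: str, env: Dict[str, str]) -> str:
    # Mimic default POSIX shell behavior: an unset variable expands to "".
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)",
                  lambda m: env.get(m.group(1), ""), arg)


env = build_env(REFERENCES)
expanded = [shell_expand(a, env) for a in build_args(REFERENCES)]
print(expanded)
```

Running it prints the input path followed by an empty string, matching the empty second argument described above. In this model, giving the output the same value for name and output_name makes both lookups agree.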

I'm using azureml-core version 1.0.60.

Labels: Data4ML, product-issue

All 6 comments

@MayMSFT, it looks like you might be able to help out here!

We expect to improve this user experience in the Dataset and Pipeline integration by letting users specify the environment variable name, targeting 9.30.
In the meantime, I will ping the pipeline PM to look into this PipelineData issue.

This is my first exposure to the pipeline_output_name parameter, though I could hypothesize about how it might be helpful for our pipeline scenarios.

But you're right: when you pass it pipeline_output, it seems to be interpreted as a path on the datastore. @sanpil @yanrez, is this the expected behavior? What is the use case for this parameter? Are there examples of it being used anywhere?

Seems like a bug. We will investigate.

We have fixed the bug in the environment setup when both PipelineData.name and PipelineData.output_name are specified; the fix should be included in the next release.
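Until a release containing the fix is installed, the symptom can be caught early on the script side. The sketch below is a hypothetical guard for runner.py; the function names and the exit message are assumptions, not part of the AzureML SDK. It reports any positional argument that arrived as an empty string, which is how the mismatched data reference shows up.

```python
import sys
from typing import List, Sequence


def empty_arg_positions(argv: Sequence[str]) -> List[int]:
    """Return the 1-based positions of command-line arguments that are empty strings."""
    # argv[0] is the script path, so positional arguments start at index 1.
    return [i for i, arg in enumerate(argv[1:], start=1) if arg == ""]


def require_data_references(argv: Sequence[str]) -> None:
    """Exit with an error if any argument arrived empty (e.g. an unset data reference)."""
    missing = empty_arg_positions(argv)
    if missing:
        sys.exit(f"Empty data reference argument(s) at position(s) {missing}; "
                 "check PipelineData name/output_name")


if __name__ == "__main__":
    require_data_references(sys.argv)
```

For the command line in the original report, `empty_arg_positions(["runner.py", "/mnt/batch/tasks/.../hello_world/greeting.txt", ""])` returns `[2]`, so the step fails fast instead of writing to an empty path.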

Thank you for your post. It looks as though this issue was resolved, so we closed this thread. Should you have additional questions, please continue to post here and we will gladly continue the discussion.
