MachineLearningNotebooks: Azure Machine Learning triggered pipeline does not execute the Python script

Created on 26 Mar 2019 · 4 comments · Source: Azure/MachineLearningNotebooks

Hi @rastala, @hning86 ,

I am blocked on Azure ML pipeline execution and am not sure what is wrong. The pipeline does not refresh/register the model when triggered via the REST endpoint or the Azure portal. I tried looking into the container but have made no progress yet.

Environment Details,

Python 3.6
Compute - AMLCompute [STANDARD_D2_V2]
Conda packages - scikit-learn, pandas
Pip packages- azureml-sdk[notebooks], azureml-train-automl

Case Steps,

1. Configure the environment and the Azure ML pipeline as per the GitHub sample (starter), but with a single Python script (as part of a pipeline step)
2. Execute the pipeline using Jupyter
3. The step 2 execution creates a trained model and publishes the pipeline
4. Execute the pipeline using the provided REST endpoint
5. Step 4 completes in 3-5 seconds without updating the model
6. The activity run contains logs from the first run only
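
For reference, the REST trigger in step 4 can be sketched roughly as below. This is a minimal sketch using only the standard library, not the exact code from the issue: the endpoint URL, the token, and the helper names `build_trigger_request` and `trigger_pipeline` are placeholders. It assumes the auth header is a dict such as the one returned by `get_authentication_header()` on an azureml-sdk authentication object, and that the request body carries an `ExperimentName` field as in the published-pipeline samples.

```python
import json
import urllib.request


def build_trigger_request(endpoint, auth_header, experiment_name):
    """Build (but do not send) the POST request for a published pipeline endpoint."""
    # "ExperimentName" follows the published-pipeline samples; other fields are omitted here.
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    headers = dict(auth_header)  # e.g. {"Authorization": "Bearer <token>"}
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def trigger_pipeline(endpoint, auth_header, experiment_name):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_trigger_request(endpoint, auth_header, experiment_name)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `trigger_pipeline(published_pipeline.endpoint, auth.get_authentication_header(), "my-experiment")`, where `published_pipeline` and `auth` are assumed to come from the notebook session. Keeping the run id from the response makes it easier to inspect afterwards whether the step actually executed.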

Expected Behavior - Triggering the pipeline via the endpoint should execute the custom Python script (Train.py) to register/refresh the model in the Azure ML workspace

Observed Behavior - Triggering the pipeline via the endpoint executes the pipeline but does not register/refresh the model in the Azure ML workspace

Appreciate your help,
Thank You


AakanchJoshi

Most helpful comment

I suspect you hit automatic reuse, where the orchestrator doesn't re-run a step if it has the same parameters and the same input path.
One way to control this is the allow_reuse parameter, as in https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py
Another way, at the pipeline level, is the regenerate_outputs parameter, as in https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline%28class%29?view=azure-ml-py#submit-experiment-name--pipeline-parameters-none--continue-on-step-failure-false--regenerate-outputs-false--parent-run-id-none-
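
A minimal sketch of both options, assuming the azureml-sdk v1 classes linked above. The helper names, the step name, and the arguments are hypothetical; only `Train.py` comes from the issue. The imports sit inside the functions so that the snippet loads even where the SDK is not installed.

```python
def build_train_step(compute_target, source_directory):
    """Return a training step that is executed on every pipeline run."""
    from azureml.pipeline.steps import PythonScriptStep

    return PythonScriptStep(
        name="train",                      # hypothetical step name
        script_name="Train.py",            # script named in the issue
        compute_target=compute_target,
        source_directory=source_directory,
        allow_reuse=False,                 # do not reuse results of an earlier run
    )


def submit_fresh_run(workspace, steps, experiment_name):
    """Submit the pipeline and force all step outputs to be regenerated."""
    from azureml.pipeline.core import Pipeline

    pipeline = Pipeline(workspace=workspace, steps=steps)
    # regenerate_outputs=True applies to this one submission only.
    return pipeline.submit(experiment_name, regenerate_outputs=True)
```

Since `regenerate_outputs` is an argument of `submit`, `allow_reuse=False` on the step is presumably the setting that matters for a pipeline that is published and then triggered through its endpoint. The pipeline would need to be published again after the step is changed.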

All of this is speculation about the actual cause. If you can share the run id of the pipeline that didn't run the step you mentioned, we can look into confirming the root cause from system telemetry.

yanrez on 29 Mar 2019

👍4

All 4 comments

@sanpil

swinner95 on 29 Mar 2019


Does it make sense for allow_reuse to be True by default? For example, in ML Studio you can enable caching of data, but it's not enabled by default.
Even if it does make sense to have allow_reuse=True by default, there should be a way to inform the developer that the step wasn't run and was skipped, a logging message perhaps?

syedsadiqalinaqvi on 17 May 2019

Reuse is one of the key value propositions of using Pipelines in a collaborative environment. It makes the workflow more agile by eliminating reruns when they are not necessary. What ML Studio is doing is caching for single-person execution, if I am not mistaken. We will look into providing a message at execution time when allow_reuse=True.