MachineLearningNotebooks: Azure Machine Learning - Triggered Pipeline Does Not Execute the Python Script

Created on 26 Mar 2019 · 4 Comments · Source: Azure/MachineLearningNotebooks

Hi @rastala, @hning86 ,

I am blocked on Azure ML pipeline execution and am not sure what is wrong. The pipeline does not refresh/register the model when triggered via the REST endpoint or the Azure portal. I tried looking into the container, but have made no progress yet.

Environment Details,

  • Python 3.6
  • Compute - AMLCompute [STANDARD_D2_V2]
  • Conda packages - scikit-learn, pandas
  • Pip packages - azureml-sdk[notebooks], azureml-train-automl

Case Steps,

  1. Configure the environment and the Azure ML pipeline as per the GitHub starter sample, but with a single Python script as the only pipeline step.
  2. Execute the pipeline from Jupyter.
  3. The execution in step 2 creates a trained model and publishes the pipeline.
  4. Execute the pipeline using the provided REST endpoint.
  5. The execution in step 4 completes in 3-5 seconds without updating the model.
  6. The activity run contains logs from the first run only.
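
For reference, the REST trigger in step 4 can be sketched roughly as below. This is a minimal sketch, not the poster's actual code: the function name `build_trigger_request`, the endpoint URL, and the token value are hypothetical placeholders, and the body keys `ExperimentName` and `ParameterAssignments` are assumed to match what the published-pipeline endpoint expects. The request is only built, not sent.

```python
import json
import urllib.request


def build_trigger_request(rest_endpoint, aad_token, experiment_name, parameters=None):
    """Build (but do not send) a POST request for a published pipeline endpoint.

    rest_endpoint and aad_token are placeholders supplied by the caller;
    the body layout is an assumption about the endpoint's expected JSON.
    """
    body = {"ExperimentName": experiment_name}
    if parameters:
        # Optional values for pipeline parameters defined at publish time.
        body["ParameterAssignments"] = parameters
    return urllib.request.Request(
        rest_endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + aad_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical usage; urllib.request.urlopen(request) would actually send it.
request = build_trigger_request(
    "https://example.invalid/pipelines/<pipeline-id>",
    "<aad-token>",
    "train-experiment",
)
```

A trigger like this returning within a few seconds is consistent with the run being accepted but the step itself not being executed again.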

Expected Behavior - Triggering the pipeline via the endpoint should execute the custom Python script (Train.py) to register/refresh the model in the Azure ML workspace.

Observed Behavior - Triggering the pipeline via the endpoint executes the pipeline but does not register/refresh the model in the Azure ML workspace.

Appreciate your help,
Thank You

All 4 comments

  • @sanpil

I suspect you hit automatic reuse, where the orchestrator does not re-run a step if it has the same parameters and the same input path as a previous run.
One way to control this at the step level is the allow_reuse parameter, as in https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py
Another way, at the pipeline level, is the regenerate_outputs parameter, as in https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline%28class%29?view=azure-ml-py#submit-experiment-name--pipeline-parameters-none--continue-on-step-failure-false--regenerate-outputs-false--parent-run-id-none-

All of this is speculation about the actual cause. If you can share the run ID of the pipeline that didn't run the step you mentioned, we can look into confirming the root cause from system telemetry.
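
The two switches mentioned in this comment could be applied roughly as follows. This is a sketch under assumptions, not a confirmed fix: `workspace` and `compute_target` are objects the caller is assumed to have already, the step name "train" and experiment name "train-experiment" are hypothetical, and the SDK imports are deferred so the snippet loads even where azureml-sdk is not installed.

```python
def step_options():
    """Step-level switch: ask the orchestrator to always re-run this step."""
    return {"allow_reuse": False}


def submit_options():
    """Pipeline-level switch: force all steps to regenerate their outputs."""
    return {"regenerate_outputs": True}


def build_and_submit(workspace, compute_target, source_directory="."):
    # Deferred imports: these require azureml-sdk to be installed.
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import PythonScriptStep

    train_step = PythonScriptStep(
        name="train",                  # hypothetical step name
        script_name="Train.py",        # the script named in the issue
        compute_target=compute_target,
        source_directory=source_directory,
        **step_options(),
    )
    pipeline = Pipeline(workspace=workspace, steps=[train_step])
    # "train-experiment" is a hypothetical experiment name.
    return pipeline.submit("train-experiment", **submit_options())
```

Either switch alone may be enough. Since allow_reuse is set when the step is constructed, a pipeline that is already published would presumably need to be republished for the change to apply to REST-triggered runs; that is an assumption, not something confirmed in this thread.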

Does it make sense for allow_reuse to default to True? For example, in ML Studio you can enable caching of data, but it is not enabled by default.
Even if it does make sense to have allow_reuse=True by default, there should be a way to inform the developer that the step was skipped rather than run, perhaps with a log message?

Reuse is one of the key value propositions of using Pipelines in a collaborative environment. It makes the workflow more agile by eliminating reruns when they are not necessary. What ML Studio does is caching for single-person execution, if I am not mistaken. We will look into providing a message at execution time when allow_reuse=True.
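
To make the reuse behavior and the requested log message concrete, here is a toy model. It is emphatically not Azure ML's implementation: `ToyOrchestrator`, its fingerprinting scheme, and the warning text are all invented for illustration. It only mirrors the idea described in this thread, that a step with the same script, parameters, and input paths is skipped unless allow_reuse is False.

```python
import hashlib
import json
import logging

logger = logging.getLogger("toy_pipeline")


class ToyOrchestrator:
    """Toy stand-in for a pipeline orchestrator with step reuse."""

    def __init__(self):
        self._cache = {}  # fingerprint -> cached step output

    @staticmethod
    def fingerprint(script_source, params, input_paths):
        # Hash everything that defines the step's work.
        payload = json.dumps(
            {"script": script_source, "params": params, "inputs": sorted(input_paths)},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run_step(self, name, func, script_source, params, input_paths, allow_reuse=True):
        """Return (output, reused). Logs a warning whenever a step is skipped."""
        key = self.fingerprint(script_source, params, input_paths)
        if allow_reuse and key in self._cache:
            logger.warning("Step %r skipped: reusing output of an earlier run", name)
            return self._cache[key], True
        output = func(**params)
        self._cache[key] = output
        return output, False
```

In this model, a second identical call returns immediately with the cached output, which matches the 3-5 second "run" described in the issue, while passing allow_reuse=False executes the function again.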
