Azure-docs: Experiment fails with custom docker image

Created on 5 Jun 2019 · 4 comments · Source: MicrosoftDocs/azure-docs

I have two steps in my experiment pipeline: a "training and testing" step and a "hosting" step. Both use the same compute target (an Azure Ubuntu VM).

1st step (training and testing):
- Uses the default CPU Docker image
- Uses PythonScriptStep()
- Run completed successfully

2nd step (hosting on TensorFlow Serving):
- Uses my custom Docker image, in which TensorFlow Serving is set up (Docker Hub image: sannzay/tf_serve:version4)
- Uses EstimatorStep()
- Run failed with the following error:

```
"""
Streaming log file azureml-logs/60_control_log.txt
Running: ['/bin/bash', '/tmp/azureml_runs/train-on-amlcompute_1559222999_3f5ec0ca/azureml-setup/docker_env_checker.sh']

Found materialized image on target: sannzay/tf_serve:version4
Starting project file download.
Finished project file download.
Logging experiment running status in history service.
Running: ['sudo', 'docker', 'run', '--name', '00e7cc5e-97f7-4f4b-894c-cf67e4ae4c9b', '--rm', '-v', '/tmp/azureml_runs/00e7cc5e-97f7-4f4b-894c-cf67e4ae4c9b:/azureml-run', '--shm-size', '1g', '-e', 'AZUREML_TARGET_TYPE=remote', '-e', 'EXAMPLE_ENV_VAR=EXAMPLE_VALUE', '-e', 'AZUREML_CONTEXT_MANAGER_OUTPUTCOLLECTION=eyJPdXRwdXRDb2xsZWN0aW9uIjp0cnVlLCJEaXJlY3Rvcmllc1RvV2F0Y2giOlsibG9ncyJdfQ==', '-e', 'AZUREML_CONTEXT_MANAGER_PROJECTPYTHONPATH=bnVsbA==', '-e', 'AZUREML_RUN_TOKEN_EXPIRY=1559895035', '-e', 'AZUREML_RUN_TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6IjI2MkVBOEQzMTNCNkU3ODA2QTU4QUZGNDJGMjFDN0Y4NjlBQzk5M0YiLCJ0eXAiOiJKV1QifQ.eyJyb2xlIjoiQ29udHJpYnV0b3IiLCJzY29wZSI6Ii9zdWJzY3JpcHRpb25zLzAwMjkwNzZjLTA1OTktNDQ3Ni1hYzNlLTQ4NTc4OWQwOTZhZi9yZXNvdXJjZUdyb3Vwcy9uZXR3b3Jrd2F0Y2hlcnJnL3Byb3ZpZGVycy9NaWNyb3NvZnQuTWFjaGluZUxlYXJuaW5nU2VydmljZXMvd29ya3NwYWNlcy9tbHN3IiwiYWNjb3VudGlkIjoiMDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMDAwMDAwMDAwMDAwIiwid29ya3NwYWNlSWQiOiI2NDdmNjZiNy1mYWY4LTQ5MjMtYmFmYS1iYmFjNmY4YjA1ZjkiLCJwcm9qZWN0aWQiOiIwMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAiLCJkaXNjb3ZlcnkiOiJ1cmk6Ly9kaXNjb3Zlcnl1cmkvIiwidGlkIjoiNzJmOTg4YmYtODZmMS00MWFmLTkxYWItMmQ3Y2QwMTFkYjQ3Iiwib2lkIjoiNWQ4MDNkZjMtNzRkOC00OTU3LWJlOTktYTAxODA1MmFlNjkwIiwiZXhwIjoxNTU5ODk1MDM1LCJpc3MiOiJhenVyZW1sIiwiYXVkIjoiYXp1cmVtbCJ9.jteHrctQ3V_OHdIbUNNFPiOO7FlNzvIhjXKwqKmGVbFdtFY3yvCHanvOLzgtcxYXZaBN1P5DWT_1xwVUv7WtdPYDyalm_9NVcJskjGY27AY3B7kCk9FQTMv8iOGF51saMcuuyQQnM8zDfrYRePgW3TXoQDRkcCsOvazqtbCTmbDJr72rQDU-8NbX0kjjG6kdMeRBMwFhUTDk7pwtHASxKbjlTRDRQQE6mmW0Odf1k3y4OFZDzQn9VazFf1XQpI_NmRomcuVbF-Kg7OKq58qXRw-b82A8QvHkwMHgtJzz5kxD8SFsYn36yKVGPiAvWxnfKlmE_KKERTk0mK7kvj3ptg', '-e', 'PYTHONUNBUFFERED=True', '-e', 'AZUREML_COMMUNICATOR=None', '-e', 'AZUREML_FRAMEWORK=Python', '-e', 'AZUREML_ARM_PROJECT_NAME=Hello_World1', '-e', 'AZUREML_ARM_WORKSPACE_NAME=mlsw', '-e', 'AZUREML_ARM_SUBSCRIPTION=0029076c-0599-4476-ac3e-485789d096af', '-e', 'AZUREML_ARM_RESOURCEGROUP=networkwatcherrg', '-e', 
'AZUREML_EXPERIMENT_SCOPE=/subscriptions/0029076c-0599-4476-ac3e-485789d096af/resourceGroups/networkwatcherrg/providers/Microsoft.MachineLearningServices/workspaces/mlsw/experiments/Hello_World1', '-e', 'AZUREML_WORKSPACE_SCOPE=/subscriptions/0029076c-0599-4476-ac3e-485789d096af/resourceGroups/networkwatcherrg/providers/Microsoft.MachineLearningServices/workspaces/mlsw', '-e', 'AZUREML_DISCOVERY_SERVICE_ENDPOINT=https://southeastasia.experiments.azureml.net/discovery', '-e', 'AZUREML_RUN_HISTORY_SERVICE_ENDPOINT=https://southeastasia.experiments.azureml.net', '-e', 'AZUREML_SERVICE_ENDPOINT=https://southeastasia.experiments.azureml.net', '-e', 'AZUREML_RUN_CONFIGURATION=azureml-setup/mutated_run_configuration.json', '-e', 'AZUREML_INSTRUMENTATION_KEY=3f82d0ec-ca37-44ae-857f-5a6aa3088672', '-e', 'AZUREML_DRIVERLOG_PATH=azureml-logs/driver_log.txt', '-e', 'AZUREML_CONTROLLOG_PATH=azureml-logs/control_log.txt', '-e', 'AZUREML_LOGDIRECTORY_PATH=azureml-logs/', '-e', 'AZUREML_PIDFILE_PATH=azureml-setup/pid.txt', '-e', 'AZUREML_RUN_ID=00e7cc5e-97f7-4f4b-894c-cf67e4ae4c9b', 'continuumio/anaconda', '/bin/bash', '-c', 'cd /azureml-run && "python" "azureml-setup/run_script.py" "python" "azureml-setup/context_manager_injector.py" "-i" "ProjectPythonPath:context_managers.ProjectPythonPath" "-i" "OutputCollection:context_managers.RunHistory" "-i" "UserExceptions:context_managers.UserExceptions" "hosting.py"']

################
The python interpreter version you are using is 2.7.16. AzureML requires python version 3.5.2+.For system managed environments, the python version can be configured in the Conda dependencies file.For user managed environments, the latest version of python can be downloaded from:https://www.python.org/downloads/
################

Uploading control log...
"""
```

Both my VM and my custom Docker image use python3 as the default Python, yet the error persists. The first step, which uses the default image, succeeded. The second step, which uses the custom image, failed, even though both steps run on the same compute target. Please help me resolve this issue, and also suggest a better way to host the model.

The code I used to create the 2nd step is:

```python
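from azureml.pipeline.steps import EstimatorStep
from azureml.train.estimator import Estimator

# `compute` is the same compute target (Azure Ubuntu VM) used by the 1st step.
estimator = Estimator(source_directory='.',
                      compute_target=compute,
                      entry_script='hosting.py',
                      node_count=1,
                      process_count_per_node=1,
                      custom_docker_image='sannzay/tf_serve:version4')  # image is pulled from Docker Hub
# Do not let AzureML build a Conda environment; use the Python environment already in the image.
estimator.run_config.environment.python.user_managed_dependencies = True

step2 = EstimatorStep(name="Hosting",
                      estimator=estimator,
                      estimator_entry_script_arguments=[],
                      runconfig_pipeline_params=None,
                      compute_target=compute)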
```
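The log above contains a detail that may be worth checking. In the `docker run` command, the token just before `/bin/bash` (the image that is actually started) is `continuumio/anaconda`, not `sannzay/tf_serve:version4`. That would be consistent with the container reporting Python 2.7.16 even though the custom image defaults to python3.

The sketch below uses only the Python standard library and is written so it also runs under Python 2.7. Both helpers are hypothetical, not part of the AzureML SDK:

- `image_before_command()` pulls the image name out of a logged argument list. Like this log, it assumes the image is the token immediately before the container command.
- `meets_azureml_minimum()` compares an interpreter version against the 3.5.2 minimum quoted in the error message.

```python
import ast
import sys

# Minimum interpreter version quoted in the AzureML error message ("3.5.2+").
AZUREML_MIN_PYTHON = (3, 5, 2)


def meets_azureml_minimum(version=None):
    """Return True if `version` (default: the running interpreter) is at least 3.5.2."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version[:3]) >= AZUREML_MIN_PYTHON


def image_before_command(logged_argv, command="/bin/bash"):
    """Return the image from a logged `docker run` argument list.

    `logged_argv` is the list as printed in the control log, e.g.
    "['sudo', 'docker', 'run', ..., 'continuumio/anaconda', '/bin/bash', ...]".
    Assumes the image is the token immediately before the container command.
    """
    argv = ast.literal_eval(logged_argv)
    position = argv.index(command)  # raises ValueError if the command is absent
    if position == 0:
        raise ValueError("no image token precedes %r" % command)
    return argv[position - 1]


if __name__ == "__main__":
    # Run inside the container (e.g. from hosting.py) to log which interpreter the step really uses.
    status = "meets AzureML minimum" if meets_azureml_minimum() else "too old for AzureML"
    sys.stdout.write("%s %s: %s\n" % (sys.executable, sys.version.split()[0], status))
```

To use it, pass the bracketed list from the `Running: ['sudo', 'docker', 'run', ...]` line (without the `Running: ` prefix) to `image_before_command()`. For this log it returns `'continuumio/anaconda'`.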
The Azure docs I followed to run my experiments are:

For pipelines: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb

For estimators: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-ml-models#distributed-training-and-custom-docker-images

For EstimatorStep: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb


All 4 comments

@sannzay Thank you for the feedback, we will investigate it and get back to you soon.

Hi @YutongTie-MSFT, thanks for looking into my issue. Have you gained any insight into the problem? While you investigate, could you suggest any earlier steps I should double-check in case I did something wrong?

@sannzay Sorry for the late response. I was unable to reproduce your error. Could you please send us an email at [email protected] with your Azure subscription ID and the URL of this thread? We'd like to discuss it offline so we can help you more directly. Thank you.
We will now close this thread, since we have not found anything wrong with the document. If you have further questions about this matter, please reply here and tag @YutongTie-MSFT, and we will gladly continue the discussion.

Could you please track issues publicly? This would help other customers who run into the same issue.
