MachineLearningNotebooks: Cannot update PythonScriptStep arguments after initial definition

Created on 8 Apr 2019 · 5 Comments · Source: Azure/MachineLearningNotebooks

Repro Steps

Create a PythonScriptStep defined like:

get_hyperdrive_metrics_step = PythonScriptStep(
    name='get_hyperdrive_metrics',
    script_name='get_metrics.py',
    arguments=['--input_dir', hyperdrive_json,
               '--output_dir', hyperdrive_metrics
              ],
    compute_target=compute_target,
    inputs=[hyperdrive_json],
    outputs=[hyperdrive_metrics],
    runconfig=batchai_run_config,
    source_directory=os.path.join(os.getcwd(), 'compute'),
    allow_reuse=False
)

Change get_metrics.py's arg from --input_dir to --input_file and change PythonScriptStep definition accordingly to:

get_hyperdrive_metrics_step = PythonScriptStep(
    name='get_hyperdrive_metrics',
    script_name='get_metrics.py',
    arguments=['--input_file', hyperdrive_json,
               '--output_dir', hyperdrive_metrics
              ],
    compute_target=compute_target,
    inputs=[hyperdrive_json],
    outputs=[hyperdrive_metrics],
    runconfig=batchai_run_config,
    source_directory=os.path.join(os.getcwd(), 'compute'),
    allow_reuse=False
)

Submit the step as part of a pipeline.

Error

Run ID: `7e79b9be-a181-4737-8921-ec865ce9408e`'s `80_driver_log.txt`

For debugging's sake I'm printing the parsed arguments, and you can see that input_dir is still being passed even after I changed the arguments parameter of get_hyperdrive_metrics_step:

all args:  Namespace(input_dir='/mnt/batch/tasks/shared/LS_root/jobs/avadevitsmlsvc/azureml/7e79b9be-a181-4737-8921-ec865ce9408e/mounts/workspaceblobstore/azureml/b4eac43d-8d82-4ff5-ad75-227c7598b87a/hyperdrive_json', output_dir='/mnt/batch/tasks/shared/LS_root/jobs/avadevitsmlsvc/azureml/7e79b9be-a181-4737-8921-ec865ce9408e/mounts/workspaceblobstore/azureml/7e79b9be-a181-4737-8921-ec865ce9408e/hyperdrive_metrics')
cwd: /mnt/batch/tasks/shared/LS_root/jobs/avadevitsmlsvc/azureml/7e79b9be-a181-4737-8921-ec865ce9408e/mounts/azureml_project_share/azureml/7e79b9be-a181-4737-8921-ec865ce9408e
dir of cwd ['.amlignore', 'aml_config', 'assets', 'azureml-logs', 'azureml-setup', 'batch_scoring.py', 'get_data.py', 'get_metrics.py', 'join.py', 'munge_absence.py', 'munge_headcount.py', 'munge_leaver.py', 'munge_productivity.py', 'munge_promo.py', 'munge_roster.py', 'munge_time.py', 'munge_travel.py', 'outputs', 'test.json', 'train.py', 'visualize.py', '__init__.py']


The experiment failed. Finalizing run...
Logging experiment finalizing status in history service
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.10079622268676758 seconds
Traceback (most recent call last):
  File "azureml-setup/context_manager_injector.py", line 161, in <module>
    execute_with_context(cm_objects, options.invocation)
  File "azureml-setup/context_manager_injector.py", line 90, in execute_with_context
    runpy.run_path(sys.argv[0], globals(), run_name="__main__")
  File "/azureml-envs/azureml_2e164768cafa8a117d6ceb2f5ad291b7/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/azureml-envs/azureml_2e164768cafa8a117d6ceb2f5ad291b7/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/azureml-envs/azureml_2e164768cafa8a117d6ceb2f5ad291b7/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "get_metrics.py", line 17, in <module>
    parent = os.path.dirname(args.input_file)
AttributeError: 'Namespace' object has no attribute 'input_file'
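
When a log shows a Namespace with an unexpected attribute name, it can help to print the raw argument list next to the attribute names the parser actually produced. With parse_args(), argparse exits on flags it does not recognise, so a Namespace that contains input_dir implies the parser itself defines that destination. The following is a minimal sketch under that assumption; the helper name parse_and_report and the sample paths are hypothetical and are not taken from get_metrics.py.

```python
import argparse


def parse_and_report(argv):
    """Parse argv and print both the raw flags and the resulting attribute names."""
    parser = argparse.ArgumentParser()
    # Hypothetical parser: the flag names mirror the updated step definition.
    parser.add_argument("--input_file")
    parser.add_argument("--output_dir")
    args = parser.parse_args(argv)

    # Raw flags exactly as the pipeline passed them to the script.
    print("raw argv:", argv)
    # Attribute names that exist on the Namespace after parsing.
    print("parsed attributes:", sorted(vars(args)))
    return args


# Sample values only; in a real step these would come from sys.argv[1:].
args = parse_and_report(["--input_file", "data/hyperdrive.json",
                         "--output_dir", "out"])
```

If the raw flags show --input_file while the parsed attributes show input_dir, the renaming happens inside the script's parser rather than in the step definition.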

Attempts at resolution (all failed)

creating a new step get_hyperdrive_metrics_step2 and submitting the step within a pipeline
creating a copy of get_metrics.py, get_metrics2.py, and passing that to the PythonScriptStep
restarting Jupyter kernel
creating a new Experiment

Source

swanderz

All 5 comments

We are unable to repro. Would you mind emailing me a very simple sample? [email address redacted]

sanpil on 9 Apr 2019

The error, _AttributeError: 'Namespace' object has no attribute 'input_file'_,
suggests the script is reading an attribute named 'input_file' that argparse never set on the Namespace.
Would you confirm the dest parameter on add_argument() for the expected parameter?
I suspect: parser.add_argument("--input_file", ..., dest="input_dir")

sonnypark on 9 Apr 2019

🚀1
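
The dest behaviour suspected above can be reproduced with the standard library alone. This is a minimal sketch, not the actual get_metrics.py (which is not shown in this thread): the parser definition and the sample paths are assumptions made for illustration.

```python
import argparse

# Hypothetical reconstruction of the suspected parser setup.
parser = argparse.ArgumentParser()
# The flag is named --input_file, but dest stores its value under "input_dir".
parser.add_argument("--input_file", dest="input_dir")
parser.add_argument("--output_dir")

# Sample paths only; the real step passes mounted datastore paths.
args = parser.parse_args(["--input_file", "/tmp/hyperdrive_json",
                          "--output_dir", "/tmp/hyperdrive_metrics"])

print(args.input_dir)               # value supplied via --input_file
print(hasattr(args, "input_file"))  # False: no attribute with the flag's name

try:
    args.input_file
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

Under this assumption the step definition is passing the new flag correctly, and the fix belongs in the script: either drop dest so the attribute follows the flag name, or read args.input_dir.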

I could reproduce the issue:
File ".\test.py", line 13, in main
    parent = os.path.dirname(args.input_file)
AttributeError: 'list' object has no attribute 'input_file'

I created a sample Python script, test.py, and tried accessing input_file.
In your case, the issue is with get_metrics.py. Please make sure you are accessing input_file in the right way.

manjumegh on 9 Apr 2019

🚀1

@swanderz please see @sonnypark's suggestion. With that configuration, @manjumegh was able to reproduce the exact error you were seeing. Thx.

sanpil on 9 Apr 2019

🚀1

@manjumegh @sonnypark @sanpil

Thank you guys so much for the fix. Part of me feels bad for throwing this your way, but after hours of trying to debug it myself, I'm glad I reached out. Cheers

swanderz on 10 Apr 2019

👍2
