The short story: when I try to submit an Azure ML pipeline run (an Azure ML pipeline, not an Azure pipeline) from a Jupyter notebook, I get PermissionError: [Errno 13] Permission denied: '.\NTUSER.DAT'. More details:
Relevant code:
import logging

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.runtime import AutoMLStep

# ws, compute_target, conda_run_config and financeforecast_dataset
# are defined in preceding notebook cells (not shown).
automl_settings = {
    "iteration_timeout_minutes": 20,
    "experiment_timeout_minutes": 30,
    "n_cross_validations": 3,
    "primary_metric": 'r2_score',
    "preprocess": True,
    "max_concurrent_iterations": 3,
    "max_cores_per_iteration": -1,
    "verbosity": logging.INFO,
    "enable_early_stopping": True,
    'time_column_name': "DateTime"
}
automl_config = AutoMLConfig(task = 'forecasting',
                             debug_log = 'automl_errors.log',
                             path = ".",
                             compute_target=compute_target,
                             run_configuration=conda_run_config,
                             training_data = financeforecast_dataset,
                             label_column_name = 'TotalUSD',
                             **automl_settings
                             )
automl_step = AutoMLStep(
    name='automl_module',
    automl_config=automl_config,
    allow_reuse=False)
training_pipeline = Pipeline(
    description="training_pipeline",
    workspace=ws,
    steps=[automl_step])
training_pipeline_run = Experiment(ws, 'test').submit(training_pipeline)
The submit call runs for about 20 seconds, and then I get a long traceback ending in:
~\AppData\Local\Continuum\anaconda2\envs\forecasting\lib\site-packages\azureml\pipeline\core\_module_builder.py in _hash_from_file_paths(hash_src)
    100     hasher = hashlib.md5()
    101     for f in hash_src:
--> 102         with open(str(f), 'rb') as afile:
    103             buf = afile.read()
    104             hasher.update(buf)

PermissionError: [Errno 13] Permission denied: '.\\NTUSER.DAT'
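The traceback shows the failure happening while files are opened in binary mode for hashing. A minimal sketch, assuming you want to find out in advance which files under a candidate source directory cannot be opened: `find_unreadable_files` is a hypothetical helper written for this thread, not part of the Azure ML SDK, and it only imitates the `open(..., 'rb')` call seen above.

```python
import os


def find_unreadable_files(source_dir):
    """Return (path, error) pairs for files under source_dir that cannot be opened.

    Hypothetical diagnostic helper; it is not the SDK's own hashing code.
    """
    unreadable = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                # Same kind of open() call that fails in the traceback.
                with open(path, "rb") as handle:
                    handle.read(1)
            except OSError as exc:  # PermissionError is a subclass of OSError
                unreadable.append((path, exc))
    return unreadable
```

Calling `find_unreadable_files(".")` from the notebook's working directory and printing the result should list any file, such as NTUSER.DAT, that a snapshot of that directory would probably trip over.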
According to Azure's docs on this topic, submitting a pipeline uploads a "snapshot" of the "source directory" you specified. Initially I hadn't specified a source directory, so, to test that, I added:
default_source_directory="testing",
as a parameter of the training_pipeline object, but saw the same behavior when I ran it again. I'm not sure whether that is the same source directory the documentation refers to. The docs also say that if no source directory is specified, the "current local directory" is uploaded. I used print(os.getcwd()) to find the working directory and gave "Everyone" full-control permissions on it (I'm working in a Windows environment).
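NTUSER.DAT is the per-user registry hive that sits at the root of a Windows user profile, and Windows keeps it open while the user is logged on, so changing folder permissions is unlikely to make it readable. A small sketch, assuming the relative path '.' in the error refers to the notebook's working directory: `contains_user_hive` is a hypothetical helper that reports whether a directory holds that file.

```python
import os


def contains_user_hive(directory="."):
    """Return True if directory contains an NTUSER.DAT file (hypothetical helper)."""
    resolved = os.path.abspath(directory)
    return os.path.isfile(os.path.join(resolved, "NTUSER.DAT"))
```

If `contains_user_hive(os.getcwd())` returns True, the working directory is most likely the profile folder itself, which is a poor candidate for a snapshot.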
All the preceding code works fine, and I can submit an experiment if I use a ScriptRunConfig and run it on attached compute rather than using a pipeline/training cluster.
Any ideas? Thanks in advance to anyone who tries to help.
@casieo What is your compute target? Is it AML Compute? What does your source directory contain? It's recommended that you keep a separate folder as the step's source directory, containing only the files that step needs (such as scripts).
Can you specify the source directory in the path argument of AutoMLConfig instead of using path = "."?
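The recommendation above can be sketched as follows, assuming a single script is all the step needs. `prepare_step_directory` and the folder names in the usage note are hypothetical and only illustrate the idea of an isolated source folder; they are not Azure ML SDK functions.

```python
import os
import shutil


def prepare_step_directory(script_path, step_dir):
    """Copy one script into a dedicated folder and return the folder's absolute path.

    Hypothetical helper: the folder is meant to hold only what the step needs.
    """
    os.makedirs(step_dir, exist_ok=True)
    shutil.copy2(script_path, step_dir)
    return os.path.abspath(step_dir)
```

For example, `prepare_step_directory("script.py", "automl_step_src")` returns a folder path that could then be passed as the path argument of AutoMLConfig in place of ".".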
@sanpil That was it, thank you. After posting this question but before hearing from you, I had tried specifying the script location in the AutoMLConfig object by adding the data_script line below, but that did not work.
automl_config = AutoMLConfig(task = 'forecasting',
                             debug_log = 'automl_errors.log',
                             path = ".",
                             data_script = r"c:\users\me\script.py",
                             compute_target=compute_target,
                             run_configuration=conda_run_config,
                             training_data = financeforecast_dataset,
                             label_column_name = 'TotalUSD',
                             **automl_settings
                             )
Here is the config that finally worked:
automl_config = AutoMLConfig(task = 'forecasting',
                             debug_log = 'automl_errors.log',
                             compute_target=compute_target,
                             run_configuration=conda_run_config,
                             path = r"c:\users\me",
                             data_script = "script.py",
                             #training_data = financeforecast_dataset,
                             #label_column_name = 'TotalUSD',
                             **automl_settings
                             )
@casieo We will now proceed to close this thread. If you have further questions on this matter, please reply here and tag @YutongTie-MSFT, and we will gladly continue the discussion.