Azure-docs: azureml-sdk 1.0.43 questions

Created on 14 Jun 2019  Â·  14Comments  Â·  Source: MicrosoftDocs/azure-docs

reading over the 1.0.43 release notes our team has a few questions:

  1. what advantages does a SKLearn estimator have over a regular Estimator? we currently have a sklearn based training but we just use Estimator and send it to hyperdrive.
  2. what is the use case for a ModuleStep? does Module = Python module, as in a way to package and make available functions across multiple steps?
  3. does the below mean that we won't have to do the os.makedir('blah', exist_ok=True) anymore?
    >Moved outputs directory creation and outputs directory upload out of the user process.
  4. hash_paths param being deprecated, in favor of hashing the entire source_directory.

    1. imagine a pipeline with 3 PythonScriptSteps, stepA, stepB, and stepC that:



      1. run A-C sequentially,


      2. has corresponding scripts all live in the same source directory.



    2. Will all three steps re-run if I:



      1. re-submit the pipeline,


      2. have allow_reuse on all three steps, and


      3. make 1 change to the script of stepC???


        cc: @j-martens @sanpil @jpe316




Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

assigned-to-author corsubsvc machine-learninsvc product-question triaged

All 14 comments

@swanderz Thanks for the feedback. We are investigating the issue and will update you shortly.

SKLearn estimator uses pre-built image, which speeds up first job start time, it goes through end-to-end tests, so less likely to break if there are changes upstream. you can set framework_version to lock the version to the tested one and make your job reproducible.

Thanks for the feedback @swanderz . I'll work on tracking down the answers with the engineering team.

@maxlux, ok definitely sounds helpful, though I'd like to know more.

  • pre-build image == Docker image?? which one?
  • what kind of end-to-end tests are run?
  • what is meant by "framework"?

Azure ML create docker image every time is runs the job on Azure ML Compute. In case of using estimators, prebuilt docker image will be used.
End-to-end tests for SKLearn - training model, etc. Not sure what is the full set.
framework = SKLearn|PyTorch|TF. In this case framework_version would be SKLearn package version.

Hi @swanderz, did you get the answers as pertains to the docs?

@j-martens in addition to @maxluk 's help, @yanrez let me know the following:

We need to release more changes before those classes become actually useful for people
It was prematurely added into 43 release notes, we are not mentioning it in new release notes, as even new upcoming sdk drop doesn't have enough changes
What we should say at this point is that we are doing more changes in modulestep incoming few releases to make it useful for customers.

Outside of SKlearnStep and ModuleStep, I'm still looking for answers to questions 3 and 4 above. Maybe @sanpil can help?

Hi!
4-we recommend to user different folders for different steps. Main reasoning was previous approach was confusing many users, and we decided to simplify it. Now it's more straightforward in my opinion, but to fully benefit from reuse you need to put training scripts into standalone folders

@wbaumann-microsoft Do you know answer for the item 3 ?

@wbaumann-microsoft Do you know answer for the item 3 ? (Ping)

The only question pending is 3. This should be assigned to @wbaumann-microsoft.

I don't believe that change will have any impact on what folders users need to create -- it was just adjusting what processes the system used for various setup tasks to behave well in multi-process scenarios.

@wbaumann-microsoft I'll take that as an answer. We can now close this month old issue. See you all on the next release notes related issue! 🤣

Was this page helpful?
0 / 5 - 0 ratings