MLflow: Integration with Automated Machine Learning (AutoML)

Created on 15 Oct 2018 · 3 Comments · Source: mlflow/mlflow

Goals

As a machine learning developer, I want to integrate automated machine learning (AutoML) tools or libraries with MLflow, so that I can save time in exploring machine learning and deep learning algorithms.

As a data scientist, I want to leverage AutoML, so that I can quickly explore hundreds or thousands of machine learning models.

Consider The Following AutoML Libraries

  • EpistasisLab's tpot (4,713 stars as of Oct 15)
  • jhfjhfj1's autokeras (3,383 stars as of Oct 15)
  • automl's auto-sklearn (2,629 stars as of Oct 15)
  • hyperopt's hyperopt (2,605 stars as of Oct 15)
  • reiinakano's xcessiv (1,102 stars as of Oct 15)
  • ClimbsRocks's auto_ml (1,013 stars as of Oct 15)

Inspiration for AutoML

Airbnb

Airbnb uses Automated Machine Learning (AML) to accomplish the following:

  1. Benchmarking. Unbiased presentation of challenger models: AML can quickly present a plethora of challenger models using the same training set as your incumbent model. This can aid the data scientist in choosing the best model family.

  2. Diagnostics And Exploration. Detecting Target Leakage: because AML builds candidate models extremely fast in an automated way, we can detect data leakage earlier in the modeling lifecycle. Canonical diagnostics can be automatically generated such as learning curves, partial dependence plots, feature importances, etc.

  3. Automation. Tasks like exploratory data analysis, pre-processing of data, hyper-parameter tuning, model selection and putting models into production can be automated to some extent with an Automated Machine Learning framework.

DataRobot

DataRobot uses AutoML in their machine learning process.

[Figure: traditional modeling process compared with automated machine learning (image not preserved)]

When developing a model with the traditional process, as the figure above shows, the only automatic task is model training. Automated machine learning software automatically executes all the steps outlined in red: the manual, tedious modeling tasks that used to require skilled data scientists. The traditional process often takes weeks or months, but with automated machine learning it takes days at most for business professionals and data scientists to develop and compare dozens of models, find insights and predictions, and solve more business problems much faster.

For additional inspiration, check out the following three GitHub topics: automated-machine-learning, auto-ml, automl.

All 3 comments

Thanks for raising this. Continuing to improve support for using hyperparam optimization libs is high on the roadmap priorities.

In 0.8.0, which was released 11 days ago, the UI was improved to better support visualizing runs from hyperparam searches and multi-step workflows.

Quote from the release notes:

Runs that are "nested" inside other runs (e.g., as part of a hyperparameter search or multistep workflow) now show up grouped by their parent run, and can be expanded or collapsed altogether. Runs can be nested by calling mlflow.start_run or mlflow.run while already within a run.

Also, have you seen this example code demonstrating MLflow with hyperparam search techniques and libs (random search, hyperopt, GPyOpt)? https://github.com/mlflow/mlflow/tree/master/examples/hyperparam
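To make the nested-run idea concrete, here is a minimal sketch of a random search whose trials are logged as child runs under one parent run. It is not taken from the linked examples. The `objective` function, the parameter names `lr` and `momentum`, and the helper names `random_search` and `log_trials_to_mlflow` are all hypothetical and exist only for illustration. The sketch assumes MLflow is installed and passes `nested=True` to `mlflow.start_run` to request a child run.

```python
import random


def objective(lr, momentum):
    # Hypothetical stand-in for a validation loss; a real search would
    # train and evaluate a model here.
    return (lr - 0.1) ** 2 + (momentum - 0.9) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters uniformly and return one record per trial."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {
            "lr": rng.uniform(0.0, 0.5),
            "momentum": rng.uniform(0.0, 1.0),
        }
        trials.append({"params": params, "loss": objective(**params)})
    return trials


def log_trials_to_mlflow(trials):
    """Log each trial as a child run nested under a single parent run."""
    # Imported lazily so the search itself runs without MLflow installed.
    import mlflow

    with mlflow.start_run():  # parent run covering the whole search
        for trial in trials:
            # Child run: grouped under the parent run in the MLflow UI.
            with mlflow.start_run(nested=True):
                for name, value in trial["params"].items():
                    mlflow.log_param(name, value)
                mlflow.log_metric("loss", trial["loss"])
        # Record the best result on the parent run for quick comparison.
        best = min(trials, key=lambda t: t["loss"])
        mlflow.log_metric("best_loss", best["loss"])
```

Calling `log_trials_to_mlflow(random_search(n_trials=10))` would then produce one parent run with ten child runs. Keeping the search separate from the logging means an AutoML library such as hyperopt could replace `random_search` without touching the tracking code.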

I would love to know more specifically what shape folks in the community would like further investment in AutoML to take. More UI improvements? More examples or docs? New APIs or commands supported by the CLI? Also, please give examples of proposed APIs, etc.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has not been interacted with for 60 days since it was marked as stale! As such, the issue is now closed. If you have this issue or more information, please re-open the issue!
