Mlflow: [BUG] Failed to load registered model from the FTP store

Created on 30 Jul 2020 · 4 Comments · Source: mlflow/mlflow

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

[ ] Yes. I can contribute a fix for this bug independently.
[x] Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
[ ] No. I cannot contribute a bug fix at this time.

System information

I have written a small example that should be able to replicate the problem. The structure was taken from issue 3197
Ubuntu 18.04
MLflow binary
MLflow version: 1.10.0
Python version: 3.8

Describe the problem

Loading a model with `model = mlflow.sklearn.load_model('models:/' + 'model-reg' + '/Staging')` should return the model.
Instead, it throws an error.

```shell script
mlflow version: 1.10.0
tracking URI: http://XXX.XXX.XXX.XXX:5000/
artifact URI: ftp://mlflow:mlflow@XXX.XXX.XXX.XXX:2222/mlruns/721c37f3c87e4acdaea15b28029e4df8/artifacts
```

```shell script
Registered model 'model-reg' already exists. Creating a new version of this model...
Created version '6' of model 'model-reg'.
Traceback (most recent call last):
  File "/home/monitoring/Projekt/MethodClasses/test/method_test/basic_ml_flow_test.py", line 52, in <module>
    test()
  File "/home/monitoring/Projekt/MethodClasses/test/method_test/basic_ml_flow_test.py", line 45, in test
    model = mlflow.sklearn.load_model('models:/' + 'model-reg' + '/Staging')
  File "/home/monitoring/.local/share/virtualenvs/MethodClasses-YeH10dls/lib/python3.8/site-packages/mlflow/sklearn.py", line 334, in load_model
    flavor_conf = _get_flavor_configuration(model_path=local_model_path, flavor_name=FLAVOR_NAME)
  File "/home/monitoring/.local/share/virtualenvs/MethodClasses-YeH10dls/lib/python3.8/site-packages/mlflow/utils/model_utils.py", line 23, in _get_flavor_configuration
    raise MlflowException(
mlflow.exceptions.MlflowException: Could not find an "MLmodel" configuration file at "/tmp/tmp42yolmhz/"

Process finished with exit code 1
```

The model was registered beforehand and shows up in the web UI. It is downloaded correctly to a path under `/tmp/`.
The method in `mlflow.store.artifact.artifact_repo` returns the correct path to the files, e.g. `/tmp/tmp42yolmhz/model`. I manually checked that the files `MLmodel`, `conda.yml`, and `model.pkl` are inside the copied local directory `/tmp/tmp42yolmhz/model`.

In `mlflow.utils.model_utils`, the function `_get_flavor_configuration` throws an exception because it expects `model_path` to contain the files `MLmodel`, `conda.yml`, and `model.pkl`. However, `model_path` is `/tmp/tmp42yolmhz/`, which contains only `model/`, which in turn contains `MLmodel`, `conda.yml`, and `model.pkl`.
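
To make the mismatch concrete, here is a minimal sketch that rebuilds the reported directory layout in a temporary folder and checks for `MLmodel` at both levels. It uses only the standard library and is not MLflow's actual code: `has_mlmodel` and `find_model_dirs` are hypothetical helpers introduced for illustration, and the file names are the ones listed above.

```python
import os
import tempfile

MLMODEL_FILE = "MLmodel"  # config file name quoted in the error message


def has_mlmodel(path):
    """Return True if `path` directly contains an MLmodel file."""
    return os.path.isfile(os.path.join(path, MLMODEL_FILE))


def find_model_dirs(root):
    """Hypothetical helper: list every directory under `root` holding an MLmodel file."""
    return [d for d, _subdirs, files in os.walk(root) if MLMODEL_FILE in files]


with tempfile.TemporaryDirectory() as download_root:
    # Recreate the reported layout: <download_root>/model/{MLmodel, conda.yml, model.pkl}
    model_dir = os.path.join(download_root, "model")
    os.makedirs(model_dir)
    for name in (MLMODEL_FILE, "conda.yml", "model.pkl"):
        open(os.path.join(model_dir, name), "w").close()

    root_ok = has_mlmodel(download_root)  # the level that was checked
    nested_ok = has_mlmodel(model_dir)    # the level that actually holds the files
    found = [os.path.relpath(d, download_root) for d in find_model_dirs(download_root)]

print(root_ok, nested_ok, found)  # False True ['model']
```

Running a helper like `find_model_dirs` against the real download directory (here `/tmp/tmp42yolmhz/`) is a quick way to confirm which level the files ended up on.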

Code to reproduce issue

Hopefully this is a reproducible test case with the bare minimum necessary to replicate the problem.

Modified example from issue 3197.

FTP-Server:
```shell script
sudo docker run -d --name ftpd_server -p 2222:21 -p 30000-30009:30000-30009 -e "PUBLICHOST=XXX.XXX.XXX.XXX" -e FTP_USER_NAME=mlflow -e FTP_USER_PASS=mlflow -e FTP_USER_HOME=/home/mlflow stilliard/pure-ftpd
```

```python
import pandas as pd

import string
import random

import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

from sklearn.preprocessing import StandardScaler


def test():
    mlflow.set_tracking_uri('http://XXX.XXX.XXX.XXX:5000/')

    def get_random_string(length):
        letters = string.ascii_lowercase
        return ''.join(random.choice(letters) for i in range(length))

    expr_name = get_random_string(32)
    artifact_location = "ftp://mlflow:mlflow@XXX.XXX.XXX.XXX:2222/mlruns"

    if mlflow.get_experiment_by_name(expr_name) is None:
        mlflow.create_experiment(expr_name, artifact_location)

    mlflow.set_experiment(expr_name)

    training_input = pd.DataFrame({'1': [1, 2, 3], '2': [4, 5, 6]})
    predict_input = pd.DataFrame({'1': [10, 20, 30], '2': [40, 50, 60]})

    with mlflow.start_run():
        print('mlflow version:', mlflow.__version__)
        print('tracking URI:', mlflow.get_tracking_uri())
        print('artifact URI:', mlflow.get_artifact_uri())

        scaler = StandardScaler()
        output = scaler.fit_transform(training_input)

        mlflow.sklearn.log_model(scaler, artifact_path='model',
                                 registered_model_name='model-reg')
        client = MlflowClient()
        version = client.get_registered_model('model-reg').latest_versions[0].version
        client.transition_model_version_stage('model-reg', version, 'Staging')

        model = mlflow.sklearn.load_model('models:/' + 'model-reg' + '/Staging')

        print(model)
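        # StandardScaler has no predict(); transform() applies the fitted scaling
        print(model.transform(predict_input))


if __name__ == '__main__':
    test()
```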

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

If any information is missing, I can add it here.

What component(s), interfaces, languages, and integrations does this bug affect?

Components

[x] area/artifacts: Artifact stores and artifact logging
[ ] area/build: Build and test infrastructure for MLflow
[ ] area/docs: MLflow documentation pages
[ ] area/examples: Example code
[ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[x] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/projects: MLproject format, project running backends
[ ] area/scoring: Local serving, model deployment tools, spark UDFs
[ ] area/server-infra: MLflow server, JavaScript dev server
[ ] area/tracking: Tracking Service, tracking client APIs, autologging

Interface

[ ] area/uiux: Front-end, user experience, JavaScript, plotting
[ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

Language

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

Integrations

[ ] integrations/azure: Azure and Azure ML integrations
[ ] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

area/artifacts area/models bug priority/important-soon

Source

Nielsmitie

All 4 comments

@Nielsmitie Thanks for reporting this issue. I tweaked the code to fix #2641 and ran your code, and it worked. I'll make a PR to fix #2641, which should fix this issue as well.

harupy on 30 Jul 2020

👍2

@Nielsmitie I opened https://github.com/mlflow/mlflow/pull/3204 to fix #2641.

harupy on 31 Jul 2020

Thank you for your time and support. And fixing the problem :-)

Nielsmitie on 31 Jul 2020

Manually verified #3204 also fixed this issue.

harupy on 5 Aug 2020
