Hi, I am wondering whether I have missed something when logging a model with a signature and input example. Can anyone help me, please? Thank you!
My tracking-uri and backend-store-uri both point to the same MySQL database URI,
and default-artifact-root is ./mlruns
I am trying to train an ALS model using the pyspark2 ml package and log it with mlflow,
but on the mlflow ui http://
I also don't see any difference between the model I logged without a signature and the one I logged with a signature when going through "train -> models serve -> send data in a JSON file and get a prediction". Could someone tell me how I can see the difference?
The model signature and input_example information should be there, as it is for models of the sklearn flavor.
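
One way to check whether a logged model actually carries this metadata is to look at the MLmodel file in the model's artifact directory: as far as I know, a model logged with a signature and input example gets top-level `signature` and `saved_input_example_info` entries there. Below is a minimal sketch of such a check, not an official MLflow utility. The helper names and the sample file contents are made up for illustration, and the scan is a simplified top-level-key lookup rather than a real YAML parser.

```python
from pathlib import Path
import tempfile


def mlmodel_top_level_keys(mlmodel_path):
    """Return the set of top-level keys found in an MLmodel (YAML) file.

    Simplified scan: any unindented `key:` line counts as a top-level key.
    This is not a full YAML parser.
    """
    keys = set()
    for line in Path(mlmodel_path).read_text().splitlines():
        # Skip blank lines, indented (nested) lines, comments, and list items.
        if not line or line[0] in " \t#-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return keys


def describe_logged_metadata(mlmodel_path):
    """Report whether signature / input example entries are present."""
    keys = mlmodel_top_level_keys(mlmodel_path)
    return {
        "has_signature": "signature" in keys,
        "has_input_example": "saved_input_example_info" in keys,
    }


# Illustrative MLmodel contents (hypothetical, not real output from a run).
SAMPLE_WITHOUT = """\
artifact_path: small_ALS
flavors:
  spark:
    model_data: sparkml
"""
SAMPLE_WITH = SAMPLE_WITHOUT + """\
saved_input_example_info:
  artifact_path: input_example.json
signature:
  inputs: '[{"name": "user", "type": "long"}, {"name": "item", "type": "long"}]'
"""

with tempfile.TemporaryDirectory() as tmp:
    without_path = Path(tmp) / "MLmodel_without"
    with_path = Path(tmp) / "MLmodel_with"
    without_path.write_text(SAMPLE_WITHOUT)
    with_path.write_text(SAMPLE_WITH)
    result_without = describe_logged_metadata(without_path)
    result_with = describe_logged_metadata(with_path)

print(result_without)
print(result_with)
```

Pointing `describe_logged_metadata` at the MLmodel file of a real run (somewhere under the ./mlruns artifact root) should show whether the signature was stored for that run.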



@Astonzzh Thanks for filing this issue. I was able to reproduce the issue.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
import mlflow
import mlflow.spark
from mlflow.models.signature import infer_signature
from pyspark.ml import Pipeline

spark = (
    SparkSession
    .builder
    .getOrCreate()
)

# prepare train and test data
train = spark.createDataFrame(
    [(0, 0, 4.0),
     (0, 1, 2.0),
     (1, 1, 3.0)],
    ["user", "item", "rating"],
)
test = spark.createDataFrame(
    [(0, 2), (1, 0), (2, 0)],
    ["user", "item"]
)

# train model
als = ALS(rank=10, maxIter=10, regParam=0.1, userCol="user", itemCol="item", ratingCol="rating")
pipeline = Pipeline(stages=[als])
model = pipeline.fit(train)

# create signature and example
signature = infer_signature(test, model.transform(test))
example_dict = {'user': 0, 'item': 1}

# log model
mlflow.spark.log_model(
    model,
    'small_ALS',
    signature=signature,
    input_example=example_dict
)
I found that signature and input_example are not passed to Model.log:
https://github.com/mlflow/mlflow/blob/c2708b5735354e10ec052164253eb74243c5603f/mlflow/spark.py#L174-L177
I have confirmed that passing the missing input_example and signature arguments solves this issue.
 if is_local_uri(run_root_artifact_uri):
     return Model.log(artifact_path=artifact_path, flavor=mlflow.spark, spark_model=spark_model,
                      conda_env=conda_env, dfs_tmpdir=dfs_tmpdir, sample_input=sample_input,
                      registered_model_name=registered_model_name,
+                     signature=signature,
+                     input_example=input_example)
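
To illustrate why this went unnoticed, here is a toy sketch of the pattern: a wrapper that accepts `signature` and `input_example` but never forwards them raises no error, so the metadata is silently dropped. The functions below (`model_log`, `log_model_buggy`, `log_model_fixed`) are hypothetical stand-ins written for this example, not MLflow's actual code.

```python
def model_log(artifact_path, signature=None, input_example=None, **flavor_kwargs):
    """Hypothetical stand-in for Model.log: returns the metadata it received."""
    return {
        "artifact_path": artifact_path,
        "signature": signature,
        "input_example": input_example,
    }


def log_model_buggy(artifact_path, signature=None, input_example=None):
    # Accepts the arguments but does not forward them, so both end up as None.
    return model_log(artifact_path=artifact_path)


def log_model_fixed(artifact_path, signature=None, input_example=None):
    # Forwards both arguments, mirroring the two added lines in the diff above.
    return model_log(
        artifact_path=artifact_path,
        signature=signature,
        input_example=input_example,
    )


example = {"user": 0, "item": 1}
buggy = log_model_buggy("small_ALS", signature="sig", input_example=example)
fixed = log_model_fixed("small_ALS", signature="sig", input_example=example)

print(buggy["signature"], buggy["input_example"])
print(fixed["signature"], fixed["input_example"])
```

The buggy variant runs without complaint, which matches what was reported: logging succeeds, but the signature and input example never reach the stored model.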


Cool! Thank you!