Hi, I have a question about using multiple models in TensorFlow Serving. The scenario: I have three models, A, B, and C, and I want inference to run serially, A -> B -> C. The output of model A is the input to model B, and the output of model B is the input to model C. Is there a way to do this? Thanks.
You may have crossed over from the experimental to the serious architecture zone.
Fortunately over the last ten years many common system design problems have been solved often enough to be distilled down to generalized patterns and can be found prepackaged in a box, ready to use.
You may want to use something like Celery: http://www.celeryproject.org
Queues and workers are generally one of the best ways to run and combine batch processes in scalable architectures. Celery may be fairly approachable to implement.
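To make the queue-and-worker idea concrete, here is a minimal sketch of an A -> B -> C pipeline. It uses Python's standard `queue` and `threading` modules as a stand-in for Celery, and `model_a`, `model_b`, `model_c`, `stage`, and `run_pipeline` are hypothetical placeholders, not real models or Celery APIs. In a real setup each stage would call its own model server.

```python
import queue
import threading


# Placeholder "models": each stands in for a call to a real model server.
def model_a(x):
    return x + 1


def model_b(x):
    return x * 2


def model_c(x):
    return x - 3


_STOP = object()  # sentinel that tells a worker to shut down


def stage(fn, inbox, outbox):
    """Worker loop: read from inbox, apply fn, pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            break
        request_id, payload = item
        outbox.put((request_id, fn(payload)))


def run_pipeline(inputs):
    """Push inputs through A -> B -> C and return results in input order."""
    q_in, q_ab, q_bc, q_out = (queue.Queue() for _ in range(4))
    workers = [
        threading.Thread(target=stage, args=(model_a, q_in, q_ab)),
        threading.Thread(target=stage, args=(model_b, q_ab, q_bc)),
        threading.Thread(target=stage, args=(model_c, q_bc, q_out)),
    ]
    for w in workers:
        w.start()
    for request_id, x in enumerate(inputs):
        q_in.put((request_id, x))
    q_in.put(_STOP)

    results = {}
    while True:
        item = q_out.get()
        if item is _STOP:
            break
        request_id, value = item
        results[request_id] = value
    for w in workers:
        w.join()
    return [results[i] for i in range(len(inputs))]


print(run_pipeline([1, 2, 3]))
```

Each stage only knows about its inbox and outbox, so a stage can be moved to another process or machine without touching the others.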
You could put your three models into a single graph and write an export script, say xxx_saved_model.py, modeled on the official implementation, that defines three functions, each loading its respective checkpoint file.
That way you do not need to change TF Serving's behavior; the combination is done inside the TensorFlow model.
If you want to use one of your models on its own, add that model's output to the outputs in signature_def in xxx_saved_model.py. For example, say your models output a, b, and c respectively.
Adding 'c' to the outputs in signature_def gives you the final result. If you also want the output of model A at the same time, add 'a' to the outputs in signature_def as well.
To sum up, your three models share one computation graph, and you change what it returns just by changing the outputs.
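Here is a rough sketch of that idea. Note the assumptions: it uses the TF 2 API (`tf.Module`, `tf.function`, `tf.saved_model.save`) rather than a TF 1 signature_def builder, and `ChainedModel`, the toy weight matrices, and the signature name `a_and_c` are all hypothetical. Real models would be restored from their own checkpoints instead of being filled with ones.

```python
import os
import tempfile

import tensorflow as tf


class ChainedModel(tf.Module):
    """Hypothetical wrapper that runs A -> B -> C inside one graph."""

    def __init__(self):
        super().__init__()
        # Stand-ins for the three models; real ones would be restored
        # from their respective checkpoints.
        self.w_a = tf.Variable(tf.ones([4, 3]), name="w_a")
        self.w_b = tf.Variable(tf.ones([3, 2]), name="w_b")
        self.w_c = tf.Variable(tf.ones([2, 1]), name="w_c")

    def _run(self, x):
        a = tf.matmul(x, self.w_a)  # output of model A
        b = tf.matmul(a, self.w_b)  # model B consumes A's output
        c = tf.matmul(b, self.w_c)  # model C consumes B's output
        return a, b, c

    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def predict_c(self, x):
        _, _, c = self._run(x)
        return {"c": c}

    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def predict_a_and_c(self, x):
        a, _, c = self._run(x)
        return {"a": a, "c": c}


model = ChainedModel()
# TF Serving expects a numeric version directory under the model directory.
export_dir = os.path.join(tempfile.mkdtemp(), "1")
tf.saved_model.save(
    model,
    export_dir,
    signatures={
        "serving_default": model.predict_c,   # final output only
        "a_and_c": model.predict_a_and_c,     # also exposes A's output
    },
)

# Reload and call one signature to check the export.
loaded = tf.saved_model.load(export_dir)
result = loaded.signatures["a_and_c"](x=tf.ones([1, 4]))
print(result["a"].numpy(), result["c"].numpy())
```

Which outputs a client sees depends only on which signature it requests, so TF Serving itself stays unmodified.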
An additional point: many production applications will require a cluster of machines to serve the traffic volume.
In those cases, each sub model would run on its own pool of machines. The size of each pool would be adjusted based on the work required for each model.
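As a back-of-the-envelope illustration of sizing each pool by the work its model requires, the sketch below applies Little's law (requests in flight = arrival rate x time per request). The request rate and per-model latencies are made-up numbers, `pool_size` is a hypothetical helper, and the result is a lower bound with no headroom for traffic spikes.

```python
import math


def pool_size(requests_per_second, ms_per_request, concurrent_per_machine=1):
    """Minimum machines needed to keep up, by Little's law."""
    in_flight = requests_per_second * ms_per_request / 1000
    return math.ceil(in_flight / concurrent_per_machine)


# Hypothetical figures: 100 requests/s and per-model latency in milliseconds.
rate = 100
latency_ms = {"A": 20, "B": 100, "C": 50}

pools = {name: pool_size(rate, ms) for name, ms in latency_ms.items()}
print(pools)
```

With these made-up numbers the slowest model, B, needs the largest pool, which is the point of sizing the pools independently.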
As many AI models take a significant amount of time to process a request, batching like this will likely not be significantly slower in end-to-end latency (request to final result) than streaming.
When things grow more, you probably should look into running your infrastructure under Mesos (or your preferred similar functionality), and adding in Hadoop or Spark.
When your models and application begin to stabilize, and your volume grows, it will start to make economic sense to take what you have learned and invest development costs into increasing the efficiency of the application to balance the rising costs of adding servers.
@superxc You can modify the source code in serving/tensorflow_serving/servables/tensorflow/predict_impl.cc.
Change the code in the function "SavedModelPredict":
write your code to feed the second model before "PostProcessPredictionResult",
then return the output of your second model using the function "PostProcessPredictionResult".
Closing due to staleness. If this is still an issue, please file a new updated issue with current steps to reproduce the bug. If this is a question, please ask it on:
https://stackoverflow.com/questions/tagged/tensorflow-serving
Thanks!
@Z-Zheng How can I use this with TF 2.0 and Estimators?
I want to chain multiple estimators to create a single SavedModel with a single serving file/output.
Suppose I have 3 estimators: the first is BoostedTrees, whose output I want to use as input to a DNNClassifier, whose output in turn I want to use in my custom Estimator.
Is there a way to chain their outputs and inputs to create a mega estimator of sorts?
Please help me out here.