What is the best way to serve a fine-tuned model and return predictions?
This is a duplicate of issue #679.
Thanks for the response, Jay. However, bert-as-service only encodes sentences. I've looked at its documentation and have it running, and it does not do any prediction.
I think this will help you. https://github.com/SunYanCN/BERT-chinese-text-classification-and-deployment
@JimAva
Although it might not be the most efficient method, I find that wrapping the prediction in a Flask API works quite well. It works something like this:
First export your model after training:
estimator.export_saved_model(model_dir, serving_input_receiver_fn)
Then load your model in the API using something along the lines of:
from tensorflow.contrib import predictor
predict_fn = predictor.from_saved_model(model_dir)
result = predict_fn(...)
Now you can use predict_fn to serve predictions. I have a rough implementation that I could share if you need it :)
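For readers who want a starting point, the Flask wrapping described above could be sketched roughly as follows. This is not the code from any repository mentioned in this thread. The names `make_predict_handler`, `create_app`, and `featurize`, the `/predict` route, the `"text"` request field, and the `"probabilities"` output key are all assumptions made for illustration; the real input and output names depend on your `serving_input_receiver_fn` and model.

```python
from typing import Any, Callable, Dict

# Hypothetical aliases: the real feature names depend on your
# serving_input_receiver_fn (for BERT, often input_ids, input_mask, ...).
Features = Dict[str, Any]
PredictFn = Callable[[Features], Dict[str, Any]]


def make_predict_handler(predict_fn: PredictFn,
                         featurize: Callable[[str], Features],
                         output_key: str = "probabilities"):
    """Return a function that maps a JSON-like payload to a JSON-like response.

    `featurize` is an assumed helper that tokenizes the text and builds the
    feature dict the exported model expects. `output_key` is an assumption too.
    """

    def handle(payload: Any) -> Dict[str, Any]:
        text = payload.get("text") if isinstance(payload, dict) else None
        if not isinstance(text, str) or not text:
            return {"error": "expected a non-empty 'text' field"}
        outputs = predict_fn(featurize(text))
        scores = outputs[output_key]
        # Predictor outputs are usually NumPy arrays; convert them for JSON.
        scores = scores.tolist() if hasattr(scores, "tolist") else list(scores)
        return {"scores": scores}

    return handle


def create_app(predict_fn: PredictFn, featurize: Callable[[str], Features]):
    """Wire the handler into a Flask app.

    Flask is imported here so the handler above can be tested without it.
    """
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    handle = make_predict_handler(predict_fn, featurize)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True, silent=True)
        result = handle(payload)
        status = 400 if "error" in result else 200
        return jsonify(result), status

    return app
```

In the API you would build `predict_fn` once at startup with `predictor.from_saved_model(model_dir)`, pass it to `create_app` together with your own `featurize` function, and call `.run()` on the returned app. Keeping the handler separate from the Flask route makes it possible to test the request handling with a fake `predict_fn`.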
@sarnikowski I am actually using Flask to serve predictions; however, it's very slow. Could you share the code with me?
TIA!
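One possible cause of slow responses, though not necessarily the one in this case, is loading the model again on every request instead of once per process. The sketch below shows the load-once pattern with `functools.lru_cache`. The `load_predictor` function is a hypothetical stand-in for an expensive loader such as `predictor.from_saved_model(model_dir)`, and `LOAD_CALLS` exists only to make the caching visible.

```python
import functools

# Counts how often the (expensive) loader actually runs; for illustration only.
LOAD_CALLS = {"count": 0}


def load_predictor(model_dir):
    """Hypothetical stand-in for predictor.from_saved_model(model_dir)."""
    LOAD_CALLS["count"] += 1

    def predict_fn(features):
        # A real predictor would run the model; this one echoes its input.
        return {"echo": features}

    return predict_fn


@functools.lru_cache(maxsize=None)
def get_predictor(model_dir):
    """Load the predictor for model_dir once and reuse it afterwards."""
    return load_predictor(model_dir)
```

A request handler would then call `get_predictor(model_dir)` on each request and pay the loading cost only the first time. If the service is still slow after that, the cause lies elsewhere, for example in tokenization or in the model's own inference time.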
For anyone else interested in this, I wrote a rough implementation and made it available here: https://github.com/sarnikowski/bert_in_a_flask
Maybe check this out if you are looking for a way to serve a fine-tuned BERT model.
BERT Serving and Inferencing from fine-tuned