I'm using TensorFlow Serving to load a wide-and-deep model as an online prediction service, and the model is updated every 10 minutes. We found that the latency of the first few requests is high right after a new model is loaded. Is this a known issue, or is there any suggestion for tracking down the problem?
Some tensorflow graphs perform lazy initialization, making the first request (or few requests) to a newly-loaded model slow. The best way to handle that is to add initialization or dummy "warm-up requests" to the init op which tf-serving calls while loading the model.
@chrisolston Thanks very much for your explanation and suggestion; the problem is clear to me now.
Here are some suggestions for how tf-serving loads models.
a) Could tf-serving add a warm-up option?
1. Store a request for each model when the first request arrives.
2. When a new version of the model is loaded, do not mark it ready until it has been warmed up with the stored request.
b) Add lazy model loading:
1. We may start hundreds of tf-serving processes, and they start loading and updating a new model version at almost the same time. This can make the cluster's network and disks quite busy (the model is stored on HDFS) and make the cluster unstable.
2. So we could load/update the model at a random time within a specified period to smooth out the network and disk load.
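
To make suggestion (a) concrete, here is a minimal sketch of the idea in plain Python. It is not tf-serving code: `WarmupGate`, `LazyModel`, and their methods are hypothetical names made up for illustration, and the "model" is just a callable that does some lazy work on its first call.

```python
from typing import Any, Callable, Dict


class LazyModel:
    """Hypothetical stand-in for a model whose first call is slow."""

    def __init__(self, weight: float) -> None:
        self.weight = weight
        self.initialized = False

    def __call__(self, x: float) -> float:
        if not self.initialized:
            # Stands in for lazy initialization triggered by the first request.
            self.initialized = True
        return self.weight * x


class WarmupGate:
    """Hypothetical sketch: warm a new version before it serves traffic."""

    def __init__(self) -> None:
        # model name -> first request ever seen for that model
        self._sample_request: Dict[str, Any] = {}
        # model name -> version currently serving traffic
        self._ready: Dict[str, Callable[[Any], Any]] = {}

    def predict(self, model_name: str, request: Any) -> Any:
        # Step 1: remember the first request that arrives for each model.
        self._sample_request.setdefault(model_name, request)
        return self._ready[model_name](request)

    def load_version(self, model_name: str, predict_fn: Callable[[Any], Any]) -> None:
        # Step 2: replay the stored request against the new version,
        # and only then make that version visible to callers.
        sample = self._sample_request.get(model_name)
        if sample is not None:
            predict_fn(sample)  # result is discarded; only the warm-up matters
        self._ready[model_name] = predict_fn


gate = WarmupGate()
v1 = LazyModel(2.0)
gate.load_version("wide_deep", v1)  # nothing stored yet, so v1 is not warmed
first = gate.predict("wide_deep", 3.0)  # this request is stored as the sample

v2 = LazyModel(5.0)
gate.load_version("wide_deep", v2)  # v2 is warmed with the stored request
second = gate.predict("wide_deep", 3.0)
```

Note that the very first version still pays the lazy-initialization cost, because no request has been stored at that point.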
For (a), the recommended approach is to do it within the tf graph, triggered from tf-serving calling the init op during load.
For (b), interesting idea. I would expect various I/O queues to smooth it out anyway but maybe you are hitting timeouts? You could write a custom SourceAdapter that acts as the identity function but adds a random delay -- that would do the trick. Feel free to contribute the SourceAdapter via a PR.
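
The random-delay idea itself is independent of the SourceAdapter interface, so here is a rough sketch of it in plain Python rather than in tf-serving's C++. The names `jittered_delay` and `load_with_jitter` and the 600-second window are assumptions chosen for illustration; a real SourceAdapter would apply the same delay before forwarding the new version downstream.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def jittered_delay(max_delay_s: float, rng: Optional[random.Random] = None) -> float:
    """Pick a uniformly random delay between 0 and max_delay_s seconds."""
    if max_delay_s < 0:
        raise ValueError("max_delay_s must be non-negative")
    rng = rng or random.Random()
    return rng.uniform(0.0, max_delay_s)


def load_with_jitter(
    load_fn: Callable[[], T],
    max_delay_s: float,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Wait a random amount of time, then run the (hypothetical) load function.

    If every server picks its own delay, loads of the same new version are
    spread over the window instead of hitting storage at the same moment.
    """
    sleep(jittered_delay(max_delay_s, rng))
    return load_fn()


# Usage: record the delay instead of actually sleeping, so the demo is instant.
slept = []
result = load_with_jitter(
    load_fn=lambda: "model loaded",
    max_delay_s=600.0,  # assumed window: the 10-minute update period
    rng=random.Random(0),
    sleep=slept.append,
)
```

The `sleep` parameter is injectable only so the behavior can be checked without waiting; in a real deployment the default `time.sleep` would be used.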
Hi, I have the exact same problem. However, I do not understand how one can add initialization or dummy "warm-up requests" to the init op (I used Keras for training and the SavedModelBuilder for exporting the model). Can you please explain it in more detail, e.g. with a code example?
Thanks!
Ping @chrisolston
same problem
Hi @chrisolston, I have the same problem. Can you provide an example of how to call the init op during load?
Hi @chrisolston, the current version of tf-serving tries to load warm-up requests from the tf_serving_warmup_requests file. Does TensorFlow provide a common API for exporting requests to that location, or should we write the requests there manually?