Hi,
I am running the BERT solution on a machine with a GPU (Tesla K80, 12 GB). For question answering, prediction for a single question takes more than 5 seconds. Can we reduce it to below 1 second?
Do we need to configure anything to make that possible?
Thank you
Is your problem that the model is loaded again from scratch for every prediction?
Yes
For example, I have 5 questions arriving as parallel requests. For every request the model gets loaded again, and the response takes more than 5 seconds.
That's how the estimator works. I managed to export it as a .pb but haven't had time to test it yet. I used:
```
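import tensorflow as tf  # TensorFlow 1.x API

# Paste into run_squad.py, where max_seq_length, estimator,
# predict_input_fn and init_checkpoint are already defined.
def serving_input_fn():
    # One placeholder per feature the exported model expects.
    unique_ids = tf.placeholder(tf.int32, [None], name='unique_ids')
    input_ids = tf.placeholder(tf.int32, [None, max_seq_length], name='input_ids')
    input_mask = tf.placeholder(tf.int32, [None, max_seq_length], name='input_mask')
    segment_ids = tf.placeholder(tf.int32, [None, max_seq_length], name='segment_ids')
    input_fn = tf.estimator.export.build_raw_serving_input_receiver_fn({
        'unique_ids': unique_ids,
        'input_ids': input_ids,
        'input_mask': input_mask,
        'segment_ids': segment_ids,
    })()
    return input_fn

# Predict with a checkpoint saver hook; this gives you graph.pbtxt, which is used in SavedModel.
# predict() returns a generator, so the hook only runs while the results are iterated.
save_hook = tf.train.CheckpointSaverHook("google_bertbert_large", save_secs=1)
estimator.predict(input_fn=predict_input_fn, hooks=[save_hook])
estimator._export_to_tpu = False
# Then export the SavedModel from the checkpoint.
estimator.export_savedmodel(export_dir_base="export", serving_input_receiver_fn=serving_input_fn, checkpoint_path=init_checkpoint)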
```
Thank you for your response.
I ran the above piece of code by placing it in run_squad.py, and saved_model.pb was generated. Can we use that .pb file for further predictions, and if so, how should I use it?
Can you please let us know how to proceed further?
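One possible way to use the exported directory, as an untested sketch: TF 1.x ships `tf.contrib.predictor.from_saved_model`, which loads a SavedModel once and returns a callable that takes a dict keyed by the serving input names. The helper names `pad_features` and `load_predictor` and the export path in the usage comment are assumptions for illustration, and the shape of the outputs depends on what the model function returns as predictions.

```python
def pad_features(input_ids, segment_ids, max_seq_length, unique_id=0):
    """Pad one tokenized example and build the matching input mask.

    Hypothetical helper: it returns a batch of size 1 whose keys match the
    placeholders declared in serving_input_fn.
    """
    if len(input_ids) != len(segment_ids):
        raise ValueError("input_ids and segment_ids must have equal length")
    if len(input_ids) > max_seq_length:
        raise ValueError("sequence is longer than max_seq_length")
    pad = max_seq_length - len(input_ids)
    return {
        "unique_ids": [unique_id],
        "input_ids": [list(input_ids) + [0] * pad],
        "input_mask": [[1] * len(input_ids) + [0] * pad],
        "segment_ids": [list(segment_ids) + [0] * pad],
    }


def load_predictor(export_dir):
    """Load the SavedModel once; reuse the returned callable for every request."""
    # Imported lazily so pad_features works without TensorFlow installed.
    from tensorflow.contrib import predictor  # available in TF 1.x only
    return predictor.from_saved_model(export_dir)


# Usage (assumed path; export_savedmodel writes a timestamped subdirectory):
# predict_fn = load_predictor("export/<timestamp>")
# outputs = predict_fn(pad_features(input_ids, segment_ids, max_seq_length))
```

Because the predictor is created once and kept in memory, later calls should skip the graph rebuild and checkpoint reload that make each estimator call slow.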
I found the exact cause, as below:
The TensorFlow graph is recreated and the checkpoint is reloaded EVERY time you want to use a trained model to make a prediction on new data.
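The general remedy for this is to pay the loading cost once and keep the model in memory between requests. Here is a minimal, framework-independent sketch of that pattern; `LazyModel` and `load_fn` are hypothetical names, and `load_fn` stands in for whatever builds the graph and restores the checkpoint.

```python
import threading


class LazyModel:
    """Build a model once and reuse it across requests.

    `load_fn` is a hypothetical zero-argument callable that does the expensive
    work and returns a callable mapping a request to a prediction.
    """

    def __init__(self, load_fn):
        self._load_fn = load_fn
        self._model = None
        self._lock = threading.Lock()

    def predict(self, request):
        # Double-checked locking: only the first request pays the load cost,
        # even when several requests arrive in parallel.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._load_fn()
        return self._model(request)
```

With this shape, five parallel requests trigger a single load instead of five, and every later request goes straight to inference.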
Hi,
@shivamani-ans @anasuna How did you test the QA model with your own questions, without using the dev/test file?
Hi,
We send the related paragraphs and the question to a custom read_squad_examples(), which we implemented to take paragraphs and a question directly.
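An alternative that leaves read_squad_examples() unchanged is to wrap the paragraphs and the question in the SQuAD JSON layout it already reads. This is a sketch, not the posters' implementation; `build_squad_input`, the title and the generated ids are made up for illustration.

```python
def build_squad_input(paragraphs, question):
    """Wrap paragraphs and one question in a SQuAD-style structure.

    No answers are included, which should be enough when the examples are
    read for prediction only (is_training=False). The ids are arbitrary but
    unique per paragraph.
    """
    return {
        "data": [{
            "title": "adhoc",
            "paragraphs": [
                {
                    "context": text,
                    "qas": [{"id": "q-%d" % i, "question": question}],
                }
                for i, text in enumerate(paragraphs)
            ],
        }]
    }
```

The result can be written with `json.dump` to a file and passed as the predict file, or its `"data"` list can be handed to a custom reader directly.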
I am struggling to export my ckpt model files to one consolidated .pb file. Could you please let me know how you did it?