Hi, I am trying to use tensor2tensor and Tensorflow Serving together.
The problem I have is that the --export_saved_model option of t2t-trainer.py does not export saved_model.pb and variables as described at https://www.tensorflow.org/serving/serving_basic#train_and_export_tensorflow_model
Then the tensorflow_model_server command fails because it cannot find SavedModel.
t2t-trainer.py --export_saved_model --output_dir /path/to/output/1 ...
tensorflow_model_server --model_base_path=/path/to/output
2017-10-18 06:54:45.657925: I tensorflow_serving/model_servers/main.cc:147] Building single TensorFlow model file config: model_name: default model_base_path: /path/to/output
2017-10-18 06:54:45.658343: I tensorflow_serving/model_servers/server_core.cc:434] Adding/updating models.
2017-10-18 06:54:45.658500: I tensorflow_serving/model_servers/server_core.cc:485] (Re-)adding model: default
2017-10-18 06:54:45.825672: I tensorflow_serving/core/basic_manager.cc:705] Successfully reserved resources to load servable {name: default version: 1}
2017-10-18 06:54:45.825737: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: default version: 1}
2017-10-18 06:54:45.825764: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: default version: 1}
2017-10-18 06:54:45.827306: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: default version: 1} failed: Not found: Session bundle or SavedModel bundle not found at specified export location
I used tensor2tensor 1.2.4 in Python 3.5.2 for training.
The tensorflow_model_server environment is as follows (Python 2.7.12):
# pip freeze | grep tensorflow
tensorflow==1.3.0
tensorflow-serving-api==1.3.0
tensorflow-tensorboard==0.1.2
seemingly related: #349
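For anyone hitting the same "not found" error: a quick way to see what the server would find is to check the directory layout by hand. The sketch below is only an illustration, not part of tensor2tensor or TensorFlow Serving. It assumes the usual layout of numeric version directories under the model base path, each holding saved_model.pb or saved_model.pbtxt, and the helper name servable_versions is made up for this example.

```python
import os

# File names a SavedModel directory is expected to contain (binary or text form).
SAVED_MODEL_FILES = ("saved_model.pb", "saved_model.pbtxt")


def servable_versions(model_base_path):
    """Return the numeric version directories that contain a SavedModel file.

    Hypothetical helper for illustration: it only inspects file names and
    does not load or validate the model itself.
    """
    if not os.path.isdir(model_base_path):
        return []
    versions = []
    for name in os.listdir(model_base_path):
        version_dir = os.path.join(model_base_path, name)
        # Only directories whose name is a plain number count as versions.
        if not (name.isdecimal() and os.path.isdir(version_dir)):
            continue
        if any(os.path.isfile(os.path.join(version_dir, f))
               for f in SAVED_MODEL_FILES):
            versions.append(int(name))
    return sorted(versions)
```

Calling servable_versions("/path/to/output") on a directory whose version folder holds only checkpoint files returns an empty list, which matches the situation in the log above.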
Hi,
It's been a while since I exported a model with tensor2tensor, but as far as I remember the model is saved in the same directory as the checkpoints, in a directory called export. I didn't set --output_dir as you have.
Say you followed the basic tutorial: the saved model will be at t2t_train/translate_ende_wmt32k/transformer-transformer_base_single_gpu/export/Servo/XXX/, where XXX is just a number.
It's saved as saved_model.pbtxt, but that still works with tensorflow model server.
Hope that helps.
--EDIT--
Just ran t2t-trainer again, and can verify what I said above. Note that the t2t-trainer output tells you exactly where the model has been saved. I don't think versions will make a difference, but I'm using tensor2tensor 1.2.3.
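If you'd rather locate the export programmatically, something like the following rough sketch could work. It assumes finished exports live in subdirectories of export/Servo whose names consist only of digits (the XXX above), and that anything else, such as a temp-prefixed directory, should be skipped. latest_export_dir is a made-up helper name, not a tensor2tensor function.

```python
import os


def latest_export_dir(export_root):
    """Return the path of the highest-numbered export directory, or None.

    Hypothetical helper: export_root is expected to be something like
    <output_dir>/export/Servo. Directories whose names are not purely
    numeric are ignored.
    """
    if not os.path.isdir(export_root):
        return None
    candidates = [
        name for name in os.listdir(export_root)
        if name.isdecimal() and os.path.isdir(os.path.join(export_root, name))
    ]
    if not candidates:
        return None
    # Compare as integers so that "10" sorts after "9".
    return os.path.join(export_root, max(candidates, key=int))
```

The returned directory is the one that should contain saved_model.pbtxt. A None result means no export has been written yet.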
R.
Hi, thanks for your comment.
Without --output_dir, the log is like
INFO:tensorflow:Saving checkpoints for 401 into /tmp/tensor2tensor/model.ckpt.
but there is no saved_model.pbtxt in /tmp/tensor2tensor.
eval/
events.out.tfevents.1508324248.7be41bae3421
flags.txt
graph.pbtxt
hparams.json
model.ckpt-1.data-00000-of-00003
model.ckpt-1.data-00001-of-00003
model.ckpt-1.data-00002-of-00003
model.ckpt-1.index
model.ckpt-1.meta
(omitted for brevity)
model.ckpt-565.data-00000-of-00003
model.ckpt-565.data-00001-of-00003
model.ckpt-565.data-00002-of-00003
model.ckpt-565.index
model.ckpt-565.meta
In my experience, tensor2tensor only exports the model at the final training step. By default that is step 250000, and going by the output you provided you're only at step 565.
If you want tensor2tensor to export a model before step 250000, set the command-line parameter --train_steps to a lower value. For example, if you're currently at step 565, set --train_steps=570; training will then stop at step 570 and export the model. You can resume training afterwards, if you want, by removing the --train_steps parameter.
Note that when it saves a model, the output from tensor2tensor will contain something like: SavedModel written to: b"/t2t_train/translate_ende_wmt32k/transformer-transformer_base_single_gpu/export/Servo/temp-b'1508318346'/saved_model.pbtxt"
--Edit--
This was probably not clear above, but checkpoints are not the same as saved models.
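To make the distinction concrete, here is a small sketch that looks at a list of file names and reports the latest checkpoint step and whether a saved model file is present. It assumes checkpoint files follow the model.ckpt-N.* naming seen in the listing earlier in this thread. summarize_output_dir is a made-up name for illustration.

```python
import re

# Matches checkpoint files such as model.ckpt-565.index or
# model.ckpt-565.data-00000-of-00003 and captures the step number.
CKPT_RE = re.compile(r"^model\.ckpt-(\d+)\.(index|meta|data-\d+-of-\d+)$")
SAVED_MODEL_FILES = ("saved_model.pb", "saved_model.pbtxt")


def summarize_output_dir(filenames):
    """Report the latest checkpoint step and whether a saved model exists.

    Hypothetical helper: it works on plain file names, so it can be fed
    the result of os.listdir() or a pasted directory listing.
    """
    steps = set()
    has_saved_model = False
    for name in filenames:
        match = CKPT_RE.match(name)
        if match:
            steps.add(int(match.group(1)))
        elif name in SAVED_MODEL_FILES:
            has_saved_model = True
    return {
        "latest_checkpoint_step": max(steps) if steps else None,
        "has_saved_model": has_saved_model,
    }


if __name__ == "__main__":
    listing = [
        "flags.txt", "graph.pbtxt", "hparams.json",
        "model.ckpt-1.index", "model.ckpt-1.meta",
        "model.ckpt-565.data-00000-of-00003",
        "model.ckpt-565.index", "model.ckpt-565.meta",
    ]
    print(summarize_output_dir(listing))
```

For the listing posted above, this reports a latest checkpoint step of 565 and no saved model. Note that graph.pbtxt is not counted, since it is not a saved model file.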
R.
Thank you @robertBrnnn. I obtained saved_model.pbtxt by adding --train_steps, although I also needed --locally_shard_to_cpu --worker_gpu 0 in my environment.
I had assumed saved_model.pbtxt would be generated and updated on the fly during training, like the checkpoint files.
If anyone knows why it can only be generated after training, I would appreciate a comment.
@robertBrnnn Hi, I'm trying to export a saved model now, but it fails.
I searched on Google, and someone said it's caused by the "allow_soft_placement" flag,
but in the session_config function this flag is already set to true,
so I don't know how to resolve this.
Do you have any advice?
Thanks a lot.
The error looks like this:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'save/ShardedFilename_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Identity: CPU
ShardedFilename: CPU
[[Node: save/ShardedFilename_1 = ShardedFilename_device="/device:GPU:0"]]
@yuimo Did you try adding --locally_shard_to_cpu --worker_gpu 0?
Just a minor add on. If you want to export your model in binary format, i.e. .pb instead of .pbtxt, you have to modify the make_export_strategy function in /utils/trainer_utils.py
@vishalnus That function is now found in trainer_lib.py, but for some reason the .pb file it creates doesn't contain the weights, only the graph description.