
Same here. It takes nearly 7 minutes. I am using one RTX 2080 Super. The image for 2.1.0 is almost instantaneous.
Also the nightly-gpu takes long time just like 2.2.0-gpu (or latest-gpu)

@HuiCheng Can you share a minimal reproducible code for us to reproduce this issue? Thanks!
@HuiCheng Can you share a minimal reproducible code for us to reproduce this issue? Thanks!
docker run --gpus all -it --rm -v /tmp/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu_trt/00000123/:/models/intent/1 -e TF_CPP_MIN_LOG_LEVEL=0 -e MODENAME=intent tensorflow/serving:2.2.0-gpu-libnvrtc10.1
In addition, myself models like this.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro RTX 8000 Off | 00000000:04:00.0 Off | Off |
| 33% 30C P8 9W / 260W | 0MiB / 48601MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 8000 Off | 00000000:05:00.0 Off | Off |
| 33% 30C P8 7W / 260W | 0MiB / 48601MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Quadro RTX 8000 Off | 00000000:08:00.0 Off | Off |
| 33% 29C P8 14W / 260W | 0MiB / 48601MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Quadro RTX 8000 Off | 00000000:09:00.0 Off | Off |
| 33% 29C P8 14W / 260W | 0MiB / 48601MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Quadro RTX 8000 Off | 00000000:86:00.0 Off | Off |
| 33% 30C P8 10W / 260W | 0MiB / 48601MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Quadro RTX 8000 Off | 00000000:87:00.0 Off | Off |
| 33% 30C P8 8W / 260W | 0MiB / 48601MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Quadro RTX 8000 Off | 00000000:8A:00.0 Off | Off |
| 33% 31C P8 5W / 260W | 0MiB / 48601MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Quadro RTX 8000 Off | 00000000:8B:00.0 Off | Off |
| 33% 30C P8 15W / 260W | 0MiB / 48601MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
I've encountered the same regression https://github.com/tensorflow/serving/issues/1659#issuecomment-639540075 as well.
@HuiCheng @vijaygill Are you using recurrent or dynamic structures in your models?
Having same problem, any further updated on this issue? Thank you.
Quadro RTX 8000 has compute capability 7.5 and we don't ship SASS for it with TensorFlow, so the time spent here is just the CUDA driver JIT compiling the code. Including new CCs into TF pip releases is tricky also because adding new compute capabilities increases binary size, which is a problem as well.
CC @chsigg
It appears to work with 2.2.0-rc2, but then takes longer at 2.2.0
It shouldn't have to JIT from ptx_70 to sass_75, but rather it should just run sass_70 on sm_75 hardware.
I can't find anything that changed between 2.2.0-rc2 and 2.2.0 with regards to compute capabilities. But you could try to run 2.2.0-rc2 with CUDA_FORCE_PTX_JIT=1. If that results in about the same startup delay, that would be a strong signal that it's indeed JITing with 2.2.0.
This has a delay for me on P5000 on 2.2.0-rc2 when setting:
export CUDA_FORCE_PTX_JIT=1
What's the amount of startup delay, is it comparable to 2.2.0 without CUDA_FORCE_PTX_JIT?
Also, did you see the difference in startup delay between 2.2.0-rc2 and 2.2.0 with a Quadro P5000? That's surprising because P5000 are sm_61 and we do ship sass_61.
latest-gpu ( ~7 minute delay)
root@psdzmtm12:/tmp/tfserving# docker run --gpus all -p 8501:8501 --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two tensorflow/serving:latest-gpu
2020-07-25 15:02:51.524702: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: half_plus_two model_base_path: /models/half_plus_two
2020-07-25 15:02:51.524970: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-25 15:02:51.524994: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: half_plus_two
2020-07-25 15:02:51.625676: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2020-07-25 15:02:51.625715: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2020-07-25 15:02:51.625725: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2020-07-25 15:02:51.625758: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/half_plus_two/00000123
2020-07-25 15:02:51.626194: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-25 15:02:51.626221: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2020-07-25 15:02:51.626311: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-25 15:02:51.628159: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-25 15:02:52.052262: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:02:52.053275: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:05.0 name: Quadro P5000 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 269.00GiB/s
2020-07-25 15:02:52.053323: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-07-25 15:02:52.053399: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:02:52.054286: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:02:52.055085: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-25 15:09:31.384926: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-25 15:09:31.384963: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-25 15:09:31.384970: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-25 15:09:31.385170: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:09:31.386138: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:09:31.387078: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:09:31.388006: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15097 MB memory) -> physical GPU (device: 0, name: Quadro P5000, pci bus id: 0000:00:05.0, compute capability: 6.1)
2020-07-25 15:09:31.402891: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-07-25 15:09:31.655548: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2020-07-25 15:09:31.659805: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 400034031 microseconds.
2020-07-25 15:09:31.659973: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2020-07-25 15:09:31.660196: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: half_plus_two version: 123}
2020-07-25 15:09:31.663329: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-07-25 15:09:31.665554: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2.2.0-rc2-gpu (~7 minute delay)
root@psdzmtm12:~# docker run --gpus all -p 8501:8501 --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -e CUDA_FORCE_PTX_JIT=1 -ti tensorflow/serving:2.2.0-rc2-gpu
2020-07-25 15:28:29.226466: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: half_plus_two model_base_path: /models/half_plus_two
2020-07-25 15:28:29.226708: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-25 15:28:29.226746: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: half_plus_two
2020-07-25 15:28:29.327296: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2020-07-25 15:28:29.327344: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2020-07-25 15:28:29.327360: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2020-07-25 15:28:29.327391: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/half_plus_two/00000123
2020-07-25 15:28:29.327800: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-25 15:28:29.327835: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2020-07-25 15:28:29.327937: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-25 15:28:29.329698: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-25 15:28:29.754814: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:28:29.755736: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:05.0 name: Quadro P5000 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 269.00GiB/s
2020-07-25 15:28:29.755766: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-07-25 15:28:29.755850: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:28:29.756774: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:28:29.757575: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-25 15:35:09.773927: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-25 15:35:09.773970: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-25 15:35:09.773979: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-25 15:35:09.774152: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:35:09.775177: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:35:09.776052: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:35:09.776901: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15099 MB memory) -> physical GPU (device: 0, name: Quadro P5000, pci bus id: 0000:00:05.0, compute capability: 6.1)
2020-07-25 15:35:09.791106: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
2020-07-25 15:35:10.041748: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2020-07-25 15:35:10.046297: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 400718890 microseconds.
2020-07-25 15:35:10.046492: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2020-07-25 15:35:10.046690: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: half_plus_two version: 123}
2020-07-25 15:35:10.050400: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2020-07-25 15:35:10.052637: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
2.2.0-rc2 (<1 second)
root@psdzmtm12:~# docker run --gpus all -p 8501:8501 --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -e CUDA_FORCE_PTX_JIT=0 -ti tensorflow/serving:2.2.0-rc2-gpu
2020-07-25 15:38:22.689918: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: half_plus_two model_base_path: /models/half_plus_two
2020-07-25 15:38:22.690201: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-25 15:38:22.690225: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: half_plus_two
2020-07-25 15:38:22.790850: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2020-07-25 15:38:22.790885: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2020-07-25 15:38:22.790902: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2020-07-25 15:38:22.790932: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/half_plus_two/00000123
2020-07-25 15:38:22.791429: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-25 15:38:22.791460: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2020-07-25 15:38:22.791550: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-25 15:38:22.793229: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-25 15:38:23.264489: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:38:23.265374: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:05.0 name: Quadro P5000 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 269.00GiB/s
2020-07-25 15:38:23.265399: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-07-25 15:38:23.265454: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:38:23.266282: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:38:23.267116: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-25 15:38:23.853128: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-25 15:38:23.853183: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-25 15:38:23.853202: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-25 15:38:23.853355: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:38:23.854269: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:38:23.855080: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-25 15:38:23.855889: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15097 MB memory) -> physical GPU (device: 0, name: Quadro P5000, pci bus id: 0000:00:05.0, compute capability: 6.1)
2020-07-25 15:38:23.870467: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
2020-07-25 15:38:24.125631: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2020-07-25 15:38:24.129947: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 1339003 microseconds.
2020-07-25 15:38:24.130108: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2020-07-25 15:38:24.130267: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: half_plus_two version: 123}
2020-07-25 15:38:24.133763: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-07-25 15:38:24.136491: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
I found this statement in the release note of TensorFlow 2.3.0:
No longer includes PTX kernels for GPU except for sm_70 to reduce binary size. On systems with NVIDIA庐 Ampere GPUs (CUDA architecture 8.0) or newer, kernels are JIT-compiled from PTX and TensorFlow can take over 30 minutes to start up. This overhead can be limited to the first start up by increasing the default JIT cache size with: export CUDA_CACHE_MAXSIZE=2147483648.
Is there anything we can do to mitigate the overhead while using TensorFlow-Serving with docker?
@paperspace-philip As Christian said, we do ship SASS for 6.1 so the 7 minute delay is unlikely due to SASS JITting. Can you run with the env var TF_CPP_VMODULE set to gpu_device=3 and attach the logs?
@peakji if you're running with on a GPU for which TF does not ship SASS, one option is to build TF from source which will let you pick the GPUs TF needs to support. The TF pip you build this way should not have any SASS-JITting overhead.
2.2.0-gpu TF_CPPP_VMODULE=gpu_device=3
root@psjbsjjwn:/tmp/tfserving# docker run --runtime=nvidia -p 8501:8501 --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -ti -e TF_CPP_VMODULE=gpu_device=3 tensorflow/serving:latest-gpu
2020-07-28 19:04:43.564455: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: half_plus_two model_base_path: /models/half_plus_two
2020-07-28 19:04:43.564711: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-28 19:04:43.564732: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: half_plus_two
2020-07-28 19:04:43.665071: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2020-07-28 19:04:43.665104: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2020-07-28 19:04:43.665114: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2020-07-28 19:04:43.665142: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/half_plus_two/00000123
2020-07-28 19:04:43.665483: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-28 19:04:43.665507: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2020-07-28 19:04:43.665581: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-28 19:04:43.667490: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-28 19:04:43.685872: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:04:43.686650: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:05.0 name: Quadro P4000 computeCapability: 6.1
coreClock: 1.48GHz coreCount: 14 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 226.62GiB/s
2020-07-28 19:04:43.686674: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-07-28 19:04:43.686723: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:04:43.687505: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:04:43.688128: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-28 19:12:04.832272: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1090] TensorFlow compiled with CUDA 10.1 and cuDNN 7.6.4
2020-07-28 19:12:04.832492: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-28 19:12:04.832528: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-28 19:12:04.832535: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-28 19:12:04.832791: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:12:04.833521: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1373] GPUDevice PlatformGpuId 0 TfGpuId 0 on bus 1 numa: 0 pci: 0000:00:05.0 DeviceLocality: bus_id: 1
links {
}
2020-07-28 19:12:04.833592: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:12:04.834173: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:12:04.834734: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7451 MB memory) -> physical GPU (device: 0, name: Quadro P4000, pci bus id: 0000:00:05.0, compute capability: 6.1)
2020-07-28 19:12:04.834786: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:249] Created stream[0] = 0x7fbe807ec820
2020-07-28 19:12:04.834812: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:268] Created host_to_device_stream[0] = 0x7fbe84f78090
2020-07-28 19:12:04.834824: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:273] Created device_to_host_stream[0] = 0x7fbe8664e310
2020-07-28 19:12:04.834854: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:289] Created device_to_device_stream[0] = 0x7fbe853cac70
2020-07-28 19:12:04.847821: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-07-28 19:12:04.995670: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper c op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:04.995780: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled c op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:04.995818: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper a op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:04.995847: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled a op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:04.995855: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper b op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:04.995864: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled b op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:04.995874: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:569] GpuDevice::ComputeAsync save/RestoreV2/_1 op _Recv on GPU0 stream[0]
2020-07-28 19:12:04.995979: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:569] GpuDevice::ComputeAsync save/RestoreV2/_3 op _Recv on GPU0 stream[0]
2020-07-28 19:12:04.996025: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:569] GpuDevice::ComputeAsync save/RestoreV2/_5 op _Recv on GPU0 stream[0]
2020-07-28 19:12:04.996091: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper save/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:12:04.996154: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled save/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:12:04.996266: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper save/Assign_2 op Assign on GPU 0 stream[0]
2020-07-28 19:12:04.996282: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled save/Assign_2 op Assign on GPU 0 stream[0]
2020-07-28 19:12:04.996417: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper save/Assign_1 op Assign on GPU 0 stream[0]
2020-07-28 19:12:04.996544: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled save/Assign_1 op Assign on GPU 0 stream[0]
2020-07-28 19:12:04.997040: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2020-07-28 19:12:05.000688: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper c op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:05.000709: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled c op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:05.000718: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper b op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:05.000745: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled b op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:05.000758: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper a op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:05.000770: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled a op VariableV2 on GPU 0 stream[0]
2020-07-28 19:12:05.000784: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper c/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:12:05.000847: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled c/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:12:05.000880: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper b/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:12:05.000907: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled b/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:12:05.000922: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper a/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:12:05.000940: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled a/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:12:05.001130: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 441335988 microseconds.
2020-07-28 19:12:05.001259: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2020-07-28 19:12:05.001498: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: half_plus_two version: 123}
2020-07-28 19:12:05.004447: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-07-28 19:12:05.007299: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2.2.0-rc2 TF_CPP_VMODULE=gpu_device=3
root@psjbsjjwn:/tmp/tfserving# docker run --runtime=nvidia -p 8501:8501 --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -ti -e TF_CPP_VMODULE=gpu_device=3 tensorflow/serving:2.2.0-rc2-gpu
2020-07-28 19:28:03.949456: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: half_plus_two model_base_path: /models/half_plus_two
2020-07-28 19:28:03.949958: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-28 19:28:03.949983: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: half_plus_two
2020-07-28 19:28:04.050402: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2020-07-28 19:28:04.050433: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2020-07-28 19:28:04.050448: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2020-07-28 19:28:04.050510: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/half_plus_two/00000123
2020-07-28 19:28:04.051038: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-28 19:28:04.051067: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2020-07-28 19:28:04.051299: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-28 19:28:04.053161: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-28 19:28:04.068502: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:28:04.069104: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:05.0 name: Quadro P4000 computeCapability: 6.1
coreClock: 1.48GHz coreCount: 14 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 226.62GiB/s
2020-07-28 19:28:04.069129: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-07-28 19:28:04.069175: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:28:04.069738: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:28:04.070284: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-28 19:28:05.899471: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1090] TensorFlow compiled with CUDA 10.0 and cuDNN 7.4.1
2020-07-28 19:28:05.899722: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-28 19:28:05.899741: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-28 19:28:05.899756: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-28 19:28:05.899924: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:28:05.900600: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1373] GPUDevice PlatformGpuId 0 TfGpuId 0 on bus 1 numa: 0 pci: 0000:00:05.0 DeviceLocality: bus_id: 1
links {
}
2020-07-28 19:28:05.900671: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:28:05.901228: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-28 19:28:05.901812: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7451 MB memory) -> physical GPU (device: 0, name: Quadro P4000, pci bus id: 0000:00:05.0, compute capability: 6.1)
2020-07-28 19:28:05.901867: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:249] Created stream[0] = 0x7ff5f07aa740
2020-07-28 19:28:05.901892: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:268] Created host_to_device_stream[0] = 0x7ff5f07abd90
2020-07-28 19:28:05.901949: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:273] Created device_to_host_stream[0] = 0x7ff5f07abe30
2020-07-28 19:28:05.901965: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:289] Created device_to_device_stream[0] = 0x7ff5b166b1a0
2020-07-28 19:28:05.919546: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
2020-07-28 19:28:06.070724: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper c op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.070884: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled c op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.070914: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper a op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.070940: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled a op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.070957: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper b op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.070973: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled b op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.071001: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:569] GpuDevice::ComputeAsync save/RestoreV2/_1 op _Recv on GPU0 stream[0]
2020-07-28 19:28:06.071212: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:569] GpuDevice::ComputeAsync save/RestoreV2/_3 op _Recv on GPU0 stream[0]
2020-07-28 19:28:06.071320: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper save/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.071400: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled save/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.071519: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper save/Assign_1 op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.071617: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled save/Assign_1 op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.071577: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:569] GpuDevice::ComputeAsync save/RestoreV2/_5 op _Recv on GPU0 stream[0]
2020-07-28 19:28:06.071810: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper save/Assign_2 op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.071841: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled save/Assign_2 op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.072410: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2020-07-28 19:28:06.076837: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper c op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.076865: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled c op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.076875: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper b op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.076884: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled b op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.076911: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper a op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.076932: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled a op VariableV2 on GPU 0 stream[0]
2020-07-28 19:28:06.076943: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper c/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.077019: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled c/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.077041: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper b/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.077089: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled b/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.077177: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:497] GpuDevice::ComputeHelper a/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.077219: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:523] GpuDevice::ComputeHelper scheduled a/Assign op Assign on GPU 0 stream[0]
2020-07-28 19:28:06.077468: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 2026949 microseconds.
2020-07-28 19:28:06.077650: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2020-07-28 19:28:06.077823: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: half_plus_two version: 123}
2020-07-28 19:28:06.081604: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-07-28 19:28:06.084028: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
@paperspace-philip it looks like the 2.2.0-rc2 branch is built against CUDA 10/cuDNN 7.4 while the 2.2.0 release is built against CUDA 10.1 / cuDNN 7.6. This could explain the discrepancy -- can you please verify this by building 2.2.0-rc2 against 10.1/7.6 as well?
The fact that it takes 7 minutes to start up TF is a problem regardless, but knowing this has to do with CUDA or cuDNN versions will meaningfully narrow down the problem.
I built the following the containers from Dockerfiles in master:
2.2.0 Dockerfile.devel-gpu w/ TF_SERVING_VERSION_GIT_COMMIT=22760f8 (2.2.0-rc2)
docker build -t tensorflow/serving:2.2.0-rc2-10-1-devel-gpu --build-arg TF_SERVING_VERSION_GIT_COMMIT=22760f8 -f tensorflow_serving/tools/docker/Dockerfile.devel-gpu .
2.2.0 Dockerfile.gpu w/ TF_SERVING_BUILD_IMAGE=2.2.0-rc2-10-1
docker build -t tensorflow/serving:2.2.0-rc2-10-1-gpu --build-arg TF_SERVING_BUILD_IMAGE=2.2.0-rc2-10-1 -f tensorflow_serving/tools/docker/Dockerfile.gpu .
Tensorflow Serving version
root@ps5d0rynt:~# docker run --runtime=nvidia -p 8501:8501 --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -ti tensorflow/serving:2.2.0-rc2-10-1-gpu --version
TensorFlow ModelServer: 2.2.0-rc1+dev.sha.22760f84
TensorFlow Library: 2.2.0-rc3
Tensorflow output (~10 mins delay)
root@ps5d0rynt:~# docker run --runtime=nvidia -p 8501:8501 --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -ti tensorflow/serving:2.2.0-rc2-10-1-gpu
2020-07-29 21:55:34.531848: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: half_plus_two model_base_path: /models/half_plus_two
2020-07-29 21:55:34.532261: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-29 21:55:34.532392: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: half_plus_two
2020-07-29 21:55:34.632926: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2020-07-29 21:55:34.632976: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2020-07-29 21:55:34.632996: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2020-07-29 21:55:34.633103: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/half_plus_two/00000123
2020-07-29 21:55:34.633638: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-29 21:55:34.633665: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2020-07-29 21:55:34.633816: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-29 21:55:34.636975: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-29 21:55:34.657246: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-29 21:55:34.658270: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:05.0 name: Quadro P4000 computeCapability: 6.1
coreClock: 1.48GHz coreCount: 14 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 226.62GiB/s
2020-07-29 21:55:34.658298: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-07-29 21:55:34.658383: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-29 21:55:34.659305: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-29 21:55:34.660054: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-29 22:05:52.801206: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-29 22:05:52.801256: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-29 22:05:52.801266: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-29 22:05:52.801451: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-29 22:05:52.802468: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-29 22:05:52.803156: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-29 22:05:52.803804: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7451 MB memory) -> physical GPU (device: 0, name: Quadro P4000, pci bus id: 0000:00:05.0, compute capability: 6.1)
2020-07-29 22:05:52.823752: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
2020-07-29 22:05:52.996340: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2020-07-29 22:05:53.003393: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 618370291 microseconds.
2020-07-29 22:05:53.003653: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2020-07-29 22:05:53.003874: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: half_plus_two version: 123}
2020-07-29 22:05:53.008771: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-07-29 22:05:53.012420: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
I was having the same problem with the very slow initialisation with 2.2.0-gpu.
I upgraded my image to 2.3.0-gpu and the problem is resolved.
Hi all,
TF serving 2.2.0 used a slightly different build configuration that caused it to JIT PTX at startup for newer GPUs. This issue has been fixed in TF serving 2.3 and you should not be seeing the same kind of startup delay. Given https://github.com/tensorflow/serving/issues/1663#issuecomment-683384880, it looks like the fix is working as intended, so we should close this issue.
@christisg I can't close this issue so assigning to you for closing.
Most helpful comment
latest-gpu ( ~7 minute delay)
2.2.0-rc2-gpu (~7 minute delay)
2.2.0-rc2 (<1 second)