TensorRT: BERT Building Error inside a container.

Created on 26 Aug 2019  ·  14 Comments  ·  Source: NVIDIA/TensorRT

While building a fine-tuned BERT model from the example, I encountered "Cuda failure: 209" followed by "Aborted (core dumped)".

OS: Ubuntu 16.04
CUDA: 10.0
cuDNN: 7.6.3
TensorRT: 5.1.5

This is the output of building the (downloaded) fine-tuned BERT model,
using the following build command:

python python/bert_builder.py -m /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/model.ckpt-8144 -o bert-base-384.engine -b 1 -s 384 -c /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2

[TensorRT] INFO: Applying generic optimizations to the graph for inference.
[TensorRT] INFO: Original: 99 layers
[TensorRT] INFO: After dead-layer removal: 98 layers
[TensorRT] INFO: After scale fusion: 98 layers
[TensorRT] INFO: After vertical fusions: 98 layers
[TensorRT] INFO: After swap: 98 layers
[TensorRT] INFO: After final dead-layer removal: 98 layers
[TensorRT] INFO: After tensor merging: 98 layers
[TensorRT] INFO: After concat removal: 98 layers
[TensorRT] INFO: Graph construction and optimization completed in 0.00269389 seconds.
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.03456
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.042368
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.025376
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 1) [Fully Connected](15)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 1) [Fully Connected](6)
[TensorRT] INFO: Tactic 0 time 0.721664
[TensorRT] INFO: Tactic 1 time 0.677728
[TensorRT] INFO: --------------- Chose 6 (1)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 1) [Fully Connected](15)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 1) [Fully Connected](6)
[TensorRT] INFO: Tactic 4 time 0.88864
[TensorRT] INFO: Tactic 8 time 0.898816
[TensorRT] INFO: Tactic 5 time 1.05504
[TensorRT] INFO: Tactic 9 time 1.05827
[TensorRT] INFO: --------------- Chose 6 (4)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.093056
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.056096
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 2) [PluginV2Ext](34)
Cuda failure: 209
Aborted (core dumped)
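
The log stops while TensorRT is timing a plugin layer, and that can be checked mechanically. The following is a minimal Python sketch, not part of the sample: `find_failing_layer` and the two regular expressions are hypothetical helpers written only for the log format shown above. It returns the last layer that was being timed before the `Cuda failure` line, together with the error code.

```python
import re

# Hypothetical patterns, written for the log format shown in this issue only.
TIMING_RE = re.compile(r"-+ Timing (.+)\((\d+)\)\s*$")
FAILURE_RE = re.compile(r"^Cuda failure: (\d+)")


def find_failing_layer(log_lines):
    """Return (layer, error_code) for the first 'Cuda failure' line.

    'layer' is the most recent layer TensorRT reported timing before the
    failure, or None if no timing line was seen. Returns None when the log
    contains no 'Cuda failure' line at all.
    """
    last_layer = None
    for raw in log_lines:
        line = raw.strip()
        timing = TIMING_RE.search(line)
        if timing:
            last_layer = timing.group(1).strip()
            continue
        failure = FAILURE_RE.match(line)
        if failure:
            return last_layer, int(failure.group(1))
    return None


if __name__ == "__main__":
    sample = [
        "[TensorRT] INFO: --------------- Timing <reformat>(9)",
        "[TensorRT] INFO: Tactic 0 time 0.056096",
        "[TensorRT] INFO: --------------- Timing (Unnamed Layer* 2) [PluginV2Ext](34)",
        "Cuda failure: 209",
        "Aborted (core dumped)",
    ]
    print(find_failing_layer(sample))
```

For the log in this issue, the sketch points at `(Unnamed Layer* 2) [PluginV2Ext]` with code 209. In the CUDA driver API, error 209 is `CUDA_ERROR_NO_BINARY_FOR_GPU`, which would be consistent with the plugin's kernels not having been compiled for the GPU in use.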

All 14 comments

What is your hardware configuration?

This is the GPU spec:
```
description: 3D controller
product: GK210GL [Tesla K80]
vendor: NVIDIA Corporation
physical id: 1
bus info: pci@86e4:00:00.0
version: a1
width: 64 bits
clock: 33MHz
```

And the CPU specs:
```
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 2596.995
BogoMIPS: 5193.99
```

Also, I tried several parameter values when building the BERT engine, such as batch size and sequence length.

The BERT sample is set up to build on SM 70 and SM 75. Adjusting the CMAKE_CUDA_FLAGS in CMakeLists.txt to target your device may help.

As an update to my earlier comment, the sample uses fp16 intrinsics, and so will not build on K80
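
The flag adjustment mentioned above amounts to adding one `-gencode` entry per target architecture. Here is a minimal Python sketch of how such a flag string is composed. The helper names `gencode_flag` and `gencode_flags` are hypothetical and are not part of the sample or of CMake. As the update notes, adding a target does not help when the kernels rely on intrinsics the architecture lacks.

```python
def gencode_flag(major, minor):
    """Build one nvcc -gencode entry for a compute capability (major, minor)."""
    arch = f"{major}{minor}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


def gencode_flags(capabilities):
    """Join -gencode entries for several (major, minor) compute capabilities."""
    return " ".join(gencode_flag(major, minor) for major, minor in capabilities)


if __name__ == "__main__":
    # SM 70 and SM 75 are the targets the sample is reported to build for.
    print(gencode_flags([(7, 0), (7, 5)]))
```

The resulting string is what would be appended to `CMAKE_CUDA_FLAGS` in `CMakeLists.txt`.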

Closing this because this was already solved.

May I know how you resolved the error? I am getting the same error. I am building on a P100.

Thanks

@jageshmaharjan Could you share how you resolved this? I am getting this exact error and I tried editing CMAKE_CUDA_FLAGS in CMakeLists.txt to target my GTX 1080Ti GPUs, but still no luck.

Hi @Darshcg @tripti-singhal ,

Per Dilip's earlier comment and the BERT demo requirements section (https://github.com/NVIDIA/TensorRT/tree/master/demo/BERT#requirements), I believe the BERT sample will only work on Compute Capability >= 7.0 (Volta, Turing, Ampere+).

@Darshcg P100 (Pascal) is Compute Capability 6.0
@tripti-singhal 1080 Ti (Pascal) is Compute Capability 6.1
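
The requirement above can be expressed as a simple tuple comparison. This is a minimal Python sketch rather than anything the demo ships. The table and function names are hypothetical; the K80, V100, and T4 entries are added here for contrast and do not come from this comment.

```python
# Minimum compute capability stated above for the BERT demo.
MIN_CAPABILITY = (7, 0)

# Compute capabilities as (major, minor). P100 and 1080 Ti are from the
# comment above; K80, V100, and T4 are added here for illustration.
GPU_CAPABILITIES = {
    "Tesla K80": (3, 7),
    "Tesla P100": (6, 0),
    "GeForce GTX 1080 Ti": (6, 1),
    "Tesla V100": (7, 0),
    "Tesla T4": (7, 5),
}


def meets_requirement(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) capability is at least the minimum."""
    return tuple(capability) >= tuple(minimum)


def unsupported_gpus(capabilities=GPU_CAPABILITIES):
    """Return the sorted names of GPUs below the minimum capability."""
    return sorted(
        name for name, cap in capabilities.items() if not meets_requirement(cap)
    )


if __name__ == "__main__":
    for name, cap in GPU_CAPABILITIES.items():
        status = "ok" if meets_requirement(cap) else "below 7.0"
        print(f"{name}: {cap[0]}.{cap[1]} -> {status}")
```

Comparing tuples avoids the pitfall of comparing version strings or floats, where a capability such as 6.1 could be mishandled.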

Hi @rmccorm4,

Thank you for your response. May I know what to change in the line below for P100?
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -O3")

Thank you

Hi @Darshcg ,

The custom plugins (CUDA kernels) used in this sample and the models associated with it depend on newer hardware features (SM >= 70), and will not run on P100.

However, it should be possible to take a standard TF / PyTorch BERT reference implementation, export it to ONNX (preferably ONNX opset 11) with tf2onnx/torch.onnx.export, and convert that to a TensorRT 7.1 engine.

That reference model will likely be slower than this sample's model after being converted to TensorRT, but should still be functional.

Hi @rmccorm4,
This post (https://github.com/NVIDIA/TensorRT/issues/106#issuecomment-528265655) says that it works on P100.
It also worked on a Tesla K80, according to https://github.com/NVIDIA/TensorRT/issues/99#issuecomment-531546806

@Darshcg are you running the sample from the master branch, or from an earlier branch? That comment seems to be from the TensorRT 5.1 version of the sample.

@rmccorm4, I followed this post (https://developer.nvidia.com/blog/nlu-with-tensorrt-bert/) exactly. It throws the error while building the engine.

@rmccorm4, when I followed the https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT/trt repo, the engine was built without problems, but inference threw the same error.

With TRT 7, the engine builds on P100, but running inference throws the exact same error.

@rmccorm4, I think I should try with TRT 6 on P100.

nvcr.io/nvidia/tensorrt:19.05-py3: Container with TRT v5.1 with CUDA 10.1.
nvcr.io/nvidia/tensorrt:20.05-py3: Container with TRT v7.0.0.1 with CUDA 10.2
But what about a TRT v6 container?
