TensorRT: BERT building error inside a container

Created on 26 Aug 2019 · 14 Comments · Source: NVIDIA/TensorRT

While building a fine-tuned BERT model from the example, I encountered the error "Cuda failure: 209", followed by "Aborted (core dumped)".

OS: ubuntu 16.04
CUDA: 10.0
cudnn: 7.6.3
TensorRT: 5.1.5

This is the output of building the downloaded fine-tuned BERT model.
The build command was:

python python/bert_builder.py -m /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2/model.ckpt-8144 -o bert-base-384.engine -b 1 -s 384 -c /workspace/models/fine-tuned/bert_tf_v2_base_fp16_384_v2

[TensorRT] INFO: Applying generic optimizations to the graph for inference.
[TensorRT] INFO: Original: 99 layers
[TensorRT] INFO: After dead-layer removal: 98 layers
[TensorRT] INFO: After scale fusion: 98 layers
[TensorRT] INFO: After vertical fusions: 98 layers
[TensorRT] INFO: After swap: 98 layers
[TensorRT] INFO: After final dead-layer removal: 98 layers
[TensorRT] INFO: After tensor merging: 98 layers
[TensorRT] INFO: After concat removal: 98 layers
[TensorRT] INFO: Graph construction and optimization completed in 0.00269389 seconds.
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.03456
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.042368
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.025376
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 1) [Fully Connected](15)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 1) [Fully Connected](6)
[TensorRT] INFO: Tactic 0 time 0.721664
[TensorRT] INFO: Tactic 1 time 0.677728
[TensorRT] INFO: --------------- Chose 6 (1)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 1) [Fully Connected](15)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 1) [Fully Connected](6)
[TensorRT] INFO: Tactic 4 time 0.88864
[TensorRT] INFO: Tactic 8 time 0.898816
[TensorRT] INFO: Tactic 5 time 1.05504
[TensorRT] INFO: Tactic 9 time 1.05827
[TensorRT] INFO: --------------- Chose 6 (4)
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.093056
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing <reformat>(9)
[TensorRT] INFO: Tactic 0 time 0.056096
[TensorRT] INFO: 
[TensorRT] INFO: --------------- Timing (Unnamed Layer* 2) [PluginV2Ext](34)
Cuda failure: 209
Aborted (core dumped)

Samples hardware question

jageshmaharjan

👍2

All 14 comments

What is your hardware configuration?

moconnor725 on 26 Aug 2019

This is the GPU spec:
```
description: 3D controller
product: GK210GL [Tesla K80]
vendor: NVIDIA Corporation
physical id: 1
bus info: pci@86e4:00:00.0
version: a1
width: 64 bits
clock: 33MHz
```

And the CPU specs:
```
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 2596.995
BogoMIPS: 5193.99
```

Also, I tried several parameter values while building the BERT engine, such as batch size and sequence length.

jageshmaharjan on 26 Aug 2019

The BERT sample is set up to build on SM 70 and SM 75. Adjusting the CMAKE_CUDA_FLAGS in CMakeLists.txt to target your device may help.

DilipSequeira on 5 Sep 2019

👍1
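For readers unfamiliar with the flags mentioned above, here is a minimal sketch of how `-gencode` entries are composed from a compute capability. The helper name `gencode_flags` is hypothetical and not part of the sample. Note that later replies in this thread state that retargeting alone is not enough for pre-Volta GPUs.

```python
def gencode_flags(compute_capabilities):
    """Build nvcc -gencode flags from (major, minor) compute capabilities.

    Hypothetical helper for illustration only; the sample's CMakeLists.txt
    hard-codes these flags rather than generating them.
    """
    flags = []
    for major, minor in compute_capabilities:
        sm = f"{major}{minor}"  # e.g. (7, 0) -> "70"
        flags.append(f"-gencode arch=compute_{sm},code=sm_{sm}")
    return " ".join(flags)


# The two targets the sample is set up for (SM 70 and SM 75):
print(gencode_flags([(7, 0), (7, 5)]))
# -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75
```

The resulting string is what would be appended to CMAKE_CUDA_FLAGS; a Pascal target would follow the same pattern with compute_60 / sm_60.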

As an update to my earlier comment: the sample uses fp16 intrinsics, so it will not build on K80.

DilipSequeira on 5 Sep 2019

Closing this because this was already solved.

jageshmaharjan on 15 Sep 2019

May I know how you resolved the error? I am getting the same error while building on a P100.

Thanks

Darshcg on 14 Jul 2020

👍2

@jageshmaharjan Could you share how you resolved this? I am getting this exact error and I tried editing CMAKE_CUDA_FLAGS in CMakeLists.txt to target my GTX 1080Ti GPUs, but still no luck.

tripti-singhal on 14 Jul 2020

Hi @Darshcg @tripti-singhal ,

Per Dilip's earlier comment and the BERT demo requirements section (https://github.com/NVIDIA/TensorRT/tree/master/demo/BERT#requirements), I believe the BERT sample will only work on Compute Capability >= 7.0 (Volta, Turing, Ampere+).

@Darshcg P100 (Pascal) is Compute Capability 6.0
@tripti-singhal 1080 Ti (Pascal) is Compute Capability 6.1

rmccorm4 on 14 Jul 2020
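The requirement described above can be sketched as a simple comparison. The table below lists the GPUs named in this thread plus two Volta/Turing examples (Tesla V100 and Tesla T4, which are additions for illustration); the function name is hypothetical.

```python
# Compute capabilities as (major, minor). K80, P100 and 1080 Ti appear in
# this thread; V100 and T4 are added here as examples that meet the bar.
COMPUTE_CAPABILITY = {
    "Tesla K80": (3, 7),
    "Tesla P100": (6, 0),
    "GeForce GTX 1080 Ti": (6, 1),
    "Tesla V100": (7, 0),
    "Tesla T4": (7, 5),
}

# Minimum stated in the thread for the BERT sample: Compute Capability >= 7.0
MIN_REQUIRED = (7, 0)


def meets_bert_sample_requirement(gpu_name, minimum=MIN_REQUIRED):
    """Return True if the named GPU's compute capability is >= minimum."""
    return COMPUTE_CAPABILITY[gpu_name] >= minimum


for name in COMPUTE_CAPABILITY:
    status = "supported" if meets_bert_sample_requirement(name) else "not supported"
    print(f"{name}: {status}")
```

Tuples compare element by element, so (6, 1) is below (7, 0) even though its minor version is higher.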

Hi @rmccorm4,

Thank you for your response. May I know what to change in the line below for P100?
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -O3")

Thank you

Darshcg on 14 Jul 2020

Hi @Darshcg ,

The custom plugins (CUDA kernels) used in this sample and the models associated with it depend on newer hardware features (SM >= 70), and will not run on P100.

However, it should be possible to take a standard TF / PyTorch BERT reference implementation, export it to ONNX (preferably ONNX opset 11) with tf2onnx or torch.onnx.export, and convert that to TensorRT 7.1.

That reference model will likely be slower than this sample's model after being converted to TensorRT, but it should still be functional.

rmccorm4 on 15 Jul 2020
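The PyTorch export route suggested above might look roughly like the sketch below. It assumes PyTorch is installed and that you already have a BERT-style `torch.nn.Module`. The function name and the default input names are illustrative assumptions, not something the thread specifies.

```python
def export_bert_to_onnx(model, example_inputs, onnx_path, opset_version=11,
                        input_names=("input_ids", "attention_mask", "token_type_ids")):
    """Export a PyTorch model to ONNX (sketch, not the sample's own workflow).

    model          -- a torch.nn.Module, e.g. a reference BERT implementation
    example_inputs -- tuple of example tensors in the order forward() expects
    input_names    -- assumed names; adjust to match your model's inputs
    """
    import torch  # imported lazily so this file loads even without PyTorch

    model.eval()  # disable dropout etc. before tracing
    with torch.no_grad():
        torch.onnx.export(
            model,
            example_inputs,
            onnx_path,
            opset_version=opset_version,  # opset 11, as suggested in the thread
            input_names=list(input_names),
        )
    return onnx_path
```

The resulting file could then be handed to TensorRT's ONNX parser, for example through the trtexec tool with its --onnx option. Whether every operator converts cleanly depends on the model and the TensorRT version.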

Hi @rmccorm4,
This post (https://github.com/NVIDIA/TensorRT/issues/106#issuecomment-528265655) says that it works on P100.
It also worked on a Tesla K80, according to https://github.com/NVIDIA/TensorRT/issues/99#issuecomment-531546806

Darshcg on 16 Jul 2020

@Darshcg are you running the sample from the master branch, or from an earlier branch? That comment seems to be from the TensorRT 5.1 version of the sample.

rmccorm4 on 16 Jul 2020

@rmccorm4, I followed this post (https://developer.nvidia.com/blog/nlu-with-tensorrt-bert/) exactly. It throws the error while building the engine.

Darshcg on 16 Jul 2020

@rmccorm4, when I followed the https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT/trt repo, the engine was built successfully, but it threw the same error during inference.

With TRT 7, the engine builds on P100, but running inference throws the exact same error.

@rmccorm4, I think I should try with TRT 6 on P100.

nvcr.io/nvidia/tensorrt:19.05-py3: container with TRT v5.1 and CUDA 10.1.
nvcr.io/nvidia/tensorrt:20.05-py3: container with TRT v7.0.0.1 and CUDA 10.2.
But what about a TRT v6 container?

Darshcg on 16 Jul 2020
