While repeating the demo of BERT large
python3 builder.py -m /workspace/TensorRT/demo/BERT/models/fine-tuned/bert_tf_v2_large_fp16_128_v2/model.ckpt-8144 -o /workspace/TensorRT/demo/BERT/engines/bert_large_128.engine -b 1 -s 128 --fp16 -c /workspace/TensorRT/demo/BERT/models/fine-tuned/bert_tf_v2_large_fp16_128_v2
results in error:
[TensorRT] INFO: Using configuration file: /workspace/TensorRT/demo/BERT/models/fine-tuned/bert_tf_v2_large_fp16_128_v2/bert_config.json
[TensorRT] INFO: Found 394 entries in weight map
[TensorRT] ERROR: ../builder/cudnnBuilderGraphNodes.cpp (743) - Misc Error in reportPluginError: 0 (could not find any supported formats consistent with input/output data types)
[TensorRT] ERROR: ../builder/cudnnBuilderGraphNodes.cpp (743) - Misc Error in reportPluginError: 0 (could not find any supported formats consistent with input/output data types)
[TensorRT] INFO: build engine in 2.248 Sec
Traceback (most recent call last):
File "builder.py", line 691, in
main()
File "builder.py", line 682, in main
with build_engine(args.batch_size, args.workspace_size, args.sequence_length, config, weights_dict, args.squad_json, args.vocab_file, calib_cache, args.calib_num) as engine:
AttributeError: __enter__
If removing '--fp16' fp32 build works without a problem.
Thanks!
TensorRT Version: 7.1.3.4 (tar installation)
GPU Type: V100 PCIE 16GB
Nvidia Driver Version: 450.36.06
CUDA Version: 11.0
CUDNN Version: 8
Operating System + Version: ubuntu 16.04 4.4.0-184-generic
Python Version (if applicable): 3.7
TensorFlow Version (if applicable): 1.15
PyTorch Version (if applicable): 1.6
Baremetal or Container (if container which image + tag): baremetal
I just followed the README in demo/BERT, except my environment is bare-metal not container.
Same problem,+10086
Same problem,+10087.
@srqj are you using master branch or the release/7.1 branch?
@srqj @MrRace @winself can you please confirm you are using TOT (specifically if you have this commit: https://github.com/NVIDIA/TensorRT/commit/4b918415339e4210013021616cbb571b1776959c)
@srqj are you using master branch or the
release/7.1branch?
I use the release/7.1
@srqj @MrRace @winself can you please confirm you are using TOT (specifically if you have this commit: 4b91841)
My builder.py :
@srqj are you using master branch or the
release/7.1branch?
Thanks for your reply, yes I'm using the master branch 6be3234.
@srqj @MrRace @winself can you please confirm you are using TOT (specifically if you have this commit: 4b91841)
Thanks for your reply! Yes, my branch (master 6be3234) contains the commit 4b91841.
@srqj @MrRace @winself
Add full_mask filed will pass the engine build step.
@ttyio @rajeevsrao Plz fix the bug.....
In embLayerNormPlugin.cpp, full_mask field is still used which is removed in 4b91841.
output_fp16 = trt.PluginField("output_fp16", np.array([1 if config.use_fp16 else 0]).astype(np.int32), trt.PluginFieldType.INT32)
full_mask = trt.PluginField("full_mask", np.array([1 if config.use_fp16 else 0]).astype(np.int32), trt.PluginFieldType.INT32)
mha_type = trt.PluginField("mha_type_id", np.array([get_mha_dtype(config)], np.int32), trt.PluginFieldType.INT32)
pfc = trt.PluginFieldCollection([wbeta, wgamma, wwordemb, wtokemb, wposemb, output_fp16, full_mask, mha_type])
fn = emln_plg_creator.create_plugin("embeddings", pfc)
Hello @fumihwh , thanks for helping.
Here is more details:
In release/7.1, the multi-head-attention plugin can accelerated using fused kernel (using tensorcore) when match condition:
the the full_mask is used for embLayerNorm plugin to generate mask that support the tensorcore layout. And we have a bug in the release/7.1 , we only check the first two condition in the plugin.
We found this issue after TRT 7.1 is released to devzone, before open source 7.1 release. To keep the consistence between dev zone binary binary release and open source release, so we have to remove the sm arch check in the builder.py to make the script work, but actually we observe accuracy loss in Volta.
And the commit 4b91841 is checked in after the opensource 7.1 release, it contains fixes to other plugins/samples. However, It introduced an incompatible fix to check the sm arch for builder.py in bert. This is the reason why you see the error. We will revert this part to fix the issue in release/7.1. But the accuracy drop is still there because we will run a unfused multi-head-attention kernel run with a tensorcore mask layout...
@srqj @MrRace @winself , to get the decent accuracy, please use release/7.2, now we deprecated the fragile field full_mask, we have mha_type_id field to specify the type of multi-head-attention, and the mask layout is computed in one place only (inside the plugin implementation). Also we have more specified sequence length that support tensorcore if you are using Turing+ chips. Could you take a try? thanks!
@ttyio
I've tried nvcr.io/nvidia/tensorrt:20.10-py3(TensorRT 7.2.1), it works for me.
@ttyio
I've tried nvcr.io/nvidia/tensorrt:20.10-py3(TensorRT 7.2.1), it works for me.
Thank you @fumihwh
I will close issue since we have policy to close issue with no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!
Most helpful comment
@srqj @MrRace @winself
Add full_mask filed will pass the engine build step.
@ttyio @rajeevsrao Plz fix the bug.....
In
embLayerNormPlugin.cpp,full_maskfield is still used which is removed in 4b91841.