Tensorrt: RNN's output is different from tensorflow lstm

Created on 6 Feb 2020 · 4Comments · Source: NVIDIA/TensorRT

Description

Hello, Thanks for developing TensorRT Engine. I saw something strange outputs from RNNv2 layer.

It's really strange. I'm working with TRT python api to convert tf seq2seq model. My sequence_length of rnn is 33. The output shape is (batch_size, 33, 512). The first output [0, 0,:] was the same as the output [0,0 ,:] of tensorflow. However, values started to differ from the next sequence.

The rnn input of INetwork have a same value as the rnn input of tensorflow.

The average errors of 33 time-steps are as follows.

for i in range(33): 
    print(i, np.mean(np.abs(inet_out[i, :] - tf_out[i, :])))

```
0 1.2609632e-07 <== wow ...!
1 0.0015718038
2 0.010808524
3 0.010692278
4 0.012011731
5 0.016582778
6 0.021841913
7 0.026234297
8 0.028343435
9 0.027489826
10 0.023572126
11 0.026155617
12 0.029127385
13 0.036912326
14 0.04262212
15 0.038910516
16 0.027831417
17 0.024144478
18 0.022921633
19 0.02656816
20 0.033705484
21 0.040616006
22 0.039923944
23 0.030223113
24 0.029302634
25 0.030033736
26 0.03281911
27 0.041052207
28 0.041627046
29 0.035327498
30 0.036981985
31 0.03735178
32 0.030929128

<!-- A clear and concise description of the bug or issue. -->


## Environment

**TensorRT Version**:  7.0.0.11
**GPU Type**: T4
**Nvidia Driver Version**: 440.33
**CUDA Version**: 10.2.89
**CUDNN Version**: 7.6.5
**Operating System + Version**: Ubuntu 18.04
**Python Version (if applicable)**: 3.6
**TensorFlow Version (if applicable)**: 1.14.0 


## Relevant Files
TF RNN
```python
    cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=kernel_initializer)

    outputs_fw, state_fw = tf.nn.dynamic_rnn(
        cell_fw, bottom_sequence,
        sequence_length=sequence_length,
        time_major=True,
        dtype=tf.float32,
        scope=scope + '/fw')

    return outputs_fw

INetworkDefinition

def encoder(net, w_dict, cnn_out):
    permute_layer = net.add_shuffle(cnn_out)
    permute_layer.reshape_dims = trt.Dims3(0, 512, 33)  # (-1, 512, 33)
    permute_layer.second_transpose = trt.Permutation([0, 2, 1])  # (-1, 33, 512)

    bilstm_enc = net.add_rnn_v2(permute_layer.get_output(0),
                                layer_count=1,
                                hidden_size=512,
                                max_seq_length=33,
                                op=trt.RNNOperation.LSTM)
    bilstm_enc.direction = trt.RNNDirection.UNIDIRECTION
    bilstm_enc.input_mode = trt.RNNInputMode.LINEAR

In my opinion, the weight conversion seems to be correct since the first time-step output came out correctly.

Any Ideas..?

Happy to see you again Ryan :p

Python TensorFlow RNN bug

Source

dhkim0225

All 4 comments

Hi @dhkim0225 ,

Sorry for the delay, I'll start looking into this tomorrow.

rmccorm4 on 13 Feb 2020

Yup, always, thanks a lot !! :)

dhkim0225 on 13 Feb 2020

🚀1

Hi @dhkim0225 ,

Would it be possible for you to make a similar repro but comparing TensorRT RNNv2 output to TF's cudnnLSTM op (https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/cudnn_rnn/CudnnLSTM), and see if that still has the issue?

rmccorm4 on 14 Feb 2020

@rmccorm4

Sorry for late reply.
Finally, I solved a problem.
It was because of forget_bias.

I didn't understand following statement in tensorrt documentation.