TensorRT: Is there any smart way to do one-hot encoding with the Python API?

Created on 24 Jan 2020 · 7 Comments · Source: NVIDIA/TensorRT

Description

I defined a seq2seq model using the Python API. My attention decoder needs to perform one-hot encoding at every timestep to produce the next timestep's input.

Initially, I created an identity matrix tensor and used a gather layer. The index tensor was obtained using the top-k layer (k = 1). However, since there are more than 7000 classes, a memory error occurs when building the ICudaEngine. The workspace was set to 14 GB, and the decoder's max timestep is 25.

Is there any smart way to do one-hot encoding with the TensorRT Python API?
[Please support assign/scatter operation... T^T]

My dumb code is as follows:

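import tensorrt as trt

def one_hot(net, prev_logit, i_mat):
    # prev_logit => (-1, 7021, 1, 1); i_mat => identity matrix tensor
    # Arg-max over the class axis (bit 1 of the axes bitmask).
    topk_layer = net.add_topk(input=prev_logit,
                              op=trt.TopKOperation.MAX,
                              k=1,
                              axes=(1 << 1))

    # Output 1 of the top-k layer holds the indices; flatten them to 1-D.
    reshape_pred_idx = net.add_shuffle(topk_layer.get_output(1))
    reshape_pred_idx.reshape_dims = trt.Dims([-1])
    # Select one row of the identity matrix per predicted index.
    gather_layer = net.add_gather(input=i_mat,
                                  indices=reshape_pred_idx.get_output(0),
                                  axis=0)
    return gather_layer.get_output(0)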

P.S. I don't know why I have to use axes like above. I thought "axes=(1<<0)" was right, but "axes=(1<<1)" works on the channel axis. (The Python API documentation says axes=(1<<0) is right.)

P.S.2 This is just my guess, so please let me know if I'm wrong.
Memory consumption for i_mat (my dumb one-hot encoding, num_class => 7021):
7021 x 7021 x 4 = 197177764 bytes (== 188 MB)

Since my decoder max_length == 25, the memory consumption for i_mat is
188 * 25 = 4700 MB
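
The estimate above can be reproduced with a few lines of plain Python. This is only a sketch of the poster's arithmetic: it assumes float32 storage (4 bytes per element), treats "MB" as 2**20 bytes, and assumes, as the post guesses, that one copy of the identity matrix is kept per timestep. The constant names are made up for illustration.

```python
NUM_CLASSES = 7021        # number of classes, from the post
BYTES_PER_FLOAT32 = 4     # assumption: the identity matrix is stored as float32
MAX_TIMESTEPS = 25        # decoder max length, from the post

# Size of one 7021 x 7021 identity matrix.
identity_bytes = NUM_CLASSES * NUM_CLASSES * BYTES_PER_FLOAT32
identity_mib = identity_bytes / 2**20

# Total if (and only if) a separate copy is held for every timestep.
total_mib = identity_mib * MAX_TIMESTEPS

print(identity_bytes)
print(f"{identity_mib:.1f} MiB per matrix, {total_mib:.1f} MiB for {MAX_TIMESTEPS} timesteps")
```

The total comes out marginally above 4700 because the post rounds the per-matrix size down to 188 before multiplying.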

Environment

TensorRT Version: 7.0.0.11
GPU Type: T4
Nvidia Driver Version: 440.33
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: ubuntu 18.04
Python Version (if applicable): 3.6

Python question

dhkim0225

All 7 comments

@pranavm-nvidia any idea on this?

rmccorm4 on 25 Jan 2020

Well, I'm currently trying to build it with ILoop + ISliceLayer + IConcatenationLayer.
Please let me know whether this is a good approach or not.

Pseudocode

zeros_src = trt zeros constant (7021 length)
one_src = trt 1 constant (1 length)
idx = one-hot idx (0-based)

res = list()
for i in batch:  # (ILoop layer)
    a = zeros_src[:idx]      # zeros before position idx
    b = one_src              # the single 1 at position idx
    c = zeros_src[idx + 1:]  # zeros after position idx
    res.append(Concat([a, b, c]))

dhkim0225 on 26 Jan 2020

The idea above doesn't work for me since I use a dynamic batch size... :(

ILoop is for static loops, similar to tf.while_loop, as you know.

dhkim0225 on 26 Jan 2020

Might be able to use the EQUAL elementwise op for this.
Generate a sequence tensor:
x = [0, 1, 2, 3, 4, ..., N]
and then:
x == index gives you a one-hot vector.
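
A minimal sketch of this EQUAL idea, written in plain Python rather than against the TensorRT API. The function name `one_hot_by_equality` is hypothetical, and class indices are assumed to be zero-based, so the sequence runs from 0 to num_classes - 1:

```python
def one_hot_by_equality(indices, num_classes):
    """Build one one-hot row per index by comparing against a class-id sequence."""
    class_ids = list(range(num_classes))  # x = [0, 1, ..., num_classes - 1]
    # For each index, the elementwise test (class_id == index) is true in one slot only.
    return [[1.0 if class_id == index else 0.0 for class_id in class_ids]
            for index in indices]

batch_indices = [2, 0, 3]  # e.g. arg-max results for a batch of three samples
encoded = one_hot_by_equality(batch_indices, num_classes=4)
for row in encoded:
    print(row)
```

Only the sequence of length num_classes has to be stored as a constant, not a num_classes x num_classes identity matrix, and the batch dimension comes from the indices, which is presumably why this fits a dynamic batch size better than the loop-based idea.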

Also, out of curiosity, in your original implementation, why do you need one identity matrix for each timestep? Are they different?

pranavm-nvidia on 27 Jan 2020

@rmccorm4 Long time no see :)
@pranavm-nvidia Thank you for your reply, and I'll try it now.

About your question: actually, I don't know the exact reason why it is used, since I didn't train this network, but I can guess.

The one-hot vector goes into the embedding layer after it is produced.
Maybe it's because the embedding layer is not well trained by the training logic. I think that performing one-hot encoding before the embedding layer gives a better result if the embedding layer learns a 'dumb' manifold.

It's just my guess :/ . I'll ask the model's author about it at the next meeting, a week from now.

dhkim0225 on 28 Jan 2020

Sorry. This was a stupid question.

An embedding layer normally collects vectors via a gather() function, which can be implemented simply using IGatherLayer in TensorRT.

The model I am converting to TensorRT implements the embedding layer using a fully connected layer.
Since the one-hot encoding is done just before the forward pass of the fully connected layer, I could solve the problem by applying an IGatherLayer to the fully connected layer's weight constant.

Pseudocode

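# fc_weight: NumPy array, assumed to be laid out as (num_classes, embed_dim)
src = net.add_constant(shape=fc_weight.shape, weights=fc_weight)
embedded = net.add_gather(input=src.get_output(0), indices=indices, axis=0)

So, the one-hot encoding was only there for the embedding layer.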
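
The reason the gather works is that multiplying a one-hot row by a weight matrix simply picks out one row of that matrix. A small plain-Python check of that equivalence follows; the helper names and the toy 4 x 2 weight matrix are made up for illustration, and the weights are assumed to be laid out as (num_classes, embed_dim) with no bias term:

```python
def one_hot_matmul(one_hot_rows, weight):
    """Multiply each one-hot row by weight, as a bias-free fully connected layer would."""
    num_classes, embed_dim = len(weight), len(weight[0])
    return [[sum(row[c] * weight[c][d] for c in range(num_classes))
             for d in range(embed_dim)]
            for row in one_hot_rows]

def gather_rows(weight, indices):
    """Pick the weight rows directly, which is what a gather along axis 0 does."""
    return [list(weight[i]) for i in indices]

weight = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # toy (num_classes=4, embed_dim=2)
indices = [2, 0]
one_hot_rows = [[1.0 if c == i else 0.0 for c in range(len(weight))] for i in indices]

via_matmul = one_hot_matmul(one_hot_rows, weight)
via_gather = gather_rows(weight, indices)
print(via_matmul == via_gather)
```

The gather path never materializes the num_classes-wide one-hot vectors, so neither the identity matrix nor the matrix multiply is needed. If the fully connected layer has a bias, it would still have to be added after the gather.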

dhkim0225 on 30 Jan 2020

Glad it worked out @dhkim0225, please close if your issue is resolved :)

rmccorm4 on 30 Jan 2020
