There is a problem with the attn module:
energy = self.attn(encoder_output)
energy = hidden.dot(energy)
It seems the dot function in PyTorch 0.2 only supports 1-D vectors.
I have the same issue. Here is the stack trace
RuntimeError Traceback (most recent call last)
<ipython-input-21-153451c5590c> in <module>()
9
10 # Run the train function
---> 11 loss = train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)
12
13 # Keep track of loss
<ipython-input-17-9703d5331834> in train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length)
31 # Teacher forcing: Use the ground-truth target as the next input
32 for di in range(target_length):
---> 33 decoder_output, decoder_context, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_context, decoder_hidden, encoder_outputs)
34 loss += criterion(decoder_output[0], target_variable[di])
35 decoder_input = target_variable[di] # Next target is next input
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
<ipython-input-15-1e8710146be2> in forward(self, word_input, last_context, last_hidden, encoder_outputs)
30
31 # Calculate attention from current RNN state and all encoder outputs; apply to encoder outputs
---> 32 attn_weights = self.attn(rnn_output.squeeze(0), encoder_outputs)
33 context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x 1 x N
34
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
<ipython-input-14-3c700c5b6bb1> in forward(self, hidden, encoder_outputs)
22 # Calculate energies for each encoder output
23 for i in range(seq_len):
---> 24 attn_energies[i] = self.score(hidden, encoder_outputs[i])
25
26 # Normalize energies to weights in range 0 to 1, resize to 1 x 1 x seq_len
<ipython-input-14-3c700c5b6bb1> in score(self, hidden, encoder_output)
35 elif self.method == 'general':
36 energy = self.attn(encoder_output)
---> 37 energy = hidden.dot(energy)
38 return energy
39
/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py in dot(self, other)
629
630 def dot(self, other):
--> 631 return Dot.apply(self, other)
632
633 def _addcop(self, op, args, inplace):
/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/blas.py in forward(ctx, vector1, vector2)
209 ctx.save_for_backward(vector1, vector2)
210 ctx.sizes = (vector1.size(), vector2.size())
--> 211 return vector1.new((vector1.dot(vector2),))
212
213 @staticmethod
RuntimeError: Expected argument self to have 1 dimension(s), but has 2 at /pytorch/torch/csrc/generic/TensorMethods.cpp:23020
torch==0.2.0.post1
you can try:
energy = torch.squeeze(hidden).dot(torch.squeeze(energy))
You can also use mm() if you make sure both tensors have 2 dimensions.
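For example, both workarounds side by side (a minimal sketch; the 1 x hidden_size shapes are an assumption about what `hidden` and `energy` look like in the tutorial):

```python
import torch

hidden_size = 8
hidden = torch.randn(1, hidden_size)   # decoder state, 1 x H
energy = torch.randn(1, hidden_size)   # output of self.attn(encoder_output), 1 x H

# dot() needs 1-D inputs, so either squeeze both tensors down to vectors...
score_dot = torch.squeeze(hidden).dot(torch.squeeze(energy))

# ...or keep them 2-D and use mm() with a transpose: (1 x H) @ (H x 1) -> 1 x 1
score_mm = hidden.mm(energy.t())

print(score_dot.item(), score_mm.item())  # same scalar either way
```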
Thanks for this. It looks like the PyTorch dev team removed implicit flattening of matrices for the dot product, which is what causes this glitch. Here's the discussion:
@dhpollack can you explain mm() ?
The mm() function is normal matrix multiplication of 2d matrices. So if A is 5x3 and B is 3x5 then mm(A,B) is a 5x5 matrix and mm(B,A) is a 3x3 matrix.
But ultimately I think that bmm() should be used because it's the same thing but allows for batches. I reworked the example on my computer. I'll post a snippet tomorrow.
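The shape behavior of mm() and bmm() described above can be checked directly (a quick sketch using the 5x3 / 3x5 example from the comment):

```python
import torch

A = torch.randn(5, 3)
B = torch.randn(3, 5)

# plain 2-D matrix multiplication
print(A.mm(B).shape)  # torch.Size([5, 5])
print(B.mm(A).shape)  # torch.Size([3, 3])

# bmm() is the batched version: it multiplies matching 2-D slices
# along a leading batch dimension
A_batch = torch.randn(4, 5, 3)
B_batch = torch.randn(4, 3, 5)
print(torch.bmm(A_batch, B_batch).shape)  # torch.Size([4, 5, 5])
```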
I'm wondering about speed. Is there an easy way to invert one of the matrices? If they're two N x 1 matrices, shouldn't that be a quick fix?
Yes, that is an easy fix, but it's more efficient to avoid for loops. I was playing with this code for a different application, but you can see below how I avoid for loops in the Attn class:
https://gist.github.com/dhpollack/c4162aa9d29eec20df8c77d9273b651b#file-pytorch_attention_audio-py-L314
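In the same spirit as that gist, a loop-free version of the 'general' score can be sketched like this (the tensor names and batch-first shapes here are illustrative assumptions, not the gist's exact code):

```python
import torch
import torch.nn as nn

seq_len, batch, hidden_size = 10, 4, 16

hidden = torch.randn(batch, 1, hidden_size)                 # decoder state, B x 1 x H
encoder_outputs = torch.randn(batch, seq_len, hidden_size)  # B x S x H
attn = nn.Linear(hidden_size, hidden_size, bias=False)      # the 'general' projection

# project all encoder outputs at once, then a single bmm replaces
# the per-timestep for loop: (B x 1 x H) @ (B x H x S) -> B x 1 x S
energies = torch.bmm(hidden, attn(encoder_outputs).transpose(1, 2))
attn_weights = torch.softmax(energies, dim=2)               # B x 1 x S

# and the context vector follows from a second bmm: B x 1 x H
context = torch.bmm(attn_weights, encoder_outputs)
print(attn_weights.shape, context.shape)
```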
thanks so much for this snippet, super helpful to see.
So, just to be clear, your method reshapes the matrix, then conducts batch matrix multiplication, correct? That's what I previously meant by invert; I should have said transpose!
Yes, you could transpose an N x 1 vector and then use the for loop.
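That is, transposing one side makes the shapes line up for mm() (a small sketch; the N x 1 shapes are assumed for illustration):

```python
import torch

v = torch.randn(5, 1)  # an N x 1 "matrix"
w = torch.randn(5, 1)

# transpose one side: (1 x N) @ (N x 1) -> a 1 x 1 result,
# equivalent to the dot product of the two underlying vectors
score = v.t().mm(w)
print(score.shape)  # torch.Size([1, 1])
```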
@dhpollack
Last question, promise: is there a major difference between your solution and using torch.squeeze(vector1).dot(torch.squeeze(vector2))?
I was attempting to do all the multiplications in one shot and to avoid squeeze/unsqueeze/view/cat operations as much as possible. I think avoiding those should make things faster, but I haven't tested it.