There is a problem with the attn module:
energy = self.attn(encoder_output)
energy = hidden.dot(energy)
It seems the dot function in PyTorch 0.2 only supports 1-D vectors.
I have the same issue. Here is the stack trace
RuntimeError Traceback (most recent call last)
<ipython-input-21-153451c5590c> in <module>()
9
10 # Run the train function
---> 11 loss = train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)
12
13 # Keep track of loss
<ipython-input-17-9703d5331834> in train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length)
31 # Teacher forcing: Use the ground-truth target as the next input
32 for di in range(target_length):
---> 33 decoder_output, decoder_context, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_context, decoder_hidden, encoder_outputs)
34 loss += criterion(decoder_output[0], target_variable[di])
35 decoder_input = target_variable[di] # Next target is next input
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
<ipython-input-15-1e8710146be2> in forward(self, word_input, last_context, last_hidden, encoder_outputs)
30
31 # Calculate attention from current RNN state and all encoder outputs; apply to encoder outputs
---> 32 attn_weights = self.attn(rnn_output.squeeze(0), encoder_outputs)
33 context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x 1 x N
34
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
<ipython-input-14-3c700c5b6bb1> in forward(self, hidden, encoder_outputs)
22 # Calculate energies for each encoder output
23 for i in range(seq_len):
---> 24 attn_energies[i] = self.score(hidden, encoder_outputs[i])
25
26 # Normalize energies to weights in range 0 to 1, resize to 1 x 1 x seq_len
<ipython-input-14-3c700c5b6bb1> in score(self, hidden, encoder_output)
35 elif self.method == 'general':
36 energy = self.attn(encoder_output)
---> 37 energy = hidden.dot(energy)
38 return energy
39
/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py in dot(self, other)
629
630 def dot(self, other):
--> 631 return Dot.apply(self, other)
632
633 def _addcop(self, op, args, inplace):
/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/blas.py in forward(ctx, vector1, vector2)
209 ctx.save_for_backward(vector1, vector2)
210 ctx.sizes = (vector1.size(), vector2.size())
--> 211 return vector1.new((vector1.dot(vector2),))
212
213 @staticmethod
RuntimeError: Expected argument self to have 1 dimension(s), but has 2 at /pytorch/torch/csrc/generic/TensorMethods.cpp:23020
torch==0.2.0.post1
you can try:
energy = torch.squeeze(hidden).dot(torch.squeeze(energy))
You can also use mm() if you make sure both tensors have 2 dimensions.
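For example, both workarounds side by side (a minimal sketch; the 1 x hidden_size shapes are an assumption about what `hidden` and `energy` look like in the tutorial):

```python
import torch

hidden_size = 8
hidden = torch.randn(1, hidden_size)   # decoder state, 1 x H
energy = torch.randn(1, hidden_size)   # output of self.attn(encoder_output), 1 x H

# dot() needs 1-D inputs, so either squeeze both tensors down to vectors...
score_dot = torch.squeeze(hidden).dot(torch.squeeze(energy))

# ...or keep them 2-D and use mm() with a transpose: (1 x H) @ (H x 1) -> 1 x 1
score_mm = hidden.mm(energy.t())

print(score_dot.item(), score_mm.item())  # same scalar either way
```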
Thanks for this. It looks like the PyTorch dev team removed implicit flattening of matrices for the dot product, which is what causes this glitch. Here's the discussion:
@dhpollack can you explain mm() ?
The mm() function is normal matrix multiplication of 2d matrices. So if A is 5x3 and B is 3x5 then mm(A,B) is a 5x5 matrix and mm(B,A) is a 3x3 matrix.
But ultimately I think that bmm() should be used because it's the same thing but allows for batches. I reworked the example on my computer. I'll post a snippet tomorrow.
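The shape behavior of mm() and bmm() described above can be checked directly (a quick sketch using the 5x3 / 3x5 example from the comment):

```python
import torch

A = torch.randn(5, 3)
B = torch.randn(3, 5)

# plain 2-D matrix multiplication
print(A.mm(B).shape)  # torch.Size([5, 5])
print(B.mm(A).shape)  # torch.Size([3, 3])

# bmm() is the batched version: it multiplies matching 2-D slices
# along a leading batch dimension
A_batch = torch.randn(4, 5, 3)
B_batch = torch.randn(4, 3, 5)
print(torch.bmm(A_batch, B_batch).shape)  # torch.Size([4, 5, 5])
```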
I'm wondering about speed. Is there an easy way to invert one of the matrices? If they're two N x 1 matrices, shouldn't that be a quick fix?
Yes, that is an easy fix, but it's more efficient to avoid for loops. I was playing with this code for a different application, but you can see below how I avoid for loops in the Attn class:
https://gist.github.com/dhpollack/c4162aa9d29eec20df8c77d9273b651b#file-pytorch_attention_audio-py-L314
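In the same spirit as that gist, a loop-free version of the 'general' score can be sketched like this (the tensor names and batch-first shapes here are illustrative assumptions, not the gist's exact code):

```python
import torch
import torch.nn as nn

seq_len, batch, hidden_size = 10, 4, 16

hidden = torch.randn(batch, 1, hidden_size)                 # decoder state, B x 1 x H
encoder_outputs = torch.randn(batch, seq_len, hidden_size)  # B x S x H
attn = nn.Linear(hidden_size, hidden_size, bias=False)      # the 'general' projection

# project all encoder outputs at once, then a single bmm replaces
# the per-timestep for loop: (B x 1 x H) @ (B x H x S) -> B x 1 x S
energies = torch.bmm(hidden, attn(encoder_outputs).transpose(1, 2))
attn_weights = torch.softmax(energies, dim=2)               # B x 1 x S

# and the context vector follows from a second bmm: B x 1 x H
context = torch.bmm(attn_weights, encoder_outputs)
print(attn_weights.shape, context.shape)
```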
thanks so much for this snippet, super helpful to see.
So, just to be clear, your method reshapes the matrix, then conducts batch matrix multiplication, correct? That's what I previously meant by invert; I should have said transpose!
Yes, you could transpose an N x 1 vector and then use the for loop.
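That is, transposing one side makes the shapes line up for mm() (a small sketch; the N x 1 shapes are assumed for illustration):

```python
import torch

v = torch.randn(5, 1)  # an N x 1 "matrix"
w = torch.randn(5, 1)

# transpose one side: (1 x N) @ (N x 1) -> a 1 x 1 result,
# equivalent to the dot product of the two underlying vectors
score = v.t().mm(w)
print(score.shape)  # torch.Size([1, 1])
```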
@dhpollack
Last question, promise: is there a major difference between your solution and using torch.squeeze(vector1).dot(torch.squeeze(vector2))?
I was attempting to do all the multiplications in one shot and to avoid squeeze/unsqueeze/view/cat operations as much as possible. I think avoiding those should make things faster, but I haven't tested it.