Models: tutorials/rnn/translate Matrix multiplication error

Created on 1 Jan 2017 · 30 Comments · Source: tensorflow/models

tensorflow/models/tutorials/rnn/translate/seq2seq_model.py

Hello everyone,

When I ran the translate model, I encountered the following issue:

  File "translate.py", line 294, in main
    train()
  File "translate.py", line 153, in train
    model = create_model(sess, False)
  File "translate.py", line 132, in create_model
    dtype=dtype)
  File "/Users/richard_xiong/Documents/DeepLearningMaster/RNN/seq2seq_model.py", line 181, in __init__
    softmax_loss_function=softmax_loss_function)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/seq2seq.py", line 1130, in model_with_buckets
    softmax_loss_function=softmax_loss_function))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/seq2seq.py", line 1058, in sequence_loss
    softmax_loss_function=softmax_loss_function))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/seq2seq.py", line 1022, in sequence_loss_by_example
    crossent = softmax_loss_function(logit, target)
  File "/Users/richard_xiong/Documents/DeepLearningMaster/RNN/seq2seq_model.py", line 117, in sampled_loss
    num_classes=self.target_vocab_size),
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/nn.py", line 1412, in sampled_softmax_loss
    name=name)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/nn.py", line 1219, in _compute_sampled_logits
    inputs, sampled_w, transpose_b=True) + sampled_b
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2242, in create_op
    set_shapes_for_outputs(ret)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1617, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1568, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [?], [?,1024].

It seems to be a matrix multiplication error raised inside the function 'tf.nn.seq2seq.model_with_buckets()'.

Does anyone have any ideas? Thank you!

bug

richardxiong
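
To make the error message above easier to read, here is a minimal sketch of the kind of static shape check that a matrix multiplication performs. It is a toy in plain Python, not TensorFlow's implementation: `matmul_shape` is a hypothetical helper, and `None` stands for the unknown dimension printed as `?` in the traceback. It illustrates how a rank-1 tensor (such as a batch of labels) arriving where a rank-2 tensor is expected can produce "Shape must be rank 2 but is rank 1".

```python
def matmul_shape(a_shape, b_shape, transpose_b=False):
    """Toy static shape check for a 2-D matrix product (None = unknown size)."""
    for shape in (a_shape, b_shape):
        if len(shape) != 2:
            raise ValueError(
                "Shape must be rank 2 but is rank %d" % len(shape))
    if transpose_b:
        b_shape = (b_shape[1], b_shape[0])
    inner_a, inner_b = a_shape[1], b_shape[0]
    if inner_a is not None and inner_b is not None and inner_a != inner_b:
        raise ValueError(
            "Inner dimensions differ: %r vs %r" % (inner_a, inner_b))
    return (a_shape[0], b_shape[1])


# Intended call: inputs [batch, 1024] times sampled weights [num_sampled, 1024],
# with the second operand transposed.
correct_shape = matmul_shape((None, 1024), (None, 1024), transpose_b=True)

# Swapped call: a rank-1 tensor [batch] arrives where the inputs were expected,
# matching the input shapes [?], [?,1024] reported in the traceback.
try:
    matmul_shape((None,), (None, 1024), transpose_b=True)
    swapped_error = None
except ValueError as err:
    swapped_error = str(err)
```

In this sketch the intended call passes the check, while the swapped call fails before any data is involved, which matches the fact that the traceback is raised while the graph is being built.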

All 30 comments

I encountered the same issue with Python 3.4 and TensorFlow 0.12.1. Any input?

bxshi on 3 Jan 2017

@richardxiong If you are using r0.12, try tensorflow/models/rnn/translate/ instead.

bxshi on 3 Jan 2017

@bxshi I'm using Python 2.7, and both versions 0.12.0 and 0.12.1 have the same issue. It seems the directory has changed and the folder has been moved to tensorflow/models/tutorials/rnn/translate/

Any ideas?

richardxiong on 3 Jan 2017

@richardxiong Ahh, my bad. I did not realize that you were already using the translate.py in the main repo instead of the one from tensorflow/models.

bxshi on 3 Jan 2017

Hi Richard, can you try running it on the nightly build of TensorFlow? Let me know if that works better. A number of these models were updated with new code that may not work with r0.12 unfortunately.

nealwu on 6 Jan 2017

I have tried this with tensorflow/tensorflow’s master branch with no problem, but it does not work with r0.12 branch.

bxshi on 6 Jan 2017

Hi Neal @nealwu
I tried running it on the latest nightly build and it works fine. One thing to note is that the RNN cell module is still in contrib.rnn instead of nn.rnn_cell (it seems the folder has not been updated, but the code works).

Also for your reference, @bxshi. Thanks for the help!

richardxiong on 6 Jan 2017

@richardxiong I believe tf.contrib.rnn is the latest version of the code and tf.nn.rnn_cell is deprecated, at least in master.

nealwu on 6 Jan 2017

@nealwu Okay thanks for letting me know Neal!

richardxiong on 6 Jan 2017

@richardxiong I tried the code from the directory tensorflow/models/tutorials/rnn/translate/, but still encountered the problem. Any ideas?

zhihuizheng on 17 Jan 2017

@zhihuizheng Are you using the master branch of TensorFlow? The translate model is not compatible with version 0.12.

bxshi on 17 Jan 2017
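
Since the advice in this part of the thread depends on which TensorFlow version is installed, a small guard at the top of a script can make the mismatch obvious. The following is a minimal sketch under stated assumptions: `parse_version` and `newer_than_r012` are hypothetical helper names, and the threshold simply encodes the report above that the model does not work with 0.12. A development build may report a version string that this check misclassifies.

```python
import re


def parse_version(text):
    """Turn '0.12.1' or '1.0.0-rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def newer_than_r012(version_text):
    """True if the (major, minor) pair is strictly greater than (0, 12)."""
    return parse_version(version_text)[:2] > (0, 12)


# Hypothetical usage inside a training script:
#   import tensorflow as tf
#   if not newer_than_r012(tf.__version__):
#       raise RuntimeError("This model reportedly does not work with r0.12")
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "0.9" would sort after "0.12".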

@bxshi Oh no, I am using master+v0.12. How can I fix it?

zhihuizheng on 17 Jan 2017

@zhihuizheng, you can either use the nightly build (which I believe does not enable SSE) from

https://github.com/tensorflow/tensorflow/blob/master/README.md

Or you can first pip/pip3 uninstall tensorflow, and then compile TensorFlow under the master branch.

I think the command is

bazel build --copt=-march=native -c opt //tensorflow/tools/pip_package:build_pip_package

You can find more details on tensorflow.org.

bxshi on 17 Jan 2017

FYI, here is the guide on installing from source: https://www.tensorflow.org/get_started/os_setup#installing_from_sources

nealwu on 17 Jan 2017

I am getting the same error. Can anyone explain clearly how to get rid of it?
The error is:
ValueError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/LogUniformCandidateSampler' (op: 'LogUniformCandidateSampler') with input shapes: [?].

yashkumar6640 on 23 Feb 2017

Getting the same error.

gaojun4ever on 6 Mar 2017

Same problem for me. I tried using the nightly build version, but the problem there is that the "current" seq2seq API (as in version 1.0) is completely reworked (see this commit), which means I cannot use the nightly version without throwing away my current implementation.

@ebrevdo, what's the roadmap for the new seq2seq API? Is there already any documentation available for the new API?
@nealwu, do you know an "easy" work around for the problem until a newer version is available?

I hope it doesn't sound too demanding, but the problem is that I'm currently implementing the system for my bachelor thesis in tensorflow and this issue blocks me currently.

vongruenigen on 6 Mar 2017

For anyone who still gets this bug: on line 1022 of tensorflow/python/ops/seq2seq.py, change "softmax_loss_function(logit, target)" to "softmax_loss_function(target, logit)".

Someone swapped the order of the arguments to that function.

bangnk on 10 Mar 2017

Use TensorFlow v0.12.x and Python 3.5.x. You won't get any of these errors and everything will run smoothly, except for corpus loading and UTF/Unicode errors that can be solved easily.

yashkumar6640 on 10 Mar 2017

Also experiencing the same problem, on python 2.7. Hopefully, recompiling might save me.

fjcamillo on 14 Mar 2017

Regarding the new seq2seq API, I just pushed the last of the basics: a new rnncell decoder wrapper. Should be in master tomorrow. We'll be adding more in the coming weeks but I hope we've now got feature parity with the old legacy API.

ebrevdo on 15 Mar 2017

(well, + dynamic decoding and scheduled sampling)

ebrevdo on 15 Mar 2017

I think this change that I just merged should solve the issue: https://github.com/tensorflow/models/pull/982. It looks like the sampled_loss method in seq2seq_model.py had its arguments reversed.

nealwu on 16 Mar 2017

👍4

For anyone who is still facing this error you can change the following in seq2seq_model.py
line 103 from def sampled_loss(inputs, labels): to def sampled_loss(labels, inputs):

My current settings are:

theano: 0.8.2
tensorflow: 1.0.0
Using TensorFlow backend.
keras: 1.2.2

gerarq on 20 Mar 2017

🎉2

@gerarq looks like you are reversing that change I linked to above. Why does that fix things for you? According to https://www.tensorflow.org/api_docs/python/tf/contrib/legacy_seq2seq/model_with_buckets, softmax_loss_function should take inputs first and labels second.

nealwu on 20 Mar 2017

It looks like the problem is with our documentation. https://www.tensorflow.org/api_docs/python/tf/contrib/legacy_seq2seq/sequence_loss_by_example suggests the other order.

nealwu on 20 Mar 2017
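
Because the thread shows the positional order of the two loss arguments differing between versions and documentation pages, one possible defensive workaround is to decide which argument is which by rank rather than by position. The sketch below is only an illustration under stated assumptions: `order_by_rank` and `list_rank` are hypothetical helpers, nested lists stand in for tensors, and it assumes labels are rank 1 and inputs are rank 2, as in the shapes reported in the error message. With real tensors, a function returning the tensor's static rank would be passed in place of `list_rank`.

```python
def order_by_rank(first, second, rank_of):
    """Return (labels, inputs) whichever order the caller used.

    Assumes labels are rank 1 ([batch]) and inputs are rank 2
    ([batch, hidden]); anything else is rejected.
    """
    ranks = (rank_of(first), rank_of(second))
    if ranks == (1, 2):
        return first, second
    if ranks == (2, 1):
        return second, first
    raise ValueError(
        "Expected one rank-1 and one rank-2 argument, got ranks %r" % (ranks,))


def list_rank(value):
    """Toy rank function for nested lists (stand-in for a tensor's rank)."""
    rank = 0
    while isinstance(value, list):
        rank += 1
        if not value:
            break  # an empty list has no first element to descend into
        value = value[0]
    return rank


labels = [3, 7]                         # rank 1: one class id per example
inputs = [[0.1, 0.2], [0.3, 0.4]]       # rank 2: one vector per example

same_order = order_by_rank(labels, inputs, list_rank)
swapped_order = order_by_rank(inputs, labels, list_rank)
```

Both calls return the pair in (labels, inputs) order, so a loss function built on top of such a helper would not depend on the caller's positional convention. Matching the argument order to the installed version, as the fixes in this thread do, remains the simpler option.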

I believe this should be fixed now via https://github.com/tensorflow/models/pull/1226. If you run into further issues let me know.

nealwu on 20 Mar 2017

Thanks for taking care of that!

gerarq on 21 Mar 2017

I was facing the same issue.
@gerarq Thanks for the quick fix!