Tensor2tensor: using separate input/output vocabularies

Created on 4 Aug 2017 · 4 Comments · Source: tensorflow/tensor2tensor

Hi all,
I was trying to keep the input and output vocabularies separate (to have an additional data point and compare performance).

I took inspiration from wsj_parsing_tokens_16k - basically by using a similar token_generator function.
This technically worked, but when decoding it was clear to me that I was only using the input vocabulary.

So I also changed feature_encoders, to return a different vocabulary for encoder and decoder.
This works in data preparation (t2t-datagen), but it crashes in training. My guess is that training is using the wrong vocabularies and, since one is shorter than the other, an index ends up out of range. The error is:

ValueError: Variable symbol_modality_28_512/shared/weights_0 does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
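
For context, the feature_encoders change described above can be sketched roughly as follows. This is only an illustration: ToyVocabEncoder and the token lists are hypothetical stand-ins so the snippet runs without tensor2tensor, and the standalone feature_encoders function here takes the token lists directly, which is an assumption made for the example rather than the library's signature. The point is simply that the "inputs" and "targets" entries hold different encoders with different vocabulary sizes.

```python
class ToyVocabEncoder:
    """Hypothetical stand-in for a text encoder; not a tensor2tensor class."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._ids = {tok: i for i, tok in enumerate(self._tokens)}

    @property
    def vocab_size(self):
        return len(self._tokens)

    def encode(self, text):
        # Whitespace tokenization, then token -> id lookup.
        return [self._ids[tok] for tok in text.split()]

    def decode(self, ids):
        # id -> token lookup, joined back with spaces.
        return " ".join(self._tokens[i] for i in ids)


def feature_encoders(source_tokens, target_tokens):
    # Separate encoders for the encoder side and the decoder side.
    return {
        "inputs": ToyVocabEncoder(source_tokens),
        "targets": ToyVocabEncoder(target_tokens),
    }


# Hypothetical vocabularies of different sizes.
encoders = feature_encoders(
    ["<pad>", "<EOS>", "the", "cat", "sat"],
    ["<pad>", "<EOS>", "(S", "(NP", "(VP", ")", "DT", "NN"],
)
source_ids = encoders["inputs"].encode("the cat")
target_text = encoders["targets"].decode([2, 3, 5])
```

With this setup, decoding has to go through encoders["targets"]; going through the input encoder instead would map ids to the wrong tokens.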

Do you know if I am supposed to change some parameters to make this work?

Thanks,
Mirko

mirkobronzi

Most helpful comment

Your crash, I believe, is due to this hyperparameter setting:
https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L296

Try running with `--hparams="shared_embedding_and_softmax_weights=0"`. With that setting we no longer force the model to share source and target weights, which is impossible if the vocabularies are different (hence the crash). I'm closing for now, but please reopen if the problem still appears!
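
To illustrate why sharing fails, here is a toy model of the mechanism, not tensor2tensor code. It assumes, as the name in the error message suggests, that the variable scope is derived from the vocabulary size and hidden size, and that with sharing enabled the target side tries to reuse an existing variable instead of creating one. ToyVariableStore, build_modalities, and the vocabulary sizes 40 and 28 are all hypothetical.

```python
class ToyVariableStore:
    """Hypothetical, simplified variable store (not the TensorFlow one)."""

    def __init__(self):
        self.variables = {}

    def get_variable(self, name, shape, reuse):
        if reuse:
            # Reuse means the variable must already have been created.
            if name not in self.variables:
                raise ValueError("Variable %s does not exist" % name)
            return self.variables[name]
        self.variables[name] = shape
        return shape


def build_modalities(store, source_vocab, target_vocab, hidden_size, shared):
    # Assumption: the scope name encodes vocabulary size and hidden size.
    template = "symbol_modality_%d_%d/shared/weights_0"
    source_name = template % (source_vocab, hidden_size)
    target_name = template % (target_vocab, hidden_size)
    # The source side always creates its weights.
    store.get_variable(source_name, (source_vocab, hidden_size), reuse=False)
    # With sharing on, the target side only looks up existing weights.
    store.get_variable(target_name, (target_vocab, hidden_size), reuse=shared)
    return sorted(store.variables)


# Different vocabulary sizes, sharing disabled: two separate variables.
separate = build_modalities(ToyVariableStore(), 40, 28, 512, shared=False)

# Different vocabulary sizes, sharing enabled: the lookup fails.
try:
    build_modalities(ToyVariableStore(), 40, 28, 512, shared=True)
    sharing_error = None
except ValueError as err:
    sharing_error = str(err)
```

In this toy version, sharing only works when both sides have the same vocabulary size, because only then do both names point at the same variable.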