Hi all,
I was trying to keep the input and output vocabularies separate (to get an additional data point and compare performance).
I took inspiration from wsj_parsing_tokens_16k - basically by using a similar token_generator function.
This technically worked, but when decoding it was clear to me that I was only using the input vocabulary.
So I also changed feature_encoders, to return a different vocabulary for encoder and decoder.
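To make the idea concrete, here is a rough sketch of that kind of change. It is not the actual tensor2tensor code: `ToyTokenEncoder` is a hypothetical stand-in for the library's text encoders, and the token lists are made up for illustration.

```python
# Hypothetical stand-in for a token encoder; NOT the tensor2tensor class.
class ToyTokenEncoder:
    def __init__(self, tokens):
        # Map each token to its position in the vocabulary list.
        self._token_to_id = {tok: i for i, tok in enumerate(tokens)}

    @property
    def vocab_size(self):
        return len(self._token_to_id)

    def encode(self, text):
        # Whitespace-split the text and look up each token's id.
        return [self._token_to_id[tok] for tok in text.split()]


def feature_encoders(source_tokens, target_tokens):
    """Return one encoder per feature instead of reusing the source encoder."""
    return {
        "inputs": ToyTokenEncoder(source_tokens),
        "targets": ToyTokenEncoder(target_tokens),
    }


# Made-up vocabularies of different sizes (5 source tokens, 7 target tokens).
encoders = feature_encoders(
    source_tokens=["<pad>", "<EOS>", "the", "cat", "sat"],
    target_tokens=["<pad>", "<EOS>", "(S", "(NP", "(VP", ")", "XX"],
)
print(encoders["inputs"].vocab_size, encoders["targets"].vocab_size)
```

The point is only that the "inputs" and "targets" features each get their own encoder, so their vocabulary sizes can differ.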
This works in data preparation (t2t-datagen), but training crashes. My feeling is that training is using the wrong dictionaries and, since one is shorter than the other, an index ends up out of range. The error is:
ValueError: Variable symbol_modality_28_512/shared/weights_0 does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
Do you know if I am supposed to change some parameters to make this work?
Thanks,
Mirko
Check the wmt_zhen_tokens_8k example with separate source and target vocabulary.
Your crash, I believe, is due to this hyperparameter setting:
https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L296
Try running with `--hparams="shared_embedding_and_softmax_weights=0"`. Then we're not forcing the model to share source and target weights, which is impossible if the vocabularies are different (and so it crashes). I'm closing for now, but please reopen if the problem still appears!
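To illustrate why sharing cannot work here, a toy sketch in plain Python follows. It is not the tensor2tensor implementation: `make_table`, `build_weights`, and the sizes are all made up for the example. A shared table has one row per vocabulary entry, so it can only serve both sides when the two vocabularies have the same size.

```python
import random


def make_table(vocab_size, hidden_size, seed):
    # One row of hidden_size floats per vocabulary entry.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(hidden_size)]
            for _ in range(vocab_size)]


def build_weights(source_vocab_size, target_vocab_size, hidden_size, shared):
    """Return (input_embedding, output_softmax) weight tables."""
    if shared:
        # A single table must index both vocabularies, so sizes must match.
        if source_vocab_size != target_vocab_size:
            raise ValueError(
                "cannot share weights: source vocab has %d entries, "
                "target vocab has %d" % (source_vocab_size, target_vocab_size))
        table = make_table(source_vocab_size, hidden_size, seed=0)
        return table, table
    # Separate tables: each side gets its own number of rows.
    return (make_table(source_vocab_size, hidden_size, seed=0),
            make_table(target_vocab_size, hidden_size, seed=1))


# Illustrative sizes only.
embedding, softmax = build_weights(28, 40, 8, shared=False)
print(len(embedding), len(softmax))

try:
    build_weights(28, 40, 8, shared=True)
except ValueError as err:
    print("sharing failed:", err)
```

With sharing turned off, each side simply gets a table sized to its own vocabulary, which is what the hyperparameter change above allows.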
Thanks @lukaszkaiser!
That fixed it.
Check the wmt_zhen_tokens_8k example with separate source and target vocabulary.
This reply solves my problem, thank you :)