Tensor2tensor: Language Modeling Does Not Sample (even with sampling_method=random)

Created on 22 Jun 2018 · 6 Comments · Source: tensorflow/tensor2tensor

Description

We tried running language modeling with languagemodel_ptb10k and transformer_small, as recommended in the README. There were no errors and the TensorBoard training curves looked fine, but the decoder output is something like "the the the the the the the" (and identical every time).

We looked through the code and found --hparams='sampling_method=random', but it still seems to be argmaxing instead of sampling (or maybe something else is wrong?). We have also tried with languagemodel_ptb_characters and with transformer_base and attention_lm with similar results (no sampling, same degenerate output every time).

Is there some flag that we are missing? Code below.

Thanks for the help in advance!

...

Environment information

OS:  Ubuntu 14.04

$ pip freeze | grep tensor
tensor2tensor==1.6.5
tensorboard==1.8.0
tensorflow==1.8.0

$ python -V
# Python 3.6.5 :: Anaconda, Inc.

For bugs: reproduction and error logs

Steps to reproduce:

...


PROBLEM=languagemodel_ptb10k
MODEL=transformer
HPARAMS=transformer_small

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

BEAM_SIZE=4
ALPHA=0.6

t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='sampling_method=random' \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=input.txt \
  --decode_to_file=output.txt

input.txt is a blank file with a dozen empty lines

bug

hughbzhang

All 6 comments

If you trace the hparams through the various layers of modification, you'll see transformer_small -> transformer_base -> transformer_base_v2 -> transformer_base_v1 -> common_hparams.basic_params1. In basic_params1, sampling_method is set to argmax: https://github.com/tensorflow/tensor2tensor/blob/6969fab42200a7da11bc40c9537b76b0a204b46a/tensor2tensor/layers/common_hparams.py#L90 and it is never changed as the hparam set is modified into transformer_small. The same is true for transformer_base and the preset hparams in attention_lm.py.
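As a rough illustration of that chain, here is a minimal plain-Python sketch. The function names mirror the hparams sets mentioned above, but the bodies, the `example_size` key, and the `apply_overrides` helper are hypothetical stand-ins, not the tensor2tensor API. It only shows how a default set at the bottom of the chain survives every layer unless something overrides it explicitly.

```python
# Hypothetical stand-ins for the hparams chain; NOT the tensor2tensor API.
def basic_params1():
    # The base set is where sampling_method gets its default.
    return {"sampling_method": "argmax", "example_size": 1}


def transformer_base():
    hparams = basic_params1()
    hparams["example_size"] = 2  # illustrative value only
    return hparams


def transformer_small():
    hparams = transformer_base()
    hparams["example_size"] = 3  # illustrative value only
    return hparams


def apply_overrides(hparams, overrides):
    """Apply a comma-separated 'key=value' string, in the spirit of --hparams."""
    updated = dict(hparams)
    for pair in overrides.split(","):
        key, value = pair.split("=", 1)
        if key not in updated:
            raise KeyError(f"unknown hparam: {key}")
        updated[key] = value
    return updated


defaults = transformer_small()
overridden = apply_overrides(defaults, "sampling_method=random")
print(defaults["sampling_method"], "->", overridden["sampling_method"])
```

Printing the final hparams in this way is one quick check that an override actually reached the set the decoder uses.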

s-xie on 22 Jun 2018

Stanley, thanks for your response!

We saw that hyperparameter and tried to change it on t2t-decoder (we also tried it on t2t-trainer, but that didn't work, and we thought maybe it's not necessary since you don't sample at train time anyway).

I also took the nuclear option of installing tensor2tensor from source and manually changing the default to sampling_method="random", # "argmax" or "random", in case the hyperparameter wasn't being passed through, but the results are all the same.

hughbzhang on 22 Jun 2018

Have you tried logging/printing some things around here: https://github.com/tensorflow/tensor2tensor/blob/a4fa55a3f128753d006d26ba8691eb97d14fbcfc/tensor2tensor/utils/t2t_model.py#L1087
to see what the distribution you're sampling from looks like? Does the code get to this function?
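To make the argmax-versus-sampling distinction concrete, here is a small standalone sketch using only the Python standard library. The helper names (`softmax`, `pick_token`) and the logits are made up for illustration, and treating a temperature of 0 as argmax is an assumption of this sketch rather than a statement about the linked code.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, sampling_method="argmax", temperature=1.0, rng=random):
    """Pick a token id greedily or by sampling (hypothetical helper)."""
    # Assumption of this sketch: temperature 0 falls back to argmax.
    if sampling_method == "argmax" or temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)
greedy = [pick_token(logits) for _ in range(20)]
sampled = [pick_token(logits, "random", 1.0, rng) for _ in range(20)]

print("distribution:", [round(p, 3) for p in softmax(logits)])
print("greedy :", greedy)
print("sampled:", sampled)
```

If the logged distribution is not sharply peaked but the output never varies, that points at the greedy branch (or a different decoding path entirely) being taken.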

s-xie on 22 Jun 2018

I have found two minor issues when using a trained language model to decode a sentence.

1. The demo problem languagemodel_ptb10k generates a vocabulary file in which the word "the" has id 0, so <pad> has id 1 and <EOS> has id 2. As a result, this line passes the wrong eos_id to beam search decoding, which leads to the wrong terminal state: https://github.com/tensorflow/tensor2tensor/blob/1de75bda4bd4c98ca50bcdbcf5e94b388bf9a044/tensor2tensor/models/transformer.py#L812
2. A language model problem has only targets, so if the model decodes those target words, they will be stripped; see this line: https://github.com/tensorflow/tensor2tensor/blob/57444300243f068bad88eb5ed51a9793c4bde172/tensor2tensor/models/transformer.py#L442 . However, preprocessing automatically appends <EOS> to the targets, so the model will always decode <pad> after <EOS>. Thus nothing is output.
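The first issue can be sketched in a few lines of plain Python. The constant names, the toy vocabulary, and the `truncate_at_eos` helper are hypothetical and only mimic the id layout described above ("the" at 0, <pad> at 1, <EOS> at 2); this is not the tensor2tensor decoding code.

```python
# Ids a decoder might hard-code (assumption made for this illustration).
ASSUMED_PAD_ID = 0
ASSUMED_EOS_ID = 1

# Toy vocabulary with the layout described above; "cat" and "sat" are filler.
vocab = ["the", "<pad>", "<EOS>", "cat", "sat"]


def truncate_at_eos(token_ids, eos_id):
    """Return the ids before the first eos_id, or all ids if it never occurs."""
    if eos_id in token_ids:
        return token_ids[:token_ids.index(eos_id)]
    return list(token_ids)


def to_text(token_ids):
    return " ".join(vocab[i] for i in token_ids)


decoded = [3, 4, 2, 1, 1]  # "cat sat <EOS> <pad> <pad>"

# With the hard-coded id, the sequence is cut at <pad>, not at <EOS>.
wrong = truncate_at_eos(decoded, ASSUMED_EOS_ID)
# Looking the id up in the vocabulary cuts at the real <EOS>.
right = truncate_at_eos(decoded, vocab.index("<EOS>"))

print("hard-coded eos_id :", to_text(wrong))
print("vocabulary eos_id :", to_text(right))
```

Looking the reserved ids up in the generated vocabulary file, rather than assuming fixed positions, is one way to check whether a given problem is affected.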

Chanrom on 23 Aug 2018

Quite strange. Could anyone figure out why "the" ends up with id = 0? We can look into it but would appreciate any help fixing it!

Thanks to everyone for the debugging.

@lukaszkaiser @rsepassi

afrozenator on 26 Oct 2018

I noticed that with a beam_size of 1, decoding takes the "greedy" path; however, it then looks at the sampling_temp hyperparameter, and if I specify a value of 1.0, it seems to correctly sample random tokens (which is great). Am I correct that one needs to specify a beam_size of 1 and a non-zero sampling_temp to generate random text? If so, perhaps there should be a warning if the sampling_method is "random" but the beam_size is not 1, or if the sampling_temp is 0.
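A check along those lines might look like the sketch below. The function `sampling_warnings` is hypothetical and does not exist in tensor2tensor; it simply encodes the two conditions proposed above, which are themselves based on the observed behavior rather than on confirmed documentation.

```python
def sampling_warnings(sampling_method, beam_size, sampling_temp):
    """Return warning strings for settings that may silently disable sampling.

    Hypothetical helper: the conditions follow the observation above that
    random sampling appeared to need beam_size == 1 and a non-zero
    sampling_temp.
    """
    warnings = []
    if sampling_method != "random":
        return warnings
    if beam_size != 1:
        warnings.append(
            f"sampling_method=random but beam_size={beam_size}; "
            "beam search may be used instead of sampling."
        )
    if sampling_temp == 0:
        warnings.append(
            "sampling_method=random but sampling_temp=0; "
            "decoding may reduce to argmax."
        )
    return warnings


# beam_size=4 is taken from the reproduction script in the issue;
# sampling_temp=0.0 is an assumed value for illustration.
for message in sampling_warnings("random", beam_size=4, sampling_temp=0.0):
    print("WARNING:", message)
```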