Tensor2tensor: how to reproduce "one model to learn them all" results?

Created on 21 Jun 2017 · 5 Comments · Source: tensorflow/tensor2tensor

"One Model To Learn Them All" covers many tasks, but each T2T problem is a single task.
How do I run MultiModel training on several problems with T2T?
Thanks a lot!

zzkszzks

👍4

Most helpful comment

Reproducing the MultiModel is not as simple as translation for 2 reasons: (1) you need to get the data (and some are not free and need pre-processing), and (2) it takes over a week to train even in a large distributed setup. I can help guide you through both if you're interested though :).

As for data, 5 of the 8 problems are hooked up in the generator (t2t-datagen), so it should do the downloading and pre-processing for you. These are: image_mscoco_tokens_8k_tune, wmt_ende_tokens_8k, wmt_ende_tokens_8k_rev, wmt_enfr_tokens_8k, wmt_enfr_tokens_8k_rev. For wmt_parsing_tokens_8k you can also use the generator, but you'll need to get the Penn Treebank and maybe hack a little bit, similar to wsj_parsing. For image_imagenet you can follow the instructions here: https://github.com/tensorflow/models/tree/master/inception -- it's the same file we use at the end. The speech data might be the hardest; we got some help with this from friends who used Kaldi to transform the WSJ corpus into the frequency domain.
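As a rough sketch of the generator step for those five problems, the snippet below only builds and prints one t2t-datagen command per problem; it does not run anything. The flag names (--data_dir, --tmp_dir, --problem) and the directory paths are assumptions based on the T2T README of that period, so check them against t2t-datagen --help for your installed version. Whether the _rev variants need their own generator run is also something to verify.

```python
import shlex

# Problems the answer above lists as already hooked up in t2t-datagen.
GENERATOR_PROBLEMS = [
    "image_mscoco_tokens_8k_tune",
    "wmt_ende_tokens_8k",
    "wmt_ende_tokens_8k_rev",
    "wmt_enfr_tokens_8k",
    "wmt_enfr_tokens_8k_rev",
]


def datagen_command(problem, data_dir, tmp_dir):
    """Build the argv list for one t2t-datagen run.

    The flag names are assumed, not taken from the thread.
    """
    return [
        "t2t-datagen",
        f"--data_dir={data_dir}",
        f"--tmp_dir={tmp_dir}",
        f"--problem={problem}",
    ]


# Hypothetical directories; replace with your own.
for name in GENERATOR_PROBLEMS:
    print(shlex.join(datagen_command(name, "/data/t2t", "/tmp/t2t_datagen")))
```

Using one shared data directory for every problem fits the reminder below about keeping the same tokenizer and vocabulary across problems.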

So yeah, getting this data together (and remember to use the same tokenizer and vocabulary!) is a bit of work. And then comes the training, which is in principle simple: just set $MODEL=multimodel, $HPARAMS=multimodel_1p8 and $PROBLEM=audio_wsj_tokens_8k_test-image_mscoco_tokens_8k_tune-wmt_parsing_tokens_8k-wmt_ende_tokens_8k-wmt_ende_tokens_8k_rev-wmt_enfr_tokens_8k-wmt_enfr_tokens_8k_rev-image_imagenet -- except that even on 8 GPUs that'd take a very long time. I'm actively working on making it faster, will be checking in updates to the model, but it's a lot of data to go through.
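To make the $PROBLEM value less error-prone, here is a minimal sketch that joins the eight problem names with hyphens and assembles a t2t-trainer command line. The model, hparams set and problem names come from the comment above; the flag names (--data_dir, --problems, --model, --hparams_set, --output_dir) and the paths are assumptions based on the README of that era and may differ in your version.

```python
import shlex

# The eight problems of the full MultiModel setting, in the order given above.
MULTIMODEL_PROBLEMS = [
    "audio_wsj_tokens_8k_test",
    "image_mscoco_tokens_8k_tune",
    "wmt_parsing_tokens_8k",
    "wmt_ende_tokens_8k",
    "wmt_ende_tokens_8k_rev",
    "wmt_enfr_tokens_8k",
    "wmt_enfr_tokens_8k_rev",
    "image_imagenet",
]


def trainer_command(problems, model, hparams_set, data_dir, output_dir):
    """Build the argv list for t2t-trainer (flag names are assumed)."""
    problem_flag = "-".join(problems)  # multi-problem runs use a hyphen-joined name
    return [
        "t2t-trainer",
        f"--data_dir={data_dir}",
        f"--problems={problem_flag}",
        f"--model={model}",
        f"--hparams_set={hparams_set}",
        f"--output_dir={output_dir}",
    ]


# Hypothetical directories; replace with your own.
print(shlex.join(trainer_command(
    MULTIMODEL_PROBLEMS, "multimodel", "multimodel_1p8",
    "/data/t2t", "/train/multimodel")))
```

The same helper works for a smaller run: pass a shorter list of problem names and a different model and hparams set.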

The above describes how to replicate the full 8-problem setting. If you want to try something smaller-scale, I think it's interesting to try the Transformer on multi-lingual translation. For example, just use the README instructions with $PROBLEM=wmt_ende_tokens_32k-wmt_ende_tokens_32k_rev-wmt_enfr_tokens_32k-wmt_enfr_tokens_32k_rev -- that 4-way translation model could be a good warmup before going all-in on the 8-problem one.

lukaszkaiser on 21 Jun 2017

👍9 ❤4

All 5 comments

Thank you for the detailed answer!

zzkszzks on 22 Jun 2017

Hi,

Just wondering whether the Transformer four-way translation operates like the multi-modal model, in the sense that it applies a pre-layer before the Transformer for each language? Or does it work by using side constraints, such as adding an extra target-language token to the input?

Best,
Colman

colmantse on 11 Aug 2017

Dear Lukasz,
I also tried to reproduce these results with the MultiModel, but by initializing the model in a Python script. I used the file tensor2tensor/models/multimodel_test.py as a starting point, but I got stuck.
Instead of

hparams.problems = [p_hparams]

I can use a list with, for example, two problems:

hparams.problems = [p1_hparams, p2_hparams]

with parameters from two different problems. But how do I feed two inputs to this model?

There is also one point I don't understand. As I understand the paper, every input and output of the MultiModel takes data from other models. For example, features from a CNN encoder are fed to the input, processed in the MultiModel together with features from the other models' inputs, and decoded at the output by a caption-generating LSTM network.
But in the example script mentioned above I found only the MultiModel initialization and a copy of a problem class for it. Even if I could feed several problems into the MultiModel, the problem class doesn't define which encoder the program feeds data through into the MultiModel, or which decoder decodes the feature vector.
I would be grateful if you could explain these details of how tensor2tensor works, and how to run the MultiModel in a Python script for training and prediction.