I want to reproduce the SOTA TTS result on LJSpeech, which is based on Transformer.v3 and has a MOS of 4.25.
Does "train_pytorch_transformer.v3.yaml" correspond to the SOTA model configuration? I notice that there is also "train_pytorch_transformer.v3.single.yaml"; is this config for single-GPU training? How many GPUs should I use, and do I have to modify these yaml files to reproduce the SOTA TTS result on LJSpeech? (Also, the provided pretrained model config is a little different from "train_pytorch_transformer.v3.yaml"; which one is closer to the posted result?)
BTW, which vocoder did you use?
Thanks!
Hi @Syrup274. I will answer your questions.
The SOTA setting is trans_type=phn, train_pytorch_transformer.v3.yaml, and full-band mel (fmin=0, fmax=11025).
The difference between train_pytorch_transformer.v3.yaml and train_pytorch_transformer.v3.single.yaml is batch_size. The *.single.yaml config can be run on a single GPU with 12 GB of memory. Since we use gradient accumulation, the results with both configs should theoretically be the same.
89-7600 and we can select phn or char for both Transformer and Tacotron 2. Therefore, the quality of Tacotron 2 and the Transformer is almost the same.
Thanks @kan-bayashi for your reply.
I have three more questions:
Thanks again for your patience.
How many GPUs did you use with "train_pytorch_transformer.v3.yaml"? (In model.json of the pretrained model it seems to be 2, but according to the batch size it seems to be 3.) Can I use more GPUs?
Three GPUs. You can accelerate the training with 6 GPUs by setting accum_grad: 1 in the config.
I didn't find "train_pytorch_transformer.v3.yaml" in tag v0.5.3. Do I have to check out v0.5.4, or can I just modify the scripts on master?
Use v0.5.4, or modify the parameters fmin and fmax in run.sh with the current master.
I noticed some differences in "decode.yaml" between master and v0.5.3. Do these extra lines affect the final result?
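Why only accum_grad needs to change: what matters is the number of utterances behind each parameter update. Below is a minimal pure-Python sketch, not ESPnet code. It assumes the effective batch is per-GPU batch × number of GPUs × accum_grad, and that the three-GPU setup uses accum_grad: 2. That value is inferred from the answer above, not read from the config. The per-GPU batch of 32 and the helper names are hypothetical. The second part checks, on a toy least-squares model, that averaging the gradients of equal-sized micro-batches reproduces the full-batch gradient. This is the sense in which accumulation gives "theoretically the same" result. With variable-length utterances and per-batch loss normalization, it holds only approximately.

```python
"""Toy sketch: effective batch size and gradient accumulation.

Assumptions (hypothetical, not taken from the ESPnet configs):
- effective batch = per-GPU batch * number of GPUs * accum_grad
- the 3-GPU setup uses accum_grad=2; 32 utterances per GPU is made up.
"""


def effective_batch(per_gpu_batch: int, ngpu: int, accum_grad: int) -> int:
    """Number of utterances contributing to one optimizer update."""
    return per_gpu_batch * ngpu * accum_grad


def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient d/dw of mean((w*x - y)^2) over one (micro-)batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float], n_micro: int) -> float:
    """Split the batch into n_micro equal micro-batches and average their gradients."""
    size = len(xs) // n_micro
    assert size * n_micro == len(xs), "micro-batches must be equal-sized"
    total = 0.0
    for i in range(n_micro):
        sl = slice(i * size, (i + 1) * size)
        total += grad_mse(w, xs[sl], ys[sl])
    return total / n_micro


if __name__ == "__main__":
    # 3 GPUs with accum_grad=2 vs 6 GPUs with accum_grad=1: same update size.
    print(effective_batch(32, 3, 2), effective_batch(32, 6, 1))

    # Accumulating over 4 micro-batches matches the full-batch gradient.
    xs = [0.5 * i for i in range(12)]
    ys = [3.0 * x + 1.0 for x in xs]
    print(abs(grad_mse(0.7, xs, ys) - accumulated_grad(0.7, xs, ys, n_micro=4)) < 1e-9)
```

The same arithmetic explains the *.single.yaml config. A single GPU can reach the same effective batch with a smaller per-step batch and a proportionally larger accum_grad, at the cost of slower training.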
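"Full-band mel" here means the filterbank spans from 0 Hz up to the Nyquist frequency of LJSpeech's 22,050 Hz audio, i.e. fmax=11025. The sketch below is illustrative only and does not read run.sh. It shows how the band edges of a mel filterbank spread over that range. It assumes the HTK mel formula mel = 2595·log10(1 + f/700); librosa defaults to the Slaney variant, so exact edges differ. The value n_mels=80 and the helper names are hypothetical.

```python
import math


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale (an assumption; librosa defaults to Slaney)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(fmin: float, fmax: float, n_mels: int) -> list[float]:
    """Return n_mels + 2 edge frequencies (Hz), equally spaced on the mel scale.

    Triangular filter k spans edges[k] .. edges[k + 2] and peaks at edges[k + 1].
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


if __name__ == "__main__":
    fs = 22050                # LJSpeech sampling rate (Hz)
    fmin, fmax = 0, fs // 2   # full band: 0 Hz up to Nyquist (11025 Hz)
    edges = mel_band_edges(fmin, fmax, n_mels=80)  # n_mels=80 is hypothetical
    print(len(edges), round(edges[0], 1), round(edges[-1], 1))
```

Because fmin and fmax change the extracted features themselves, feature extraction has to be rerun after editing them.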
https://github.com/espnet/espnet/blob/9e2bfc5cdecbb8846f5c6cb26d22010b06e98c40/egs/ljspeech/tts1/conf/decode.yaml#L5-L7
The options above are available only for Tacotron 2, so they have no effect on Transformer-TTS.
Thanks for your reply.