Espnet: GPU utilization is strange when training Transformer & FastSpeech

Created on 4 Dec 2019 · 3 Comments · Source: espnet/espnet

When I trained TTS models with the fastspeech.v3 and transformer.v1 configs under /csmsc/conf/tuning/, GPU utilization would drop to zero for a while (although the results were acceptable), whereas utilization with tacotron2.v3 was fine. Is there any explanation for that?
FastSpeech on 3 GPUs
image
Transformer on 3 GPUs
image
Tacotron 2 on 1 GPU
image

Question

All 3 comments

Maybe I/O is the bottleneck.
Try num-iter-processes > 0.
https://github.com/espnet/espnet/blob/beaef4e8c5f09cc655a6f7011f257d3800d311f8/espnet/bin/tts_train.py#L93
For example, you can add num-iter-processes: 4 to the YAML config.

The attention plot may also be a bottleneck, because the Transformer's plot is very large. You can try num-save-attention: 0 in the training YAML.
https://github.com/espnet/espnet/blob/beaef4e8c5f09cc655a6f7011f257d3800d311f8/espnet/bin/tts_train.py#L121-L122
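
For reference, the two suggestions above could be tried as a fragment like the one below, added to the training YAML (for example the fastspeech.v3 or transformer.v1 config). This is only a sketch: the two keys and values are the ones named in the comments above, the rest of the config is assumed to stay unchanged, and the descriptions in the code comments are a reading of this thread rather than the official option help.

```yaml
# Assumed fragment of a training config; every other key stays as it is.

# Load minibatches in background processes so the GPU is not left
# waiting on I/O (the value 4 is the example suggested in this thread).
num-iter-processes: 4

# Do not save attention plots during training; the Transformer's
# plots are large and were suspected of stalling the training loop.
num-save-attention: 0
```

Changing one setting at a time makes it easier to tell which of the two is responsible for the idle periods.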

@kan-bayashi @ShigekiKarita I ran the experiments you suggested, and the results support your guess, @ShigekiKarita. Thanks to both of you :)

test on default
image

test on num-iter-processes = 4
image

test on num-save-attention = 0
image
