Espnet: Integrating waveglow with espnet fastspeech

Created on 24 Sep 2019  ·  4 Comments  ·  Source: espnet/espnet

Hello, has anyone tried this approach? It seems to be the fastest combination at the moment.
I tried it, but something seems to be wrong with my implementation: the generated audio is barely intelligible (I can vaguely hear some words, but it is far from the quality of the wavenet option).
Sample
waveglow+fastspeech

Discussion

All 4 comments

Did you use exactly the same parameters for log-mel spectrogram extraction? I guess the reason you got unintelligible speech is a feature mismatch between espnet and waveglow.

Unfortunately, as @r9y9 said, the hyperparameters for feature extraction are different.
The settings need to be made consistent in order to use waveglow.
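
One way to locate such a mismatch is to put the two feature-extraction configurations side by side and list every setting that differs. The snippet below is only a minimal sketch: the helper `diff_feature_configs`, the key names, and all of the values are hypothetical placeholders, not the actual espnet or waveglow settings. Read the real values from your espnet recipe and from the config of your waveglow checkpoint before drawing conclusions.

```python
def diff_feature_configs(acoustic_cfg, vocoder_cfg):
    """Return {setting: (acoustic_value, vocoder_value)} for every differing setting."""
    keys = sorted(set(acoustic_cfg) | set(vocoder_cfg))
    return {
        key: (acoustic_cfg.get(key), vocoder_cfg.get(key))
        for key in keys
        if acoustic_cfg.get(key) != vocoder_cfg.get(key)
    }


# Placeholder values for illustration only (assumptions, not real settings).
acoustic_cfg = {
    "sample_rate": 22050,
    "n_fft": 1024,
    "hop_length": 256,
    "n_mels": 80,
    "fmin": 80,
    "fmax": 7600,
    "log_base": "10",
}
vocoder_cfg = {
    "sample_rate": 22050,
    "n_fft": 1024,
    "hop_length": 256,
    "n_mels": 80,
    "fmin": 0,
    "fmax": 8000,
    "log_base": "e",
}

mismatches = diff_feature_configs(acoustic_cfg, vocoder_cfg)
for name, (acoustic_value, vocoder_value) in mismatches.items():
    print(f"{name}: acoustic model uses {acoustic_value}, vocoder expects {vocoder_value}")
```

Any setting reported here would have to agree on both sides, either by re-extracting features and retraining one of the models or by converting the features before passing them to the vocoder. Normalization applied to the features (for example mean-variance normalization) has to be undone or matched as well.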

I have integrated the real-time neural vocoder ParallelWaveGAN.
You can try it in Google Colab:
https://colab.research.google.com/github/espnet/notebook/blob/master/tts_realtime_demo.ipynb

> Unfortunately, as @r9y9 said, the hyperparameters for feature extraction are different.
> It is necessary to fix the setting to use waveglow.

Hi,
can you tell me which waveglow parameters should be changed?
