Hello, has anyone tried this approach? It seems to be the fastest combination at the moment.
I've tried it, but there seems to be something wrong with my implementation: the generated audio is unintelligible (I can vaguely hear some words, but it's far from the quality of the WaveNet option).
Sample: waveglow+fastspeech
Did you use exactly the same parameters for log-mel spectrogram extraction? I guess the reason you got unintelligible speech is a feature mismatch between ESPnet and WaveGlow.
Unfortunately, as @r9y9 said, the hyperparameters of feature extraction are different.
The settings need to be made consistent in order to use WaveGlow.
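To make the mismatch concrete, here is a minimal sketch of how the two feature-extraction setups could be compared key by key. The helper `diff_feature_configs`, the key names, and all values below are hypothetical placeholders for illustration, not verified ESPnet or WaveGlow settings; the real values should be read from the recipe and the vocoder config you actually use.

```python
def diff_feature_configs(acoustic_cfg, vocoder_cfg):
    """Return {key: (acoustic_value, vocoder_value)} for every differing key.

    Hypothetical helper: a key missing on one side is reported with None.
    """
    mismatches = {}
    for key in sorted(set(acoustic_cfg) | set(vocoder_cfg)):
        a = acoustic_cfg.get(key)
        v = vocoder_cfg.get(key)
        if a != v:
            mismatches[key] = (a, v)
    return mismatches


# Placeholder values only (assumptions for illustration, not verified settings).
acoustic_model_features = {
    "sample_rate": 22050,
    "n_fft": 1024,
    "hop_length": 256,
    "n_mels": 80,
    "fmin": 80,
    "fmax": 7600,
    "log_base": "10",
}
vocoder_features = {
    "sample_rate": 22050,
    "n_fft": 1024,
    "hop_length": 256,
    "n_mels": 80,
    "fmin": 0,
    "fmax": 8000,
    "log_base": "e",
}

for key, (a, v) in diff_feature_configs(acoustic_model_features, vocoder_features).items():
    print(f"{key}: acoustic model uses {a}, vocoder expects {v}")
```

Any key reported here (for example the mel frequency range or the log base) would have to be aligned, either by re-extracting features and retraining one of the models or by converting the features before passing them to the vocoder.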
I integrated the real-time neural vocoder ParallelWaveGAN.
You can try it in Google Colab:
https://colab.research.google.com/github/espnet/notebook/blob/master/tts_realtime_demo.ipynb

> Unfortunately, as @r9y9 said, the hyperparameters of feature extraction are different.
> It is necessary to fix the setting to use waveglow.
Hi,
can you tell me which WaveGlow parameters should be changed?