Espnet: First multi-speaker Transformer

Created on 20 Sep 2019 · 4 Comments · Source: espnet/espnet

Hello, what is the speaker embedding in the first multi-speaker Transformer system? Is it an x-vector?
Can you tell me where the speaker embedding is placed in the system? I am trying to build a multi-speaker Transformer system. Thanks!


yyggithub

👍1

Most helpful comment

Here, the encoder state means the output of the final layer of the encoder.
It may be easier to understand by checking the following parts of the code.
https://github.com/espnet/espnet/blob/a2181ad10929ae980c228f40533defa6904d9db0/espnet/nets/pytorch_backend/e2e_tts_transformer.py#L507-L513
https://github.com/espnet/espnet/blob/a2181ad10929ae980c228f40533defa6904d9db0/espnet/nets/pytorch_backend/e2e_tts_transformer.py#L773-L795

kan-bayashi on 24 Sep 2019

👍2

All 4 comments

Sorry for the late reply. I'm back from INTERSPEECH.
The pretrained speaker embedding is an x-vector, which is trained on the VoxCeleb corpus.
I add or concatenate the x-vector to each hidden state of the encoder as follows:

add: x-vector -> linear -> replicate -> + each encoder hidden state
concat: x-vector -> replicate -> concat with each encoder hidden state -> linear
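
The two variants above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the ESPnet implementation: the class name `SpeakerEmbeddingIntegrator`, the argument names `adim` and `spk_embed_dim`, and the tensor shapes are all hypothetical, and any preprocessing of the x-vector is left out. The permalinks in this thread point to the actual code.

```python
import torch
import torch.nn as nn


class SpeakerEmbeddingIntegrator(nn.Module):
    """Hypothetical module sketching the "add" and "concat" variants."""

    def __init__(self, adim: int, spk_embed_dim: int, integration_type: str = "add"):
        super().__init__()
        if integration_type not in ("add", "concat"):
            raise ValueError(f"unsupported integration type: {integration_type}")
        self.integration_type = integration_type
        if integration_type == "add":
            # add: project the x-vector to the encoder dimension first
            self.projection = nn.Linear(spk_embed_dim, adim)
        else:
            # concat: project the concatenated vector back to the encoder dimension
            self.projection = nn.Linear(adim + spk_embed_dim, adim)

    def forward(self, hs: torch.Tensor, spembs: torch.Tensor) -> torch.Tensor:
        # hs: encoder hidden states, assumed shape (batch, time, adim)
        # spembs: one x-vector per utterance, assumed shape (batch, spk_embed_dim)
        if self.integration_type == "add":
            spembs = self.projection(spembs)  # (batch, adim)
            # unsqueeze + broadcasting replicates the vector over the time axis
            return hs + spembs.unsqueeze(1)
        # replicate the x-vector over time, concatenate, then project
        spembs = spembs.unsqueeze(1).expand(-1, hs.size(1), -1)
        return self.projection(torch.cat([hs, spembs], dim=-1))
```

In both variants the output has the same shape as `hs`, so it can be passed on wherever the encoder output would otherwise go. "Each encoder hidden state" here refers to each time step of the encoder output sequence, so the same x-vector is combined with every position.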

kan-bayashi on 23 Sep 2019

Sincere thanks. As I understand it, you place the x-vector between the multi-head attention layer and the FFN layer, and you do this in every layer (for example, in all three layers when N=3).
For example, concatenating the x-vector as follows:
for i in N (N=3)
----->Multi-head->add&norm->concat x-vector->FFN->add&norm--->
I don't understand what you mean by "each encoder hidden state". I have concatenated the speaker embedding with the encoder output in my Transformer system, but it doesn't work. Thanks again!

yyggithub on 24 Sep 2019

kan-bayashi on 24 Sep 2019

👍2

Sincere thanks!

yyggithub on 27 Sep 2019
