Espnet: Implement TCEN for speech translation

Created on 18 Sep 2019 · 4 Comments · Source: espnet/espnet

I have implemented the Tandem Connectionist Encoding Network (TCEN) architecture for speech translation, based on our paper submitted to AAAI 2020: https://arxiv.org/pdf/1909.07575.pdf

Features for TCEN:

It has a speech encoder and a text encoder: the former extracts acoustic features, the latter extracts linguistic features.
The model can be trained on ST data, ASR data, and MT data.
The CTC layer used for ASR training can be shared with the source word embedding layer used for MT training.
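
To illustrate the last point, here is a minimal sketch of one way a CTC output layer and a source word embedding layer could share parameters: both read from a single vocabulary-by-dimension matrix. It is not the code from this issue or from ESPnet. The class name `SharedVocabMatrix`, its methods, and the sizes are all hypothetical, and plain Python lists stand in for tensors so the example runs without any framework.

```python
import random


class SharedVocabMatrix:
    """Hypothetical holder for one (vocab_size x dim) matrix that serves
    both as a CTC output projection and as an embedding table."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        # One row per source token; this is the only parameter set.
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(vocab_size)
        ]

    def ctc_logits(self, hidden):
        """ASR path: project one speech-encoder frame (length dim)
        onto the vocabulary by taking a dot product with every row."""
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.weight]

    def embed(self, token_id):
        """MT path: look up the source word embedding for a token id."""
        return self.weight[token_id]


shared = SharedVocabMatrix(vocab_size=5, dim=3)

frame = [0.2, -0.1, 0.4]           # assumed speech-encoder output for one frame
logits = shared.ctc_logits(frame)  # unnormalised scores over 5 tokens
embedding = shared.embed(2)        # embedding of token id 2 for the text encoder
```

Because both paths read the same rows, the CTC score of a token is the dot product between the acoustic frame and that token's embedding, so an update made during ASR training also changes what the MT path sees, and vice versa.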

Our model significantly improves speech translation performance, and I'll release my trained model in the future.

Discussion Stale

CherrieWang97

👍2

All 4 comments

This sounds very good.
Can you also think of making a PR?
We'll help you.

sw005320 on 20 Sep 2019

Of course, I'll merge my code into ESPnet v.0.6.0.

CherrieWang97 on 20 Sep 2019

👍3

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.