Espnet: Implement TCEN for speech translation

Created on 18 Sep 2019 · 4 Comments · Source: espnet/espnet

I have implemented the Tandem Connectionist Encoding Network (TCEN) architecture for speech translation, as described in our paper submitted to AAAI 2020: https://arxiv.org/pdf/1909.07575.pdf

Features for TCEN:

  1. It has two encoders: a speech encoder for acoustic feature extraction and a text encoder for linguistic feature extraction.
  2. The model can be trained on ST data, ASR data, and MT data.
  3. The CTC layer for ASR training can be shared with the source word embedding layer for MT training.
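
To make the third point concrete, here is a minimal sketch of one way such sharing could work. It is not the TCEN code from this issue. It assumes that the CTC layer is a bias-free linear projection onto the source vocabulary and that "sharing" means weight tying, so a single vocabulary-by-hidden matrix serves both as the CTC projection (ASR path) and as the embedding table (MT path). The class name `SharedVocabMatrix` and all sizes are hypothetical, and only the Python standard library is used.

```python
import math
import random


class SharedVocabMatrix:
    """One (vocab_size x hidden_dim) matrix used for two purposes (assumed weight tying)."""

    def __init__(self, vocab_size, hidden_dim, seed=0):
        rng = random.Random(seed)
        # A single parameter table; both methods below read from it.
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
            for _ in range(vocab_size)
        ]

    def embed(self, token_ids):
        # MT path: a source token id selects its row as the word embedding.
        return [self.weight[t] for t in token_ids]

    def ctc_log_probs(self, frames):
        # ASR path: each encoder frame is scored against every row (dot product),
        # then normalized with a numerically stable log-softmax.
        out = []
        for h in frames:
            logits = [sum(w * x for w, x in zip(row, h)) for row in self.weight]
            m = max(logits)
            lse = m + math.log(sum(math.exp(v - m) for v in logits))
            out.append([v - lse for v in logits])
        return out


shared = SharedVocabMatrix(vocab_size=5, hidden_dim=4)

# Stand-in for two frames of speech encoder output (made-up numbers).
frames = [[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, -0.1, 0.2]]
log_probs = shared.ctc_log_probs(frames)  # would feed a CTC loss in ASR training

# Stand-in for a two-token source sentence in MT training.
embeddings = shared.embed([1, 3])
```

Because both paths read the same rows, an update driven by ASR data also changes the embeddings seen during MT training, and vice versa. In a real ESPnet model this would be expressed with the framework's own layers rather than Python lists.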

Our model can significantly improve speech translation performance, and I'll release my trained model in the future.

All 4 comments

This sounds very good.
Can you also think of making a PR?
We'll help you.

Of course, I'll merge my code into ESPnet v.0.6.0.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue is closed. Please re-open if needed.
