Espnet: Is there any plan to support multi-machine multi-GPU ASR training?

Created on 20 Nov 2019 · 3 Comments · Source: espnet/espnet

Hello, I noticed that ESPnet's PyTorch backend uses data_parallel for multi-GPU training, but to my understanding this only works on a single machine. Is there any plan to support multi-machine multi-GPU ASR training? That would make training on 10k+ hours of data feasible in industry. Thanks.
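
For context, here is a minimal sketch of the bookkeeping that multi-machine data-parallel training usually relies on. It is not ESPnet code and makes no claim about how ESPnet would implement this. It assumes one process per GPU, a global rank computed as `node_rank * gpus_per_node + local_rank` (the layout PyTorch's distributed launcher uses), and a strided split of the training utterances across processes, similar in spirit to PyTorch's `DistributedSampler` without shuffling or padding. The names `global_rank`, `shard`, and the `utt..` IDs are hypothetical and chosen only for illustration.

```python
from typing import List


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Rank of one process among all processes on all machines.

    Assumes every machine runs exactly `gpus_per_node` processes.
    """
    return node_rank * gpus_per_node + local_rank


def shard(items: List[str], rank: int, world_size: int) -> List[str]:
    """Strided split: process `rank` takes every `world_size`-th item."""
    return items[rank::world_size]


if __name__ == "__main__":
    nnodes, gpus_per_node = 2, 4           # hypothetical cluster: 2 machines x 4 GPUs
    world_size = nnodes * gpus_per_node    # total number of training processes
    utts = [f"utt{i:02d}" for i in range(16)]  # hypothetical utterance IDs

    for node in range(nnodes):
        for local in range(gpus_per_node):
            rank = global_rank(node, local, gpus_per_node)
            # Each process trains on its own disjoint shard; gradients would
            # then be averaged across all processes after every step.
            print(f"node={node} gpu={local} rank={rank} data={shard(utts, rank, world_size)}")
```

In this layout every process sees a disjoint slice of the data, so adding machines scales the effective batch size without any single process having to address GPUs on another machine, which is the limitation of the single-process data_parallel approach.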

Question Stale

kaituoxu

👍1

Most helpful comment

We have been using this internally in several projects, and it works with some minor changes, but I want to move in this direction once we settle on a new training abstraction design (e.g., #1372). We're considering a (super) major update for this.

sw005320 on 20 Nov 2019

👍3

All 3 comments

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 4 Jan 2020

This issue is closed. Please re-open if needed.

stale[bot] on 3 Feb 2020
