Espnet: Is there any plan to support multi-machine multi-GPU ASR training?

Created on 20 Nov 2019 · 3 Comments · Source: espnet/espnet

Hello, I notice that espnet's PyTorch version uses data_parallel to support multi-GPU training, but as I understand it, this only works on a single machine. Is there any plan to support multi-machine, multi-GPU ASR training? That would make training on 10k+ hours of data feasible in industry. Thanks.

Question Stale

All 3 comments

We have been using this internally on several projects, and it works with some minor changes, but I want to move in this direction only once we have settled on a new training abstraction design (e.g., #1372). We're thinking of a (super) major update for this.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue is closed. Please re-open if needed.
