Wav2letter: Where is wav2letter's Hello World?

Created on 17 Dec 2019 · 9 comments · Source: flashlight/wav2letter

I believe it would be to wav2letter's advantage to have a simple sample that can show it at work. Something like the proverbial hello world program.

Because wav2letter is complex, and ASR in general is complex, a baseline that people can refer to before building a system with gigabytes of data and several thousand transcription and audio files would save a lot of time and head off many issues. I believe DeepSpeech has such a one-audio-file example, although I have not tried it.

A minimal system that shows how to create a minimal n-gram model (KenLM fails if your transcription sample does not have enough lines, although there might be a flag to override that behavior) would prevent many of the issues that get opened just to ask for a hello-world-like example.
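As a rough sketch of the flag alluded to above: KenLM's `lmplz` has a `--discount_fallback` option that is commonly used when the corpus is too small for its discount estimation. The helper name `lmplz_command` and the file names are hypothetical, and whether the resulting model is useful for a tiny corpus is not something this sketch claims.

```python
import shlex


def lmplz_command(order, text_path, arpa_path, small_corpus=True):
    """Build an lmplz argument list (hypothetical helper, not part of KenLM)."""
    cmd = ["lmplz", "-o", str(order), "--text", text_path, "--arpa", arpa_path]
    if small_corpus:
        # Assumption: the corpus is tiny, so fall back to default discounts
        # instead of letting lmplz abort during discount estimation.
        cmd.append("--discount_fallback")
    return cmd


# File names below are placeholders for your own transcript and output paths.
cmd = lmplz_command(3, "transcripts.txt", "lm.arpa")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once KenLM is installed.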

All 9 comments

@tlikhomanenko @jacobkahn if you give me some guidelines for this hello-world sample, I would not mind documenting it. I think it would make a tremendous resource for people getting acquainted with wav2letter.

I think there is one in the tutorials. There you can find how to prepare a dataset, train, and decode. I find it really hard to understand what is going on, but it can work.

Hi @mpierre0220 ,
Thanks for the feedback. We have a tutorial here - https://github.com/facebookresearch/wav2letter/tree/master/tutorials and we recently started working on a wiki here - https://github.com/facebookresearch/wav2letter/wiki.

> I think it would make a tremendous resource for people getting acquainted with wav2letter

Agreed! We are actively working on improving the documentation and we would love to take help from the community.

@vineelpratap Thanks for the link. I have run the LibriSpeech tutorial. It is a rather long one, with thousands of files. I am thinking of a short tutorial that takes a few minutes, goes through the essentials, and gives a clear view of the steps to take.

@vineelpratap
Thanks. I believe that a baseline, easy set of instructions for that hello world is what is sorely needed (and it would be a tremendous resource).

Many people are coming to wav2letter fresh, and they have their own language they would like to try with their own audio. This tutorial would be step-by-step instructions on how to do the following:

  1. Prepare wav2letter data with a single audio file
    a. Create the list file with the transcription
    b. Create the language model
    c. Create the lexicon file
    d. Create the tokens file
    e. Any other directory structure needed
  2. Train wav2letter with that single audio file
    a. Syntax to launch training with above files
    b. How to read any result from training
  3. Decode a smaller subset of the single audio
    a. Syntax to launch the decoder
    b. What to expect
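Steps 1a, 1c, and 1d above could be sketched roughly as follows. The layouts are assumptions based on my reading of wav2letter's data preparation docs (list line: sample id, audio path, duration in milliseconds, transcription; lexicon: word, a tab, then space-separated letters ending in the `|` word separator; tokens: one per line) and should be checked against the version you build. All function names, the sample id, and the path are hypothetical.

```python
def build_tokens(transcripts):
    """Collect the letter set, with '|' as the assumed word-separator token."""
    letters = sorted({ch for t in transcripts for ch in t if ch != " "})
    return ["|"] + letters


def build_lexicon(transcripts):
    """Spell each distinct word as its letters, terminated by '|'."""
    words = sorted({w for t in transcripts for w in t.split()})
    return [f"{w}\t{' '.join(w)} |" for w in words]


def list_line(sample_id, audio_path, duration_ms, transcript):
    """One list-file line: id, path, duration in ms, transcription."""
    return f"{sample_id} {audio_path} {duration_ms} {transcript}"


# A single made-up sample; replace with your own audio and transcription.
transcripts = ["hello world"]
print("\n".join(build_tokens(transcripts)))
print("\n".join(build_lexicon(transcripts)))
print(list_line("hello0", "/data/hello.wav", 1500, transcripts[0]))
```

Writing each result to its own file (for example `tokens.txt`, `lexicon.txt`, `train.lst`) gives the inputs that the training flags file would point to.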

It is easy to get lost when you are dealing with many variables, such as thousands of files, and the process can be overwhelming that way. With a single file, it is easy to debug and find your way around.

This is also at the heart of induction: show it is true for n=1, then show that if it is true for n, it is true for n+1.

These things can be ridiculously easy for people who are in the code day in and day out but for someone coming new to the technology it can be quite an undertaking without an easier entry point.

@vineelpratap Like I said before, I would not mind working on it, but I have a few questions I would like answered. I understand the structure of the list file and the flags file. I understand how to launch training, look at the results, interpret some of the output, and match the output with the source files to see what is going on. What I don't quite understand is how I would plug a network into that one-audio-file system, how I would generate an n-gram for that one transcription file (KenLM bails out if your transcription file is very small), and then how to launch decoding and what to expect. I have been playing around with a bunch of files in a foreign language and have been unsuccessful at getting wav2letter to train with these files.

If there is a solid way to do it with one file, then someone can move on to 2, 3, ..., n+1 files and backtrack to debug whatever introduced an error. If you cannot do it for n=1, it is futile to try for n=1,000,000. You know what I mean?

@mpierre0220

> I would not mind working on it but I have a few questions I would like answered.
> I have been playing around with a bunch of files in a foreign language
> I believe that the baseline easy set of instructions for that hello world is what is direly needed

Thanks for your feedback! Yes, a very simple tutorial that explains the components would be helpful, and we would be happy to review your PR if you want to spend time writing some docs.

Also, I think one file might be too small for training. It might be good to have a small dataset, which could be taken from https://github.com/Jakobovski/free-spoken-digit-dataset/ .
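As a small sketch of how transcriptions could be derived for that dataset: its recordings are named `{digitLabel}_{speakerName}_{index}.wav` (for example `7_jackson_32.wav`), so the label can be read from the file name. Spelling the digit out as an English word is an assumption made here for illustration, and `transcript_for` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: transcribe each digit as its English word.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def transcript_for(filename):
    """Map a file name like '7_jackson_32.wav' to the word 'seven'."""
    digit_label = Path(filename).stem.split("_")[0]
    return DIGIT_WORDS[int(digit_label)]


print(transcript_for("7_jackson_32.wav"))
```

The returned word can then serve as the transcription field when writing the list file for each recording.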

> What I don't quite understand is how I would plug a network into that one-audio-file system

We have some documentation about writing architecture files for training a model - https://github.com/facebookresearch/wav2letter/wiki/Writing-architecture-files.

> how to launch decoding and what to expect

For the decoder, the documentation is at https://github.com/facebookresearch/wav2letter/wiki/Beam-Search-Decoder.

Let us know if there are any other specific questions.

Thanks @vineelpratap, I am going to jump on that task. If I get stuck I will get back to you; otherwise I will have a PR for your review. It might be late next week or the first of the year, as we're in the midst of the holiday season.
