TFJS: How to create a model for a custom word (speech-commands model)

Created on 3 Jul 2019 · 11 comments · Source: tensorflow/tfjs

To get help from the community, we encourage using Stack Overflow and the tensorflow.js tag.

TensorFlow.js version

Node version: v12.4.0

Browser version

Describe the problem or feature request

I used the audio model given at https://github.com/tensorflow/tfjs-models/tree/master/speech-commands.
I followed the documentation, trained the model with a custom word (wakeup) alongside the existing dataset, and saved the model:

  1. create_model wakeup up down left right
  2. loaded the dataset
  3. train 100
  4. save_model

model.json and weights.bin were generated; I then imported the model in JS, but it is not detecting any word.

Please suggest how to train a custom word and how many training epochs are required.

Code to reproduce the bug / link to feature request

support


All 11 comments

Could you give a bit more detail and describe how you did the retraining? Did you use the transfer learning API or something different? A code snippet would help us get a better sense of what may be going on. Also, how many samples/examples do you have for each of the words in your vocabulary?

We downloaded the speech dataset from https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz, added one more folder called wakeup with 500 samples, and trained on the data following this tutorial:
https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/training

  1. create up down left right wakeup
  2. load_dataset all /tmp/data ( loaded all the 5 datasets)
  3. train 500
  4. save_model /tmp/audio_model.

We got model.json and weights.bin files.
We updated metadata.json ({"words": ["up","down","left","right","wakeup"], "frameSize": 232}).
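Formatted for readability, the updated metadata.json contents stated above are:

```json
{
  "words": ["up", "down", "left", "right", "wakeup"],
  "frameSize": 232
}
```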

We loaded model.json, weights.bin, and metadata.json in SpeechCommands.js and ran prediction using the code snippet below.

let recognizer;

function predictWord() {
  console.log("predictWord--");
  // Array of words that the recognizer is trained to recognize.
  const words = recognizer.wordLabels();

  recognizer.listen(({scores}) => {
    console.log("scores--", scores);
    // Turn scores into a list of (score, word) pairs.
    scores = Array.from(scores).map((s, i) => ({score: s, word: words[i]}));
    console.log("Scores--", scores);
    // Find the most probable word.
    scores.sort((s1, s2) => s2.score - s1.score);
    document.querySelector('#console').textContent = scores[0].word;
  }, {probabilityThreshold: 0.75});
}

async function app() {
  recognizer = speechCommands.create('BROWSER_FFT', 'directional4w');
  // recognizer = speechCommands.create('BROWSER_FFT');
  await recognizer.ensureModelLoaded();
  predictWord();
}

app();

We are not able to detect any keywords like up, down, left, or right. Please let me know if anything is wrong in the training process, or do we need to train for more steps?

We didn't use the transfer learning API.

Thanks for the information @ranjithrengaraj; a few things stand out to me.

  1. speechCommands.create('BROWSER_FFT','directional4w'); will load the existing pretrained recognizer. I don't see how it connects to the metadata and model you created. Could you add a snippet for how you did the following?

Updated metadata.json({"words": ["up","down","left","right","wakeup"], "frameSize": 232}).

Loaded model.json ,weights.bin and metadata.json in SpeechCommands.js and called the prediction using below code snippet.

It is surprising that you don't get the original words being recognized. Were you able to get the base model working without modification?

  2. More importantly, the training script you linked to might be for a different model that is trainable in Node.js (I do think this is confusing, so I'll try to get it fixed or at least better described/located). The instructions seem incomplete as to how to load it and do inference with it. @pyu10055 Could you update https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/training with code snippets for how to use the model trained from that script?
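On the loading question, a sketch of how the custom artifacts might be wired in: the speech-commands README documents a create() overload that takes custom model and metadata URLs (treat the exact signature and the localhost URLs below as assumptions to check against the current README). The topWord helper is a hypothetical pure function for use in the listen() callback:

```javascript
// Sketch, not a verified recipe. Assumption: speechCommands.create()
// accepts custom model/metadata URLs as its 3rd and 4th arguments
// (per the tfjs-models speech-commands README); the URLs are placeholders.
//
// const recognizer = speechCommands.create(
//     'BROWSER_FFT',
//     null,  // no built-in vocabulary; labels come from metadata.json
//     'http://localhost:28440/model.json',
//     'http://localhost:28440/metadata.json');
// await recognizer.ensureModelLoaded();

// Hypothetical helper for the listen() callback: returns the most
// probable word without mutating the scores array.
function topWord(scores, words) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return words[best];
}
```

With a recognizer created this way, recognizer.wordLabels() should reflect the words listed in the custom metadata.json rather than a built-in vocabulary.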

Apologies for how confusing this all is.

Thanks tafsiri.

Were you able to get the base model working without modification? - Yes, we are able to detect the keywords without modification.

speechCommands.create('BROWSER_FFT','directional4w'); will load the existing pretrained recognizer. I don't see how it connects to the metadata and model you created. Could you add a snippet for how you did the following? - We loaded the custom model using loadLayersModel. The same method works for the pretrained model but not for the custom-trained model.

this.modelURL='http://localhost:28440/model.json'
return i.sent(), [4, t.loadLayersModel(this.modelURL)];

I loaded metadata.json in a similar way.

I'm also interested in figuring out how to train a model that can later be loaded in the browser. I was able to train and save a model following the README in the training/soft-fft directory, though it appears that functionality is not yet supported by speechCommands. I looked into training/browser-fft, but there appears to be a missing step:

  1. Run WebAudio FFT on the .dat files generated in step 2 in the browser.

Can anyone point me to the best way to run the WebAudio FFT on the processed files?
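Not an official answer, but to make the missing step concrete: below is a minimal sketch that frames raw audio samples and computes a magnitude spectrum per frame with a plain DFT. Everything here is an assumption for illustration (raw float32 samples, frame size, no overlap, no windowing); the actual browser pipeline uses WebAudio's AnalyserNode, whose Blackman windowing and time smoothing will produce different values:

```javascript
// Illustrative only: approximate "run an FFT over audio frames" in plain JS.
// Assumptions (not from the repo): raw float32 mono samples, fixed frame
// length, no overlap, no window function. A naive O(n^2) DFT is used for
// clarity; a real pipeline would use an FFT or WebAudio's AnalyserNode.
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const mags = new Float32Array(n / 2);
  for (let k = 0; k < n / 2; k++) {
    let re = 0, im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im += frame[t] * Math.sin(angle);
    }
    mags[k] = Math.sqrt(re * re + im * im);
  }
  return mags;
}

// Split a sample buffer into non-overlapping frames and transform each one.
function spectrogram(samples, frameSize) {
  const frames = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    frames.push(magnitudeSpectrum(samples.slice(start, start + frameSize)));
  }
  return frames; // one magnitude spectrum per frame
}
```

For example, a pure sine wave at frequency bin 4 of a 64-sample frame should produce a magnitude spectrum whose peak is at index 4.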

@caisq gentle ping! Did you get a chance to look at this?

+1 to @nsteins question. Some pointers would be great!

I'm also interested in some details regarding WebAudio FFT.

Stuck at step 3:
"Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here."

Closing this due to lack of activity; feel free to reopen. Thank you.

Like others on this thread, I'm also still unclear about step 3 on this README:

  1. Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here.

Could someone please provide additional details on that step of training? Thanks so much.

Hello, I would also be interested in further information about step 3 of data preparation:

"Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here."

Does anyone have any ideas or indications on how to do this given the available code? It would be greatly appreciated.
