TFJS: How to create a model for a custom word (speech-commands model)

Created on 3 Jul 2019 · 11 comments · Source: tensorflow/tfjs

To get help from the community, we encourage using Stack Overflow and the tensorflow.js tag.

TensorFlow.js version

Node version: v12.4.0

Browser version

Describe the problem or feature request

I used the audio model given at https://github.com/tensorflow/tfjs-models/tree/master/speech-commands.
I followed the documentation, trained the model with a custom word (wakeup) alongside the existing dataset, and saved the model:

  1. create_model wakeup up down left right
  2. loaded the dataset
  3. train 100
  4. save_model

model.json and weights.bin were generated; I then imported the model in JS, but it is not detecting any word.

Please suggest how to train a custom word and how many training epochs are required.

Code to reproduce the bug / link to feature request

support


All 11 comments

Could you give a bit more detail and describe how you did the retraining? Did you use the transfer learning API or something different? A code snippet would help us get a better sense of what may be going on. Also, how many samples/examples do you have for each of the words in your vocabulary?

We downloaded the speech dataset from https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz, added one more folder called wakeup with 500 samples, and trained on the data following this tutorial:
https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/training

  1. create up down left right wakeup
  2. load_dataset all /tmp/data ( loaded all the 5 datasets)
  3. train 500
  4. save_model /tmp/audio_model.

We got model.json and weights.bin files.
We updated metadata.json ({"words": ["up","down","left","right","wakeup"], "frameSize": 232}).
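Formatted for readability, the updated metadata.json contents stated above are:

```json
{
  "words": ["up", "down", "left", "right", "wakeup"],
  "frameSize": 232
}
```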

We loaded model.json, weights.bin, and metadata.json in SpeechCommands.js and ran prediction using the code snippet below.

let recognizer;

function predictWord() {
  console.log("predictWord--");
  // Array of words that the recognizer is trained to recognize.
  const words = recognizer.wordLabels();

  recognizer.listen(({scores}) => {
    console.log("scores--", scores);
    // Turn scores into a list of (score, word) pairs.
    scores = Array.from(scores).map((s, i) => ({score: s, word: words[i]}));
    console.log("Scores--", scores);
    // Find the most probable word.
    scores.sort((s1, s2) => s2.score - s1.score);
    document.querySelector('#console').textContent = scores[0].word;
  }, {probabilityThreshold: 0.75});
}

async function app() {
  recognizer = speechCommands.create('BROWSER_FFT', 'directional4w');
  // recognizer = speechCommands.create('BROWSER_FFT');
  await recognizer.ensureModelLoaded();
  predictWord();
}

app();

We are not able to detect any keywords like up, down, left, or right. Please let me know if anything is wrong in the training process, or do we need to train for more steps?

We didn't use the transfer learning API.

Thanks for the information @ranjithrengaraj; a few things stand out to me.

  1. speechCommands.create('BROWSER_FFT','directional4w'); will load the existing pretrained recognizer. I don't see how it connects to the metadata and model you created. Could you add a snippet for how you did the following?

Updated metadata.json({"words": ["up","down","left","right","wakeup"], "frameSize": 232}).

Loaded model.json ,weights.bin and metadata.json in SpeechCommands.js and called the prediction using below code snippet.

It is surprising that you don't get the original words being recognized. Were you able to get the base model working without modification?

  2. More importantly, the training script you linked to might be for a different model that is trainable in Node.js (I do think this is confusing, so I'll try to get it fixed or at least better described/located). The instructions seem incomplete as to how to load it and do inference with it. @pyu10055 Could you update https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/training with code snippets for how to use the model trained from that script?
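On the loading question, a sketch of how the custom artifacts might be wired in: the speech-commands README documents a create() overload that takes custom model and metadata URLs (treat the exact signature and the localhost URLs below as assumptions to check against the current README). The topWord helper is a hypothetical pure function for use in the listen() callback:

```javascript
// Sketch, not a verified recipe. Assumption: speechCommands.create()
// accepts custom model/metadata URLs as its 3rd and 4th arguments
// (per the tfjs-models speech-commands README); the URLs are placeholders.
//
// const recognizer = speechCommands.create(
//     'BROWSER_FFT',
//     null,  // no built-in vocabulary; labels come from metadata.json
//     'http://localhost:28440/model.json',
//     'http://localhost:28440/metadata.json');
// await recognizer.ensureModelLoaded();

// Hypothetical helper for the listen() callback: returns the most
// probable word without mutating the scores array.
function topWord(scores, words) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return words[best];
}
```

With a recognizer created this way, recognizer.wordLabels() should reflect the words listed in the custom metadata.json rather than a built-in vocabulary.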

Apologies for how confusing this all is.

Thanks tafsiri.

Were you able to get the base model working without modification? - Yes, we are able to detect the keywords without modification.

speechCommands.create('BROWSER_FFT','directional4w'); will load the existing pretrained recognizer. I don't see how it connects to the metadata and model you created. Could you add a snippet for how you did the following? - We loaded the custom model using loadLayersModel. The same method works for the pretrained model but not for the custom-trained model.

this.modelURL='http://localhost:28440/model.json'
return i.sent(), [4, t.loadLayersModel(this.modelURL)];

I loaded metadata.json in a similar way.

I'm also interested in figuring out how to train a model that can later be loaded in the browser. I was able to train and save a model following the README in the training/soft-fft directory, though it appears that functionality is not yet supported by speechCommands. I looked into training/browser-fft, but there appears to be a missing step:

  1. Run WebAudio FFT on the .dat files generated in step 2 in the browser.

Can anyone point me to the best way to run the WebAudio FFT on the processed files?
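Not an official answer, but to make the missing step concrete: below is a minimal sketch that frames raw audio samples and computes a magnitude spectrum per frame with a plain DFT. Everything here is an assumption for illustration (raw float32 samples, frame size, no overlap, no windowing); the actual browser pipeline uses WebAudio's AnalyserNode, whose Blackman windowing and time smoothing will produce different values:

```javascript
// Illustrative only: approximate "run an FFT over audio frames" in plain JS.
// Assumptions (not from the repo): raw float32 mono samples, fixed frame
// length, no overlap, no window function. A naive O(n^2) DFT is used for
// clarity; a real pipeline would use an FFT or WebAudio's AnalyserNode.
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const mags = new Float32Array(n / 2);
  for (let k = 0; k < n / 2; k++) {
    let re = 0, im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im += frame[t] * Math.sin(angle);
    }
    mags[k] = Math.sqrt(re * re + im * im);
  }
  return mags;
}

// Split a sample buffer into non-overlapping frames and transform each one.
function spectrogram(samples, frameSize) {
  const frames = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    frames.push(magnitudeSpectrum(samples.slice(start, start + frameSize)));
  }
  return frames; // one magnitude spectrum per frame
}
```

For example, a pure sine wave at frequency bin 4 of a 64-sample frame should produce a magnitude spectrum whose peak is at index 4.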

@caisq gentle ping! Did you get a chance to look at this?

+1 to @nsteins question. Some pointers would be great!

I'm also interested in some details regarding WebAudio FFT.

Stuck at step 3:
"Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here."

Closing this due to lack of activity; feel free to reopen. Thank you.

Like others on this thread, I'm also still unclear about step 3 on this README:

  1. Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here.

Could someone please provide additional details on that step of training? Thanks so much.

Hello, I would also be interested in further information about step 3 of data preparation:

"Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here."

Does anyone have any ideas or indications on how to do this given the available code? It would be greatly appreciated.
