Deepspeech: Add Python example showing use of streaming API

Created on 6 Nov 2018 · 9 Comments · Source: mozilla/DeepSpeech

Right now there's no example of how to use the streaming API together with, say, VAD (voice activity detection) to stream audio. Creating such an example will help devs get up and running.

kdavis-mozilla

👍2 🎉1

All 9 comments

I can do that. I have built it on top of the VAD. Needs a little cleanup. I'll try to send in the first patch for review over this weekend.

b-ak on 6 Nov 2018

👍5

I have a small (Python) project deepspeech-websocket-server: Server & client for DeepSpeech using WebSockets for real-time speech recognition in separate environments. I'm still developing it, and will probably add some other servers for other ASR packages.

But I could easily make a stable copy in a PR if you want. It has few requirements (no GUI cruft), is currently fairly simple, and I could take the websocket stuff out to make it a single simpler app if desired. The websocket feature is great for some situations like using DeepSpeech through WSL using the Windows microphone, though. Just let me know what structure you want.

daanzu on 8 Nov 2018

👍1

Nice to see that people are making tools. I want to make it available for Windows without WSL, and I think I have almost achieved it: I'm able to load the model using C# without the memmapped_file_system.h parts. I'm waiting for a few changes to be merged before trying again. If I achieve my goal, I'll try to use it with Windows speech recognition as the VAD and make it available on NuGet.
I'm also investing time in making transcriptions for Spanish from LibriVox. I have built a tool that helps me make the transcriptions: it uses the current text of the book and Windows speech recognition to detect the sentences, then cuts them into parts with ffmpeg. The results are perfect on sentences that contain more than four words. Hopefully I will post about it on Discourse soon.

carlfm01 on 8 Nov 2018
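
The cutting step described above might look roughly like the following in Python. This is only a sketch under assumptions: the tool in the comment is not shown in this thread, so the function name `ffmpeg_cut_commands`, the `(start, end)` segment format in seconds, and the output naming scheme are all hypothetical. It only builds the ffmpeg argument lists (using the standard `-i`, `-ss`, and `-to` options) and does not run them.

```python
def ffmpeg_cut_commands(source, segments, prefix="part"):
    """Build one ffmpeg command per (start, end) segment, in seconds.

    Hypothetical helper: sentence boundaries are assumed to come from
    some other detector and are passed in as plain floats.
    """
    commands = []
    for index, (start, end) in enumerate(segments, start=1):
        if end <= start:
            raise ValueError(f"segment {index} ends before it starts")
        output = f"{prefix}_{index:03d}.wav"
        commands.append([
            "ffmpeg",
            "-i", source,           # input file
            "-ss", f"{start:.3f}",  # cut start, in seconds
            "-to", f"{end:.3f}",    # cut end, in seconds
            output,
        ])
    return commands


if __name__ == "__main__":
    for cmd in ffmpeg_cut_commands("book.mp3", [(0.0, 2.5), (2.5, 7.25)]):
        print(" ".join(cmd))
```

Each list could then be handed to `subprocess.run` once ffmpeg is installed; keeping command construction separate makes the segment logic easy to check without touching any audio.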

👍1

@daanzu Thanks for volunteering!

What I had in mind was something very simple, like client.py, but which allows for streaming using VAD.

However, now that I think about it maybe audioTranscript_cmd.py is simple enough that we're fine relying on it as a "go to" example?

kdavis-mozilla on 8 Nov 2018

@kdavis-mozilla vad_transcriber looks good, but I don't think it uses the streaming API? Also, personally, I think it'd be good to have an easy tool included to use the microphone. Both for first-time users, and for rapid testing without needing the rigor of repeatability, the frictionlessness of microphone recognition is unbeatable.

Microphone recognition is a natural demonstration of the streaming API, and it'd be easy for me to make a single script to use VAD and stream directly from microphone to DeepSpeech. Just let me know.

daanzu on 8 Nov 2018
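
The idea of gating a stream with VAD can be sketched as below. This is not the script discussed in this thread, and it makes several assumptions that should be read as such: a simple energy threshold stands in for a real VAD, `FakeStream` is a hypothetical placeholder for a streaming context, and its `feedAudioContent` / `finishStream` method names are chosen to resemble DeepSpeech's streaming calls rather than copied from a verified version of the API. A real engine would likely also expect a different buffer type than raw `bytes`.

```python
import array
import math


def frame_rms(frame: bytes) -> float:
    """RMS amplitude of a 16-bit mono PCM frame (native byte order)."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class FakeStream:
    """Hypothetical stand-in for a speech-to-text streaming context."""

    def __init__(self):
        self.bytes_fed = 0

    def feedAudioContent(self, frame: bytes) -> None:
        self.bytes_fed += len(frame)

    def finishStream(self) -> str:
        return f"<{self.bytes_fed} bytes of speech>"


def transcribe_stream(frames, new_stream, threshold=500.0, max_silence=3):
    """Yield one result per utterance found in an iterable of PCM frames.

    A stream is opened on the first voiced frame and closed after
    `max_silence` consecutive unvoiced frames. The energy threshold is
    a crude substitute for a real VAD.
    """
    stream = None
    silence = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            if stream is None:
                stream = new_stream()
            silence = 0
            stream.feedAudioContent(frame)
        elif stream is not None:
            silence += 1
            stream.feedAudioContent(frame)  # keep short pauses in the utterance
            if silence >= max_silence:
                yield stream.finishStream()
                stream = None
                silence = 0
    if stream is not None:
        # Audio ended mid-utterance: flush what was collected.
        yield stream.finishStream()


if __name__ == "__main__":
    loud = array.array("h", [1000] * 160).tobytes()
    quiet = array.array("h", [0] * 160).tobytes()
    frames = [quiet, loud, loud, quiet, quiet, quiet, quiet, loud]
    for text in transcribe_stream(frames, FakeStream):
        print(text)
```

Because `frames` is any iterable, the same loop would work with frames read from a microphone callback or from a file; only the frame source and the stream factory would need to change.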

@daanzu I stand corrected. If you want to make a PR, it will be gladly welcomed!

kdavis-mozilla on 8 Nov 2018

FWIW there's an example using the microphone here: https://gist.github.com/reuben/80d64de15d1f46d34d28c7e83fc5f57e#file-ds_mic-py

reuben on 8 Nov 2018

👍1

Looks like we did our homework here, thanks @daanzu!

lissyx on 13 Dec 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] on 13 Jan 2019
