Deepspeech: Add Python example showing use of streaming API

Created on 6 Nov 2018 · 9 Comments · Source: mozilla/DeepSpeech

Right now there's no example of how to use the streaming API together with, say, VAD to stream audio. Creating such an example will help devs get up and running.

All 9 comments

I can do that. I have built it on top of the VAD. Needs a little cleanup. I'll try to send in the first patch for review over this weekend.


I have a small (Python) project deepspeech-websocket-server: Server & client for DeepSpeech using WebSockets for real-time speech recognition in separate environments. I'm still developing it, and will probably add some other servers for other ASR packages.

But I could easily make a stable copy in a PR if you want. It has few requirements (no GUI cruft), is currently fairly simple, and I could take the websocket stuff out to make it a single simpler app if desired. The websocket feature is great for some situations like using DeepSpeech through WSL using the Windows microphone, though. Just let me know what structure you want.

Nice to see that people are making tools. I want to make it available on Windows without WSL, and I think I have almost achieved it: I'm able to load the model from C# without the memmapped_file_system.h parts. I'm waiting for a few changes to be merged before trying again. If I achieve my goal, I'll try to use it with Windows speech recognition as the VAD and make it available on NuGet.
I'm also investing time in making transcriptions for Spanish from LibriVox. I have built a tool that helps me make the transcriptions: it uses the current text of the book and Windows speech recognition to detect the sentences, then cuts them into parts with ffmpeg. The results are perfect on sentences that contain more than four words. I hope to post about it on Discourse soon.

@daanzu Thanks for volunteering!

What I had in mind was something very simple, like client.py, but which allows for streaming using VAD.

However, now that I think about it maybe audioTranscript_cmd.py is simple enough that we're fine relying on it as a "go to" example?

@kdavis-mozilla vad_transcriber looks good, but I don't think it uses the streaming API? Also, personally, I think it'd be good to have an easy tool included to use the microphone. Both for first-time users, and for rapid testing without needing the rigor of repeatability, the frictionlessness of microphone recognition is unbeatable.

Microphone recognition is a natural demonstration of the streaming API, and it'd be easy for me to make a single script to use VAD and stream directly from microphone to DeepSpeech. Just let me know.
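For readers who want a feel for the VAD-plus-streaming pattern discussed here, the following is a minimal sketch, not the script proposed in this thread. It makes several assumptions: frames are 16-bit mono PCM byte strings, the energy threshold in `is_speech` is a crude stand-in for a real VAD such as webrtcvad, and the `feed`/`finish` methods are hypothetical names for whatever streaming decoder you wrap (they are not DeepSpeech's API). `CountingStream` is a dummy decoder that exists only so the sketch runs without a model or microphone.

```python
import array


def is_speech(frame: bytes, threshold: int = 500) -> bool:
    """Crude energy check on 16-bit PCM; a stand-in for a real VAD."""
    samples = array.array("h", frame)
    if not samples:
        return False
    mean_abs = sum(abs(s) for s in samples) / len(samples)
    return mean_abs > threshold


def stream_utterances(frames, new_stream, hang_frames=3):
    """Yield one transcript per utterance.

    frames      -- iterable of PCM byte strings (e.g. read from a microphone)
    new_stream  -- callable returning an object with feed(bytes) and
                   finish() -> str (hypothetical interface)
    hang_frames -- consecutive silent frames that end an utterance
    """
    stream = None
    silence = 0
    for frame in frames:
        if is_speech(frame):
            if stream is None:
                stream = new_stream()  # speech started: open a new stream
            silence = 0
            stream.feed(frame)
        elif stream is not None:
            silence += 1
            stream.feed(frame)  # keep a little trailing silence
            if silence >= hang_frames:
                yield stream.finish()  # utterance ended: decode it
                stream = None
    if stream is not None:
        yield stream.finish()  # flush whatever is left at end of input


class CountingStream:
    """Dummy decoder for illustration: reports how many frames it was fed."""

    def __init__(self):
        self.count = 0

    def feed(self, frame: bytes) -> None:
        self.count += 1

    def finish(self) -> str:
        return f"{self.count} frames"


# Synthetic input: 20 ms frames at 16 kHz (320 samples each).
quiet = array.array("h", [0] * 320).tobytes()
loud = array.array("h", [1000] * 320).tobytes()
frames = [quiet, loud, loud, quiet, quiet, quiet, loud, quiet]

results = list(stream_utterances(frames, CountingStream))
print(results)  # ['5 frames', '2 frames']
```

To use this with a real recognizer, `new_stream` would return a thin adapter whose `feed` forwards audio to the engine's streaming call and whose `finish` returns the final transcript. The generator itself never needs to know which engine is behind it, which keeps the VAD logic easy to test.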

@daanzu I stand corrected. If you want to make a PR, it will be gladly welcomed!

FWIW there's an example using the microphone here: https://gist.github.com/reuben/80d64de15d1f46d34d28c7e83fc5f57e#file-ds_mic-py

Looks like we did our homework here, thanks @daanzu!
