Nw.js: webkitSpeechRecognition for desktop apps?

Created on 14 Sep 2013 · 83 Comments · Source: nwjs/nw.js

I was attempting to add speech recognition to a desktop application, but speechRecognition does not seem to work, probably because the application uses local file URLs and there is no way to grant permission the way there is when running in Chrome.

https://github.com/TalAter/annyang/issues/44

As you can see in the linked related issue, trying to run speechRecognition by itself in the console does nothing in node-webkit.

var sr = new webkitSpeechRecognition();
sr.continuous = true;
sr.interimResults = true;
sr.lang = 'en';
sr.onresult = function(e) {
    // log the transcript of the most recent result
    console.log(e.results[e.results.length - 1][0].transcript);
};
sr.start(); // nothing happens and no errors

So would it be correct to say this is not currently possible in node-webkit, or is there a known workaround? I'm aware of the getUserMedia API, but it seems that it only captures audio and doesn't do any speech recognition.

If it's not currently possible, will it be possible in future releases?
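
One way to narrow down a silent failure like this is to attach a handler to every event the Web Speech API defines, so that any state change or error becomes visible. The sketch below is only an illustration: `attachDiagnostics` and the `log` callback are hypothetical names, not part of node-webkit or the API, and it assumes the recognizer exposes the standard `on<event>` handler properties. Note that it replaces any handlers already set, including `onresult`.

```javascript
// Event names defined for SpeechRecognition in the Web Speech API draft.
const SPEECH_EVENTS = [
  'start', 'audiostart', 'soundstart', 'speechstart',
  'speechend', 'soundend', 'audioend',
  'result', 'nomatch', 'error', 'end'
];

// Hypothetical helper: log every event fired by a recognizer.
function attachDiagnostics(recognition, log) {
  SPEECH_EVENTS.forEach(function (name) {
    recognition['on' + name] = function (event) {
      // `event.error` is only populated on error events.
      log(name, event && event.error ? event.error : '');
    };
  });
  return recognition;
}
```

In the console this could be used as `attachDiagnostics(new webkitSpeechRecognition(), console.log.bind(console)).start();`. If not even `start` or `error` shows up, that would suggest the engine behind the constructor is missing rather than blocked by a permission prompt.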

feature-request

isimmons

👍6

Most helpful comment

According to the latest draft of the Web Speech API, the serviceURI hasn't been removed from the specification.

So I went digging a bit, which did turn up some things I haven't discovered before, but most discussion around this seems to have happened out of band, or somewhere inaccessible to the public.

There was Chromium Issue N° 480516 filed ~2 years ago, in which support was added, and subsequently reverted again a few months later. Discussion related to this happened on blink-dev/82LcTDrhshw/pGKPgrXOUaAJ, yet it is still unclear to me what exactly happened there.

It might be worth filing a new Chromium Issue to get an update on the situation.

jhermsmeier on 3 Jul 2017

👍3

All 83 comments

caolan on 12 Oct 2013

Seems that permission needs to be pre-authorised in the same way it was for getUserMedia calls and screensharing.

Would love to see support for this, @rogerwang. It feels like a possible quick win?

tommoor on 12 Oct 2013

deanshub on 17 Oct 2013

You could write a native-client app for chrome that is a wrapper for this. I guess this could work with phantomjs too.

X4 on 1 Dec 2013

TalAter on 16 Jan 2014

manuelpaulo on 16 Jan 2014

jshemas on 7 Feb 2014

Seriously no comment on this after 7 months? +1

Akkuma on 31 Mar 2014

@isimmons @caolan @tommoor @deanshub @X4 @TalAter @manuelpaulo @jshemas

Has any of you found a workaround?

ThomasAy on 1 Apr 2014

@ThomasAy I'd have to debug it, as I've only skimmed the node-webkit source code, but to me it looks like you could patch https://github.com/rogerwang/node-webkit/blob/master/src/media/media_capture_devices_dispatcher.cc to show debug messages when you call sr.start();
However, I haven't even seen a reference to webkitSpeechRecognition, so it seems that this is not implemented yet. Correct me if I'm wrong, but here is what I think is missing in node-webkit to support this: http://git.chromium.org/gitweb/?p=chromium.git;a=tree;f=chrome/browser/speech

X4 on 1 Apr 2014

ghost on 5 Apr 2014

alanjames1987 on 7 May 2014

What's the status of this?

taskinegemen on 22 Jul 2014

PlasmaPower on 8 Sep 2014

byourselves on 11 Sep 2014

This can be done to some extent through Google's web API for speech recognition, but it would be nice to have it integrated into node-webkit (or, even nicer, as a Node.js module).

PlasmaPower on 11 Sep 2014

bram-dingelstad on 3 Oct 2014

willemmulder on 7 Oct 2014

dzautner on 15 Oct 2014

netanelgilad on 16 Oct 2014

ghost on 17 Oct 2014

RLTmultimedia on 29 Oct 2014

xhallix on 5 Nov 2014

+1 I would LOVE this feature!

miller9904 on 7 Nov 2014

Lezeper on 25 Nov 2014

Is this feature still not working? I want it badly.

anishtr4 on 4 Dec 2014

Me too!

miller9904 on 5 Dec 2014

askbeka on 5 Dec 2014

Still no fix?!

Edit from 2020: I'm sorry that I responded like this!

RobinMalfait on 21 Dec 2014

@RobinMalfait this isn't a bug, it's a feature request :wink: Chrome uses Google's voice recognition service, there is no equivalent for node webkit.

tommoor on 22 Dec 2014

FYI
I've implemented this for my own usage, though I have to warn you, it's not that nice of a solution...
Steps:
1) Record the audio using Web Audio
2) Save it as a file in the file system
3) Convert the file format to FLAC
4) Send the FLAC file to the Google API
5) Receive the text hypotheses
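
Steps 1 and 2 above could be sketched as follows. This is not the code referred to in the comment; it only illustrates turning the Float32 samples that Web Audio delivers into a 16-bit PCM WAV file that a FLAC encoder can read. `encodeWav` is a hypothetical helper name and mono input is assumed. The FLAC conversion and the upload (steps 3 to 5) are left out because the thread does not say which tools or endpoint were used.

```javascript
// Encode mono Float32 samples (range -1..1) into a 16-bit PCM WAV file.
// Returns an ArrayBuffer: 44-byte RIFF header followed by the sample data.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  function writeString(offset, text) {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  }

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                 // file size minus 8 bytes
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                           // size of the fmt chunk
  view.setUint16(20, 1, true);                            // audio format: PCM
  view.setUint16(22, 1, true);                            // channels: mono (assumption)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true);  // byte rate
  view.setUint16(32, bytesPerSample, true);               // block align
  view.setUint16(34, 16, true);                           // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to -1..1, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In node-webkit the result could then be written to disk from the Node context, for example with `fs.writeFileSync(path, Buffer.from(encodeWav(samples, 16000)))`, where `path` and the sample rate are placeholders.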

deanshub on 22 Dec 2014

👍1

Please see #2051 :)

mpreziuso on 12 Jan 2015

JaredCubilla on 18 Jan 2015

The feature seems useless, because in order to use it you'll need an API key, which Google limits to 50 requests per day ...

https://groups.google.com/a/chromium.org/d/msg/chromium-dev/6HDtCUfIbLE/3GExi4-XnMcJ

rogerwang on 19 Jan 2015

Maybe fairly useless with Google's API, but there's a .serviceURI property on a SpeechRecognition instance: https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#api_description
which enables use with other service endpoints (e.g. even a machine-local one, which is also what I was intending to experiment with).

jhermsmeier on 19 Jan 2015

Good point. I would love to be able to plug in my own speech recognizer.

miller9904 on 19 Jan 2015

@jhermsmeier good to know! Do you happen to know of any service with which I can test my commit?

rogerwang on 20 Jan 2015

@rogerwang Found WAMI from MIT's CSAIL, it even supports several languages (tested English & German so far): .serviceURI = 'wami.csail.mit.edu'

jhermsmeier on 20 Jan 2015

Jonas, could you make an example file showing how to use that? Thanks

RobinMalfait on 20 Jan 2015

@RobinMalfait here's a quick thing I threw together: http://bl.ocks.org/jhermsmeier/3bc995d37f3acc0b0364

jhermsmeier on 20 Jan 2015

That works great! :-)

Is there anything we can do to make this happen in node-webkit?

willemmulder on 24 Jan 2015

@willemmulder from Roger's comments it looks like he's working on adding support for it.

Gimmeaphatbeat on 25 Jan 2015

Ah thanks, I missed the commit a few messages up. Excellent, thanks @rogerwang

willemmulder on 26 Jan 2015

@rogerwang is there any progress on this? (not pushing, just curious what the current state of things is)

jhermsmeier on 20 Feb 2015

@jhermsmeier I plan to add it in one of the beta versions of 0.12.0.

rogerwang on 22 Feb 2015

Unfortunately there is no support for 'serviceURI' in the current upstream implementation: https://chromium.googlesource.com/chromium/blink/+/master/Source/modules/speech/SpeechRecognition.idl
so we may have to wait for upstream to add it.

The current git version of NW supports the Google service but lacks a usable API key...
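
Whether a given build exposes the property can also be checked at runtime instead of reading the IDL. A minimal sketch, assuming the attribute would appear on the constructor's prototype the way IDL attributes normally do; `hasServiceURI` is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: report whether a recognizer constructor exposes `serviceURI`.
function hasServiceURI(RecognitionCtor) {
  if (typeof RecognitionCtor !== 'function') {
    return false; // the speech API is not available at all
  }
  return 'serviceURI' in RecognitionCtor.prototype;
}
```

In a page this would be called as `hasServiceURI(window.webkitSpeechRecognition)`. The presence of the property only shows that it is exposed; it does not prove that the engine actually honours the value.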

rogerwang on 22 Feb 2015

Nice, thanks for the update! Wondering why serviceURI isn't in upstream yet, though. It's been in stable Chrome for a while.

jhermsmeier on 22 Feb 2015

Thanks for the update! I hope ServiceURI becomes available soon, then we can really start doing nice stuff!

willemmulder on 23 Feb 2015

barbass on 19 Apr 2015

kalaomer on 10 May 2015

emroot on 26 Jun 2015

:+1:

sholladay on 8 Jul 2015

l0k3ndr on 26 Jul 2015

jaybrownlee on 26 Aug 2015

lucasyvas on 3 Sep 2015

jhm-ciberman on 7 Sep 2015

KingRial on 8 Sep 2015

TemaSM on 10 Oct 2015

zackm0571 on 4 Nov 2015

tom-s on 1 Dec 2015

Aaronik on 24 Dec 2015

It should be supported by NW13.

ghostoy on 28 Dec 2015

Is this supported already? I'm using NW13 beta 2.
This code throws a SpeechRecognitionError (error="network", message=""):

var recognition = new webkitSpeechRecognition();
recognition.onresult = function(e) { console.log(e); };
recognition.onerror = function(e) { console.log(e); };
// Fails with and without this line:
//recognition.serviceURI = 'wami.csail.mit.edu';
recognition.start();
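
For reading errors like the one above, the error codes listed in the Web Speech API draft can be mapped to short descriptions. This is only a sketch: `describeSpeechError` and `SPEECH_ERROR_HINTS` are hypothetical names, and the descriptions are paraphrased from the draft rather than taken from NW.js.

```javascript
// Error codes listed for SpeechRecognitionError in the Web Speech API draft.
const SPEECH_ERROR_HINTS = {
  'no-speech': 'No speech was detected.',
  'aborted': 'Speech input was aborted.',
  'audio-capture': 'Audio capture failed.',
  'network': 'Network communication required for recognition failed.',
  'not-allowed': 'The user agent is not allowing speech input.',
  'service-not-allowed': 'The user agent is not allowing the requested speech service.',
  'bad-grammar': 'There was an error in the speech recognition grammar.',
  'language-not-supported': 'The language is not supported.'
};

// Hypothetical helper: turn an error event into a readable line.
function describeSpeechError(event) {
  const code = event && event.error;
  return Object.prototype.hasOwnProperty.call(SPEECH_ERROR_HINTS, code)
    ? code + ': ' + SPEECH_ERROR_HINTS[code]
    : 'unknown error: ' + String(code);
}
```

It could be wired in as `recognition.onerror = function(e) { console.log(describeSpeechError(e)); };`.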

jhm-ciberman on 11 Jan 2016

I assume wami.csail.mit.edu returns an error or requires a key of sorts? What do you see in the network requests?

willemmulder on 11 Jan 2016

Does this feature work without embedding a black box binary blob from Google that accesses the microphone without user consent (and without any web app making explicit use of it)?

Please note while I might sound paranoid, this is apparently what chromium used to do to support this: https://www.privateinternetaccess.com/blog/2015/06/google-chrome-listening-in-to-your-room-shows-the-importance-of-privacy-defense-in-depth/

If this is what you plan to add to nwjs as well, maybe make the binary blob download only triggered at runtime by the web app so no app using nwjs is FORCED to run this potential spyware code, or if this needs to be embedded into nwjs from now on then please provide a separate download that doesn't contain this feature if it relies on a google blob handling the microphones.

TL;DR: make sure apps that don't want to use speech recognition don't need to get shipped with nw.js versions that potentially spy on users with the microphone without asking. Again, see the URL above.

etc0de on 11 Jan 2016

If I use the code above in NW, it fails with a SpeechRecognitionError. If I try it (with or without the wami.csail.mit.edu line) in regular Google Chrome, it works fine and detects the speech correctly.
In neither case are any network requests generated or shown in the devtools panel.

jhm-ciberman on 11 Jan 2016

I want this soooooo bad.

Nelderson on 29 Feb 2016

frenchbread on 21 Apr 2016

🎉1

@jhm-ciberman @willemmulder I recently ran into the same network error with the 0.14.0-sdk. I might get around to looking into what's going on there in the next few days.

jhermsmeier on 22 Apr 2016

Thanks @jhermsmeier ! Really looking forward to your findings.

willemmulder on 22 Apr 2016

I've just done some testing with Chrome, Chromium and nw.js - attempting to use a local speech recognition backend running on localhost via serviceURI. First finding: The SpeechRecognition API stopped giving a flying fuck about the serviceURI property. It's completely ignored from what I can tell... i.e. with Chromium no connection to localhost could be observed, yet the speech recognition worked perfectly fine. Might find the time to dig into the source and file a bug, if there isn't one already.

jhermsmeier on 23 Apr 2016

Hmm, I'm pretty sure that using a different URL (I used wami.csail.mit.edu) worked before in Chrome. However, if I read this thread correctly, https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/82LcTDrhshw, there is some talk about not supporting a full URL, whatever that means. In other words: I'm just as surprised as you are.

willemmulder on 23 Apr 2016

Looks like it's been removed / reverted 4 months ago: https://bugs.chromium.org/p/chromium/issues/detail?id=480516#c6

Can't see a valid reason for why they did though, as from what I can see, it's still in the draft/spec (although on the other hand; that's in flux, and I have difficulties finding a recent document).

There are some mentions of going through chrome://speech-recognition to use custom recognizers – which I haven't quite fully understood, yet. Will really need to dig into the issue tracker and source to figure out what's happening with that I guess.

jhermsmeier on 23 Apr 2016

Any updates on this?

chrome.tts doesn't seem to work at all in nwjs.io, neither does window.speechSynthesis. Has anyone succeeded?

I thought it may be an issue of adding the right permissions to ttsEngine in the manifest, but that doesn't seem to do the trick either.

bromagosa on 20 Feb 2017

@bromagosa I just tested chrome.tts and it works for me. This issue is about speech recognition, not text-to-speech. Please file another issue if it is broken on your side.

rogerwang on 22 Feb 2017

To all: I just ran a test, and speech recognition is working with a valid Google API key. So I'm closing this issue.

I'm testing with https://www.google.com/intl/en/chrome/demos/speech.html.

Check here to see how to get a key: https://www.chromium.org/developers/how-tos/api-keys
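
The linked Chromium page describes supplying keys at runtime through the environment variables `GOOGLE_API_KEY`, `GOOGLE_DEFAULT_CLIENT_ID` and `GOOGLE_DEFAULT_CLIENT_SECRET`. The sketch below assumes an NW.js build honours them the same way Chromium does, which this thread does not confirm; `missingApiKeyVars` is a hypothetical helper for checking the environment from the Node context before starting recognition.

```javascript
// Environment variables named on the Chromium API keys page.
const API_KEY_VARS = [
  'GOOGLE_API_KEY',
  'GOOGLE_DEFAULT_CLIENT_ID',
  'GOOGLE_DEFAULT_CLIENT_SECRET'
];

// Hypothetical helper: list the variables that are unset or empty in `env`.
function missingApiKeyVars(env) {
  return API_KEY_VARS.filter(function (name) {
    return !env[name];
  });
}
```

Calling `missingApiKeyVars(process.env)` returns an empty array when all three are set; anything else names what still has to be exported before launching the app.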

rogerwang on 22 Feb 2017

@rogerwang sorry, you're right. I was hunting for solutions for both because speech recognition wasn't working either, but I'll take a look at your links. Thanks!

p.s. I found out chrome.tts is supported in Windows, Mac and Chromebooks, but (seemingly arbitrarily) not in GNU/Linux, so that's why I can't get it to work. I may need to work around that by adding a Node binding to festival :(

bromagosa on 22 Feb 2017

The Speech API looks great but the current limits imposed by Google make it unusable for developers.

There needs to be a really good offline ASR so that we never have to worry about Google or being online, but that probably won't happen anytime soon.

mscreenie on 19 Mar 2017

It looks like you could use the Google Cloud Platform for speech rec, or spawn a chrome process and try piping everything through there.

Sugarcaen on 22 Mar 2017

I think @jhermsmeier's comment is interesting. serviceURI was removed in Chromium 49 and I can't find any information as to why. I know the API and spec are in draft, shit happens and things change. But a serviceURI param would have been useful.

This really sucks because it means vendor lock-in, and implementing your own self-hosted speech-to-text is now really difficult. I don't know if this is Google wanting all the info to flow into their veins, but it is frustrating to say the least, and the removal of the param is a step in the wrong direction in the spirit of openness.

I am probably the minority here so there is little chance anything will change now. Would be nice to be able to voice this to the Chromium team at least.

mscreenie on 14 Apr 2017

According to the latest draft of the Web Speech API, the serviceURI hasn't been removed from the specification.

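So I went digging a bit, which did turn up some things I haven't discovered before, but most discussion around this seems to have happened out of band, or somewhere inaccessible to the public.

There was Chromium Issue N° 480516 filed ~2 years ago, in which support was added, and subsequently reverted again a few months later. Discussion related to this happened on blink-dev/82LcTDrhshw/pGKPgrXOUaAJ, yet it is still unclear to me what exactly happened there.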

It might be worth filing a new Chromium Issue to get an update on the situation.

jhermsmeier on 3 Jul 2017

👍3

Thanks for digging. I've done some myself and have come to the same conclusion. Everything is still a little unclear, and it seems odd for serviceURI to be removed after being added. I can't establish why, which is also a little frustrating.

I am hopeful it may return, for now I'm exploring other things.

1) Firefox has VoiceFill with similar functionality, I haven't dug into how it works completely but this may be interesting to you:

https://github.com/mozilla/speaktome
https://github.com/mozilla/speech-proxy

VoiceFill is written as a WebExtension unless I am mistaken. If Chromium had WebExtension support there could be hope to port it as a viable replacement easily and without much effort given that is the purpose of WebExtension.

https://developer.chrome.com/extensions

2) A node plugin but may be short lived.

3) Maybe posting in the Chromium issue tracker and linking here would be a good start into investigating.

mscreenie on 2 Aug 2017

Hello, may I know why this issue was closed? I really want to implement webkitSpeechRecognition for my node-webkit app. Can someone please tell me if there is any solution or workaround?