Nw.js: webkitSpeechRecognition for desktop apps?

Created on 14 Sep 2013  ·  83 Comments  ·  Source: nwjs/nw.js

I was attempting to add speech recognition to a desktop application, but it seems webkitSpeechRecognition is not working, probably because the application uses local file URLs and there is no way to grant permission the way there is when running in Chrome.

https://github.com/TalAter/annyang/issues/44

As you can see in the linked issue, running webkitSpeechRecognition by itself in the console does nothing in node-webkit.

var sr = new webkitSpeechRecognition;
sr.continuous = true; 
sr.interimResults = true;
sr.lang='en';
sr.onresult = function(e) {
    console.log(e.results[e.results.length-1][0].transcript);
};
sr.start(); // nothing happens, and no errors are reported

So would it be correct to say this is not currently possible in node-webkit, or is there a known workaround? I'm aware of the getUserMedia API, but it seems that it only captures audio and doesn't do any speech recognition.

If it's not currently possible will it be possible in future releases?
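
When `start()` appears to do nothing, it can help to log every lifecycle event before concluding that the API is missing. The following is a minimal sketch, not part of the original report: `attachDiagnostics` is a hypothetical helper name, and the event names are the standard handler names from the Web Speech API draft.

```javascript
// Attach a logger to the recognizer's lifecycle events so that a silent
// failure becomes visible. `recognition` is any object exposing the
// SpeechRecognition on<event> handler properties; `log` defaults to console.log.
function attachDiagnostics(recognition, log = console.log) {
  const events = ['start', 'audiostart', 'speechstart', 'speechend',
                  'audioend', 'nomatch', 'error', 'end'];
  for (const name of events) {
    recognition['on' + name] = (event) => {
      // Only error events carry an `error` code; log an empty string otherwise.
      log(name, event && event.error ? event.error : '');
    };
  }
  return recognition;
}

// Usage in node-webkit or a browser console:
//   const sr = attachDiagnostics(new webkitSpeechRecognition());
//   sr.start();
```

If none of these handlers fire at all, the request is likely being dropped before it reaches the recognizer, which would be consistent with a permission problem.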


feature-request

All 83 comments

+1

Seems that permission needs to be pre-authorised in the same way it was for getUserMedia calls and screensharing.

Would love to see support for this @rogerwang and feels like a possible quick win?

+1

You could write a native-client app for chrome that is a wrapper for this. I guess this could work with phantomjs too.

+1

+1

+1

Seriously no comment on this after 7 months? +1

@isimmons @caolan @tommoor @deanshub @X4 @TalAter @manuelpaulo @jshemas

Has any of you found a workaround?

@ThomasAy I'd have to debug it, as I've only skimmed the node-webkit source code, but to me it looks like you could patch https://github.com/rogerwang/node-webkit/blob/master/src/media/media_capture_devices_dispatcher.cc to show debug messages when you call sr.start();
However, I haven't even seen a reference to webkitSpeechRecognition, so it seems that this is not implemented yet. Correct me if I'm wrong, but here is what I think is missing in node-webkit to support this: http://git.chromium.org/gitweb/?p=chromium.git;a=tree;f=chrome/browser/speech

+1

+1

What's the status of this?

+1

+1

This can be done to some extent through Google's web API for speech recognition, but it would be nice to have it integrated into node-webkit (or even nicer as a Node.js module).

+1

+1

+1

+1

+1

+1

+1

+1 I would LOVE this feature!

+1

Is this feature still not working? I want it badly.

Me too!

+1

Still no fix?!

Edit from 2020: I'm sorry that I responded like this!

@RobinMalfait this isn't a bug, it's a feature request :wink: Chrome uses Google's voice recognition service, there is no equivalent for node webkit.

FYI
I've implemented this for my own usage, though I have to warn you, it's not that nice of a solution...
Steps:
1) Record the audio using Web Audio
2) Save it as a file in the file system
3) Convert the file format to FLAC
4) Send the FLAC file to the Google API
5) Receive the text hypotheses
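
The steps above can be sketched roughly as follows for the Node side of an NW.js app. This is an illustration under stated assumptions, not the commenter's actual code: it assumes the recording has already been saved as a WAV file (steps 1 and 2), that the `ffmpeg` command-line tool is installed and used for the conversion, and that `upload` is a caller-supplied, hypothetical function that sends the FLAC file to whatever recognition service is in use. No service endpoint is shown because the thread does not give one.

```javascript
const { execFile } = require('child_process');
const { promisify } = require('util');
const execFileAsync = promisify(execFile);

// Derive the output path: "rec.wav" -> "rec.flac".
function flacPathFor(wavPath) {
  return wavPath.replace(/\.wav$/i, '') + '.flac';
}

// Step 3: arguments for an ffmpeg conversion. ffmpeg picks the FLAC
// encoder from the output file extension; -y overwrites an existing file.
function flacArgs(inputPath, outputPath) {
  return ['-y', '-i', inputPath, outputPath];
}

// Steps 3-5. `upload` (hypothetical) takes the FLAC path and resolves to
// the list of text hypotheses returned by the recognition service.
async function transcribe(wavPath, upload, run = execFileAsync) {
  const flacPath = flacPathFor(wavPath);
  await run('ffmpeg', flacArgs(wavPath, flacPath));
  return upload(flacPath);
}
```

Passing `run` and `upload` as parameters keeps the conversion and the network call replaceable, which matters here because the recognition service is the part most likely to change.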

Please see #2051 :)

+1

The feature seems useless, because in order to use it you'll need an API key, which Google limits to 50 requests per day ...

https://groups.google.com/a/chromium.org/d/msg/chromium-dev/6HDtCUfIbLE/3GExi4-XnMcJ

Maybe fairly useless with Google's API, but there's a .serviceURI property on a SpeechRecognition instance (https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#api_description),
which enables use with other service endpoints (even a machine-local one, which is what I was intending to experiment with).

Good point. I would love to be able to plug in my own speech recognizer.

@jhermsmeier Good to know! Do you happen to know of any service I can test my commit with?

@rogerwang Found WAMI from MIT's CSAIL, it even supports several languages (tested English & German so far): .serviceURI = 'wami.csail.mit.edu'

Jonas, could you make an example file showing how to use that? Thanks.

@RobinMalfait here's a quick thing I threw together: http://bl.ocks.org/jhermsmeier/3bc995d37f3acc0b0364

That works great! :-)

Is there anything we can do to make this happen in node-webkit?

@willemmulder from Roger's comments it looks like he's working on adding support for it.

Ah thanks, I missed the commit a few messages up. Excellent, thanks @rogerwang

@rogerwang is there any progress on this? (not pushing, just curious what the current state of things is)

@jhermsmeier I plan to add it in one of the beta versions of 0.12.0.

Unfortunately there is no support for 'serviceURI' in the current upstream implementation: https://chromium.googlesource.com/chromium/blink/+/master/Source/modules/speech/SpeechRecognition.idl
so we may have to wait for upstream to add it.

The current git version of NW supports google service but lacks usable API key...

Nice, thanks for the update! Wondering why serviceURI isn't in upstream yet, though. It's been in stable Chrome for a while.

Thanks for the update! I hope ServiceURI becomes available soon, then we can really start doing nice stuff!

+1

+1

+1

:+1:

+1

+1

+1

+1

+1

+1

+1

+1

+1

It should be supported by NW13.

Aaron Sullivan [email protected] wrote on Fri, 25 Dec 2015 at 00:58:

+1



Is this supported already? I'm using NW13 beta 2,
and this code throws a SpeechRecognitionError (error="network", message=""):

var recognition = new webkitSpeechRecognition();
recognition.onresult = function(e) { console.log(e); };
recognition.onerror = function(e) { console.log(e); };
// Fails with and without this line:
// recognition.serviceURI = 'wami.csail.mit.edu';
recognition.start();
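
Logging the raw event works, but the `error` field is one of a small set of codes defined in the Web Speech API draft, so a lookup table can make the failure easier to read. A minimal sketch; `describeSpeechError` is a hypothetical helper and the hint texts are paraphrases, not official wording:

```javascript
// Error codes defined for SpeechRecognition error events in the Web Speech
// API draft, each with a short paraphrased hint.
const SPEECH_ERROR_HINTS = {
  'no-speech': 'no speech was detected',
  'aborted': 'recognition was aborted',
  'audio-capture': 'audio capture failed (is a microphone available?)',
  'network': 'network communication with the recognition service failed',
  'not-allowed': 'audio input was not permitted',
  'service-not-allowed': 'the requested speech service was not permitted',
  'bad-grammar': 'a grammar could not be used',
  'language-not-supported': 'the requested language is not supported'
};

// Turn an error event into a one-line description.
function describeSpeechError(event) {
  const code = event && event.error;
  if (Object.prototype.hasOwnProperty.call(SPEECH_ERROR_HINTS, code)) {
    return code + ': ' + SPEECH_ERROR_HINTS[code];
  }
  return 'unknown error: ' + String(code);
}

// Usage:
//   recognition.onerror = function(e) { console.log(describeSpeechError(e)); };
```

In this case the code is "network", meaning the recognizer could not reach its backend.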

I assume wami.csail.mit.edu returns an error or requires a key of sorts? What do you see in the network requests?

Does this feature work without embedding a black box binary blob from Google that accesses the microphone without user consent (and without any web app making explicit use of it)?

Please note while I might sound paranoid, this is apparently what chromium used to do to support this: https://www.privateinternetaccess.com/blog/2015/06/google-chrome-listening-in-to-your-room-shows-the-importance-of-privacy-defense-in-depth/

If this is what you plan to add to nwjs as well, maybe make the binary blob download only triggered at runtime by the web app so no app using nwjs is FORCED to run this potential spyware code, or if this needs to be embedded into nwjs from now on then please provide a separate download that doesn't contain this feature if it relies on a google blob handling the microphones.

TL;DR: make sure apps that don't want to use speech recognition don't need to get shipped with nw.js versions that potentially spy on users with the microphone without asking. Again, see the URL above.

If I use the code above in NW, it fails with a SpeechRecognitionError. If I try it (with or without the wami.csail.mit.edu line) in regular Google Chrome, it works fine and detects the speech correctly.
In neither case are any network requests generated or shown in the devtools panel.

I want this soooooo bad.

+1

@jhm-ciberman @willemmulder I just recently ran into the same network error with the 0.14.0-sdk, might get around to looking into what's going on there in the next days.

Thanks @jhermsmeier ! Really looking forward to your findings.

I've just done some testing with Chrome, Chromium and nw.js - attempting to use a local speech recognition backend running on localhost via serviceURI. First finding: The SpeechRecognition API stopped giving a flying fuck about the serviceURI property. It's completely ignored from what I can tell... i.e. with Chromium no connection to localhost could be observed, yet the speech recognition worked perfectly fine. Might find the time to dig into the source and file a bug, if there isn't one already.
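
One way to tell whether a build exposes `serviceURI` at all is to test for the property before assigning it, because assigning to a missing property on a JavaScript object silently creates a plain property that the engine never reads. A minimal sketch, assuming that a natively supported attribute shows up on the object or its prototype chain; `trySetServiceURI` is a hypothetical helper name:

```javascript
// Set serviceURI only if the recognizer actually exposes the attribute.
// Returns false when the property is absent, in which case an assignment
// would only create an ordinary property that is never read.
// Call this BEFORE any manual assignment to recognition.serviceURI,
// otherwise the check sees the property created by that assignment.
function trySetServiceURI(recognition, uri) {
  if (!('serviceURI' in recognition)) {
    return false;
  }
  recognition.serviceURI = uri;
  return true;
}

// Usage:
//   const sr = new webkitSpeechRecognition();
//   if (!trySetServiceURI(sr, 'localhost')) {
//     console.log('serviceURI is not exposed by this build');
//   }
```

A `true` result only shows that the attribute exists; whether the build then honours it is a separate question that this check cannot answer.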

Hmm, I'm pretty sure that using a different URL (I used wami.csail.mit.edu) worked before in Chrome. However, if I read this thread https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/82LcTDrhshw correctly, there's some talk about not supporting a full URL, whatever that means. In other words: I'm just as surprised as you are.

Looks like it's been removed / reverted 4 months ago: https://bugs.chromium.org/p/chromium/issues/detail?id=480516#c6

Can't see a valid reason for why they did though, as from what I can see, it's still in the draft/spec (although on the other hand; that's in flux, and I have difficulties finding a recent document).

There are some mentions of going through chrome://speech-recognition to use custom recognizers – which I haven't quite fully understood, yet. Will really need to dig into the issue tracker and source to figure out what's happening with that I guess.

Any updates on this?

chrome.tts doesn't seem to work at all in nwjs.io, neither does window.speechSynthesis. Has anyone succeeded?

I thought it may be an issue of adding the right permissions to ttsEngine in the manifest, but that doesn't seem to do the trick either.

@bromagosa I just tested, and chrome.tts works for me. This issue is about speech recognition, not text-to-speech. Please file another issue if it's buggy on your side.

To all: I just had a test, speech recognition is working with a valid google API key. So I'm closing this issue.

I'm testing with https://www.google.com/intl/en/chrome/demos/speech.html.

Check here to see how to get a key: https://www.chromium.org/developers/how-tos/api-keys

@rogerwang sorry, you're right. I was hunting for solutions for both because speech recognition wasn't working either, but I'll take a look at your links. Thanks!

p.s. I found out chrome.tts is supported in Windows, Mac and Chromebooks, but (seemingly arbitrarily) not in GNU/Linux, so that's why I can't get it to work. I may need to work around that by adding a Node binding to festival :(

The Speech API looks great but the current limits imposed by Google make it unusable for developers.

There needs to be a really good offline ASR so that one never has to worry about Google or being online, but that probably won't happen anytime soon.

It looks like you could use the Google Cloud Platform for speech rec, or spawn a chrome process and try piping everything through there.

I think @jhermsmeier's comment is interesting. serviceURI was removed in Chromium 49 and I can't find any information as to why. I know the API and spec are in draft, shit happens and things change. But a serviceURI param would have been useful.

This really sucks because of vendor lock-in, and implementing your own self-hosted speech-to-text is now really difficult. I don't know if this is Google wanting all the info to flow into their veins, but it's frustrating to say the least, and the removal of the param is a step in the wrong direction as far as openness goes.

I am probably the minority here so there is little chance anything will change now. Would be nice to be able to voice this to the Chromium team at least.

According to the latest draft of the Web Speech API, the serviceURI hasn't been removed from the specification.

So I went digging a bit, which did turn up some things I haven't discovered before, but most discussion around this seems to have happened out of band, or somewhere inaccessible to the public.

There was Chromium Issue N° 480516 filed ~2 years ago, in which support was added, and subsequently reverted again a few months later. Discussion related to this happened on blink-dev/82LcTDrhshw/pGKPgrXOUaAJ, yet it is still unclear to me what exactly happened there.

It might be worth filing a new Chromium Issue to get an update on the situation.

Thanks for digging. I've done some myself and have come to the same conclusion. Everything is still a little unclear; it seems odd for serviceURI to be removed after being added, and I can't establish why. It's also a little frustrating.

I am hopeful it may return, for now I'm exploring other things.

1) Firefox has VoiceFill with similar functionality, I haven't dug into how it works completely but this may be interesting to you:

https://github.com/mozilla/speaktome
https://github.com/mozilla/speech-proxy

VoiceFill is written as a WebExtension unless I am mistaken. If Chromium had WebExtension support there could be hope to port it as a viable replacement easily and without much effort given that is the purpose of WebExtension.

https://developer.chrome.com/extensions

2) A node plugin but may be short lived.

3) Maybe posting in the Chromium issue tracker and linking here would be a good start into investigating.

Hello, may I know why this issue was closed? I really want to implement webkitSpeechRecognition for my node-webkit app. Can someone please tell me if there is any solution or workaround?
