Google-cloud-ruby: How to stream microphone input for Google Cloud Speech API in Ruby, Mac OS X?

Created on 19 Mar 2018 · 17 Comments · Source: googleapis/google-cloud-ruby

Hi! I decided to use this project as part of our company hack days to build a voice-activated game, but struggled to find out how to stream microphone input into it. After digging through the docs, I found some examples that use "MicrophoneInput.read", but nothing that links to what this is supposed to be.

After searching around a bit, I found easy_audio (https://github.com/lsegal/easy_audio), but it was not easy to work with: the library itself is old, and portaudio (the underlying library, along with the C ffi calling code) is difficult to get to grips with. It would be great if there were either a recommended library or some additions to this library that provide a microphone sampling API. Some examples of functionality could be: listen for x seconds, or listen for a certain word/phrase.

In any case, some clarity around this would be great, as Ruby really does not seem to have much in the way of sound libraries, and those that do exist are old and difficult to use.

Cheers!

speech question

JamesZoft

All 17 comments

@Jarob22 Thanks for opening this issue! We definitely want to help people get up and running quickly with these awesome (and fun) ML APIs like Cloud Speech API.

@blowmage Any ideas on how we might provide a bit more onboarding help in our Speech docs with microphone input?

quartzmo on 19 Mar 2018

@Jarob22 What platform are you targeting? Most of the libraries used to access a microphone are platform-specific, which is why we used a made-up library in the code examples. Unfortunately, your observations match ours: libraries for collecting audio from an input in Ruby are either not well supported or difficult to use.

Here is a streaming speech example running on OS X using the coreaudio gem. It collects the microphone input and converts it to the LINEAR16 format.

require "google/cloud/speech"
require "coreaudio"

input_device = CoreAudio.default_input_device
input_buffer = input_device.input_buffer 1024

speech = Google::Cloud::Speech.new

# Stream audio until the first utterance is found.
stream = speech.stream encoding: :linear16,
                       language: "en-US",
                       sample_rate: input_device.actual_rate

stream.start
input_buffer.start

25.times do
  break if stream.stopped?
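  # Collect data from only the left channel (first value of each pair).
  # Note: use the local variable input_buffer, not @input_buffer, which is nil here.
  samples = input_buffer.read(4096).to_a.map(&:first)
  # Pack the integers as 16-bit signed little-endian samples
  sample = samples.pack("s<*")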
  # Send the sample to the Speech stream
  stream.send sample
end

input_buffer.stop
stream.stop
stream.wait_until_complete!

results = stream.results
puts results.first.transcript

blowmage on 19 Mar 2018
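
The packing step in the example above can be tried without a microphone. The following is a minimal sketch, assuming each buffer read yields [left, right] pairs of 16-bit integers, as the `.map(&:first)` call implies; the `frames` array is made-up data, not real coreaudio output.

```ruby
# Hypothetical stand-in for one buffer read: [left, right] pairs of
# 16-bit integers (made-up values, not real device output).
frames = [[0, 0], [1, -1], [-2, 2], [32_767, -32_768]]

# Keep only the left channel, as in the streaming example.
left = frames.map(&:first)

# Pack as 16-bit signed little-endian ("s<"), the byte layout used by LINEAR16.
chunk = left.pack("s<*")

puts chunk.bytesize    # two bytes per sample
p chunk.unpack("s<*")  # unpacking recovers the left-channel integers
```

Dropping the right channel keeps the stream mono; averaging the two channels would be another option, at the cost of a little extra work per frame.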

If you want to watch me stumble through live coding this example you can see it here.

blowmage on 19 Mar 2018

Hi @blowmage,

I'm using Mac, and even this would have been preferable to easy_audio. I spent an entire day being stuck because of some arcane issues existing between the C ffi and the GIL!

What data does CoreAudio return? One of the other confusions with easy_audio (well really portaudio) was that the data it returned was not a series of floats -1 to +1, but a series of floats from -? to +? (I never managed to figure out what the max/min values were).

JamesZoft on 19 Mar 2018

CoreAudio returns an NArray of Integer pairs. Most of CoreAudio is C and Objective-C, so I remember looking at the source a lot to figure out how it works.

blowmage on 19 Mar 2018

Ah, are these integers between -32768 and 32767? I was converting portaudio's floats and capping them at those values to get a stream of ints, then packing them as little-endian samples. I guess this would work. What I was trying to do was set the stream going and append the audio input to a temporary buffer. Then, when the amplitude went above a threshold, it would start actually listening for a command word (like google home/echo has 'hey google', except mine was going to be something like 'play'), followed by the command itself, like 'move left' or 'throw fireball'.

JamesZoft on 19 Mar 2018
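
The conversion and threshold check described above might look roughly like this. It is a sketch only: it assumes the floats are nominally in -1.0..1.0 (which, as noted earlier in the thread, was not what portaudio appeared to return), and `float_to_int16`, `loud?`, and the threshold value are hypothetical names and numbers chosen for illustration.

```ruby
INT16_MIN = -32_768
INT16_MAX = 32_767

# Scale a float sample (assumed nominally -1.0..1.0) to a 16-bit integer,
# capping anything that overshoots the representable range.
def float_to_int16(sample)
  (sample * INT16_MAX).round.clamp(INT16_MIN, INT16_MAX)
end

# True when any sample in the chunk exceeds the threshold in absolute value.
def loud?(samples, threshold)
  samples.any? { |s| s.abs > threshold }
end

floats  = [0.0, 0.25, -0.25, 1.2, -1.2]       # made-up input
samples = floats.map { |f| float_to_int16(f) }
chunk   = samples.pack("s<*")                 # 16-bit signed little-endian

p samples
puts loud?(samples, 10_000)
puts loud?([0, 100, -200], 10_000)
```

A peak check like `loud?` is the simplest possible trigger; a real voice-activated game would probably want something less sensitive to single-sample spikes, such as an average over a short window.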

I believe OS X CoreAudio uses 16-bit samples for audio, so it should be fine.

blowmage on 19 Mar 2018

Alright. The code I initially used to record a simple command is in https://github.com/Jarob22/uocum_ludum/blob/master/sokoban.rb at the bottom if you wanted to use it for examples. Yours would be a great one to have, @blowmage.

JamesZoft on 19 Mar 2018

The current Speech example for streaming looks like this:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

stream = speech.stream encoding: :linear16,
                       language: "en-US",
                       sample_rate: 16000

# Stream 5 seconds of audio from the microphone
# Actual implementation of microphone input varies by platform
5.times do
  stream.send MicrophoneInput.read(32000)
end

The fictional MicrophoneInput.read is explained as: "Actual implementation of microphone input varies by platform". Personally I'm hesitant about getting deeper into platform-specific details, or showing easy_audio examples that may be problematic for some users.

It's great that we're documenting OS X solutions here in this issue.

@frankyn Any thoughts regarding a Ruby tutorial for real-world Speech input streaming?

quartzmo on 19 Mar 2018
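
For readers wondering why the documented example reads 32000 bytes five times to get 5 seconds of audio: at 16000 samples per second and 2 bytes per LINEAR16 sample, one second of mono audio is 32000 bytes. The sketch below shows that arithmetic with a hypothetical `FakeMicrophoneInput` class that reads raw bytes from any IO-like object; it is a stand-in for experimentation and not part of google-cloud-ruby.

```ruby
require "stringio"

SAMPLE_RATE      = 16_000 # samples per second, as in the documented example
BYTES_PER_SAMPLE = 2      # LINEAR16 uses 16 bits per sample
BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE # mono audio assumed

# Hypothetical stand-in for the fictional MicrophoneInput: reads raw
# LINEAR16 bytes from an IO-like object instead of a real device.
class FakeMicrophoneInput
  def initialize(io)
    @io = io
  end

  # Returns up to `bytes` bytes, or nil once the source is exhausted.
  def read(bytes)
    @io.read(bytes)
  end
end

# Five seconds of silence as test data.
silence = "\x00".b * (BYTES_PER_SECOND * 5)
mic     = FakeMicrophoneInput.new(StringIO.new(silence))

chunks = []
5.times { chunks << mic.read(32_000) } # each chunk would go to stream.send

puts BYTES_PER_SECOND
puts chunks.map(&:bytesize).inspect
```

Swapping the `StringIO` for a `File` opened on a raw PCM recording gives a platform-independent way to exercise the streaming code before wiring up a real microphone.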

Rewrote the issue title for future searchers.

quartzmo on 19 Mar 2018

Closing, since @blowmage is in agreement that replacing MicrophoneInput.read with real-world code in the API doc example is currently problematic due to the platform-specific nature of the solutions.

quartzmo on 19 Mar 2018

Thanks again for bringing this up @Jarob22, I'm sure the code that @blowmage posted and the code that you've linked will be very helpful for other users.

quartzmo on 19 Mar 2018

@quartzmo No worries, hope this helps future people. In lieu of replacing the current documentation with platform-specific stuff, maybe at least a link to this ticket for "a couple of real world examples" or similar wording would be useful?

JamesZoft on 19 Mar 2018

Let's see what @frankyn has to say.

quartzmo on 19 Mar 2018

Ok!

JamesZoft on 19 Mar 2018

Does the coreaudio gem support Linux?