Google-cloud-node: @google-cloud/speech Operation not returning.

Created on 23 Jun 2017 · 3 Comments · Source: googleapis/google-cloud-node

Environment details

OS: google-cloud-functions
Node.js version: ?
npm version: ?
google-cloud-node version: ?

Steps to reproduce

I'm trying to use the Node.js Speech API client to return text from a Google Cloud Function. I have the following in my index.js:

// Get a reference to the Cloud Speech API component
const speech = require('@google-cloud/speech')();

/**
 * Background Cloud Function to be triggered by Cloud Storage.
 *
 * @param {object} event The Cloud Functions event.
 * @param {function} callback The callback function.
 */
exports.audio2text = function (event, callback) {
  const file = event.data;
  console.log(`uploaded file ${file.name}, beginning transcription...`)

  var config = {
    encoding: 'FLAC',
    languageCode: 'en-US',
    sampleRateHertz: 48000,
    verbose: true
  };

  speech.startRecognition(`gs://influx-audio-upload/${file.name}`, config)
    .then(function(data) {
      console.log("apiResponse: ", data[1])
      console.log("Operation started")
      data[0].on("error", (err) => {
        console.log("Operation err")
        console.log(err)
      }).on("complete", (results) => {
        console.log("Operation complete")
        results.forEach((element) =>{
          console.log(element.transcript)
        })
      })
    }).catch(err => {
      console.log("in error")
      console.log(err)
    })
  console.log("end of function")
  callback();
};

Uploading and transcribing shorter audio (~5 minutes) works just fine, but when I try longer audio (~30 minutes) the Operation never returns. I've tried deploying the cloud function with a longer timeout (--timeout 9m) and I still get no results. Am I on the right track, or is this API not intended to be used this way?

question speech

jackzampolin

All 3 comments

@jackzampolin : I'll double check, but I believe that recognizing text is expected to take about as long as the audio file, so a 30m file should be done in ~30 minutes. I'm going to ping the speech team to be sure.

Either way, with things like this it's probably best to structure the app to treat speech recognition as a "background processing" task rather than a synchronous one. That is, this script running on a server somewhere would be fine because the server is always on, and Node's event loop keeps the process alive while it waits for events. Running on GCF is a bit trickier since it's designed for short event-triggered functions, and it'd probably make more sense to either poll for "recognition operations that are done" or have some sort of notification upon completion.

I don't think the latter is quite ready (Speech sending events via PubSub when completed), but I'll ask the team. If Speech launches a batch API (that takes in a bunch of audio and puts all that audio on Cloud Storage) you could set GCS to notify you when a new processed audio file shows up and have that trigger your .on('complete') handler.
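
The contrast between an always-on server and a short-lived function can be sketched in code. In the question's snippet, callback() runs right after startRecognition is called, before "complete" fires, so the function signals that it is finished while the operation is still running. Below is a minimal sketch, assuming the operation object is a standard Node EventEmitter that emits "complete" and "error" as in the question. The names waitForOperation and budgetMs are hypothetical and not part of @google-cloud/speech.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: resolves with the 'complete' payload, rejects on
// 'error', and rejects if budgetMs elapses before either event fires.
function waitForOperation(operation, budgetMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`operation still running after ${budgetMs} ms`));
    }, budgetMs);

    operation.once('complete', (results) => {
      clearTimeout(timer);
      resolve(results);
    });
    operation.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}

// Demo with a stand-in emitter (not a real Speech operation).
const fakeOperation = new EventEmitter();
setTimeout(() => {
  fakeOperation.emit('complete', [{ transcript: 'hello world' }]);
}, 50);

waitForOperation(fakeOperation, 1000)
  .then((results) => results.forEach((r) => console.log(r.transcript)))
  .catch((err) => console.error(err.message));
```

In the question's function, something like waitForOperation(data[0], budgetMs).then(() => callback(), callback) would delay callback() until the operation settles. This only helps while the transcription fits inside the function's timeout, which a 30-minute file probably does not.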

jgeewax on 3 Jul 2017

My understanding matches @jgeewax's. The processing time increases roughly linearly with the length of the file.

Note that Operations are basically wrappers that let you say "Are we there yet? Are we there yet?". You _can_ make them block for up to 10 minutes while you wait, but it might be more useful to consider using them for polling. So, start a new function every minute, ask quickly if it is done (with a short timeout, such as 10 seconds), and then try again until you get a different result.
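
The polling approach described above can be sketched as follows. This is a sketch under stated assumptions, not the client library's API: checkDone is a hypothetical async function standing in for whatever call reports the operation's status, assumed to resolve to { done, result }. The names checkOnce and pollUntilDone are also made up for illustration. On Cloud Functions, each checkOnce call would more likely be its own scheduled invocation. The loop shows the same logic for a long-lived process.

```javascript
// Ask once whether the work is done, giving up after timeoutMs.
// checkDone is a hypothetical async function resolving to
// { done: boolean, result?: any }.
async function checkOnce(checkDone, timeoutMs = 10000) {
  let timer;
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ done: false, timedOut: true }), timeoutMs);
  });
  try {
    // Whichever settles first wins: the status check or the short timeout.
    return await Promise.race([checkDone(), timedOut]);
  } finally {
    clearTimeout(timer);
  }
}

// Repeat the quick check every intervalMs until it reports done,
// or throw after maxAttempts checks.
async function pollUntilDone(checkDone, options = {}) {
  const { intervalMs = 60000, timeoutMs = 10000, maxAttempts = 60 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkOnce(checkDone, timeoutMs);
    if (status.done) return status.result;
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`not done after ${maxAttempts} attempts`);
}

// Demo with a fake status check that reports done on the third call.
let demoCalls = 0;
const fakeCheck = async () =>
  (++demoCalls < 3 ? { done: false } : { done: true, result: 'transcript text' });

pollUntilDone(fakeCheck, { intervalMs: 10, timeoutMs: 100, maxAttempts: 5 })
  .then((result) => console.log('finished:', result))
  .catch((err) => console.error(err.message));
```

The defaults mirror the numbers in the comment above (one check per minute, a 10-second timeout per check). The cap of 60 attempts is an arbitrary choice for the sketch.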

lukesneeringer on 5 Jul 2017

I don't believe there is anything to address in the client library, so I will close the issue. Please correct me if I'm wrong and we will re-open. Thanks for the question @jackzampolin!