Google-cloud-java: How to pass Audio from Microphone to Google Speech To Text?

Created on 24 Apr 2018 · 28 Comments · Source: googleapis/google-cloud-java

I can't find any example of this functionality in the repository:

https://github.com/GoogleCloudPlatform/google-cloud-java/tree/master/google-cloud-clients/google-cloud-speech

Is it supported at all?

speech p2 question


All 28 comments

Hi @goxr3plus,
Can you provide more context? What is the major difference between audio from a microphone and audio produced otherwise?
If you are looking for a streaming call, Speech accepts StreamingRecognizeRequest. More information about streaming calls can be found here.

Thanks for replying :). I need to stream audio from the computer's microphone. I am doing it with another library that supports only Google Cloud Speech Private, but I can't figure out how to do it with your official Java library.

https://github.com/goxr3plus/java-google-speech-api

I am not aware of the private repo for speech as you mentioned, would you mind sharing it? Although I do believe under most circumstances features in a private repo are usually under development and should become public in the future.

@hzyi-google What I need is to plug a microphone into the computer and, with about 10 lines of Java code, capture the audio from the microphone and run speech recognition with the google-cloud-java library.

How can I do that in Java? I already know how to capture audio from the microphone.

How can I call StreamingRecognizeRequest with your current Java library? Can we have an example, please? :)

This is the repository I am talking about: https://github.com/goxr3plus/java-google-speech-api

I don't need to recognize audio from .flac files and so on, just raw data from the microphone. How can I do it?

 try (SpeechClient speechClient = SpeechClient.create()) {
   RecognitionConfig.AudioEncoding encoding = RecognitionConfig.AudioEncoding.FLAC;
   int sampleRateHertz = 44100;
   String languageCode = "en-US";
   RecognitionConfig config = RecognitionConfig.newBuilder()
     .setEncoding(encoding)
     .setSampleRateHertz(sampleRateHertz)
     .setLanguageCode(languageCode)
     .build();
   String uri = "gs://bucket_name/file_name.flac";
   RecognitionAudio audio = RecognitionAudio.newBuilder()
     .setUri(uri)
     .build();
   RecognizeResponse response = speechClient.recognize(config, audio);
 }

If your question is how to capture audio from outside the computer using Java, this tutorial might be what you want to refer to: https://docs.oracle.com/javase/tutorial/sound/capturing.html

Thank you, but my question is: say I capture the audio from the microphone; how do I pass it to the Google Speech-to-Text library so that it returns text in real time?

I need to do real-time speech recognition from a 🎤 microphone using your library. How can I do it?

@hzyi-google My dear friend, I am waiting for an answer on this; it is very important :).

Here comes the answer my dear friend :-) Sorry I was quite busy in the past few days.

Suppose your microphone has an interface similar to Iterator:

interface Microphone {

  boolean hasNext();
  Audio next();

}

Then you could use splitCall in BidiStreamingCallable this way:

try (SpeechClient client = SpeechClient.create()) {

  ResponseObserver<StreamingRecognizeResponse> responseObserver =
      new ResponseObserver<StreamingRecognizeResponse>() {

    public void onStart(StreamController controller) {
      // do nothing
    }

    public void onResponse(StreamingRecognizeResponse response) {
      // do something with the response
    }

    public void onComplete() {
      // do something to close up
    }

    public void onError(Throwable t) {
      // do something to handle errors
    }
  };

  // ClientStream takes a single type parameter: the request type
  ClientStream<StreamingRecognizeRequest> clientStream =
      client.streamingRecognizeCallable().splitCall(responseObserver);

  StreamingRecognitionConfig config = StreamingRecognitionConfig.newBuilder()
    ... // set up config
    .build();

  StreamingRecognizeRequest configRequest = StreamingRecognizeRequest.newBuilder()
    .setStreamingConfig(config)
    .build();  // The first request in a streaming call has to be a config

  clientStream.send(configRequest);

  while (microphone.hasNext()) {
    StreamingRecognizeRequest request =
        createRecognizeRequestFromMicrophone(microphone.next()); // create streaming requests
    clientStream.send(request);
  }

  clientStream.closeSend(); // signal that no more audio will be sent
}

Is this what you want?
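
The Microphone interface above is hypothetical. As a minimal sketch, assuming the audio arrives as a plain java.io.InputStream (an AudioInputStream is one), an iterator of byte chunks could look like the following. The class name ChunkedAudioSource and the chunk size are illustrative and not part of google-cloud-java; each chunk would still have to be wrapped in a StreamingRecognizeRequest by the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical adapter: exposes any InputStream as an Iterator of byte chunks. */
class ChunkedAudioSource implements Iterator<byte[]> {
  private final InputStream in;
  private final int chunkSize;
  private byte[] nextChunk; // chunk read ahead, or null once the stream has ended

  ChunkedAudioSource(InputStream in, int chunkSize) {
    this.in = in;
    this.chunkSize = chunkSize;
    this.nextChunk = readChunk(); // note: blocks until the first chunk is available
  }

  private byte[] readChunk() {
    try {
      byte[] buffer = new byte[chunkSize];
      int n = in.read(buffer);
      // read() may return fewer bytes than requested; keep only what was read.
      // A result of zero or less is treated as the end of the stream.
      return n <= 0 ? null : Arrays.copyOf(buffer, n);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public boolean hasNext() {
    return nextChunk != null;
  }

  @Override
  public byte[] next() {
    if (nextChunk == null) {
      throw new NoSuchElementException();
    }
    byte[] current = nextChunk;
    nextChunk = readChunk();
    return current;
  }

  public static void main(String[] args) {
    // Stand-in for microphone data: 10 zero bytes, read in chunks of 4.
    ChunkedAudioSource source =
        new ChunkedAudioSource(new ByteArrayInputStream(new byte[10]), 4);
    while (source.hasNext()) {
      System.out.println("chunk of " + source.next().length + " bytes");
    }
  }
}
```

Reading one chunk ahead keeps hasNext() cheap, at the cost of blocking in the constructor until the first chunk arrives.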

I have followed all the links, trying to figure out the code you sent me. I wrote a small program to show you what I need. Can you please turn it into a runnable example using the code you gave me above, without assuming methods that don't exist? I will add the credentials myself (access tokens etc.), no problem.

package googleSpeech;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

public class GoogleSpeechTest {

    public GoogleSpeechTest() {

        //Target data line
        TargetDataLine line ;


        //Capture Microphone Audio Data
        try {

            // Signed PCM AudioFormat with 16kHz, 16 bit sample size, mono
            int sampleRate = 16000;
            AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

            //Check if Microphone is Supported
            if (!AudioSystem.isLineSupported(info)) {
                System.out.println("Line not supported");
                System.exit(0);
            }

            //Get the target data line
            line= (TargetDataLine) AudioSystem.getLine(info);
            line.open(format);
            line.start();

            //Audio Input Stream
            AudioInputStream audio = new AudioInputStream(line);



        } catch (Exception ex) {
            ex.printStackTrace();
        }


        // ------------------------ WHAT I NEED -----------------------------

        //Send audio from Microphone to Google Servers and return Text

        //TAKE THE AUDIO FROM THE AudioInputStream , send to Google on the fly and return text
    }

    public static void main(String[] args) {
        new GoogleSpeechTest();
    }

}

package com.mycompany.app;

import com.google.api.gax.rpc.ClientStream;
import com.google.api.gax.rpc.ResponseObserver;
import com.google.api.gax.rpc.StreamController;
import com.google.cloud.speech.v1.*;
import com.google.protobuf.ByteString;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

/** Hello world! */
public class App {

  public static void demo() {

    // Target data line
    TargetDataLine line = null;
    AudioInputStream audio = null;

    // Capture Microphone Audio Data
    try {

      // Signed PCM AudioFormat with 16kHz, 16 bit sample size, mono
      int sampleRate = 16000;
      AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
      DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

      // Check if Microphone is Supported
      if (!AudioSystem.isLineSupported(info)) {
        System.out.println("Line not supported");
        System.exit(0);
      }

      // Get the target data line
      line = (TargetDataLine) AudioSystem.getLine(info);
      line.open(format);
      line.start();

      // Audio Input Stream
      audio = new AudioInputStream(line);

    } catch (Exception ex) {
      ex.printStackTrace();
    }

    // ------------------------ WHAT I NEED -----------------------------

    // Send audio from Microphone to Google Servers and return Text
    try (SpeechClient client = SpeechClient.create()) {

      ResponseObserver<StreamingRecognizeResponse> responseObserver =
          new ResponseObserver<StreamingRecognizeResponse>() {

            public void onStart(StreamController controller) {
              // do nothing
            }

            public void onResponse(StreamingRecognizeResponse response) {
              System.out.println(response);
            }

            public void onComplete() {}

            public void onError(Throwable t) {
              System.out.println(t);
            }
          };

      ClientStream<StreamingRecognizeRequest> clientStream =
          client.streamingRecognizeCallable().splitCall(responseObserver);

      RecognitionConfig recConfig =
          RecognitionConfig.newBuilder()
              .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
              .setLanguageCode("en-US")
              .setSampleRateHertz(16000)
              .build();
      StreamingRecognitionConfig config =
          StreamingRecognitionConfig.newBuilder().setConfig(recConfig).build();

      StreamingRecognizeRequest request =
          StreamingRecognizeRequest.newBuilder()
              .setStreamingConfig(config)
              .build(); // The first request in a streaming call has to be a config

      clientStream.send(request);

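      while (true) {
        byte[] data = new byte[10];
        int bytesRead;
        try {
          bytesRead = audio.read(data);
        } catch (IOException e) {
          System.out.println(e);
          break; // stop streaming if the audio source fails
        }
        if (bytesRead < 0) {
          break; // end of the audio stream
        }
        // Send only the bytes that were actually read
        request =
            StreamingRecognizeRequest.newBuilder()
                .setAudioContent(ByteString.copyFrom(data, 0, bytesRead))
                .build();
        clientStream.send(request);
      }
      clientStream.closeSend(); // signal that no more audio will be sent
    } catch (Exception e) {
      System.out.println(e);
    }
  }
}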

The code can compile but I don't have a device for testing. If you run into problems during run time feel free to post your errors here.

The code produces no errors except for the credentials.

java.io.IOException: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

Is there a tutorial on how to provide the credentials? I have already set up projects on Google Cloud Engine and have a Client ID, Secret Key, access tokens, etc. for Google Speech Recognition, but how can I provide them from Java code? :)
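
The error message above says that GOOGLE_APPLICATION_CREDENTIALS must point to a file defining the credentials (typically a service-account JSON key). The thread does not record which solution was eventually used, so the following is only a sketch of a pre-flight check that could run before SpeechClient.create(). The class and method names are hypothetical, and it only verifies that the variable is set and the file is readable, not that the key is valid.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check for the GOOGLE_APPLICATION_CREDENTIALS variable. */
class CredentialsCheck {

  /** Returns null when the path looks usable, otherwise a description of the problem. */
  static String describeProblem(String credentialsPath) {
    if (credentialsPath == null || credentialsPath.isEmpty()) {
      return "GOOGLE_APPLICATION_CREDENTIALS is not set";
    }
    Path path = Paths.get(credentialsPath);
    if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
      return "GOOGLE_APPLICATION_CREDENTIALS does not point to a readable file: " + path;
    }
    return null;
  }

  public static void main(String[] args) {
    String problem = describeProblem(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"));
    System.out.println(problem == null ? "Credentials file found" : problem);
  }
}
```

Failing early with a specific message is usually easier to diagnose than the IOException raised when the client is created.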

Thank you very much for helping me; all of this will become YouTube tutorials by me for other people.

@hzyi-google Found a solution for the above by asking on Stack Overflow :)

The error now is the following, when leaving it running for more than 65 seconds.


error {
  code: 11
  message: "Exceeded maximum allowed stream duration of 65 seconds."
}

com.google.api.gax.rpc.OutOfRangeException: io.grpc.StatusRuntimeException: OUT_OF_RANGE: Exceeded maximum allowed stream duration of 65 seconds.


Why does this happen, though?

@hzyi-google In an attempt to improve things, I tried to modify the code to use FLAC encoding because it is better, so I added this to pom.xml:

<dependency>                                  
    <groupId>com.github.axet</groupId>        
    <artifactId>java-flac-encoder</artifactId>
    <version>0.3.8</version>                  
</dependency>                                 

.setEncoding(RecognitionConfig.AudioEncoding.FLAC)

But I am getting the error below:

com.google.api.gax.rpc.CancelledException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
Jul 02, 2018 3:06:09 AM io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener shouldIgnoreHeadersOrDataFrame
INFO: [id: 0x7eac131d, L:/192.168.1.5:5048 - R:speech.googleapis.com/172.217.23.138:443] ignoring HEADERS frame for stream RST_STREAM sent. {}

OutOfRangeException: Speech currently limits a streaming request to about 1 minute. You could probably follow the discussions here and see whether you can work around it in Java.
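
One possible workaround for the duration limit, sketched here under stated assumptions, is to close the stream and open a fresh one shortly before the limit is reached. AudioSink and RotatingSink are hypothetical names, not part of google-cloud-java. In real use, the factory would call splitCall(...) again, send the config request first, and return a wrapper whose close() calls closeSend(). Speech that spans the switch may be recognized poorly, since each stream starts without context.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical abstraction over one streaming call. */
interface AudioSink {
  void send(byte[] chunk);

  void close();
}

/** Hypothetical helper: reopens the underlying sink once it has been open too long. */
class RotatingSink {
  private final Supplier<AudioSink> factory; // opens a fresh streaming call
  private final long maxMillis; // rotate once a call has been open this long
  private final LongSupplier clock; // injectable clock, e.g. System::currentTimeMillis
  private AudioSink current;
  private long openedAt;
  private int opened;

  RotatingSink(Supplier<AudioSink> factory, long maxMillis, LongSupplier clock) {
    this.factory = factory;
    this.maxMillis = maxMillis;
    this.clock = clock;
  }

  void send(byte[] chunk) {
    long now = clock.getAsLong();
    if (current == null || now - openedAt >= maxMillis) {
      close(); // finish the old call, if any
      current = factory.get();
      openedAt = now;
      opened++;
    }
    current.send(chunk);
  }

  void close() {
    if (current != null) {
      current.close();
      current = null;
    }
  }

  int openedCount() {
    return opened;
  }

  public static void main(String[] args) {
    long[] fakeNow = {0L}; // simulated clock so the demo runs instantly
    RotatingSink sink =
        new RotatingSink(
            () ->
                new AudioSink() {
                  public void send(byte[] chunk) {}

                  public void close() {}
                },
            55_000L,
            () -> fakeNow[0]);
    for (long t = 0; t <= 120_000L; t += 10_000L) {
      fakeNow[0] = t;
      sink.send(new byte[320]);
    }
    sink.close();
    System.out.println("streams opened: " + sink.openedCount());
  }
}
```

The 55-second threshold is an arbitrary margin below the 65 seconds quoted in the error.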

CancelledException: You shouldn't need another artifact just to call setEncoding(RecognitionConfig.AudioEncoding.FLAC). If you encoded the audio from your microphone with the artifact above before sending it to the Speech API, it's likely that the audio was not encoded correctly. Please verify whether that's the case (perhaps by playing it back), and if you still have this issue please let me know.

I see from the description that LINEAR16 is at least double the size of FLAC, which affects the speed of sending audio to Google's servers and getting back a response:

[`FLAC`](https://xiph.org/flac/documentation.html) (Free Lossless Audio
 Codec) is the recommended encoding because it is
 lossless--therefore recognition is not compromised--and
 requires only about half the bandwidth of `LINEAR16`. `FLAC` stream
 encoding supports 16-bit and 24-bit samples, however, not all fields in
 `STREAMINFO` are supported.

Do you have any way in mind for me to support FLAC encoding? :)

This is beyond the code in google-cloud-java and beyond the scope of my knowledge, sorry. You could probably share your question in their repo. I would still be happy to help you with questions related to speech api.

I'm going to close this issue, but feel free to ask for it to be reopened if you still have problems with the Speech API.

I tried this code, but the callback

public void onResponse(StreamingRecognizeResponse response) {
  System.err.println("REAL Time Response >>" + response);
}

takes a long time to respond. It's not real-time at all. I know streaming speech-to-text should be near real-time; it works on all Android devices and on the web (here).
What could be the issue? My byte stream arrives in near real time. I know this because I print the byte stream, and as soon as I mute my microphone the array goes to all 0s. Can you help?

It might be because the magic number 10 in byte[] data = new byte[10]; in the above code is not a good choice. A very rough search suggests that a 10-second recording would be at least 1 KB in size (depending on the compression algorithm). Can you change the 10 above to probably 1000 or larger to see if that solves your problem?
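
For uncompressed LINEAR16 audio, the amount of sound a buffer holds follows directly from the format. With the 16 kHz, 16-bit, mono format used in the code above, one second is 32,000 bytes, so a 10-byte buffer holds 5 samples (about 0.31 ms) and the loop issues roughly 3,200 requests per second. The helper below is an illustrative sketch of that arithmetic; the class and method names are made up.

```java
/** Illustrative arithmetic for uncompressed PCM buffers. */
class ChunkMath {

  /** Bytes of uncompressed PCM produced per second of audio. */
  static int bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
    return sampleRateHz * (bitsPerSample / 8) * channels;
  }

  /** Milliseconds of audio held by a buffer of the given size. */
  static double bufferMillis(int bufferBytes, int sampleRateHz, int bitsPerSample, int channels) {
    return 1000.0 * bufferBytes / bytesPerSecond(sampleRateHz, bitsPerSample, channels);
  }

  /** Buffer size in bytes needed to hold the given number of milliseconds. */
  static int bytesForMillis(int millis, int sampleRateHz, int bitsPerSample, int channels) {
    return bytesPerSecond(sampleRateHz, bitsPerSample, channels) * millis / 1000;
  }

  public static void main(String[] args) {
    System.out.println("bytes per second: " + bytesPerSecond(16000, 16, 1));
    System.out.println("ms in 10 bytes:   " + bufferMillis(10, 16000, 16, 1));
    System.out.println("bytes for 100 ms: " + bytesForMillis(100, 16000, 16, 1));
  }
}
```

By the same arithmetic, 10 seconds of this audio is 320,000 bytes, well above the 1 KB lower bound mentioned above.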

Also, a good thing is to send FLAC audio to the server; that's why I asked about it before :)


As it happens, I am fully aware of the exact nature of my stream. It comes from a VoIP call: 16-bit, 8000 frames per second, mono raw PCM. It arrives μ-law encoded, but that is taken care of before forwarding it to Google ASR. Every byte array I add to the ClientStream is 320 bytes long. Should I accumulate it before sending it to Google? Is it too small to be sent?

"Can you change the 10 above to probably 1000 or larger to see if that solves your problem?" Thanks a lot for responding promptly. I am actually using a SIP client. I have redirected the stream meant for the speakers to the ClientStream object. A 320-byte array is sent each time. I think this is almost real-time.

In that case can you share how you redirected the stream? And can you estimate how long the delay is?

Hi, I was AFK over the weekend. I am using a SIP client called peers. I get the incoming audio as a byte array. For brevity, take a look here. The init() method is called when the Java VoIP client receives the call; that is where I initialize the stream, etc. writeData(byte[] buffer, int offset, int length) receives the data from the other person on the VoIP call (in real time, I already checked this). I just add this to the clientStream. Please advise. To answer your question, I get the first response after approximately 8 seconds.
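
At 8000 Hz, 16-bit, mono, a 320-byte array is 20 ms of audio. Whether chunk size explains the delay is not settled in this thread, but batching is cheap to try. The sketch below, with a hypothetical ChunkAccumulator class, collects small frames and forwards them once a target size is reached, for example 1600 bytes (100 ms at this format). The downstream consumer stands in for the code that builds a StreamingRecognizeRequest and calls clientStream.send(...).

```java
import java.io.ByteArrayOutputStream;
import java.util.function.Consumer;

/** Hypothetical helper: batches small audio frames into larger chunks. */
class ChunkAccumulator {
  private final int targetBytes; // forward once at least this many bytes are pending
  private final Consumer<byte[]> downstream; // receives each batched chunk
  private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

  ChunkAccumulator(int targetBytes, Consumer<byte[]> downstream) {
    this.targetBytes = targetBytes;
    this.downstream = downstream;
  }

  /** Mirrors a writeData(buffer, offset, length) style callback. */
  void write(byte[] buffer, int offset, int length) {
    pending.write(buffer, offset, length);
    if (pending.size() >= targetBytes) {
      flush();
    }
  }

  /** Forwards whatever is pending; call this when the call ends. */
  void flush() {
    if (pending.size() > 0) {
      downstream.accept(pending.toByteArray());
      pending.reset();
    }
  }

  public static void main(String[] args) {
    ChunkAccumulator accumulator =
        new ChunkAccumulator(1600, chunk -> System.out.println("sending " + chunk.length + " bytes"));
    byte[] frame = new byte[320]; // one 20 ms frame of silence
    for (int i = 0; i < 12; i++) {
      accumulator.write(frame, 0, frame.length);
    }
    accumulator.flush(); // forward the remainder
  }
}
```

Batching adds up to one target interval of latency itself, so the target size is a trade-off rather than a fix.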

@absin1 Since your problem is more of a performance concern and not exactly the same as this question I am going to file a different issue for tracking.
