Botframework-webchat: Adaptive Card speak property is not working

Created on 9 Mar 2018 · 37 comments · Source: microsoft/BotFramework-WebChat

Adaptive cards are supposed to support the "speak" property to denote text to say to the user (see https://docs.microsoft.com/en-us/adaptive-cards/create/speech). However, this property does not have any effect in WebChat.

Bug P1 backlog front-burner

Most helpful comment

Thanks, I'll see if I can try it out this weekend.

All 37 comments

Web Chat only speaks a reply when the user asked the question using the microphone. The message the bot replies with might be abbreviated and not make sense when spoken aloud, so we have a speak property to make the spoken text different from the displayed message.

We have an ask to make TTS (text-to-speech) always on at #859. If you think it's relevant, please vote for it.

The current behavior is by design, and we need to think of a better way to surface the TTS feature.

@compulim this issue has to do with the speak property having no effect in WebChat, even when the microphone is used first by the user. In other words, WebChat _never_ acts on the speak property, which seems to be a bug. One would expect that if the user asks a question using the microphone, WebChat would speak out the text inside the speak tag of the adaptive card that is part of the answer, but this does not happen. Please let me know if I am missing something; otherwise I request that this issue be re-opened. Thank you.

I have just tested our sample weather card. I used the speech button in Web Chat and my voice to send a request to my bot, and my bot responded with the card. I can clearly hear the following, i.e. the speak property:

S. Weather forecast for Monday is high of 62 and low of 42 degrees with a 20% chance of rain. Slash S. S. Winds will be 5 mph from the northeast. Slash S.

I am using the TTS and synthesizer from Cognitive Services, and testing from master branch, which is newer than the one on NPM.

What kind of TTS/synthesizer are you using, the browser's or Cognitive Services? And can you share the Adaptive Cards JSON file with us to investigate?

Thanks, that's good to know. I will upgrade to the latest code and retry it. The code base I'm using is from 11/12/17, so maybe the issue was fixed some time after that date?

We are upgrading Adaptive Cards to 1.0.0, which was released 2 days ago. That might be related.

Please do let us know your result. 😄

Is that upgrade available yet to pull from github, or should I wait until you are fully done?

We should have it out next week. If you want to try it out now, you will need to clone the repo, build it, and use npm link to link it to your project.

Steps:

```sh
cd <somewhere-not-your-project-dir>
git clone https://github.com/Microsoft/BotFramework-WebChat.git
cd BotFramework-WebChat
npm install
npm run build
npm link .
cd <your-project-dir>
npm link botframework-webchat
```

npm link creates a symbolic link from your project directory to the cloned and built repo directory.

I recommend testing it early, because if it is still buggy, we can still fix it before the next publish cycle. 😉

Thanks, I'll see if I can try it out this weekend.

I tested the sample weather card with the current code base on GitHub, but it didn't work -- the speak property had no effect. It works in the Visualizer (when you click the "Speak this card" button), but in WebChat the card gets displayed without any speech.

Attached is the JSON I used in WebChat (note that I had to modify the JSON a little bit for WebChat).
weather.zip

@puripuneet I will try your JSON.

Did your bot send the sample weather card because you spoke or because you typed? The speak property will only be synthesized if your last action was speaking, not typing or clicking.

The code that decides whether to synthesize or not is here and here. Internally, the Redux action that performs the synthesis is called Speak_SSML.
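
For illustration only, a minimal sketch of that gate (this is not Web Chat's actual source; `lastInputWasSpeech` is a hypothetical flag standing in for however the store tracks the last input method):

```javascript
// Hypothetical sketch of the gating described above: only dispatch the
// Speak_SSML action when the user's last input came from the microphone.
function maybeSpeak(store, activity) {
  const { lastInputWasSpeech } = store.getState(); // hypothetical state shape
  const ssml = activity.speak || activity.text;

  if (lastInputWasSpeech && ssml) {
    store.dispatch({ type: 'Speak_SSML', payload: { ssml } });
  }
}
```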

@compulim I spoke to it but it didn't speak back.

@compulim Note that I had to modify the sample weather card JSON in order to send it from my bot using the Bot Framework. To use the Bot Framework to send messages to WebChat, the JSON has to include the contentType and content properties as shown in this example.
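
For reference, a hedged sketch of that attachment shape (the card body and speak text here are illustrative, not the actual sample weather card):

```javascript
// The card JSON nests under "content", and "contentType" marks it as an
// Adaptive Card; the speak property lives inside the card itself.
const attachment = {
  contentType: 'application/vnd.microsoft.card.adaptive',
  content: {
    type: 'AdaptiveCard',
    version: '1.0',
    speak: '<s>Weather forecast for Monday is a high of 62.</s>',
    body: [{ type: 'TextBlock', text: 'Monday: high 62, low 42' }]
  }
};
```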

A few questions, as I am trying to repro:

  1. When you say something and expect a text reply (instead of an Adaptive Card), does Web Chat speak back the response?
  2. Are you using speech from browser (BotChat.Speech.BrowserSpeechRecognizer) or Cognitive Services?

(Let me reopen the issue, closed issues are difficult to find)

@compulim To answer your questions:

  1. Yes (speech works with both text messages and suggested actions).
  2. Using Cognitive Services and also our own speech synthesizer.

A few preliminary findings from my investigation:

  1. Cognitive Services does not understand SSML (or we send it incorrectly); it speaks the XML tags in the Adaptive Card speak property aloud.
    a. For example, with speak = '<s>Hello</s>', Cognitive Services will say "S Hello slash-S".
  2. Chrome does not understand SSML.
    a. It works when speak = 'Hello'.
    b. It won't speak anything if speak = '<s>Hello</s>'.

The Adaptive Cards speak property supports both plain text and SSML:

Specifies what should be spoken for this entire Item. This is simple text or SSML fragment

For browser-based Web Speech, we assume Chrome does not support SSML, so our code here parses the SSML and converts it to a SpeechSynthesisUtterance.

Since <s> is an unexpected tag, our fallback code uses nodeValue where it should use textContent instead.
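
A minimal sketch of that fallback, assuming the fix described here (this is not Web Chat's actual code): parse the speak value as XML and flatten it with textContent, falling back to the raw string when parsing fails.

```javascript
// Flatten an SSML (or SSML-ish) speak value into plain text for browsers
// whose Web Speech API does not understand SSML markup.
function speakAsPlainText(speak) {
  const doc = new DOMParser().parseFromString(speak, 'text/xml');
  const failed = doc.getElementsByTagName('parsererror').length > 0;

  // textContent gathers all descendant text; nodeValue is null on elements,
  // which is why '<s>Hello</s>' produced silence before the fix.
  const text = failed ? speak : doc.documentElement.textContent;

  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}
```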

@puripuneet I have a fix in PR #895 that should fix the bug. The problem is that when we synthesize through the Web Speech API (browser-based), we mishandle XML parsing of the malformed SSML tag (<s> is not a valid SSML tag; we should bypass it, but we missed that case).

We don't have any problems with the Cognitive Services synthesizer, even with bad SSML.

@compulim I am not using Web Speech API, so I don't think the fix to PR #895 applies.

@puripuneet do you see anything weird in the F12 console?

You mentioned you are using your own speech synthesizer. Can you tell me a little more about that?

@compulim I checked the Chrome DevTools console -- there's no error or any other weird message. Also, I don't think it has anything to do with the speech synthesizer that we are using, since it works fine for text messages and suggested actions. If I switch it out and use Cognitive Services (Bing Text-to-Speech) instead, I get no speech for adaptive cards with that either. So, it looks like the issue is only with adaptive cards.

Were you able to test using the modified JSON I had sent?

Yes, this is how I tried it:

  • Create a new Azure Bot Service (Web App Bot in Node.js)

    • Write down the Web Chat token

  • Open bot.js (via the online code editor), add a line of code, and inject your JSON, essentially:

    • session.send(new builder.Message(session).addAttachment({ … }));

  • Open our MOCK_DL test.html in Chrome, with both browser speech (a.k.a. Web Speech API) and Cognitive Services
  • Click the microphone button and say anything

I can hear Web Chat saying "You said …", followed by the content of your Adaptive Card.
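
For anyone reproducing this, a minimal bot.js along those lines might look like the sketch below (botbuilder v3; the card body and speak text are illustrative, not the actual sample weather card):

```javascript
const builder = require('botbuilder');
const restify = require('restify');

// HTTP endpoint the Bot Framework channel posts activities to
const server = restify.createServer();
server.listen(process.env.PORT || 3978);

const connector = new builder.ChatConnector({
  appId: process.env.MICROSOFT_APP_ID,
  appPassword: process.env.MICROSOFT_APP_PASSWORD
});

server.post('/api/messages', connector.listen());

// Echo the utterance, then reply with an Adaptive Card carrying a speak property
const bot = new builder.UniversalBot(connector, session => {
  session.send('You said ' + session.message.text);
  session.send(new builder.Message(session).addAttachment({
    contentType: 'application/vnd.microsoft.card.adaptive',
    content: {
      type: 'AdaptiveCard',
      version: '1.0',
      speak: '<s>Weather forecast for Monday is a high of 62.</s>',
      body: [{ type: 'TextBlock', text: 'Monday: high 62, low 42' }]
    }
  }));
});
```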

I'm using a slightly different deployment model for my bot -- I use Bot Channels Registration with the Azure Bot Service. And I have WebChat using the Direct Line channel to communicate with my bot.

How do you host Web Chat?

I am running it inline as described here.

Can you try pointing the <script> tags for botchat.js at the builds below and give it a try?

```html
<script src="https://unpkg.com/botframework-webchat@next/botchat.js"></script>
<script src="https://unpkg.com/botframework-webchat@next/CognitiveServices.js"></script>
```

I replaced both botchat.js and CognitiveServices.js as you asked, but the same issue is still there -- I speak to the bot, it responds back with the adaptive card (weather), but it doesn't speak out the text in the speak property. If you want to put some console log messages in botchat.js and have me retry, I can do that. This is a great way to debug!

Sorry, I was pretty tied up last week.

This is my sample bot to try out your Adaptive Cards (inlined in bot.js). I have included both bot.js and index.html.

Do you see big differences from your setup? And does it work on your side?

894.zip

I tested your bot.js with the Bot Emulator. And then I separately tested your index.html against my own bot running remotely. I don't know of a way to test both bot.js and index.html together over localhost (if this is possible, can you please share how to do it?). Unfortunately, in both cases it didn't work -- the echoed command is spoken, but not the adaptive card.

I did notice an exception in the Bot Emulator every time it processes the speak property in the adaptive card:
```
Uncaught TypeError: Cannot read property 'indexOf' of null
C:\Users\Puneet\AppData\Local\botframework\app-3.5.35\resources\app.asar\node_modules\botframework-webchat\built\SpeechModule.js 216 32 {}
```

I was able to get rid of the error in the Bot Emulator by changing a couple of lines in your code as follows:

From:

```javascript
session.send('You said ' + session.message.text);
var msg = new builder.Message(session).addAttachment({
```

To:

```javascript
// session.send('You said ' + session.message.text);
var msg = new builder.Message(session).text('You said ' + session.message.text).addAttachment({
```

This matches how I create messages to send to the user in my code. I'm not sure why this gets rid of the error I mentioned earlier, or if it has anything to do with the issue we are troubleshooting.

The indexOf exception is fixed in 0.12.0; it was due to a bad SSML parsing bug in the speak property. The Bot Emulator is not on the latest build yet.

For development, we run bot.js on a port (3978) and use ngrok to create an incoming HTTPS endpoint. Then on Azure, we register that endpoint as a Bot Channels Registration and grab the Web Chat secret.

I usually use the test.html in Web Chat to try things out. But you can bring your own index.html too. For me,

  1. At the root of Web Chat, run `npm run build-test` followed by `npm run mock`
  2. Navigate to http://localhost:3000/?s=

Then it should connect to your bot.js.

For the second question, I also don't know why commenting out the code would make it work. The indexOf exception should be related to SSML parsing.

Do you think it would help if I host an index.html with the latest Web Chat (@0.12.0)? Then you could enter your Web Chat secret and try out your own bot.js with our Web Chat component.

I figured out how to test both bot.js and index.html together over localhost using ngrok as described here.

I'm not sure if it would help to test a WebChat hosted by you remotely, since the index.html that I'm using already sources your botchat.js file. Maybe what might help (if it is doable) is if you could put some console log messages in your botchat.js file so that we can compare the traces and make sure that the code is going through the expected path.

We are also having the same issue: if we assign text to AdaptiveCard.Speak, it does not work with a Cortana Skill. But it works if we assign the value to the IMessageActivity instance of the Bot Framework. We are using AdaptiveCards version 1.0.3.0.

Here is our code sample:

```csharp
IMessageActivity responseMessage = context.MakeMessage();
responseMessage.Attachments.Add(new Attachment
{
    Content = GetAdaptiveCard(),
    ContentType = AdaptiveCard.ContentType,
    Name = "Card"
});

private static AdaptiveCard GetAdaptiveCard()
{
    return new AdaptiveCard
    {
        Speak = "Hello to AdaptiveCard response",
        Body = new List<AdaptiveElement>
        {
            new AdaptiveTextBlock
            {
                Type = AdaptiveTextBlock.TypeName,
                Text = "Hello to Adaptive Card response",
                Size = AdaptiveTextSize.Medium,
                Weight = AdaptiveTextWeight.Default,
                Wrap = true
            }
        }
    };
}
```

I was having the same issue, and I got it to work by setting the Speak property on the message which contains the AdaptiveCard rather than on the AdaptiveCard directly; something like:

```csharp
var reply = context.MakeMessage();
reply.Speak = "Hello";
reply.Attachments.Add(attachment);
```

Hello,
I'm facing the same issue.
I use the mic to say "hello" to the bot, and it replies with an adaptive card in the chat. The card has its speak field set to a string like "hello", but the text-to-speech has problems.
I'm using CognitiveServices.js, along with botchat.js and botchat.css.

I don't face the problem if I attach spoken audio to a plain text message like this:

```csharp
var mess = "Ciao come va? Posso aiutarti a prenotare presso il nostro ristorante";
var audiomess = _ttsService.GenerateSsml(mess, BotConstants.ItalianLanguage);
await turnContext.SendActivityAsync(mess, audiomess);
```

But I face the problem with the adaptive card.

I did some investigation and I noticed that I get a 400 error when the "https://westeurope.tts.speech.microsoft.com/cognitiveservices/v1" endpoint is called.
In particular I noticed that the request payload is different from a payload that works.

This is the payload built for the card's TTS:

```xml
<speak version='1.0' xml:lang='it-it'><voice xml:lang='it-it' xml:gender='Female' name='undefined'>Ciao! Sono qui per aiutarti a trovare l&apos;abito perfetto per ogni occasione. A che tipo di evento dovrai partecipare?</voice></speak>
```

It doesn't work.

This is the payload built for a simple text message:

```xml
<speak xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="it-IT"> <voice name="Microsoft Server Speech Text to Speech Voice (it-IT, LuciaRUS)">Ciao username! come va? Posso aiutarti a prenotare presso il nostro ristorante</voice> </speak>
```

It works.
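
One way to verify this kind of difference (a hedged sketch, not from the thread): replay each payload against the endpoint directly and compare status codes. This assumes you have already exchanged your subscription key for a bearer token at the region's /sts/v1.0/issueToken endpoint.

```javascript
// POST an SSML payload straight at the Cognitive Services TTS endpoint.
async function tryPayload(ssml, accessToken) {
  const res = await fetch('https://westeurope.tts.speech.microsoft.com/cognitiveservices/v1', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/ssml+xml',
      'X-Microsoft-OutputFormat': 'riff-16khz-16bit-mono-pcm'
    },
    body: ssml
  });

  console.log(res.status); // 400 for the malformed payload, 200 for the working one
  return res.arrayBuffer(); // audio bytes on success
}
```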

So, is there any news about this?

I hit a related issue when using an adaptive card with SSML in the speak property and using Cognitive Services Speech (not Bing).

If the adaptive card has the following content for speak:

```xml
<speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
   <voice name='Microsoft Server Speech Text to Speech Voice (en-US, GuyNeural)'>Hello, how can I help you today</voice>
</speak>
```

And using the following:

```javascript
(async function() {
  let webSpeechPonyfillFactory;

  if (subscriptionKey) {
    webSpeechPonyfillFactory = await window.WebChat.createCognitiveServicesSpeechServicesPonyfillFactory({
      region,
      subscriptionKey
    });
  }

  window.WebChat.renderWebChat({
    directLine: window.WebChat.createDirectLine({ token }),
    webSpeechPonyfillFactory,
    locale: 'en-US'
  }, document.getElementById('webchat'));

  document.querySelector('#webchat > *').focus();
})().catch(err => console.error(err));
```

The speech endpoint (https://eastus.tts.speech.microsoft.com/cognitiveservices/v1) is sent the following payload:

```xml
<!-- The markup was stripped when this thread was archived; judging from the 400
     error below, the payload wrapped the card's own <speak>/<voice> SSML inside
     Web Chat's outer <speak><voice> envelope, roughly: -->
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="...">
    <speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
      <voice name='Microsoft Server Speech Text to Speech Voice (en-US, GuyNeural)'>Hello, how can I help you today</voice>
    </speak>
  </voice>
</speak>
```
This returns a 400 due to the nested tags; hitting the API directly gives:

400 Voice can only be a child of speak.

So how do you reliably send Adaptive Card SSML (and thus control the voice font that gets used)?

Closing as this is resolved in the latest version of Web Chat - v4.7.1.

```javascript
await context.sendActivity({
    attachments: [CardFactory.adaptiveCard({
        "type": "AdaptiveCard",
        "version": "1.0",
        "body": [],
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "speak": "The bot should speak this"
    })],
    channelData: {
        speak: true
    }
});
```
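
For the voice-font question above: recent Web Chat 4.x also exposes a selectVoice prop, which picks the synthesis voice without embedding <voice> tags in the card's SSML. A minimal sketch, assuming the same token and ponyfill factory as earlier:

```javascript
// Pick the synthesis voice via Web Chat's selectVoice prop instead of SSML.
window.WebChat.renderWebChat({
  directLine: window.WebChat.createDirectLine({ token }),
  webSpeechPonyfillFactory,
  selectVoice: (voices, activity) =>
    voices.find(({ name }) => /GuyNeural/iu.test(name)) || voices[0]
}, document.getElementById('webchat'));
```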