Azure-docs: (REST Speech-to-text) How do you differentiate pronScore, Accuracy Score, and Fluency Score?

Created on 8 Jul 2020 · 7 comments · Source: MicrosoftDocs/azure-docs

The docs are unclear: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-speech-to-text#response-parameters

Fluency is a part of accuracy, and accuracy is a part of fluency. I don't understand what the difference is in the calculation/production of these two scores. The explanations are single lines that essentially say "x is x".

Also the pronScore is based on these two scores and "weighted" - weighted how? weighted towards what?

Forgive me if I've posted this issue in the wrong place!



All 7 comments

@crevulus
Thanks for the feedback! We are currently investigating and will update you shortly.

@crevulus Thanks again for the feedback.
We will improve the document to make it more meaningful and to help customers understand the API more easily.
Let me answer your questions here.

About the difference between accuracy and fluency:
The accuracy score indicates how closely the pronounced phonemes match native pronunciation.
We calculate it at the phoneme level first; the word-level and full-text-level accuracy scores are aggregated from the phoneme-level accuracy scores.
The fluency score indicates how closely the given speech matches native speaking naturalness, such as breaks and silence duration. It looks at the inter-word parts, which is what makes it different from the accuracy score.
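The phoneme-to-word-to-text aggregation described above can be sketched as follows. The answer does not say which aggregation function the service uses, so the plain mean here is an assumption, and the phoneme scores in the example are hypothetical, not real service output.

```python
from statistics import mean

# Each entry is (word, phoneme-level accuracy scores on a 0-100 scale).
# A list of tuples is used rather than a dict so repeated words are kept.
WordPhonemes = list[tuple[str, list[float]]]


def word_accuracy(phoneme_scores: list[float]) -> float:
    """Word-level score from its phonemes (ASSUMPTION: plain mean)."""
    return mean(phoneme_scores)


def text_accuracy(words: WordPhonemes) -> float:
    """Full-text score over every phoneme in the utterance (ASSUMPTION: plain mean)."""
    all_phonemes = [score for _, scores in words for score in scores]
    return mean(all_phonemes)


if __name__ == "__main__":
    # Hypothetical phoneme scores for "I like it".
    utterance = [("I", [90.0]), ("like", [80.0, 70.0, 90.0]), ("it", [100.0, 80.0])]
    for word, scores in utterance:
        print(word, word_accuracy(scores))
    print("text", text_accuracy(utterance))
```

Averaging over all phonemes gives 85.0 for this example, while averaging the three word scores (90, 80, 90) would give about 86.7, so the level the aggregation runs over matters; the thread does not say which one the service uses.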

The completeness score is calculated as the ratio of words that are not mispronounced to the words in the reference text input.
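As a rough illustration of that ratio, the sketch below takes a per-word "mispronounced" flag for each reference word and returns the share of unflagged words. The flag list is a hypothetical input (the service derives this from its own alignment), and the 0-100 scale is assumed to match the other scores.

```python
def completeness_score(reference_words: list[str], mispronounced: list[bool]) -> float:
    """Share of reference words not flagged as mispronounced, on a 0-100 scale.

    mispronounced[i] is a HYPOTHETICAL per-word flag for reference_words[i];
    the real service derives it from its own alignment of audio to text.
    """
    if len(reference_words) != len(mispronounced):
        raise ValueError("need exactly one flag per reference word")
    if not reference_words:
        return 0.0
    correct = sum(1 for flag in mispronounced if not flag)
    return 100.0 * correct / len(reference_words)


if __name__ == "__main__":
    words = ["I", "like", "it", "a", "lot"]
    flags = [False, True, False, False, False]  # only "like" is flagged
    print(completeness_score(words, flags))  # 80.0
```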

pronScore is the overall score, aggregated from the accuracy, fluency and completeness scores. It is calculated as accuracyScore * X% + fluencyScore * Y% + completenessScore * Z%. The weights may be adjusted over time, so we don't share them here. You can also calculate the overall score with your own custom weights.
In the future we will introduce more dimensions, such as a prosody score, and aggregate them into pronScore.
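Since the answer suggests computing the overall score with your own weights, here is a minimal sketch of that weighted sum. The default 50/30/20 split is a placeholder chosen for illustration only; the service's actual X/Y/Z values are not published in this thread.

```python
def pron_score(accuracy: float, fluency: float, completeness: float,
               weights_pct: tuple[float, float, float] = (50, 30, 20)) -> float:
    """accuracy * X% + fluency * Y% + completeness * Z%.

    The default (50, 30, 20) split is a PLACEHOLDER for illustration; the
    service's real X/Y/Z weights are not published.
    """
    x, y, z = weights_pct
    total = x + y + z
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the weight total keeps the result on the inputs' 0-100
    # scale even if the percentages don't add up to exactly 100.
    return (accuracy * x + fluency * y + completeness * z) / total


if __name__ == "__main__":
    print(pron_score(80, 90, 100))                # 87.0 with placeholder weights
    print(pron_score(80, 90, 100, (70, 20, 10)))  # 84.0 with accuracy-heavy weights
```

Normalizing by the weight total is a design choice of this sketch, not something the thread specifies; it just keeps a custom score comparable to the 0-100 inputs.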

Please let me know if you have further questions.

Thanks for the feedback. It was very thorough.

Prosody score would be very useful for my purposes! Please keep me updated, and you can close this ticket if you wish.

@crevulus
We will now proceed to close this thread. If there are further questions regarding this matter, please respond here and tag @YutongTie-MSFT, and we will gladly continue the discussion.

Revised public-facing content will appear at this address within 24 hours:
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-speech-to-text

sign-off

@yinhew, thanks for the detailed explanation. I wonder if you have time to help me better understand the fluency score:

Like you mentioned:

The fluency score indicates the speech fluency of the given speech towards native speaking naturalness such as break, silence duration. It cares inter-word part. This is different from accuracy score.

Is there a way for you to share more details of how exactly the fluency score is calculated? For example, I have the following alignment result for a recording:

silence 0-0.1s
I       0.1-0.3s
<break> 0.3-0.6s
like    0.7-1.7s # the speaker pronounced the word 'like' longer than most of the native speakers.
it      1.7-2.0s
silence 2.0-2.3s

What is the fluency score for this case? And how is it calculated?
P.S. A rough idea of the calculation procedure is fine with me; no specific parameters are needed.

Thanks.
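The thread never explains how fluency is actually computed, so the sketch below only turns the alignment from the question into the kind of inter-word timing measurements the earlier answer mentions (breaks, silence duration). The feature names, treating `silence` and `<break>` as non-speech, and ignoring leading/trailing silence are all assumptions; how (or whether) the service combines such features into a fluency score is unknown, so the sketch deliberately stops short of producing a score.

```python
# (label, start_s, end_s), copied from the alignment in the question.
Segment = tuple[str, float, float]

ALIGNMENT: list[Segment] = [
    ("silence", 0.0, 0.1),
    ("I", 0.1, 0.3),
    ("<break>", 0.3, 0.6),
    ("like", 0.7, 1.7),
    ("it", 1.7, 2.0),
    ("silence", 2.0, 2.3),
]

# Labels treated as non-speech (an assumption about the aligner's output).
NON_SPEECH = {"silence", "<break>"}


def pause_features(segments: list[Segment]) -> dict[str, object]:
    """HYPOTHETICAL inter-word timing features; not the service's formula."""
    words = [seg for seg in segments if seg[0] not in NON_SPEECH]
    if len(words) < 2:
        raise ValueError("need at least two words to measure inter-word pauses")
    # Time between the end of one word and the start of the next. This counts
    # labelled breaks and unlabelled gaps (e.g. 0.6-0.7 s) alike, and ignores
    # leading/trailing silence.
    gaps = [round(nxt[1] - cur[2], 3) for cur, nxt in zip(words, words[1:])]
    span = words[-1][2] - words[0][1]  # first word start to last word end
    return {
        "inter_word_gaps": gaps,
        "total_pause": round(sum(gaps), 3),
        "speech_time": round(sum(end - start for _, start, end in words), 3),
        "pause_ratio": round(sum(gaps) / span, 3),
        "words_per_second": round(len(words) / span, 3),
        "word_durations": [(label, round(end - start, 3)) for label, start, end in words],
    }


if __name__ == "__main__":
    for name, value in pause_features(ALIGNMENT).items():
        print(name, value)
```

For this recording the only inter-word pause is the 0.4 s between "I" and "like" (the 0.3 s labelled break plus an unlabelled 0.1 s gap), while the elongated "like" shows up as a 1.0 s entry in word_durations rather than in the pause features.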

@weiwchu @YutongTie-MSFT I'm also very interested in your reply to @weiwchu's query. It would be useful to know for our product.
