Make devices speak arbitrary text
    public static class TextToSpeech
    {
        public static int MaxSpeechInputLength { get; }
        public static Task SpeakAsync(string text, CancellationToken cancelToken = default(CancellationToken));
        public static Task SpeakAsync(string text, SpeakSettings settings, CancellationToken cancelToken = default(CancellationToken));
        public static Task<IEnumerable<Locale>> GetLocalesAsync();
    }

    public struct SpeakSettings
    {
        public Locale Locale;
        public float? Pitch;
        public float? SpeakRate;
        public float? Volume;
    }

    public class Locale
    {
        public string Language { get; }
        public string Country { get; }
        public string Name { get; }
    }
The table above tries to show what each platform does. On Android, "1.0" is the normal value and other values act as multipliers of it. iOS has specific ranges and defaults. UWP follows SSML and appears to support an "enum"-based method as well as a flat percentage-based method. We could use the "enum" values and map them to each platform's restrictions. Or we could go with percentage-based values and clip them when they exceed the range for a particular platform.
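As a rough illustration of the percentage-based option, the sketch below scales a platform's "normal" value by a percentage and clips the result to the platform's range. The `SpeechValueMapper` class, the `ToPlatformValue` method, and the numbers used in the usage note are all hypothetical; real ranges would have to come from each platform's documentation.

```csharp
using System;

// Hypothetical helper, not part of the proposed API.
public static class SpeechValueMapper
{
    // Interprets 'percent' as a multiple of the platform's normal value
    // (100 means "normal") and clips the result to [min, max].
    public static float ToPlatformValue(float percent, float normal, float min, float max)
    {
        float scaled = normal * (percent / 100f);
        return Math.Max(min, Math.Min(max, scaled));
    }
}
```

With an assumed range of 0.5 to 2.0 and a normal value of 1.0, a request for 150 would yield 1.5, while 400 would be clipped to 2.0 and 10 would be raised to 0.5.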

Any reason for creating a new class (Locale) instead of using CultureInfo?
This is a great point, @alfredmyers. I suspect it came from the old plugin, where it may have been done for reasons that predate netstandard.
We should change this.
I think the difference is that the codes each engine uses for culture info may not match the .NET codes. However, we would have to validate this.
If that's the case it's plausible we could write a mapping. That could be a useful API regardless of text to speech.
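A first step toward such a mapping could look like the sketch below. It only normalises the separator and asks .NET whether it recognises the result; `LocaleMapper` and `TryGetCulture` are hypothetical names, and a real mapping would probably need a per-engine lookup table as well. Note that some runtimes may accept any well-formed culture name, so a successful result does not prove that the culture is a known one.

```csharp
using System.Globalization;

// Hypothetical helper, not part of the proposed API.
public static class LocaleMapper
{
    // Tries to turn an engine-specific code such as "pt_BR" into a CultureInfo.
    public static bool TryGetCulture(string engineCode, out CultureInfo culture)
    {
        culture = null;
        if (string.IsNullOrWhiteSpace(engineCode))
            return false;

        // Assumption: the only difference is '_' versus '-' as the separator.
        string candidate = engineCode.Trim().Replace('_', '-');
        try
        {
            culture = CultureInfo.GetCultureInfo(candidate);
            return true;
        }
        catch (CultureNotFoundException)
        {
            return false;
        }
    }
}
```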
As an example of what @jamesmontemagno said, even within a single OS, different TTS engines can return locales in different formats.
For instance, on Android Brazilian Portuguese is returned as:
This is especially important if you are going to check whether a language is supported by querying the proposed GetLocalesAsync method.
From what I could grasp going through the different TextToSpeech implementations in @jamesmontemagno's TextToSpeechPlugin, the only one that needs a MaxSpeechInputLength property is the implementation for Android.
If that is the case, and we could solve the issue by splitting the text inside the Android implementation, would exposing MaxSpeechInputLength on the API still be necessary?
Yes I love this idea!
While writing the code, we realised that splitting the text internally may be a very hard task. English is easy: words are separated by spaces and sentences by punctuation. But many non-English languages are quite different.
Android appears to be the only one with the limit, but that doesn't mean we can't still create a cross-platform approach.
If we are to add a property to return the limit, we can have a rule that if there is _no_ limit, we return -1 and if there _is_ a limit, return that.
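The rule could be consumed roughly as in the sketch below. `SpeechLimits`, `NoLimit`, and `NeedsSplitting` are hypothetical names used only to illustrate the -1 convention.

```csharp
// Hypothetical helper, not part of the proposed API.
public static class SpeechLimits
{
    // Convention from the discussion: -1 means the platform has no limit.
    public const int NoLimit = -1;

    // True when the text is longer than the platform allows in one call.
    public static bool NeedsSplitting(string text, int maxSpeechInputLength)
    {
        return maxSpeechInputLength != NoLimit && text.Length > maxSpeechInputLength;
    }
}
```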
The Android source appears to have a limit of 4K:
https://github.com/aosp-mirror/platform_frameworks_base/blob/b056324630b8adfeb38393bcab49f3b9c720f4fd/core/java/android/speech/tts/TextToSpeech.java#L2364-L2366
In addition, other speech engines may have other limits as seen here: https://stackoverflow.com/questions/19312536/android-tts-fails-to-speak-large-amount-of-text
@mattleibow
- there may be languages that don't use punctuation
- there may be languages that are written right to left (RTL)
- there may be languages that have multi-byte characters
I have a prototype of a method that splits the string on the nearest punctuation mark or white space just before MaxSpeechInputLength.
I really don't have experience with RTL languages, but I still have a copy of Developing International Software lying around. Best case, if it is only a matter of iterating over the string in reverse order, I can take a look into it.
@alfredmyers I did not have time to finish and test the whole SplitText method, but I ended up iterating from the end of the buffer in reverse order, searching for Char.IsPunctuation and Char.IsWhiteSpace. The process is then repeated from that position.
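For reference, the approach described above might be sketched as follows. This is not the prototype mentioned in the thread; `SpeechTextSplitter` and its behaviour for edge cases (a limit below 1 is treated as "no limit", and text without any break character is cut at the limit) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not the prototype from the thread.
public static class SpeechTextSplitter
{
    public static List<string> SplitText(string text, int maxLength)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        var parts = new List<string>();

        // Assumption: a limit below 1 means "no limit".
        if (maxLength < 1 || text.Length <= maxLength)
        {
            if (text.Length > 0) parts.Add(text);
            return parts;
        }

        int start = 0;
        while (text.Length - start > maxLength)
        {
            int end = start + maxLength; // exclusive upper bound of this chunk
            int cut = end;               // fall back to a hard cut at the limit

            // Walk backwards from the end of the buffer to the nearest break.
            for (int i = end - 1; i > start; i--)
            {
                if (char.IsPunctuation(text[i]) || char.IsWhiteSpace(text[i]))
                {
                    cut = i + 1; // keep the break character with this chunk
                    break;
                }
            }

            // On a hard cut, avoid separating a surrogate pair when possible.
            if (cut == end && cut - start > 1 && char.IsHighSurrogate(text[cut - 1]))
                cut--;

            parts.Add(text.Substring(start, cut - start));
            start = cut;
        }

        if (start < text.Length) parts.Add(text.Substring(start));
        return parts;
    }
}
```

Each chunk keeps its trailing break character, so concatenating the chunks gives back the original text. The sketch does nothing language-specific, so text in languages without spaces or punctuation would simply be cut at the limit.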
> I really don't have experience with RTL languages,
My experience in that area is not huge, but I asked some people who did some TTS work in Persian. I have received an answer, but I am waiting for more info.
Hi guys, I'm a Persian speaker. @moljac told me that we have a problem with splitting RTL text.
I don't have experience with TTS, but I can help with issues.
To start with, I'd like to point to a Persian TTS project called ParsKhan. It is an open source project developed by Iranian programmers that solved many RTL problems.
ParsKhan's code quality is not great (I have refactored it somewhat), but we don't need the project's code, just its algorithms and scenarios.
ParsKhan has documentation, but it is in Persian (I can translate it if necessary).
ParsKhan Repo is here