Anki-android: Voice control

Created on 28 Jul 2015 · 31 Comments · Source: ankidroid/Anki-Android

Originally reported on Google Code with ID 815

Hello,

I’m not planning to do something like this, but wouldn’t this be nice?

If I am driving a car and pair my phone with the hands-free equipment, it would be
nice if there were a voice-controlled Ankidroid. For the cards, it is no problem (just
make voice files for every vocabulary), but the selection would be difficult: One cannot
press “easy” without touching the screen.

So it would be nice if one could say “easy”, “again”, “difficult” etc., and Ankidroid
would understand it and press the appropriate button. This would also be nice in other
situations where one cannot use one's hands (while making food, for example, or when
lying in a warm bed and not wanting to put one's hands outside, etc.).

Reported by gerritsangel on 2011-10-09 20:53:38

Accepted Enhancement Help Wanted Keep Open Priority-Medium

All 31 comments

Voice control would be nice indeed!
Thanks for the idea :-)

Note: I would never review flashcards while driving...
Driving requires full attention, and reviewing flashcards is quite an intensive activity.

Reported by nicolas.raoul on 2011-10-09 23:16:24

  • Status changed: Accepted
  • Labels added: Type-Enhancement
  • Labels removed: Type-Defect

Well, I don’t know how it is with driving - I also guess it would be quite dangerous.
But I guess if it is just repetition of cards you know pretty well, it shouldn’t be
that bad - usually one also talks when driving etc.

But other activities where you cannot use your hands (or where it is a nuisance) are
quite suited for that - cooking when you have your hands dirty, or jogging, bicycle,
etc.

I guess the main problem would then be if you have to create voice files for every
card, or if one could use a screen reader (isn’t there something built into Android?)...


Reported by gerritsangel on 2011-10-11 19:12:51

Indeed that would be useful when cooking or strength training!

Text-to-speech is available, and pure-audio decks are also a possibility.

The only difficult part would be voice recognition for "easy" "difficult" etc, I guess
that would require the user to say those words a few times.

Everyone:
Is there any speech recognition library available for Android? (preferably open source)

Reported by nicolas.raoul on 2011-10-12 01:43:36

An open source speech recognition library that works on Android: http://cmusphinx.sourceforge.net/
See http://stackoverflow.com/questions/4396046

Reported by nicolas.raoul on 2011-10-30 09:44:47

Just wondering - are there keyboard shortcuts for Easy/Difficult (e.g. "E" and
"D")? Having them might make the recognition much easier (it would then need only
a very small grammar).

Reported by stivbennett on 2013-03-26 08:59:05
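
The "very small grammar" idea can be made concrete. The following is only a sketch in plain Java, assuming a recognizer that accepts grammars in JSGF (CMU Sphinx does); the class name `CommandGrammar`, the grammar name, and the four command words are hypothetical and do not come from AnkiDroid.

```java
import java.util.Arrays;
import java.util.List;

/** Builds a tiny JSGF grammar from a closed list of spoken commands (illustrative only). */
final class CommandGrammar {

    /**
     * Returns a JSGF grammar whose single public rule accepts exactly one of the given words.
     * The rule name "command" is a hypothetical choice.
     */
    static String toJsgf(String grammarName, List<String> words) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("at least one command word is required");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar ").append(grammarName).append(";\n");
        // Alternatives are separated by '|'; the rule ends with ';'.
        sb.append("public <command> = ").append(String.join(" | ", words)).append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed vocabulary: one word per answer button.
        System.out.print(toJsgf("answers", Arrays.asList("again", "hard", "good", "easy")));
    }
}
```

With only a handful of alternatives, the recognizer never has to choose among thousands of dictionary words, which is the reason a closed command set tends to be easier than free dictation.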

This is something that I would be interested in using, not while driving but while exercising
or doing other things with my hands.

Couldn't this just be done with the built-in android speech recognition api? It would
be useful even if it took a little while for the network traffic.

Reported by wrsaunde on 2013-08-31 02:13:38
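
As far as I know, Android's built-in recognizer hands back a list of candidate transcriptions rather than a single string. A minimal sketch of what could be done with such a list, in plain Java with no Android classes: scan the candidates and take the first one that is exactly a known command. `CandidateMatcher`, the vocabulary, and the 1 to 4 ease numbering are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Picks an answer button from a list of recognizer candidates (illustrative only). */
final class CandidateMatcher {

    // Assumed vocabulary: spoken word -> ease button (1 = again ... 4 = easy).
    private static final Map<String, Integer> EASE_BY_WORD = new HashMap<>();
    static {
        EASE_BY_WORD.put("again", 1);
        EASE_BY_WORD.put("hard", 2);
        EASE_BY_WORD.put("good", 3);
        EASE_BY_WORD.put("easy", 4);
    }

    /** Returns the ease of the first candidate that is a known command, or -1 if none matches. */
    static int firstEase(List<String> candidates) {
        for (String candidate : candidates) {
            // Normalize case and surrounding whitespace before the lookup.
            Integer ease = EASE_BY_WORD.get(candidate.trim().toLowerCase(Locale.ROOT));
            if (ease != null) {
                return ease;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstEase(Arrays.asList("a gain", "Again")));
    }
}
```

Requiring an exact match after normalization means stray speech is ignored instead of being mapped to the nearest button.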

I would find this very useful. I understand that there is quite a well developed Voice
Command system on Android for controlling quite a few applications and writing messages.
There are also other apps that act as personal assistants. Would it be possible for
these to be connected in to Ankidroid?

Reported by [email protected] on 2014-02-19 19:07:26

I'd love to have some kind of voice control in AnkiDroid. Doing reviews when you simply
cannot hold your phone in your hands would be extremely helpful.
Lying in a bathtub, walking/running, exercising etc...
Even simple control with regular/bluetooth headphones would help so much. At least
"fail" and "good" buttons as volume up/down or skip forward/backward.
Please have a look into this :)

Reported by glwisnia on 2014-11-15 12:01:10
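
The headphone-button idea needs no speech recognition at all. A rough sketch in plain Java follows; the integer values are copied from the `android.view.KeyEvent` constants of the same names, and the class `HeadsetKeyMapper`, the `Answer` enum, and the choice of which button means "fail" or "good" are all assumptions. In a real Activity one would switch on `KeyEvent.KEYCODE_*` directly.

```java
/** Maps headset and volume key codes to review answers (illustrative only). */
final class HeadsetKeyMapper {

    // Values mirror android.view.KeyEvent; redefined here so the sketch runs without Android.
    static final int KEYCODE_VOLUME_UP = 24;
    static final int KEYCODE_VOLUME_DOWN = 25;
    static final int KEYCODE_MEDIA_NEXT = 87;
    static final int KEYCODE_MEDIA_PREVIOUS = 88;

    enum Answer { FAIL, GOOD, NONE }

    /** Assumed mapping: "down"/"previous" fails the card, "up"/"next" answers good. */
    static Answer answerFor(int keyCode) {
        switch (keyCode) {
            case KEYCODE_VOLUME_DOWN:
            case KEYCODE_MEDIA_PREVIOUS:
                return Answer.FAIL;
            case KEYCODE_VOLUME_UP:
            case KEYCODE_MEDIA_NEXT:
                return Answer.GOOD;
            default:
                // Any other key is left to the normal key handling.
                return Answer.NONE;
        }
    }

    public static void main(String[] args) {
        System.out.println(answerFor(KEYCODE_MEDIA_NEXT));
    }
}
```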

+1. So many situations where voice control would be useful.

Reported by antony.gelberg on 2014-11-29 09:55:44

Looks like this thread is quite old ... did anyone find a way to use voice just for
learning cards? All that would be needed seems to be some keywords for easy, good ...
and maybe mark, discard ...

Reported by [email protected] on 2015-01-21 22:05:46

Voice control would absolutely rock!

Reported by dmt.lsv on 2015-01-22 10:37:43

Apparently there is a handheld version of CMU Sphinx called PocketSphinx, usable on
Android.
http://www.speech.cs.cmu.edu/pocketsphinx/
https://github.com/cmusphinx/pocketsphinx-android
https://github.com/cmusphinx/pocketsphinx-android-demo
https://softwarerecs.stackexchange.com/questions/13797/fast-voice-command-library-on-android-open-source-works-offline

Reported by nicolas.raoul on 2015-04-10 07:23:05

I started looking at this a few weeks ago, and just adapted the example app they have
to do continuous recognition of a few keywords (one, two, three, four for levels
of difficulty, and "next" to flip the card), but couldn't get the accuracy good enough
to be usable for my needs unless I held the microphone of the phone at a specific angle
to my voice. If anyone knows more about tuning to get better accuracy, it doesn't
seem like it would be that hard to integrate; I just gave up because I wasn't getting
good enough results.

Reported by agjohnst on 2015-04-10 15:06:12

I take that back - the poor recognition accuracy was due to my old phone being broken.
On a new phone this works great - I have it integrated into AnkiDroid, and AbstractFlashcardViewer
now takes basic voice commands. The code is ugly, though (commands are hardcoded
and have to match the text in a certain assets file). I'll submit a pull request once
I've cleaned up the code / repo.

Reported by agjohnst on 2015-04-13 22:22:55

Really exciting! I don't find time to use Anki with my current schedule, but I do spend
a lot of time in traffic jams... :)

Reported by antony.gelberg on 2015-04-13 22:41:43

Issue 403 has been merged into this issue.

Reported by perceptualchaos2 on 2015-06-01 03:29:56

Issue 480 has been merged into this issue.

Reported by perceptualchaos2 on 2015-06-01 04:47:21

Hi!

Did this go anywhere, was it pulled? Is there a way to get the code? I'm very interested in this functionality! But I ain't skilled enough to find the code :( (Skilled enough to compile it, tho.)

I created a module for Anki proper that implements voice control:
https://ankiweb.net/shared/info/1646263898
Hopefully, this might be useful to the mobile team.

Sorry, I completely forgot about this for a while. I never cleaned it up and haven't updated it for a current git version, but for now here is a patch off commit 43a8ad459f5223b855ea0ce1760c5f6592613c25. If I get a chance I'll update it, but I don't know enough about localization in Anki right now to make this flexible. Also, it looks like I had to do the following:

1) copy the pocketsphinx models dir into AnkiDroid/src/main/assets/sync (create it if it doesn't exist), and add AnkiDroid/src/main/assets/sync/assets.lst with contents:

models/dict/cmu07a.dic
models/grammar/digits.gram
models/grammar/menu.gram
models/grammar/menu.gram.back
models/hmm/en-us-semi/README
models/hmm/en-us-semi/feat.params
models/hmm/en-us-semi/mdef
models/hmm/en-us-semi/means
models/hmm/en-us-semi/noisedict
models/hmm/en-us-semi/sendump
models/hmm/en-us-semi/transition_matrices
models/hmm/en-us-semi/variances

2) edit AnkiDroid/src/main/assets/sync/models/grammar/menu.gram to contain:

okay
one
two
three
four
yes
no

3) copy pocketsphinx-android-0.8-nolib.jar into AnkiDroid/libs
4) copy libpocketsphinx_jni.so into the correct arch directory in either AnkiDroid/libs or AnkiDroid/jniLibs (I'm not sure which it was specifically because I copied to both).

5) apply patch:

diff --git a/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java b/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java
index f8704c9..0434341 100644
--- a/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java
+++ b/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java
@@ -19,6 +19,8 @@

 package com.ichi2.anki;

+import static edu.cmu.pocketsphinx.SpeechRecognizerSetup.defaultSetup;
+
 import android.annotation.SuppressLint;
 import android.app.Activity;
 import android.content.BroadcastReceiver;
@@ -106,6 +108,11 @@ import java.util.Set;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

+import edu.cmu.pocketsphinx.Assets;
+import edu.cmu.pocketsphinx.Hypothesis;
+import edu.cmu.pocketsphinx.RecognitionListener;
+import edu.cmu.pocketsphinx.SpeechRecognizer;
+
 import timber.log.Timber;

 public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
@@ -197,6 +204,8 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
     private boolean mPrefFixArabic;
     // Android WebView
     private boolean mSpeakText;
+       private boolean mUseVoiceCommands;
+       private float mVoiceThresh;
     protected boolean mDisableClipboard = false;
     protected boolean mInvertedColors = false;
     protected boolean mNightMode = false;
@@ -423,6 +432,107 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
         }
     };

+       class MyRecognitionListener implements RecognitionListener 
+       {
+       private SpeechRecognizer recognizer = null;
+       private static final String KWS_SEARCH = "keywords";
+       public static final String KW_FAIL = "no";
+       public static final String KW_OK = "yes";
+       public static final String KW_FAIL2 = "one";
+       public static final String KW_HARD = "two";
+       public static final String KW_MID = "three";
+       public static final String KW_EASY = "four";
+       public static final String KW_NEXT = "okay";
+
+               @Override
+               public void onBeginningOfSpeech() {
+               }
+
+               @Override
+               public void onEndOfSpeech() {
+               }
+
+       
+               public void stop() 
+               {
+                       if (recognizer != null)
+                       {
+                               recognizer.stop();
+                               recognizer = null;
+                       }
+               }
+
+               public void toggle()
+               {
+                       if (recognizer != null)
+                       {
+                               stop();
+                               Themes.showThemedToast(AbstractFlashcardViewer.this, "Disabled voice recognizer", true);
+                       }
+                       else
+                       {
+                               init();
+                       }
+               }
+
+               public void init()
+               {
+                       try {
+                               Assets assets = new Assets(AbstractFlashcardViewer.this);
+                               File assetDir = assets.syncAssets();
+                               setupRecognizer(assetDir);
+                               initSearch();
+                               Themes.showThemedToast(AbstractFlashcardViewer.this, "Started voice recognizer: " + Double.toString(mVoiceThresh), true);
+                       } catch (IOException e) {
+                       Themes.showThemedToast(AbstractFlashcardViewer.this, "Failed to init voice recognizer", true);
+                       }
+               }
+               private void setupRecognizer(File assetsDir) {
+                       File modelsDir = new File(assetsDir, "models");
+                       recognizer = defaultSetup()
+                                       .setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
+                                       .setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
+                                       .setRawLogDir(assetsDir).setKeywordThreshold(mVoiceThresh) 
+                                       .getRecognizer();
+                       recognizer.addListener(this);
+
+                       File kwlist = new File(modelsDir, "grammar/menu.gram");
+                       recognizer.addKeywordSearch(KWS_SEARCH, kwlist);
+               }
+               private void initSearch() {
+                       recognizer.stop();
+                       recognizer.startListening(KWS_SEARCH); 
+               }
+               @Override
+               public void onPartialResult(Hypothesis hypothesis) {
+                       if (hypothesis != null)
+                       {
+                               String text = hypothesis.getHypstr();
+                               if (text != null)
+                                       trySRCommand(text);                     
+                               initSearch();
+                       }
+               }
+
+               @Override
+               public void onResult(Hypothesis hypothesis) { }
+
+               private void trySRCommand(String result)
+               {
+            Themes.showThemedToast(AbstractFlashcardViewer.this, result, true);
+                       if (result.equals(KW_FAIL) || result.equals(KW_FAIL2))
+                       { 
+                               executeCommand(GESTURE_ANSWER_EASE1); 
+                       }
+                       if (result.equals(KW_OK) || result.equals(KW_HARD)) { executeCommand(GESTURE_ANSWER_EASE2); }
+                       if (result.equals(KW_MID)) { executeCommand(GESTURE_ANSWER_EASE3); }
+                       if (result.equals(KW_EASY)) { executeCommand(GESTURE_ANSWER_EASE4); }
+                       if (result.equals(KW_NEXT)) { executeCommand(GESTURE_SHOW_ANSWER); }
+               }
+
+       };
+       private MyRecognitionListener mKeywordRecognizer = new MyRecognitionListener();
+
     private View.OnTouchListener mGestureListener = new View.OnTouchListener() {
         @Override
         public boolean onTouch(View v, MotionEvent event) {
@@ -973,6 +1083,7 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {

         stopTimer();
         Sound.stopSounds();
+               mKeywordRecognizer.stop();
     }


@@ -985,6 +1096,8 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
         // Reset the activity title
         setTitle();
         updateScreenCounts();
+               if (mUseVoiceCommands)
+                       mKeywordRecognizer.init();
     }


@@ -1708,6 +1821,11 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {
         mInputWorkaround = preferences.getBoolean("inputWorkaround", false);
         mPrefFixArabic = preferences.getBoolean("fixArabicText", false);
         mSpeakText = preferences.getBoolean("tts", false);
+        mUseVoiceCommands = preferences.getBoolean("voice", false);
+               mVoiceThresh = (float)Math.pow(10.0, preferences.getInt("voiceThresh", 2000)/10000.0 * 50.0 - 40.0);
+
+       
+
         mPrefSafeDisplay = preferences.getBoolean("safeDisplay", false);
         mPrefUseTimer = preferences.getBoolean("timeoutAnswer", false);
         mWaitAnswerSecond = preferences.getInt("timeoutAnswerSeconds", 20);
@@ -2733,6 +2851,7 @@ public abstract class AbstractFlashcardViewer extends NavigationDrawerActivity {

         @Override
         public boolean onDoubleTap(MotionEvent e) {
+                       mKeywordRecognizer.toggle();
             if (mGesturesEnabled) {
                 executeCommand(mGestureDoubleTap);
             }
diff --git a/AnkiDroid/src/main/java/com/ichi2/anki/Preferences.java b/AnkiDroid/src/main/java/com/ichi2/anki/Preferences.java
index 9f3e2b6..6c3b944 100644
--- a/AnkiDroid/src/main/java/com/ichi2/anki/Preferences.java
+++ b/AnkiDroid/src/main/java/com/ichi2/anki/Preferences.java
@@ -107,7 +107,7 @@ public class Preferences extends PreferenceActivity implements OnSharedPreferenc
     private static String[] sListNumericCheck = {"minimumCardsDueForNotification"};
     private static String[] sShowValueInSummSeek = { "relativeDisplayFontSize", "relativeCardBrowserFontSize",
             "relativeImageSize", "answerButtonSize", "whiteBoardStrokeWidth", "swipeSensitivity",
-            "timeoutAnswerSeconds", "timeoutQuestionSeconds", "backupMax", "dayOffset" };
+            "timeoutAnswerSeconds", "timeoutQuestionSeconds", "backupMax", "dayOffset", "voiceThresh" };
     private static String[] sShowValueInSummEditText = { "deckPath" };
     private static String[] sShowValueInSummNumRange = { "timeLimit", "learnCutoff" };
     private TreeMap<String, String> mListsToUpdate = new TreeMap<>();
diff --git a/AnkiDroid/src/main/res/values/10-preferences.xml b/AnkiDroid/src/main/res/values/10-preferences.xml
index d5619d1..60c7c4d 100644
--- a/AnkiDroid/src/main/res/values/10-preferences.xml
+++ b/AnkiDroid/src/main/res/values/10-preferences.xml
@@ -87,6 +87,10 @@
     <string name="swipe_sensitivity_summ">XXX</string>
     <string name="tts">Text to speech</string>
     <string name="tts_summ">Reads out question and answer if no sound file is included</string>
+    <string name="voice">Voice commands</string>
+    <string name="voice_summ">Use voice commands</string>
+    <string name="voice_thresh">Voice command recognition threshold</string>
+    <string name="voice_thresh_summ">Threshold value to use for voice commands</string>
     <string name="sync_fetch_missing_media">Fetch media on sync</string>
     <string name="sync_fetch_missing_media_summ">Automatically fetch missing media when syncing.</string>
     <string name="sync_account">AnkiWeb account</string>
@@ -220,4 +224,4 @@
     <string name="deck_conf_cram_reschedule_summ">Reschedule cards based on my answers in this deck</string>
     <string name="deck_conf_cram_steps">Custom steps</string>
     <string name="deck_conf_cram_steps_summ">Define custom steps</string>
-</resources>
\ No newline at end of file
+</resources>
diff --git a/AnkiDroid/src/main/res/xml/preferences.xml b/AnkiDroid/src/main/res/xml/preferences.xml
index 9d17637..e838b29 100644
--- a/AnkiDroid/src/main/res/xml/preferences.xml
+++ b/AnkiDroid/src/main/res/xml/preferences.xml
@@ -397,6 +397,22 @@
                 android:key="tts"
                 android:summary="@string/tts_summ"
                 android:title="@string/tts" />
+                       <CheckBoxPreference             
+                               android:defaultValue="false"
+                               android:key="voice"
+                               android:summary="@string/voice_summ"
+                               android:title="@string/voice"/>
+                       <com.hlidskialf.android.preference.SeekBarPreference
+                               android:defaultValue="2000"
+                               android:dependency="voice"
+                               android:dialogMessage="@string/voice_thresh_summ"
+                               android:key="voiceThresh"
+                               android:max="9999"
+                               android:summary="@string/voice_thresh_summ"
+                               android:text=""
+                               android:title="@string/voice_thresh"
+                               app:interval="1"
+                               app:min="0" />
             <ListPreference
                 android:defaultValue="0"
                 android:entries="@array/dictionary_labels"
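
For anyone tuning the "voiceThresh" slider from the patch above, the arithmetic is easier to see in isolation. The sketch below repeats the patch's expression in a standalone class; only the formula comes from the patch, while `VoiceThreshold` and its method names are made up here.

```java
/** Standalone copy of the slider-to-threshold formula used in the patch (illustrative only). */
final class VoiceThreshold {

    /** Power of ten for a slider value in 0..9999: value / 10000 * 50 - 40. */
    static double exponent(int sliderValue) {
        return sliderValue / 10000.0 * 50.0 - 40.0;
    }

    /** Keyword threshold as the float that the patch passes to setKeywordThreshold. */
    static float threshold(int sliderValue) {
        return (float) Math.pow(10.0, exponent(sliderValue));
    }

    public static void main(String[] args) {
        // Slider minimum, the default from preferences.xml, and the slider maximum.
        for (int value : new int[] {0, 2000, 9999}) {
            System.out.println(value + " -> 10^" + exponent(value) + " = " + threshold(value));
        }
    }
}
```

So the slider is logarithmic: each 200 steps change the threshold by a factor of ten, the default of 2000 corresponds to 1e-30, and the full range runs from 1e-40 to just under 1e10. Note that 1e-40 is below the smallest normal `float` (about 1.18e-38), so the lowest slider positions are stored as subnormal values with reduced precision.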

Note that this can also be achieved outside of the AnkiDroid app (at the Android level) by installing Google's Voice Access app, which is currently in beta.

This is what it looks like. Note that I need to say the numbers out loud so Voice Access knows what to "click". For example, I say "twelve" for AGAIN, "thirteen" for GOOD, and so on (voice commands).

Here are the installation instructions. Perhaps it can also be achieved in a similar way with utter, but I haven't tried that yet.

I guess it can also be done inside the app, perhaps by using the Google Cloud Speech API and following this speech/grpc example code.

Guys thanks for Anki!!! is awesome!!!

@aegray Any idea what happened here (why your patch wasn't applied)? Do you have a patch against current master?

There are several factors here...

1) A pull request was never submitted, so we never reviewed the code. If the author of the patch doesn't have time to make a pull request, they probably don't have time to see the feature through beta testing and bug fixes to release quality either. For us it just adds more maintenance burden.

2) another user gave a solution to achieve the same result without modifying AnkiDroid.

Given 2) above we would need a very persuasive argument to include the patch even if 1) were to be "solved"

That's why I pinged the patch author.

As for the "persuasive argument", that's a bit surprising. The external app (Voice Access) is clunky at best, and I think most users would see that as an unwieldy workaround. To have this in the app would take the UX up a notch or two.

I agree that the voice access method is clunky. I tried it for a bit but ultimately uninstalled it because enabling and disabling it were annoying and I didn't like having to look at the screen to figure out which number to say. It would be much better if it were built in and I could just say "again", "good", "easy", etc.

Many thanks elgalu for advice that Voice Access can be used to provide hands free ankidroid access. It's even easier to set up now:

  1. Download and install Voice Access from play store
  2. Open it
  3. Start ankidroid
  4. All touchscreen options can be operated by reading the number written next to them

So pleased to have this

Currently Voice Access is not downloadable:
Early access program is currently full, space may open up later.

I got the patch for Linux PC up and running, but is there any working solution for mobile?

You can download Google Voice Access from APKMirror, which seems to be the only safe site to download APKs directly outside the Play Store. Using it with AnkiDroid works like a charm!

Hello 👋, this issue has been opened for more than 2 months with no activity on it. If the issue is still here, please keep in mind that we need community support and help to fix it! Just comment something like _still searching for solutions_ and if you found one, please open a pull request! You have 7 days until this gets closed automatically

Massive number of +1s, let's keep this open.

For testing of voice-control I've built a solution here https://github.com/svenmeier/Anki-Android/tree/voice-control

  • I tried Android's SpeechRecognizer; it works well but continuously plays notification sounds when used for continuous speech recognition :(
  • so I've integrated PocketSphinx instead (as @aegray has done it)
  • I've kept the changes in Anki to a bare minimum only
  • ... and moved the actual configuration outside of Anki, it has to be adjusted by each user to his language and/or speaking habits. You can see an example here https://github.com/svenmeier/Anki-Android/tree/voice-control/tools/voice-control

Maybe someone wants to try this out.
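
To illustrate the idea of keeping the phrase configuration outside the app so each user can adapt it to their own language, here is a loose sketch using `java.util.Properties`. The "phrase = action" file format, the German example words, and the class `PhraseConfig` are assumptions for illustration only and are not taken from the linked branch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/** Reads a user-editable "spoken phrase = action" table (hypothetical format). */
final class PhraseConfig {

    /** Parses the text into a map from lower-cased phrase to action name. */
    static Map<String, String> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> actions = new HashMap<>();
        for (String phrase : props.stringPropertyNames()) {
            // Multi-word phrases would need their spaces escaped in the properties syntax.
            actions.put(phrase.toLowerCase(Locale.ROOT), props.getProperty(phrase).trim());
        }
        return actions;
    }

    public static void main(String[] args) {
        String userFile = "# my commands\nnochmal = again\ngut = good\nleicht = easy\n";
        System.out.println(parse(userFile));
    }
}
```

Because the app only ever sees action names, a user could swap in phrases for any language without touching the reviewer code.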
