I have to admit I am a bit confused about all the speech-related services that keep popping up.


Could anyone provide a short description of what each one is for and how they are supposed to work together?

A standard script that starts up InMoov, listens for a command given on a local/remote system, and talks with mouth control would be very welcome.




7 years 9 months ago

Hi Juerg,

  Hopefully I can help put some more detail on this for you.  Let's break it down into categories.

  1. SpeechRecognition - this is using a microphone to listen to you and understand the words that you are speaking.
  2. SpeechSynthesis - taking a string of text and making sound so you can listen to what was being said.
  3. Audio Playback - the ability to play an audio file like an mp3.
  4. ChatBot - these are services that take spoken text and come up with answers to questions, hopefully something that is understandable by a human.

Ok, so here is what we have in each category.

Speech Recognition

  • Sphinx - from CMU; an older speech recognition engine.  Must know in advance what it's listening for.  Poor accuracy.  Does not need an internet connection.
  • WebkitSpeechRecognizer - based on Google's speech recognition.  Only works with the Chrome web browser; handles many languages.  Very accurate; requires an internet connection.
  • AndroidSpeechRecognition - added by MaVo.  Uses an Android app to hit the REST API in MRL.

Speech Synthesis

  • AcapelaSpeech - uses a web service to generate an mp3 file of the spoken text.  The resulting file is cached on disk locally.  It has many voices and can pronounce things in many languages depending on the voice chosen.  The initial lookup requires an internet connection.
  • MarySpeech - uses algorithms to generate speech locally.  Does not require an internet connection.  Additional voice packs can be added, but that probably needs some work.
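The cache-on-disk behaviour mentioned for AcapelaSpeech can be sketched in a few lines of plain Python. This is not the actual implementation; the filename scheme and `fake_backend` are made up for illustration. The idea is just: hash the (voice, text) pair to a filename, and only contact the web service on a cache miss.

```python
import hashlib
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # stand-in for the local audio cache directory

def cached_speech_file(text, voice, synthesize_remote):
    """Return the path of a cached mp3 for (voice, text), generating it at most once."""
    key = hashlib.md5((voice + "|" + text).encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".mp3")
    if not os.path.exists(path):          # cache miss: needs the internet
        with open(path, "wb") as f:
            f.write(synthesize_remote(text, voice))
    return path                           # cache hit: works offline

# Fake web backend so the sketch runs offline; it counts real "lookups".
calls = []
def fake_backend(text, voice):
    calls.append(text)
    return b"mp3-bytes"

p1 = cached_speech_file("hello", "Ryan", fake_backend)
p2 = cached_speech_file("hello", "Ryan", fake_backend)
assert p1 == p2 and len(calls) == 1       # second request served from disk
```

This is why the same phrase only needs an internet connection the first time it is spoken.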

Audio Playback

  • AudioFile - can play back an mp3 for you.  This is used by AcapelaSpeech to play back the audio.

Chat Bots

  • ProgramAB - a chat bot engine that implements the AIML standard.  You specify categories, each with a pattern to match against the question and a template that defines the response for that pattern.
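To make the pattern/template idea concrete, here is a toy matcher in plain Python. It is only an illustration of the concept; ProgramAB implements the full AIML standard, which this does not.

```python
# Toy sketch of AIML-style categories: (pattern, template) pairs,
# where '*' in a pattern matches any remaining words.

def match(pattern, words):
    """True if the pattern word list matches the input word list."""
    if not pattern:
        return not words
    if pattern[0] == "*":
        return True  # '*' swallows the rest of the input
    return bool(words) and pattern[0] == words[0] and match(pattern[1:], words[1:])

def make_bot(categories):
    def respond(question):
        words = question.upper().split()
        for pattern, template in categories:
            if match(pattern.upper().split(), words):
                return template
        return "I don't understand."
    return respond

bot = make_bot([
    ("HELLO *", "Hi there!"),
    ("WHAT IS YOUR NAME", "My name is InMoov."),
])
print(bot("hello robot"))        # Hi there!
print(bot("what is your name"))  # My name is InMoov.
```

In real AIML the categories would be written as XML (`<category>`, `<pattern>`, `<template>`), but the matching idea is the same.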

Other Services

  • Mouth Control - animates a servo based on the text that is being spoken.
  • AudioCapture - record audio from the microphone.  (I haven't tested this in quite some time, so I'm not sure what state it's in.)
  • ThingSpeak - this is actually for RFID stuff and has nothing to do with speech or natural language.
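Conceptually, mouth control boils down to turning the text being spoken into a sequence of jaw-servo positions. The sketch below is not how the MouthControl service is actually implemented (the real service syncs with the audio); the servo angles and the crude one-cycle-per-syllable rule are made-up examples.

```python
JAW_CLOSED = 10   # hypothetical servo angles in degrees
JAW_OPEN = 60

def mouth_moves(text):
    """Return a list of (angle, seconds) jaw commands for `text`."""
    moves = []
    for word in text.split():
        # Crude syllable estimate: count groups of consecutive vowels.
        vowels = sum(1 for i, c in enumerate(word.lower())
                     if c in "aeiou" and (i == 0 or word[i - 1].lower() not in "aeiou"))
        for _ in range(max(1, vowels)):   # at least one cycle per word
            moves.append((JAW_OPEN, 0.1))
            moves.append((JAW_CLOSED, 0.1))
    return moves

print(mouth_moves("hello"))  # two open/close cycles (two vowel groups)
```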


Hope that helps!



Hi Kevin

Thanks a ton for the extended explanations - sorry for making you write so much! It makes things a bit clearer to me.

You left open which services I need to start and how I have to connect them to have, e.g.,

WebGui -> WebkitSpeechRecognition

a) -> ProgramAB -> AcapelaSpeech -> MouthControl to get responses to general questions

b) -> Python code that looks for trigger words in the WebkitSpeechRecognition results and moves the robot, and if wanted calls AcapelaSpeech and MouthControl for comments?
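For what it's worth, the data flow in a) and b) can be sketched in plain Python with simple callbacks. This is not actual MRL code (the real services are wired through MRL's publish/subscribe mechanisms, which the source thread doesn't show); the stand-in functions and trigger words below are made up for illustration.

```python
TRIGGERS = {"left", "right", "stop"}     # hypothetical movement commands

def chatbot(text):                       # stands in for ProgramAB
    return "I heard: " + text

spoken = []                              # stands in for AcapelaSpeech + MouthControl
def speak(text):
    spoken.append(text)

def move_robot(command):                 # stands in for InMoov servo calls
    spoken.append("(moving %s)" % command)

def on_recognized(text):
    """Called with each WebkitSpeechRecognition result."""
    hits = set(text.lower().split()) & TRIGGERS
    if hits:                  # path b): trigger word -> move, optional comment
        for cmd in sorted(hits):
            move_robot(cmd)
        speak("ok")
    else:                     # path a): general question -> chatbot -> speech
        speak(chatbot(text))

on_recognized("turn left")
on_recognized("how are you")
print(spoken)   # ['(moving left)', 'ok', 'I heard: how are you']
```

In an actual MRL script, `on_recognized` would be subscribed to the recognizer's published text instead of being called directly.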

P.S. spending a lot of time listening to Oussama ...