javadoc - new API

GoogleSTT (Speech To Text) uses an unpublished speech service, the same one the Chrome browser uses. In short, audio is captured, encoded into a FLAC file, and posted to some magical service residing at Google, which sends it back as text!

Pretty neat!

Some differences between this service and Sphinx:

  • Sphinx is an open source voice recognition application. Google has a service which we can (at the moment) use without cost, but the details of how the sound files are converted to text are hidden from us.
  • Sphinx can be "trained" for better quality.
  • Typically a grammar must be created for Sphinx, so that it recognizes words from a specified set or combination of words. Google does not have this limitation: it will attempt to match any word uttered.
  • Sphinx has OK speed and processes utterances locally. Google's recognition is very fast, but there is a catch: it takes time to send the FLAC file over the internet, and this adds to the total processing time.

The GoogleSTT service sends a request to Google to do a speech to text conversion. The service monitors the microphone until an utterance is heard. It then converts this data into a FLAC file and sends it to Google over the internet. The response from Google is the recognized text and the confidence level of the recognition.
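
How the service decides that an utterance has been heard is not described here. One common approach is to watch the energy of the incoming audio and treat a run of loud frames, ended by a short silence, as one utterance. The sketch below illustrates that idea only; the class UtteranceDetector, its method names, the frame size and the threshold are all hypothetical and are not taken from the GoogleSTT source. It assumes 16-bit PCM samples already read from the microphone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of GoogleSTT): finds utterances by frame energy. */
final class UtteranceDetector {

    /** Root-mean-square amplitude of one frame of 16-bit PCM samples. */
    static double rms(short[] samples, int offset, int length) {
        double sumOfSquares = 0.0;
        for (int i = offset; i < offset + length; i++) {
            sumOfSquares += (double) samples[i] * samples[i];
        }
        return Math.sqrt(sumOfSquares / length);
    }

    /**
     * Returns {startSample, endSample} pairs (end exclusive), one per utterance.
     * An utterance starts at the first frame whose RMS reaches the threshold and
     * ends once more than maxSilentFrames consecutive quiet frames follow it.
     * A trailing partial frame is ignored.
     */
    static List<int[]> findUtterances(short[] samples, int frameSize,
                                      double threshold, int maxSilentFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1;        // first sample of the current utterance, -1 if none
        int lastLoudEnd = -1;  // end of the most recent loud frame
        int silentFrames = 0;  // quiet frames seen since the last loud frame

        for (int offset = 0; offset + frameSize <= samples.length; offset += frameSize) {
            boolean loud = rms(samples, offset, frameSize) >= threshold;
            if (loud) {
                if (start < 0) {
                    start = offset;
                }
                lastLoudEnd = offset + frameSize;
                silentFrames = 0;
            } else if (start >= 0 && ++silentFrames > maxSilentFrames) {
                // Enough silence: close the utterance at the last loud frame.
                utterances.add(new int[] {start, lastLoudEnd});
                start = -1;
                silentFrames = 0;
            }
        }
        if (start >= 0) {
            // Audio ended while still inside an utterance.
            utterances.add(new int[] {start, lastLoudEnd});
        }
        return utterances;
    }
}
```

Each returned range marks the samples that would then be encoded as FLAC and sent off for recognition. A real detector would likely need a threshold tuned to the microphone and room noise.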

This service is in rather poor condition. There is no documentation on using the Google speech to text website, and it is not an open source system.

To try it, start the service and look at the Java console. You should see the results of the speech captured from the microphone.