MarySpeech | MyRobotLab

The MarySpeech Service can be used to generate speech from text using the MaryTTS project.

1. General

It differs from most other speech services (e.g. AcapellaSpeech & GoogleSpeech) in that it is OPEN SOURCE (Yay!) and doesn't even require an internet connection (Yay! x2), since everything is generated locally and offline.

The downside is that you have to pre-download the rather big voice-file(s) you want to use.

Also, the quality might not be as good as that of its competitors (e.g. Acapella & Google), but it's free and nobody can simply "shut it down".

Another cool thing about MarySpeech is that you can apply parameterized voice effects to your voice.

2. Voices

MarySpeech has many different voices you can select and use; the tables below list them. There may be more voices (and languages) available, but these are the voices officially registered in MaryTTS.

All voices are available as an additional download. You can install a voice by simply calling:

maryspeech.installComponentsAcceptLicense(voicename)

and to select a different voice:

maryspeech.setVoice(voicename) or maryspeech.setLanguage(language)

You have to install a voice before you are able to select it!
You need to restart MRL after installing a voice!
By installing a voice you accept its license!
DATE: 05.05.2017

EDIT 7/12/2018 (Nixie voices to use)

Voicename	Gender	Language	MaryTTS voice
Obadiah	male	en-GB	dfki-obadiah-hsmm
Lucia	female	it-IT	istc-lucia-hsmm
Emma	female	de-DE	bits1-hsmm
Henry	male	en-US	cmu-rms-hsmm
Alim	male	tr-TR	dfki-ot-hsmm
Jessica	female	fr-FR	upmc-jessica-hsmm
Spike	male	en-GB	dfki-spike-hsmm
Sally	female	en-US	cmu-slt-hsmm
Camille	female	fr-FR	enst-camille-hsmm
Hans	male	de-DE	dfki-pavoque-neutral-hsmm
Poppy	female	en-GB	dfki-poppy-hsmm
Mark	male	en-US	cmu-bdl-hsmm
Pierre	male	fr-FR	upmc-pierre-hsmm
Mahi	female	te-IN	cmu-nk-hsmm
Dennys	male	fr-CA	enst-dennys-hsmm
Conrad	male	de-DE	bits3-hsmm
Prudence	female	en-GB	dfki-prudence-hsmm
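For scripts it can be handy to keep this table as data. A minimal sketch (the helper names `mary_voice` and `voices_for` are hypothetical, not part of MarySpeech) that looks up the MaryTTS voice behind a Nixie name, or lists the Nixie voices for a locale before calling setVoice(). Obadiah is taken as en-GB and Jessica as fr-FR, following the raw data below (EN_GB and FR).

```python
# Nixie voice table: (display name, gender, locale, MaryTTS voice)
NIXIE_VOICES = [
    ("Obadiah", "male", "en-GB", "dfki-obadiah-hsmm"),
    ("Lucia", "female", "it-IT", "istc-lucia-hsmm"),
    ("Emma", "female", "de-DE", "bits1-hsmm"),
    ("Henry", "male", "en-US", "cmu-rms-hsmm"),
    ("Alim", "male", "tr-TR", "dfki-ot-hsmm"),
    ("Jessica", "female", "fr-FR", "upmc-jessica-hsmm"),
    ("Spike", "male", "en-GB", "dfki-spike-hsmm"),
    ("Sally", "female", "en-US", "cmu-slt-hsmm"),
    ("Camille", "female", "fr-FR", "enst-camille-hsmm"),
    ("Hans", "male", "de-DE", "dfki-pavoque-neutral-hsmm"),
    ("Poppy", "female", "en-GB", "dfki-poppy-hsmm"),
    ("Mark", "male", "en-US", "cmu-bdl-hsmm"),
    ("Pierre", "male", "fr-FR", "upmc-pierre-hsmm"),
    ("Mahi", "female", "te-IN", "cmu-nk-hsmm"),
    ("Dennys", "male", "fr-CA", "enst-dennys-hsmm"),
    ("Conrad", "male", "de-DE", "bits3-hsmm"),
    ("Prudence", "female", "en-GB", "dfki-prudence-hsmm"),
]

def mary_voice(name):
    """Return the MaryTTS voice behind a Nixie display name (hypothetical helper)."""
    for display, _gender, _locale, system in NIXIE_VOICES:
        if display == name:
            return system
    raise KeyError(name)

def voices_for(locale, gender=None):
    """List Nixie display names for a locale, optionally filtered by gender."""
    return [display for display, g, loc, _system in NIXIE_VOICES
            if loc == locale and (gender is None or g == gender)]
```

For example, `voices_for("de-DE")` lists Emma, Hans and Conrad, and `mary_voice("Mark")` gives cmu-bdl-hsmm.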

Raw data for information

Language	Voicename	Gender	Type	Version	Description	License	Size (bytes)	Dependencies
DE	bits1	female	unit selection	5.2	A female German unit selection voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals	BY-ND-3.0	262691025	DE, 5.2
DE	bits1-hsmm	female	hsmm	5.2	A female German hidden semi-Markov model voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals	BY-ND-3.0	1359993	DE, 5.2
DE	bits4	female	unit selection	5.2	A female German unit selection voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals	BY-ND-3.0	274825221	DE, 5.2
DE	bits2	male	unit selection	5.2	A male German unit selection voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals	BY-ND-3.0	266072011	DE, 5.2
DE	bits3	male	unit selection	5.2	A male German unit selection voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals	BY-ND-3.0	269955538	DE, 5.2
DE	bits3-hsmm	male	hsmm	5.2	A male German hidden semi-Markov model voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals	BY-ND-3.0	1556358	DE, 5.2
DE	dfki-pavoque-neutral	male	unit selection	5.2	A male German unit selection voice	BY-ND-3.0	448866455	DE, 5.2
DE	dfki-pavoque-neutral-hsmm	male	hsmm	5.2	A male German hidden semi-Markov model voice	BY-ND-3.0	2834245	DE, 5.2
DE	dfki-pavoque-styles	male	unit selection	5.2	A male German unit selection voice with expressive styles "happy", "sad", "angry", and "poker"	BY-ND-3.0	700875468	DE, 5.2
EN_GB	dfki-poppy	female	unit selection	5.2	A female British English expressive unit selection voice: Cheerful Poppy	BY-ND-3.0	111958955	EN-GB, 5.2
EN_GB	dfki-poppy-hsmm	female	hsmm	5.2	A female British English hidden semi-Markov model voice	BY-ND-3.0	1015143	EN-GB, 5.2
EN_GB	dfki-prudence	female	unit selection	5.2	A female British English expressive unit selection voice: Pragmatic Prudence	BY-ND-3.0	293735841	EN-GB, 5.2
EN_GB	dfki-prudence-hsmm	female	hsmm	5.2	A female British English hidden semi-Markov model voice	BY-ND-3.0	1559757	EN-GB, 5.2
EN_GB	dfki-obadiah	male	unit selection	5.2	A male British English expressive unit selection voice: Gloomy Obadiah	BY-ND-3.0	165140911	EN-GB, 5.2
EN_GB	dfki-obadiah-hsmm	male	hsmm	5.2	A male British English hidden semi-Markov model voice	BY-ND-3.0	1215660	EN-GB, 5.2
EN_GB	dfki-spike	male	unit selection	5.2	A male British English expressive unit selection voice: Aggressive Spike	BY-ND-3.0	163980552	EN-GB, 5.2
EN_GB	dfki-spike-hsmm	male	hsmm	5.2	A male British English hidden semi-Markov model voice	BY-ND-3.0	1082784	EN-GB, 5.2
EN_US	cmu-slt	female	unit selection	5.2	A female English unit selection voice	CMU-ARCTIC	104627156	EN-US, 5.2
EN_US	cmu-bdl	male	unit selection	5.2	A male US English unit selection voice, built from recordings provided by Carnegie Mellon University	ARCTIC-LICENSE	95244351	EN-US, 5.2
EN_US	cmu-bdl-hsmm	male	hsmm	5.2	A male US English hidden semi-Markov model voice, built from recordings provided by Carnegie Mellon University	ARCTIC-LICENSE	1016701	EN-US, 5.2
EN_US	cmu-rms	male	unit selection	5.2	A male US English unit selection voice, built from recordings provided by Carnegie Mellon University	ARCTIC-LICENSE	121504555	EN-US, 5.2
EN_US	cmu-rms-hsmm	male	hsmm	5.2	A male US English hidden semi-Markov model voice, built from recordings provided by Carnegie Mellon University	ARCTIC-LICENSE	1027287	EN-US, 5.2
FR	enst-camille	female	unit selection	5.2	A female French unit selection voice, built at Télécom ParisTech (ENST) using data recorded by Camille Dianoux	BY-SA-3.0	214247758	FR, 5.2
FR	enst-camille-hsmm	female	hsmm	5.2	A female French hidden semi-Markov model voice, built at Télécom ParisTech (ENST) using data recorded by Camille Dianoux	BY-SA-3.0	1517857	FR, 5.2
FR	upmc-jessica	female	unit selection	5.2	A female French unit selection voice, built at ISIR (UPMC) using data recorded by Jessica Durand	BY-SA-3.0	151407773	FR, 5.2
FR	upmc-jessica-hsmm	female	hsmm	5.2	A female French hidden semi-Markov model voice, built at ISIR (UPMC) using data recorded by Jessica Durand	BY-SA-3.0	1118194	FR, 5.2
FR	enst-dennys-hsmm	male	hsmm	5.2	A male Québécois French hidden semi-Markov model voice, built at Télécom ParisTech (ENST)	BY-ND-3.0	1675605	FR, 5.2
FR	upmc-pierre	male	unit selection	5.2	A male French unit selection voice, built at ISIR (UPMC) using data recorded by Pierre Chauvin	BY-SA-3.0	206409457	FR, 5.2
FR	upmc-pierre-hsmm	male	hsmm	5.2	A male French hidden semi-Markov model voice, built at ISIR (UPMC) using data recorded by Pierre Chauvin	BY-SA-3.0	1556673	FR, 5.2
IT	istc-lucia-hsmm	female	hsmm	5.2	Italian female Hidden semi-Markov model voice kindly made available by Fabio Tesser	BY-ND-3.0	1466178	IT, 5.2
LB	marylux	female	unit selection	5.2	A female Luxembourgish unit selection voice	BY-NC-SA-4.0	118421559	LB, 5.2
TE	cmu-nk-hsmm	female	hsmm	5.2	A female Telugu hidden semi-Markov model voice built from voice recordings provided by IIIT Hyderabad and Carnegie Mellon University	BY-ND-3.0	3396770	TE, 5.2
TR	dfki-ot	male	unit selection	5.2	A male Turkish unit selection voice	BY-ND-3.0	161098455	TR, 5.2
TR	dfki-ot-hsmm	male	hsmm	5.2	A male Turkish hidden semi-Markov model voice	BY-ND-3.0	1365754	TR, 5.2

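All voices together come to roughly 4.2 GB (the sum of the Size column above, not counting the language components listed under Dependencies). Nearly all of that is the unit selection voices (about 95 to 700 MB each); the HSMM voices are only 1 to 3.5 MB each.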
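The total can be checked by adding up the Size column. A quick sketch with the byte counts copied from the raw data table; it assumes every HSMM voice is recognisable by its `-hsmm` suffix (true for the list above) and, like the table, leaves out the language components from the Dependencies column.

```python
# Sizes in bytes, copied from the "Size" column of the raw data table.
VOICE_SIZES = {
    # DE
    "bits1": 262691025, "bits1-hsmm": 1359993, "bits4": 274825221,
    "bits2": 266072011, "bits3": 269955538, "bits3-hsmm": 1556358,
    "dfki-pavoque-neutral": 448866455, "dfki-pavoque-neutral-hsmm": 2834245,
    "dfki-pavoque-styles": 700875468,
    # EN_GB
    "dfki-poppy": 111958955, "dfki-poppy-hsmm": 1015143,
    "dfki-prudence": 293735841, "dfki-prudence-hsmm": 1559757,
    "dfki-obadiah": 165140911, "dfki-obadiah-hsmm": 1215660,
    "dfki-spike": 163980552, "dfki-spike-hsmm": 1082784,
    # EN_US
    "cmu-slt": 104627156, "cmu-bdl": 95244351, "cmu-bdl-hsmm": 1016701,
    "cmu-rms": 121504555, "cmu-rms-hsmm": 1027287,
    # FR
    "enst-camille": 214247758, "enst-camille-hsmm": 1517857,
    "upmc-jessica": 151407773, "upmc-jessica-hsmm": 1118194,
    "enst-dennys-hsmm": 1675605,
    "upmc-pierre": 206409457, "upmc-pierre-hsmm": 1556673,
    # IT, LB, TE, TR
    "istc-lucia-hsmm": 1466178, "marylux": 118421559, "cmu-nk-hsmm": 3396770,
    "dfki-ot": 161098455, "dfki-ot-hsmm": 1365754,
}

def total_size(names=None):
    """Total download size in bytes for the given voices (default: all of them)."""
    names = VOICE_SIZES if names is None else names
    return sum(VOICE_SIZES[name] for name in names)

hsmm = [name for name in VOICE_SIZES if name.endswith("-hsmm")]
unit_selection = [name for name in VOICE_SIZES if not name.endswith("-hsmm")]

print("all voices:     %.2f GB" % (total_size() / 1e9))                # 4.16 GB
print("unit selection: %.2f GB" % (total_size(unit_selection) / 1e9))  # 4.13 GB
print("HSMM only:      %.1f MB" % (total_size(hsmm) / 1e6))            # 24.8 MB
```

If disk space or download time matters, the HSMM voices are the cheap option: all of them together stay under 25 MB.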

You can build your own voice as well, but you should have at least a basic knowledge of "working with computers".

3. Voice Effects

Notation:
"Effect1(param1=value1,param2=value2)+Effect2"

Some examples thankfully provided by MaryTTS:
"FIRFilter+Robot(amount=50)"
"Robot(amount=100)+Chorus(delay1=866, amp1=0.24, delay2=300, amp2=-0.40,)"
"Robot(amount=80)+Stadium(amount=50)"
"FIRFilter(type=3,fc1=6000, fc2=10000) + Robot"
"Stadium(amount=40) + Robot(amount=87) + Whisper(amount=65)+FIRFilter(type=1,fc1=1540;)++"

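Effect strings can also be assembled in a script. A minimal sketch, assuming MarySpeech's `setAudioEffects()` (used in the comments below) takes the notation above: `effect()` formats one effect and checks the ranges documented in 3.1 to 3.8, and `chain()` joins effects with `+`. Both helpers are hypothetical. Robot, Chorus, Stadium, Whisper and FIRFilter appear in the examples above and F0Add and TractScaler in the comments; Volume, F0Scale, Rate and JetPilot are assumed to be MaryTTS's names for effects 3.1, 3.3, 3.5 and 3.11. The per-effect examples in section 3 use MaryTTS's own `param:value;` form, which becomes `Effect(param=value)` here.

```python
# (effect, parameter) -> documented range; Volume, F0Scale and Rate are assumed names.
RANGES = {
    ("Volume", "amount"): (0.0, 10.0),
    ("TractScaler", "amount"): (0.25, 4.0),
    ("F0Scale", "f0Scale"): (0.0, 3.0),
    ("F0Add", "f0Add"): (-300.0, 300.0),
    ("Rate", "durScale"): (0.1, 3.0),
    ("Robot", "amount"): (0.0, 100.0),
    ("Whisper", "amount"): (0.0, 100.0),
    ("Stadium", "amount"): (0.0, 200.0),
}

def effect(name, **params):
    """Format one effect as Name(param=value,...); a bare name keeps the defaults."""
    parts = []
    for key in sorted(params):  # sorted so the output does not depend on kwarg order
        value = params[key]
        bounds = RANGES.get((name, key))
        if bounds and not bounds[0] <= value <= bounds[1]:
            raise ValueError("%s.%s=%s outside %s" % (name, key, value, bounds))
        parts.append("%s=%s" % (key, value))
    return "%s(%s)" % (name, ",".join(parts)) if parts else name

def chain(*effects):
    """Join effects with '+', the chaining operator of the notation."""
    return "+".join(effects)
```

Inside MRL you could then call `mouth.setAudioEffects(chain(effect("F0Add", f0Add=90.0), effect("TractScaler", amount=1.2)))`, which produces the same string as the female French voice in the comments. Parameters are emitted in sorted order because Python 2.7 style interpreters such as Jython do not preserve keyword-argument order. Chorus and FIRFilter limits depend on the tap index and the sampling rate, so they are not checked here.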
(The following section is from the MaryTTS documentation.)

3.1. Volume Effect:
Scales the output volume by a fixed amount.
Parameter:
<amount> Definition : Amount of scaling (the output is simply multiplied by amount)
Range : [0.0,10.0]
Example:
amount:2.0;

3.2. Vocal Tract Linear Scaling Effect:
Creates a shortened or lengthened vocal tract effect by shifting the formants.
Parameter:
   <amount>   Definition : The amount of formant shifting
   Range      : [0.25,4.0]
   For values of <amount> less than 1.0, the formants are shifted to lower frequencies
       resulting in a longer vocal tract (i.e. a deeper voice).
   Values greater than 1.0 shift the formants to higher frequencies.
       The result is a shorter vocal tract.

Example:
amount:1.5;
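To get a feel for the numbers, here is the idealised uniform-tube model of the vocal tract (a textbook approximation, not how MaryTTS implements the effect), with α standing for <amount>. Formant frequencies are inversely proportional to the tract length L, so multiplying every formant by α acts like dividing the tract length by α:

```latex
F_n = \frac{(2n-1)\,c}{4L}
\qquad\Longrightarrow\qquad
F_n' = \alpha\,F_n = \frac{(2n-1)\,c}{4\,(L/\alpha)}

c \approx 350\ \mathrm{m/s},\; L = 0.175\ \mathrm{m}:\quad
F_1 = \frac{350}{4 \cdot 0.175} = 500\ \mathrm{Hz},\qquad
\alpha = 1.5:\; F_1' = 750\ \mathrm{Hz},\; L' = \frac{0.175}{1.5} \approx 0.117\ \mathrm{m}
```

So the example amount:1.5 behaves roughly like a vocal tract one third shorter, which is why values above 1.0 sound smaller and brighter.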

3.3. F0 scaling effect for HMM voices:
All voiced f0 values are multiplied by <f0Scale> for HMM voices.
This operation effectively scales the range of f0 values.
Note that mean f0 is preserved during the operation.
Parameter:
   <f0Scale>   Definition : Scale ratio for modifying the dynamic range of the f0 contour
                If f0Scale>1.0, the range is expanded (i.e. voice with more variable pitch)
                If f0Scale<1.0, the range is compressed (i.e. more monotonic voice)
                If f0Scale=1.0, the range is unchanged
   Range      : [0.0,3.0]
Example:
f0Scale:2.0;

3.4. F0 mean shifting effect for HMM voices:
Shifts the mean F0 value by <f0Add> Hz for HMM voices.
Parameter:
<f0Add> Definition : F0 shift of mean value in Hz for synthesized speech output
Range : [-300.0,300.0]
Example:
f0Add:50.0;
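The two HMM pitch effects (3.3 and 3.4) can be sketched on a plain list of per-frame f0 values. This is a minimal sketch, not MaryTTS's code: it reads "multiplied by f0Scale" as scaling each voiced value's distance from the mean (the only reading under which the mean is preserved), and it assumes unvoiced frames are marked with 0. The function names are hypothetical.

```python
def f0_scale(f0, scale):
    """F0Scale sketch: stretch voiced f0 values around their mean (0 marks unvoiced)."""
    voiced = [v for v in f0 if v > 0]
    if not voiced:
        return list(f0)
    mean = float(sum(voiced)) / len(voiced)  # float() avoids integer division in Jython 2.7
    return [mean + scale * (v - mean) if v > 0 else 0.0 for v in f0]

def f0_add(f0, add_hz):
    """F0Add sketch: shift every voiced f0 value by add_hz Hz; unvoiced frames stay 0."""
    return [v + add_hz if v > 0 else 0.0 for v in f0]
```

For a contour of [100, 0, 120, 140, 0] Hz, `f0_scale(contour, 2.0)` gives [80.0, 0.0, 120.0, 160.0, 0.0] (the 40 Hz range becomes 80 Hz while the mean stays at 120 Hz), and `f0_add` with 50.0 on that result gives [130.0, 0.0, 170.0, 210.0, 0.0]. Large f0Scale or strongly negative f0Add values can push f0 to zero or below, so a real implementation would have to clamp the result.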

3.5. Duration scaling for HMM voices:
Scales the HMM output speech duration by <durScale>.
Parameter:
<durScale> Definition : Duration scaling factor for synthesized speech output
Range : [0.1,3.0]
Example:
durScale:1.5;

3.6. Robotiser Effect:
Creates a robotic voice by setting all phases to zero.
Parameter:
<amount> Definition : The amount of robotic voice at the output
Range : [0.0,100.0]
Example:
amount:100.0;
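Setting all phases to zero can be shown on a single frame. A minimal sketch using a naive DFT from the standard library: it keeps each bin's magnitude, drops its phase and transforms back. The real effect works on overlapping windowed frames; treating `amount` as a dry/wet percentage is an assumption, and the function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for a short demo frame)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]

def robotise_frame(frame, amount=100.0):
    """Keep the magnitude spectrum, set every phase to zero, then blend with the input."""
    zero_phase = [abs(c) for c in dft(frame)]   # |X[f]| with phase 0
    robot = [v.real for v in idft(zero_phase)]  # real, because |X| is symmetric
    mix = amount / 100.0                        # assumed: amount is a dry/wet percentage
    return [(1.0 - mix) * d + mix * r for d, r in zip(frame, robot)]
```

Because the zero-phase spectrum is real and symmetric, the robotised frame is symmetric around sample 0 and keeps the frame's energy; every frame ends up with the same phase structure, which gives the characteristic buzz.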

3.7. Whisper Effect:
Creates a whispered voice by replacing the LPC residual with white noise.
Parameter:
<amount> Definition : The amount of whisperised voice at the output
Range : [0.0,100.0]
Example:
amount:100.0;
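The whisper effect rests on linear prediction (LPC): each sample is predicted from the previous ones, and the prediction error (the residual) carries the excitation, i.e. the pitch pulses of voiced speech. A minimal sketch on one frame: estimate LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion, then drive the all-pole synthesis filter with white noise of the same energy as the residual instead of the residual itself. The LPC order of 12, the per-frame processing, the dry/wet use of `amount` and the function names are assumptions.

```python
import math
import random

def lpc(frame, order):
    """Autocorrelation LPC via Levinson-Durbin; returns (a, err) with a[0] == 1."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k  # remaining prediction-error energy
    return a, err

def whisper_frame(frame, amount=100.0, order=12, seed=0):
    """Replace the LPC residual of one frame by white noise of the same energy."""
    a, err = lpc(frame, order)
    gain = math.sqrt(max(err, 0.0) / len(frame))  # per-sample residual level
    rng = random.Random(seed)
    noisy = []
    for n in range(len(frame)):
        # all-pole synthesis filter 1/A(z) driven by white noise instead of the residual
        past = sum(a[j] * noisy[n - j] for j in range(1, order + 1) if n - j >= 0)
        noisy.append(gain * rng.gauss(0.0, 1.0) - past)
    mix = amount / 100.0  # assumed: amount is a dry/wet percentage
    return [(1.0 - mix) * x + mix * w for x, w in zip(frame, noisy)]
```

The spectral envelope (the formants) survives because the synthesis filter is unchanged; only the pitch pulses are lost, which is what makes the result sound whispered.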

3.8. Stadium Effect:
Adds stadium effect by applying a specially designed multi-tap chorus.
Parameter:
<amount> Definition : The amount of stadium effect at the output
Range : [0.0,200.0]
Example:
amount:100.0

3.9. Multi-Tap Chorus Effect:
Adds chorus effect by summing up the original signal with delayed and amplitude scaled versions.
The parameters should consist of delay and amplitude pairs for each tap.
A variable number of taps (max 20) can be specified by simply defining more delay-amplitude pairs.
Each tap outputs a delayed and gain-scaled version of the original signal.
All tap outputs are summed up with the original signal with appropriate gain normalization.
Parameters:
   <delay1>
   Definition : The amount of delay in milliseconds for tap #1
   Range      : [0,5000]
   <amp1>
   Definition : Relative amplitude of the channel gain as compared to original signal gain for tap #1
   Range      : [-5.0,5.0]
   <delay2>
   Definition : The amount of delay in milliseconds in delayed channel #2
   Range      : [0,5000]
   <amp2>
   Definition : Relative amplitude of the channel gain as compared to original signal gain for delayed channel #2
   Range      : [-5.0,5.0]
   ...
   <delayN>
   Definition : The amount of delay in milliseconds in delayed channel #N
   Range      : [0,5000]
   <ampN>
   Definition : Relative amplitude of the channel gain as compared to original signal gain for delayed channel #N
   Range      : [-5.0,5.0]
   Note: Maximum possible number of taps is N=20. Parameters for more taps will simply be ignored.
Example: (A three-tap chorus effect)
delay1:466;amp1:0.54;delay2:600;amp2:-0.10;delay3:250;amp3:0.30
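A minimal sketch of the multi-tap chorus described above, on a list of samples: delays are converted from milliseconds to samples and each tap adds a scaled, delayed copy of the input. The docs only say "appropriate gain normalization"; dividing by 1 + sum of |amp| is an assumption that keeps the peak level from growing. The output is truncated to the input length, and `chorus` is a hypothetical name.

```python
def chorus(x, fs, taps):
    """Multi-tap chorus sketch. taps = [(delay_ms, amp), ...]; taps past the 20th are ignored."""
    shifts = []
    for delay_ms, amp in taps[:20]:
        if not (0 <= delay_ms <= 5000 and -5.0 <= amp <= 5.0):
            raise ValueError("tap (%s ms, %s) outside the documented ranges" % (delay_ms, amp))
        shifts.append((int(round(delay_ms * fs / 1000.0)), amp))  # ms -> samples
    norm = 1.0 + sum(abs(amp) for _, amp in shifts)  # assumed gain normalisation
    out = []
    for n in range(len(x)):
        acc = x[n]  # original signal
        for shift, amp in shifts:
            if n >= shift:
                acc += amp * x[n - shift]  # delayed, gain-scaled copy
        out.append(acc / norm)
    return out
```

For a signal sampled at an assumed 16 kHz, the three-tap example above corresponds to `chorus(signal, 16000, [(466, 0.54), (600, -0.10), (250, 0.30)])`, and in the MRL notation to `Chorus(delay1=466,amp1=0.54,delay2=600,amp2=-0.10,delay3=250,amp3=0.30)`.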

3.10. FIR filtering:
Filters the input signal by an FIR filter.
Parameters:
   <type>
   Definition : Type of filter (1:Lowpass, 2:Highpass, 3:Bandpass, 4:Bandreject)
   Range      : {1,2,3,4}
   <fc>   Definition : Cutoff frequency in Hz for lowpass and highpass filters
   Range      : [0.0, fs/2.0] where fs is the sampling rate in Hz
   <fc1>   Definition : Lower frequency cutoff in Hz for bandpass and bandreject filters
   Range      : [0.0, fs/2.0] where fs is the sampling rate in Hz
   <fc2>   Definition : Higher frequency cutoff in Hz for bandpass and bandreject filters
   Range      : [0.0, fs/2.0] where fs is the sampling rate in Hz
Example: (A band-pass filter)
type:3;fc1:500.0;fc2:2000.0
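A minimal windowed-sinc sketch of the four filter types, assuming a Hamming window and 101 taps (MaryTTS's own filter design may differ): a lowpass is a truncated sinc, a highpass is a unit impulse minus a lowpass, a bandpass is the difference of two lowpasses, and a bandreject is an impulse minus a bandpass. `gain()` evaluates the magnitude response at one frequency so the result can be checked. Unlike the documented range, a cutoff of exactly 0 Hz is rejected, since it would give an all-zero lowpass. All function names are hypothetical.

```python
import cmath
import math

def _lowpass(fc, fs, taps):
    """Hamming-windowed sinc lowpass, normalised to a gain of exactly 1 at DC."""
    m = taps - 1
    x = 2.0 * fc / fs  # cutoff as a fraction of fs/2
    h = []
    for n in range(taps):
        t = n - m / 2.0  # time index centred on the middle tap
        ideal = x if t == 0 else math.sin(math.pi * x * t) / (math.pi * t)
        h.append(ideal * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / m)))
    dc = sum(h)
    return [v / dc for v in h]

def fir_design(filter_type, fs, fc=None, fc1=None, fc2=None, taps=101):
    """FIR coefficients for type 1:Lowpass, 2:Highpass, 3:Bandpass, 4:Bandreject."""
    if filter_type not in (1, 2, 3, 4):
        raise ValueError("type must be 1, 2, 3 or 4")
    if taps % 2 == 0:
        raise ValueError("taps must be odd so the highpass and bandreject designs work")
    for f in ([fc] if filter_type in (1, 2) else [fc1, fc2]):
        if f is None or not 0.0 < f <= fs / 2.0:
            raise ValueError("cutoff %r must lie in (0, fs/2]" % (f,))
    impulse = [1.0 if n == (taps - 1) // 2 else 0.0 for n in range(taps)]
    if filter_type == 1:
        return _lowpass(fc, fs, taps)
    if filter_type == 2:
        return [d - l for d, l in zip(impulse, _lowpass(fc, fs, taps))]
    if fc1 >= fc2:
        raise ValueError("fc1 must be below fc2")
    band = [hi - lo for lo, hi in zip(_lowpass(fc1, fs, taps), _lowpass(fc2, fs, taps))]
    if filter_type == 3:
        return band
    return [d - b for d, b in zip(impulse, band)]

def gain(h, f, fs):
    """Magnitude response of the coefficients h at frequency f (Hz)."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f * n / fs) for n, c in enumerate(h)))
```

For the band-pass example above at an assumed 16 kHz sampling rate, `fir_design(3, 16000, fc1=500.0, fc2=2000.0)` passes 1250 Hz at close to unity gain and blocks DC and 6000 Hz.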

3.11. Jet pilot effect:
Filters the input signal using an FIR bandpass filter.
Parameters: NONE

4. TL;DR

PRO:

open source
offline
free
different voices
voice effects!

CON:

not as great as paid alternatives (yet)
rather big voice-files required

5. Further links

References :

MaryTTS http://mary.dfki.de/
MaryTTS web interface http://mary.dfki.de:59125/
Blog post about supporting different voices http://myrobotlab.org/content/marytts-multi-language-support

Example code (from branch develop):

#file : MarySpeech.py (github)

#########################################
# MarySpeech.py
# categories: speech
# more info @: http://myrobotlab.org/service/MarySpeech
#########################################
 
#start Service
mouth = runtime.start("mouth", "MarySpeech")
 
# available voices (the selected voice is stored in the config until you change it)
print("these are the voices I can have: " + str(mouth.getVoices()))
print("this is the voice I am using: " + str(mouth.getVoice()))
 
#switch voice:
mouth.setVoice("Mark")
#mouth.setVoice("Camille")
#etc...
 
#speakBlocking!
# this blocks until speaking is done
mouth.speakBlocking(u"Hello world")
mouth.speakBlocking(u"I speak English. More voices are available, but they need to be installed")
mouth.speakBlocking(u"Echo echo echo")
mouth.speakBlocking(u"What should I use")
 
mouth.setVolume(0.7)
mouth.speakBlocking("Silent please")
mouth.setVolume(1.0)
#speak!
# speak() does not block: the next line is executed immediately while speaking continues
mouth.speak(u"Happy birthday Kyle")
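The comments below ask for a Python line showing how to apply the voice modifications. A minimal sketch of one way to organise that in an MRL script: named "profiles" pairing a voice with an effect string, both taken from the comments on this page (Spike with TractScaler for a Terminator voice, Pierre with F0Add and TractScaler for a female French voice). `PROFILES` and `apply_profile` are hypothetical names; `setVoice()` and `setAudioEffects()` are the MarySpeech calls used on this page.

```python
# Voice + effect pairs taken from the comments on this page (hypothetical helper).
PROFILES = {
    "terminator": ("Spike", "TractScaler(amount=0.9)"),
    "french_female": ("Pierre", "F0Add(f0Add=90.0)+TractScaler(amount=1.2)"),
}

def apply_profile(mouth, profile):
    """Select the profile's voice and effect chain on a MarySpeech service."""
    voice, effects = PROFILES[profile]
    mouth.setVoice(voice)           # the voice must already be installed (then restart MRL)
    mouth.setAudioEffects(effects)  # effect notation from section 3
    return voice, effects
```

Inside MRL this would be used as:

mouth = runtime.start("mouth", "MarySpeech")
apply_profile(mouth, "terminator")
mouth.speakBlocking(u"I'll be back")

Remember that the voice has to be installed with installComponentsAcceptLicense(voicename) and MRL restarted before setVoice() can select it.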


thank you ! great documentation...

In French, the only good voice I found is Pierre, a male voice,

and I changed it to a female voice like this: mouth.setAudioEffects("F0Add(f0Add=90.0)+TractScaler(amount=1.2)")


Good instructions MaVo!!

Using the parameters is nice and handy!

Since you have your hands in voices: are the MBROLA voices a branch of MaryTTS? There seem to be a lot of different voices in their list.

http://tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html


large post - great but almost a bit overwhelming.

What I'm missing is how exactly we can add a voice to MRL. My procedure was to first download the runtime package from the MaryTTS web page. Its bin folder contains a "marytts-component-installer.bat"; starting it lets us select the voices to download.

You will get a zip file in the download folder, relative to where you started the .bat command. Unzipping it creates a new folder lib, where you should now have a "voice-xxx.jar". You need to copy this jar into your ".../mrl/libraries/jar" folder AND RESTART MRL!

It might be that we already have a magical feature in MRL which does this automatically - but using my method of creating an "mrl_<version>" folder with each new version I found that I had to copy the voice jar into the jar folder myself.

It might also help to add a Python example line showing how to apply the voice modifications for all the different options.


To install a voice:

maryspeech.installComponentsAcceptLicense(voicename)

This will not only download the correct jar, but also the additional voice files (if the voice has them) and put both in the correct location.

(More information on installing voices at the start of "2. Voices")

A Python example script is really needed.


With brackets [ [service/MarySpeech.py] ] - the page will pick up the pyrobotlab/service/MarySpeech.py script and format it ...

I added it to the top but it seems pretty bare .. no example of loading voices ...

FYI - It's the develop branch of the script in pyrobotlab.
At some point I'll add a selector to service pages so you can choose which branch your documentation/script examples come from ...

Another thing on the list :)

Hi MaVo !


Thanks for the updated docs MaVo !

object has no attribute 'setAudioEffects'

I was using this line in 5_Mouth.py

mouth.setAudioEffects("TractScaler(amount=0.9)")

I'm trying to get the Terminator voice using the Spike voice.
In 1.0.2694 it works, but in 1.1.268 this error appears:

File "C:/mrl/Nixie_1.1.268/InMoov/services/5_Mouth.py", line 206, in <module>
mouth.setAudioEffects("TractScaler(amount=0.9)")
AttributeError: 'org.myrobotlab.service.MarySpeech' object has no attribute 'setAudioEffects'

Does it not work in the latest version yet, or do I have to set it up somewhere else?

Thank you!


Ooops .. I refactored a little too much ..
Ok, should be fixed in the latest - http://build.myrobotlab.org:8888/getLatest

let us know if it all worky !


Worky!!!

Thank you GroG!

Example configuration (from branch develop):

#file : MarySpeech.py (github)

!!org.myrobotlab.service.config.MarySpeechConfig
blocking: false
listeners: null
mute: false
peers:
  audioFile:
    autoStart: true
    name: maryspeech.audioFile
    type: AudioFile
speechRecognizers: null
substitutions: null
type: MarySpeech
voice: null