The MarySpeech Service can be used to generate speech from text using the MaryTTS project.
1. General
It's different from most other speech services (AcapellaSpeech & GoogleSpeech) as it is OPEN SOURCE (Yay!) and doesn't even require an internet connection (Yay! x2) as everything is generated locally and offline.
The downside is that you have to pre-download the rather big voice-file(s) you want to use.
Also, the quality might not be as good as that of its competitors (e.g. Acapella & Google), but it's free and nobody can simply "shut it down".
Another cool thing about MarySpeech is that you can apply parameterized voice effects to your voice.
2. Voices
MarySpeech has many different voices you can select and use; the tables below list them. There may be more voices (and languages) available, but these are the voices officially registered in MaryTTS.
All voices are available through an additional download. You can install a voice by simply calling:
maryspeech.installComponentsAcceptLicense(voicename)
and to select a different voice:
maryspeech.setVoice(voicename) or maryspeech.setLanguage(language)
- You have to install a voice before you are able to select it!
- You need to restart MRL after installing a voice!
- By installing a voice you accept its license!
- DATE: 05.05.2017
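The install-then-select sequence above can be sketched in Python, the language of MRL scripts. Only `installComponentsAcceptLicense` and `setVoice` come from this page. The helper `prepare_voice`, the `installed` set and the `FakeSpeech` stand-in (which lets the snippet run outside MRL) are hypothetical names made up for illustration. Whether the call expects the friendly voice name or the Mary system name may depend on your MRL version.

```python
def prepare_voice(speech, voicename, installed):
    """Install `voicename` if needed, otherwise select it.

    `speech` is any object exposing the two MarySpeech calls named on this page.
    `installed` is a set of voice names you already know are installed
    (keeping track of this yourself is an assumption of this sketch).
    Returns True when the voice was selected, False when MRL must be restarted first.
    """
    if voicename not in installed:
        # Installing a voice also means accepting its license.
        speech.installComponentsAcceptLicense(voicename)
        installed.add(voicename)
        return False  # restart MRL before selecting the new voice
    speech.setVoice(voicename)
    return True


class FakeSpeech(object):
    """Hypothetical stand-in that only records calls; not part of MRL."""

    def __init__(self):
        self.calls = []

    def installComponentsAcceptLicense(self, voicename):
        self.calls.append(("install", voicename))

    def setVoice(self, voicename):
        self.calls.append(("select", voicename))


speech = FakeSpeech()
installed = set()
first = prepare_voice(speech, "cmu-slt-hsmm", installed)   # installs, restart needed
second = prepare_voice(speech, "cmu-slt-hsmm", installed)  # already installed, selects
print(first, second, speech.calls)
```

In a real script you would pass your running MarySpeech service instead of `FakeSpeech`.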
EDIT 7/12/2018 ( Nixie voices to use )
Voicename | Gender | Language | Mary system voice map |
Obadiah | male | en-GB | dfki-obadiah-hsmm |
Lucia | female | it-IT | istc-lucia-hsmm |
Emma | female | de-DE | bits1-hsmm |
Henry | male | en-US | cmu-rms-hsmm |
Alim | male | tr-TR | dfki-ot-hsmm |
Jessica | female | fr-FR | upmc-jessica-hsmm |
Spike | male | en-GB | dfki-spike-hsmm |
Sally | female | en-US | cmu-slt-hsmm |
Camille | female | fr-FR | enst-camille-hsmm |
Hans | male | de-DE | dfki-pavoque-neutral-hsmm |
Poppy | female | en-GB | dfki-poppy-hsmm |
Mark | male | en-US | cmu-bdl-hsmm |
Pierre | male | fr-FR | upmc-pierre-hsmm |
Mahi | female | te-IN | cmu-nk-hsmm |
Dennys | male | fr-CA | enst-dennys-hsmm |
Conrad | male | de-DE | bits3-hsmm |
Prudence | female | en-GB | dfki-prudence-hsmm |
Raw data for information
Language | Voicename | Gender | Type | Version | Description | License | Size (bytes) | Dependencies |
DE | bits1 | female | unit selection | 5.2 | A female German unit selection voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals | BY-ND-3.0 | 262691025 | DE, 5.2 |
DE | bits1-hsmm | female | hsmm | 5.2 | A female German hidden semi-Markov model voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals | BY-ND-3.0 | 1359993 | DE, 5.2 |
DE | bits4 | female | unit selection | 5.2 | A female German unit selection voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals | BY-ND-3.0 | 274825221 | DE, 5.2 |
DE | bits2 | male | unit selection | 5.2 | A male German unit selection voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals | BY-ND-3.0 | 266072011 | DE, 5.2 |
DE | bits3 | male | unit selection | 5.2 | A male German unit selection voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals | BY-ND-3.0 | 269955538 | DE, 5.2 |
DE | bits3-hsmm | male | hsmm | 5.2 | A male German hidden semi-Markov model voice, built from voice recordings provided by the BITS project at the Bavarian Archive of Speech Signals | BY-ND-3.0 | 1556358 | DE, 5.2 |
DE | dfki-pavoque-neutral | male | unit selection | 5.2 | A male German unit selection voice | BY-ND-3.0 | 448866455 | DE, 5.2 |
DE | dfki-pavoque-neutral-hsmm | male | hsmm | 5.2 | A male German hidden semi-Markov model voice | BY-ND-3.0 | 2834245 | DE, 5.2 |
DE | dfki-pavoque-styles | male | unit selection | 5.2 | A male German unit selection voice with expressive styles "happy", "sad", "angry", and "poker" | BY-ND-3.0 | 700875468 | DE, 5.2 |
EN_GB | dfki-poppy | female | unit selection | 5.2 | A female British English expressive unit selection voice: Cheerful Poppy | BY-ND-3.0 | 111958955 | EN-GB, 5.2 |
EN_GB | dfki-poppy-hsmm | female | hsmm | 5.2 | A female British English hidden semi-Markov model voice | BY-ND-3.0 | 1015143 | EN-GB, 5.2 |
EN_GB | dfki-prudence | female | unit selection | 5.2 | A female British English expressive unit selection voice: Pragmatic Prudence | BY-ND-3.0 | 293735841 | EN-GB, 5.2 |
EN_GB | dfki-prudence-hsmm | female | hsmm | 5.2 | A female British English hidden semi-Markov model voice | BY-ND-3.0 | 1559757 | EN-GB, 5.2 |
EN_GB | dfki-obadiah | male | unit selection | 5.2 | A male British English expressive unit selection voice: Gloomy Obadiah | BY-ND-3.0 | 165140911 | EN-GB, 5.2 |
EN_GB | dfki-obadiah-hsmm | male | hsmm | 5.2 | A male British English hidden semi-Markov model voice | BY-ND-3.0 | 1215660 | EN-GB, 5.2 |
EN_GB | dfki-spike | male | unit selection | 5.2 | A male British English expressive unit selection voice: Aggressive Spike | BY-ND-3.0 | 163980552 | EN-GB, 5.2 |
EN_GB | dfki-spike-hsmm | male | hsmm | 5.2 | A male British English hidden semi-Markov model voice | BY-ND-3.0 | 1082784 | EN-GB, 5.2 |
EN_US | cmu-slt | female | unit selection | 5.2 | A female English unit selection voice | CMU-ARCTIC | 104627156 | EN-US, 5.2 |
EN_US | cmu-bdl | male | unit selection | 5.2 | A male US English unit selection voice, built from recordings provided by Carnegie Mellon University | ARCTIC-LICENSE | 95244351 | EN-US, 5.2 |
EN_US | cmu-bdl-hsmm | male | hsmm | 5.2 | A male US English hidden semi-Markov model voice, built from recordings provided by Carnegie Mellon University | ARCTIC-LICENSE | 1016701 | EN-US, 5.2 |
EN_US | cmu-rms | male | unit selection | 5.2 | A male US English unit selection voice, built from recordings provided by Carnegie Mellon University | ARCTIC-LICENSE | 121504555 | EN-US, 5.2 |
EN_US | cmu-rms-hsmm | male | hsmm | 5.2 | A male US English hidden semi-Markov model voice, built from recordings provided by Carnegie Mellon University | ARCTIC-LICENSE | 1027287 | EN-US, 5.2 |
FR | enst-camille | female | unit selection | 5.2 | A female French unit selection voice, built at Télécom ParisTech (ENST) using data recorded by Camille Dianoux | BY-SA-3.0 | 214247758 | FR, 5.2 |
FR | enst-camille-hsmm | female | hsmm | 5.2 | A female French hidden semi-Markov model voice, built at Télécom ParisTech (ENST) using data recorded by Camille Dianoux | BY-SA-3.0 | 1517857 | FR, 5.2 |
FR | upmc-jessica | female | unit selection | 5.2 | A female French unit selection voice, built at ISIR (UPMC) using data recorded by Jessica Durand | BY-SA-3.0 | 151407773 | FR, 5.2 |
FR | upmc-jessica-hsmm | female | hsmm | 5.2 | A female French hidden semi-Markov model voice, built at ISIR (UPMC) using data recorded by Jessica Durand | BY-SA-3.0 | 1118194 | FR, 5.2 |
FR | enst-dennys-hsmm | male | hsmm | 5.2 | A male Québécois French hidden semi-Markov model voice, built at Télécom ParisTech (ENST) | BY-ND-3.0 | 1675605 | FR, 5.2 |
FR | upmc-pierre | male | unit selection | 5.2 | A male French unit selection voice, built at ISIR (UPMC) using data recorded by Pierre Chauvin | BY-SA-3.0 | 206409457 | FR, 5.2 |
FR | upmc-pierre-hsmm | male | hsmm | 5.2 | A male French hidden semi-Markov model voice, built at ISIR (UPMC) using data recorded by Pierre Chauvin | BY-SA-3.0 | 1556673 | FR, 5.2 |
IT | istc-lucia-hsmm | female | hsmm | 5.2 | Italian female Hidden semi-Markov model voice kindly made available by Fabio Tesser | BY-ND-3.0 | 1466178 | IT, 5.2 |
LB | marylux | female | unit selection | 5.2 | A female Luxembourgish unit selection voice | BY-NC-SA-4.0 | 118421559 | LB, 5.2 |
TE | cmu-nk-hsmm | female | hsmm | 5.2 | A female Telugu hidden semi-Markov model voice built from voice recordings provided by IIIT Hyderabad and Carnegie Mellon University | BY-ND-3.0 | 3396770 | TE, 5.2 |
TR | dfki-ot | male | unit selection | 5.2 | A male Turkish unit selection voice | BY-ND-3.0 | 161098455 | TR, 5.2 |
TR | dfki-ot-hsmm | male | hsmm | 5.2 | A male Turkish hidden semi-Markov model voice | BY-ND-3.0 | 1365754 | TR, 5.2 |
All voices together are somewhere around 5 GB.
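To see why the voice type matters for download size, here is a small sketch that parses two rows copied from the raw data table and prints their sizes. It assumes the Size column is in bytes, which the table itself does not state. The names `FIELDS`, `parse_row` and `size_mb` are made up for this example.

```python
# Two rows copied verbatim from the raw data table above.
ROWS = [
    "TR | dfki-ot | male | unit selection | 5.2 | A male Turkish unit selection voice | BY-ND-3.0 | 161098455 | TR, 5.2 |",
    "TR | dfki-ot-hsmm | male | hsmm | 5.2 | A male Turkish hidden semi-Markov model voice | BY-ND-3.0 | 1365754 | TR, 5.2 |",
]

# Column names in table order.
FIELDS = ["language", "voicename", "gender", "type", "version",
          "description", "license", "size", "dependencies"]


def parse_row(row):
    """Split one pipe-separated table row into a dict keyed by column name."""
    cells = [cell.strip() for cell in row.strip().rstrip("|").split("|")]
    record = dict(zip(FIELDS, cells))
    record["size"] = int(record["size"])
    return record


def size_mb(record):
    """Size in megabytes, assuming the table's size column is in bytes."""
    return record["size"] / 1e6


voices = [parse_row(row) for row in ROWS]
for voice in voices:
    print("%-14s %-15s %8.1f MB" % (voice["voicename"], voice["type"], size_mb(voice)))
```

Under that assumption the unit selection voice is roughly a hundred times larger than its hsmm counterpart, so the hsmm voices are the cheaper ones to download.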
You can build your own voice as well, but you should have at least a basic knowledge of "working with computers".
3. Voice Effects
Notation:
"Effect1(param1=value1,param2=value2)+Effect2"
Some examples thankfully provided by MaryTTS:
"FIRFilter+Robot(amount=50)"
"Robot(amount=100)+Chorus(delay1=866, amp1=0.24, delay2=300, amp2=-0.40,)"
"Robot(amount=80)+Stadium(amount=50)"
"FIRFilter(type=3,fc1=6000, fc2=10000) + Robot"
"Stadium(amount=40) + Robot(amount=87) + Whisper(amount=65)+FIRFilter(type=1,fc1=1540;)++"
(The following section is from the MaryTTS documentation.)
3.1. Volume Effect:
Scales the output volume by a fixed amount.
Parameter:
<amount> Definition : Amount of scaling (the output is simply multiplied by amount)
Range : [0.0,10.0]
Example:
amount:2.0;
3.2. Vocal Tract Linear Scaling Effect:
Creates a shortened or lengthened vocal tract effect by shifting the formants.
Parameter:
<amount> Definition : The amount of formant shifting
Range : [0.25,4.0]
For values of <amount> less than 1.0, the formants are shifted to lower frequencies
resulting in a longer vocal tract (i.e. a deeper voice).
Values greater than 1.0 shift the formants to higher frequencies.
The result is a shorter vocal tract.
Example:
amount:1.5;
3.3. F0 scaling effect for HMM voices:
All voiced f0 values are multiplied by <f0Scale> for HMM voices.
This operation effectively scales the range of f0 values.
Note that mean f0 is preserved during the operation.
Parameter:
<f0Scale> Definition : Scale ratio for modifying the dynamic range of the f0 contour
If f0Scale>1.0, the range is expanded (i.e. voice with more variable pitch)
If f0Scale<1.0, the range is compressed (i.e. more monotonic voice)
f0Scale=1.0 results in no change in range
Range : [0.0,3.0]
Example:
f0Scale:2.0;
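One reading that fits both statements above (the range is scaled, the mean is preserved) is that each voiced f0 value's distance from the mean is multiplied by f0Scale. The sketch below illustrates that reading only; it is not taken from the MaryTTS source. The function name `scale_f0_range` and the use of 0.0 for unvoiced frames are assumptions.

```python
def scale_f0_range(f0_values, f0_scale):
    """Scale the spread of voiced f0 values (in Hz) around their mean.

    Frames with f0 == 0.0 are treated as unvoiced and left untouched
    (an assumption of this sketch).
    """
    voiced = [f0 for f0 in f0_values if f0 > 0.0]
    if not voiced:
        return list(f0_values)
    mean = sum(voiced) / float(len(voiced))
    return [mean + f0_scale * (f0 - mean) if f0 > 0.0 else f0
            for f0 in f0_values]


contour = [100.0, 0.0, 120.0, 140.0]         # Hz, one unvoiced frame
expanded = scale_f0_range(contour, 2.0)      # more variable pitch
compressed = scale_f0_range(contour, 0.5)    # more monotonic voice
print(expanded)    # [80.0, 0.0, 120.0, 160.0]
print(compressed)  # [110.0, 0.0, 120.0, 130.0]
```

In both cases the mean of the voiced values stays at 120 Hz while the range doubles or halves.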
3.4. F0 mean shifting effect for HMM voices:
Shifts the mean F0 value by <f0Add> Hz for HMM voices.
Parameter:
<f0Add> Definition : F0 shift of mean value in Hz for synthesized speech output
Range : [-300.0,300.0]
Example:
f0Add:50.0;
3.5. Duration scaling for HMM voices:
Scales the HMM output speech duration by <durScale>.
Parameter:
<durScale> Definition : Duration scaling factor for synthesized speech output
Range : [0.1,3.0]
Example:
durScale:1.5;
3.6. Robotiser Effect:
Creates a robotic voice by setting all phases to zero.
Parameter:
<amount> Definition : The amount of robotic voice at the output
Range : [0.0,100.0]
Example:
amount:100.0;
3.7. Whisper Effect:
Creates a whispered voice by replacing the LPC residual with white noise.
Parameter:
<amount> Definition : The amount of whisperised voice at the output
Range : [0.0,100.0]
Example:
amount:100.0;
3.8. Stadium Effect:
Adds stadium effect by applying a specially designed multi-tap chorus.
Parameter:
<amount> Definition : The amount of stadium effect at the output
Range : [0.0,200.0]
Example:
amount:100.0
3.9. Multi-Tap Chorus Effect:
Adds chorus effect by summing up the original signal with delayed and amplitude scaled versions.
The parameters should consist of delay and amplitude pairs for each tap.
A variable number of taps (max 20) can be specified by simply defining more delay-amplitude pairs.
Each tap outputs a delayed and gain-scaled version of the original signal.
All tap outputs are summed up with the original signal with appropriate gain normalization.
Parameters:
<delay1>
Definition : The amount of delay in milliseconds for tap #1
Range : [0,5000]
<amp1>
Definition : Relative amplitude of the channel gain as compared to original signal gain for tap #1
Range : [-5.0,5.0]
<delay2>
Definition : The amount of delay in milliseconds in delayed channel #2
Range : [0,5000]
<amp2>
Definition : Relative amplitude of the channel gain as compared to original signal gain for delayed channel #2
Range : [-5.0,5.0]
...
<delayN>
Definition : The amount of delay in milliseconds in delayed channel #N
Range : [0,5000]
<ampN>
Definition : Relative amplitude of the channel gain as compared to original signal gain for delayed channel #N
Range : [-5.0,5.0]
Note: Maximum possible number of taps is N=20. Parameters for more taps will simply be neglected.
Example: (A three-tap chorus effect)
delay1:466;amp1:0.54;delay2:600;amp2:-0.10;delay3:250;amp3:0.30
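The parameter examples in this section use the `name:value;` form, while the notation at the start of section 3 uses `Effect(name=value,...)`. Assuming the parameter names carry over unchanged (the Chorus example in section 3 suggests they do), a small converter could look like this. The function name `to_effect_call` is made up for this sketch.

```python
def to_effect_call(effect_name, native_params):
    """Turn 'name:value;name:value' into 'Effect(name=value,name=value)'."""
    pairs = []
    for item in native_params.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        key, value = item.split(":", 1)
        pairs.append("%s=%s" % (key.strip(), value.strip()))
    if not pairs:
        return effect_name  # effects without parameters are written bare
    return "%s(%s)" % (effect_name, ",".join(pairs))


chorus = to_effect_call(
    "Chorus", "delay1:466;amp1:0.54;delay2:600;amp2:-0.10;delay3:250;amp3:0.30")
print(chorus)
# Chorus(delay1=466,amp1=0.54,delay2=600,amp2=-0.10,delay3=250,amp3=0.30)
```

Values are kept as text, so they come out exactly as written in the example line.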
3.10. FIR filtering:
Filters the input signal by an FIR filter.
Parameters:
<type>
Definition : Type of filter (1:Lowpass, 2:Highpass, 3:Bandpass, 4:Bandreject)
Range : {1,2,3,4}
<fc> Definition : Cutoff frequency in Hz for lowpass and highpass filters
Range : [0.0, fs/2.0] where fs is the sampling rate in Hz
<fc1> Definition : Lower frequency cutoff in Hz for bandpass and bandreject filters
Range : [0.0, fs/2.0] where fs is the sampling rate in Hz
<fc2> Definition : Higher frequency cutoff in Hz for bandpass and bandreject filters
Range : [0.0, fs/2.0] where fs is the sampling rate in Hz
Example: (A band-pass filter)
type:3;fc1:500.0;fc2:2000.0
3.11. Jet pilot effect:
Filters the input signal using an FIR bandpass filter.
Parameters: NONE
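Putting section 3 together: the sketch below builds an effect string in the "Effect1(param1=value1)+Effect2" notation and limits each value to the range documented above. Only effect names that appear elsewhere on this page are used (Robot, Whisper, Stadium, FIRFilter, plus F0Add and TractScaler from the user comments). Matching those names to sections 3.2, 3.4 and 3.6 to 3.8 is an inference, and clamping on the script side is a choice of this sketch, not something MaryTTS is documented to do. `RANGES`, `clamp_param` and `build_effects` are hypothetical names.

```python
# (effect, parameter) -> (minimum, maximum), copied from the ranges in section 3.
RANGES = {
    ("TractScaler", "amount"): (0.25, 4.0),
    ("F0Add", "f0Add"): (-300.0, 300.0),
    ("Robot", "amount"): (0.0, 100.0),
    ("Whisper", "amount"): (0.0, 100.0),
    ("Stadium", "amount"): (0.0, 200.0),
}


def clamp_param(effect, param, value):
    """Limit `value` to its documented range; unknown pairs pass through."""
    if (effect, param) not in RANGES:
        return value
    low, high = RANGES[(effect, param)]
    return max(low, min(high, value))


def build_effects(effects):
    """Join (name, params) pairs into 'Effect1(p=v,...)+Effect2' notation.

    `effects` is a list of (effect_name, params) tuples, where params is a
    list of (param_name, value) pairs and may be empty.
    """
    parts = []
    for name, params in effects:
        args = ",".join("%s=%s" % (key, clamp_param(name, key, value))
                        for key, value in params)
        parts.append("%s(%s)" % (name, args) if args else name)
    return "+".join(parts)


effect_string = build_effects([
    ("F0Add", [("f0Add", 90.0)]),
    ("TractScaler", [("amount", 1.2)]),
    ("Robot", [("amount", 150.0)]),  # out of range, limited to 100.0
    ("FIRFilter", []),
])
print(effect_string)
# F0Add(f0Add=90.0)+TractScaler(amount=1.2)+Robot(amount=100.0)+FIRFilter
```

In an MRL script the result would be passed to `setAudioEffects(effect_string)` on the MarySpeech service, the call quoted in the user comments on this page.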
4. TL;DR
PRO:
- open source
- offline
- free
- different voices
- voice effects!
CON:
- not as great as paid alternatives (yet)
- rather big voice-files required
5. Further links
References :
- MaryTTS http://mary.dfki.de/
- MaryTTS web interface http://mary.dfki.de:59125/
- Blog post about supporting different voices http://myrobotlab.org/content/marytts-multi-language-support
thank you ! great documentation...
In French, the only good voice I found is Pierre, a male voice,
and I changed it to sound female like this: mouth.setAudioEffects("F0Add(f0Add=90.0)+TractScaler(amount=1.2)")
Good instructions MaVo!!
Using the parameters is nice and handy!
Since you have your hands in voices: are the MBROLA voices a branch of MaryTTS? There seem to be a lot of different voices in their list.
http://tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html
large post - great but almost a bit overwhelming.
What I miss is how exactly we can add a voice to MRL. My procedure was to first download the runtime package from the MaryTTS webpage. In the bin folder it has a "marytts-component-installer.bat". After starting it, we can select the voices to download.
You will get a zip file in the download folder, relative to where you started the .bat command. Unzipping it will create a new folder lib where you should now have a "voice-xxx.jar". You need to copy this jar into your ".../mrl/libraries/jar"-folder AND RESTART MRL!
It might be that we already have a magical feature in MRL which does this automatically - but using my method of creating an "mrl_<version>" folder with each new version I found that I had to copy the voice jar into the jar folder myself.
And it might help to add a python example line of how to apply the voice modifications for all the different options?
To install a voice:
maryspeech.installComponentsAcceptLicense(voicename)
This will not only download the correct jar, but also the additional voice files (if the voice has them) and put both in the correct location.
(More information on installing voices at the start of "2. Voices")
A Python example script is really needed.
With brackets [ [service/MarySpeech.py] ] - the page will pick up the pyrobotlab/service/MarySpeech.py script and format it ...
I added it to the top but it seems pretty bare .. no example of loading voices ...
FYI - It's the develop branch of the script in pyrobotlab.
At some point I'll add a selector to service pages so you can choose which branch your documentation/script examples come from ...
Another thing on the list :)
Hi MaVo !
Thanks for the updated docs MaVo !
object has no attribute 'setAudioEffects'
I was using this line in 5_Mouth.py
mouth.setAudioEffects("TractScaler(amount=0.9)")
I am trying to get a Terminator voice using the Spike voice.
In 1.0.2694 it works, but in 1.1.268 this error appears:
File "C:/mrl/Nixie_1.1.268/InMoov/services/5_Mouth.py", line 206, in <module>
mouth.setAudioEffects("TractScaler(amount=0.9)")
AttributeError: 'org.myrobotlab.service.MarySpeech' object has no attribute 'setAudioEffects'
Does it not work in the latest version yet, or do I have to set it up somewhere else?
Thank you!
Ooops .. I refactored a little too much ..
Ok, should be fixed in the latest - http://build.myrobotlab.org:8888/getLatest
Let us know if it all worky!
Worky!!!
Thank you GroG!