----- UPDATE 2017/05/16 ----
InMoov and ProgramAB can be modeled a bit like a human. Let's think of it in terms of these parts:
- Ear
- Brain
- Filter
- Mouth
Ok, above we have an ear. An ear recognizes speech and publishes the recognized text to the brain. The brain takes that text and produces 2 things: 1. a response as text, and 2. out-of-band messages (generic MRL messages; we'll talk about these in other posts.)
The brain then filters its response, because it's good to be polite and have good manners (it's not always a good thing to speak your mind). The filter takes that text response in, filters it, and publishes a cleaned version of the text. The mouth listens for that text to decide when to start speaking and what to say. It's good not to talk to yourself, so the ear knows when the mouth is speaking.
So, let's talk about how MRL is like building blocks of services that can be swapped in and out..
An "Ear" implements the SpeechRecognizer and TextPublisher interfaces. So, any service that implements these 2 interfaces can be consider an "Ear" service. Currently we have 2 ear services
- Sphinx
- WebkitSpeechRecognition
Sphinx is poor quality, but does not need to be connected to the internet. Webkit is high quality, but requires an internet connection.
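For example, a minimal MRL Python (Jython) sketch for swapping the Ear looks something like this (the service names are the standard MyRobotLab ones, Runtime is provided by the Python service, and the exact wiring in your own script may differ):

# Sketch: either service can play the "Ear" role, because both implement
# SpeechRecognizer and TextPublisher.
# offline, lower quality:
# ear = Runtime.createAndStart("ear", "Sphinx")
# online, higher quality:
ear = Runtime.createAndStart("ear", "WebkitSpeechRecognition")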
The "Brain" in the context of ProgramAB is not (yet) very strongly defined as an interface. It's designed to process spoken text and natural language so it implements 2 interfaces. TextListener and TextPublisher. In these interfaces, ProgramAB listens for text from another service (the Ear) and then publishes text to another service (like the Filter.). I suspect these interfaces might formalize a bit more as MRL matures and things refactor.
The Filter acts just like the Brain in that it implements both TextPublisher and TextListener. It listens for text from the brain and it publishes text to the mouth. The HtmlFilter is an example of a service that implements both TextListener and TextPublisher. It is used to filter the response from ProgramAB in case it includes some HTML or other machine language that isn't intended to be spoken aloud.
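Continuing the sketch above, the filter just slots into the middle of the chain (the "mouth" service is created in the Mouth sketch further down):

# Sketch: strip HTML/markup from ProgramAB's responses before they are spoken.
htmlfilter = Runtime.createAndStart("htmlfilter", "HtmlFilter")
brain.addListener("publishText", "htmlfilter", "onText")     # Brain  -> Filter
htmlfilter.addListener("publishText", "mouth", "onText")     # Filter -> Mouth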
The Mouth implements 2 and sometimes 3 interfaces.
- TextListener
- SpeechSynthesis
The speech synthesis services listen for text from another service; they then generate sound and write to the sound card to play back an audio version of the text. Many of these speech services will generate an mp3 that represents the spoken text and use the "AudioFile" service to play back the actual audio; other speech synthesis services will write directly to the sound card for playback.
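A sketch of the Mouth side, here assuming MarySpeech (offline) as the synthesizer; speak() is the SpeechSynthesis entry point:

# Sketch: a "Mouth" using MarySpeech. Many synthesis services cache an mp3
# and hand it to the AudioFile service for the actual playback.
# (Create this before wiring the filter above so the "mouth" name exists.)
mouth = Runtime.createAndStart("mouth", "MarySpeech")
mouth.speak("Hello, I am InMoov")    # direct test; wired text is spoken automatically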
Services that have the cached mp3s will additionally implement AudioListener to get callbacks for when playback of the mp3 begins and stops.
There is another service in play here called MouthControl. This service listens for the start/stop playback events and then moves a servo according to the utterance (what is being spoken) and the start/stop of playback. The MouthControl service probably needs some updating to make sure it supports speech synthesis services that don't implement AudioListener.
Ok, so there are lots of speech synthesis services; some work, some don't, some require internet access, some don't: MarySpeech, NaturalReaders, MimicSpeech (Windows only, currently), ...
I could continue to blab on about this, so instead I'll just share the links to the interface definitions.
--------- Orig ------
I feel like people have embraced ProgramAB as a brain for the InMoov. Shall we make it a first-class citizen and make ProgramAB part of the default InMoov service?
This seems to be the base use case: webkit + ProgramAB + speech synth (Mary?) as the default.
So, I think we should add a "brain" to the InMoov service, and that should be ProgramAB. This means that when you start the InMoov service, it will include an instance of ProgramAB.
Maybe we could call this "i01.brain"? Or "i01.programab"?
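As a sketch of what that could feel like from a user script (i01.brain is the peer name proposed here, not an existing member of the InMoov service at the time of writing):

# Sketch only: the proposed default, with a hypothetical "brain" peer.
i01 = Runtime.createAndStart("i01", "InMoov")
# i01.brain.startSession("default", "alice2")    # hypothetical peer access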
thoughts/comments?
Using one RPi for ProgramAB, another RPi to handle vision, and another RPi for all else, that is what I am currently working on. So long as I am able to separate things between RPis, I don't mind. :)
Ya .. totally agree !
I think I really like this story ...
"I as a user, press a single button and get InMoov behaving like Max-Headroom"
programab + webkit + virtual inmoov = fun for noobie to start hacking ..
webkit I'm a bit nervous about - resources - reliability & problems .. but ya .. it's still better than sphinx
also simple switch to remove virtual inmoov and start real inmoov...
And why not the possibility to have a vInMoov on screen AND a real InMoov running in sync, to identify faulty hardware? All from a one-click button and some config files.
Ya .. two simple switches
InMoov on/off & Virtual InMoov on/off
But this is all already there in a few lines of python script?
I would rather welcome a better method than AIML to communicate with the bot and extend its capabilities, even if it would require having that brain in the cloud.
Adding RPis and assigning them different tasks might be a way, but you will need bigger and better batteries to run them all. A power-consumption-optimized and powerful notebook looks to me like a better solution, and it comes with its own battery.
Nice proposition!
But I'm wondering if that is not going to break many things in the InMoov repo.
The config files give the user the option to decide whether they want a chatbot or Sphinx. In fact, if you check, the ProgramAB directory isn't directly used by the repo; instead we have set up a bot directory which currently includes three sets of language AIML.
https://github.com/MyRobotLab/inmoov/tree/develop/Inmoov/inmoovVocal
These AIML sets are a bit different than the ones found in the ProgramAB directory, to fit the requirements of the config files.
Kwatters, and others, did you have time to test the config files of the InMoov repo? It would be interesting to know if they work for people other than Anthony (moz4r) and myself. Because this adds a layer of possible errors, it would be interesting to know if it's heading in a smart direction.
We should all love switches!
Kevin, we agree about the ProgramAB choice.
This is the switch we actually use for the chatbot:
https://github.com/MyRobotLab/inmoov/blob/develop/Inmoov/services/A_Cha…
What it does:
- Starts a ProgramAB session based on the user's language
- Gives a first-time introduction if it is the first time the user launches it
A chatbot refactor was not the first thing I was thinking of, because it already works so well, but we will follow the evolution.
After Manticore
All great feedback, I think we're all on the same page.
I recommend we come back to this after the Manticore release. The only things we change now should be bug fixes..
Once the release is out, we can revisit a lot of legacy stuff in the InMoov service(s)
TextListener
Great documentation! Thanks for it.
I removed the TextListener wiring from the ear to the chatbot, because I want to intercept some things before ProgramAB parses them. Is that a clean way to do it?
https://github.com/MyRobotLab/inmoov/blob/master/Inmoov/services/4_Ear…
If it's just for that, it's fine. If you want a more complicated filter, filters can be plugged between webkit and ProgramAB with no problem. They just need to implement TextListener (to listen to webkit) and TextPublisher (to publish to ProgramAB).
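For example, a rough interceptor written in the MRL Python service might look like this (a sketch only: "ear" stands for the WebkitSpeechRecognition instance, "chatBot" for the ProgramAB instance, and the heard callback name is arbitrary):

# Sketch: intercept recognized text in the Python service before ProgramAB sees it.
ear.addListener("publishText", "python", "heard")    # Ear -> this script

def heard(text):
    if "wikipedia" in text.lower():
        # handle special commands here instead of sending them to the chatbot
        return
    chatBot.getResponse(text)                         # everything else goes to ProgramAB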
trying to "draw" based on
trying to "draw" based on this documentation
https://www.draw.io/#HMyRobotLab%2Finmoov%2Fdevelop%2Ftools%2Fshared_di…
Cool diagram Moz4r !
Cool diagram tool too !
Excited to see the Object Detection & Face Recognition input interfaces -> Image To Text service (e.g. something which asks questions regarding something detected) -> ProgramAB -> Speech
:)
Exported for people not wanting to give OAuth credentials to draw.io.