Problem with special Polish characters UnicodeEncodeError: 'ascii' codec can't encode character

Hello, I have a problem with Polish characters like "Ę", "Ś", "Ć", "Ń", "Ą", "Ż" and "Ź". Is it possible to add these characters to the service? Without them I can't use my own language.
The i01.ear service recognizes the special characters fine, but in the log I get this:
(CLOSE HAND) in English = (ZAMKNIJ DŁOŃ) in Polish.

[New I/O worker #3] [INFO] Recognized : >Zamknij dłoń<

[New I/O worker #3] [INFO] Publish Text : Zamknij dłoń

[python.input] [ERROR] ------

Traceback (most recent call last):

  File "<script>", line 1, in <module>

  File "<string>", line 1758, in heard

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 9: ordinal not in range(128)

at org.python.core.codecs.strict_errors(codecs.java:208)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

at java.lang.reflect.Method.invoke(Unknown Source)

at org.python.core.JavaFunc.__call__(Py.java:2426)

at org.python.core.PyObject.__call__(PyObject.java:431)

at org.python.core.codecs.encoding_error(codecs.java:1538)

at org.python.core.codecs.PyUnicode_EncodeIntLimited(codecs.java:1211)

at org.python.core.codecs.PyUnicode_EncodeASCII(codecs.java:1170)

at org.python.core.codecs.encode(codecs.java:165)

at org.python.core.PyString.encode(PyString.java:3896)

at org.python.core.PyString.encode(PyString.java:3888)

at org.python.core.PyUnicode.unicode___str__(PyUnicode.java:667)

at org.python.core.PyUnicode.__str__(PyUnicode.java:662)

at org.python.core.PyString.str_new(PyString.java:164)

at org.python.core.PyString$exposed___new__.new_impl(Unknown Source)

at org.python.core.PyType.invokeNew(PyType.java:494)

at org.python.core.PyType.type___call__(PyType.java:1706)

at org.python.core.PyType.__call__(PyType.java:1696)

at org.python.core.PyObject.__call__(PyObject.java:461)

at org.python.core.PyObject.__call__(PyObject.java:465)

at org.python.pycode._pyx3.heard$30(<string>:2353)

at org.python.pycode._pyx3.call_function(<string>)

at org.python.core.PyTableCode.call(PyTableCode.java:167)

at org.python.core.PyBaseCode.call(PyBaseCode.java:138)

at org.python.core.PyFunction.__call__(PyFunction.java:413)

at org.python.pycode._pyx58.f$0(<script>:1)

at org.python.pycode._pyx58.call_function(<script>)

at org.python.core.PyTableCode.call(PyTableCode.java:167)

at org.python.core.PyCode.call(PyCode.java:18)

at org.python.core.Py.runCode(Py.java:1386)

at org.python.core.Py.exec(Py.java:1430)

at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:276)

at org.myrobotlab.service.Python$InputQueueThread.run(Python.java:137)

------

[python.input] [ERROR] python error PyException null

Here is the screen shot from the i01.ear service:

 
 
 
calamity's picture

Character encoding can be a pain to use.

There are several character encoders/decoders that can be used so that every character is recognized.

What happens in your case is that the 'ear' encodes the answer in a format (probably UTF-8) that is not recognized by MRL. You need to tell the program which decoder to use so it can read the string.

Without knowing more about what you are doing, it's hard to guide you on how to fix the problem.
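
For reference, the traceback in the first post points at a `str()` call on the recognized text inside `heard` (`PyUnicode.__str__`, then `PyUnicode_EncodeASCII`), which in Python 2 / Jython falls back to the ASCII codec. The sketch below reproduces that failure in plain Python and shows an explicit UTF-8 encode instead. The helper names `ascii_error_position` and `to_utf8_bytes` are made up for illustration and are not part of MRL.

```python
# -*- coding: utf-8 -*-
# Sketch only: the helper names are hypothetical, not MRL API.

RECOGNIZED = u'Zamknij d\u0142o\u0144'  # "Zamknij dłoń" from the log


def ascii_error_position(text):
    """Return the index of the first character ASCII cannot encode, or None."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as err:
        return err.start
    return None


def to_utf8_bytes(text):
    """Encode explicitly as UTF-8 instead of relying on a default codec."""
    return text.encode('utf-8')


print(ascii_error_position(RECOGNIZED))  # 9, the position named in the traceback
print(repr(to_utf8_bytes(RECOGNIZED)))
```

Whether encoding to UTF-8 bytes or simply keeping the value as unicode is the right fix depends on what `heard` does with the text afterwards, which the thread does not show.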

 

juerg's picture

For my German umlauts I use this function in the Python script:

# old direct route, disabled:
# ear.addTextListener(marvin)
# route text through the umlaut-replacing function instead
ear.addListener("publishText","python","replaceUmlaute")
 
def replaceUmlaute(data):
  # 228, 246 and 252 are the code points of ä, ö and ü
  data = data.replace(chr(228),"AE")
  data = data.replace(chr(246),"OE")
  data = data.replace(chr(252),"UE")
  print data
  # hand the normalized text to the chatbot
  marvin.getResponse(data)
 
where marvin is my ProgramAB service. This way marvin (the chatbot) gets the umlauts in a "normalized" text representation, and the AIML patterns written with the normalized text match the result from wksr.
 
I had to find the chr values through logging, as they do not correspond to the ASCII table values.
 
Not sure whether this can solve your problem, but it works for German at least.
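
The same replacement idea could be carried over to Polish with a lookup table. This is only a sketch in plain Python: `POLISH_TO_ASCII` and `replace_polish` are hypothetical names, and dropping the diacritics (rather than spelling them out as in replaceUmlaute) is an assumption about what the AIML patterns expect.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper modelled on replaceUmlaute; not part of MRL.

# Map each Polish letter (by code point) to a plain ASCII stand-in.
POLISH_TO_ASCII = {
    0x0105: u'a', 0x0104: u'A',  # ą Ą
    0x0107: u'c', 0x0106: u'C',  # ć Ć
    0x0119: u'e', 0x0118: u'E',  # ę Ę
    0x0142: u'l', 0x0141: u'L',  # ł Ł
    0x0144: u'n', 0x0143: u'N',  # ń Ń
    0x00F3: u'o', 0x00D3: u'O',  # ó Ó
    0x015B: u's', 0x015A: u'S',  # ś Ś
    0x017A: u'z', 0x0179: u'Z',  # ź Ź
    0x017C: u'z', 0x017B: u'Z',  # ż Ż
}


def replace_polish(text):
    """Return text with Polish diacritics replaced by ASCII letters."""
    return text.translate(POLISH_TO_ASCII)


print(replace_polish(u'Zamknij d\u0142o\u0144'))  # Zamknij dlon
```

The table is keyed by code point because that is the form `translate` expects for unicode text; the input has to be a unicode string, not a byte string.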

 

bartcam's picture

Eureka! Here is a sample without any replace in Python.

I think this works for other languages too, but you need to look up the Unicode code for each character.


# speech test with special characters
Runtime.createAndStart("ear", "WebkitSpeechRecognition")
mouth = Runtime.createAndStart("mouth", "AcapelaSpeech")
htmlFilter = Runtime.createAndStart("htmlFilter", "HtmlFilter")
#
ear.addListener("publishText", python.name, "talk")
#
htmlFilter.addListener("publishText", python.name, "say")
mouth.setLanguage("pl-PL")
mouth.setVoice("Ania")
ear.setLanguage("pl-PL")
ear.startListening("Łukasz|ręka|jest") # the ear recognizes these characters as they are, no escape codes needed here
Runtime.createAndStart("webGui", "WebGui")

ear.addCommand(u'\u0141ukasz', "python", "czesc2") # Ł = \u0141, so the command is written as u'\u0141ukasz'
ear.addCommand("jest", "python", "czesc")
#
ear.addListener("recognized", "python", "heard")
#
ear.startListening()
#
def talk(data):
    mouth.speak(data)
#
def czesc():
    mouth.speak("jest")
#
def czesc2():
    mouth.speak(u'\u0141ukasz') # mouth.speak("...") also needs the escape code inside a unicode literal
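
To look up the escape code for a character, something like the following can be used. It is a small sketch in plain Python, and `to_unicode_escapes` is a hypothetical helper name, not something MRL provides.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper; handles characters up to U+FFFF, which covers Polish.


def to_unicode_escapes(text):
    """Rewrite every non-ASCII character as a \\uXXXX escape sequence."""
    return u''.join(
        ch if ord(ch) < 128 else u'\\u%04x' % ord(ch)
        for ch in text
    )


print(to_unicode_escapes(u'\u0141ukasz'))      # \u0141ukasz
print(to_unicode_escapes(u'cze\u015b\u0107'))  # cze\u015b\u0107
```

The printed text can be pasted between u'...' quotes in the script, as in the addCommand and mouth.speak calls.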

 

bartcam's picture

The simplest way:

# speech test with special characters
Runtime.createAndStart("ear", "WebkitSpeechRecognition")
mouth = Runtime.createAndStart("mouth", "AcapelaSpeech")
htmlFilter = Runtime.createAndStart("htmlFilter", "HtmlFilter")
#
ear.addListener("publishText", python.name, "talk")
#
htmlFilter.addListener("publishText", python.name, "say")
mouth.setLanguage("pl-PL")
mouth.setVoice("Ania")
ear.setLanguage("pl-PL")
ear.startListening("Łukasz|ręka|jest") # the ear recognizes these characters as they are, no escape codes needed here
Runtime.createAndStart("webGui", "WebGui")

ear.addCommand(u'Łukasz', "python", "czesc2") # the u'...' prefix lets the special characters be typed directly, without escape codes
ear.addCommand(u'cześć', "python", "czesc")
#
ear.addListener("recognized", "python", "heard")
#
ear.startListening()
#
def czesc():
    mouth.speak(u'cześć')
#
def czesc2():
    mouth.speak(u'Łukasz')
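
Both variants end up registering the same command, because an escape sequence and the literal character give the identical string once the script text is decoded. A quick check in plain Python, assuming the source is read as UTF-8 (the variable names are only for illustration):

```python
# -*- coding: utf-8 -*-
# Assumes this source is decoded as UTF-8 (declared above for Python 2).

escaped = u'\u0141ukasz'  # written with the escape code
literal = u'Łukasz'       # written with the character itself

print(escaped == literal)  # True
print(len(literal))        # 6
```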