One problem with computer interactions today is that computer systems can’t understand the context in which something is said. Context colors the meaning of words, so a word with multiple meanings is quite clear to a human listener but nearly impossible for a computer to interpret. Before speech systems can truly take off, they must interpret context far better than they do now, and that means being able to understand emotion. Even though such systems are a long way off, science fiction authors such as Isaac Asimov have long insisted that robots and other computerized systems that interpret spoken input will be able to understand emotional context at some point. In fact, the movie I, Robot makes this point quite clearly.
A new standard, Emotion Markup Language (EmotionML), has been introduced that will partly solve the problem. The standard provides a framework for describing emotion in a manner that a computer can understand. It lays the groundwork for future technologies, but doesn’t actually provide a technology that you can use today. A Speech Technology article provides a good overview of what EmotionML does.
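To give a feel for what describing emotion in a machine-readable way might look like, here is a minimal Python sketch that builds a small EmotionML-style annotation using only the standard library. The namespace and the "big six" vocabulary URI are written as I recall them from the W3C documents, so check them against the specification before relying on them. The function name build_annotation is purely my own invention for illustration and isn’t part of any standard or product.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace and vocabulary URIs as recalled from the W3C
# EmotionML documents; verify against the specification.
EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
BIG6 = "http://www.w3.org/TR/emotion-voc/xml#big6"


def build_annotation(category, intensity):
    """Return an EmotionML-style document (as a string) holding one emotion.

    Hypothetical helper: 'category' is a name from the chosen vocabulary
    (for example "anger") and 'intensity' is a number from 0.0 to 1.0.
    """
    # Serialize with EmotionML as the default namespace (no prefix).
    ET.register_namespace("", EMOTIONML_NS)
    root = ET.Element(
        f"{{{EMOTIONML_NS}}}emotionml",
        {"version": "1.0", "category-set": BIG6},
    )
    emotion = ET.SubElement(root, f"{{{EMOTIONML_NS}}}emotion")
    ET.SubElement(
        emotion,
        f"{{{EMOTIONML_NS}}}category",
        {"name": category, "value": f"{intensity:.1f}"},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_annotation("anger", 0.8))
```

Notice that the sketch only records an emotion that something else has already detected. That matches the point that EmotionML is a framework for describing emotion, not a technology for recognizing it.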
I imagine that some people are wondering why we don’t just use something simple like an emoticon to express emotion. After all, people have used emoticons for quite some time to express emotions in e-mail. The problems with emoticons are that they don’t convey enough information and that they’re used tongue-in-cheek in many cases. In addition, even though some simple emoticons are standardized, there are many variants that supposedly express the same emotion, which would make interpreting them nearly impossible.
Of course, it’s important to understand what happens when a computer can finally interpret emotions well enough to put some speech into context. For one thing, fewer people will throw their computers out of windows in frustration over the computer’s idiotic responses to input that clearly wasn’t meant to be processed in a certain way. More importantly, the computer will correctly interpret the spoken word more often. When a word has a whole list of meanings, just knowing the emotional context can help a computer select the correct meaning and react appropriately, which reduces user frustration and defuses situations before they deteriorate into some sort of unexpected action.
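To make the idea of using emotional context to choose a meaning a little more concrete, here is a toy Python sketch. Everything in it is hypothetical: the tiny SENSES lexicon, the emotion labels, and the pick_sense function are my own illustrations, and a real speech system would need something far more sophisticated.

```python
# Hypothetical lexicon: each sense of a word lists the emotions that
# the word is assumed to be spoken with when that sense is intended.
SENSES = {
    "great": [
        ("sincere praise", {"happiness", "surprise"}),
        ("sarcastic complaint", {"anger", "disgust", "sadness"}),
    ],
}


def pick_sense(word, emotion):
    """Return the sense of 'word' that fits the detected 'emotion'.

    Falls back to the first listed sense when no sense matches the
    emotion, and returns None for words that are not in the lexicon.
    """
    senses = SENSES.get(word.lower())
    if not senses:
        return None
    for meaning, emotions in senses:
        if emotion in emotions:
            return meaning
    return senses[0][0]


if __name__ == "__main__":
    # The same word, interpreted under two different emotional contexts.
    print(pick_sense("great", "happiness"))
    print(pick_sense("great", "anger"))
```

The point of the sketch is simply that the word alone is ambiguous, while the word plus an emotion label is not. The hard part, detecting the emotion in the first place, is exactly what isn’t available yet.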
The important thing to remember is that the EmotionML standard is only a framework, not a technology. When technologies based on this standard start to appear, you can be certain that each vendor will want to put a particular spin on its product to differentiate it from the others out there. The technologies won’t work well together at first, and there is going to be a lot of confusion on the part of humans and computers alike. However, at least it’s a start in the right direction.
What other kinds of contextual information does a computer require to interpret the spoken word with greater accuracy? I think one of the next standards will need to address body language, likely starting with facial expressions, but I’d like to hear your opinion. Send your thoughts on language context and how computers can interpret it correctly to John@JohnMuellerBooks.com.