DO YOU BELIEVE ME NOW? (Human-like Robots)

Creating a Truly Human-Sounding Robot Voice
by Jonathan Lowe
No doubt you’ve seen talking or singing robots, mainly in Japan, where development has been ongoing for over a decade and research continues. Surprisingly, though, recreating human speech is more complex than most people realize, so a robotic voice that is indistinguishable in expression from a human one may still be some time off. What complicates the effect is the variety of vocal qualities we generate when we talk. It is not merely that the vocal folds lend vibration to a stream of air from the lungs; the entire vocal tract, including the larynx, mouth, tongue, nasal cavity, and lips, helps shape those vibrations into words and sounds, taking cues from aural feedback processed by the brain.
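The "vibrating source shaped by the vocal tract" idea described above is known in speech science as the source-filter model, and it can be caricatured in a few lines of code: a buzzing pulse train stands in for the vocal folds, and a chain of resonators stands in for the tract. This is only a minimal sketch, not any researcher's actual method; the formant frequencies and bandwidths below are textbook ballpark values for an /a/-like vowel.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000   # sample rate (Hz)
dur = 0.5    # half a second of sound
f0 = 120     # glottal pitch (Hz)

# Source: an impulse train standing in for vocal-fold pulses.
n = int(fs * dur)
source = np.zeros(n)
source[::fs // f0] = 1.0

# Filter: three second-order resonators approximating the first
# formants of an /a/-like vowel (illustrative ballpark values).
signal = source
for freq, bw in [(700, 80), (1200, 90), (2600, 120)]:
    r = np.exp(-np.pi * bw / fs)                  # pole radius from bandwidth
    theta = 2 * np.pi * freq / fs                 # pole angle from formant freq
    a = [1.0, -2 * r * np.cos(theta), r * r]      # resonator denominator
    signal = lfilter([1.0], a, signal)

signal /= np.max(np.abs(signal))   # normalize for playback
```

Changing the formant values reshapes the "tract" and produces a different vowel, which is exactly the kind of continuous, coordinated control the researchers quoted below say is so hard to mechanize.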

    “Consider how we are able to speak in the first place,” says Dr. Sayoko Takano, who worked in robotic speech research a decade ago in Japan, and is now doing research into magnetically controlled tongue-operated wheelchairs for paraplegics at the University of Arizona.  “Not only do we have to control respiration, the vibration of our vocal folds, plus our tongue and lip and velum motion, but also the tension of the larynx, the motion of the tongue, and the shape of the vocal tract itself.  No computer voice synthesizer can yet match this complexity without coming off sounding artificial.”

    Dr. Hideyuki Sawada of Kagawa University agrees. “Voice quality depends not only on control and learning techniques, but also on the materials, which should be very close to the human anatomy.  The dampness and viscosity of the organs have influence on the quality of generated sounds, like what you experience when you have a sore throat.  The typical method for generating human-like voices was by software algorithms, but we now try to generate human-like voices by the mechanical way, as humans themselves do.  The goal is to totally reconstruct the human vocal system mechanically, with learning ability for autonomous acquisition of vocalization skills.” 

    In short, what scientists in Japan are doing is creating robots that mimic the way humans actually speak, which is the only way to obtain the qualities that would make you believe a human is speaking.  Of course one might think that building a tongue and larynx robot would be relatively easy, given today’s engineering technology, but again, speech organs are very different from limbs like the leg or arm.  “The tongue is a bundle of muscle assemblages composed of seven main tongue muscles, and there are also lip and jaw muscles adding up to more than thirty combinations in controlling the speech organs,” Dr. Takano asserts. “Each muscle moves by activity innervated in the brain. Since the tongue and lip have an irregular evolution history, they also have both voluntary and involuntary, and also fast-weak and slow-strong controls. So the complex relationship between speech-related air flows from the tongue, muscle activation, muscle character, and vibration of the vocal folds means that a non-sentient computer with replicated and motorized parts is still at a disadvantage to a human ‘actor.’”

    Dr. Sawada’s current research with the Department of Intelligent Mechanical Systems Engineering at Kagawa began by first studying how a baby acquires language, and which characteristics are developed in the voice-acquisition process. Now he’s developing a talking robot with human-like vocal organs. This robot has already begun to learn vocalization skills by articulating those vocal organs while listening to and mimicking human voices.  In this way, the robot will produce human-like emotional expressions via dynamic articulation. “Current humanoid robots speak electronically, with a speaker system run by software,” explains Sawada.  “We have begun testing a rehabilitation robot for people with auditory and speech disabilities, so that they learn vocalization skills by observing the robot’s vocal articulations. This robot is able to estimate vocal articulations of unclear speech by listening to voices given by disabled people.  Since the robot knows the normal articulations for vocalizing good speech–from interactive learning with able-bodied people–it can teach disabled people how their speech would be improved by modifying their articulations, showing the differences in articulation between good speech and unclear speech.”
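The feedback step Sawada describes, comparing a learner's estimated articulation against a known-good reference and reporting the largest differences, can be sketched as a simple parameter comparison. Everything here is invented for illustration: the parameter names, the numbers, and the function are stand-ins, not the robot's actual representation (which is estimated from sound).

```python
# Toy articulation profiles: parameter -> normalized position (0..1).
# Names and values are illustrative, not from the actual robot.
reference = {"jaw_opening": 0.60, "tongue_height": 0.30,
             "lip_rounding": 0.10, "velum": 0.85}
learner   = {"jaw_opening": 0.30, "tongue_height": 0.55,
             "lip_rounding": 0.15, "velum": 0.80}

def articulation_feedback(ref, obs, top=2):
    """Return the articulation parameters that differ most from the reference."""
    diffs = {k: obs[k] - ref[k] for k in ref}
    ranked = sorted(diffs, key=lambda k: abs(diffs[k]), reverse=True)
    return [(k, round(diffs[k], 2)) for k in ranked[:top]]

print(articulation_feedback(reference, learner))
# [('jaw_opening', -0.3), ('tongue_height', 0.25)]
```

In this toy run the feedback would be "open the jaw more, lower the tongue" — the same kind of concrete, articulatory correction the rehabilitation robot is meant to demonstrate physically.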

    Rehabilitation therapy and mechanical reproduction aside, the final barrier to making a truly human-like robotic voice is the real-time manipulation of sounds, when we imbue words with spontaneous emotional interpretation.  How far are we from achieving that complexity?  And when will a robot be able to respond to human emotions with human reactions?  “We are a long way from the day of the sentient computer-operated robot,” contends Dr. Takano, “and how far away there is no way to say.  My own feeling is that it may not be 2045, as predicted, but rather a hundred years or more.” Adds Dr. Sawada, “Regarding the generation of emotional expressions, I’m trying to realize it for a general human-like robot, and not only for teaching of the hearing impaired.  But our robot reproduces emotions by mimicry.  It listens to human speech, and extracts emotional expressions using neural network algorithms. Then it responds to human voices with the same emotional expressions.  It does not think in the same sense that we do.  That is an area beyond my research.  Instead, we are trying to extract articulation parameters from human talking and singing voices with emotions, and clearly, we have achieved success in that regard.”
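Sawada's mimicry loop — hear a voice, extract its emotional expression, and answer in the same style — can be caricatured with a nearest-centroid lookup over two prosodic features, pitch and energy. His robot uses neural networks for this step; the centroid table and feature values below are invented stand-ins, and the pitch axis (in Hz) dominates the distance in this deliberately crude sketch.

```python
import math

# Invented prosodic centroids: emotion -> (mean pitch in Hz, relative energy).
EMOTIONS = {
    "neutral": (120, 0.4),
    "happy":   (180, 0.7),
    "sad":     (100, 0.2),
}

def classify_emotion(pitch, energy):
    """Pick the emotion whose prosodic centroid is nearest to what was heard."""
    return min(EMOTIONS, key=lambda e: math.dist(EMOTIONS[e], (pitch, energy)))

def respond_in_kind(pitch, energy):
    """Mimicry: answer using the same emotional style that was heard."""
    emotion = classify_emotion(pitch, energy)
    target_pitch, target_energy = EMOTIONS[emotion]
    return emotion, target_pitch, target_energy

print(respond_in_kind(175, 0.65))   # a bright, energetic voice
```

The point of the sketch is the shape of the loop, not the classifier: the robot recognizes an expression and reproduces it, without anything resembling understanding — exactly the distinction Sawada draws between mimicry and thought.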

    Why is Japan a leader in this research?  Surprisingly, it may have something to do with the West’s taboo against building truly lifelike robots. “The Christian ethic figures into it, with a reluctance for western religions to copy the human form, or to make an exact replica or ‘idol’ or ‘image,’ if you will,” Takano says.  Consequently, Japan, not the U.S., is at the forefront of human-form robotics.  That means the first lifelike sentient robot, when it finally gets a chance to win American Idol, might be disqualified only because it prefers to sing in Japanese.


(Originally published in Cosmos Magazine)

