
Communications of the ACM

ACM TechNews

Watch What You Say



A computerized system is being developed to analyze the shapes human lips make when they produce different sounds.

Credit: Haddenham.net

Mu'tah University researcher Ahmad Hassanat is developing a computerized system that can analyze the shapes human lips make as they produce different sounds.

These shapes, called visemes, have been difficult to analyze because there are fewer visemes than the 40 to 50 sounds that make up the English language, so several sounds look alike on the lips. Hassanat is developing a system that can detect the visual signature of entire words, using the appearance of the tongue and teeth as well as the lips.

He trained the system by filming 10 women and 16 men of different ethnicities as they read passages of text. The computer first compared the recordings with a text it knew, then tried to guess what the volunteers were saying in a second video. When the system was allowed to use the same person's training speech, it identified about 75 percent of the words spoken. However, when that person's original training video was excluded from the analysis, the program's accuracy fell to 33 percent on average.

Separately, Waseda University researcher Yasuhiro Oikawa in 2013 filmed a speaker's throat with a high-speed camera, measuring the tiny vibrations in the skin caused by the act of speaking. Oikawa says the precise frequencies of the vibrations could be used to reconstruct the word being spoken.

From The Economist

Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA


 
