University of California, Los Angeles (UCLA) researchers have developed Image to Text (I2T), a computer vision system that can generate a real-time text description of what is happening in a surveillance camera video feed.
The researchers combined a series of computer vision algorithms into a system that accepts images or video frames as input and generates text summaries of that input. I2T first uses an image parser to break down an image, separating the background from the objects in the picture. Next, the system determines the meaning of those objects. "This knowledge representation step is the most important part of the system," says UCLA professor Song-Chun Zhu. To support this step, I2T includes a database of more than two million images containing objects that have been identified and classified into more than 500 categories. For video, the system uses algorithms that can describe the movement of objects across successive frames.
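As a rough illustration of the last stage only, the sketch below turns object positions in successive frames into short sentences. It is not I2T's implementation: the `TrackedObject` type, the `describe_motion` and `summarize` functions, and the `min_shift` threshold are all hypothetical, and the sketch assumes the parsing and labeling steps have already produced a category label and a position for each object.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """Hypothetical output of the parsing and labeling steps for one object."""
    label: str  # category name, e.g. "car"
    x: float    # horizontal position of the object in the frame
    y: float    # vertical position (image coordinates, y grows downward)


def describe_motion(prev, curr, min_shift=1.0):
    """Describe how one object moved between two successive frames."""
    dx = curr.x - prev.x
    dy = curr.y - prev.y
    # Treat small shifts as noise rather than movement (assumed threshold).
    if abs(dx) < min_shift and abs(dy) < min_shift:
        return f"The {curr.label} is stationary."
    # Report only the dominant direction of travel.
    if abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        direction = "down" if dy > 0 else "up"
    return f"The {curr.label} moves {direction}."


def summarize(frames):
    """frames: list of dicts mapping an object id to its TrackedObject."""
    sentences = []
    for prev, curr in zip(frames, frames[1:]):
        for obj_id, obj in curr.items():
            # Only objects seen in both frames have a measurable movement.
            if obj_id in prev:
                sentences.append(describe_motion(prev[obj_id], obj))
    return sentences


frames = [
    {1: TrackedObject("car", 0.0, 0.0)},
    {1: TrackedObject("car", 5.0, 1.0)},
    {1: TrackedObject("car", 5.0, 1.0)},
]
for sentence in summarize(frames):
    print(sentence)
```

A real system would need far richer descriptions than four directions; the point here is only that per-frame object positions are enough to generate simple text about movement.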
Although the system demonstrates a step toward what Zhu calls a "grand vision in computer science," I2T is not ready for commercialization. Zhu says that adding more images to the database, and so improving the system's knowledge of how to identify objects and scenes, will help I2T grow.
Abstracts Copyright © 2010 Information Inc., Bethesda, Maryland, USA