We are increasingly developing programs and technology that do things the way people do: robots that walk like humans, computers that think like humans, even algorithms that write stories like humans. But what about seeing like humans?
That’s what Laurent Itti, associate professor of computer science in the USC Viterbi School of Engineering, is working on right now. He is part of a team awarded an NSF Expeditions Grant to create cognitive vision systems in collaboration with The Pennsylvania State University, UCLA, UC San Diego, MIT, York College of Pennsylvania, University of Pittsburgh and Stanford. The Expeditions Grant, one of only two given this year, will provide $10 million in funding over five years to the multi-investigator research team. Expeditions grants represent the largest single investments in computer science research that NSF makes.
“We’re developing new algorithms for visual processing that are inspired by the way the human brain works,” Itti said.
And how exactly does the brain process visual data? Itti’s group is focusing on two key areas of visual processing that can be applied to machines: top-down attention and compositionality.
Top-down attention refers to the decision tree you engage in automatically when you are looking for something. If you want to find a stapler in an office, you start scanning for surfaces where you would normally find a stapler: on a desk, on a table, in a drawer. You most likely would not look on the ceiling or search behind a bookcase. Your brain combines the goal, previous knowledge and biased visual scanning to find the stapler. Itti wants computers to engage in the same reasoning and searching process.
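As a rough illustration of the idea, and not the algorithm Itti’s group is building, the stapler search can be sketched as a ranking problem: each region of the scene gets a score that multiplies how eye-catching it is by how likely the target is to be there. The table of priors, the saliency numbers and the function name scan_order are all hypothetical values made up for this sketch.

```python
# Hypothetical prior knowledge: how likely each surface is to hold the target.
LOCATION_PRIOR = {
    "stapler": {"desk": 0.6, "table": 0.25, "drawer": 0.14, "ceiling": 0.01},
}


def scan_order(target, saliency, priors=LOCATION_PRIOR):
    """Rank scene regions by (prior for the target) x (bottom-up saliency)."""
    prior = priors.get(target, {})
    scores = {
        region: prior.get(region, 0.0) * strength
        for region, strength in saliency.items()
    }
    # Highest combined score is scanned first.
    return sorted(scores, key=scores.get, reverse=True)


# Made-up saliency values: the ceiling lamp is eye-catching but irrelevant.
saliency = {"ceiling": 0.9, "desk": 0.5, "drawer": 0.3, "table": 0.4}
order = scan_order("stapler", saliency)
print(order)
```

In this toy example the bright ceiling ends up last in the scan order, because the goal of finding a stapler biases the search toward desks and tables.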
Compositionality refers to our hierarchical way of recognizing objects. You know what a wheel is, and you can recognize it whether it’s on a bike, a car or a skateboard. But this is a difficult task for a computer. Right now, recognition software is mostly task-specific. For example, effective face detectors exist and can isolate faces in a photo, but facial recognition software cannot make sense of a picture of a fire engine.
Itti’s contribution to this project will be defining a dictionary of components and writing algorithms that define ways they can combine to form different objects. This way, if you need to add a new type of object to the recognition database, you can do that by drawing on this fundamental set of components.
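A minimal sketch of what such a dictionary could look like, under heavy simplifying assumptions: objects are described only by how many of each component they contain, with no geometry. The component names, the object recipes and the functions add_object and recognize are hypothetical and are not taken from the project.

```python
# Hypothetical shared dictionary of reusable components.
COMPONENTS = {"wheel", "frame", "handlebar", "deck", "chassis"}

# Each object is a recipe: component name -> minimum count required.
OBJECTS = {
    "bike": {"wheel": 2, "frame": 1, "handlebar": 1},
    "car": {"wheel": 4, "chassis": 1},
    "skateboard": {"wheel": 4, "deck": 1},
}


def add_object(name, parts, objects=OBJECTS, components=COMPONENTS):
    """Register a new object type built only from known components."""
    unknown = set(parts) - components
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    objects[name] = dict(parts)


def recognize(detected, objects=OBJECTS):
    """Return every object whose required components were all detected."""
    return sorted(
        name
        for name, parts in objects.items()
        if all(detected.get(part, 0) >= count for part, count in parts.items())
    )


# A new object type reuses existing components; no new detector is needed.
add_object("scooter", {"wheel": 2, "deck": 1, "handlebar": 1})
print(recognize({"wheel": 2, "deck": 1, "handlebar": 1}))
```

The point of the sketch is the last two lines: adding a scooter only requires a new recipe over components the system already knows.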
Other facets of the project, in which Itti’s group is not involved, include hardware design, the human-machine interface, usability and privacy issues.
Principal investigator Vijay Narayanan from The Pennsylvania State University explains the importance and application of this work.
“This project will result in smart camera systems that approach the cognitive abilities of the human cortex. Such cameras can understand visual content and result in multi-faceted impact on society, including visual aids for visually impaired persons, driver assistance for reducing automotive accidents, and augmented reality for enhanced shopping, travel and safety.”
For blind users, a camera worn on a pair of glasses could help locate a desired item and tell the user where it is in space. At a grocery store, for example, a person might be looking for a box of Cheerios. The voice-activated system could scan the shelves for a Cheerios box and communicate the location to the user.
How would the system tell the user where the box is? A few variations of tactile communications could be used. Vibrations from a headset on either side of the user’s head or vibrations on a special vest could direct the user in the direction of the target, much like we tend to say, “warmer, warmer; colder, colder.”
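One way to picture the left/right vibration scheme is as a simple mapping from the target’s direction to a cue. This is only a sketch under assumptions of our own: the function name haptic_cue, the 5-degree dead zone and the rule that strength grows with the angle are invented for illustration and do not describe the project’s actual interface.

```python
def haptic_cue(bearing_deg, dead_zone_deg=5.0):
    """Map the target's bearing to a (side, strength) vibration cue.

    bearing_deg: angle of the target relative to where the user is facing,
    negative for left and positive for right (an assumed convention).
    """
    if abs(bearing_deg) <= dead_zone_deg:
        # Target is roughly straight ahead: signal on both sides.
        return ("both", 1.0)
    side = "left" if bearing_deg < 0 else "right"
    # Strength scales with how far the user must turn, capped at 90 degrees.
    strength = min(abs(bearing_deg), 90.0) / 90.0
    return (side, strength)


print(haptic_cue(-45))  # target is 45 degrees to the user's left
```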
To assist drivers, cars outfitted with cameras could process information about their environment and possibly catch a hazard the driver didn’t notice, such as a small child running into the road.
Visual input might also include the status of the driver’s attention: if the driver is looking left at a street sign, the car could pay special attention to possible hazards on the right.
Lastly, this research could add to the emerging field of augmented reality, using products like Google Glass. Augmented reality is similar to the application for the blind, but for sighted people: the computer vision system gives users real-time information about their surroundings. In the same supermarket cereal aisle, a user could compare prices of an item at nearby stores, get health-related content about the product, and send a text to a spouse asking if he or she wants anything else from the store.
Professor Laurent Itti wearing Google Glass.