USC - Viterbi School of Engineering

Engineers Develop a Mathematical Model of Surprise, and Video Eyetracking Experiments Show Attention Follows the Predictions of Their Theory

November 30, 2005 —

How big a wow? New formula offers an answer.

Two Southern California engineers have created a mathematical theory of human surprise, working from basic principles of digital communications. Experiments recording eye movements of volunteers watching video appear to provide confirmation of the theory.

Laurent Itti of the University of Southern California's Viterbi School of Engineering and Pierre Baldi of the University of California, Irvine's Institute for Genomics and Bioinformatics will present their results December 7 at the Neural Information Processing Systems (NIPS) Conference in Vancouver, B.C.

Itti is the principal investigator for a new grant awarded by the National Science Foundation to carry on further research in the field with Baldi and electrophysiologist Douglas Muñoz of Queen's University.

In developing their theory, Itti and Baldi went back to fundamental principles developed by Claude Shannon in his classic 1948 paper, "A Mathematical Theory of Communication." The pair's mathematical theory of surprise proposes an alternative mode, a subjective one, for characterizing and quantifying information, distinct from Shannon's model.

Itti, who is an assistant professor in the Viterbi School's Department of Computer Science, says Shannon’s technique is not about a specific observer, but about any observer seeking to pick out a message from its noisy environment, or to send one with an assurance it will be read accurately.

Communicators package their messages to survive in a noisy environmental buzz of activity that itself contains crucial information not in message form, such as potential threats or opportunities. To deal with the flood of information bombarding their senses, individuals develop mechanisms by which they devote attention to certain stimuli while ignoring others. As Itti and Baldi write, “efficient and rapid attentional allocation is key to predation, escape, and mating — in short, to survival.”

According to the researchers, previous computational work on the problem has been phrased in the vocabulary of the stream of electronic data making up a video image, as a proxy for the much more complex mixture of sights, sounds, smells and other data found in a real environment. Analyzing such a stream, researchers can isolate stimuli with visual attributes that are unique in the mix by breaking down the signal into “feature channels,” each describing a particular attribute (e.g., color) in the mix. Such features are called “salient.”

Itti previously developed a measure of saliency. A parallel analysis performs similar operations, but does so over time, not space, by looking for new, suddenly appearing elements. This approach is said to model “novelty.” Finally, an analysis can be done purely in terms of Shannon’s original equations, which can measure the level of organization or detail found in the data flow, or its entropy. Itti and Baldi say that in current research, the definitions of both saliency and novelty are empirical, based on analysis of visual streams, rather than predictions about them derived from basic principles.

Their theory boldly proposes to make just such predictions. The probability theory involved is known as “Bayesian,” a method for turning events observed in the past into predictions about the future. The equation for making this guess is well known, having been developed from the probability studies of the English mathematician Thomas Bayes (1702-61). Itti and Baldi devised a way of applying it to the data in a video stream, providing a measure of how much observing new data will change the set of beliefs an observer has developed about the world on the basis of data previously received.

"Data that does not change your beliefs is not surprising," says Itti.

Their next step was to use this theory to analyze a video stream and describe which streams had the most “surprising” features. Finally, having performed this analysis, they checked it by looking at the eye movements of observers who were watching the images, to see if the eyes followed their measure of surprise.
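As a rough illustration of the underlying idea, that surprise reflects how much new data shifts an observer's beliefs, the sketch below applies Bayes' rule to a toy case and scores the shift as the Kullback-Leibler divergence between the updated and prior beliefs. This release does not spell out the formula, so the divergence measure, the two hypotheses, the function names and all of the numbers are assumptions made for illustration, not the researchers' video model.

```python
import math

def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

def surprise(prior, likelihood):
    """Belief shift in bits: KL divergence of the posterior from the prior (assumed measure)."""
    post = posterior(prior, likelihood)
    return sum(q * math.log2(q / p) for q, p in zip(post, prior) if q > 0)

# Hypothetical beliefs about one image region: "nothing is moving" vs. "something is moving".
prior = [0.9, 0.1]

# Probability of the observed data under each hypothesis (made-up values).
uninformative_data = [0.5, 0.5]   # equally likely either way, so beliefs stay put
unexpected_data = [0.01, 0.9]     # strongly favors "something is moving"

print(surprise(prior, uninformative_data))  # essentially zero: beliefs unchanged
print(surprise(prior, unexpected_data))     # clearly positive: beliefs shifted
```

In this toy case, data that is equally likely under every hypothesis leaves the beliefs where they were and scores no surprise, while data that favors the less-expected hypothesis moves them and scores high.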

The two researchers measured the success of their “surprise” prediction against two other analyses. The first was the version of saliency that Itti co-developed as a graduate student studying under Christof Koch at Caltech. The second was a computation of Shannon entropy by C.M. Privitera and L.W. Stark. Surprise, they say, outperformed entropy and saliency, “exhibiting a stronger human bias toward surprising locations than towards entropic or salient regions.” The pair say they have confirmed these results with a larger study.
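For readers unfamiliar with the entropy baseline, the following minimal sketch computes the Shannon entropy of the values in a small image patch. It is a generic textbook calculation with made-up patches and a hypothetical function name, not the specific computation used in the comparison described above.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy, in bits, of the empirical distribution of the given values."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Two hypothetical 16-pixel patches of gray levels.
flat_patch = [128] * 16        # every pixel identical
busy_patch = list(range(16))   # every pixel different

print(shannon_entropy(flat_patch))  # 0 bits: a perfectly uniform patch
print(shannon_entropy(busy_patch))  # 4 bits: 16 equally frequent values
```

Note that this measure depends only on how varied the values are, not on whether that variety was expected by the observer.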

The authors conclude: “At the foundation of our model is a simple theory that describes a principled approach to computing surprise in data streams. While surprise is not a new concept, it had lacked a formal definition, broad enough to capture the intuitive meaning of the term, yet quantitative and computable…. Beyond vision, computable surprise could guide the development of data mining, as it can in principle be applied to any type of data, including visual, auditory or text.”

The NGA, NSF and NIH supported the research that will be presented at the NIPS meeting. The presentation can be viewed at http://iLab.usc.edu/publications/Itti_Baldi06nips.html

More information about the NSF grant is at http://www.eurekalert.org/pub_releases/2005-11/uosc-nfp112805.php

Calculating Wow!: The experiment analyzed a set of video images in three modes: classic Shannon entropy (measuring how organized the data appeared), "saliency," and surprise. The surprise result most closely predicted the eye movements of human observers watching the video.
