Projects
Developing a neurocomputational model of eye movements during visual search
A computationally explicit model of eye movements in a search task is an important step toward understanding
the visual routines and base representations underlying search behavior. Ongoing work in our lab is attempting to extend a well-established
saliency map conception of search (in which items in a search display are processed in proportion to their similarity to the target) to
include real-world objects and eye movement behavior. Such extensions to real-world search are not trivial and require an interdisciplinary
effort to be successful. The computer vision community has a great deal of experience in representing real-world objects, but far less
experience in the behavioral techniques needed to test these representational schemes. The cognitive psychological community has
elaborate methods for describing complex behavior, but far less experience in the formal representation of real-world objects. As a result
of these mutual limitations, no computationally explicit theory of eye movements during real-world search had been validated by behavioral
data, and no behaviorally explicit theory of oculomotor search had been implemented as a computational model.
In a collaboration with Rajesh Rao, Mary Hayhoe, and Dana Ballard, we developed a computational model of visual search to explain the
pattern of oculomotor behavior reported in Zelinsky et al. (1997). By combining image processing techniques from computer vision with
biological constraints identified in the computational neuroscience community, this interdisciplinary model represents arbitrarily complex
visual patterns as high-dimensional vectors of feature properties (e.g., colors, orientations, and spatial scales). A simple visual routine
consisting of the sequential coarse-to-fine application of spatial filters then causes simulated gaze to move toward the target. We tested
this model by collecting eye movement data from human observers searching for real-world targets, then inputting these same scenes to
the model and comparing the simulated sequence of saccades and fixations to the human behavioral data. The results revealed a qualitative
similarity between the Zelinsky et al. (1997) pattern of results and the simulated gaze patterns generated by the model (Rao et al., 1996,
2002).
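The coarse-to-fine matching idea behind this routine can be illustrated with a simplified sketch (hypothetical code, not the published implementation): the target and candidate scene patches are each represented by responses to Gaussian filters at several spatial scales, and simulated gaze is drawn toward the best-matching patch, with matching performed first at coarse scales and then at progressively finer ones.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def feature_vector(patch, sigmas):
    """Stack Gaussian filter responses at several spatial scales into one vector."""
    return np.concatenate([gaussian_filter(patch, s).ravel() for s in sigmas])

def best_match(scene, target, sigmas, step=4):
    """Slide the target over the scene, scoring each patch by the distance
    between its feature vector and the target's; return the best corner."""
    th, tw = target.shape
    tvec = feature_vector(target, sigmas)
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(0, scene.shape[0] - th + 1, step):
        for x in range(0, scene.shape[1] - tw + 1, step):
            pvec = feature_vector(scene[y:y + th, x:x + tw], sigmas)
            score = -np.linalg.norm(pvec - tvec)  # negative distance: 0 is a perfect match
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos

def coarse_to_fine_fixations(scene, target):
    """One candidate fixation per scale, applied coarse first, fine last."""
    fixations = []
    for sigmas in ([4.0], [2.0], [1.0]):  # coarse -> fine filter scales (illustrative values)
        y, x = best_match(scene, target, sigmas)
        fixations.append((y + target.shape[0] // 2, x + target.shape[1] // 2))
    return fixations
```

In this toy version each scale contributes one fixation; the actual model integrates evidence across scales over time, but the sketch captures the core idea that coarse filters localize a region and finer filters refine gaze toward the target.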
More recent work conducted at SBU has modified and extended this model in several key respects. First, the base representation used by
Rao et al. (2002) assumed a uniform clarity to the scene being viewed, regardless of where gaze was positioned in the image. Humans,
however, have a fovea that limits high visual acuity to only the region of the image being viewed directly. In order
to bring the model and human representational constraints into closer agreement, we created for the model a simplified simulated retina.
The information available from each fixation is therefore acuity constrained, much like human vision, now requiring the model to move its
simulated fovea over the scene to acquire new information as it searches for a target. Second, we abandoned the visual routine used in the Rao
et al. (2002) model in favor of a more dynamic method of driving gaze to the search target. As in the earlier model, this approach uses
filter-based image processing techniques to represent real-world targets and search displays, then compares these target and display
representations to derive a salience map indicating likely target candidates. However, rather than applying a hard-wired coarse-to-fine
filtering scheme, the target of a simulated saccade is now determined by the spatial average of activity on this map, with this average
changing over time as a moving threshold removes those salience map points offering the least evidence for the target. As this
threshold prunes points from the salience map, the model produces a sequence of eye movements that eventually aligns simulated gaze with the
model's best guess as to the target's location. We are currently testing this routine by comparing the simulated oculomotor scanpaths to the
scanpaths of human observers viewing the same displays and searching for the same targets. Preliminary findings reveal considerable
spatio-temporal agreement between these gaze patterns, both at an aggregate level (e.g., general tradeoffs between saccade latency
and accuracy) and in the behavior of individual observers (Zelinsky, 1999a, 2000a, 2000b, 2002, 2003a, 2003b).
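The moving-threshold routine can be sketched in a few lines (a simplified illustration under assumed parameter values, not the lab's actual code): each simulated fixation lands on the activity-weighted centroid of the salience map, and a rising threshold progressively zeroes out the weakest points, so successive fixations converge on the map's maximum.

```python
import numpy as np

def simulate_scanpath(salience, rate=0.1):
    """Drive simulated gaze over a 2-D salience map.  On each step, gaze
    moves to the activity-weighted average position of the surviving
    points; a rising threshold then prunes the weakest points, so the
    centroid drifts toward the most salient location."""
    sal = salience.astype(float).copy()
    threshold = sal[sal > 0].min() if (sal > 0).any() else 0.0
    ys, xs = np.indices(sal.shape)
    fixations = []
    while True:
        mask = sal > 0
        total = sal[mask].sum()
        # Activity-weighted centroid of the remaining salience.
        fy = (ys[mask] * sal[mask]).sum() / total
        fx = (xs[mask] * sal[mask]).sum() / total
        fixations.append((fy, fx))
        if mask.sum() == 1:
            break  # gaze has converged on the most salient point
        # Raise the threshold toward the map maximum and prune below it.
        threshold += rate * (sal.max() - threshold)
        sal[sal < threshold] = 0.0
    return fixations
```

With two unequal peaks, the first simulated fixation falls between them (weighted toward the stronger), and later fixations settle on the stronger peak, reproducing in miniature the center-of-gravity-then-target pattern described above.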
Research Philosophy
Each time we engage in a moderately complex task, we likely enlist the help of an untold number of simpler
visuo-motor operations that exist largely outside of our conscious awareness. Consider for instance the steps
involved in preparing a cup of coffee. For the sake of simplicity, assume that the coffee has already been
brewed and is waiting in the pot, and that all of the essential accessories (an empty cup, a spoon, a carton of
cream, and a tin of sugar) are sitting on a countertop in front of you. What is your first step toward accomplishing this goal? The very
first thing that you might do is to move your eyes to the handle of the coffee pot, followed shortly thereafter by the much slower
movement of your preferred hand to the same target. Because the coffee pot is hot and the handle is relatively small, this change in
fixation is needed to guide your hand to a safe and useful place in which to grasp the object. After lifting the pot, your eyes may then
dart over to the cup. This action is needed not only to guide the pot to a very specific point in space directly over the cup, but
also to provide feedback to the pouring operation so as to avoid a spill. After setting the pot back on the counter (an act that may or
may not require another eye movement), your gaze will likely shift to the spoon. Lagging shortly behind this behavior may be
simultaneous movements of your hands, with your dominant hand moving toward the sugar tin and your non-preferred hand moving to
the spoon. The spoon is a relatively small and slender object that again requires assistance from foveal vision for grasping; the tin is a
rather bulky and indelicate object that does not require precise visual information to inform the grasping operation. Once the spoon is
in hand and the lid to the tin is lifted, gaze can then be directed to the tin in order to help scoop out the correct measure of sugar. To
ensure that the spoon is kept level, a tracking operation may be used to keep your gaze on the loaded spoon as it moves slowly to the
cup. After receiving the sugar, and following a few quick turns of the spoon, your coffee would finally be ready to drink (see Land et
al., 1998, for a similarly framed example).