Research Philosophy
Each time we engage in a moderately complex task, we likely enlist the help of an untold number of simpler
visuo-motor operations that exist largely outside of our conscious awareness. Consider for instance the steps
involved in preparing a cup of coffee. For the sake of simplicity, assume that the coffee has already been
brewed and is waiting in the pot, and that all of the essential accessories (an empty cup, a spoon, a carton of cream, and a tin of sugar) are sitting on a countertop in front of you. What is your first step toward accomplishing this goal? The very
first thing that you might do is to move your eyes to the handle of the coffee pot, followed shortly thereafter by the much slower
movement of your preferred hand to the same target. Because the coffee pot is hot and the handle is relatively small, this change in
fixation is needed to guide your hand to a safe and useful place in which to grasp the object. After lifting the pot, your eyes may then dart over to the cup. This action is needed not only to guide the pot, once again, to a very specific point in space directly over the cup, but also to provide feedback for the pouring operation so as to avoid a spill. After setting the pot back on the counter (an act that may or may not require another eye movement), your gaze will likely shift to the spoon. Lagging shortly behind this behavior may be
simultaneous movements of your hands, with your preferred hand moving toward the sugar tin and your non-preferred hand moving to the spoon. The spoon is a relatively small and slender object that again requires assistance from foveal vision for grasping; the tin is a rather bulky and indelicate object that does not require precise visual information to inform the grasping operation. Once the spoon is
in hand and the lid to the tin is lifted, gaze can then be directed to the tin in order to help scoop out the correct measure of sugar. To
ensure that the spoon is kept level, a tracking operation may be used to keep your gaze on the loaded spoon as it moves slowly to the
cup. After receiving the sugar, and following a few quick turns of the spoon, your coffee would finally be ready to drink (see Land et
al., 1998, for a similarly framed example).
The above example illustrates two points that are central to the research conducted in my lab. First, even seemingly simple tasks can
be decomposed into a sequence of even simpler underlying behavioral operations. Computational vision theorists refer to the sequence
of operations underlying a task as a "visual routine" and the visual information used by these routines as a "base representation" (Ullman, 1984). Building on the computer analogy, if the visual routine describes the visuo-motor operations required to perform some
task, the base representation specifies the type and structure of the variables used in these operations. The second point to take away
from the above example is that eye movements, although seldom noticed in our day-to-day activities, are a prominent component of
nearly every visuo-motor task that we are likely to perform. Saccades, the variety of eye movement highlighted in the example, are not
only the fastest human motor behavior but, occurring 3-5 times each waking second, also among our most frequent behaviors. Eye movements
can also be cognitively controlled, meaning that they can be used in a highly organized and systematic manner in the performance of a
task. These properties make eye movements a valuable tool with which to study human behavior at the level of the visual routine.
Importantly, visual routines and base representations are believed to exist at multiple levels in a behavioral hierarchy. For example,
the routine describing the preparation of a cup of coffee may be itself only one operation in a much larger "morning activities" routine.
Similarly, each operation in the "coffee preparation" routine can be divided into many even more elemental operations. Recall that
the very first operation in this routine involved moving your eyes to the handle of the coffee pot... but how did your eyes know where
to go? Presumably, this operation called a "search for the handle" visual routine to obtain the spatial coordinates of the desired oculomotor target in the scene. The search routine would itself have to call even more basic routines to segment the image impinging on the retina into objects and to match the featural properties of these objects to the "handle" target. If we classify these routines as
either low-level (e.g., object segmentation), mid-level (e.g., searching for a pattern), or high-level (e.g., manipulating objects to
perform tasks, such as preparing coffee), then the research in my lab deals primarily with those visual routines using a mid-level base representation. In addition to visual search, other routines mediated by this mid-level of perception might include counting tasks (e.g.,
determining how many instances of a particular pattern appear in an image), tracking the movement of multiple objects (e.g., ducks swimming in a pond), comparing two patterns in visual memory (e.g., determining if something has changed between two views of a scene), and updating in spatial memory the coordinates of task-relevant objects. As an example of this latter routine, after pouring
coffee into our cup and setting the pot back on the counter, we may want to encode the new location of the handle in our visuo-spatial memory so as to avoid having to re-search for this pattern if we need to reach for the coffee pot again later in the task, perhaps to top
off the cup after adding the cream.
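The hierarchical organization just described can be sketched in code. The following toy Python example is purely illustrative (every function name, data structure, and the scene layout are assumptions introduced here, not anything from the lab's work): a high-level task routine is composed of mid-level routines such as search and spatial-memory updating, which in turn call low-level routines such as segmentation and feature matching, all operating over a shared base representation.

```python
# Toy sketch of a behavioral hierarchy of visual routines.
# All names and structures here are illustrative assumptions.

def segment_objects(scene):
    """Low-level routine: split the scene into candidate objects."""
    return list(scene)

def match_features(obj, target):
    """Low-level routine: does this object's pattern match the target?"""
    return obj["label"] == target

def search_for(scene, target):
    """Mid-level routine: return the target's coordinates, if found."""
    for obj in segment_objects(scene):
        if match_features(obj, target):
            return obj["location"]
    return None

def update_spatial_memory(memory, target, location):
    """Mid-level routine: record a task-relevant object's location."""
    memory[target] = location

def prepare_coffee(scene):
    """High-level routine: a sequence of mid-level operations, each
    sub-goal yielding one simulated gaze target."""
    memory = {}
    fixations = []
    for target in ["pot handle", "cup", "spoon", "sugar tin"]:
        # Reuse a remembered location before re-searching the scene.
        loc = memory.get(target) or search_for(scene, target)
        if loc is not None:
            fixations.append(loc)
            update_spatial_memory(memory, target, loc)
    return fixations

scene = [
    {"label": "pot handle", "location": (12, 30)},
    {"label": "cup", "location": (40, 28)},
    {"label": "spoon", "location": (55, 27)},
    {"label": "sugar tin", "location": (70, 25)},
]
print(prepare_coffee(scene))  # one fixation location per sub-goal
```

The point of the sketch is only the calling structure: the high-level routine never touches the retinal image directly; it delegates to mid-level routines, which delegate to low-level ones.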
We believe that all of these behaviors, and many more studied under the auspices of visual spatial attention and visual working memory, are highly related in that they use the same mid-level base representation and share many of the same visual routines. Our
broad research goals are to identify these mid-level routines and representations in common real-world tasks, then use eye movements
and other behavioral measures to specify each of the operations constituting these routines. One necessary step in understanding a
visual routine is learning how it extracts and uses information from the world, and the sequence in which this information is obtained.
Given that individual operations may extract information from different regions of a scene, what is needed is a way to discern which part
of a scene is being processed at each step in an ongoing visual routine. Eye movements provide this sort of window into the
moment-by-moment performance of a task. Because saccades are so fast, gaze position can often reveal the location in a scene being
processed during each operation. Much of the work in my lab asks where and how gaze is positioned during a task and attempts to piece together from this information the visual routine underlying the task behavior. We then make explicit this understanding of a visual routine
by framing it in terms of a working computer model, which can be tested by comparing its simulated eye movement behavior to the actual
eye movement behavior of humans performing the same task. Through the adoption of this reciprocal experimental and computational
research plan, we hope to better understand not only the behavioral primitives that we enlist during the performance of a task, but also
the computational language used by our cognitive systems to construct organized visual routines from these primitives. The following are
brief outlines of projects designed to advance these goals.
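As a concrete illustration of how a model's simulated eye movement behavior might be compared to human behavior, one simple and widely used approach codes each fixation by the scene region it lands in and computes a string-edit (Levenshtein) distance between the two region sequences. This is a minimal sketch of that general idea, under assumed region codings; it is not the specific evaluation method used in the lab.

```python
# Compare two scanpaths by string-edit distance over region labels.

def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]

# Hypothetical fixation sequences coded by scene region:
# P = pot, C = cup, S = spoon, T = tin
human = ["P", "C", "S", "T", "C"]
model = ["P", "C", "T", "S", "C"]
print(edit_distance(human, model))  # → 2 (two substitutions)
```

A distance of zero would mean the model fixated the same regions in the same order as the human; larger distances quantify how far the simulated routine's gaze sequence departs from the observed one.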