Research Philosophy
Each time we engage in a moderately complex task, we likely enlist the help of an untold number of simpler visuo-motor operations that exist largely outside of our conscious awareness. Consider, for instance, the steps involved in preparing a cup of coffee. For the sake of simplicity, assume that the coffee has already been brewed and is waiting in the pot, and that all of the essential accessories (an empty cup, a spoon, a carton of
cream, and a tin of sugar) are sitting on a countertop in front of you. What is your first step toward accomplishing this goal? The very first thing that you might do is move your eyes to the handle of the coffee pot, followed shortly thereafter by the much slower movement of your preferred hand to the same target. Because the coffee pot is hot and the handle is relatively small, this change in fixation is needed to guide your hand to a safe and useful place in which to grasp the object. After lifting the pot, your eyes may then dart over to the cup. This action is needed not only to guide the pot to a very specific point in space directly over the cup, but also to provide feedback to the pouring operation so as to avoid a spill. After setting the pot back on the counter (an act that may or may not require another eye movement), your gaze will likely shift to the spoon. Lagging shortly behind this behavior may be simultaneous movements of your hands, with your dominant hand moving toward the sugar tin and your non-preferred hand moving to the spoon. The spoon is a relatively small and slender object that again requires assistance from foveal vision for grasping; the tin is a rather bulky and indelicate object that does not require precise visual information to inform the grasping operation. Once the spoon is in hand and the lid to the tin is lifted, gaze can then be directed to the tin in order to help scoop out the correct measure of sugar. To ensure that the spoon is kept level, a tracking operation may be used to keep your gaze on the loaded spoon as it moves slowly to the cup. After receiving the sugar, and following a few quick turns of the spoon, your coffee would finally be ready to drink (see Land et al., 1998, for a similarly framed example). The above example illustrates two points that are central to the research conducted in my lab.
First, even seemingly simple tasks can be decomposed into a sequence of even simpler underlying behavioral operations. Computational vision theorists refer to the sequence of operations underlying a task as a "visual routine" and the visual information used by these routines as a "base representation" (Ullman, 1984). Building on the computer analogy, if the visual routine describes the visuo-motor operations required to perform some task, the base representation specifies the type and structure of the variables used in these operations. The second point to take away from the above example is that eye movements, although seldom noticed in our day-to-day activities, are a prominent component of nearly every visuo-motor task that we are likely to perform. Saccades, the variety of eye movement highlighted in the example, are not only the fastest human motor behavior but, at 3-5 each waking second, are also among our most frequent behaviors. Eye movements can also be cognitively controlled, meaning that they can be used in a highly organized and systematic manner in the performance of a task. These properties make eye movements a valuable tool with which to study human behavior at the level of the visual routine. Importantly, visual routines and base representations are believed to exist at multiple levels in a behavioral hierarchy. For example, the routine describing the preparation of a cup of coffee may itself be only one operation in a much larger "morning activities" routine. Similarly, each operation in the "coffee preparation" routine can be divided into many even more elemental operations. Recall that the very first operation in this routine involved moving your eyes to the handle of the coffee pot... but how did your eyes know where to go? Presumably, this operation called a "search for the handle" visual routine to obtain the spatial coordinates of the desired oculomotor target in the scene.
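The hierarchical calling structure described above can be illustrated with a minimal sketch. This is not a model from the lab, just a toy rendering of the idea: a higher-level routine (pouring coffee) calls a mid-level search routine to obtain an oculomotor target, and a low-level fixation operation to move gaze there. All object labels, coordinates, and function names are hypothetical.

```python
# Toy sketch of hierarchically composed visual routines (illustrative only).

def search_for(target, scene):
    """Mid-level routine: match a target pattern against objects in the
    base representation and return its spatial coordinates."""
    for obj in scene:
        if obj["label"] == target:
            return obj["xy"]
    return None

def fixate(xy, trace):
    """Low-level operation: move gaze to a point and log the fixation."""
    trace.append(xy)

def pour_coffee(scene, trace):
    """One step of a higher-level routine: the search routine supplies
    the oculomotor target, then gaze leads (and guides) the hand."""
    handle = search_for("handle", scene)   # how the eyes "know where to go"
    fixate(handle, trace)                  # eyes move first...
    cup = search_for("cup", scene)
    fixate(cup, trace)                     # ...then guide the pour

scene = [{"label": "handle", "xy": (12, 30)},
         {"label": "cup", "xy": (45, 28)},
         {"label": "spoon", "xy": (60, 25)}]
trace = []
pour_coffee(scene, trace)
print(trace)  # the simulated fixation sequence: [(12, 30), (45, 28)]
```

The point of the sketch is only the calling structure: each routine is itself composed of calls to simpler routines, mirroring the behavioral hierarchy in the text.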
The search routine would itself have to call even more basic routines to segment the image impinging on the retina into objects and to match the featural properties of these objects to the "handle" target. If we classify these routines as either low-level (e.g., object segmentation), mid-level (e.g., searching for a pattern), or high-level (e.g., manipulating objects to perform tasks, such as preparing coffee), then the research in my lab deals primarily with those visual routines using a mid-level base representation. In addition to visual search, other routines mediated by this mid-level of perception might include counting tasks (e.g., determining how many instances of a particular pattern appear in an image), tracking the movement of multiple objects (e.g., ducks swimming in a pond), comparing two patterns in visual memory (e.g., determining if something has changed between two views of a scene), and updating in spatial memory the coordinates of task-relevant objects. As an example of this latter routine, after pouring coffee into our cup and setting the pot back on the counter, we may want to encode the new location of the handle in our visuo-spatial memory so as to avoid having to re-search for this pattern if we need to reach for the coffee pot again later in our task, perhaps to top off the cup after adding the cream. We believe that all of these behaviors, and many more studied under the auspices of visual spatial attention and visual working memory, are highly related in that they use the same mid-level base representation and share many of the same visual routines. Our broad research goals are to identify these mid-level routines and representations in common real-world tasks, then use eye movements and other behavioral measures to specify each of the operations constituting these routines.
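The spatial-memory updating routine mentioned above can likewise be sketched in a few lines. Again this is purely illustrative, with hypothetical names and coordinates: after the pot is set back down in a new spot, the remembered coordinates of the handle are overwritten so a later reach can reuse them rather than re-running the search routine.

```python
# Toy sketch of updating the spatial memory of a task-relevant object
# (illustrative only; labels and coordinates are hypothetical).

spatial_memory = {"handle": (12, 30)}  # encoded during the initial search

def update_location(label, new_xy, memory):
    """Overwrite the remembered coordinates of a task-relevant object."""
    memory[label] = new_xy

def reach_for(label, memory):
    """Reuse remembered coordinates if available; a full re-search would
    only be needed when the object is absent from memory."""
    return memory.get(label)

# After setting the pot back down at a new location on the counter:
update_location("handle", (20, 33), spatial_memory)
print(reach_for("handle", spatial_memory))  # (20, 33) - no re-search needed
```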
One necessary step in understanding a visual routine is learning how it extracts and uses information from the world, and the sequence in which this information is obtained. Given that individual operations may extract information from different regions of a scene, what is needed is a way to discern which part of a scene is being processed at each step in an ongoing visual routine. Eye movements provide this sort of window into the moment-by-moment performance of a task. Because saccades are so fast, gaze position can often reveal the location in a scene being processed during each operation. Much of the work in my lab asks where and how gaze is positioned during a task and attempts to piece together from this information the visual routine underlying the task behavior. We then make explicit this understanding of a visual routine by framing it in terms of a working computer model, which can be tested by comparing its simulated eye movement behavior to the actual eye movement behavior of humans performing the same task. Through the adoption of this reciprocal experimental and computational research plan, we hope to better understand not only the behavioral primitives that we enlist during the performance of a task, but also the computational language used by our cognitive systems to construct organized visual routines from these primitives. The following are brief outlines of projects designed to advance these goals.
Eye Movements and Visual Cognition