Extrapolation occurs in multiple object tracking when eye movements are controlled

Luu, Tina; Howe, Piers D. L.

doi:10.3758/s13414-015-0891-8

Published: 18 April 2015

Volume 77, pages 1919–1929, (2015)

Attention, Perception, & Psychophysics

Abstract

There is much debate regarding the types of information observers use to track moving objects. Howe and Holcombe (Journal of Vision 12(13): 1-10, 2012) recently reported evidence that observers employ extrapolation while tracking. However, their study is potentially confounded because it did not control for eye movements. As eye movements can aid extrapolation, it is unclear whether extrapolation can still occur in multiple object tracking (MOT) when eye movements are eliminated. In the current study, we addressed this question using an eye tracker to ensure that fixation was always maintained on a central fixation point while observers performed a tracking task. In the predictable condition, objects always travelled along linear paths. In the unpredictable condition, objects randomly changed direction every 300–600 ms. If observers employ extrapolation, we would expect performance to be greater in the former condition than in the latter condition. Our results showed that observers did indeed perform better in the predictable condition than in the unpredictable condition, at least when tracking just two objects (Experiments 1, 3, and 4). Extrapolation occurred less when tracking loads increased or when the objects moved more slowly (Experiment 2).

Introduction

Our ability to simultaneously track multiple moving objects is critical as it allows us to successfully navigate the dynamic world in which we live. Without this ability, everyday tasks such as crossing the road, driving, or engaging in team sports would not be possible. For example, when crossing a busy street, pedestrians might need to keep track of the positions of oncoming vehicles, cyclists, and/or other pedestrians in order to avoid accidents and injuries. Tracking plays a fundamental role in processing and interpreting dynamic environments.

Tracking has been extensively studied in the laboratory using the multiple object tracking (MOT) paradigm (Pylyshyn & Storm, 1988). In a typical MOT trial (see Fig. 1), observers are presented with a set of identical disks and a subset of these disks is momentarily highlighted to denote that they are the targets to be tracked. The disks then revert to their original color and once again become indistinguishable from the other disks in the display (i.e., the distractors). All disks subsequently move randomly about the display and during this time observers are asked to track the targets. Once the disks come to a halt, observers must identify whether a probed disk is a target or a distractor.

Importantly, the MOT task taps into various properties of real-world visual cognition. Much like the situations that we encounter in our everyday life, be it driving or playing team sports, MOT is an inherently active task that requires the observer to continuously attend to multiple objects over time (Scholl, 2009; Wolfe, Place, & Horowitz, 2007). Because MOT and real-world dynamic tracking both demand sustained attention to multiple objects, it is hoped that researchers will be able to gain a better understanding of how observers track objects in the real world through experiments conducted on MOT in controlled laboratory settings (Cavanagh & Alvarez, 2005).

At present, there is considerable debate regarding how targets are tracked. Proponents of the “no extrapolation hypothesis” argue that observers rely only on location information when tracking the targets (Franconeri, Pylyshyn, & Scholl, 2012; Keane & Pylyshyn, 2006; Vul, Frank, Tenenbaum, & Alvarez, 2009). When targets move from one place to the next, the observer compares the targets’ current locations to their last remembered locations. The observer assumes that whichever object is closest to a given target’s last remembered location is that target. Conversely, advocates of the “extrapolation hypothesis” claim that observers track targets by using both location information and motion information to extrapolate the future locations of the targets. Targets are identified based on where they are expected to be, not just on where they have been in the past.
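The difference between the two hypotheses can be illustrated with a minimal sketch. The function names, coordinates, and velocity below are hypothetical and chosen only for illustration; this is not a model taken from any of the cited studies. Under the location-only account the observer picks the object nearest the target's last remembered position, whereas under the extrapolation account the observer picks the object nearest the position predicted from the target's velocity.

```python
import math

def nearest(point, candidates):
    """Return the index of the candidate closest to point."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))

def track_location_only(last_pos, candidates):
    # No-extrapolation account: choose the object nearest the last remembered location.
    return nearest(last_pos, candidates)

def track_with_extrapolation(last_pos, velocity, dt, candidates):
    # Extrapolation account: choose the object nearest the predicted location.
    predicted = (last_pos[0] + velocity[0] * dt, last_pos[1] + velocity[1] * dt)
    return nearest(predicted, candidates)

# Hypothetical scene: the target was last seen at (0, 0) moving right at 10 deg/s.
# After 0.2 s the target is at (2, 0) and a distractor sits at (-0.5, 0).
candidates = [(2.0, 0.0), (-0.5, 0.0)]  # index 0 = target, index 1 = distractor
location_choice = track_location_only((0.0, 0.0), candidates)
extrap_choice = track_with_extrapolation((0.0, 0.0), (10.0, 0.0), 0.2, candidates)
```

In this contrived scene the location-only rule selects the distractor, because it is closer to the remembered position, while the extrapolating rule selects the true target.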

To date, studies examining whether motion information is used to track multiple objects have yielded mixed results (Howard, Masom, & Holcombe, 2011; Iordanescu, Grabowecky, & Suzuki, 2009). Keane and Pylyshyn (2006) addressed this question using a “target recovery” paradigm. Unlike the conventional MOT paradigm, a blank screen was briefly introduced at the end of the trial and all the disks disappeared during that period. When the screen was removed, all the disks reappeared and the observers were required to identify the targets. The researchers manipulated the reappearance positions of the disks so that they could either reappear where they had disappeared (i.e., no-move condition) or at a location predicted by their previous movement (i.e., move condition). Tracking accuracy was greater in the no-move condition than in the move condition. Based on this finding, the researchers concluded that only current location information is used during tracking.

Fencsik, Klieger, and Horowitz (2007) argued that while the results of Keane and Pylyshyn (2006) show that observers prefer to use location information over motion information during tracking, this does not prove that extrapolation cannot be used. It might be that while it is more efficient for observers to utilize location information rather than extrapolation to reacquire the targets, observers are still able to extrapolate when required to do so (Fencsik et al., 2007). Fencsik et al. (2007) addressed this concern using a slightly modified target recovery paradigm that encouraged observers to employ extrapolation during the blank period. In one condition, the disks continued to move during the blank interval, forcing the observers to extrapolate to anticipate where the targets would reappear. In the other condition, the disks were stationary before the blank interval, thereby making extrapolation impossible. Tracking performance was better in the extrapolation condition than in the static condition for a tracking load of two, but not four, targets. This suggests that observers can use extrapolation to facilitate tracking, but only when tracking two targets. However, the generalizability of this finding is limited because positional information was not available during the blank period in either condition. It might be that in other circumstances whereby positional information is continuously available, observers would not employ extrapolation (Horowitz, Birnkrant, Fencsik, Tran, & Wolfe, 2006).

Iordanescu et al. (2009) also investigated whether extrapolation is employed during tracking by examining how targets are recovered after they disappear (see also Howard et al., 2011). The task used in their study differed from the target recovery paradigm used by Keane and Pylyshyn (2006) in that the disks did not reappear after disappearing at the end of the trial. Instead, after all the disks had disappeared, the observers were asked to click on the location of a particular target (e.g., the red one). Having computed the vector between the target’s disappearance location and the mouse-click location, the researchers found that observers tended to select locations that matched the direction of the target’s trajectory. In other words, they selected locations slightly ahead of where the target disappeared. Furthermore, there was a positive correlation between the degree of displacement and the speed of targets, such that faster target speeds produced larger forward displacements, and vice versa. As such, this study provides evidence that extrapolation is used in tracking.
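One way to quantify such a forward displacement is to project the vector from the disappearance location to the click location onto the target's direction of motion. The sketch below assumes this projection; the source describes only the computation of the vector itself, so the function name and the projection step are illustrative assumptions rather than the authors' analysis.

```python
import math

def forward_displacement(disappear, click, velocity):
    """Signed length of the click displacement along the direction of motion.

    Positive values mean the click landed ahead of the disappearance location,
    negative values mean it landed behind. Hypothetical helper for illustration.
    """
    dx = click[0] - disappear[0]
    dy = click[1] - disappear[1]
    speed = math.hypot(velocity[0], velocity[1])
    return (dx * velocity[0] + dy * velocity[1]) / speed

# A target moving rightward; the click lands 1 unit ahead and 1 unit above.
ahead = forward_displacement((0.0, 0.0), (1.0, 1.0), (2.0, 0.0))
behind = forward_displacement((0.0, 0.0), (-0.5, 0.0), (2.0, 0.0))
```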

More recently, an investigation by Franconeri et al. (2012) reported that motion information is not used to recover targets. Instead of having all the objects disappear from the display, individual objects passed behind a vertical occluder whilst they were being tracked. Tracking accuracy was greater when the targets reappeared closest to where they disappeared rather than when they reappeared at the expected location on the other side of the occluder predicted by their motion prior to being occluded. While this finding indicates that observers used the last known positions of the targets more than their extrapolated positions, it does not prove that extrapolation cannot be employed during tracking. Moreover, since four targets were always tracked in that study, it is possible that reducing the tracking load would allow observers to more readily utilize extrapolation (Fencsik et al., 2007; Iordanescu et al., 2009).

Although a number of the findings from the aforementioned studies provide evidence for extrapolation in object tracking, they do not address whether extrapolation is used to track objects that are continuously visible. Extrapolation may have been used for recovery and not for tracking per se (St. Clair, Huff, & Seiffert, 2010). Evidence of extrapolation following reappearance does not address whether observers actually do extrapolate the future locations of targets during the moment-to-moment tracking of visible objects. It could be that observers only extrapolate when forced to do so because the objects are temporarily not visible.

St. Clair et al. (2010) addressed this concern by using a MOT paradigm in which the targets were always visible. Observers were asked to track a number of disks, each of which contained a texture that could move independently of the disk’s own motion. Tracking accuracy declined when the embedded texture moved in the direction opposite to that of the disk to which it was attached, suggesting that motion information is used to track the disks. A limitation of this study was that it did not control for object visibility. The disks became less visible when the embedded texture moved in the opposite direction to the motion of the disk because this conflicting motion degraded the disks’ borders. This reduction in object visibility may in turn have diminished the quality of the available positional information and led to the impairment in tracking accuracy.

This visibility confound was avoided by Vul et al. (2009) who in their study presented stimuli that were clearly visible. The ideal observer model proposed by these researchers posits that motion information can be used to predict the future locations of objects, though the extent to which this occurs is determined by an internal observer-specific parameter. By fitting the model to the data obtained from their observers, Vul et al. (2009) could determine the extent to which the observers utilized extrapolation during tracking. Their results indicated that observers do not use extrapolation. However, because the speeds of the disks were constantly changing in their experiment, this may have made it difficult for observers to extrapolate and so could be a potential confound. Keeping the speed of the disks constant would make them more predictable to observers, which would in turn increase the likelihood that observers would use this information during tracking.

Although a number of studies have addressed the question, it is still unclear whether observers use extrapolation when tracking continuously visible objects and, if so, under what conditions it occurs. Howe and Holcombe (2012) recently conducted an experiment that addressed this question while controlling for the various confounds identified in previous studies. Their study used a MOT paradigm in which the objects were continuously visible. To address the confounds present in the studies of St. Clair et al. (2010) and Vul et al. (2009), the researchers ensured that the visibility of the objects was the same in all conditions and that the speed of the disks was held constant. Two variables were manipulated: the number of targets to be tracked (two vs. four) and the predictability of object motion (predictable vs. unpredictable).

In the predictable condition, objects always travelled along a linear path, changing direction only when they reached the boundaries of the display. In the unpredictable condition, the disks randomly changed direction every 300–600 ms. When objects move in a predictable manner, the effectiveness of any extrapolation process is maximized (Howard et al., 2011). Conversely, when the same objects move in an unpredictable fashion, extrapolation becomes less helpful. Better performance in the predictable condition would therefore be indicative of observers extrapolating when tracking objects. Across all their experiments, results showed that observers were more accurate in the predictable condition than in the unpredictable condition when tracking two but not four targets. This is consistent with the finding of Fencsik et al. (2007), indicating that observers are able to extrapolate when tracking two targets but are less able to do so when tracking four targets. However, Fencsik et al. (2007) and Howe and Holcombe (2012) did not control for eye movements, and as such their results are potentially confounded because eye movements aid extrapolation (Zhong, Ma, Wilson, Liu, & Flombaum, 2014). In particular, it is unclear whether they would have obtained the same results had they prevented their observers from making eye movements. It is possible that extrapolation only occurs in MOT when observers are free to move their eyes (Zhong et al., 2014).

Although tracking can occur even when observers are required to maintain fixation on a fixation cross throughout the tracking process (Howe, Pinto, & Horowitz, 2010; Intriligator & Cavanagh, 2001), it is becoming increasingly apparent that eye movements can play an important role in MOT under free viewing conditions. When tracking three targets, observers have a tendency to look at the center of the triangle formed by the targets even when none of the targets are located at this position (Fehd & Seiffert, 2008). This tendency is more pronounced when tracking two targets than four targets (Zelinsky & Neider, 2008). This does not occur simply because observers are trying to minimize eye movements but rather this strategy directly benefits tracking performance (Fehd & Seiffert, 2010). When observers are asked to fixate only on the individual targets rather than occasionally also fixating on the center point of the group of targets, their tracking accuracy decreases (Fehd & Seiffert, 2010). This is not to say that observers never need to fixate on the individual targets – they do this periodically, at least in part, to “rescue” targets that are in immediate danger of becoming lost (Zelinsky & Todor, 2010). So while it is clear that the strategy of fixating on the center point of a group of targets plays an important role in tracking, especially in situations where tracking is particularly difficult such as those containing abrupt viewpoint changes (Huff, Papenmeier, Jahn, & Hesse, 2010), it is not the only factor affecting eye movements (Lukavsky, 2013). In particular, it has recently been suggested that eye movements also play a role in extrapolation (Zhong et al., 2014).

For extrapolation to be effective, the observers must first accurately estimate the velocities of the targets. If there is just a single target, the observers can potentially do this by fixating on the target and following it with a smooth pursuit eye movement (Zhong et al., 2014). By knowing how their eyes are moving, the observers can then estimate the movement of the target. For single targets, observers can indeed accurately extrapolate to where they expect the target to be (Diaz, Cooper, Rothkopf, & Hayhoe, 2013; Hayhoe, McKinney, Chajka, & Pelz, 2012; Land & McLeod, 2000). As the number of targets to be tracked increases, this strategy becomes increasingly less effective. This could explain why observers’ knowledge of the direction of motion of targets in MOT decreases as the number of targets to be tracked increases (Horowitz & Cohen, 2010; Shooner, Tripathy, Bedell, & Ogmen, 2010). Zhong et al. (2014) have postulated that the only way that observers can extrapolate in MOT is by using eye movements, and that this explains why Howe and Holcombe (2012), who allowed observers to move their eyes freely, found evidence for extrapolation when observers tracked two but not four targets.

The purpose of the current investigation was to test the claim that extrapolation in MOT can be achieved only by eye movements. This was done by replicating some of the key experiments of the Howe and Holcombe (2012) study while controlling for eye movements by using an eye tracker to ensure observers maintained fixation on a central fixation cross throughout the tracking task. This also ensured that any eccentricity effects on tracking accuracy (Intriligator & Cavanagh, 2001) would be the same in all conditions and would not vary either with the number of targets or with whether the targets move in a predictable or unpredictable manner.

Experiment 1

The first experiment attempted to replicate Experiment 2 of the Howe and Holcombe (2012) study with the addition of a central fixation cross to control for eye movements. If observers are able to utilize extrapolation, tracking performance should be greater in the predictable condition than in the unpredictable condition because the former condition would render motion information more useful than the latter condition.

Method

Participants

A power analysis run on Experiment 1 of Howe and Holcombe (2012) revealed that for a power level of 0.95 we would need to run 13 subjects. The power analysis was based on the effect between the predictable and unpredictable motion conditions in the two-target case. We decided to run more observers than this to be consistent with the number run in this previous study. A total of 18 (six males, 12 females) undergraduate students from the University of Melbourne aged between 18 and 28 years (M_age = 20.8, SD = 2.94) took part in this experiment. Of the 18 participants, two were excluded because they performed at ceiling levels (>97 % accuracy in both motion conditions for either the two-target or four-target case) and one was excluded because she did not meet the 20/25 visual acuity criterion (i.e., at least 20/25 in either eye). Therefore, the data for the remaining 15 participants were analyzed. All observers who were included in the analysis had normal or corrected-to-normal visual acuity (20/25 or better) as verified using a near vision (40 cm) Good-Lite® eye chart and normal color vision as determined by an Ishihara color blindness test.

Informed written consent was obtained prior to the commencement of the experimental session. The study was approved by the Department Human Ethics Advisory Group in the School of Psychological Sciences at the University of Melbourne.

Apparatus

Stimuli were presented on a 21-in Sony CRT monitor at a resolution of 1280 × 1024 pixels with a refresh rate of 85 Hz at a distance of 60 cm. The experiment was programmed and presented in MATLAB (Mathworks, Natick, MA, USA) using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). A 200-Hz head-fixed ViewPoint EyeTracker® system (Arrington Research, Inc., Scottsdale, AZ, USA) was used to ensure that all participants maintained fixation on a central fixation cross. Fixation was considered broken whenever the point of fixation left the 1.5° × 1.5° fixation window centered on the fixation cross. Whenever this happened, the trial was restarted with the positions and motion directions of the objects randomized. The trajectories of the objects were never repeated to prevent the observers from learning them. The number of times a trial was restarted did not vary significantly between conditions, F(3, 42) = 0.982, p = .41, η_p² = .07. The participants would therefore not have received significantly more practice with one of the conditions.
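The fixation criterion described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes gaze samples are already expressed in degrees of visual angle relative to the fixation cross, and the function names are hypothetical.

```python
def fixation_broken(gaze_x, gaze_y, window=1.5):
    """True if a gaze sample (deg, relative to the cross) lies outside the window.

    window is the full side length of the square fixation window in degrees.
    """
    half = window / 2
    return abs(gaze_x) > half or abs(gaze_y) > half

def trial_needs_restart(gaze_samples, window=1.5):
    # A trial is restarted as soon as any sample leaves the fixation window.
    return any(fixation_broken(gx, gy, window) for gx, gy in gaze_samples)

# Hypothetical gaze traces in degrees.
steady = [(0.1, -0.2), (0.3, 0.4), (-0.5, 0.0)]
drifting = [(0.1, -0.2), (0.9, 0.0)]
```

With a 1.5° window, any sample more than 0.75° from the cross along either axis counts as a break, so the second trace would trigger a restart.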

Stimuli

The present study employed a 2 × 2 within-subjects factorial design. The independent variables were type of motion (predictable vs. unpredictable) and number of targets (two vs. four). In each of the four conditions, observers were always presented with eight solid black disks (luminance = 1.74 cd/m²) on a white background (luminance = 29.99 cd/m²). A fixation cross (+) subtending 0.95° × 0.95° was presented in the center of the screen. Each disk subtended 0.75° of visual angle. All disks were confined to move within a 15° gray-edged square, bouncing off the inside walls of the square but passing over each other without colliding. In the predictable motion condition, the disks always travelled along a linear path except when the walls of the square were encountered. In the unpredictable motion condition, the disks randomly changed direction every 300–600 ms.
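The two motion conditions can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' stimulus code: each disk is treated as a point that reflects off the walls, new headings are drawn uniformly from all directions, the time step matches the 85-Hz refresh rate, and the speed argument is a hypothetical input (in the experiment, speeds were set per observer by the calibration procedure).

```python
import math
import random

def simulate_disk(predictable, speed, duration=5.5, dt=1 / 85, arena=15.0, rng=None):
    """Return a list of (x, y) positions in degrees for one disk.

    predictable=True: straight-line motion, changing only at the walls.
    predictable=False: a new random heading every 300-600 ms as well.
    """
    rng = rng or random.Random()
    x, y = rng.uniform(0, arena), rng.uniform(0, arena)

    def random_velocity():
        heading = rng.uniform(0, 2 * math.pi)
        return speed * math.cos(heading), speed * math.sin(heading)

    vx, vy = random_velocity()
    next_turn = rng.uniform(0.3, 0.6)  # time of the next direction change, in s
    path = [(x, y)]
    t = 0.0
    for _ in range(round(duration / dt)):
        t += dt
        if not predictable and t >= next_turn:
            vx, vy = random_velocity()
            next_turn = t + rng.uniform(0.3, 0.6)
        x += vx * dt
        y += vy * dt
        # Reflect off the walls of the square arena.
        if x < 0:
            x, vx = -x, -vx
        elif x > arena:
            x, vx = 2 * arena - x, -vx
        if y < 0:
            y, vy = -y, -vy
        elif y > arena:
            y, vy = 2 * arena - y, -vy
        path.append((x, y))
    return path

# Same seed, so both disks start identically and differ only in motion type.
predictable_path = simulate_disk(True, 10.0, rng=random.Random(1))
unpredictable_path = simulate_disk(False, 10.0, rng=random.Random(1))
```

Seeding both calls identically makes the two paths share a starting position and initial heading, so any divergence between them is due solely to the random direction changes.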

Figure 1 illustrates the structure of the MOT trial used in the study. Each trial began with either two or four disks identified as the targets by turning red for 1.5 s. The targets then reverted to black and once again became indistinguishable from all other disks (distractors). The participants were instructed to track the targets while maintaining fixation on a cross at the center of the screen for 5.5 s. When the disks stopped moving, two disks were highlighted, one after the other. Participants were required to indicate whether each highlighted disk was a target or distractor. There was always a 50 % chance that a given probed disk was a target regardless of whether observers initially had to track two or four targets. Since tracking accuracy was defined as the percentage of trials for which observers were able to correctly identify both probed disks at the end of the trial, chance performance was at 25 %.
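The 25 % chance level follows if the two probe responses are treated as independent guesses, each correct with probability 0.5 (the independence of the two guesses is an assumption made here for illustration):

```latex
P(\text{both probes correct by guessing}) = 0.5 \times 0.5 = 0.25
```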

Procedure

Calibration procedure: Following the completion of ten practice trials, participants performed a calibration procedure which consisted of two 45-trial QUEST staircase routines, one for the two-target predictable condition and the other for the four-target predictable condition (Watson & Pelli, 1983). This procedure determined the speed at which each observer was able to achieve 75 % tracking accuracy in the predictable motion conditions for each target number. The staircase routines were necessary in order to control for individual differences in tracking ability (Oksama & Hyönä, 2004). Equal performance levels in the two-target and four-target conditions enable direct comparisons to be made between the two sets of conditions, because any differences between them cannot then be attributed to differences in tracking performance caused by varying the number of targets.

Main experiment: Using the disk speeds obtained from the calibration process, observers completed a total of 120 experimental trials that were presented in a random, interleaved order. Observers had no prior knowledge of whether the motion for a given trial would be predictable or unpredictable.

Results and discussion

The average disk speeds generated by the QUEST staircase routine for all four experiments are shown in Fig. 2. Results for Experiment 1 are shown in Fig. 3. A 2 × 2 repeated measures ANOVA revealed a significant main effect of motion type on tracking accuracy, F(1, 14) = 5.16, p = .039, η_p² = .269, with observers performing better in the predictable motion condition than in the unpredictable motion condition. Accuracy was also significantly greater in the four-target conditions than in the two-target conditions, F(1, 14) = 12.34, p = .003, η_p² = .468, despite the attempt of the QUEST routine to equate performance in the two conditions. There was a significant interaction between the type of motion and the number of targets, F(1, 14) = 13.92, p = .002, η_p² = .499. Subsequent t-tests revealed a significant difference in tracking accuracy between the predictable and unpredictable motion conditions in the two-target case, t(14) = 4.33, p = .001, r² = .57, but not in the four-target case, t(14) = 0.22, p = .83, r² = .003.
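The reported effect sizes are consistent with the standard conversions from the test statistics, namely partial eta squared = (F × df_effect) / (F × df_effect + df_error) and r² = t² / (t² + df). The sketch below applies these textbook formulas to the values reported above; the source does not state how the authors computed the effect sizes, so this is a consistency check under that assumption, and the function names are hypothetical.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)

def r_squared_from_t(t, df):
    """Proportion of variance explained, recovered from a t statistic."""
    return t ** 2 / (t ** 2 + df)

# Values reported for Experiment 1.
motion_effect = partial_eta_squared(5.16, 1, 14)
target_effect = partial_eta_squared(12.34, 1, 14)
interaction = partial_eta_squared(13.92, 1, 14)
two_target_r2 = r_squared_from_t(4.33, 14)
four_target_r2 = r_squared_from_t(0.22, 14)
```

Rounded to the reported precision, these reproduce the values given in the text.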

These results support our hypothesis that observers are able to employ extrapolation when tracking two but not four targets. However, there is a potential confound. In this experiment, the speed at which the disks moved in the four-target conditions was slower than the speed at which they moved in the two-target conditions so as to equate tracking performance in the two sets of conditions. It is possible that the difference between the two-target and four-target conditions was the result of differing disk speeds rather than differing target numbers. This issue was addressed in the following experiment.

Experiment 2

In this experiment, we arranged for all conditions to use the same disk speed. This ensures that any observed differences between the conditions are not due to differences in disk speed. This addresses the potential confound in Experiment 1 discussed above.