Saccade control in natural images is shaped by the information visible at fixation: evidence from asymmetric gaze-contingent windows

Foulsham, Tom; Teszka, Robert; Kingstone, Alan

doi:10.3758/s13414-010-0014-5

Published: 20 November 2010

Volume 73, pages 266–283, (2011)

Attention, Perception, & Psychophysics

Abstract

When people view images, their saccades are predominantly horizontal and show a positively skewed distribution of amplitudes. How are these patterns affected by the information close to fixation and the features in the periphery? We recorded saccades while observers encoded a set of scenes with a gaze-contingent window at fixation: Features inside a rectangular (Experiment 1) or elliptical (Experiment 2) window were intact; peripheral background was masked completely or blurred. When the window was asymmetric, with more information preserved either horizontally or vertically, saccades tended to follow the information within the window, rather than exploring unseen regions, which runs counter to the idea that saccades function to maximize information gain on each fixation. Window shape also affected fixation and amplitude distributions, but horizontal windows had less of an impact. The findings suggest that saccades follow the features currently being processed and that normal vision samples these features from a horizontally elongated region.

Introduction

The human’s visual environment is extremely rich. At any one time, people are faced with a continuous array of information comprising important or potentially useful items amidst a background of less informative noise. The visual system’s answer to this complexity is twofold. First, the retinas encode the whole visual field in a non-uniform manner: Spatial resolution is greatest at the fovea and decreases rapidly, meaning that objects in central vision are processed in fine detail, while neural resources are spared the intensive task of representing the whole environment at this level of precision. Second, a series of fast eye movements are then programmed to align the high-resolution fovea with different parts of the visual array. The efficiency of the visual system at processing the parts of the environment most important for the current task, therefore, depends crucially on its ability to make efficient eye movements. Specifically, the eye guidance system must compute where to move the eyes in order to process important regions, but this computation can only be an estimate based on the low-resolution preview of the periphery. Although the resolution of the visual system drops off exponentially as a stimulus is moved further from the current fixation, researchers often divide the visual field into the fovea (within about 1° of fixation), the parafovea (between about 1° and 5° from fixation), and the periphery (more than 5° from fixation; see, e.g., Larson & Loschky, 2009).

In this study, we examined global changes in eye movements during a scene-encoding task, by manipulating the extent of the scene that could be processed on each fixation, using a gaze-contingent display. With this aim in mind, we will first review some of the previous research investigating eye guidance in scenes and the use of gaze-contingent displays.

Eye guidance in natural scenes

Two of the earliest studies of eye movements used pictures of natural scenes and identified two important facts about where people look in such images (Buswell, 1935; Yarbus, 1967). First, fixations are not uniformly distributed but cluster on points of interest (e.g. faces and objects). Second, eye movement patterns change depending on the viewer’s task. Subsequent researchers have sought to determine what aspects of the image or the task influence the decision of where to move the eyes (for a review, see the recent special issue: Tatler, 2009).

One approach has been to identify the features commonly found at fixated locations (Reinagel & Zador, 1999) and use these features to compute a saliency map of conspicuous points in the image (Itti & Koch, 2000). This model predicts that people will look at the most salient points in the image, and the implication is that the visual system is computing saliency from peripheral information and using this as an estimate of the most important places to fixate. The saliency map model can predict eye movements better than chance, and it has the advantage of being applicable to any arbitrary image (Foulsham & Underwood, 2008; Peters, Iyer, Itti, & Koch, 2005). However, even the highest estimates of the correlation between saliency and fixation are small, and because image-statistical approaches are fundamentally correlational, the question of whether saliency actually causes fixation selection often remains unanswered. Furthermore, the whole idea of a model of eye guidance based only on image features is called into question by the demonstration that eye movements are highly dependent on the observer’s task. In search tasks, for example, participants are able to ignore salient regions and look towards regions that are similar to the target, as well as to areas where they expect targets to be found, given the context (Chen & Zelinsky, 2006; Foulsham & Underwood, 2007; Torralba, Oliva, Castelhano, & Henderson, 2006).

The models of eye movements that have been developed on the basis of search data can be considered top-down, in the sense that they possess task-relevant knowledge (normally, target features) independent of the actual stimulus (Navalpakkam & Itti, 2005; Rao, Zelinsky, Hayhoe, & Ballard, 2002; Torralba et al., 2006; Zelinsky, 2008). For example, in the Rao et al. model, saccades are programmed to locations showing the highest correlation with the target, producing efficient search, as well as less intuitive eye movement behaviour, such as centre-of-gravity fixations that land between objects. The Torralba et al. model combines bottom-up saliency, guidance by target features, and contextual guidance: a spatial prior to bias attention towards areas where targets are likely to be found (when one searches for pedestrians, search should be concentrated on the street and not the sky). Kanan et al. (2009) extended these ideas with their saliency using natural statistics (SUN) model, which incorporates contextual guidance and probabilistic maps based on object appearance.

Najemnik and Geisler (2005) took a different approach with their ideal observer model of search. Rather than programming eye movements to locations resembling the target, this model emphasizes that an optimal searcher will place fixations that maximize the information gained. For example, because the human visibility map is horizontally elongated (i.e. empirically measured detection performance drops off with eccentricity more rapidly above and below fixation), the ideal searcher will often move to the top or bottom of the display, where target presence is more uncertain, in order to maximize the information that can be gained given the horizontal visibility map. This model successfully matched several aspects of human performance at searching for a sinusoid amid 1/f noise, including search times and some global eye movement behaviours (Najemnik & Geisler, 2008). This approach highlights the importance of foveated models, which take into account the limited resolution in the periphery. The predictiveness of both top-down and bottom-up models improves when this fundamental anatomical feature of human vision is included (Parkhurst, Law, & Niebur, 2002; Peters et al., 2005; Zelinsky, 2008).

Although these models are useful examples of top-down guidance, it is unclear how they can be applied to tasks other than search (or even to search where target features are not completely known). For example, Foulsham and Underwood (2007, 2008) used a memory-encoding task, where participants are asked to view a series of scenes in preparation for a memory test; and Underwood, Jebbett, and Roberts (2004) and Foulsham, Kingstone, and Underwood (2008) used a picture–sentence verification task where participants had to verify the accuracy of a sentence that appeared after the image it was describing. What determines where people look in these tasks, in which there is no explicit target? Some of the highest correlations between saliency and fixation are found in free-viewing tasks (although even in such tasks, the correlations are very weak and can be overridden by top-down demands; see, e.g., Einhauser, Rutishauser, & Koch, 2008). This is perhaps because, in the absence of a target, visual saliency coincides with places that are useful for interpreting or remembering the scene. A rather different approach to investigating eye movements in such tasks is to consider the general spatial biases that occur in saccade selection. Fixations tend to be biased towards the centre of most images, and this can be dissociated from photographer bias and the distribution of visual features (Foulsham & Underwood, 2008; Tatler, 2007). Saccades tend to move horizontally, and their amplitudes show a characteristic, positively skewed distribution. The trend for horizontal saccades is seen even in square images, and it changes as the scene is rotated, demonstrating that it is related to scene content and is not a fundamental property of the oculomotor system (Foulsham et al., 2008). These systematic tendencies of eye movements in scenes can potentially predict where people will fixate just as well as image-based models (Tatler & Vincent, 2009).
It is therefore important to consider the causes of these tendencies and, in particular, the role of central and peripheral information. This article looks at how biases in saccade selection during an encoding task are altered by the use of a gaze-contingent viewing paradigm. We first review this technique.

The use of gaze-contingent displays

A gaze-contingent display is one that is updated in response to the viewer’s eye movements. This technique allows the experimenter to manipulate the information available at different eccentricities. In reading, the gaze-contingent moving-window design masks text outside a window that is centred on fixation. By varying the size of the window, the perceptual span at which reading can still proceed normally can be assessed (McConkie & Rayner, 1975; see Rayner, 1998, for a review). In search, a gaze-contingent window has been used to manipulate the target-similar features present in the periphery (Pomplun, Reingold, & Shen, 2001). Masking reduced the degree to which search was guided, supporting the idea that guidance operates preattentively and in parallel. In scene perception, Saida and Ikeda (1979) and Shioiri and Ikeda (1989) used a moving-window design to identify the useful field of view for picture memorisation—the size of the area around fixation which is actually used for perception. Memory performance improved as the size of the window increased, although large windows of around 10° in diameter elicited performance similar to that for normal viewing. Interestingly, an analysis of eye movements suggested that the useful field of view tended to overlap on consecutive fixations, with around 75% of the saccades moving to points that could be processed on the previous fixation. Multi-resolutional displays take the gaze-contingent technique further by allowing the resolution of the scene to be steadily degraded as a function of increasing eccentricity (Loschky & McConkie, 2002; Reingold, Loschky, McConkie, & Stampe, 2003). When the function relating eccentricity and resolution is lower than or matches that in the human visual system, this is not detected by the observer (Loschky, McConkie, Yang, & Miller, 2005). 
Some of these results were confirmed by Geisler, Perry, and Najemnik (2006), who varied the drop-off in peripheral resolution while observers searched for a target in noise. The authors’ ideal searcher model produced similar behaviour, given the same eccentricity limitations. Masking the periphery outside a moving window also seems to lengthen fixation durations (Greene, 2006), particularly with small windows (Loschky & McConkie, 2002). Van Diepen and d’Ydewalle (2003) found that while masking the region at fixation affected fixation durations most severely (as would be expected if fixations are dominated by processing at the current location), peripheral masking also had an effect.

Despite this research, there are surprisingly few studies using gaze-contingent displays to study issues of saccade control in scenes, certainly as compared with reading research (Rayner, 2009, made a similar observation while reviewing the literature). In particular, despite evidence for asymmetries in eye movements (e.g., the predominance of horizontal saccades) and in the processing of information at fixation (which appears to be horizontally elongated; Najemnik & Geisler, 2005), no study has yet varied the shape and symmetry of a gaze-contingent window in scenes.

The present research

In the present research, we compared eye movements during a scene-encoding task in several different gaze-contingent conditions, and we looked for effects on trends in saccade direction and amplitude. A particular point of interest was the bias for horizontal eye movements. What causes this bias? If it is due to the distribution of features in the periphery, it should be reduced or eliminated in viewing where the visible information is restricted and equal in all directions (as in a gaze-contingent display with a symmetrical square or circular window). Here, we contrasted normal viewing with a symmetrical moving window and two asymmetrical window conditions (Fig. 1). How would altering the shape of the window around fixation affect the saccades made?

On the basis of previous research, we would expect scanning to suffer with the gaze-contingent window manipulations, leading to shorter saccades and longer fixations. The models discussed above suggest two potential determinants of saccade selection, and these lead to two hypotheses in the case of an asymmetric window. First, if saccades are targeted towards features whose importance can be detected from the current fixation location, currently visible regions of the scene will have a greater influence than unseen areas. In the window conditions, the only features available are within the window, so we would expect short saccades that target things within this region. If the features of potential saccade targets are represented as points on a spatial map, the saliency of points outside the window will be reduced (see also Loschky & McConkie, 2002). In the case of asymmetric windows, there are more features in one direction, and this should therefore result in a change in the distribution of saccade directions: more horizontal saccades with a horizontal window and more vertical saccades with a vertical window.

A second possibility is that saccades are chosen to maximize the new information gained on each fixation. How should we define information maximization in an encoding task? The best strategy in such a task would be to look at as much of the scene as possible. We therefore propose that maximizing the information gained means moving to a location where the most new features can be seen. If this were the case, we would expect more vertical saccades in the horizontal window condition and more horizontal saccades in the vertical window condition. This pattern would “reveal” more of the image with each gaze shift by its moving to locations that were invisible (where information was zero) on the previous fixation. Our experiments investigated the effects of the differently shaped gaze-contingent windows on saccades during encoding, with particular emphasis on distinguishing between these possibilities.
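The contrast between the two hypotheses can be made concrete with a little geometry. The following is a minimal sketch and not part of the original study: it assumes that "new information" is simply the part of the window's area that did not overlap with the window on the previous fixation, and it ignores memory for regions seen on earlier fixations. The function names are hypothetical, and the dimensions are those of the horizontal window used in Experiment 1 (12.5° × 3.1°).

```python
def overlap_area(width, height, dx, dy):
    """Area (deg^2) shared by two identical windows offset by (dx, dy) degrees."""
    overlap_w = max(0.0, width - abs(dx))
    overlap_h = max(0.0, height - abs(dy))
    return overlap_w * overlap_h


def newly_revealed_area(width, height, dx, dy):
    """Window area (deg^2) that was masked on the previous fixation."""
    return width * height - overlap_area(width, height, dx, dy)


# Horizontal window from Experiment 1: 12.5 deg wide, 3.1 deg high.
WIDTH, HEIGHT = 12.5, 3.1

# A 3.1-deg saccade along the long axis mostly re-views content already seen.
along = newly_revealed_area(WIDTH, HEIGHT, dx=3.1, dy=0.0)
# The same amplitude across the short axis reveals an entirely new region.
across = newly_revealed_area(WIDTH, HEIGHT, dx=0.0, dy=3.1)
```

Under this simplified definition, a viewer who maximizes newly revealed area would favour saccades across the short axis of the window, which is the opposite of the pattern predicted if saccades follow currently visible features.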

The results of Najemnik and Geisler (2008) emphasised that, for a full picture of the scanning process, saccade direction, saccade amplitude and fixation location distributions need to be analysed. In their study, they found that although fixation positions were biased to the top and bottom of the display (consistent with their ideal information maximization model and a horizontal visibility map), horizontal saccades were most common. These results could be reconciled by looking at the saccade amplitudes: Infrequent but large vertical saccades moved fixation to the top or bottom of the display, which was then explored with smaller but more common horizontal movements. The authors speculated that this strategy may also be due to our experience with natural scenes, where objects are often found on the horizon. To test the generalizability of this claim, we also looked at the interaction between window shape and scene type (landscapes vs. interiors), since we had found differences in these stimuli previously (Foulsham et al., 2008).

Experiment 1

Method

Participants

Inclusion in this study was contingent on having normal vision (without glasses) and on completing a good calibration on the eyetracker. Sixteen participants (9 females; age range, 18–24 years) took part for course credit and gave their informed consent.

Stimuli, apparatus and design

Eighty colour photographs showing indoor or outdoor scenes were used, half of which were presented in both the encoding phase and the test phase. All the images were high resolution and were collected from the Internet and commercially available collections and were resized to 1,024 × 768 pixels. Each encoding image was matched with a correct sentence (describing the state or position of something in the scene; e.g. “There is a towel on the bath”) and an incorrect sentence (e.g. “There is a towel on the floor”).

Eye movements were recorded using the Eyelink 1000 eyetracker (SR-Research). Participants were seated at a chinrest that ensured a constant viewing distance of 60 cm from the screen and eliminated head movements. Stimuli and instructions were presented on a 19-in. monitor with a 60-Hz refresh rate, the frame of which was visible throughout the experiment. The screen subtended approximately 30° × 25° of visual angle. Images were shown full-screen, and participants used a gamepad to respond after each trial. Eye movement events were parsed using the default EyeLink 1000 algorithm, which identified saccades where the velocity of the eye position signal was greater than 30°/s and acceleration was above 8,000°/s².

All the participants saw the same images in a random order. Four viewing conditions were used, and these were presented within participants in a blocked fashion, counterbalanced among participants: normal viewing, square gaze-contingent window, horizontal window and vertical window (see Fig. 1).

Stimuli in the normal condition were presented full-screen without modification. The gaze-contingent window conditions were continuously updated in response to the participants’ eye movements, and this process was controlled by EyeLink’s Experiment Builder software with custom Python programming. In each case, the stimulus consisted of a grey mask filling the screen, with a gaze-contingent window overlaid. This window had dimensions of 6.2° × 6.2°, 12.5° × 3.1° and 3.1° × 12.5° in the square, horizontal and vertical conditions, respectively. The area of the window was therefore the same in all the gaze-contingent conditions. A portion of the image of this size was cropped and centred on the current fixation, and this moved with the participant’s gaze, creating a moving window through which the image could be explored (see Fig. 2 for a description). On the basis of the size of these windows, visible information extended from the fovea to the parafovea (considered to be between about 1° and 5° from fixation) and, in the case of the elongated windows, into the periphery. A conservative estimate of the average time lag between an eye movement and the updating of the display is 24 ms (which included calculation of eye position, processing of the new image and monitor refresh rate). This lag is unlikely to have been detectable by observers (who are often unable to detect changes at lags of 80 ms; Loschky & Wolverton, 2007).
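The window geometry described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' Experiment Builder code: it uses a single pixels-per-degree factor estimated from the reported screen width (1,024 pixels spanning roughly 30°), treats the conversion as linear, and clips the window at the screen edge. The names `window_rect` and `window_area_deg2` are hypothetical.

```python
# Assumption: one pixels-per-degree factor for both axes, estimated from the
# reported screen width (1,024 px spanning roughly 30 deg of visual angle).
PX_PER_DEG = 1024 / 30.0
SCREEN_W, SCREEN_H = 1024, 768

# Window sizes (width, height) in degrees, as reported for Experiment 1.
WINDOWS_DEG = {
    "square": (6.2, 6.2),
    "horizontal": (12.5, 3.1),
    "vertical": (3.1, 12.5),
}


def window_rect(gaze_x, gaze_y, condition):
    """Return (left, top, right, bottom) in pixels for a window centred on
    the current gaze position, clipped to the screen."""
    w_deg, h_deg = WINDOWS_DEG[condition]
    half_w = w_deg * PX_PER_DEG / 2.0
    half_h = h_deg * PX_PER_DEG / 2.0
    left = max(0, round(gaze_x - half_w))
    top = max(0, round(gaze_y - half_h))
    right = min(SCREEN_W, round(gaze_x + half_w))
    bottom = min(SCREEN_H, round(gaze_y + half_h))
    return left, top, right, bottom


def window_area_deg2(condition):
    """Window area in squared degrees of visual angle."""
    w_deg, h_deg = WINDOWS_DEG[condition]
    return w_deg * h_deg
```

With the reported dimensions, the square window covers 6.2 × 6.2 = 38.44 deg² and each elongated window covers 12.5 × 3.1 = 38.75 deg², so the areas match to within rounding of the stated sizes.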

Stimuli appeared in all four conditions, across participants, and at encoding, images were equally likely to be paired with a correct or an incorrect sentence. At test, the images from encoding were presented again, interleaved with the same number of unseen images.

Procedure

Following calibration with a 9-point grid, two practice trials were given in order to familiarize the participants with the gaze-contingent display. The experiment proper then began with the encoding phase (Fig. 2).

Participants were shown four blocks of ten images, one block for each of the four viewing conditions. Each encoding trial began with a central fixation point, which participants were required to fixate before the trial began and which therefore ensured that scanning started in the centre. The image then appeared and remained on the screen for 10 s. Participants were instructed to inspect the scene and try to remember it for the sentence verification task. Following the image, a sentence appeared that could be correct or incorrect with regard to the previous scene. Participants were required to press one of two keys on the gamepad to indicate whether the sentence was correct or not, and this keypress terminated the display and initiated the next trial.

When all the encoding trials were complete, participants were given a surprise memory test for the images, which we will use as an additional measure of how well image encoding can proceed under the different viewing conditions. Participants were instructed to view each image and decide whether they had seen it previously in the encoding block. All 80 images (half of which were the ones seen at encoding) were then displayed in a random order. Each test trial began with a fixation point, followed by the presentation of the scene. The image remained on the screen until the participant made an old/new judgment by pressing one of two keys on the gamepad. The experimenter continued to monitor the validity of the eyetracker calibration, and it was recalibrated after encoding and whenever necessary to maintain a good calibration.

Analysis and results

We used participants’ memory as a preliminary indicator of encoding performance. In the subsequent recognition test, scenes were correctly recognized on 75% of trials (mean false alarm rate = 12%), and accuracy did not vary reliably with the viewing condition at encoding, F(3, 45) = 1.1, p = .34. There was a marginally reliable effect on correct recognition time, F(3, 45) = 2.7, p = .059. Recognition was fastest when the stimulus had been seen under normal conditions (mean RT = 2,990 ms) or when a square window had been used at encoding (3,014 ms). The asymmetric conditions were associated with the slowest performance (horizontal = 3,765 ms; vertical = 3,747 ms). Thus, scene encoding was worse in the gaze-contingent conditions, which is consistent with previous reports (Saida & Ikeda, 1979). Our subsequent analyses concentrated on the way in which the scenes were scanned with saccadic eye movements.

We first looked at eye movement measures across the whole trial: the number of fixations, their mean duration, and the mean amplitude of the saccades made. We then focused on our main question by looking at saccade direction and amplitude in the different conditions. In this study, we were concerned only with behaviour at encoding, where there was a fixed trial time. In each case, we compared the different viewing conditions using a within-subjects ANOVA. Where necessary, each pair of conditions was compared using post hoc Tukey tests, which compensate for the familywise error associated with making multiple comparisons.

General eye movement measures (Table 1)

Viewing condition had a significant effect on the number of fixations made on each trial, F(3, 45) = 5.47, p < .05, but no reliable effect on the average fixation duration, F(3, 45) = 2.10, p = .11. Gaze-contingent trials tended to have more fixations, with a slightly shorter duration, than normal viewing. Pairwise comparisons demonstrated that the square window elicited reliably more fixations than did both normal viewing, q(15) = 4.3, p < .05, and viewing with a vertical window, q(15) = 5.6, p < .01. No other comparisons were reliable.

Table 1 Measures quantifying general eye movement behaviour during the scene-encoding task

There was a highly reliable effect of condition on the mean saccade amplitude, F(3, 45) = 117.96, p < .001. The gaze-contingent conditions were characterized by saccades several degrees shorter, on average, than in normal viewing, all qs(15) < 13, all ps < .01. However, saccades in the horizontal condition were not as short as those in the vertical and square conditions, both qs(15) < 8, both ps < .01; a horizontal window did not produce such a severe change in the size of scanning movements. The vertical and square conditions did not differ reliably.

Saccade direction

Our analyses of saccade direction did not include any saccades that were shorter than 1°, so as to exclude readjustive saccades, microsaccades or minor artifacts of the eyetracker. Figure 3 illustrates the relative frequency of saccades in each direction, binned into 36 bins of 10°. All of the conditions contained a high proportion of horizontal saccades, but there was a change in the vertical condition, with more vertical saccades being made in those trials.
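The filtering and binning step can be sketched as follows. This is a minimal illustration rather than the authors' analysis code. It assumes that saccade start and end points are already expressed in degrees of visual angle with the y-axis pointing upwards, and that the 36 bins start at 0° (rightward); the source does not state where the bin edges fall. The function names are hypothetical.

```python
import math


def saccade_vector(start, end):
    """Return (amplitude, direction) in degrees for one saccade.

    Points are (x, y) in degrees of visual angle with y increasing upwards
    (an assumption; screen coordinates usually need the y-axis flipped).
    Direction is 0 for rightward and 90 for upward.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    amplitude = math.hypot(dx, dy)
    direction = math.degrees(math.atan2(dy, dx)) % 360.0
    return amplitude, direction


def direction_histogram(saccades, min_amplitude=1.0, n_bins=36):
    """Relative frequency of saccade directions in n_bins equal-width bins.

    Saccades shorter than min_amplitude degrees are excluded.
    """
    bin_width = 360.0 / n_bins
    counts = [0] * n_bins
    for start, end in saccades:
        amplitude, direction = saccade_vector(start, end)
        if amplitude < min_amplitude:
            continue
        # The modulo guards against a direction that rounds up to 360.0.
        counts[int(direction // bin_width) % n_bins] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]
```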

To perform statistics, we divided the full range of directions into four 90° arcs, centred on the cardinal directions (see the shaded regions in Fig. 3). We first confirmed the symmetry of the plots in Fig. 3. There was no difference in the frequency of upward versus downward saccades, and no difference in the frequency of leftward versus rightward saccades, in any of the conditions, all ts(15) < 1. As a result, we collapsed all the saccades into two categories, vertical and horizontal, and computed the frequency of saccades in each category for each participant in each condition. Finally, we calculated the proportion of horizontal saccades (hereafter, the HVP, calculated as the frequency of horizontal saccades divided by the frequency of all saccades). An HVP of 1 would indicate that all the saccades made were horizontal, whilst an HVP of 0 would show complete dominance of vertical eye movements.
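The HVP as defined above can be computed with a short function. The sketch below is illustrative and assumes directions in degrees, with 0° rightward and 90° upward. How saccades falling exactly on a 45° diagonal are assigned is not specified in the text, so the boundary handling here is an arbitrary choice, and the function names are hypothetical.

```python
def is_horizontal(direction_deg):
    """True if a direction lies in the 90-degree arc centred on left or right.

    Directions are in degrees, 0 = rightward, 90 = upward. The assignment of
    the exact diagonals (45, 135, 225, 315) is an arbitrary choice.
    """
    d = direction_deg % 360.0
    return d < 45.0 or d >= 315.0 or 135.0 <= d < 225.0


def hvp(directions):
    """Proportion of saccades that are horizontal.

    1 means every saccade was horizontal; 0 means every saccade was vertical.
    """
    directions = list(directions)
    if not directions:
        raise ValueError("HVP is undefined without any saccades")
    n_horizontal = sum(1 for d in directions if is_horizontal(d))
    return n_horizontal / len(directions)
```

For example, three horizontal saccades and one vertical saccade give `hvp([0, 180, 90, 10]) == 0.75`.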

In normal viewing, with no gaze-contingent window, there was a mean HVP of .70 (SE = .01). This quantifies the horizontal bias, which was present in all participants and reliably different from an equal proportion of saccades in each direction (one-sample t test against an HVP of .5), t(15) = 22.0, p < .001. Viewing condition had a reliable effect on the HVP, F(3, 45) = 11.44, p < .001. The vertical window produced a significantly smaller horizontal bias than in normal viewing (M ± SE = .59 ± .03), q(15) = 5.4, p < .01. However, the horizontal and square conditions were not significantly different from normal (.72 ± .01 and .67 ± .01, respectively). The vertically oriented window elicited reliably more vertical and fewer horizontal saccades, leading to a lower HVP than for the other shapes, both qs(15) > 4, both ps < .05. The square window resulted in behaviour somewhere between that in the two rectangular conditions, with a less pronounced horizontal tendency than in the horizontal condition, q(15) = 4.9, p < .05. These findings demonstrate that the window shape modified the frequency of saccades made in different directions.
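The one-sample comparison against an HVP of .5 follows the usual form t = (M − .5) / SE with n − 1 degrees of freedom. As a rough check, the rounded summary values for normal viewing give (.70 − .50) / .01 = 20, which is in the same range as the reported t(15) = 22.0; a gap of this size is what one might expect from rounding M and SE to two decimals. The sketch below shows the calculation; the function name is hypothetical and any input values are for illustration only, not the study's data.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t, df) for a one-sample t test of the mean of values against mu0."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    se = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / se, n - 1
```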

These results are taken from the whole 10-s trial. However, saccade dynamics can change over time as more is learnt about the scene and objects are inspected in detail (Unema, Pannasch, Joos, & Velichkovsky, 2005). We therefore broke down the saccade distribution by ordinal saccade number (Fig. 4). We analysed only the first ten saccades, in order to see how quickly the effects of condition became apparent and because there were no other deviations in the effects of condition on later saccades. The first data point in this figure, for example, shows the mean proportion of first saccades that moved horizontally. A 4 (condition) × 10 (saccade) repeated measures ANOVA indicated that over the first ten saccades there was an effect of condition, F(3, 45) = 7.0, p = .001, but no effect of saccade number, F(9, 135) = 1.5, p = .17. These main effects were qualified by an interaction: The change in scanning with a different-shaped window was not equal on all saccades, F(27, 405) = 3.1, p < .001. Of particular interest is the very first saccade, which tended to be horizontal in all the conditions except the vertical window condition, which elicited more vertical eye movements (different from all conditions), all qs(15) > 4, all ps < .05. The vertical condition continued to be different from the other viewing conditions on most saccades, although the other conditions showed a large degree of overlap. The simple main effect of condition was reliable on six of the first ten saccades, all Fs(3, 13) > 4.5, ps < .05, with the exception of the 2nd, 4th, 5th and 10th, where the conditions were not significantly different.

Saccade amplitude

In light of the differences in saccade direction, it is pertinent to ask how amplitude and direction interact as a function of the window shape. If the changes in the saccade direction distribution were due to saccades that target locations within the window, we would expect the majority of saccades to have amplitudes of less than the extent of the viewing aperture. Saccades larger than this would have been made towards the masked background and would, therefore, indicate strategic or top-down selection.

Figure 5 shows the histogram of horizontal and vertical saccade amplitudes. We considered the whole distribution, because it is possible that the window conditions might have led to bi- or multi-modal distributions (e.g. by increasing the frequency of small and large saccades at the expense of intermediate lengths) and because saccade lengths tend to be positively skewed. We compensated for this skew in our statistics by analyzing the participant medians.
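Summarizing each participant by a median rather than a mean limits the influence of the few long saccades in a positively skewed distribution. A minimal sketch of that aggregation step is given below; the record layout and function name are assumptions made for illustration, not the authors' data format.

```python
import statistics
from collections import defaultdict


def participant_medians(records):
    """Median saccade amplitude per (participant, condition, direction) cell.

    records: iterable of (participant, condition, direction, amplitude_deg)
    tuples (a hypothetical layout).
    """
    cells = defaultdict(list)
    for participant, condition, direction, amplitude in records:
        cells[(participant, condition, direction)].append(amplitude)
    return {key: statistics.median(vals) for key, vals in cells.items()}
```

With made-up amplitudes of 1.5°, 2.0° and 12.0° in one cell, the median is 2.0° whereas the mean would be pulled up to about 5.2° by the single long saccade.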

The amplitude distributions were unimodal and characterized by a few, very small saccades (of 1° or less), a majority of saccades of amplitude between about 1.5° and 4°, and a gradually decreasing frequency of larger eye movements. In normal viewing, horizontal and vertical saccades show similar distributions with a mode at about 1.5° and a median amplitude of 6.2° and 4.1° for horizontal and vertical saccades, respectively. The distribution for vertical saccades is sharper, with fewer long saccades: Only 17% of vertical saccades were over 8°, as compared with 30% of horizontal eye movements.

In comparison with normal viewing, the gaze-contingent window conditions had narrower distributions with a larger mode but fewer long saccades, resulting in lower medians. For example, the medians for horizontal and vertical saccades in the square condition were 3.4° and 3.0°, respectively, lower than those for normal viewing but showing the same trend towards larger horizontal eye movements.

Looking at the bottom two panels in Fig. 5, it is clear that the asymmetrical window shapes led to a systematic change in saccade length. With a horizontal window, the distribution of horizontal saccades was more spread out and had a higher mode and more long-range saccades, leading to a higher average (median = 5.1°), relative to vertical eye movements (median = 2.9°). With the vertical shape, the opposite pattern was observed, and this was the only case where there were more large saccades moving vertically (median = 3.5°) than horizontally (median = 2.9°). An omnibus ANOVA performed on the participant medians confirmed that there was an effect of viewing condition, F(3, 45) = 76.6, p < .001. Direction was also reliable, with horizontal saccades resulting in a longer median, overall, than did vertical eye movements, F(1, 15) = 91.3, p < .001. However, these effects were qualified with an interaction, F(3, 45) = 89.4, p < .001. The median amplitude of horizontal saccades was greater than that of vertical saccades in normal viewing and on those trials with a square or a horizontal window, all ts(15) > 3.8, ps < .005. However, on trials with a vertical window, the median amplitude of vertical saccades was larger, t(15) = 3.6, p < .005.

The dotted lines in Fig. 5 indicate the extent of the moving window in each direction, which gives an idea of the frequency of saccades landing within versus beyond the window. We compared the landing site of each saccade with the coordinates of the aperture on the previous fixation. Although saccades in the gaze-contingent conditions were shorter than normal, about 50% of all the saccades went outside the window. In the square condition, 49% of the horizontal saccades and 37% of the vertical saccades went outside the window. The pattern in the horizontal condition (horizontal saccades, 37% outside the window; vertical saccades, 78%) was precisely the opposite of that seen on trials with a vertical window (horizontal saccades, 79%; vertical saccades, 32%). These observations suggest that it was perfectly possible for people to saccade to parafoveal or peripheral locations that were empty. In other words, the length of saccades was not completely curtailed by the presence of a gaze-contingent boundary, as evidenced by, for example, the tendency to make vertical eye movements beyond the edge of a horizontal window.
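The comparison of each landing site with the aperture on the previous fixation can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used here: the window is taken to be a rectangle centred on the launch fixation, and the half-width and half-height values in the example are hypothetical placeholders rather than the window dimensions of the experiment.

```python
def lands_outside_window(start, end, half_width, half_height):
    """Return True if a saccade lands outside a rectangular window.

    start, end: (x, y) positions in degrees of the launch fixation and the
    landing site. The window is assumed to be centred on the launch fixation
    and to extend half_width horizontally and half_height vertically.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return abs(dx) > half_width or abs(dy) > half_height


def proportion_outside(saccades, half_width, half_height):
    """Proportion of (start, end) saccades that land outside the window."""
    outside = sum(
        lands_outside_window(start, end, half_width, half_height)
        for start, end in saccades
    )
    return outside / len(saccades)


# Hypothetical horizontally elongated window: 5 deg by 1.5 deg half-extents.
example = [
    ((10.0, 10.0), (14.0, 10.0)),  # 4 deg rightward: stays inside
    ((10.0, 10.0), (10.0, 13.0)),  # 3 deg vertical: leaves the window
]
print(proportion_outside(example, half_width=5.0, half_height=1.5))  # 0.5
```

With a horizontally elongated window of this kind, even a short vertical saccade crosses the boundary. This is consistent with the high proportion of vertical saccades landing outside the horizontal window.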

Conclusions from Experiment 1

There was a clear effect of the gaze-contingent viewing conditions, relative to normal viewing, and of the shape of the window. Image viewing with a moving window was characterized by more fixations and shorter saccades, and window shape had a differential effect on scanning direction and amplitude. There was a predominance of horizontal saccades in all the conditions, which suggests that this bias is not dependent solely on the visual features in the periphery (because it was also found in the masked-background conditions). However, a change from a horizontal window to a vertical one did change the pattern of saccade directions. A vertical window led to more vertical saccades: Participants preferred to move toward regions about which they already had some visible information. The distribution of saccade amplitudes shifted according to the boundaries of the window, although a substantial number of saccades went beyond this boundary (i.e. into empty space). In the context of the memorization task, the moving-window conditions were detrimental to encoding and recognition, demonstrating that removing peripheral information had an impact on cognition.

In Experiment 1, the window conditions reduced the visual information available outside the aperture to zero (and in the case of the horizontal and vertical windows, this reduction was asymmetric). Complete masking of peripheral information is a rather artificial situation, and this may have been compounded in our experiment by the use of rectangular windows, which led to strong discontinuities and straight edges at the boundary of the window. It is possible that the predominance of horizontal and vertical saccades was affected by these properties of the moving window or that it was unnatural for participants to saccade into empty space.

In Experiment 2, we used a more subtle manipulation to control the information available for planning saccades. We had two aims with this additional experiment. First, we aimed to replicate the changes in saccades found with vertical versus horizontal windows in a moving-window display without straight edges and with a less pronounced discontinuity between the window and the surround. Specifically, we used an elliptical window, and rather than mask the background completely, we presented high-resolution information at fixation and a low-pass-filtered (i.e. blurred) version of the image as a background. Second, we tested whether the effects of window shape would be moderated by the amount of information in the periphery. With a blurred background, all possible saccade targets contain some information, and the saccadic system must decide whether to move within the window, where visual information is preserved, or into the periphery, where information is still present but is degraded. As previously, we manipulated the extent of preserved information in different directions by using a horizontally or vertically oriented window, and we explored whether the changes in saccade direction and amplitude remained. If the pattern of saccades with different window shapes differs from that in Experiment 1 (for example, if the vertical window no longer produces a higher frequency of vertical saccades), it would suggest that when some peripheral features are present, they are used by the saccadic system, perhaps in computations to maximize the information gained. Moreover, any differences between the experiments would demonstrate the importance of having something in the periphery, as opposed to nothing at all. One way that the extent of peripheral information might have an effect on eye movements is if scene type has an effect on the direction biases observed. We therefore also looked at saccade direction in both landscapes and interior scenes, for both experiments.