Introduction

The amount of information that our visual system can process at any given moment is limited. Therefore, the ability to allocate attention flexibly and selectively is essential to guarantee efficient processing of relevant stimuli. The Rapid Serial Visual Presentation (RSVP) paradigm has been used to investigate the allocation of temporal visual selective attention. In this paradigm, a stream of stimuli is usually presented at a fixed central spatial location in rapid temporal succession (e.g., 10 items/second). Typically, participants are asked to monitor the stream for two targets presented among distractors and report at the end of the stream either their identity or, depending on the task, other characteristics of the targets. The temporal distance (lag) between the first (T1) and the second target (T2) is manipulated. When T2 is presented within 200-500 ms of T1, report accuracy for T2 is impaired compared to when T2 immediately follows T1 (lag 1 sparing) or when it appears at later lags. This phenomenon, first reported by Broadbent and Broadbent (1987), is known as the attentional blink (AB; Raymond, Shapiro & Arnell, 1992).

Different theoretical accounts have been put forward for these phenomena. Traditionally, lag 1 sparing has been thought to occur because, when T1 and T2 appear in close temporal proximity, they share the same attentional episode. Therefore, T2 receives attentional enhancement along with T1, and it is encoded in working memory (Chun & Potter, 1995; Jolicoeur & Dell’Acqua, 1998). In contrast, the AB has been attributed to resource limitations due to stimulus encoding and consolidation in working memory. If T2 appears while T1 processing is still in progress, it cannot gain access to working memory and its representation remains vulnerable to decay and interference by trailing items, giving rise to the AB. However, this resource depletion account has been challenged by findings that lag 1 sparing can extend to lag 2, or even to later lags, when three or more targets are presented in immediate sequence. For instance, Di Lollo et al. (2005) used an RSVP paradigm in which triplets of consecutive targets (i.e., T1-T2-T3) had to be identified. Whereas an AB-like effect, with lower accuracy on T3 than on T1, occurred when the three targets belonged to different categories (letters and symbols), a spread of sparing was observed when they belonged to the same category. This finding is difficult to reconcile with limited-resources accounts of the AB, because a greater deficit for T2 and T3 following T1 would be expected due to resource unavailability. Olivers et al. (2007) directly compared performance on two-, three-, and four-target RSVP. They found that when four target letters were presented in immediate succession (i.e., T1-T2-T3-T4), report accuracy on T4 was superior to that on a T2 presented at the same temporal position but separated from T1 by two distractors (i.e., T1-D-D-T2), although it was lower than accuracy on T3 and T2. Therefore, lag 1 sparing can “spread” to lag 2 and even lag 3 when three or four targets are not separated by intervening distractors. In contrast, when a distractor item is inserted between T1 and the following targets (T1-D-T2-T3), or between T3 and the preceding targets (T1-T2-D-T3), an AB on T2 and/or T3 occurs.

Although these findings are difficult to reconcile with the resource depletion account, they can be explained by top-down attentional control mechanisms, which involve target enhancement and/or distractor inhibition, and affect both the sparing and the AB (Wyble, Bowman & Nieuwenstein, 2009; Olivers & Meeter, 2008; Di Lollo et al., 2005). Accordingly, temporal contiguity and attentional template matching between targets are considered to be crucial factors in determining whether a sparing or an AB is observed. Indeed, Visser and Ohan (2011) showed that both lag 1 and extended sparing are disrupted when the attentional set between targets changes, as when targets appear at different spatial locations or belong to different stimulus categories (with no task switch), as well as when there is a task switch. More recently, Visser (2015) has shown that the spread of sparing is modulated by variations in the probability of appearance of three consecutive targets. He found that performance accuracy on T3 was higher when T3 was more likely to appear across the experiment (p = 0.67) than when it was more infrequent (p = 0.37). These findings suggest that when a target is encountered in the RSVP stream, top-down control can modulate the duration of the attentional window as a function of expectations about the number of subsequent targets. However, there is also evidence in line with a resource-depletion account of these effects. For example, Dux et al. (2009) presented 3-target RSVP streams and manipulated the relevance of a single target by asking participants of two different groups to report either all three targets or only one target (T1 or T3). In the latter case, the target to be reported (T1 or T3) was task-relevant in 100% of the trials, whilst the other two targets were task-relevant in 50% of the trials. Dux et al. (2009) found that in the T1-relevant group, performance on T1 was superior to that on T3, whereas the reverse pattern (T3 > T1) was observed in the T3-relevant group. These findings indicate that endogenous factors, such as knowledge about target relevance, modulate the extended sparing on T3. However, as the authors point out, they could also mean that endogenous control is only involved in directing the limited processing resources to one target at the expense of the others.

Clearer evidence in line with limited-resource accounts has been provided by studies in which the targets' exogenous salience was manipulated within the RSVP streams. Dux et al. (2008) presented three consecutive target letters in RSVP streams containing white distractor digits. The three targets could appear in red or white, and distractors appeared in white. If the abrupt color onset makes a target more salient than the others, then more attentional resources should be allocated to a red T1 at the expense of white T2 and T3. Findings showed that accuracy for T3 was lower than for T1 when all three targets were red but not when targets and distractors were all white. These findings were taken as evidence that when multiple targets appear sequentially, the sparing is due to an equal distribution of limited processing resources: allocating more resources to one target (T1) leads to a deficit on subsequent targets (i.e., T3).

Other studies have provided evidence that the sparing is not immune to structural encoding limitations. For example, the higher proportion of order reversals in target reports during the sparing has been attributed to limits in episodic registration of sequential targets (Wyble et al., 2009; Wyble et al., 2011). Furthermore, when performance on T3 is analyzed conditional on correct reports of T1 and T2 (within-trial contingencies), performance is impaired on T3 compared with T1 and T2 (Dell’Acqua et al., 2009). Finally, there is a general decrement in reporting three consecutive targets compared with when targets are separated by seven distractors (Dell’Acqua et al., 2012), and the AB is larger for T3 when both T1 and T2 are identified compared with when only T1 or T2 is identified (Dux et al., 2014).

In summary, there is still considerable debate over the role of resource depletion and top-down attentional control in the genesis of the two typical phenomena of temporal selective attention: the spread of sparing and the AB. Moreover, it is not clear whether the prolonged sparing observed in the 3-target RSVP also occurs when stimuli more complex than single letters are used. As reviewed, previous studies aimed at disentangling the contributions of resource depletion and active attentional enhancement/inhibition processes to the sparing and the AB have manipulated exogenous or endogenous target salience. The rationale is that manipulating exogenous salience directs more attentional resources towards one target, impairing performance for the other targets due to consumption of limited shared resources (Dux et al., 2008). In contrast, manipulating endogenous salience highlights the flexibility of attentional allocation and how enhanced processing of one target does not necessarily impair processing of other targets (Visser, 2015). A novel and useful strategy to investigate further how, and to what extent, salience affects the spread of sparing and the AB in 3-target RSVP is to manipulate the targets’ emotional salience by using neutral and emotional words instead of letters and digits. Therefore, the present study represents the first attempt to generalize findings on the mechanisms at play in the RSVP, so far obtained with letters and digits, to more complex stimuli, such as neutral and emotional words.

Empirical evidence from the 2-target RSVP shows that performance is better for emotional T1s than for neutral T1s and that emotional T1s induce a larger AB on T2 regardless of whether words (Schwabe et al., 2011; Mathewson et al., 2008; Ihssen & Keil, 2009), schematic faces (Maratos, 2011), or face pictures (Stein et al., 2009; Vermeulen et al., 2009) are used as T1. In contrast, when emotionally salient stimuli are used as T2 in 2-target RSVP (Anderson, 2005; Keil & Ihssen, 2004; Ihssen & Keil, 2009; Maratos et al., 2008; Todd et al., 2013), a reduction of the AB is observed. The evidence of better performance for emotional targets (T1 or T2) has been attributed to prioritization of attentional resources towards emotional stimuli, whereas that of a greater AB following emotional T1s has been interpreted as due to prolonged engagement of attentional resources on emotionally salient T1s, leading to longer encoding and consolidation in working memory at the expense of T2 (Mathewson et al., 2008; Ihssen & Keil, 2009). However, these effects could also be explained by mechanisms of attentional enhancement and inhibition engendered by emotional salience. In fact, there is evidence that stimulus emotional salience prioritizes attention through mechanisms whose neural underpinnings and time course are partially distinct from those of exogenous or endogenous attention (Pourtois et al., 2013). For instance, emotionally salient stimuli elicit an early activation of the amygdala, which, via feedback signals to sensory cortices, prioritizes attention and enhances sensory processing (Pourtois et al., 2013). Because these effects can be additive to the effects of endogenous attention, stimulus emotional salience in the RSVP may lead to more efficient target processing without interfering with, or even while facilitating, the processing of targets that are presented in immediate sequence and share the same attentional episode (lag 1/spread of sparing). 
However, because past studies have mainly focused on the AB, little is known about the effects of emotional T1s on the lag 1 sparing. In addition, the effect of targets' emotional salience on both the spread of sparing and the AB has never been investigated with the 3-target RSVP. In the present study, a 3-target RSVP paradigm was used in which, in Experiment 1, T1 could be a neutral word or an emotionally negative word, whereas in Experiment 2, T3 could be either a neutral or negative word. The methodology was similar to that used in previous studies (Dux et al., 2008, 2014); T1 and T2 always appeared in immediate succession, whereas T3 appeared at variable lags, including lag 2 (spread of sparing), lag 3, and lag 4 (AB), as well as a later lag (lag 8). Compared with past studies using single letters, perceptual differences between targets and distractors were only used to distinguish targets (white words) from distractors (black words).

The two experiments allowed investigation of the allocation of temporal selective attention toward neutral and emotional stimuli presented in rapid temporal succession and attempted to disentangle the relative contributions of resource depletion and enhancement/inhibition mechanisms to the genesis of the sparing and the AB. As word processing is far more demanding than processing single letters or digits (Dux et al., 2014), it could be argued that increasing processing load would favor the resource depletion account. By the same token, however, one could argue that using simple letters and digits, which place little load on processing resources, would favor the cognitive control account. This does not seem to be a useful dichotomy, because resource depletion and cognitive control should represent general accounts of the mechanisms at play in temporal selective attention as assessed by the RSVP.

In summary, by investigating the relative role of resource depletion, cognitive control, or both in performance on the 3-target RSVP with neutral versus emotional words, the present study represents the first attempt to generalize the mechanisms at play with simple stimuli to more complex ones. Importantly, the increase in processing load caused by using words instead of single letters should be a general factor that affects overall performance; it should not prevent the relative contributions of resource depletion and emotional enhancement/inhibition mechanisms to the genesis of the sparing and the AB from emerging. The relevant comparison is between performance when all words are neutral and performance when one of the words is emotional (not between performance with words and performance with letters and digits).

The two accounts provide differential predictions for the effects of emotional salience on the sparing and the AB. According to the resource depletion account (Dux et al., 2014), identifying both T1 and T2 should impair performance on T3 compared with when only T1 is reported. By the same token, if emotional T1s draw more attentional resources than neutral T1s (Ihssen & Keil, 2009; Mathewson et al., 2008), then performance for negative T1s should be better (i.e., they should be reported more often than neutral T1s), and this should be obtained at the expense of following targets, reducing the spread of sparing (i.e., lower performance on T3 at lag 2) and increasing the AB for the following targets (i.e., lower performance on T3 at lags 3 and 4) (Dux et al., 2008). In contrast, if resource depletion is not the only mechanism at play in the RSVP, then the attentional enhancement for negative targets (T1 in Experiment 1, T3 in Experiment 2) should not impair performance on the other targets or could even benefit T2 and T3 when they are presented in the same attentional episode, because they share perceptual features (color) and task (identification) with the negative T1 (i.e., enhanced spread of sparing). Like the resource depletion account, the emotion-enhancement account also predicts a larger AB on T3 when T1 is negative, due to stronger inhibition of post-target items to protect processing of the salient target (Olivers & Meeter, 2008). In summary, limited-resources accounts predict that better performance for emotional targets is accompanied by impaired performance for the following targets regardless of lag, due to resource consumption. In contrast, accounts based on active enhancement/inhibition mechanisms predict that when the three targets appear in the same attentional episode (i.e., T1-T2-T3), enhanced processing and better performance for emotional targets do not impair, and may even improve, performance for the following targets.

Experiment 1

Method

Participants

Twenty-seven psychology students (15 females, 12 males, age: M = 21.9 years, SD = 2.4) volunteered to take part in the study after giving written informed consent. They all had normal or corrected-to-normal vision and were native Italian speakers. The sample size is in line with previous studies investigating the AB (Dux et al., 2014; Dell’Acqua et al., 2012) and its emotional modulations (Mathewson et al., 2008). The study complied with institutional guidelines and was approved by the Departmental Ethics Committee.

Materials and apparatus

Stimuli

The target set consisted of 144 words (120 low-arousal neutral, 24 high-arousal negative, length range 4-8 letters) selected from the Italian adaptation (Montefinese et al., 2014) of the Affective Norms for English Words (Bradley & Lang, 1999a). The 120 neutral words were divided into three subsets: a subset of 24 words used as T1 (Valence: M = 5.35, SD = 0.94; Arousal: M = 2.7, SD = 0.7; Length: M = 6.17, SD = 1.2; Lemma Frequency: M = 180.96, SD = 172.73), a subset of 48 words used as T2 (Valence: M = 5.34, SD = 0.94; Arousal: M = 2.59, SD = 0.73; Length: M = 6.19, SD = 1.42; Lemma Frequency: M = 187.67, SD = 215.59), and a subset of 48 words used as T3 (Valence: M = 5.25, SD = 0.76; Arousal: M = 2.66, SD = 0.7; Length: M = 6.1, SD = 1.26; Lemma Frequency: M = 197.42, SD = 376.57). The set of 24 negative target words (Valence: M = 2.22, SD = 0.54; Arousal: M = 6.74, SD = 0.59; Length: M = 6.17, SD = 1.03; Lemma Frequency: M = 194.88, SD = 261.76) significantly differed on valence and arousal from the neutral words used as T1 (valence: t(46) = 14.17, p < 0.001; arousal: t(46) = 21.57, p < 0.001), as well as from the neutral words used as T2 (valence: t(70) = 14.99, p < 0.001; arousal: t(70) = 24.11, p < 0.001), and from the neutral words used as T3 (valence: t(70) = 17.36, p < 0.001; arousal: t(70) = 24.56, p < 0.001). In contrast, the negative words used as T1 were matched for length and lemma frequency to the neutral words used as T1 (length: t(46) = 0.00, p = 1; frequency: t(46) = 0.22, p = 0.826), as well as to the neutral words used as T2 (length: t(70) = 0.08, p = 0.940; frequency: t(70) = 0.12, p = 0.901), and to the neutral words used as T3 (length: t(70) = 0.21, p = 0.833; frequency: t(70) = 0.03, p = 0.976).
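The set comparisons above can be approximated from the reported summary statistics alone. The following is a minimal sketch, assuming an independent-samples t test with pooled variance (the text does not state which variant was used); the function name `pooled_t` is hypothetical. Because the reported means and standard deviations are rounded, the result is expected to be close to, but not identical with, the published value.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t from summary statistics.

    Assumes equal variances (pooled estimate); this is an assumption,
    not a detail confirmed by the text.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Valence of the 24 neutral T1 words vs. the 24 negative words
t_val, df_val = pooled_t(5.35, 0.94, 24, 2.22, 0.54, 24)
print(f"t({df_val}) = {t_val:.2f}")
```

With these inputs the sketch reproduces the reported degrees of freedom (46) and a t value within rounding distance of the reported 14.17.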

Seventy additional words served as distractors. Their length ranged from 10 to 19 letters to guarantee efficient masking of target words (Anderson, 2005). In a separate validation study, 35 participants (16 males, 19 females, age: M = 22.3, SD = 1.7) rated each distractor word for valence and arousal. Word ratings were collected using a paper-and-pencil version of the Self-Assessment Manikin (SAM: Bradley & Lang, 1994), originally used for the published sets, which includes 9-point, picture-oriented scales, ranging from the most negative (1) to the most positive (9) for valence, and from the least arousing (1) to the most arousing (9) for arousal. Item presentation order was randomized so that each participant received a different version of the task. Based on these data, the selected distractors consisted of 48 neutral, low-arousal words (Valence: M = 5.18, SD = 0.64; Arousal: M = 2.72, SD = 0.42; Length: M = 12.04, SD = 1.8), which did not differ from neutral T1, T2, and T3 on valence, t(70) = 0.89, p = 0.384, t(94) = 0.95, p = 0.345, t(94) = 0.5, p = 0.618, or arousal, t(70) = 0.19, p = 0.853, t(94) = 1.03, p = 0.304, t(94) = 0.51, p = 0.613. In contrast, distractors differed on valence, t(70) = 19.39, p < 0.001, arousal, t(70) = 33.27, p < 0.001, and on length from negative target words, t(70) = 15.27, p < 0.001. Finally, distractor words were longer than neutral, t(70) = 14.45, p < 0.001, t(94) = 19.04, p < 0.001, t(94) = 18.74, p < 0.001, and negative target words, t(70) = 14.85, p < 0.001. The selected words were also controlled for semantic associations using association norms for Italian words (Peressotti et al., 2002). When a word was missing from the Italian association norms, the English translation was used and associations were controlled through the University of South Florida Free Association Norms (Nelson et al., 2004) and the Edinburgh Associative Thesaurus (Kiss et al., 1973). Semantically associated words were never presented within the same stream.

Each RSVP stream consisted of 16 items, including 3 target words and 13 distractor words. Words were presented in uppercase 24-point Courier New font on a grey background. Targets appeared in white, whereas distractors appeared in black. Stimuli were presented at a viewing distance of approximately 50 cm from a 19-inch LCD monitor (resolution 1920 x 1080, refresh rate 60 Hz). Stimulus presentation and data recording were controlled using E-Prime 2.0 software (Schneider et al., 2002).

Procedure

Each trial was initiated by the participant by pressing the ENTER key. A central fixation point then appeared for 1,000 ms, followed by the stream of words (Fig. 1). Each item of the stream was presented for 117 ms with no interstimulus interval. Serial position of T1 in the stream ranged from 5 to 7. T2 immediately followed T1 (lag 1, 117 ms), whereas T3 could either appear immediately after T2 (lag 2, 234 ms, "spreading of the sparing" condition) or after one (lag 3, 351 ms), two (lag 4, 468 ms), or six (lag 8, 936 ms) intervening distractors. The participants’ task was to monitor the stream for words presented in white among black distractors and report the identity of the three white-ink targets at the end of the stream by typing each target word. Participants were prompted by the appearance on screen of a dedicated response window. They were instructed that response speed and order of target presentation were not relevant and were encouraged to guess if uncertain.
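The stream layout described above can be sketched in a few lines. This is an illustrative reconstruction rather than the E-Prime script used in the study: the function names, the placeholder distractor pool, and the random filling of distractor slots are assumptions. The example words are those of the Fig. 1 caption.

```python
import random

ITEM_MS = 117      # presentation time per item, no interstimulus interval
STREAM_LEN = 16    # 3 targets + 13 distractors

def build_stream(t1_pos, lag, targets, distractors, rng=random):
    """Return a 16-item list of (word, role) pairs.

    t1_pos: 1-based serial position of T1 (5 to 7).
    lag:    T1-T3 lag in items (2, 3, 4, or 8); T2 always follows T1.
    """
    t1, t2, t3 = targets
    slots = {t1_pos: (t1, "T1"), t1_pos + 1: (t2, "T2"), t1_pos + lag: (t3, "T3")}
    fillers = rng.sample(distractors, STREAM_LEN - len(slots))
    stream = []
    for pos in range(1, STREAM_LEN + 1):
        # Target slots are fixed; every other slot takes a distractor
        stream.append(slots[pos] if pos in slots else (fillers.pop(), "D"))
    return stream

def soa_ms(lag):
    """Onset asynchrony between T1 and an item presented `lag` items later."""
    return lag * ITEM_MS

# Hypothetical placeholder pool standing in for the 48 distractor words
pool = [f"DISTRACTOR{i:02d}" for i in range(48)]
example = build_stream(5, 3, ("death", "mail", "butter"), pool, random.Random(0))
print([role for _, role in example])
print({lag: soa_ms(lag) for lag in (1, 2, 3, 4, 8)})
```

At lag 3 exactly one distractor separates T2 from T3, and the lag-to-time mapping matches the values given in the text (117, 234, 351, 468, and 936 ms).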

Fig. 1. Trial sequence used in Experiment 1. A negative T1 word is followed by a neutral T2 at lag 1 and a neutral T3 at lag 3. The words presented in the example are (top-down order): allocation, speaker, hieroglyphic, foundation, death, mail, transhumance, butter, calligraphy, and allotment.

Participants completed 384 trials, consisting of 48 trials for each factorial combination of T1 valence (2) and T1-T3 lag (4), subdivided into 8 blocks of 48 trials. In each block, targets (negative T1, neutral T1, T2, T3) were selected randomly without replacement from each subset; thus, each target appeared eight times across the experiment. T1 serial position, T1 valence, and T1-T3 lag (from now on referred to as Lag) were fully balanced within each block.
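The trial counts follow from the factorial structure: 2 valences x 4 lags x 3 T1 positions gives 24 cells, so a fully balanced block of 48 trials holds each cell twice. The sketch below illustrates this arithmetic; the two repetitions per cell are inferred from the reported numbers rather than stated in the text, and `make_block` is a hypothetical helper that leaves out word assignment.

```python
import itertools
import random
from collections import Counter

VALENCES = ("negative", "neutral")
LAGS = (2, 3, 4, 8)
T1_POSITIONS = (5, 6, 7)

def make_block(rng):
    """One block: every valence x lag x T1-position cell twice, shuffled."""
    cells = list(itertools.product(VALENCES, LAGS, T1_POSITIONS))
    trials = cells * 2          # 24 cells x 2 = 48 trials (inferred)
    rng.shuffle(trials)
    return trials

blocks = [make_block(random.Random(seed)) for seed in range(8)]
per_condition = Counter((valence, lag) for block in blocks for valence, lag, _ in block)
print(sum(len(block) for block in blocks))   # total number of trials
print(per_condition[("negative", 2)])        # trials per valence x lag combination
```

Across the 8 blocks this yields 384 trials, 48 per combination of T1 valence and lag, as reported.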

Experimental design and data analysis

The experiment used a 2 x 2 x 4 design, with T1 Valence (2: Negative, Neutral), T2 Report (2: Identified, Missed), and Lag (4: lag 2, lag 3, lag 4, lag 8) as within-subjects factors. Performance accuracy for T1 was computed as the absolute percentage of correct T1 identifications (p T1). Performance accuracy for T2 was computed as the absolute percentage of correct T2 identifications (i.e., regardless of whether other targets were reported; p T2), as well as conditional on correctly reporting T1 (p T2|T1). Similarly, performance accuracy for T3 was computed as the absolute percentage of correct T3 reports (p T3), as well as conditional on correctly reporting both T1 and T2 (p T3|T1&T2), and conditional on reporting T1 only (i.e., when T2 was missed; p T3|T1). In all cases, responses with typing errors that were clearly attributable to the correct word (e.g., "omberllo" for ombrello) were accepted as correct (Huang, Baddeley & Young, 2008). The data of three participants were excluded from data analyses due to low accuracy (<50%) on all targets.
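The six accuracy measures differ only in which trials enter the denominator. A minimal sketch of the computation follows, assuming each trial is coded as a dictionary of booleans indicating whether T1, T2, and T3 were correctly reported; the data layout and the function names are illustrative assumptions, and the four toy trials are not real data.

```python
def pct(hits, total):
    """Percentage of hits, or NaN when no trial qualifies."""
    return 100.0 * hits / total if total else float("nan")

def accuracy_measures(trials):
    """trials: list of dicts with boolean keys 't1', 't2', 't3'."""
    n = len(trials)
    t1_ok = [t for t in trials if t["t1"]]
    both_ok = [t for t in t1_ok if t["t2"]]        # T1 and T2 reported
    t1_only = [t for t in t1_ok if not t["t2"]]    # T1 reported, T2 missed
    return {
        "p_T1": pct(len(t1_ok), n),
        "p_T2": pct(sum(t["t2"] for t in trials), n),
        "p_T3": pct(sum(t["t3"] for t in trials), n),
        "p_T2|T1": pct(len(both_ok), len(t1_ok)),
        "p_T3|T1&T2": pct(sum(t["t3"] for t in both_ok), len(both_ok)),
        "p_T3|T1": pct(sum(t["t3"] for t in t1_only), len(t1_only)),
    }

# Toy trials for illustration only
toy = [
    {"t1": True, "t2": True, "t3": True},
    {"t1": True, "t2": True, "t3": False},
    {"t1": True, "t2": False, "t3": True},
    {"t1": False, "t2": True, "t3": False},
]
measures = accuracy_measures(toy)
print(measures)
```

In practice the function would be applied separately to the trials of each participant and each T1 Valence x Lag cell before the resulting percentages enter the ANOVAs.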

p T1, p T2, p T3, and p T2|T1 were analyzed using repeated-measures analyses of variance (ANOVA) with T1 Valence (2: Negative, Neutral) and Lag (4: lag 2, lag 3, lag 4, lag 8). In addition, p T3|T1&T2 and p T3|T1 were analyzed using repeated-measures ANOVA with T1 Valence (2: Negative, Neutral), T2 Report (2: Identified, Missed) and Lag (4: lag 2, lag 3, lag 4, lag 8) as within-subject factors. Pairwise comparisons were all Bonferroni-corrected.

Results

T1 accuracy (p T1)

Overall performance accuracy on T1 was 87.4% (SE = 1.9). ANOVA results showed a significant main effect of T1 Valence, F(1, 23) = 4.82, p = 0.038, partial η2 = 0.173, with greater accuracy for negative T1 (M = 88.3, SE = 2.0) than for neutral T1 (M = 86.5, SE = 1.9). The main effect of Lag, F(3, 69) = 0.4, p = 0.751, and the T1 Valence x Lag interaction, F(3, 69) = 1.65, p = 0.186, were not statistically significant (Table 1).

Table 1 Mean percentages (and standard errors) of correct T1, T2, and T3 identification reports as a function of T1 valence and/or lag

T2 accuracy (p T2 and pT2|T1)

Overall performance on T2 reports regardless of correctly reporting other targets (p T2) was 60.0% (SE = 5.0). The main effects of T1 Valence, F(1, 23) = 3.22, p = 0.086, and Lag, F(3, 69) = 1.21, p = 0.314, as well as the T1 Valence by Lag interaction, F(3, 69) = 2.05, p = 0.114, were not statistically significant. Overall performance accuracy for T2 reports contingent on correctly reporting T1 (p T2|T1) was 59.6% (SE = 5.2). The main effects of T1 Valence, F(1, 23) = 3.24, p = 0.085, and Lag, F(3, 69) = 1.28, p = 0.287, as well as the two-way interaction, F(3, 69) = 1.96, p = 0.128, were not statistically significant. These findings indicate that the better performance for negative T1s is not achieved at the expense of T2.

T3 accuracy (p T3)

The main effect of T1 Valence was not significant, F(1, 23) = 0.65, p = 0.427, whereas the main effect of Lag was, F(3, 69) = 78.8, p < 0.001, partial η2 = 0.774 (Table 1), and it was qualified by a significant T1 Valence by Lag interaction, F(3, 69) = 5.22, p = 0.003, partial η2 = 0.185. The results of the pairwise and post-hoc analyses for these effects mirror those obtained for p T3|T1&T2. Therefore, for the sake of brevity and readability, these comparisons are reported in Appendix 1.
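For readers who wish to check the reported effect sizes, partial η2 can be recovered from an F ratio and its degrees of freedom, because partial η2 = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error). The short sketch below applies this identity to the Lag effect and to the T1 Valence by Lag interaction reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

eta_lag = partial_eta_squared(78.8, 3, 69)           # main effect of Lag
eta_interaction = partial_eta_squared(5.22, 3, 69)   # T1 Valence by Lag
print(round(eta_lag, 3), round(eta_interaction, 3))
```

Both values round to the effect sizes given in the text (0.774 and 0.185).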

T3 accuracy contingent on correctly reporting T1 and T2 (p T3|T1&T2; p T3|T1)

ANOVA results showed that the main effects of T1 Valence, F(1, 23) = 1.56, p = 0.224, and T2 reports, F(1, 23) = 2.5, p = 0.125, did not reach statistical significance. The main effect of Lag was significant, F(3, 69) = 78.8, p < 0.001, partial η2 = 0.774. Pairwise comparisons showed that p T3|T1&T2 was worse at lag 3 (M = 20.5, SE = 2.3) and lag 4 (M = 27.2, SE = 2.9) compared with lag 2 (M = 48.5, SE = 4.2) and lag 8 (M = 61.5, SE = 4.0; all ps < 0.001). Performance at lag 3 was worse than that at lag 4, p = 0.021, and performance at lag 2 was worse than that at lag 8, p = 0.031. This pattern is consistent with the typical performance deficit observed in the AB window (lag 3 and lag 4). Moreover, because sparing has been defined as 5% performance superiority at lag 1 or lag 2 (depending on whether 2 or 3 targets are presented consecutively), compared with the lag in the AB window, in which performance is least accurate (Visser, Bischof & Di Lollo, 1999), the findings show an extended sparing at lag 2. Importantly, this effect was qualified by a significant T1 Valence by Lag interaction, F(3, 69) = 9.78, p < 0.001, partial η2 = 0.298 (Fig. 2).
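The sparing criterion cited above can be written as a simple check. This sketch assumes that "5%" refers to a difference of at least 5 percentage points between the early lag and the least accurate lag in the AB window; the function name and the use of "at least" rather than "more than" are assumptions.

```python
def shows_sparing(early_lag_pct, ab_window_pcts, criterion=5.0):
    """True if accuracy at the early lag exceeds the lowest accuracy
    in the AB window by at least `criterion` percentage points."""
    return early_lag_pct - min(ab_window_pcts) >= criterion

# Mean p T3|T1&T2 at lag 2 versus lags 3 and 4, as reported above
sparing_at_lag2 = shows_sparing(48.5, [20.5, 27.2])
print(sparing_at_lag2)
```

With the reported means the lag 2 advantage over the lowest AB-window lag is 28 percentage points, well above the criterion.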

Fig. 2. Experiment 1: mean percentages of correct T3|T1&T2 as a function of T1 valence and lag.

To analyze this interaction, the temporal patterns after neutral and negative T1s were analyzed separately using one-way ANOVAs with Lag as a within-subjects factor. For p T3|T1&T2 after negative T1s, the effect of Lag was significant, F(3, 69) = 59.7, p < 0.001, partial η2 = 0.722. Bonferroni-corrected pairwise comparisons showed that performance at lag 2 (M = 51.9, SE = 4.4) was better than at lags 3 (M = 17.2, SE = 2.5) and 4 (M = 24.4, SE = 2.6; all ps < 0.001), but it was not different from lag 8 (M = 61.3, SE = 4.0; p = 0.294). Performance between lags in the AB window also was different, as it was significantly worse at lag 3 than at lag 4, p = 0.029. Post-hoc analyses revealed the same pattern for p T3|T1&T2 after neutral T1s, with a significant main effect of Lag, F(3, 69) = 44.0, p < 0.001, partial η2 = 0.657. Again, pairwise comparisons with Bonferroni corrections showed that performance was better at lag 2 (M = 45.0, SE = 4.2) compared with the AB window (lag 3: M = 23.8, SE = 2.6, p < 0.001; lag 4: M = 30.0, SE = 3.4, p = 0.010), but it was worse than performance at lag 8 (M = 61.8, SE = 4.2; p = 0.005). The difference between lags 3 and 4 was not significant, p = 0.221 (Fig. 3).

Fig. 3. Experiment 1: mean percentages of correct T3|T1 (T2 missed) as a function of T1 valence and lag.

To assess whether negative T1s modulated the sparing and the AB differently from neutral T1s, t tests compared p T3|T1&T2 between the two types of T1 at lags 2, 3, and 4. The tests revealed that at lag 2 performance accuracy for T3 was greater when it was preceded by a negative T1 than when T1 was neutral, t(23) = 3.7, p = 0.001, indicating that negative T1s led to a more pronounced spread of sparing. In contrast, performance accuracy for T3 was lower at lag 3 when it was preceded by a negative T1 than when T1 was neutral, t(23) = 3.0, p = 0.006. The same pattern was present at lag 4, t(23) = 2.7, p = 0.013. Therefore, the temporal pattern of performance on T3 preceded by a negative T1 shows an enhanced spread of sparing, followed by a larger AB, compared with when T1 is neutral.

The T1 Valence by T2 Report interaction was not significant, F(1, 23) = 0.68, p = 0.418, whereas the T2 Report by Lag interaction was significant, F(3, 69) = 4.19, p = 0.009, partial η2 = 0.154. As with the T1 Valence by Lag interaction, the effect of Lag was analyzed separately depending on whether T2 was correctly reported. For the T3|T1&T2 condition, the effect of Lag was significant, F(3, 69) = 78.8, p < 0.001, partial η2 = 0.774. Pairwise comparisons showed that performance at lag 2 (M = 44.9, SE = 5.9) was better than at lags 3 (M = 14.1, SE = 2.3) and 4 (M = 23.2, SE = 3.3), all ps < 0.001, but it was worse than at lag 8 (M = 63.2, SE = 4.5), p = 0.001. Performance was also worse at lag 3 compared with lag 4, p = 0.003. Similarly, for the T3|T1 condition the effect of Lag was significant, F(3, 69) = 22.24, p < 0.001, partial η2 = 0.492. Again, performance was better at lags 2 (M = 52.0, SE = 4.3) and 8 (M = 44.9, SE = 5.9) than at lags 3 (M = 26.9, SE = 3.5) and 4 (M = 31.2, SE = 4.1) (lag 2 vs. lag 3: p < 0.001; lag 2 vs. lag 4: p = 0.003; lag 8 vs. lag 3/lag 4: p < 0.001). Performance did not differ between lags 2 and 8, p = 0.999, or between lags 3 and 4, p = 0.750. Between-conditions comparisons (T3|T1&T2 vs. T3|T1) for each lag further assessed the cost of reporting T2 on performance for T3. Results showed that performance did not differ depending on whether T2 was reported at lags 2 (t(23) = 1.15, p = 0.260) and 4 (t(23) = 1.69, p = 0.104). In contrast, performance was significantly worse at lag 3 when all three targets were correctly identified (i.e., p T3|T1&T2), t(23) = 3.41, p = 0.002. Therefore, the cost of reporting T2 is most evident in the AB window, whereas the number of pre-T3 identified targets does not seem to affect the spread of sparing. Finally, the 3-way interaction was not significant, F(3, 69) = 0.32, p = 0.809. 
The findings from Experiment 1 show that the cost of reporting three rather than two targets is reflected in a general decrement in performance, whereas the modulation of the temporal pattern by emotional salience is independent of the number of targets reported. This suggests that emotional influences on temporal selective attention cannot be explained by limitations in attentional resources alone.

Discussion

The results of Experiment 1 show that, when three neutral target words are presented in an RSVP stream, performance on T3 is characterized by the two typical phenomena of selective temporal attention: a performance sparing at the earlier lag (lag 2 spread of sparing), when the three targets are presented in immediate succession, followed by a performance impairment (the AB) when T3 is presented at lags 3 and 4. It should be noted that this is the first report of the spread of sparing at lag 2 using neutral words, which load more on resources than single letters or digits. In addition, this pattern is modulated by T1 emotional salience, and it is more accentuated when T3 follows a negative T1. The AB pattern is very similar to that observed with the typical 2-target RSVP, with the only difference that in the 3-target RSVP, the sparing spreads to lag 2 and the AB is shifted forward by one lag.

It is also important to note that the proportion of T3 identifications at lag 2 qualifies as “spread of sparing,” because it is higher compared with the AB window, but there is a progressive drop in performance from T1 to T3. Although this is in line with studies that analyzed T3 report contingent on reporting previous targets (Dux et al., 2014), the performance drop from T1 to T3 is difficult to reconcile with evidence showing comparable performance for the three consecutive targets (Olivers et al., 2007). However, this might simply be due to using word stimuli, which are more complex and costly to process compared with the letters and digits used in previous studies. Moreover, the finding that correctly identifying both T1 and T2 enhanced the AB also suggests a role of resource depletion in the genesis of the AB. In contrast, the findings that the spread of sparing was not affected by the number of processed targets, and that it was greater when T1 was negative, indicate that the better performance for negative T1s is not due to more attentional resources being allocated to emotionally salient targets. If this were the case, performance on targets following negative T1s should have been worse than that observed after neutral T1s. Rather, this pattern suggests that resource depletion is not the only mechanism at play in the modulation of temporal selective attention by emotional salience. The greater sparing for T3s following negative T1s suggests that emotion-induced attentional enhancement may have benefited the following targets.

Experiment 2, in which neutral words were used as T1 and T2 and negative target words as T3, helped to clarify to what extent the observed pattern reflects depletion of limited resources by T1, attentional enhancement of emotional stimuli, and/or inhibition of neutral distractors. According to the resource depletion account, if better performance for emotionally salient T3 is due to more attentional resources allocated to negative targets, then performance for T2 and T1 should be worse when the three targets share the same attentional episode (lag 2). In contrast, if better performance for negative T3 is due to a mechanism of attentional enhancement, then performance for T2 and T1 presented within the same attentional episode (lag 2) should be preserved.

Experiment 2

Method

Participants

Thirty psychology students (13 males, 17 females, age M = 22.9 years, SD = 3.9) took part in the study in partial fulfillment of course credits. They all had normal or corrected-to-normal vision, and provided written informed consent prior to participation.

Materials, apparatus, and procedure

Stimuli and experimental procedure were identical to Experiment 1; the only exception was that the negative and neutral words presented as T1 in Experiment 1 now served as T3, and the neutral words presented as T3 in Experiment 1 were now used as T1.

Design and data analysis

The experimental design and data analysis were as in Experiment 1. The data of four participants were excluded from the analysis because of low accuracy (<50%) on all targets.

Results

T1 accuracy p(T1)

Overall T1 accuracy was 80.7% (SE = 2.7). ANOVA results showed that the main effects of T3 Valence, F(1, 25) = 0.07, p = 0.789, and Lag, F(3, 75) = 1.86, p = 0.144, were not significant, whereas the two-way interaction fell short of reaching full statistical significance, F(3, 75) = 2.49, p = 0.066 (Table 2).

Table 2 Mean percentages (and standard errors) of correct T1, T2, and T3 identifications as a function of T3 valence and T1-T3 lag

T2 accuracy (p T2; p T2|T1)

Overall T2 accuracy regardless of correctly reporting the other target was 55.1% (SE = 5.4). ANOVA results showed that the main effect of T3 Valence was not significant, F(1, 25) = 1.42, p = 0.245, whereas the effect of Lag was significant, F(3, 75) = 3.5, p = 0.020, partial η2 = 0.123. Pairwise comparisons showed a non-significant trend towards better performance at lag 2 (M = 58.5, SE = 4.9) compared with lag 3 (M = 53.3, SE = 5.7), p = 0.100. No other between-lag comparisons (lag 4: M = 54.4, SE = 5.6; lag 8: M = 54.2, SE = 5.8) reached statistical significance, all ps > 0.400. The T3 Valence by Lag interaction was not significant, F(3, 75) = 0.83, p = 0.484.

T2 accuracy contingent on correctly reporting T1 (p T2|T1) was 51.5% (SE = 5.7). ANOVA results revealed that the main effect of T3 Valence, F(1, 25) = 1.62, p = 0.214, the main effect of Lag, F(3, 75) = 2.08, p = 0.110, and the T3 Valence by Lag interaction, F(3, 75) = 2.22, p = 0.093, were not statistically significant.
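
The contingent measures used in these analyses (p T2|T1, p T3|T1&T2, and p T3|T1 with T2 missed) can be illustrated with a minimal Python sketch. It is not the authors' analysis script: the `Trial` record, the `conditional_accuracy` helper, and the four toy trials are hypothetical and serve only to show how accuracy on one target is computed over the subset of trials that meet the condition on the other targets.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Trial:
    """One RSVP trial (hypothetical record layout, for illustration only)."""
    lag: int
    t1_correct: bool
    t2_correct: bool
    t3_correct: bool


def conditional_accuracy(
    trials: List[Trial],
    target: str,
    correct: Sequence[str] = (),
    missed: Sequence[str] = (),
) -> Optional[float]:
    """Percentage of trials with `target` correct, among trials in which every
    field in `correct` is True and every field in `missed` is False."""
    eligible = [
        t for t in trials
        if all(getattr(t, name) for name in correct)
        and not any(getattr(t, name) for name in missed)
    ]
    if not eligible:
        return None  # the condition never occurred, so accuracy is undefined
    hits = sum(getattr(t, target) for t in eligible)
    return 100.0 * hits / len(eligible)


# Toy data (made up): four lag-2 trials.
trials = [
    Trial(lag=2, t1_correct=True, t2_correct=True, t3_correct=True),
    Trial(lag=2, t1_correct=True, t2_correct=True, t3_correct=False),
    Trial(lag=2, t1_correct=True, t2_correct=False, t3_correct=True),
    Trial(lag=2, t1_correct=False, t2_correct=True, t3_correct=True),
]

p_t2_given_t1 = conditional_accuracy(trials, "t2_correct", correct=["t1_correct"])
p_t3_given_t1_t2 = conditional_accuracy(
    trials, "t3_correct", correct=["t1_correct", "t2_correct"]
)
p_t3_given_t1_only = conditional_accuracy(
    trials, "t3_correct", correct=["t1_correct"], missed=["t2_correct"]
)
```

In practice such a helper would be applied per participant and per cell of the design (valence by lag) before the ANOVA; that aggregation step is omitted here.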

T3 accuracy (p T3)

ANOVA results showed that the main effect of T3 Valence was significant, F(1, 25) = 24.99, p < 0.001, partial η2 = 0.500, due to more accurate performance on negative (M = 35.03, SE = 3.0) compared with neutral T3s (M = 29.8, SE = 2.6). The main effect of Lag was also significant, F(3, 75) = 46.13, p < 0.001, partial η2 = 0.649 (Table 2). As in the previous experiment, the temporal pattern showed greater accuracy at lag 2 (M = 40.1, SE = 4.0) than at lags 3 (M = 15.1, SE = 2.2), p < 0.001, and 4 (M = 22.3, SE = 2.9), p = 0.001. Accuracy was also lower at lag 3 than at lag 4, p = 0.001, whereas it did not differ between lags 2 and 8 (M = 52.3, SE = 4.4), p = 0.088. The T3 Valence by Lag interaction was not significant, F(3, 75) = 1.5, p = 0.220. Overall, these findings indicate that negative T3s were more likely to be identified regardless of temporal separation from the previous targets. The following analyses will assess whether this influence of emotional salience on T3 report is modulated by the number of pre-T3 targets reported.
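
As a consistency check on the reported effect sizes, partial η2 can be recovered from an F ratio and its degrees of freedom through the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies it to the two significant effects reported above. The function name is an illustrative choice, and the check assumes the uncorrected degrees of freedom given in the text.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Follows from partial eta^2 = SS_effect / (SS_effect + SS_error) together
    with F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Valence main effect on T3 accuracy: F(1, 25) = 24.99
eta_valence = round(partial_eta_squared(24.99, 1, 25), 3)  # 0.5

# Lag main effect on T3 accuracy: F(3, 75) = 46.13
eta_lag = round(partial_eta_squared(46.13, 3, 75), 3)  # 0.649
```

Both values agree with those reported (0.500 and 0.649), so the same check offers a quick way to spot a mistyped F value or degrees of freedom elsewhere in the results.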

T3 accuracy contingent on correctly reporting T1 and T2 (p T3|T1&T2; p T3|T1)

ANOVA results revealed a significant main effect of T3 Valence, F(1, 25) = 26.48, p < 0.001, partial η2 = 0.514. Performance was better for negative T3s (M = 32.7, SE = 2.8) than for neutral T3s (M = 26.9, SE = 2.5). The main effect of Lag was also significant, F(3, 75) = 44.77, p < 0.001, partial η2 = 0.642. Pairwise comparisons showed that performance accuracy for T3 was greater at lag 2 (M = 37.0, SE = 4.2) than at lag 3 (M = 13.5, SE = 2.3) and at lag 4 (M = 18.4, SE = 2.3), all ps < 0.001, but it was lower than at lag 8 (M = 50.4, SE = 4.3), p = 0.048. The difference between performance accuracy at lag 3 and lag 4 was not statistically significant, p = 0.129. Therefore, as in Experiment 1, when all three targets are correctly reported, a spread of sparing for T3 is observed, as performance at lag 2 is at least 5% more accurate than performance at lag 3 (i.e., the lag in which performance impairment is greater). The T3 Valence by Lag interaction did not reach statistical significance, F(3, 75) = 0.48, p = 0.695 (Fig. 4), indicating a similar temporal pattern for both types of T3s. It also indicates that the better performance for negative T3s was not achieved at the expense of T1 and T2 when they shared the same attentional episode, as a resource depletion account would suggest (Fig. 5).

Fig. 4 Experiment 2: mean percentages of correct T3|T1&T2 as a function of T3 valence and lag.

Fig. 5 Experiment 2: mean percentages of correct T3|T1 (T2 missed) as a function of T3 valence and lag.

The main effect of T2 report was significant, F(1, 25) = 24.63, p < 0.001, partial η2 = 0.496. T3 accuracy was lower when T2 was reported (M = 22.4, SE = 3.4) compared with when it was missed (M = 37.2, SE = 2.5). This effect was qualified by a significant interaction with Lag, F(3, 75) = 3.75, p = 0.014, partial η2 = 0.121. As in Experiment 1, the effect of Lag was analyzed separately depending on whether T2 was reported (T3|T1&T2) or not (T3|T1). ANOVA results for T3|T1&T2 yielded a significant effect of Lag, F(3, 75) = 49.58, p < 0.001, partial η2 = 0.665. Performance accuracy at lag 2 (M = 26.2, SE = 5.6) was greater than that at lags 3 (M = 6.1, SE = 1.8) and 4 (M = 11.0, SE = 2.9) but worse than that at lag 8 (M = 46.4, SE = 4.8), all ps = 0.001. The difference between lags 3 and 4 fell short of full statistical significance, p = 0.067. Similarly, for the T3|T1, the effect of Lag was significant, F(3, 75) = 25.51, p < 0.001, partial η2 = 0.505. Pairwise comparisons revealed a typical AB pattern with better performance at lag 2 (M = 47.8, SE = 5.6) than at lag 3 (M = 20.9, SE = 3.6), p < 0.001, and at lag 4 (M = 25.9, SE = 3.2), p = 0.001, whereas it did not differ from lag 8 (M = 54.3, SE = 4.5), p > 0.999. Therefore, the cost of correctly reporting T2 is reflected in an overall reduction of T3 accuracy spanning both the sparing and the AB. Indeed, between-conditions contrasts at each lag showed poorer performance at lag 2, t(25) = 4.76, p < 0.001, lag 3, t(25) = 4.61, p < 0.001, and lag 4, t(25) = 3.79, p = 0.001, when T2 was correctly reported compared with when it was not. Finally, the T3 Valence by T2 Report, F(1, 25) = 0.32, p = 0.859, and the T3 Valence by T2 Report by Lag interactions, F(3, 75) = 0.52, p = 0.667, were not statistically significant.

In summary, performance in Experiment 2 was better when T3 was a negative word. This superiority was present at all temporal lags and was independent of the number of identified targets in a single attentional episode. In addition, the pattern of temporal attention was similar regardless of whether T3 was a negative or a neutral word. Finally, performance accuracy for T3 was greater when only one preceding target was reported than when both T1 and T2 were correctly reported. This occurred for both negative and neutral T3s, indicating that there is a cost for processing multiple targets both when they occur in a single attentional episode and when they occur in two distinct attentional episodes of the RSVP. This cost does not vary across temporal positions, and it does not depend on T3 emotional salience. The implications of these findings are discussed next.

General discussion

Research on the allocation of temporal selective attention has used the RSVP with three targets to disentangle the contribution of resource depletion and cognitive control mechanisms in the genesis of the sparing and the AB. It has been shown that when three targets are presented in immediate succession in the same attentional episode (i.e., T1-T2-T3), performance is spared and processing of all targets is equally accurate (spreading of the sparing: Olivers et al., 2007). In contrast, when the three targets are separated by distractors (i.e., T1-D-T2-T3 / T1-T2-D-T3), performance is impaired (AB) for targets following T1. There is still debate over the mechanisms underlying these phenomena; some theorists see the AB as a result of depletion of attentional resources due to T1 encoding in working memory (Chun & Potter, 1995; Jolicœur & Dell’Acqua, 1998). Others attribute the AB to cognitive-control mechanisms involved in targets’ attentional enhancement and/or distractors’ inhibition (Olivers & Meeter, 2008; Di Lollo et al., 2005). Yet others attribute the AB to a combination of capacity-limited and top-down processes involved in T1 processing (Wyble et al., 2009).

To date, it is still unclear to what extent these accounts also apply to emotionally salient targets, because no study has jointly investigated the effects of emotional salience on the spread of sparing and the AB when more than two targets are presented. Evidence from 2-target RSVP shows that emotional T1s increase the AB for neutral T2s (Ihssen & Keil, 2009; Schwabe et al., 2011), whereas performance for emotional T2s is better than for neutral targets, reducing the AB (Anderson, 2005; Todd et al., 2013). These effects have been attributed to prioritization of attentional resources by emotional stimuli (Ihssen & Keil, 2009; Mathewson et al., 2008).

In the present study, the emotional modulations of the spread of sparing and the AB were investigated using a 3-target RSVP task. In two experiments, participants monitored RSVP streams of words to identify three white target-words among black distractor-words. T1 and T2 always appeared without intervening distractors (T2 at lag 1), whereas T3 could be presented at lag 2 (“spread of sparing”), at lags 3 and 4 (AB window), or at lag 8. In Experiment 1, the effect of T1 emotional salience on performance for subsequent targets was investigated. T1 could be either a neutral or a negative word, whereas T2 and T3 were always neutral. In contrast, in Experiment 2, T1 and T2 were neutral words, whereas T3 could be either neutral or negative. Findings from Experiment 1 showed the typical pattern observed when using simple letters in the RSVP, with spared performance at lag 2 (i.e., when the three targets appeared sequentially), an AB at lags 3 and 4, followed by a recovery at lag 8. This pattern was observed both when all three targets were correctly reported and when T2 was missed (i.e., only T1 and T3 were identified), although there was a cost for reporting three rather than two targets, as overall performance was better when T2 was missed. The finding that performance for T3 varied depending on how many preceding targets were correctly reported within a stream, and that the AB for T3 was greater when T1 and T2 were both identified, is in line with previous findings (Dell'Acqua et al., 2009; Dux et al., 2014) and is indicative of a resource depletion contribution to the AB.

Taken together, these findings are consistent with the view that the AB for T2 is engendered by resource depletion due to encoding T1 (Chun & Potter, 1995; Jolicœur & Dell’Acqua, 1998). This interpretation is in line with the eSTST model (Wyble et al., 2009; Dux et al., 2014), according to which there are structural limitations in the amount of information that can be efficiently encoded within a single attentional episode due to interference between target representations that compete for access to working memory. Consequently, a greater number of items to be encoded in one attentional episode leads to a greater deficit for temporally close but distinct attentional episodes. Accordingly, recent psychophysiological evidence shows that lag 1 sparing is associated with enhanced frontal (P3a) and parietal (P3b) activation, indexing attentional selection and encoding in visual working memory of consecutive targets, but the time course of the parietal response is longer for two consecutive targets than for a single target (Dell’Acqua et al., 2016).

Importantly, the temporal pattern of performance observed in Experiment 1 was modulated by targets' emotional salience. In line with typical findings from 2-target RSVP (Ihssen & Keil, 2009; Mathewson et al., 2008; Schwabe et al., 2011), Experiment 1 shows that when T1 is a negative word, the AB is more pronounced for T3 than when T1 is a neutral word. A two-stage account for this finding has called upon the prioritization of attentional resources by negative targets during working memory encoding (Ihssen & Keil, 2009; Mathewson et al., 2008). However, resource depletion can explain neither the enhanced spread of sparing for T3s preceded by negative T1s, nor the fact that this effect was independent of the number of targets reported. That is, if limited processing resources are prioritized to a greater extent by negative T1s, then T2 and T3 processing should suffer when the three targets appear in the same attentional episode. Therefore, when T1 is negative, performance impairment on T2 and T3 should be greater compared with when T1 is neutral. Whereas previous studies showed that the processing of a neutral target word is hampered when it is immediately preceded by an emotional word (Bocanegra & Zeelenberg, 2009; Mathewson et al., 2008; Huang et al., 2008; Arnell et al., 2007), in these studies emotional words were distractors. Hence, the observed deficit on target processing could be due to attentional capture by task-irrelevant yet salient emotional information at the expense of task-relevant information that appears close in time (Stein, Zwickel, Kitzmantel, Ritter & Schneider, 2010; Huang et al., 2008). In contrast, in Experiment 1, a task-relevant item with emotional salience was presented as T1, which led to higher accuracy on negative T1s compared with neutral T1s, indicating that emotional salience was implicitly processed even though it was not specified in the attentional set.
Therefore, although a larger AB for T3 preceded by a negative T1 can still be explained by a resource depletion account, the greater sparing at lag 2 when T3 followed a negative T1 is better explained by an interplay of top-down and emotional enhancement of T1 processing, which yields a benefit to T3 processing due to the stronger attentional boost triggered by the negative T1. In contrast, the greater AB generated by negative T1s indicates a stronger top-down inhibition for stimuli appearing in close temporal proximity with the emotional target, aimed at protecting the processing of salient information from interference (Olivers & Meeter, 2008).

With Experiment 2, we provided additional evidence that the attentional enhancement of an emotionally salient target is not achieved at the expense of other targets that share the same attentional episode. In fact, negative T3s were correctly reported more often than neutral T3s not only when presented in the AB window, a finding typically observed in studies using the 2-targets RSVP (Todd et al., 2013; Anderson, 2005), but also when T3 appeared within the spread of sparing (i.e., when it immediately followed T1 and T2). This indicates that the advantage was not obtained at the expense of immediately preceding targets (Dux et al., 2008). Importantly, more accurate report of negative T3s across lags occurred regardless of the number of pre-T3 targets correctly identified, indicating that the amount of encoded information did not prevent the attentional enhancement of emotional targets.

In summary, the present findings indicate that emotionally salient targets prioritize attentional resources, leading to better performance, and they affect temporal selective attention toward subsequent (Exp. 1) as well as toward preceding (Exp. 2) targets that share the same attentional episode: negative T1s enhanced the spread of sparing at lag 2 and increased the AB at successive lags. Furthermore, negative T3s were more accurately reported than neutral T3s regardless of their temporal position from T1 and T2, and regardless of whether T1 and T2 had been correctly identified.

One could argue that emotional and endogenous attention contributed in an additive way, with beneficial effects on performance for negative targets (in Experiments 1 and 2) and for targets sharing the same attentional episode (in Experiment 1). However, when a negative T1 was followed by a neutral T3 in the AB temporal window, the attentional enhancement for negative T1 (Mathewson et al., 2008), and/or a stronger top-down inhibition of post-T1 stimuli (Olivers & Meeter, 2008), conflicted with allocating temporal attention towards a new target, leading to the observed increase of the AB for neutral T3s following a negative T1. Neither the beneficial effects of emotional salience on the spread of sparing (lag 2) nor the detrimental effects on the AB (lags 3-4) were modulated by the number (1 or 2) of pre-T3 targets reported, suggesting that the effect of emotional salience on temporal attention is independent from depletion of attentional resources (Dux et al., 2014).

In conclusion, the present findings highlight the complex interplay between cognitive control, emotion-enhancement, and capacity limitations on temporal selective attention and help disentangle the relative contributions of these different mechanisms to the spread of sparing and the AB. A final note of caution relates to the fact that only negative target words were used in the present research. Although evidence from 2-target RSVP indicates that high-arousal positive and negative T1s yield comparable AB deficits on neutral T2s (Ihssen & Keil, 2009; Schwabe et al., 2011), valence-specific modulations of the AB (de Jong et al., 2010) and of performance for positive and negative T2s (Srivastava & Sreenivasan, 2010) have also been reported. Therefore, an interesting extension to this work would be to investigate how positive targets modulate temporal selective attention in the 3-targets RSVP.