Recognition memory involves the judgment of prior occurrence of a learned stimulus. Recognition can be tested in a number of different ways; in the case of item recognition, a list of single items is studied and memory is tested by asking the participant to discriminate between studied and unstudied items. In associative recognition, pairs of items (such as A–B, C–D, etc.) are studied and memory is tested by having the participant discriminate between studied pairs (e.g., A–B; referred to as intact pairs) and pairs that are composed of studied words in a new arrangement (such as A–D; referred to as rearranged pairs).

A watershed moment in recognition memory theorization came with the advent of the global matching models of recognition memory (Clark & Gronlund, 1996; Humphreys, Pike, Bain, & Tehan, 1989b), which include the theory of distributed associative memory (TODAM; Murdock, 1982), the search of associative memory model (SAM; Gillund & Shiffrin, 1984), the Minerva 2 model (Hintzman, 1988), and the matrix model (Humphreys, Bain, & Pike, 1989a; Pike, 1984). In these models, retrieval is specified as a global match between the retrieval cues and the contents of memory, producing a summed “familiarity” value that is the basis of a recognition memory decision. What was considered a success of these models at the time was the ability to account for the list length effect in recognition memory performance, in which performance is reduced as the length of a study list is increased (Strong, 1912). Global matching models predict a list length effect because the matching strength between the probe cue and each stored item representation has nonzero variance, due to spurious matches between the features of the cue and the stored item. When the matching strengths are summed together, the mean and variance of the resulting familiarity distribution are the sums of the means and variances of the matching strengths between the probe cue and the stored item representations. Thus, as more items are added to the contents of memory, the familiarity distributions have more variance components, increasing the overlap between the distributions for targets and distracters and decreasing discriminability as a consequence (Clark & Gronlund, 1996). Since all of the interference in these models comes from the studied items in the list episode (as opposed to preexisting memories), these models have been described as item noise models.
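The variance argument can be illustrated with a minimal simulation. It assumes random Gaussian feature vectors and a simple global match in which familiarity is the sum of the probe's dot products with every stored trace. This is a simplified stand-in, not any one of the cited models, and the function names and parameter values are hypothetical. As the list grows, each familiarity value picks up more variance components, so the target-distractor separation, measured in pooled standard-deviation units, shrinks.

```python
import random
import statistics

def random_vector(n_features, rng):
    # Features drawn from N(0, 1/n), so the expected squared length is 1.
    sd = (1.0 / n_features) ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n_features)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def global_match_familiarity(probe, memory):
    # Familiarity is the summed match between the probe and every stored trace.
    return sum(dot(probe, trace) for trace in memory)

def simulate_dprime(list_length, n_features=64, n_trials=200, seed=1):
    """Target-distractor separation in pooled-SD units for one list length."""
    rng = random.Random(seed)
    targets, lures = [], []
    for _ in range(n_trials):
        memory = [random_vector(n_features, rng) for _ in range(list_length)]
        targets.append(global_match_familiarity(memory[0], memory))  # studied probe
        lure = random_vector(n_features, rng)                        # unstudied probe
        lures.append(global_match_familiarity(lure, memory))
    pooled_sd = ((statistics.pvariance(targets) + statistics.pvariance(lures)) / 2) ** 0.5
    return (statistics.mean(targets) - statistics.mean(lures)) / pooled_sd

for length in (4, 16, 64):
    print(length, round(simulate_dprime(length), 2))
```

Under these assumptions, each additional stored item adds roughly 1/64 to the variance of both distributions, while the target's mean advantage stays near 1. This is why discriminability falls as the list lengthens.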

The “first wave” of global matching models described by Clark and Gronlund (1996) was abandoned because of the finding of a null list strength effect in item recognition. A counterintuitive prediction of the global matching models was that increasing the strength of some studied items, either by additional study time or by repetition, would harm performance on the nonstrengthened items, because the strengthened items contribute additional variance to retrieval (see Shiffrin, Ratcliff, & Clark, 1990, for formal descriptions of why each of the models makes this prediction). One way to see why the models made this prediction is to consider a case in which a repeated item is added to the contents of memory. The repetition should contribute additional variance to the retrieval process in the same manner as increasing the length of the study list, so performance should decrease for the nonstrengthened items. Ratcliff, Clark, and Shiffrin (1990) tested the list strength prediction of the global matching models and found that it was incorrect: Weak items were not harmed by the strengthening of other list items, and strong items did not benefit from the accompaniment of weak items on the study list.
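The variance argument can be made explicit in a short derivation. The simplifying assumptions are ours and do not belong to any particular model. Item vectors have n independent features drawn from N(0, 1/n). A probe's familiarity is the weighted sum of its dot products with the stored traces. Strengthening a trace multiplies its weight w_i. Storing k identical copies of a trace is equivalent to setting w_i = k, because a probe's matches to identical copies are perfectly correlated. The match variance 1/n holds when the probe and trace are independent vectors. For a list with L_w weak items (w = 1) and L_s strengthened items (w = w_s):

```latex
\begin{aligned}
F(\mathbf{p}) &= \sum_{i=1}^{L} w_i\,(\mathbf{p}\cdot\mathbf{x}_i),
\qquad \operatorname{Var}(\mathbf{p}\cdot\mathbf{x}_i) = \tfrac{1}{n}\ (\mathbf{p}\neq\mathbf{x}_i),
\qquad \operatorname{Var}(\mathbf{x}_t\cdot\mathbf{x}_t) = \tfrac{2}{n} \\
\text{lure:}\quad \mathbb{E}[F] &= 0,
\qquad \operatorname{Var}(F) = \tfrac{1}{n}\bigl(L_w + L_s w_s^2\bigr) \\
\text{weak target } \mathbf{x}_t:\quad \mathbb{E}[F] &= 1,
\qquad \operatorname{Var}(F) = \tfrac{1}{n}\bigl(2 + (L_w - 1) + L_s w_s^2\bigr) \\
d'_{\text{weak}} &\approx \sqrt{\frac{n}{L_w + L_s w_s^2}}
\end{aligned}
```

The two variances differ by only 1/n, so the approximation treats them as equal. For example, with hypothetical values n = 64, L_w = 16, and L_s = 16, raising w_s from 1 to 4 lowers the approximate weak-item d′ from √(64/32) ≈ 1.41 to √(64/272) ≈ 0.49. This drop is the positive list strength effect that the item noise models predicted.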

The null list strength effect in item recognition has been replicated several times since its initial discovery by Ratcliff et al. (1990; Hirshman, 1995; Kahana, Rizzuto, & Schneider, 2005; Murnane & Shiffrin, 1991a, 1991b; Ratcliff, McKoon, & Tindall, 1994; Ratcliff, Sheu, & Gronlund, 1992; Shiffrin, Huber, & Marinelli, 1995; Yonelinas, Hockley, & Murdock, 1992). Computational models have since been developed that can properly predict a null list strength effect in item recognition. These include the retrieving-effectively-from-memory model (REM; Shiffrin & Steyvers, 1997), the subjective likelihood in memory model (SLiM; McClelland & Chappell, 1998), and the bind–cue–decide model of episodic memory (BCDMEM; Dennis & Humphreys, 2001).

Despite the generality of the null list strength effect in item recognition, little work has been done to establish whether the effect applies to associative recognition. Congruence among the two tasks is not guaranteed: Positive list strength effects have been found in free recall (Malmberg & Shiffrin, 2005; Ratcliff et al., 1990; Tulving & Hastie, 1972) and cued recall (Kahana et al., 2005; Ratcliff et al., 1990), although it should be mentioned that in the studies by Ratcliff et al. (1990), the cued-recall list strength effect was smaller and less robust than the one found in free recall. Additionally, positive list strength effects have been found in the plurality discrimination task (Buratto & Lamberts, 2008; Norman, 2002), in which participants study a list of nouns that are singular or plural and are asked to discriminate between studied nouns and switched-plurality lures (Hintzman, Curran, & Oppy, 1992). Given the similarities that have been noted between the plurality discrimination and associative recognition tasks (both involve discrimination of studied content from highly similar lures; Malmberg, 2008; Xu & Malmberg, 2007), it would not be unreasonable to expect to find a list strength effect in the associative recognition task, as well.

Only a couple of studies have investigated associative recognition using a list strength paradigm. A study by Murnane and Shiffrin (1991b) showed no effect of list strength on intact–rearranged discrimination. However, the materials used in the study were sentences, and it is not exactly clear how the use of linguistic materials may have affected the results. A study by Verde and Rotello (2004) reported a small positive effect of list strength on associative recognition performance. However, their design was somewhat unconventional, since it used a single list composition with different sets of overlapping pairs (e.g., A–B, B–C); they found that strengthening half of the pairs in an overlapping set reduced performance on the weak pairs from that set, relative to a separate, overlapping set of pairs in which all of the pairs were weak. In other words, if pairs such as A–B, B–C, and D–E were studied, and A–B was presented three times, B–C would elicit worse performance than D–E. This design somewhat resembles an associative interference design, in which the fan (the number of times that each item in a pair occurs in other pairs) of the items is manipulated (e.g., Buchler, Light, & Reder, 2008). To our knowledge, no published list strength experiment has featured a traditional design in which unrelated and nonoverlapping pairs of words are strengthened, despite the relevance of the potential findings to computational models of recognition memory.

Representational overlap and the list strength effect

Dennis and Humphreys (2001) argued that computational models of recognition memory can predict a null list strength effect if interference among the item representations (item noise) is sufficiently low. Dennis and Humphreys presented a model in which all of the interference comes from associations to prior contexts. The model exhibits no item noise, because all of the item representations are orthogonal to each other; consequently, there is no overlap among their representations. If one views item representations as vectors of psychological features, orthogonal item representations imply that the items do not share features and are completely dissimilar to each other. Although this might strike some readers as implausible, a number of hippocampal theories describe the role of the hippocampus as creating sparse representations from distributed neocortical inputs, such that highly similar events can yield distinct nonoverlapping episodic representations in the hippocampus (Marr, 1971; McClelland, McNaughton, & O’Reilly, 1995; Norman & O’Reilly, 2003; O’Reilly & McClelland, 1994). Additionally, orthogonal item representations have been employed in recent models of free recall, including the temporal context model and its variants (TCM; Howard & Kahana, 2002; Polyn, Norman, & Kahana, 2009; Sederberg, Howard, & Kahana, 2008) and the model of Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, and Usher (2005).

When item overlap is zero, a probe cue only matches its own previously stored memories (both in the experimental context and its preexperimental contexts) and does not match the representations of the other list items. Therefore, both the number and strength of the other list items cannot hinder performance, and null effects of list length and list strength on recognition memory performance are predicted. Although the prediction of a null list length effect might seem problematic for this view, Dennis and colleagues noted that a number of confounds in list length designs artifactually contribute to the finding of a list length effect. When such confounds are controlled, investigations have shown no effect of list length on recognition memory performance for item recognition (Dennis & Humphreys, 2001; Dennis, Lee, & Kinnell, 2008; Kinnell & Dennis, 2011) and associative recognition (Kinnell & Dennis, 2012).

The BCDMEM model cannot be used to derive predictions for the associative recognition paradigm, because the model lacks a mechanism for interitem associations. Nonetheless, we can implement some of the assumptions of the BCDMEM model, specifically orthogonal item codes and interference from preexperimentally stored memories, in the matrix model (Humphreys et al., 1989a, 1989b; Pike, 1984). In the matrix model, items are represented as vectors, and associations are represented as outer products of the constituent item vectors. For a list of pairs, all associations are summed to produce a memory matrix M that is a composite of all stored memories. At test, the outer product of the two item cues produces a matrix cue that is matched with M by taking the dot product between the two matrices (the sum of the products of their corresponding elements).
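A minimal NumPy sketch of the storage and matching operations follows. It assumes random Gaussian item vectors and a single learning rate. The function names `store_pairs` and `match` are hypothetical labels, not part of any published implementation. Each pair is stored as a scaled outer product, and the match between a cue matrix and M is the sum of their elementwise products, which equals the bilinear form aᵀMb.

```python
import numpy as np

def store_pairs(pairs, learning_rate, M=None):
    """Add learning_rate * outer(a, b) to the memory matrix for each (a, b) pair."""
    n = len(pairs[0][0])
    M = np.zeros((n, n)) if M is None else M.copy()
    for a, b in pairs:
        M += learning_rate * np.outer(a, b)
    return M

def match(a, b, M):
    """Matrix dot product between the cue outer(a, b) and M (sum of elementwise products)."""
    return float(np.sum(np.outer(a, b) * M))

rng = np.random.default_rng(0)
n = 64
A, B, C, D = rng.normal(0.0, np.sqrt(1.0 / n), size=(4, n))  # four random item vectors
M = store_pairs([(A, B), (C, D)], learning_rate=0.05)

print("intact A-B:", round(match(A, B, M), 4))
print("rearranged A-D:", round(match(A, D, M), 4))
```

Algebraically, the intact cue's match is .05(a·a)(b·b) plus a cross term .05(a·c)(d·b). With random, nonorthogonal vectors that cross term is generally nonzero, and it is the source of item noise in the standard model.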

Although the original matrix model was an item noise model, if the item vectors are orthogonal, the resulting outer-product associations are orthogonal to each other as well. For a list of pairs A–B, C–D, E–F, and so forth, the matrix representing the A–B pair would therefore show no match to those for the C–D and E–F pairs, and vice versa. Additionally, whereas memory models commonly assume that memory is empty prior to list presentation, we instead simulated prior associations by initializing the memory matrix with noise before list presentation.
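The orthogonality argument can be checked directly. This sketch assumes one-hot item vectors (rows of an identity matrix) and uses .0006 as the noise variance, which is the value reported for the simulations. The helper name is hypothetical. The rearranged cue A–D has zero match to the stored A–B and C–D associations, so studying those pairs leaves its strength unchanged. Only the preexperimental noise contributes to it.

```python
import numpy as np

n = 64
items = np.eye(n)                  # orthogonal one-hot item vectors
A, B, C, D = items[:4]

def frobenius_match(X, Y):
    # Matrix "dot product": the sum of elementwise products.
    return float(np.sum(X * Y))

AB, CD, AD = np.outer(A, B), np.outer(C, D), np.outer(A, D)
print(frobenius_match(AB, CD))                            # 0.0
print(frobenius_match(AD, AB), frobenius_match(AD, CD))   # 0.0 0.0

# Preexperimental interference: M starts as noise rather than zeros.
rng = np.random.default_rng(1)
M = rng.normal(0.0, np.sqrt(0.0006), size=(n, n))
before = frobenius_match(AD, M)
M = M + 0.05 * AB + 0.05 * CD      # study the two pairs
after = frobenius_match(AD, M)
print(np.isclose(before, after))   # True: the rearranged cue only matches noise
```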

We simulated a list strength paradigm with both the original matrix model and a version in which all of the item vectors were orthogonal and preexperimental interference was included in the memory matrix. In the former case, 64-element vectors were generated from a normal distribution with μ = 0 and σ² = 1/64, so that the vectors had unit length in expectation. In the latter case, 64-element vectors were taken from the rows of an identity matrix, and preexperimental interference was simulated by initializing M with samples from a normal distribution with μ = 0 and σ² = .0006.
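The two representational schemes can be generated as sketched below, using the parameter values given in the text. The helper names are hypothetical. Drawing each of the 64 features from N(0, 1/64) gives vectors whose squared length averages 1 but varies from vector to vector, so "unit length" holds only in expectation. Rows of the identity matrix are exactly unit length and mutually orthogonal. The 32 studied pairs contain 64 distinct words, which is exactly the number of one-hot vectors available.

```python
import numpy as np

N_FEATURES = 64

def gaussian_items(n_items, rng, n_features=N_FEATURES):
    # Standard matrix model: features ~ N(0, 1/n), so E[|v|^2] = 1.
    return rng.normal(0.0, np.sqrt(1.0 / n_features), size=(n_items, n_features))

def orthogonal_items(n_items, n_features=N_FEATURES):
    # Orthogonal variant: rows of an identity matrix (needs n_items <= n_features).
    if n_items > n_features:
        raise ValueError("cannot draw more orthogonal items than features")
    return np.eye(n_features)[:n_items]

def preexperimental_memory(rng, variance=0.0006, n_features=N_FEATURES):
    # Memory matrix initialized with noise instead of zeros.
    return rng.normal(0.0, np.sqrt(variance), size=(n_features, n_features))

rng = np.random.default_rng(2)
g = gaussian_items(1000, rng)
print("mean squared length:", round(float(np.mean(np.sum(g**2, axis=1))), 3))
o = orthogonal_items(64)
print("orthogonal Gram matrix is identity:", np.allclose(o @ o.T, np.eye(64)))
```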

A list strength paradigm was simulated by storing 16 pairs with a learning rate of .05 along with an additional 16 interference pairs with a learning rate that varied between .05 and 1.0. Response rates were derived by comparing memory strengths to a criterion of .025. The model predictions can be seen in Fig. 1. One can see that with the standard matrix model, performance decreases quite rapidly as the strength of the list is increased. However, for the orthogonal matrix model, d′ is constant for all learning rates of the interference pairs. Additionally, the standard matrix model predicts that the false alarm rate (FAR) for interference pairs should increase as their strength is increased. This is because a rearranged pair A–D overlaps with the stored pairs A–B and C–D. Thus, strengthening of A–B and C–D results in more variance in the distribution of strength values for A–D. For the orthogonal matrix model, however, the rearranged pair A–D does not overlap with the partially matching A–B and C–D pairs, and the FAR thus remains constant as strength is increased.
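The simulated paradigm can be sketched compactly using the parameter values stated above: 16 critical pairs at learning rate .05, 16 interference pairs at a variable learning rate, a criterion of .025, and noise variance .0006 for the orthogonal model. The text does not specify the remaining details, so the following are our assumptions:

- rearranged test pairs re-pair each critical A word with the next pair's B word;
- the standard model starts from an empty matrix;
- d′ uses a log-linear edge correction;
- only the critical pairs are scored, not the interference pairs shown in the middle panel of Fig. 1.

With orthogonal items, a critical cue has zero overlap with the interference associations. Given the same random noise, its strength, and therefore its hit and false alarm counts, is identical for every interference learning rate.

```python
import numpy as np
from statistics import NormalDist

N, N_PAIRS = 64, 16
LR_CRITICAL, CRITERION, NOISE_VAR = 0.05, 0.025, 0.0006

def simulate(orthogonal, lr_interference, n_sims=200, seed=0):
    """Count hits and false alarms for the once-studied critical pairs."""
    rng = np.random.default_rng(seed)
    hits = fas = 0
    for _ in range(n_sims):
        if orthogonal:
            items = np.eye(N)  # 64 one-hot vectors cover the 32 pairs' 64 words
            M = rng.normal(0.0, np.sqrt(NOISE_VAR), size=(N, N))
        else:
            items = rng.normal(0.0, np.sqrt(1.0 / N), size=(4 * N_PAIRS, N))
            M = np.zeros((N, N))  # assumption: no preexperimental noise here
        A, B, C, D = np.split(items, 4)  # critical A_i-B_i, interference C_j-D_j
        # Summed outer products: A.T @ B equals sum_i outer(a_i, b_i).
        M = M + LR_CRITICAL * A.T @ B + lr_interference * C.T @ D
        intact = np.einsum("ij,jk,ik->i", A, M, B)                            # a_i' M b_i
        rearranged = np.einsum("ij,jk,ik->i", A, M, np.roll(B, -1, axis=0))   # a_i' M b_(i+1)
        hits += int(np.sum(intact > CRITERION))
        fas += int(np.sum(rearranged > CRITERION))
    return hits, fas, n_sims * N_PAIRS

def dprime(hits, fas, n):
    # Log-linear edge correction keeps z-scores finite when a rate is 0 or 1.
    z = NormalDist().inv_cdf
    return z((hits + 0.5) / (n + 1)) - z((fas + 0.5) / (n + 1))

for lr in (0.05, 0.1, 0.3, 0.5, 0.75, 1.0):
    for orthogonal in (False, True):
        h, f, n = simulate(orthogonal, lr)
        label = "orthogonal" if orthogonal else "standard"
        print(f"lr={lr:<5} {label:<10} HR={h / n:.3f} FAR={f / n:.3f} d'={dprime(h, f, n):.2f}")
```

Under these assumptions, the orthogonal model's counts do not change across interference learning rates. The standard model's d′ falls substantially from the lowest to the highest interference rate. The exact values are not expected to match Fig. 1.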

Fig. 1

Matrix model predictions for a standard matrix model and for one employing orthogonal representations. Hit rates (solid lines) and false alarm rates (dashed lines) for nonstrengthened critical pairs (left) and the strengthened interference pairs (middle) are presented, along with discriminability (d′) for the nonstrengthened pairs (right). In both models, 64-element vectors were simulated with a learning rate of .05 for the critical pairs and learning rates of .05, .1, .3, .5, .75, and 1.0 for the interference pairs. A criterion of .025 was used to derive the hit and false alarm rates

One should note that this is a very simplified model of associative recognition performance and is by no means comprehensive. This simulation was merely meant to demonstrate that a matrix model employing some of BCDMEM’s assumptions makes similar predictions for the associative recognition paradigm. Specifically, no interference is present among the stored list pairs, a null list strength effect is predicted, and the FARs for both weak and strong rearranged pairs are equivalent. Although we used orthogonal representations to demonstrate a null list strength effect, we are more committed to the idea that overlap between representations is minimal than to strict orthogonality.

Present investigation

We conducted two list strength experiments in which 32 pairs of words were studied: half of these were the critical pairs that were later tested, and half were interference pairs. In the pure-weak condition, all pairs were studied once. In the mixed condition, the interference pairs were presented four times (similar paradigms were used by Buratto & Lamberts, 2008; Diana & Reder, 2005; Norman, 2002; Norman, Tepe, Nyhus, & Curran, 2008). A list strength effect would be observed if performance on the critical pairs were worse in the mixed condition than in the pure-weak condition. The experiment was conducted using both yes–no (Exp. 1) and two-alternative forced choice (2AFC; Exp. 2) testing methods to ensure the generality of the results. Our procedure, design, and parameters were very similar to those used in the list length experiment of Kinnell and Dennis (2012), which revealed a null effect of list length on associative recognition performance.

Another advantage of our experimental design is that it enables us to measure the strength-based mirror effect (SBME) in associative recognition. A regularity in item recognition memory is that whenever the strength of a set of items is increased, the hit rate for the strengthened items increases and the false alarm rate decreases (Hirshman, 1995; Stretch & Wixted, 1998). This pattern can also be observed as a more conservative response criterion (c) in conditions of stronger list strength; a meta-analysis conducted by Hirshman (1995) showed this pattern to be present in nearly all published list strength designs. In associative recognition, in contrast, when a mixed list of strong and weak pairs is studied, the false alarm rate for rearranged pairs composed of words from weak pairs (weak FAR) is often equivalent to the false alarm rate for rearranged pairs composed of words from strong pairs (strong FAR; Buchler et al., 2008; Cleary, Curran, & Greene, 2001; Kelley & Wixted, 2001; Mickes, Johnson, & Wixted, 2010). However, when pure weak lists are compared to pure strong lists, a robust strength-based mirror effect is observed, with FARs being significantly lower in the pure strong list (Clark & Shiffrin, 1992; Hockley & Niewiadomski, 2007).

Hockley and Niewiadomski (2007) proposed that the discrepancy among the associative recognition results can be explained if one assumes that a response criterion is set on the basis of list strength but does not change from trial to trial throughout testing (this hypothesis was proposed in item recognition by Stretch & Wixted, 1998). Thus, the FAR is lower when the strength of a list is increased, but rearranged pairs of different strengths on a given list are subject to the same unchanging response criterion, and thus FARs are identical for strong and weak rearranged pairs. However, this hypothesis was proposed to explain results from different experiments, and it would be useful to obtain both an across-condition SBME and a mixed-list null SBME in the same experiment. If the hypothesis of Hockley and Niewiadomski is correct, the FAR for weak pairs should be lower in the mixed condition than in the pure-weak condition, a higher value of the response criterion c should be observed in the mixed list relative to the pure-weak list, but the FARs for weak and strong pairs should be equivalent in the mixed condition.
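An equal-variance signal detection sketch of this hypothesis follows. The parameter values are hypothetical and chosen only for illustration. Rearranged pairs are assumed to have the same strength distribution, N(0, 1), regardless of the strength of their constituent pairs, as in the orthogonal matrix model. Weak and strong intact pairs have means D_WEAK and D_STRONG. One criterion per list is placed midway between the lure mean and the list's average target mean; this placement rule is our assumption. Because the mixed list contains stronger targets, its criterion sits higher. That lowers the weak FAR and raises c relative to the pure-weak list, yet weak and strong rearranged pairs on the mixed list share one criterion and therefore one FAR.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def rates(target_mean, criterion):
    """Hit and false alarm rates for unit-variance target and lure (mean 0) distributions."""
    hit = 1.0 - STD_NORMAL.cdf(criterion - target_mean)
    fa = 1.0 - STD_NORMAL.cdf(criterion)   # lure distribution ignores pair strength
    return hit, fa

def c_measure(hit, fa):
    # Response bias c = -(z(H) + z(F)) / 2; positive values are conservative.
    return -(STD_NORMAL.inv_cdf(hit) + STD_NORMAL.inv_cdf(fa)) / 2

D_WEAK, D_STRONG = 1.5, 2.5   # hypothetical target means, in lure-SD units

# One criterion per list, set from the list's average target strength (an assumption).
crit_pure = D_WEAK / 2
crit_mixed = ((D_WEAK + D_STRONG) / 2) / 2

hit_pw, fa_pw = rates(D_WEAK, crit_pure)
hit_mw, fa_mw = rates(D_WEAK, crit_mixed)
hit_ms, fa_ms = rates(D_STRONG, crit_mixed)

print(f"pure-weak:  FAR={fa_pw:.3f} c={c_measure(hit_pw, fa_pw):.2f}")
print(f"mixed weak: FAR={fa_mw:.3f} c={c_measure(hit_mw, fa_mw):.2f}")
print(f"mixed strong rearranged FAR={fa_ms:.3f}")
```

In this idealization, the weak pairs' d′ is the same on both lists, because only the criterion moves.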

In addition to the list strength comparisons, we also manipulated the word frequency of the items in each word pair. We manipulated this variable in part to keep our design as similar as possible to the list length associative recognition experiment conducted by Kinnell and Dennis (2012), but also because only a small number of studies have evaluated the presence of a word frequency effect in associative recognition. This stands in sharp contrast to the item recognition literature, where the effects of word frequency have been extensively investigated (see Glanzer & Adams, 1985, for a review) and have constrained model development (Dennis & Humphreys, 2001; Glanzer, Adams, Iverson, & Kim, 1993; McClelland & Chappell, 1998; Shiffrin & Steyvers, 1997).

To foreshadow our results, null effects of list strength on recognition memory performance were found in both experiments. The FAR results in Experiment 1 were in concordance with the hypothesis of Hockley and Niewiadomski (2007).

Experiment 1

Method

Participants

A total of 96 participants contributed to this experiment. Participants were undergraduate students at Ohio State University who participated in exchange for credit in an introductory psychology course.

Materials

The stimuli were 240 words between five and seven letters in length that were sampled from the Sydney Morning Herald Word Database (Dennis, 1995). To be consistent with the procedure and stimuli used by Kinnell and Dennis (2012), a word frequency manipulation was also included: Half of the words were high-frequency words (HF; 100–200 occurrences per million) and half were low-frequency (LF; 1–4 occurrences per million). All words were randomly paired with other words from the same frequency class and were randomly assigned to the pure-weak or mixed-list conditions.

Procedure

A diagram of the basic procedure can be seen in Fig. 2. Participants completed both the pure-weak and mixed-list conditions, and the order of the conditions was counterbalanced across participants. In both conditions, a total of 32 pairs were presented to the participants at 3,000 ms per pair, with a blank screen occurring between presentations for 250 ms. Participants were also instructed to rate the degree of semantic relatedness between the two words presented on the screen on a 4-point scale and to press a corresponding key on the keyboard during presentation. Upon registration of the participant’s response, the pair remained on the screen for the remainder of the 3,000 ms.

Fig. 2

Diagram of the experimental procedure. In the pure-weak condition, 16 critical pairs and 16 interference pairs are studied once; participants engage in a long filler task and are then tested on eight intact and eight rearranged pairs created from the 16 critical pairs. In the mixed condition, 16 critical and 16 interference pairs are studied once, and then the interference pairs are presented an additional three times. Next, participants engage in a shorter filler task; in the test phase, they are tested on eight intact and eight rearranged pairs created from the 16 critical pairs, and then on eight intact and eight rearranged pairs created from the 16 interference pairs

In the pure-weak condition, all pairs were presented once during the study list. In the mixed condition, all pairs were presented once, and subsequently, half of the pairs (the interference pairs) were repeated an additional three times, leading to a total of four presentations for the strengthened pairs. During the repetitions, all of the interference pairs were shuffled, and all of them were presented before an additional repetition occurred. That is, if a given interference pair was presented twice, all other interference pairs were then presented twice before a third presentation of any of the pairs was possible.

After the completion of the study list, participants completed a distractor task, in which images of playing cards appeared on the screen and they were instructed to press the space bar when they observed specific sequences (such as two cards in a row that shared the same suit or two cards that summed to 11). The length of the distractor activity was 366 s, for the pure-weak condition, or 210 s, for the mixed condition. The purpose of the different distractor lengths was to ensure that the time between the beginning of the study list and the beginning of the test list was the same for both conditions.
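The two distractor durations can be checked with simple arithmetic. The check assumes that each study presentation occupies 3,250 ms (3,000 ms for the pair plus the 250-ms blank) and ignores instruction screens, which the text does not time. Under that assumption, the two durations equate the study-to-test interval exactly.

```python
PRESENTATION_S = 3.0 + 0.25   # pair display plus blank screen, in seconds (assumed)
N_PAIRS = 32

pure_weak_study = N_PAIRS * PRESENTATION_S           # every pair shown once
mixed_study = (N_PAIRS + 16 * 3) * PRESENTATION_S    # 16 interference pairs shown 3 more times

pure_weak_interval = pure_weak_study + 366
mixed_interval = mixed_study + 210
print(pure_weak_study, mixed_study)        # 104.0 260.0
print(pure_weak_interval, mixed_interval)  # 470.0 470.0
```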

Before each test list, participants were instructed that they would be presented with pairs that were identical to the studied pairs, along with pairs that were composed of studied words in novel arrangements; the former pairs should be endorsed, and the latter should be rejected. In the pure-weak condition, participants were tested on eight weak intact pairs and eight weak rearranged pairs (i.e., rearranged pairs composed of once-presented words) in a randomized order. In the mixed condition, participants were tested on eight weak intact pairs and eight weak rearranged pairs, and subsequently were tested on eight strong intact pairs and eight strong rearranged pairs (i.e., rearranged pairs in which both words had been presented four times).

Although some might find it unusual that the weak pairs were tested before the strong pairs in the mixed condition, this design was chosen to ensure that the test positions for the weak pairs were the same in both the pure-weak and mixed conditions. Performance has been shown to decline across test positions in item recognition (Criss, Malmberg, & Shiffrin, 2011; Ratcliff & Murdock, 1976), and a mixed test list of strong and weak pairs would cause weak pairs to be tested at later positions in the mixed condition than in the pure-weak condition, potentially degrading performance and artifactually inducing a list strength effect (pure-weak d′ > mixed d′). For this reason, we did not include an additional pure-strong condition in which all of the pairs were strengthened; it would then be impossible to simultaneously control for retention interval and test position across all three conditions.

Each pair group (weak intact, weak rearranged, strong intact, and strong rearranged) was equally divided into high- and low-frequency pairs. In the rearranged distractor pairs, the left–right orientation of the words from their study presentations was preserved, and the rearranged and intact pairs did not overlap.

The experiment was designed and run using pyEPL (Geller, Schleifer, Sederberg, Jacobs, & Kahana, 2007).

Results

The data were analyzed using equal-variance signal detection measures. Although research using the receiver operating characteristic (ROC) has shown that target variability is approximately 1.25 times the variability of the lure distribution in item recognition (Glanzer, Kim, Hilford, & Adams, 1999; Ratcliff et al., 1994; Ratcliff et al., 1992), equal-variance signal detection models have been found to yield a good fit to the ROC in associative recognition (Kelley & Wixted, 2001; Mickes et al., 2010). To avoid infinite values of d′, edge corrections were performed by adding .5 to the hit and false alarm counts and 1 to the target and distractor counts (Snodgrass & Corwin, 1988). This correction was applied only for the signal detection analyses; all statistical analyses on response rates were performed on the raw, uncorrected response rates.
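The equal-variance measures with this edge correction can be computed as sketched below; the function name is hypothetical. The procedure adds .5 to the hit and false alarm counts and 1 to the numbers of target and lure trials, then converts the resulting rates to z-scores. From these, d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2.

```python
from statistics import NormalDist

def corrected_sdt(n_hits, n_targets, n_fas, n_lures):
    """Equal-variance d' and c with the Snodgrass & Corwin (1988) log-linear correction."""
    z = NormalDist().inv_cdf
    h = (n_hits + 0.5) / (n_targets + 1)
    f = (n_fas + 0.5) / (n_lures + 1)
    d_prime = z(h) - z(f)
    c = -(z(h) + z(f)) / 2
    return d_prime, c

# A perfect score on 8 intact and 8 rearranged pairs stays finite after correction.
print(corrected_sdt(8, 8, 0, 8))
```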

The data from 15 participants were excluded for having performance at or below chance (d′ ≤ 0) in one of the experimental conditions. Additionally, the data from one participant were excluded for failing to finish the experiment.

The data from the yes–no tests in this experiment were analyzed using 2 × 2 (Word Frequency × List Strength) repeated measures analyses of variance (ANOVAs). To facilitate exposition of the different results, the list strength results are reported in this section and the word frequency results are reported in the next section. The results for the signal detection measures calculated on the once-presented critical pairs can be seen in Fig. 3, whereas the response rates can be seen in Fig. 4. We found no significant effect of list strength, in that performance on the critical pairs was not significantly worse in the mixed condition (d′ = 1.42) than in the pure-weak condition (d′ = 1.47), F(1, 79) = 0.52, ηp² = .006, p > .05. The response criterion was significantly higher in the mixed condition (c = .163) than in the pure-weak condition (c = −.065), F(1, 79) = 27.91, ηp² = .26, p < .001. This is reflected by both lower hit rates, F(1, 79) = 15.31, ηp² = .16, and lower FARs, F(1, 79) = 12.81, ηp² = .14, to weak pairs in the mixed condition than in the pure-weak condition, both ps < .001.

Fig. 3

Performance measures for Experiments 1 and 2, along with a fit of the matrix model with orthogonal item representations (black circles). Experiment 1 was yes–no (YN) recognition, and performance and bias were measured in terms of the signal detection theory (SDT) measures d′ and c for the once-presented critical pairs. Experiment 2 was two-alternative forced choice (2AFC) recognition, and performance was measured using the probability of correct choices [p(c)]. Error bars represent 95 % within-subjects confidence intervals

Fig. 4

Hit rates (HR) and false alarm rates (FAR) for Experiment 1, along with a fit of the matrix model with orthogonal item representations (black circles). Error bars represent 95 % within-subjects confidence intervals

In the mixed condition, strong hit rates were higher than weak hit rates in the yes–no tests, t(79) = 9.87, p < .001. The difference between the FARs for weak (.148) and strong (.178) rearranged pairs was not significant, t(79) = −1.14, p > .05.

Word frequency results

The word frequency results for Experiment 1 can be seen in Table 1. The difference in the d′s between low-frequency (1.39) and high-frequency (1.57) pairs was only marginally significant, F(1, 79) = 3.02, ηp² = .037, p = .08. Analyses on the response rates indicated that hit rates did not differ significantly across the two word frequency classes, F(1, 79) = 0.172, ηp² = .002, p > .05, but we did observe a significant difference in the FARs, with low-frequency pairs exhibiting higher FARs than high-frequency pairs, F(1, 79) = 7.91, ηp² = .09, p < .01. No significant Condition × Word Frequency interactions emerged.

Table 1 Mean hit rates (HR) and false alarm rates (FAR) for strong and weak pairs for both word frequency classes (low frequency [LF] and high frequency [HF]; standard errors of the means are in parentheses)

Discussion

These results indicated no list strength effect for word pairs in an associative recognition task. To our knowledge, this represents the first demonstration of a null list strength effect using nonoverlapping pairs of words. Although our result is contrary to the positive list strength effect reported by Verde and Rotello (2004), their finding may have been due to the overlapping word pairs employed in their study, which resembled a fan manipulation. Our result is consistent with the null list strength effect in associative recognition using sentence materials found by Murnane and Shiffrin (1991b).

Results on the strength-based mirror effect were consistent with the hypothesis of Hockley and Niewiadomski (2007). The FARs for weak pairs were lower in the mixed condition than in the pure-weak condition, and yet the FARs for strong and weak rearranged pairs were nearly equivalent in the mixed condition. Such a result can be explained by a model in which the response criterion varies by list strength but is not altered from trial to trial on the basis of a pair’s strength.

The word frequency manipulation produced a higher FAR for low-frequency than for high-frequency pairs, but no difference in the hit rates. This resulted in a small difference in d′s between the two word frequency classes that was only marginally significant. This is a replication of the word frequency results found by Kinnell and Dennis (2012), and similar results (higher FAR for LF pairs) were obtained by Clark and colleagues (Clark, 1992; Clark & Burchett, 1994; Clark & Shiffrin, 1992).

In Experiment 2, we tested whether our findings would replicate with 2AFC testing, in order to evaluate the generalizability of our results.

Experiment 2

Experiment 2 was identical to Experiment 1 with the exception that the test phase consisted of 2AFC trials instead of yes–no recognition trials. Participants were presented with two pairs, an intact pair and a rearranged pair, and were instructed to select which was the studied pair. The generality of the null list strength effect found in Experiment 1 would be evident if the p(c)s for weak pairs were equivalent across the pure-weak and mixed conditions. A list strength effect would be observed if p(c) was lower in the mixed than in the pure-weak condition.
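Under the equal-variance signal detection model used for Experiment 1, an observer who chooses the stronger member of each 2AFC trial has p(c) = Φ(d′/√2), where Φ is the standard normal CDF. This is the standard idealized relation, and it ignores lapses and response noise. It puts 2AFC accuracy on the same d′ scale as the yes–no results. The sketch below is our illustration and was not part of the reported analyses.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def pc_from_dprime(d_prime):
    # The difference of two unit-variance strengths has SD sqrt(2).
    return STD_NORMAL.cdf(d_prime / sqrt(2))

def dprime_from_pc(p_correct):
    # Inverse relation: d' = sqrt(2) * z(p(c)).
    return sqrt(2) * STD_NORMAL.inv_cdf(p_correct)

print(round(pc_from_dprime(1.0), 3))
print(round(dprime_from_pc(0.859), 2))   # a p(c) value expressed on the d' scale
```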

Method

Participants

A total of 103 participants contributed to this experiment.

Materials

The materials were identical to those used in Experiment 1.

Procedure

The only critical difference between this experiment and Experiment 1 is that, on each test trial, participants were presented with two pairs (one on the left and the other on the right side of the screen), one intact and one rearranged, and were asked to indicate which was the intact pair. Because each 2AFC trial presents two pairs instead of one, the number of test trials was halved: eight test trials were presented in the pure-weak condition and 16 in the mixed condition. All test trials contained pairs of equal strength (both pairs consisted of once-presented words or of four-times-presented words). As in Experiment 1, all weak pairs in the mixed condition were tested before the strong pairs.

Each pair group (weak target, weak rearranged, strong target, and strong rearranged) was equally divided into high- and low-frequency pairs. Also, in equal numbers of trials the intact pair and the rearranged pair were from the same frequency class or from different frequency classes.

Results

The data from 14 participants were excluded for having at- or below-chance performance in one of the experimental conditions. Additionally, the data from one participant were excluded for failing to finish the experiment.

The results are depicted in Fig. 3. The data were analyzed using 2 × 2 × 2 (Word Frequency × List Strength × Same vs. Different Frequency Choice Trials) repeated measures ANOVAs. We found no significant effect of list strength, in that performance was roughly equivalent across the mixed [p(c) = .849] and pure-weak [p(c) = .859] conditions, F(1, 87) = 0.44, ηp² = .005, p > .05.

In the mixed condition, p(c) was higher for strong than for weak pairs, t(87) = 7.58, p < .001.

Word frequency results

The word frequency results for Experiment 2 can be seen in Table 2. We found no significant differences in performance between low-frequency [p(c) = .856] and high-frequency [p(c) = .852] pairs, F(1, 87) = 0.042, η_p² = .0005, p > .05. Participants also did not show significant differences in performance between trials in which the choices were from the same frequency class [p(c) = .865] and trials in which the choices were from different frequency classes [p(c) = .843], F(1, 87) = 1.58, η_p² = .017, p > .05. No significant interactions emerged.

Table 2 Probabilities of correct choices [p(c)] for both word frequency classes (low frequency [LF] and high frequency [HF]; standard errors of the means are in parentheses) in two-alternative forced choice tests, separated by trial types

Discussion

The results of Experiment 2 replicated the null list strength effect found in Experiment 1 and extended its generality to 2AFC testing.

General discussion

We conducted two experiments using a list strength design and found that performance was not reduced when the strength of a set of interference pairs was increased from one presentation to four presentations. This pattern held for both yes–no testing (Exp. 1) and 2AFC testing (Exp. 2). These results extend the null list strength effect, which has been found to be highly regular in item recognition and has been replicated across two decades of recognition memory research, to the associative recognition paradigm.

As we stated in the introduction, null list strength effects can be predicted from a global matching framework in which the overlap among the item representations is minimal. Dennis and Humphreys (2001) argued that these representational assumptions provide a parsimonious account of the null effects of both list length and list strength in item recognition: because item representations do not overlap, the matching strength for a given cue item is unaffected by the strength or number of the other items stored in memory. The modeling presented here extends this argument to associative recognition. To evaluate how well such a model could accommodate the main aspects of the data, we fitted the matrix model with orthogonal item representations described in the introduction to the group mean choice probabilities, collapsed across the two word frequency classes, in Experiments 1 and 2. Although the simulation presented in Fig. 1 qualitatively captures several key aspects of the data, we fitted the model to ensure that it could also yield quantitatively accurate predictions. To fit Experiment 1’s data, memory strength was calculated for every test pair and compared to a response criterion specific to that experimental condition. To fit Experiment 2’s data, memory strength was calculated for both the intact and the rearranged pair on each trial, and the pair that elicited the greater strength was chosen. The same parameters were used in fitting both experiments, except that the response criteria were used only in the fit to the yes–no data of Experiment 1, since 2AFC tests do not require a response criterion.
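The fitting logic can be illustrated with a minimal Python sketch. It assumes one-hot item vectors (one simple way to make every item orthogonal to every other), a memory matrix initialized with zero-mean Gaussian noise of variance γ, and a single outer-product increment α·a bᵀ per studied pair, with α treated as the pair's total learning rate. The function name `weak_pair_performance` and the list compositions are hypothetical, and the defaults simply reuse the best-fitting values reported below. Under these assumptions an intact weak pair's strength is α_weak plus noise, whereas a rearranged probe retrieves only noise, so weak-pair 2AFC accuracy should stay near Φ(α_weak/√(2γ)) ≈ .87 however many strong pairs the list contains.

```python
import numpy as np


def weak_pair_performance(n_weak, n_strong, alpha_weak=1.0, alpha_strong=1.65,
                          gamma=0.406, criterion=0.45, n_sims=2000, seed=0):
    """Simulate weak-pair tests in a matrix model with orthogonal (one-hot) items.

    Returns (hit_rate, false_alarm_rate, p_correct_2afc) for the weak pairs.
    """
    if n_weak < 2:
        raise ValueError("need at least two weak pairs to build rearranged probes")
    rng = np.random.default_rng(seed)
    n_items = 2 * (n_weak + n_strong)
    items = np.eye(n_items)  # one-hot rows: every item is orthogonal to every other
    alphas = [alpha_weak] * n_weak + [alpha_strong] * n_strong

    hits = false_alarms = correct = n_trials = 0
    for _ in range(n_sims):
        # Pre-experimental interference: zero-mean Gaussian noise with variance gamma.
        memory = rng.normal(0.0, np.sqrt(gamma), size=(n_items, n_items))
        # Study: pair k associates item 2k with item 2k + 1 through an outer product.
        for k, alpha in enumerate(alphas):
            memory += alpha * np.outer(items[2 * k], items[2 * k + 1])

        for k in range(n_weak):  # the first n_weak pairs are the weak pairs
            # Intact probe: the studied weak pair k.
            intact = items[2 * k] @ memory @ items[2 * k + 1]
            # Rearranged probe: first word of weak pair k+1, second word of weak pair k+2.
            first = items[2 * ((k + 1) % n_weak)]
            second = items[2 * ((k + 2) % n_weak) + 1]
            rearranged = first @ memory @ second
            hits += intact > criterion            # yes/no: respond "old" above criterion
            false_alarms += rearranged > criterion
            correct += intact > rearranged        # 2AFC: choose the stronger pair
            n_trials += 1

    return hits / n_trials, false_alarms / n_trials, correct / n_trials


# Hypothetical lists: 16 weak pairs (pure-weak) vs. 8 weak + 8 strong pairs (mixed),
# each scored against the criterion reported for that condition.
pure_weak = weak_pair_performance(n_weak=16, n_strong=0, criterion=0.45)
mixed = weak_pair_performance(n_weak=8, n_strong=8, criterion=0.61)
print("pure-weak weak pairs: HR, FAR, p(c) =", np.round(pure_weak, 3))
print("mixed weak pairs:     HR, FAR, p(c) =", np.round(mixed, 3))
```

Because the strong pairs occupy rows and columns of the matrix that weak-pair probes never touch, adding them leaves the weak-pair strengths unchanged. Replacing `np.eye` with overlapping (for example, random Gaussian) item vectors should instead let the strong pairs add item noise that grows with `alpha_strong`.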

The learning rate for weak pairs (α_weak) was fixed at 1.0 in all simulations. Four parameters were optimized in the model fit: the learning rate for strong pairs (α_strong); the response criteria for the pure-weak and mixed conditions (c_PW and c_M); and the variance of the initial matrix values, which reflects interference from previous associations (γ). To obtain stable estimates of the choice probabilities, 10,000 Monte Carlo simulations were performed for each set of parameters. The model parameters were optimized using a best-first grid search. The best-fitting parameters were α_strong = 1.65, c_PW = .45, c_M = .61, and γ = .406, and the total sum of squared errors was .00192. The choice probabilities predicted by the matrix model are shown as round dots in the figures of the data for Experiments 1 and 2 presented above.

Despite its simplicity, the model yields an excellent fit to the data. Specifically, the lack of overlap among the item vectors allows it to match the null list strength effects found in both the yes–no data of Experiment 1 and the 2AFC data of Experiment 2. Another consequence of the lack of overlap among item vectors is that the model predicts equivalent FARs for strong and weak rearranged pairs in a mixed list of strong and weak pairs. In Experiment 1, we found nearly equivalent FARs for weak and strong pairs in the mixed condition (although strong pairs showed a slight, nonsignificant increase in FARs), which is consistent with the findings of several previous studies (Buchler et al., 2008; Cleary et al., 2001; Kelley & Wixted, 2001; Mickes et al., 2010). Although some studies have shown lower FARs for strong than for weak pairs in a mixed list (Light, Patterson, Chung, & Healy, 2004; Malmberg & Xu, 2007; Xu & Malmberg, 2007), this pattern is relatively rare, and it appears to depend on factors such as the time spent responding (Light et al., 2004; Malmberg & Xu, 2007) and the stimulus type (Xu & Malmberg, 2007).

We also implemented the assumptions of Hockley and Niewiadomski (2007) in our fit to the data from Experiment 1 (yes–no testing) by allowing the criterion to vary across conditions but not from trial to trial. The more conservative criterion in the mixed condition allowed the model to account for the lower hit rates and FARs in that condition relative to the pure-weak condition. Additionally, because strong and weak rearranged pairs are subject to the same degree of interference, owing to the lack of overlap among the item vectors, the model also produced equivalent FARs for them.
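The effect of a condition-level criterion can be worked out in closed form under one concrete reading of the orthogonal matrix model: assume an intact pair's strength is Normal(α, γ) and a rearranged pair's strength is Normal(0, γ), with α_weak = 1.0 and γ = .406. The Python sketch below, whose helper names are hypothetical, shows a single criterion shift lowering both the hit rate and the FAR, while the FAR for rearranged pairs does not depend on α at all. These analytic rates illustrate the mechanism; they are not the Monte Carlo predictions plotted for Experiment 1.

```python
from math import erf, sqrt


def normal_sf(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd)."""
    return 0.5 * (1.0 - erf((x - mean) / (sd * sqrt(2.0))))


def yes_no_rates(criterion, alpha=1.0, gamma=0.406):
    """Hit and false-alarm rates if intact ~ N(alpha, gamma) and rearranged ~ N(0, gamma)."""
    sd = sqrt(gamma)
    hit_rate = normal_sf(criterion, alpha, sd)  # intact pair: learned strength plus noise
    fa_rate = normal_sf(criterion, 0.0, sd)     # rearranged pair: noise only
    return hit_rate, fa_rate


print(yes_no_rates(0.45))  # pure-weak criterion: roughly (0.81, 0.24)
print(yes_no_rates(0.61))  # stricter mixed criterion: roughly (0.73, 0.17)
# The rearranged-pair FAR ignores alpha, so strong and weak rearranged pairs
# tested against the same criterion share a single FAR.
print(yes_no_rates(0.61, alpha=1.65)[1] == yes_no_rates(0.61, alpha=1.0)[1])  # True
```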

As we mentioned in the introduction, positive list strength effects have been found in cued recall (Kahana et al., 2005; Ratcliff et al., 1990), in contrast to the null list strength effect in associative recognition found in the present investigation. A number of theories posit that associative recognition decisions reflect the outcomes of two retrieval processes: a familiarity-based component similar to the one employed in the matrix model, and a slower cued-recall component that is used to reject rearranged pairs (a “recall-to-reject” process; Light et al., 2004; Malmberg, 2008; Malmberg & Xu, 2007; Rotello & Heit, 2000; Rotello, Macmillan, & Van Tassel, 2000; Xu & Malmberg, 2007). Although we were able to model our results effectively without an additional cued-recall process, our present results and modeling do not necessarily argue against dual-process models of associative recognition. First, dual-process models that predict a positive list strength effect in cued recall may be able to account for the present data if the contribution of cued recall is small relative to that of the familiarity process, or if the cued-recall process contributes only under circumstances not present in our experiments (such as when participants are forced to wait before responding; see, e.g., Light et al., 2004). Second, the positive list strength effects found in cued recall have come from investigations that did not employ all of the controls used in our experiments, such as equating the retention intervals and test positions of the tested pairs across the list strength conditions. As was stated in the Method section, these confounds can artifactually produce a list strength effect by lowering performance in the conditions of higher list strength. Given that the cued-recall list strength effects found by Ratcliff et al. (1990) and Kahana et al. (2005) were quite small in magnitude, it would not be surprising if an investigation that controlled for these confounds were to show a null list strength effect in cued recall, as in associative recognition.

One of the rather perplexing aspects of the null list strength effect that we found is how it contrasts with the positive list strength effect found in the plurality discrimination task (Buratto & Lamberts, 2008; Norman, 2002). The plurality discrimination task, in which participants study words like “cat” or “bananas,” involves testing lures that are the same words as the studied words, but with changed plurality (e.g., “cats” instead of “cat” or “banana” instead of “bananas”). Malmberg (2008) noted similarities between the plurality discrimination task and the associative recognition task: in both, participants are asked to discriminate between studied content and highly similar content. On the basis of this similarity, one might expect positive list strength effects in both tasks. Even more surprisingly, Buratto and Lamberts used an experimental design very similar to ours, which equated retention intervals across the pure-weak and mixed conditions, and they found a positive list strength effect for plurality discrimination. In addition, their strength ratio was lower than ours (their strong items were presented three times, whereas our strong pairs were presented four times), so the discrepancy between the positive list strength effect that they found in plurality discrimination and the null list strength effect that we found in associative recognition is unlikely to reflect differences in power.

It is unclear how computational models can address this discrepancy between the plurality discrimination and associative recognition tasks. Although some theorists have suggested that an additional recall process plays a critical role in the plurality discrimination task (Hintzman et al., 1992; Malmberg, Holden, & Shiffrin, 2004), Buratto and Lamberts (2008) found no effect of list length on plurality discrimination judgments but did find a positive list strength effect in the same experiment. Models of cued recall generally make the opposite predictions: they naturally predict a list length effect, because longer lists enlarge the search set, whereas whether they predict a list strength effect depends on how the model is parameterized. In discussing a variant of the SAM model that predicts a null list strength effect in item recognition, Shiffrin et al. (1990) noted that the model could predict a positive list strength effect in cued recall only if the context cue were given greater weight than the item cue at retrieval.

Comparison to the REM model of recognition memory

Another current computational model of recognition memory that can address the associative recognition task is the REM model (Shiffrin & Steyvers, 1997). Like the matrix model that we used, REM represents items as vectors of features, but it represents associations as concatenations of item vectors rather than as outer products. REM predicts a null list strength effect in item recognition because of its differentiation mechanism, in which repetitions of an item accumulate into a single memory trace that becomes more responsive to its own cue but less responsive to other cues. Consequently, strong memory traces respond more weakly to all other cues, whether those cues are targets or distractors; interference therefore decreases as list strength increases, and a null list strength effect is predicted in item recognition (see Criss, 2006, for an illustration of why differentiation models make this prediction).

We conducted simulations of our own and found that REM predicts a null list strength effect in associative recognition, meaning that our results are consistent with the basic REM model. This prediction follows directly from the differentiation mechanism: As strength increases, concatenated memory traces respond more weakly to other cues (assuming that the cues share no words with the trace), and d′ remains roughly constant. Thus, REM predicts a null list strength effect because item noise decreases as the strength of the pairs increases, whereas the matrix model that we have presented predicts a null list strength effect because item noise is absent, owing to the orthogonal item representations. However, the two models make different predictions with regard to manipulations of list length (see Footnote 1).

One problem with REM, however, is that it predicts a much larger FAR to strong than to weak rearranged pairs in the mixed condition, because associations are formed by concatenating the memory traces of the paired items. To understand how this works, consider a case in which pairs A–B and C–D are studied, producing a concatenated A–B memory trace and a C–D memory trace. If a rearranged pair such as A–C were tested, the A features in the probe would partially match the A–B trace, and the C features in the probe would partially match the C–D trace. As the strengths of the A–B and C–D traces increase, the strengths of the two partial matches increase, raising the FAR. This pattern has been noted previously in the literature, and various solutions have been proposed, such as adding a recall-to-reject process (Malmberg, 2008; Malmberg & Xu, 2007; Xu & Malmberg, 2007) or features that represent the ensemble itself (Criss & Shiffrin, 2005). Although both are plausible solutions, the relative advantages and disadvantages of these model extensions have yet to be compared.

The matrix model that we have presented does not require an additional recall process or distinctions between item and ensemble features to predict equivalent FARs across different levels of list strength. The orthogonal item representations lead to outer product associations that are completely dissimilar to each other, such that an A–B association has no match to an A–C rearranged pair. Increasing the strength of the A–B association therefore has no bearing on the match between the A–C cue and the contents of memory, so the FARs are equivalent for weak and strong rearranged pairs.
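The contrast between the two association schemes can be made concrete with a small Python sketch. It is not REM's likelihood-ratio computation: it replaces REM's matching rule with a plain dot product and treats strength as a scalar multiplier on each stored trace, both simplifying assumptions, and the function names are hypothetical. Because a concatenated vector keeps cue and target features in separate slots, the sketch uses the rearranged probe A–D, which keeps each word in the position in which it was studied, so that each half of the probe can line up with a stored trace. Even with orthogonal items, the concatenated traces match the rearranged probe more strongly as strength grows, whereas the outer-product matrix returns a match of zero at every strength.

```python
import numpy as np


def concatenation_match(strength, a, b, c, d):
    """Summed dot-product match of probe A-D to concatenated traces A-B and C-D."""
    traces = [strength * np.concatenate([a, b]), strength * np.concatenate([c, d])]
    probe = np.concatenate([a, d])  # rearranged pair, words kept in studied positions
    return sum(float(probe @ trace) for trace in traces)


def outer_product_match(strength, a, b, c, d):
    """Match of probe A-D to a memory matrix holding outer products A-B and C-D."""
    memory = strength * (np.outer(a, b) + np.outer(c, d))
    return float(a @ memory @ d)


a, b, c, d = np.eye(4)  # four mutually orthogonal (one-hot) item vectors
for s in (1.0, 1.65):
    print(s, concatenation_match(s, a, b, c, d), outer_product_match(s, a, b, c, d))
# Concatenation match grows with strength (2.0, then 3.3); outer-product match stays 0.0.
```

The outer-product result follows from aᵀ(s·a bᵀ + s·c dᵀ)d = s(a·a)(b·d) + s(a·c)(d·d), in which both terms vanish when b is orthogonal to d and a is orthogonal to c.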

Word frequency effects in associative recognition

Although the word frequency manipulation was only peripheral to our investigation of the list strength paradigm, it is somewhat surprising that discrepant results emerged in Experiments 1 and 2. In Experiment 1, we found a higher FAR for low- than for high-frequency pairs, which is consistent with some previous investigations of the word frequency effect in associative recognition (Clark, 1992; Clark & Burchett, 1994; Clark & Shiffrin, 1992; Kinnell & Dennis, 2012). In contrast, our second experiment showed no differences in p(c) between low- and high-frequency pairs. Although we were not able to find any previous studies that had investigated word frequency differences in associative recognition using the 2AFC testing method, some previous studies have shown no differences in performance between low- and high-frequency pairs while showing the usual mirror effect in item recognition with the same stimuli (Hockley, 1994; Ratcliff, Thapar, & McKoon, 2011). These disparate results make it unclear how models that successfully account for the word frequency effect in item recognition could also explain word frequency results in associative recognition.

Conclusion

In two experiments, we examined the presence of a list strength effect in associative recognition and found no such effect: Strengthening a set of interference pairs caused no decrement in performance for a set of nonstrengthened pairs. Additionally, we found a near equivalence between the FARs for weak and strong rearranged pairs in a mixed list of strong and weak pairs (the mixed condition). We have demonstrated that both findings can be explained using a model in which item representations do not overlap with each other.