Associative recognition and the list strength paradigm

Osth, Adam F.; Dennis, Simon

doi:10.3758/s13421-013-0386-6

Published: 07 December 2013

Volume 42, pages 583–594 (2014)
Memory & Cognition

Abstract

When a subset of list items is strengthened, the discriminability of the nonstrengthened items is unaffected. This regularity has been dubbed the null list strength effect (LSE), and despite its many replications in item recognition, little research has investigated whether an LSE occurs in associative recognition. We conducted two experiments in which a set of pairs were studied once and a set of interference pairs were studied either once (pure-weak-list condition) or four times (mixed-list condition). Equivalent levels of performance for the nonstrengthened pairs were observed in both the pure-weak and mixed conditions using both yes–no and two-alternative forced choice testing. Additionally, equivalent false alarm rates were observed between rearranged pairs composed of weak and strong items. Both sets of results were found to be consistent with a matrix model that has no overlap among its item representations.

Recognition memory involves the judgment of prior occurrence of a learned stimulus. Recognition can be tested in a number of different ways; in the case of item recognition, a list of single items is studied and memory is tested by asking the participant to discriminate between studied and unstudied items. In associative recognition, pairs of items (such as A–B, C–D, etc.) are studied and memory is tested by having the participant discriminate between studied pairs (e.g., A–B; referred to as intact pairs) and pairs that are composed of studied words in a new arrangement (such as A–D; referred to as rearranged pairs).

A watershed moment in recognition memory theorization came with the advent of the global matching models of recognition memory (Clark & Gronlund, 1996; Humphreys, Pike, Bain, & Tehan, 1989b), which include the theory of distributed associative memory (TODAM; Murdock, 1982), the search of associative memory model (SAM; Gillund & Shiffrin, 1984), the Minerva 2 model (Hintzman, 1988), and the matrix model (Humphreys, Bain, & Pike, 1989a; Pike, 1984). In these models, retrieval is specified as a global match between the retrieval cues and the contents of memory, producing a summed “familiarity” value that is the basis of a recognition memory decision. What was considered a success of these models at the time was the ability to account for the list length effect in recognition memory performance, in which performance is reduced as the length of a study list is increased (Strong, 1912). Global matching models predict a list length effect because the matching strength between the probe cue and each stored item representation has nonzero variance, due to spurious matches between the features of the cue and the stored item. When the matching strengths are summed together, the mean and variance of the resulting familiarity distribution are the sums of the means and variances of the matching strengths between the probe cue and the stored item representations. Thus, as more items are added to the contents of memory, the familiarity distributions have more variance components, increasing the overlap between the distributions for targets and distracters and decreasing discriminability as a consequence (Clark & Gronlund, 1996). Since all of the interference in these models comes from the studied items in the list episode (as opposed to preexisting memories), these models have been described as item noise models.
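The variance argument can be written out explicitly. The following is a simplified sketch rather than the formulation of any one of the models cited above: it assumes L stored items, statistically independent match strengths, a self-match with mean μ_s, and matches to other items with mean μ_o, all with a common variance σ². The symbols F, m_i, L, μ_s, μ_o, and σ² are introduced here for illustration only.

```latex
F = \sum_{i=1}^{L} m_i, \qquad
\mathrm{E}[F] = \sum_{i=1}^{L} \mathrm{E}[m_i], \qquad
\mathrm{Var}[F] = \sum_{i=1}^{L} \mathrm{Var}[m_i] = L\sigma^2

\mathrm{E}[F_{\text{target}}] - \mathrm{E}[F_{\text{distractor}}]
  = \bigl(\mu_s + (L-1)\mu_o\bigr) - L\mu_o = \mu_s - \mu_o
\quad\Longrightarrow\quad
d' = \frac{\mu_s - \mu_o}{\sigma\sqrt{L}}
```

Under these assumptions the difference between the target and distractor means does not depend on L, but the standard deviation grows with √L, so d′ falls as the list gets longer.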

The “first wave” of global matching models described by Clark and Gronlund (1996) was abandoned, due to the finding of the null list strength effect in item recognition. A counterintuitive prediction of the global matching models was that increasing the strength of some of the studied items, either by additional study time or repetition, would harm the performance of the nonstrengthened items, due to the additional variance contribution to retrieval from the strengthened items (see Shiffrin, Ratcliff, & Clark, 1990, for formal descriptions of why each of the models makes this prediction). One way to think about why the models made this prediction is to consider a case in which a repeated item is added to the contents of memory. The repetition should contribute additional variance to the retrieval process in the same manner as increasing the length of the study list, and performance should decrease for the nonstrengthened items. Ratcliff, Clark, and Shiffrin (1990) tested the list strength prediction of the global matching models and found that it was incorrect: Weak items were not harmed by the strengthening of other list items, and strong items did not benefit from the accompaniment of weak items on the study list.

The null list strength effect in item recognition has been replicated several times since its initial discovery by Ratcliff et al. (1990; Hirshman, 1995; Kahana, Rizzuto, & Schneider, 2005; Murnane & Shiffrin, 1991a, 1991b; Ratcliff, McKoon, & Tindall, 1994; Ratcliff, Sheu, & Gronlund, 1992; Shiffrin, Huber, & Marinelli, 1995; Yonelinas, Hockley, & Murdock, 1992). Computational models have since been developed that can properly predict a null list strength effect in item recognition. These include the retrieving-effectively-from-memory model (REM; Shiffrin & Steyvers, 1997), the subjective likelihood in memory model (SLiM; McClelland & Chappell, 1998), and the bind–cue–decide model of episodic memory (BCDMEM; Dennis & Humphreys, 2001).

Despite the generality of the null list strength effect in item recognition, little work has been done to establish whether the effect applies to associative recognition. Congruence between the two tasks is not guaranteed: Positive list strength effects have been found in free recall (Malmberg & Shiffrin, 2005; Ratcliff et al., 1990; Tulving & Hastie, 1972) and cued recall (Kahana et al., 2005; Ratcliff et al., 1990), although it should be mentioned that in the studies by Ratcliff et al. (1990), the cued-recall list strength effect was smaller and less robust than the one found in free recall. Additionally, positive list strength effects have been found in the plurality discrimination task (Buratto & Lamberts, 2008; Norman, 2002), in which participants study a list of nouns that are singular or plural and are asked to discriminate between studied nouns and switched-plurality lures (Hintzman, Curran, & Oppy, 1992). Given the similarities that have been noted between the plurality discrimination and associative recognition tasks (both involve discrimination of studied content from highly similar lures; Malmberg, 2008; Xu & Malmberg, 2007), it would not be unreasonable to expect to find a list strength effect in the associative recognition task as well.

Only a couple of studies have investigated associative recognition using a list strength paradigm. A study by Murnane and Shiffrin (1991b) showed no effect of list strength on intact–rearranged discrimination. However, the materials used in the study were sentences, and it is not exactly clear how the use of linguistic materials may have affected the results. A study by Verde and Rotello (2004) reported a small positive effect of list strength on associative recognition performance. However, their design was somewhat unconventional, since it used a single list composition with different sets of overlapping pairs (e.g., A–B, B–C); they found that strengthening half of the pairs in an overlapping set reduced performance on the weak pairs from that set, relative to a separate, overlapping set of pairs in which all of the pairs were weak. In other words, if pairs such as A–B, B–C, and D–E were studied, and A–B was presented three times, B–C would elicit worse performance than D–E. This design somewhat resembles an associative interference design, in which the fan (the number of times that each item in a pair occurs in other pairs) of the items is manipulated (e.g., Buchler, Light, & Reder, 2008). To our knowledge, no published list strength experiment has featured a traditional design in which unrelated and nonoverlapping pairs of words are strengthened, despite the relevance of the potential findings to computational models of recognition memory.

Representational overlap and the list strength effect

Dennis and Humphreys (2001) argued that computational models of recognition memory can predict a null list strength effect if interference among the item representations (item noise) is sufficiently low. Dennis and Humphreys presented a model in which all of the interference comes from associations to prior contexts. The model exhibits no item noise, because all of the item representations are orthogonal to each other; consequently, there is no overlap among their representations. If one views item representations as vectors of psychological features, orthogonal item representations imply that the items do not share features and are completely dissimilar to each other. Although this might strike some readers as implausible, a number of hippocampal theories describe the role of the hippocampus as creating sparse representations from distributed neocortical inputs, such that highly similar events can yield distinct nonoverlapping episodic representations in the hippocampus (Marr, 1971; McClelland, McNaughton, & O’Reilly, 1995; Norman & O’Reilly, 2003; O’Reilly & McClelland, 1994). Additionally, orthogonal item representations have been employed in recent models of free recall, including the temporal context model and its variants (TCM; Howard & Kahana, 2002; Polyn, Norman, & Kahana, 2009; Sederberg, Howard, & Kahana, 2008) and the model of Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, and Usher (2005).

When item overlap is zero, a probe cue only matches its own previously stored memories (both in the experimental context and its preexperimental contexts) and does not match the representations of the other list items. Therefore, both the number and strength of the other list items cannot hinder performance, and null effects of list length and list strength on recognition memory performance are predicted. Although the prediction of a null list length effect might seem problematic for this view, Dennis and colleagues noted that a number of confounds in list length designs artifactually contribute to the finding of a list length effect. When such confounds are controlled, investigations have shown no effect of list length on recognition memory performance for item recognition (Dennis & Humphreys, 2001; Dennis, Lee, & Kinnell, 2008; Kinnell & Dennis, 2011) and associative recognition (Kinnell & Dennis, 2012).

The BCDMEM model cannot be used to derive predictions for the associative recognition paradigm, because the model lacks a mechanism for interitem associations. Nonetheless, we can implement some of the assumptions of the BCDMEM model, specifically orthogonal item codes and interference from preexperimentally stored memories, in the matrix model (Humphreys et al., 1989a, 1989b; Pike, 1984). In the matrix model, items are represented as vectors, and associations are represented as outer products of the constituent item vectors. For a list of pairs, all associations are summed to produce a memory matrix M that is a composite of all stored memories. At test, the outer product of the two item cues produces a matrix cue that is matched with M by taking the dot product between the two matrices (i.e., the sum of the products of their corresponding elements).
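The storage and matching operations just described can be illustrated with a minimal sketch. It assumes NumPy, and the function names encode_pairs and match, the four-dimensional item vectors, and the learning rate of 1.0 are hypothetical choices made only for this illustration.

```python
import numpy as np

def encode_pairs(pairs, learning_rate=1.0):
    """Sum the outer products of each pair's item vectors into a memory matrix M."""
    dim = pairs[0][0].shape[0]
    M = np.zeros((dim, dim))
    for a, b in pairs:
        M += learning_rate * np.outer(a, b)
    return M

def match(M, a, b):
    """Match the cue matrix (outer product of a and b) against M.

    The matrix dot product is the sum of elementwise products.
    """
    return float(np.sum(np.outer(a, b) * M))

# Four orthonormal item vectors (rows of an identity matrix); an assumption
# chosen so that the result is easy to check by hand.
A, B, C, D = np.eye(4)
M = encode_pairs([(A, B), (C, D)])

intact = match(M, A, B)       # studied pair A-B
rearranged = match(M, A, D)   # studied words in a new arrangement
print(intact, rearranged)
```

With orthonormal item vectors, the intact probe matches only its own stored association and the rearranged probe matches nothing, which is the property the orthogonal variant of the model relies on.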

Whereas the original matrix model was an item noise model, if the item vectors were orthogonal, the resulting outer product associations would be orthogonal to each other as well. What this means is that for a list of pairs A–B, C–D, E–F, and so forth, the matrix representing the A–B pair would not exhibit any match to those for the C–D pair and the E–F pair, and vice versa. Additionally, whereas it is common for memory models to assume the contents of memory are empty prior to list presentation, we instead simulated prior associations in memory by initializing the matrix with noise prior to list presentation.

We simulated a list strength paradigm with both the original matrix model and a version in which all of the item vectors were orthogonal and preexperimental interference was included in the memory matrix. In the former case, 64-element vectors were generated from a normal distribution with μ = 0 and σ² = 1/64, so that the vectors had unit length in expectation. In the latter case, 64-element vectors were generated from an identity matrix, and preexperimental interference was simulated by initializing M with samples from a normal distribution with μ = 0 and σ² = .0006.

A list strength paradigm was simulated by storing 16 pairs with a learning rate of .05 along with an additional 16 interference pairs with a learning rate that varied between .05 and 1.0. Response rates were derived by comparing memory strengths to a criterion of .025. The model predictions can be seen in Fig. 1. One can see that with the standard matrix model, performance decreases quite rapidly as the strength of the list is increased. However, for the orthogonal matrix model, d′ is constant for all learning rates of the interference pairs. Additionally, the standard matrix model predicts that the false alarm rate (FAR) for interference pairs should increase as their strength is increased. This is because a rearranged pair A–D overlaps with the stored pairs A–B and C–D. Thus, strengthening of A–B and C–D results in more variance in the distribution of strength values for A–D. For the orthogonal matrix model, however, the rearranged pair A–D does not overlap with the partially matching A–B and C–D pairs, and the FAR thus remains constant as strength is increased.
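A rough sketch of how such a simulation could be set up is given below. It is not the authors' code and is not expected to reproduce Fig. 1 exactly: the number of simulated lists, the random seed, the way rearranged probes are formed, and the use of a pooled-variance d′ computed directly from the strength distributions are all assumptions made for illustration. The vector dimensionality, learning rates, noise variance, and criterion follow the values given in the text, and only the critical (weak) pairs are probed.

```python
import numpy as np

N_PAIRS = 16        # critical pairs, and also interference pairs (from the text)
DIM = 64            # vector dimensionality (from the text)
WEAK_LR = 0.05      # learning rate for the critical pairs (from the text)
CRITERION = 0.025   # response criterion (from the text)
NOISE_VAR = 0.0006  # variance of preexperimental noise in M (from the text)

def simulate(strong_lr, orthogonal, n_lists=200, seed=0):
    """Return (hit rate, false alarm rate, d') for the critical weak pairs."""
    rng = np.random.default_rng(seed)
    intact, rearranged = [], []
    for _ in range(n_lists):
        if orthogonal:
            # Orthonormal item codes (identity rows) plus preexperimental noise in M
            items = np.eye(DIM)[rng.permutation(DIM)]
            M = rng.normal(0.0, np.sqrt(NOISE_VAR), size=(DIM, DIM))
        else:
            # Random item codes with expected unit length; memory starts empty
            items = rng.normal(0.0, np.sqrt(1.0 / DIM), size=(4 * N_PAIRS, DIM))
            M = np.zeros((DIM, DIM))
        left, right = items[0::2], items[1::2]  # pair i is (left[i], right[i])
        lrs = np.r_[np.full(N_PAIRS, WEAK_LR), np.full(N_PAIRS, strong_lr)]
        # Add the learning-rate-weighted sum of outer products to M
        M += np.einsum("i,ij,ik->jk", lrs, left, right)
        # S[i, j] = left[i]^T M right[j], the match of cue pair (i, j) with M
        S = left[:N_PAIRS] @ M @ right[:N_PAIRS].T
        idx = np.arange(N_PAIRS)
        intact.extend(S[idx, idx])
        # Assumed rearrangement: left word of pair i with right word of pair i + 1
        rearranged.extend(S[idx, (idx + 1) % N_PAIRS])
    intact, rearranged = np.array(intact), np.array(rearranged)
    hit_rate = float(np.mean(intact > CRITERION))
    fa_rate = float(np.mean(rearranged > CRITERION))
    pooled_sd = np.sqrt((intact.var() + rearranged.var()) / 2)
    d_prime = float((intact.mean() - rearranged.mean()) / pooled_sd)
    return hit_rate, fa_rate, d_prime

for lr in (0.05, 0.5, 1.0):
    print(lr, simulate(lr, orthogonal=False)[2], simulate(lr, orthogonal=True)[2])
```

Under these assumptions, the standard variant should show d′ for the critical pairs falling as strong_lr grows, whereas the orthogonal variant should show essentially the same d′ at every value of strong_lr, because the interference pairs never enter the match for a critical probe.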

One should note that this is a very simplified model of associative recognition performance and is by no means comprehensive. This simulation was merely meant to demonstrate that a matrix model employing some of BCDMEM’s assumptions makes similar predictions for the associative recognition paradigm. Specifically, no interference is present among the stored list pairs, a null list strength effect is predicted, and the FARs for both weak and strong rearranged pairs are equivalent. Although we used orthogonal representations to demonstrate a null list strength effect, we are more committed to the idea that overlap between representations is minimal than to strict orthogonality.

Present investigation

We conducted two list strength experiments in which 32 pairs of words were studied: Half of these pairs were the critical pairs that were later tested, and half were interference pairs. In the pure-weak condition, all pairs were studied once. In the mixed condition, the interference pairs were presented four times (similar paradigms were used by Buratto & Lamberts, 2008; Diana & Reder, 2005; Norman, 2002; Norman, Tepe, Nyhus, & Curran, 2008). A list strength effect would be observed if performance on the critical pairs is worse in the mixed condition than in the pure-weak condition. The experiment was conducted using both yes–no (Exp. 1) and two-alternative forced choice (2AFC; Exp. 2) testing methods to ensure the generality of the results. Our procedure, design, and parameters were very similar to those used in the list length experiment of Kinnell and Dennis (2012), which revealed a null effect of list length on associative recognition performance.

Another utility of our experimental design is that it enables us to measure the strength-based mirror effect (SBME) in associative recognition. A regularity in item recognition memory is that whenever the strength of a set of items is increased, the hit rate for the strengthened items increases and the false alarm rate decreases (Hirshman, 1995; Stretch & Wixted, 1998). This can also be observed as a more conservative response criterion (c) in conditions of stronger list strength; a meta-analysis conducted by Hirshman (1995) showed this pattern to be present in nearly all published list strength designs. In associative recognition, in contrast, when a mixed list of strong and weak pairs is studied, the false alarm rate for rearranged pairs composed of words from weak pairs (weak FAR) is often equivalent to the false alarm rate for rearranged pairs composed of words from strong pairs (strong FAR; Buchler et al., 2008; Cleary, Curran, & Greene, 2001; Kelley & Wixted, 2001; Mickes, Johnson, & Wixted, 2010). However, when pure weak lists are compared to pure strong lists, a robust strength-based mirror effect is observed, with FARs being significantly lower in the pure strong list (Clark & Shiffrin, 1992; Hockley & Niewiadomski, 2007).

Hockley and Niewiadomski (2007) proposed that the discrepancy among the associative recognition results can be explained if one assumes that a response criterion is set on the basis of list strength but does not change from trial to trial throughout testing (this hypothesis was proposed in item recognition by Stretch & Wixted, 1998). Thus, the FAR is lower when the strength of a list is increased, but rearranged pairs of different strengths on a given list are subject to the same unchanging response criterion, and thus FARs are identical for strong and weak rearranged pairs. However, this hypothesis aimed to explain results from different experiments and it would be useful to obtain both an across-condition SBME and a mixed-list null SBME in the same experiment. If the hypothesis of Hockley and Niewiadomski is correct, the FAR for weak pairs should be lower in the mixed condition than in the pure-weak condition, a higher value of the response criterion c should be observed in the mixed list relative to the pure-weak list, but the FARs for weak and strong pairs should be equivalent in the mixed condition.

In addition to the list strength comparisons, we also manipulated the word frequency of the items in each word pair. We manipulated this variable in part to keep our design as similar as possible to the list length associative recognition experiment conducted by Kinnell and Dennis (2012), but also because only a small number of studies have evaluated the presence of a word frequency effect in associative recognition. This stands in sharp contrast to the item recognition literature, where the effects of word frequency have been extensively investigated (see Glanzer & Adams, 1985, for a review) and have constrained model development (Dennis & Humphreys, 2001; Glanzer, Adams, Iverson, & Kim, 1993; McClelland & Chappell, 1998; Shiffrin & Steyvers, 1997).

To foreshadow our results, null effects of list strength on recognition memory performance were found in both experiments. The FAR results in Experiment 1 were in concordance with the hypothesis of Hockley and Niewiadomski (2007).

Experiment 1

Method

Participants

A total of 96 participants contributed to this experiment. Participants were undergraduate students at Ohio State University who participated in exchange for credit in an introductory psychology course.

Materials

The stimuli were 240 words between five and seven letters in length that were sampled from the Sydney Morning Herald Word Database (Dennis, 1995). To be consistent with the procedure and stimuli used by Kinnell and Dennis (2012), a word frequency manipulation was also included: Half of the words were high-frequency words (HF; 100–200 occurrences per million) and half were low-frequency (LF; 1–4 occurrences per million). All words were randomly paired with other words from the same frequency class and were randomly assigned to the pure-weak or mixed-list conditions.

Procedure

A diagram of the basic procedure can be seen in Fig. 2. Participants completed both the pure-weak and mixed-list conditions, and the order of the conditions was counterbalanced across participants. In both conditions, a total of 32 pairs were presented to the participants at 3,000 ms per pair, with a blank screen occurring between presentations for 250 ms. Participants were also instructed to rate the degree of semantic relatedness between the two words presented on the screen on a 4-point scale and to press a corresponding key on the keyboard during presentation. Upon registration of the participant’s response, the pair remained on the screen for the remainder of the 3,000 ms.

In the pure-weak condition, all pairs were presented once during the study list. In the mixed condition, all pairs were presented once, and subsequently, half of the pairs (the interference pairs) were repeated an additional three times, leading to a total of four presentations for the strengthened pairs. During the repetitions, all of the interference pairs were shuffled, and all of them were presented before an additional repetition occurred. That is, if a given interference pair was presented twice, all other interference pairs were then presented twice before a third presentation of any of the pairs was possible.
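The repetition constraint for the mixed condition can be sketched as follows. This is an illustrative reconstruction, not the experiment's actual pyEPL script; the function name mixed_study_list, its parameters, and the assumption that the first pass through all pairs is randomly ordered are hypothetical.

```python
import random

def mixed_study_list(critical, interference, extra_reps=3, rng=None):
    """Build a mixed-condition study order.

    All pairs appear once, then the interference pairs are repeated in
    shuffled blocks, so every interference pair reaches n presentations
    before any of them receives presentation n + 1.
    """
    rng = rng or random.Random()
    first_pass = critical + interference  # new list; the inputs are not mutated
    rng.shuffle(first_pass)               # assumption: first pass is randomized
    schedule = list(first_pass)
    for _ in range(extra_reps):
        block = list(interference)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

# Placeholder labels stand in for word pairs
critical = [f"C{i}" for i in range(16)]
interference = [f"I{i}" for i in range(16)]
schedule = mixed_study_list(critical, interference, rng=random.Random(1))
print(len(schedule))
```

With 16 critical and 16 interference pairs, the schedule contains 32 + 3 × 16 = 80 presentations.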

After the completion of the study list, participants completed a distractor task, in which images of playing cards appeared on the screen and they were instructed to press the space bar when they observed specific sequences (such as two cards in a row that shared the same suit or two cards that summed to 11). The distractor activity lasted 366 s in the pure-weak condition and 210 s in the mixed condition. The purpose of the different distractor lengths was to ensure that the time between the beginning of the study list and the beginning of the test list was the same for both conditions.
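The two distractor durations can be checked against the presentation timings given earlier. The sketch below assumes that each presentation is followed by one 250-ms blank and ignores instruction screens and any other overhead; the constant and function names are hypothetical.

```python
PRESENTATION_S = 3.0  # 3,000 ms per pair
BLANK_S = 0.25        # 250-ms blank screen (assumed to follow every presentation)

def study_duration(n_presentations):
    """Total study time in seconds for a given number of pair presentations."""
    return n_presentations * (PRESENTATION_S + BLANK_S)

pure_weak_study = study_duration(32)        # 32 pairs, once each
mixed_study = study_duration(32 + 16 * 3)   # plus 3 repetitions of 16 interference pairs

total_pure_weak = pure_weak_study + 366     # study + distractor, pure-weak condition
total_mixed = mixed_study + 210             # study + distractor, mixed condition
print(total_pure_weak, total_mixed)
```

Under these assumptions the mixed study list lasts 156 s longer than the pure-weak list, which matches the 156-s difference between the two distractor durations, so the interval from the start of study to the start of test is equal in both conditions.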

Before each test list, participants were instructed that they would be presented with pairs that were identical to the studied pairs, along with pairs that were composed of studied words in novel arrangements; the former pairs should be endorsed, and the latter should be rejected. In the pure-weak condition, participants were tested on eight weak intact pairs and eight weak rearranged pairs (i.e., rearranged pairs composed of once-presented words) in a randomized order. In the mixed condition, participants were tested on eight weak intact pairs and eight weak rearranged pairs, and subsequently were tested on eight strong intact pairs and eight strong rearranged pairs (i.e., rearranged pairs in which both words had been presented four times).

Although some might find it unusual that the weak pairs were tested before the strong pairs in the mixed condition, this design was chosen to ensure that the test positions for the weak pairs were the same in both the pure-weak and mixed conditions. Performance has been shown to decline across test positions in item recognition (Criss, Malmberg, & Shiffrin, 2011; Ratcliff & Murdock, 1976), and a mixed test list of strong and weak pairs would make it such that weak pairs would be tested at later positions in the mixed condition than in the pure-weak condition, potentially degrading performance and artifactually inducing a list strength effect (pure weak d′ > mixed d′). For this reason, we did not include an additional pure-strong condition with 100 % studied items; it would then be impossible to simultaneously control for retention interval and test position across all three conditions.

Each pair group (weak intact, weak rearranged, strong intact, and strong rearranged) was equally divided into high- and low-frequency pairs. In the rearranged distractor pairs, the left–right orientation of the words from their study presentations was preserved, and the rearranged and intact pairs did not overlap.

The experiment was designed and run using pyEPL (Geller, Schleifer, Sederberg, Jacobs, & Kahana, 2007).

Results

The data were analyzed using equal-variance signal detection measures. Although research using the receiver operating characteristic (ROC) has shown that target variability is approximately 1.25 times the variability of the lure distribution in item recognition (Glanzer, Kim, Hilford, & Adams, 1999; Ratcliff et al., 1994; Ratcliff et al., 1992), equal-variance signal detection models have been found to yield a good fit to the ROC in associative recognition (Kelley & Wixted, 2001; Mickes et al., 2010). To avoid infinite values of d′, edge corrections were performed by adding .5 to the hit and false alarm counts and 1 to the target and distractor counts (Snodgrass & Corwin, 1988). This correction was only performed for the signal detection analyses and all statistical analyses on response rates were on the raw, uncorrected response rates.
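As an illustration of the edge correction and the equal-variance measures, the sketch below computes d′ and c from raw counts. The function name sdt_measures is hypothetical, and the formulas for d′ and c are the standard equal-variance definitions, which are assumed here rather than quoted from the analysis scripts.

```python
from statistics import NormalDist

def sdt_measures(hits, n_targets, false_alarms, n_lures):
    """Equal-variance d' and criterion c with the edge correction applied.

    Adds .5 to the hit and false alarm counts and 1 to the target and
    distractor counts, so that rates of 0 or 1 cannot occur.
    """
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_lures + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# A participant with perfect performance on 8 intact and 8 rearranged pairs
d_prime, criterion = sdt_measures(hits=8, n_targets=8, false_alarms=0, n_lures=8)
print(d_prime, criterion)
```

Without the correction, the hit rate of 1 and false alarm rate of 0 in this example would give an infinite d′; with it, the corrected rates are 8.5/9 and 0.5/9 and d′ is finite.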

The data from 15 participants were excluded for having performance at or below chance (d′ ≤ 0) in one of the experimental conditions. Additionally, the data from one participant were excluded for failing to finish the experiment.

The data from the yes–no tests in this experiment were analyzed using 2 × 2 (Word Frequency × List Strength) repeated measures analyses of variance (ANOVAs). To facilitate exposition of the different results, the list strength results are reported in this section and the word frequency results are reported in the next section. The results for the signal detection measures calculated on the once-presented critical pairs can be seen in Fig. 3, whereas the response rates can be seen in Fig. 4. We found no significant effect of list strength, in that performance on the critical pairs was not significantly worse in the mixed condition (d′ = 1.42) than in the pure-weak condition (d′ = 1.47), F(1, 79) = 0.52, η_p² = .006, p > .05. The response criterion was significantly higher in the mixed condition (c = .163) than in the pure-weak condition (c = −.065), F(1, 79) = 27.91, η_p² = .26, p < .001. This is reflected by both lower hit rates, F(1, 79) = 15.31, η_p² = .16, and lower FARs, F(1, 79) = 12.81, η_p² = .14, to weak pairs in the mixed condition than in the pure-weak condition, both ps < .001.

In the mixed condition, strong hit rates were higher than weak hit rates in the yes–no tests, t(79) = 9.87, p < .001. The difference between the FARs for weak (.148) and strong (.178) rearranged pairs was not significant, t(79) = −1.14, p > .05.

Word frequency results

The word frequency results for Experiment 1 can be seen in Table 1. The difference in the d′s between low-frequency (1.39) and high-frequency (1.57) pairs was only marginally significant, F(1, 79) = 3.02, η_p² = .037, p = .08. Analyses on the response rates indicated that hit rates did not differ significantly across the two word frequency classes, F(1, 79) = 0.172, η_p² = .002, p > .05, but we did observe a significant difference in the FARs, with low-frequency pairs exhibiting higher FARs than high-frequency pairs, F(1, 79) = 7.91, η_p² = .09, p < .01. No significant Condition × Word Frequency interactions emerged.

Table 1 Mean hit rates (HR) and false alarm rates (FAR) for strong and weak pairs for both word frequency classes (low frequency [LF] and high frequency [HF]; standard errors of the means are in parentheses)

Discussion

These results indicated no list strength effect for word pairs in an associative recognition task. To our knowledge, this represents the first demonstration of a null list strength effect using nonoverlapping pairs of words. Although our result is contrary to the positive list strength effect reported by Verde and Rotello (2004), their finding may have been due to the overlapping word pairs employed in their study, which resembled a fan manipulation. Our result is consistent with the null list strength effect in associative recognition using sentence materials found by Murnane and Shiffrin (1991b).

Results on the strength-based mirror effect were consistent with the hypothesis of Hockley and Niewiadomski (2007). The FARs for weak pairs were lower in the mixed condition than in the pure-weak condition, and yet the FARs for strong and weak rearranged pairs were nearly equivalent in the mixed condition. Such a result can be explained by a model in which the response criterion varies by list strength but is not altered from trial to trial on the basis of a pair’s strength.

The word frequency manipulation produced a higher FAR for low-frequency than for high-frequency pairs, but no difference in the hit rates. This resulted in a small difference in d′s between the two word frequency classes that was only marginally significant. This is a replication of the word frequency results found by Kinnell and Dennis (2012), and similar results (higher FAR for LF pairs) were obtained by Clark and colleagues (Clark, 1992; Clark & Burchett, 1994; Clark & Shiffrin, 1992).

For Experiment 2, we wanted to see whether we could replicate our findings using 2AFC testing to evaluate the generalizability of our results.

Experiment 2

Experiment 2 was identical to Experiment 1 with the exception that the test phase consisted of 2AFC trials instead of yes–no recognition trials. Participants were presented with two pairs, an intact pair and a rearranged pair, and were instructed to select which was the studied pair. The generality of the null list strength effect found in Experiment 1 would be evident if the proportions correct, p(c), for weak pairs were equivalent across the pure-weak and mixed conditions. A list strength effect would be observed if p(c) was lower in the mixed than in the pure-weak condition.