Every day we make decisions by identifying, discriminating, comparing, and categorizing objects that have different visual features which come from the same dimension (e.g., colors, sizes, shapes etc.). To give a few examples of this everyday decision-making, imagine you wish to select a ripe banana from a bunch: How do you decide which banana is the most preferable? An obvious approach is to compare the color of the bananas, preferring the bright yellow bananas while avoiding the green and overripe brown bananas. Likewise, a food safety inspector may need to comply with federal guidelines on the color of meat in order to decide whether the rib eye is safe, and must therefore compare the color of the rib eye with a meat safety color chart. A navy ship office may need to identify and interpret the color combination of the signal flags of an approaching vessel. Understanding how we make categorization decisions using stimuli that vary on the same type of feature is a fundamental question in the psychology of perception and cognition. Our interest in these examples and the present paper is not how we identify the color, but rather how that color satisfies some criterion that informs some decision: Is this the ripest banana? Is the meat fresh enough? What is the other ship’s intention?

Recent work has utilized rule-based theories of categorization incorporating theories of response time (RT), not only to characterize decision-making, which necessitates the integration of a range of different features (e.g., size, color, shape) and their configurations, but also to answer more fundamental questions regarding perception, attention, and decision-making (Griffiths, Blunden, & Little, 2017; Little, Nosofsky, & Denton, 2011; Fifić, Little, & Nosofsky, 2010; Little, Nosofsky, Donkin, & Denton, 2013; Moneer, Wang, & Little, 2016). One critical question regards the underlying architecture of decision-making. Here, architecture refers to the organization of mental processes, or the way in which we combine information (Kantowitz, 1974; Sternberg, 1969; Schweickert, 1993; Townsend, 1984). More technically, it refers to distinguishing between serial, parallel, and coactive processing. For example, when selecting a banana do you make a decision on the color of each banana one at a time (i.e., serially)? Or alternatively, do you do so simultaneously, in separate decision-making channels (i.e., in parallel), or do you pool all your perceptual information into one single decision-making channel and make one overall decision (i.e., by processing the colors coactively; (Miller, 1982); see Fig. 1)?

Fig. 1
figure 1

Schematic illustration of the three processing architectures: serial, parallel, and coactive. This example uses a hypothetical stimulus with two dimensions: X and Y. In a serial model, decisions about each dimension are made sequentially, one after the other. In a parallel model, decisions are made simultaneously, but in independent processing channels. In a coactive model, the information from both stimuli is combine into a single processing channel from which a decision is made

Beyond identifying the processing architecture, which is not always straightforward due to pervasive model mimicry in many tasks (Townsend, 1990), a further question is what factors affect the processing architecture of decision-making? For example, characteristics such as whether or not features or dimensions can be attended to in isolation, and the configurations of these dimensions, such as spatial separation, have been shown to play a key role in determining how information from different dimensions is integrated (Moneer et al., 2016; Little et al., 2011). For example, it is difficult to attend to hue, brightness, or saturation independently. It would be therefore impossible to judge the ripeness of a banana based on the hue of its color alone, while ignoring saturation or brightness. Instead, information about these three dimensions must be pooled in order to make a categorization decision. On the other hand, judging which variety of bananas to buy (e.g., the Cavendish or the Lady Fingers, two popular Australian varieties) might require comparison of two separate dimensions, which are easy to attend to in isolation: freshness and price. Such information is likely to be located in two different spatial locations requiring attention to be independently deployed to each, with categorization decisions made independently. The ability to diagnose the underlying processing architecture is therefore crucial as it provides information about the role of spatial attention and selective attention in the sequencing of dimensions for processing. Here we use spatial attention to refer specifically to the allocation of attention across space, while selective attention refers to the more general process of weighting specific stimulus attributes, potentially including distance. Many paradigms for studying attention focus on accuracy, choice data, or mean RT alone (for example, spatial cuing tasks, see e.g., Posner, Snyder, & Davidson, 1980, or visual search, see e.g., Treisman and Gelade (1980)); however, these paradigms are often limited in their ability to differentiate serial, parallel, and coactive architectures from each other.

We seek to add to the growing body of work on visual processing in categorization by testing an important configuration of multidimensional stimuli such as those in the introductory examples: stimuli which are composed of dimensions separated in space but which comprise differing levels of the same type of feature, henceforth referred to as within-dimension features (see Fig. 2 for some examples). This type of stimulus is interesting from the perspective of visual attention as it has features that would appear on the same feature map, for instance, in Treisman and Gelade’s (1980) Feature Integration Theory (FIT). In our experiments, the features are separated spatially and take on different luminance values but can be continuously transformed from one to another. By contrast, this would not be the case for features conceived to be on different feature maps (e.g., shape vs. color), which we term between-dimension features (see Fig. 2). This is a key distinction in many theories of visual search, which find differences in performance between the two stimulus types (Wolfe et al., 1990). It is therefore possible that FIT may foreshadow a difference in categorization strategy for between and within-dimension stimuli.

Fig. 2
figure 2

Examples of different feature types. Left panel: Examples of two experiments utilizing between-dimension stimuli which are separable. The stimuli are between-dimension as they incorporate features which would be located on separate feature maps (namely, color and line position). The stimuli are separable as each dimension can be easily identified in isolation from the other (i.e., the color is red, and the line is to the left). Both of these characteristics hold whether the stimuli are co-located (bottom), or separated in space (top). Middle panel: Examples of stimuli which are within-dimension. These stimuli are within-dimension as they incorporate features which would be located on the same feature map (in these examples, luminance and orientation). By definition, these cannot be overlapped in two-dimensional space. Right panel: An example of an integral stimulus. Color is considered integral as its comprising dimensions (i.e., hue, saturation, and brightness) cannot be readily identified in isolation

Theories of visual attention such as FIT (Treisman & Gelade, 1980) and Guided Search (Wolfe, 1994a, 2007) provide useful insight into how attention operates as a function of which features are present in the visual scene. Although these theories do not explicitly address categorization, they propose that focused visual attention is driven by an early pre-attentive parallel processing stage. In this stage, different visual dimensions (e.g., luminance, orientation, spatial frequency, etc.) are registered separately and only later combined or bound to form a visual representation of an object or visual scene. Guided Search, for example, suggests that different dimensions are represented in separate maps which are summed together to form a master salience map that subsequently guides attention to relevant areas of a visual scene. Because of this guidance, when a target does not share any features with the distractors (e.g., a red square among green squares), search occurs efficiently, and purportedly in parallel. In these cases, the target “pops out”, and search time is generally fast and independent of the number of distractors. If, however, there is not a sufficient difference between the target and the distractors to cause “pop out”, search is not efficient (see e.g., Duncan & Humphreys, 1989) and is instead driven by effortful, attentive processing, which is purportedly more serial in nature (Wolfe, 1994a, 1998, 2007). Importantly, these theories emphasize attention as being driven by the specific features involved in the task, and therefore these tasks are often used to infer how features are processed. Given that we are using within-dimensions features, we may expect that these are combined pre-attentively into the same feature map, which then provides a single signal for decision-making. This would mean decision-making would occur coactively. Independent processing (e.g., serial or parallel) may be more consistent with between-dimension stimuli from different feature maps (indeed, this is what we see in the categorization literature; see e.g., Fifić et al.,, 2010).

To assess the processing of within-dimension features in perceptual categorization, we utilized Systems Factorial Technology (SFT; Townsend & Nozawa, 1995; Little, Altieri, Fifić, & Yang, 2018). Specifically, we investigated stimuli of opposite luminance polarity in separate locations of a visual display in a categorization decision-making task. Several previous studies of visual attention (Duncan & Humphreys, 1989; Wolfe et al., 1990; Mordkoff & Yantis, 1993) have also examined within-dimension features, and we return to these studies in our discussion; we first review related work in perceptual categorization and then present our current experiments.

Dimensional processing in categorization

The diagnosis of processing architecture during decision-making using visual information has been the focus of recent work in categorization (Blunden, Wang, Griffiths, & Little, 2015; Cheng, Moneer, Christie, & Little, 2017; Fifić et al., 2010; Little et al., 2011, 2013; Moneer et al., 2016). This literature draws on theories of visual attention to investigate how information from multi-dimensional sources is integrated during decision-making. The goal of this work is to provide a quantitative and detailed diagnosis of mental architecture using the logical rules modeling framework (Fifić et al., 2010), which complements the SFT analyses (Townsend & Nozawa, 1995; Little et al., 2018) in order to yield patterns of RTs across the entire time course of information processing.

To date, this work has focused on the integration of information from between-dimension features and has been successful in characterizing the underlying architecture of this process using a variety of different dimension types and configurations. Most relevant to the current work is the seminal paper by Fifić et al., (2010) introducing the logical rule models. Fifić et al., (2010) presented participants with two rectangles separated in space, one of which varied in saturation, and the other of which varied in the position of a line contained within the rectangle. Because these dimensions could easily be attended to independently (i.e., because they are separable—see e.g., Garner and Felfoldy, 1970, Garner, 1974—and physically separated), these dimensions could be processed in serial when participants were instructed to do so. Serial processing was also found for spatially separate dimensions when participants were allowed to adopt their own categorization strategies (Little et al., 2011). However, when two separable, between-dimension features were presented in a spatially overlapped fashion (Experiment 2, Little et al., 2011), processing was more parallel.

These findings are interesting as they highlight the importance of spatial configuration and attention in determining the architecture that underlies categorization. Before the introduction of the logical rule models, all models of categorization RT, including the successful exemplar-based random walk model (Nosofsky and Palmeri, 1997), stochastic general recognition theory (GRT; Ashby, 2000), and decision-bound models (Ashby, Boynton, & Lee, 1994; Maddox, 1992; Maddox & Ashby, 1996), assumed that the features of objects were pooled together into single objects (i.e., which we term coactivity). In both of the cases described above, both used separable dimensions, but having features positioned in separate locations induced serial processing, whereas overlapping the features in space produced more parallel processing. This would suggest that feature type is not necessarily a key component affecting processing strategy in categorization decision-making tasks but rather that location of the features is primary. Following this logic, we would expect our within-dimension luminance stimuli to require selective attention to resolve the feature values at each location, and consequently, categorization decision should proceed serially with separate micro-decisions made at each location combined using logical rules to determine the final response.

On the other hand, there are likely additional factors that may determine how features are processed. For instance, not all features can easily be attended to independently or selectively (Garner, 1974; Shepard, 1987; Nosofsky, 1988); these types of features, termed integral features, are thought to be processed holistically or configurally (e.g., hue, saturation, and brightness of colors in the Munsell color space; Lockhead & King, 1977; Nosofsky, 1988). Using the same categorization methodology, Little et al., (2013) found that the categorization of colors varying in saturation and brightness was best described by a coactive process (see also Blunden et al., 2015). Instead of making decisions separately along the brightness dimension and saturation dimension, participants instead pooled information about these dimensions into a single decision-making channel. In contrast to Fifić et al., (2010) and Little et al., (2011), the type of dimension—that is, the fact that the dimensions were integral rather than separable—played a key role in determining the processing architecture.

Integral dimensions necessarily occupy the same spatial location. However, Moneer et al., (2016) investigated categorization of whole-object features, which are features that comprise an entire object, and therefore spatially co-located, but are notionally separable, such as shape and size or shape and color. These whole-object features are between-dimension features in the sense that they would activate different feature maps,Footnote 1 but, despite being comprised of separable features, these features have been traditionally been characterized as integral (Biederman and Checkosky, 1970; Smith & Kilroy, 1979). Using the logical rules paradigm, Moneer et al., (2016) showed that these dimensions elicit independent, multichannel processing (i.e., serial or parallel processing). Similar results have been found with composite faces, which have also been traditionally treated as holistic (Cheng, McCarthy, Wang, Palmeri, & Little, 2018). Hence, the processing of different feature types not only in the same spatial location but comprising the whole object, depends on whether those features are separable or integral.

A natural question arising from these experiments concerns the categorization of within-dimension but spatially separated features of the type which are often used in studies of visual search (e.g., a red pop-out target in a field of green distractors). In the present paper, we use the strong inferential methods to answer this question. Within-dimension stimuli, such as luminance discs, provide an important point of investigation. Likely, because these within-dimension features are presented in separate locations, selective attention will be required to resolve the feature values. Consequently, spatial configuration will play a stronger role in determining processing strategy, and we expect processing to proceed serially as a result. On the other hand, these luminance values would be represented by the same feature map (Treisman & Gelade, 1980) and so it is possible that they would instead be processed coactively. While parallel processing seems somewhat unlikely, it is nonetheless worthy of consideration as a possible means for processing the within-dimension features. For example, in simple redundant target detection of two luminance targets, processing appears to proceed in parallel but with limited capacity (Townsend & Nozawa, 1995; Yang, Little, & Hsu, 2014). Visual attention, stimulus configuration, and dimension type, may all play key roles, and none of the candidate models (serial, parallel, and coactive) can be ruled out a priori. In the remainder of the introduction, we provide a detailed explanation of the categorization decision-making paradigm and logical rule models which we utilize to uncover how within-dimension features are processed.

Logical rules design

In order to differentiate the processing architectures, we utilize a design that provides strong diagnostic contrasts between the predictions of each of the candidate architectures. It is useful to describe these predictions with reference to Fig. 3, which shows a variant of the double factorial design proposed by Fifić et al., (2010). In this design, individual stimuli are comprised of the orthogonal combination of two dimensions, each of which vary over three levels. This generates nine stimuli, each of which comprise different levels of the two dimensions. While the double factorial paradigm has previously been implemented using single items which comprise both dimensions (e.g., halves of a face; Cheng et al., 2018, or two parts of a lamp; Fifić et al.,, 2010) the within-dimension feature stimuli used in the present experiment have different values of the same feature at different respective locations in space. Hence, each stimulus comprises a pair of discs, with each disc varying on three levels of saliency with respect to the background.

Fig. 3
figure 3

Top left panel: Schematic diagram of the stimulus space. Stimuli are comprised of two dimensions (dimension X, the right disk, and dimension Y, the left disk). Each dimension varies on three levels which combine orthogonally to form a nine-item stimulus space. Top right panel: Schematic diagram showing discriminability and category membership. The target category (A) includes positions of high (H) and low (L) salience items and is defined by a conjunctive (“AND”) rule. The contrast category (B) is defined by a disjunctive (“OR”) rule and includes internal (I), external (E), and redundant (R), stimuli. The dotted line represents the decision boundary. Bottom panel: Stimulus space showing example stimuli. Discs are comprised of three different salience levels (1 = Low Luminance, 2 = Medium Luminance, and 3 = High Luminance), for both black (darker than the background) and white (lighter than the background) levels of luminance. Note: each pair of discs forms one stimulus. Stimuli to the right and above of the decision boundary (indicated by the dotted line) belong to the target category. All other stimuli belong to the contrast category

The upper-right quadrant comprises the target category (category A) stimuli. The dotted line represents the category boundary between the target category and the contrast category (category B). Items that lie closer to the category boundary should be more difficult to discriminate (Ashby & Gott, 1988; Nosofsky, 1986); hence, stimulus dimensional values of either high discriminability (H) or low discriminability (L) combine to form four stimuli which are defined by their relative difficulty in discriminability: HH, HL, LH, and LL.

The contrast category stimuli are also identified by their location in the category space. The redundant stimulus, R, satisfies both of the boundary decisions necessary to classify a stimulus as belonging to the contrast category (i.e., it is both to the left and below the decision boundary). The stimuli adjacent to R are termed the interior stimuli, IX and IY, whereas the stimuli at the far edges are termed the exterior stimuli, EX and EY.

In order to correctly classify a target category stimulus, a conjunctive rule on both dimensions must be satisfied. That is, a stimulus must have a value on both dimension X and Y that exceeds the horizontal and vertical decision boundaries, respectively. Specifically, the luminance of both disks must have a value of 2 (medium luminance) or higher. Hence, stimuli from the target category must be processed exhaustively (both dimensions must be processed before a correct decision can be made). Stimuli belonging to the contrast category can be correctly classified using a disjunctive rule (i.e., stimuli need only be below or to the left of the horizontal and vertical decision boundaries, respectively).

Note that these rules, used to instantiate the categories in the task, do not presume any sort of processing architecture. The fact that both dimensions must be processed exhaustively to correctly classify a target category stimulus, does not preclude this processing from being carried out one dimension at a time in serial, or simultaneously in parallel, or indeed pooled into a single integrated percept. The following section describes how the predictions from each of these models (and combinations of stopping rules) varies across both categories. The important point to note is that the model predicts that processing will be exhaustive for the target category when responding is correct, but processing may be self-terminating or exhaustive for the contrast category. Of course, a participant might self-terminate and still be accurate when responding to a contrast category item.

Target category predictions

Discriminating target category discs that are close to the decision boundary should be slower than discriminating discs that are further away (Ashby & Gott, 1988). The way in which these discriminations are combined varies for each model, and qualitatively different mean RTs are predicted by each model architecture for the target category stimuli (shown in the left panel of Fig. 4). These RTs are readily summarized by the mean interaction contrast (MIC). The MIC is calculated by finding the difference between the difference of the low and high discriminability values on one dimension and the difference of the low and high discriminability values on the other dimension:

$$ MIC = (RT_{LL} - RT_{LH}) - (RT_{HL} - RT_{HH}) $$
(1)

Serial models predict an additive pattern of mean RTs (MIC = 0), parallel models predict an under-additive pattern of mean RTs (MIC < 0), and coactive models predict an over-additive pattern of mean RTs (MIC > 0; Townsend & Nozawa, 1995). A brief explanation for these predictions is outlined below; however, for a more detailed outline, please see Fifić et al., (2010).

Fig. 4
figure 4figure 4

Illustrative RT predictions for each mental architecture. These predictions were generated from simulations but are proven to hold under mild assumptions (Townsend and Nozawa, 1995). Left panel: predictions for target category, category A. Right panel: predictions for contrast category, category B. Each row represents one of the candidate architectures. D1 = First processed dimension. D2 = Second processed dimension. EBRW = Exemplar-based random walk

Serial models predict an additive pattern as both LH and HL items will show some slowing relative to the HH item due to their lower discriminability on one of the dimensions. The increase of RT for the LL item compared to the HH item is simply the sum of the individual sources of slowing. Parallel models predict an under-additive pattern because the RTs for the target category items are determined by the slower of the two decisions (i.e., the maximum processing time). The LH and HL stimuli will be therefore much slower than the HH stimuli. The LL stimulus will be only slightly slower than either the LH or HL stimuli. Finally, Townsend and Nozawa (1995) provide a mathematical proof demonstrating coactive models result in an over-additive pattern of results. This finding has been corroborated by simulations done by Fifić, Nosofsky, and Townsend (2008a). The predictions outlined above are non-parametric in that they do not depend on the particular forms of the RT distributions; hence, the qualitative contrasts apply to the entire class of serial, parallel, and coactive models.

Further diagnostic evidence for processing architecture from the target category can be found by calculating the survivor interaction contrast (SIC). The SIC is calculated using the survivor function for each stimulus, at each time value, t:

$$ SIC(t) = [S_{LL}(t) - S_{LH}(t)] - [S_{HL}(t) - S_{HH}(t)] $$
(2)

where the survivor function, S(t), is the complement of the cumulative distribution function, F(t), and represents the probability that a response has not been made by time, t.

Different mental architectures also produce qualitatively distinct predictions for the SIC (Townsend & Nozawa, 1995; see Fig. 5) when using an exhaustive stopping rule (as is necessary for correct responses in the target category). Serial models predict an initially negative function which becomes positive, with the entire function integrating to zero (i.e., the MIC equals zero). Parallel models predict an entirely negative function. Coactive models predict an initial negative blip, with the majority of the SIC being positive. Coactive models do not integrate to zero but rather integrate to a positive value (note that since the target category uses an AND rule, all of the serial and parallel models, including those with a self-terminating rule, must predict an exhaustive SIC as shown in Fig. 5).Footnote 2

Fig. 5
figure 5

Schematic of the survivor interaction contrast (SIC) predictions for serial, parallel, and coactive architectures

Contrast category predictions

The contrast category stimuli can also be used to make diagnostic judgments using mean RTs (Fifić et al., 2010; see also Little, Eidels, Fific, & Wang, 2015, 2017). Illustrative predictions for each model are shown in Fig. 4. For example, consider the predictions of a fixed-order serial self-terminating model, in which dimension x is processed first followed by, if necessary, dimension y (for reference see Fig. 3, left panel). The presentation of items which satisfy the disjunctive contrast category rule on dimension x (i.e., x0y0, x0y1, x0y2) will lead to a decision without further processing. If, however, x1y0 or x2y0 are presented and x is processed first, then further processing of dimension y is necessary in order to make a correct categorization decision. This leads to a general prediction that RTs for the first processed dimension will be approximately equivalent, whereas RTs for the second processed dimension are comparatively slower (as they need to wait for the completion of the first processed dimension). Further, the exterior item (x2y0) is processed faster compared to the interior item (x1y0) because the first processed x dimension for the exterior stimulus is further from the decision boundary and therefore easier to judge as not belonging to the contrast category.

For mixed-order self-terminating models, the first-processed dimension may be dimension x on some trials and dimension y on other trials. It follows that the redundant stimulus has the greatest processing advantage, as both dimensions satisfy the disjunctive decision rule. Further, these models predict that exterior items will be processed faster than interior items on both dimensions. Averaged across trials, where participants switch from one dimension to the other, the time preceding the switch is shorter for exterior items.

For a parallel self-terminating model, the RT for a contrast category item is determined by the minimum processing time needed to make a categorization decision. More specifically, in the current task, the processing time will be determined by the dimension which yields a contrast category response. The redundant stimulus therefore has a processing advantage in this case as both dimensions yield contrast category responses, allowing for statistical facilitation between the dimensions (Raab, 1962).

In the case of exhaustive models, both dimensions are processed regardless of whether the disjunctive rule is satisfied or not. For a serial exhaustive model, this means that total RT comprises the sum of RTs on each single dimension. This leads to a prediction that the redundant stimulus will have the slowest RT (as it lies closest to both decision boundaries), with interior items being processed slower than exterior items. The parallel exhaustive model predicts that the total RT is determined by the maximum processing time needed to make a categorization decision. Again, redundant and interior items are slower as they lie closer to the decision boundaries compared to the exterior items.

Finally, a general prediction of the coactive architecture is that the interior stimuli will be processed faster than the exterior stimuli. The intuition is that the interior items are located closer to the left-most corner of the decision space, and therefore when the dimensions are pooled, interior items pool more evidence for a contrast category response compared to exterior items (where the evidence for one of the pooled dimensions is comparatively more diagnostic of a target category response).

This design offers considerable diagnosticity as data from each item can be used to differentiate architectures and associated stopping rules. As the analysis is non-parametric it does not rely on assumptions regarding the underlying probability distributions of the RTs. Further inferences are provided by formally instantiating each of the candidate architectures in a parametric RT model. In this model, the representation of each stimulus is instantiated as a multivariate signal detection model (e.g., using General Recognition Theory; Ashby & Townsend, 1986). This representation is used to derive rates of accumulation which are passed to a sequential sampling model (or pair of models for the serial and parallel models) to model the RT. These models, described fully below, allow us to account for correct and error RTs across all of the items simultaneously.

Current study

In the present study, we sought to determine the processing architecture underlying categorization decisions about luminance discs of different polarity in different spatial locations (e.g., lighter on the left, darker on the right). Requiring the discs to have different polarity ensured we would be able to manipulate the brightness of each disc independently without participants relying on the overall brightness of the display. However, due to the nature of the discriminability manipulation, it is possible that participants could make categorization decisions based on the overall contrast polarity (see Fig. 3); that is, the difference between the left and right disc luminance is greater for the target category stimuli compared to the contrast category stimuli. Consequently, a correct target category decision could be made based on high contrast polarity and a correct contrast category decision could be made based on a low contrast polarity. As we were interested in how participants made decisions based on individual luminance levels, rather than an overall evaluation of the contrast of the presented stimuli, we included a set of catch trial stimuli. These stimuli have high contrast polarity (now lighter on the right, darker on the left; essentially the reversal of the four target category items) but are associated with a contrast category response. This manipulation ensured participants could not make categorization decisions based on contrast, and instead necessitated that participants use the individual luminance levels of each disc.

We ran two versions of this experiment. In Experiment 1, the screen was divided by a series of boundary discs whose luminance levels were randomly sampled from the possible luminance levels of all discs in the category space. This was done to reduce the probability that items were grouped into a single object. These discs were removed in Experiment 2. Experiment 2, therefore, provides an almost direct replication of Experiment 1 but with a single minor methodological difference.

Experiments

General method

Participants

For Experiments 1 and 2, respectively, seven participants from the University of Melbourne community (14 in total, ten females and four males, aged between 19 and 26 years) with normal or corrected-to-normal vision completed the study. Participants were recruited via advertising placed on notice boards within the Melbourne School of Psychological Sciences and through the school’s online recruiting system. All participants were naïve to the purpose of the experiment. Participants provided informed consent and were reimbursed $10 per session, plus an extra $3 bonus for accuracy within a session greater than 90%. The participants from Experiment 1 are referred to as B1 - B7 (B denoting “boundary”), and participants from Experiment 2 are referred to as NB1 - NB7 (NB denoting “no boundary”). Testing was approved by the Melbourne Human Research Ethics Committee (Approval Number 1034866).

As we were interested in the individual-level decision mechanisms, we adopted an expert observer paradigm in which each observer acted as an independent replication of the experiment (see e.g., Little & Smith, 2018; Normand, 2016). Following relevant precedents, we collected a large number of trials for each individual item (N ≈ 300) in order to estimate the RT distribution for each item. As demonstrated in Smith and Little (2018), this approach has considerable advantages over traditional group designs which tend to be underpowered. Our goal is therefore not to estimate a population level parameter but is to test the predictions of each of the models.

Stimuli and apparatus

Illustrative examples of the stimuli used in Experiments 1 and 2 are shown in Fig. 3. Stimuli were presented at a monitor resolution of 1280 × 1024 and participants viewed the screen at a distance of approximately 60 cm. Stimuli were nine sets of two discs of different luminance levels presented on a gray background (RGB color space values [128 128 128]). The discs subtended a visual angle of 1.91 with centers 11.34 of visual angle to the left and right of fixation (the center of the screen).

For each stimulus, there was a white disc on the left and a black disc on the right. The specific level of luminance was varied. The set of stimuli was created by orthogonally combining the luminance level of the left disc and the luminance level of the right disc. These follow the logical rules design introduced by Fifić et al. (2010), whereby the discriminability manipulation was achieved by varying the luminance level of both discs by three possible increments in comparison to the background.Footnote 3 RGB coordinates were as follows for each salience manipulation: High salience black: [64 64 64], mid salience black: [88 88 88], low salience black: [112 112 112], high salience white: [200 200 200], mid salience white: [178 178 178], and low salience white: [156 156 156]. An additional four pairs of discs were added to act as catch trial stimuli. For these discs, the contrast was an orthogonal combination of high or mid salience black and white; however, the left disc was now darker than the background, and the right disc was lighter than the background (i.e., the contrasts were reversed compared to the primary experimental stimuli).

In Experiment 1, the screen was divided by a boundary of 29 discs (also subtended at a visual angle of 1.91 at 60-cm viewing distance) presented as a central vertical column. The luminance values of the boundary discs were randomized from trial to trial using six possible RGB color space values (drawn from the values used to implement the salience manipulation). This boundary was removed in Experiment 2. All other aspects were the same. RTs for categorization were collected using a calibrated RT box (Li, Liang, Kleiner, & Lu, 2010).

Procedure

For Experiment 1, two participants completed five one-hour sessions of categorization on consecutive or near consecutive days.Footnote 4 The remaining participants from Experiments 1 and 2 completed six sessions. At the beginning of the task, participants were shown experimental instructions, as well as an example of the stimuli. Each session consisted of 867 trials (17 practice trials and 850 experimental trials). The contrast category stimuli (nine in total, including the catch trials) were presented five times per block, and the target category items (four in total) were presented ten times per block (i.e., 85 trials per block). This was done to minimize the development of a response bias for contrast category items. All stimuli were presented in randomized order within blocks. In between each block, participants were shown their percent correct on the current block and given the option to take a short break. During each trial, a fixation cross was presented for 1500 ms. A stimulus was then presented and participants were asked to decide whether the stimulus belonged to either category A or category B. Stimuli were presented for 5000 ms or until a response was made. Feedback was presented for incorrect responses. For responses greater than 5000 ms, the feedback “Too Slow” was presented and the trial was removed from the analysis. An example of a single trial for both experiments (with and without boundary discs) is shown in Fig. 6.

Fig. 6
figure 6

First panel: Trial order for Experiment 1. Second panel: Trial order for Experiment 2. The repeated fixation cross indicates the start of the subsequent trial

Data analysis

To analyze the target category, we focused on individual participant ANOVAs, using the interaction effect in a 2 × 2 factor design to assess the MIC. We used a series of planned t-tests to assess the pattern of contrast category RTs. These analyses follow relevant precedents (Fifić et al., 2010; Little et al., 2011, 2013), and allow us to make inferences about the specific processing patterns for each individual participant. Two shortcomings of this method are apparent. First, each analysis only considers a subset of the data. That is, even though the models make predictions across all of the items, an ANOVA across all nine items would be unwieldy and difficult to interpret. Second, each analysis only considers correct RTs. Although accuracy is high for most participants, a more complete analysis would also take into account patterns of error RTs. In order to deal with these issues, we complement our statistical analyses with computational model fitting in which we fit parametric instantiations of each of the models of interest (and relevant extensions) to the correct and error RT distributions for all of the items simultaneously. We then use model selection (i.e., the Deviance Information Criterion, DIC; Gelman, Hwang, & Vehtari, 2014) to select the model that provides the best explanation for our data. In summary, our analysis proceeds in two passes. We first use the non-parametric SFT analyses coupled with statistical tests to rule out specific models; we then instantiate the remaining models parametrically and fit them to the data comparing how well each fits the data taking into account the complexity of the model.

Experiment 1

Results

For all participants, the first session was considered practice and was excluded from further analysis. This was done to ensure participants had appropriately learned the categories and had developed a stable categorization strategy. Additionally, RTs less than 200 ms or greater than 3000 ms were excluded. These cut-offs are commonly used in the RT literature (see e.g., Donkin, Brown, & Heathcote, 2011). Less than 1% of trials in total were removed using this method. Mean correct RTs, mean error RTs, and error rates are presented in Table 1. Error rates tended to be low across all participants excepting in some cases for the LL, EX, EY, and IY stimuli. While the analyses of SFT assume perfect accuracy (Townsend & Nozawa, 1995), Townsend and Wenger (2004) support the robustness of the SIC functions up to an error rate of 30%, which is much higher than that usually seen in studies utilizing the double factorial paradigm. The simulations of Fifić et al. (2008a) further demonstrate that estimates of the SIC are robust to violations of this assumption.

Table 1 Observed mean correct and error RTs (ms), and error rates for individual stimuli for each participant in Experiment 1

In order to interpret the SIC functions, it is necessary that the Survivor functions are ordered such that SHH(t) ≤ SHL(t) ≈ SLH(t) ≤ SLL(t) with the strict inequality holding for at least one time point (Townsend & Nozawa, 1995). A series of Kolmogorov–Smirnov (KS) tests (Houpt, Blaha, McIntire, Havig, & Townsend, 2013) were used to check that each participant’s survivor functions followed this ordering. If the assumption of stochastic dominance holds, the first four columns of Table 11 should be significant, whereas the last four should not. No violations of stochastic dominance were found for this experiment (see Appendix A, Table 11). Survivor functions are shown in Appendix A, Fig. 14.

Target category

Figure 7 shows the mean RTs and corresponding MICs. All the MICs are near zero but in the positive direction.

Fig. 7
figure 7

Observed target category mean RTs and MICs for individual participants in Experiment 1. The two left-hand points represent low discriminability on the left disc and the two right-hand points represent high discriminability on the left disc. The solid line represents low discriminability on the right disc, and the dotted line represents high discriminability on the right disc. Error bars represent one standard error

To analyze the target category RTs, we conducted a series of 5 (sessions: 2-6) × 2 (left disc: L or H) × 2 (right disc: L or H) ANOVAs on the Target Category RTs for each individual participant (see Table 2).Footnote 5

Table 2 Target category statistical results for individual participants in Experiment 1

We first summarize the results which were common across all or most participants:

  1. 1.

    There was a main effect of session, indicating that RTs become faster over the course of the experiment.

  2. 2.

    There was a significant main effect of disc discriminability for both discs indicating that the discriminability manipulation was effective.

  3. 3.

    For some participants, session interacted with one or both of the dimensions indicating for some sessions the left disc was processed faster than the right and vice versa.

  4. 4.

    The three-way interaction was not significant, indicating a stable relationship between target category items across sessions. That is, participants were not changing processing strategy from session to session.

  5. 5.

    With the exception of participant B6, the Left Disc × Right Disc interaction (see Fig. 7) was not significantly different from zero. Although the MIC was positive, a non-significant interaction is consistent with serial processing. The test of the interaction in the present case is a test of the point prediction of the serial model, which predicts that the interaction term should equal zero (cf. Sternberg, 1969). This presents a different goal to the typical null hypothesis significance testing case, where the goal of the significance cut-off is to place some criteria on the false-positive rate. In the present case, an alpha criterion of .05 is biased toward the serial model (Fox & Houpt, 2016). Consequently, caution must be taken when interpreting a non-significant result in this context.

Figure 8 shows the SICs. When considering the model predictions presented in Fig. 5, it can be seen that in all cases the SICs have a large positive portion, ruling out parallel processing for target category stimuli. Generally, the SICs also appear to have a greater positive region than negative region. This is most consistent with a coactive pattern of results.

Fig. 8
figure 8

Observed target category SICs (red line) for individual participants in Experiment 1. Blue lines represent 95% bootstrapped confidence intervals

Using two one-sided KS-Tests from Houpt’s (2013) SFT analysis package, we also sought to determine whether the positive and negative portions of the SICs were significantly different to zero. Two null-hypothesis tests were performed: one which determines whether the largest value of the SIC is significantly greater than zero (D+) and one which determines whether the lowest value of the SIC is significantly lower than zero (D-; see Houpt & Townsend, 2010). Like the MIC, the null hypothesis for the Houpt–Townsend statistic is SIC(t) = 0 for all times t, a conservative significance level biases the test toward retaining the null hypothesis (i.e., a serial model). We therefore adopted a less conservative cut-off of α = .33. This value has been shown to work well in model recovery tests using this statistic (Fox & Houpt, 2016). Both positive and negative D-tests are displayed in Table 3.

Table 3 Directional KS-tests for individual participants in Experiment 1

For most participants, the positive deflection in the SIC is significantly greater than zero, whereas the negative deflection was not. This provides support for the coactive processing architecture. For B3, however, both positive and negative values were significant, which is indicative of either serial processing or coactive processing. For B5 neither value was significant.

Contrast category

The mean RTs for the contrast category are displayed in Fig. 9. For the majority of participants the interior stimulus was faster than the exterior stimulus, on at least one of the dimensions, which is suggestive of coactivity. No other model predicts a faster interior item compared to an exterior item on any dimension. Consequently, coactive processing provides a potential explanation for the contrast category items for most of the participants. Nonetheless, participant B1, despite having a faster interior compared to exterior item on one dimension, shows the reverse pattern on the other, which could be indicative of serial processing. For B4, the RTs seem more consistent with a fixed-order serial model, with the interior item being slower on one dimension, and the exterior and interior being approximately equal on the other. Further, for B5 both interior items were slower than exterior items, suggesting mixed-order serial self-terminating processing.

Fig. 9
figure 9

Observed contrast category mean RTs and MICs for individual participants in Experiment 1. Error bars represent one standard error. R = redundant stimulus, I = interior stimulus, E = exterior stimulus

For contrast category items, we conducted a series of planned t tests comparing interior and exterior items on both dimensions and comparing the redundant stimulus to the other contrast category items (see Table 4). Except for two instances, the redundant stimulus was processed significantly faster than the other stimuli, providing evidence against an exhaustive stopping rule for the contrast category.Footnote 6

Table 4 Contrast category statistical results for individual participants in Experiment 1

Although a smaller mean RT was recorded for the interior items compared to the exterior items for the majority of participants, as expected under coactive processing, this pattern was only significant for B2 and B7 for the right dimension and B3 for the left dimension. Nonetheless, this pattern of faster interior than exterior items is not predicted by any other model. However, parallel processing cannot be ruled out since that model predicts no significant difference between the interior and exterior items. For B4 and B5, the exterior items were faster than the interior items, suggesting serial processing. However, again, this pattern was only significant for B5 on the right dimension.

Discussion

Taken together, the target category results for the majority of participants tend towards coactivity. While the MICs are positive for all participants except B5, the non-significant interaction between left and right dimensions in the target category is consistent with serial processing for all participants. When coupled with the SICs, however, a clearer pattern of coactivity emerges. First, the SICs all have a greater positive portion than negative portion which rules out parallel processing. Further, the positive portion of the SIC appears greater than the negative portion, suggesting coactive, rather than serial processing. This interpretation is supported by the directional KS-tests for most participants (excluding B3 and B5). Nonetheless, the target category data do not clearly rule-out serial processing.

Generally, the contrast category results also somewhat point to coactivity with the interior item being faster than the exterior item at least on one dimension for four of the seven participants. For three participants (B1, B4 and B5), an interior item was slower than an exterior item, which could indicate serial processing (although B1 also shows the opposite pattern on the other dimension, which is more indicative of coactive processing). We defer further discussion of these results until after the presentation of Experiment 2.

Experiment 2

Results

In Experiment 2, we removed the column of discs with randomly varying luminance that separated the left and right targets. Again, for all participants, the first session was considered practice and was excluded from further analysis. Additionally, RTs less than 200 ms or greater than 3000 ms were excluded. Less than 1% of trials in total were removed using this method. Mean correct RTs, mean error RTs, and error rates are presented in Table 5. Error rates tended to be low across all participants except for the LL and EY stimuli in some cases.

Table 5 Observed mean correct and error RT (ms) and error rate for individual stimuli for each participant in Experiment 2

A series of KS-tests (Houpt et al., 2013) were used to check that each participant’s survivor functions were ordered to allow interpretation of the SFT analyses (Townsend & Nozawa, 1995, see Table 6). The survivor functions are shown in Appendix B, Fig. 15. Although the assumption held for most participants, there are some notable violations. Namely, for NB3, the HH stimulus is not significantly higher than the HL and LH stimuli, and for NB7, the LH stimulus is not significantly greater than the LL stimulus. Consequently, we omitted these participants from further analysis.

Table 6 p-values from KS-tests of stochastic dominance in Experiment 2

Target category

Figure 10 shows the mean RTs and corresponding MICs. Except for NB5, all the MICs are positive which is indicative of coactive processing.

Fig. 10
figure 10

Observed target category mean RTs and MICs for individual participants in Experiment 2. The two left-hand points represent low discriminability on the left disc and the two right-hand points represent high discriminability on the left disc. The solid line represents low discriminability on the right disc, and the dotted line represents high discriminability on the right disc. Error bars represent standard error

To analyze the target category RTs, we again conducted a series of 6 (sessions: 2-6) × 2 (left disc: L or H) × 2 (right disc: L or H) ANOVAs on the Target Category RTs for each individual participant (see Table 7).

Table 7 Target category statistical results for individual participants in Experiment 2

For all participants, the results indicated that:

  1. 1.

    There was a main effect of session, indicating RTs became faster across sessions.

  2. 2.

    There was a significant main effect of disc discriminability for both discs across all participants.

  3. 3.

    For some participants, session interacted with one or both of the dimensions indicating for some sessions the left disc was processed faster than the right and vice versa.

  4. 4.

    The three-way interaction was not significant, indicating a stable relationship between target category items across sessions.

  5. 5.

    The Left × Right interaction was significant for participants NB2, NB4 indicating that the MIC was significantly positive, which supports the inference of coactivity. For all other participants, the non-significant interaction is consistent with serial processing, although again caution must be exercised when interpreting this result (Fox & Houpt, 2016).

Figure 11 shows the SICs. When referenced to the model predictions shown in Fig. 5, the majority of the SICs in Experiment 2 have a large positive area, ruling out parallel processing for the target category stimuli. Further, for these participants, the SICs appear to have a greater positive region than negative region. This is most consistent with a coactive pattern of results.

Fig. 11
figure 11

Observed target category SICs (red line) for individual participants in Experiment 2. Blue lines represent 95% bootstrapped confidence intervals

Using two one-sided KS-tests from Houpt et al., (2013)’s SFT analysis package, we also sought to determine whether the positive and negative portions of the SICs were significantly different to zero. We again adopted a less conservative cut-off of α = .33 to avoid bias toward the serial model; most participants showed significant differences at this level. Both positive and negative D-tests for Experiment 2 are displayed in Table 8.

Table 8 Directional KS-tests for individual participants in Experiment 2

For most participants, the positive deflection was significantly higher than zero, but the negative deflection was not. This provides further support for coactive processing. For NB5, however, both deflections were significant, suggesting serial processing or coactive processing. For NB6 neither the positive nor negative deflection was significant.

Contrast category

The mean RTs for the contrast category are displayed in Fig. 12. For most participants, the interior stimulus is faster than the exterior stimulus on at least one dimension, which supports an inference of coactivity. For NB4 and NB6, the contrast category RTs seem more consistent with a fixed-order serial model, with the interior item being slower on one dimension, and the exterior and interior being approximately equal on the other.

Fig. 12
figure 12

Observed contrast category mean RTs and MICs for individual participants in Experiment 2. Error bars represent standard error. R = redundant stimulus, I = interior stimulus, E = exterior stimulus

For contrast category items, we conducted a series of planned t-tests comparing interior and exterior items on both dimensions, and comparing the redundant stimulus to the other contrast category items (see Table 9). With the exception of one comparison,Footnote 7 the redundant stimulus was processed significantly faster than the other stimuli providing evidence against an exhaustive model for the contrast category.

Table 9 Contrast category statistical results for individual participants in Experiment 2

The interior item was significantly faster than the exterior item on the left dimension for NB5, and on the right dimension for NB1, NB2, and NB5 which is suggestive of coactive processing. For NB4 and NB6, the exterior item was significantly faster than the interior item for the right dimension which is indicative of serial processing for these participants.

Discussion

As was the case with Experiment 1, the target and contrast category results for Experiment 2 support a tentative inference of coactivity for most participants. However, there were differences across participants that make it difficult to clearly infer the architecture based on the non-parametric results. Although the SICs, for the most part tended to be positive and looked like the coactive prediction, the MICs were typically not significant. However, as indicated above, the typical NHST cutoff of .05 is biased toward the serial model, and this result should be interpreted cautiously. For the contrast category, there was substantial variability in the pattern of mean RTs, although we note that the predictions shown in Fig. 4 are only illustrative based on simulations from one set of model parameters. In general, however, a coactive model is the only model under consideration that can predict faster interior than exterior processing (Little et al., 2015; Little & Smith, 2018).Footnote 8 Nevertheless, the nonparametric analyses only take into account a subset of the data. They do not, for instance, account for error rates or error RT distributions, nor do they consider data from both the target category and the contrast category simultaneously. The SFT analyses further require meeting an assumption of stochastic dominance of the target category RTs. For these reasons, we also fit a set of computational models that parametrically instantiate our assumptions for the serial self-terminating, parallel self-terminating, and coactive models.

Computational modeling

We fit the models using differential evolution Markov chain Monte Carlo (Turner, Sederberg, Brown, & Steyvers, 2013). We then compared the models using the Deviance Information Criterion (DIC) which provides an estimate of model fit with a penalty for model complexity (Gelman et al., 2014). The DIC can be thought of as a Bayesian version of a maximum likelihood-based fit statistic like the Akaike Information Criteria (AIC; Akaike, 1974). A detailed discussion on the implementation of the computational models and calculation of the DIC is reported in Appendix C. As the contrast category RTs conclusively rule out exhaustive processing, we focused on fitting the self-terminating version of both serial and parallel models as well as the coactive model. We included the parallel model since, although this model is not supported by the SIC analyses, it cannot be ruled out on the basis of the contrast category results. Because the models do not rely on the assumption of stochastic dominance, we also fit the model to observers NB3 and NB5, who were omitted in our previous analyses. The computational modeling thus provides the strongest test of architecture since it utilizes all of the data including both correct and error RTs across all items simultaneously.

To summarize the models, the serial and parallel models are based on the assumption that observers make independent decisions about stimulus values along each dimension and then combine these using logical operations such as OR and AND; hence, we term these models the logical rule models. Decisions about the values of each of the discs are modeled as independent evidence accumulation processes. In the past, we have used random walk processes to model the RTs (Fifić et al., 2010; Luce, 1986; Ratcliff, 1978). Here we utilize the linear ballistic accumulator (LBA; Brown and Heathcote, 2008). To generate the drift rate for each of the LBA channels, we used GRT (Ashby & Townsend, 1986), which is a multivariate generalization of signal detection theory, and decision-bound theory (Ashby & Gott, 1988). Each stimulus is represented by a bivariate normal distribution representing the variability in the perception of the stimulus from moment to moment. The assumption is that, from moment to moment, samples are drawn from this distribution and used to drive the sequential sampling process. When a sample falls in the target category region, the evidence increases in the accumulator for the target category, and likewise for the contrast category. The LBA approximates this process and provides an efficient method for predicting the decision time for each disc. We generate drift rates for the LBA by integrating the perceptual distributions with respect to the decision boundary within each category region. The decision times for each accumulator are then either, for example, summed for the serial model or used to find the maximum time prediction for the parallel model. For the coactive model, rather than modeling the perceptual distributions independently, the variability of the perception of both discs is modeled as a bivariate normal distribution.

In the coactive model, as for the serial and parallel models, we assume that the means for the left and the right discs are aligned to a grid (as shown, e.g., in Fig. 1). That is, we assume that the technical property of perceptual separability holds (Ashby & Townsend, 1986). To minimize reliance on this assumption, we also fit a highly flexible model in which we freely estimated a drift rate for each stimulus. This model can, for instance, capture coactive patterns which are accompanied by violations of perceptual separability. However, the free drift model should incur a high penalty for complexity. The free drift model is therefore an important comparison model because it includes many existing single-channel categorization models as special cases including the Exemplar-Based Random Walk model (Nosofsky & Palmeri, 1997) and stochastic GRT (Ashby, 2000). The DICs for each individual participant and model are shown in Table 10. The preferred model is the model with the lowest DIC.

Table 10 DIC values for each individual participant and candidate model across both Experiments 1 and 2. Lower values indicate better fit

In both experiments, the coactive model or the more flexible free drift model provided the best fit for most participants. This supports the general conclusion that processing of both discs tended to be pooled into a single coactive channel. The model-based analysis of each individual also provides some characterization of the differences in processing between each individual. Observers B4, NB5, and NB7 were best fit by the parallel self-terminating model while participant NB6 was best fit by the serial self-terminating model. Along with model comparison estimates, it is essential to provide an estimate of how well a model fits the data to ensure that the model is capturing the data well (Heathcote, Brown, & Wagenmakers, 2015). In the interest of space, we present the posterior predictions of the coactive model and for the best fitting models (when this was not the coactive model) for each item along with posterior parameter estimates of the parameters in our supplementary material. In Fig. 13, we show the model predictions for one participant (observer B2) to show that the model does provide a good fit to the data. The Supplementary Material shows that the fits to the other participants are comparable.

Fig. 13
figure 13

Posterior predictions from the coactive model for Observer B2 from Experiment 1. The data are shown as bars. The solid blue line is the posterior mean prediction, and the red lines are draws from the posterior predictive distribution

General discussion

In this paper, we have shown that decisions about spatially separated luminance discs appear to be processed coactively for most individuals (10 of 14). This pattern of coactivity was consistent with non-parametric analyses of MIC and SIC results for the target category in both Experiments 1 and 2 (see Figs. 7 and 8, and Figs. 10 and 11, respectively). This interpretation is supported statistically via the directional KS-Test results for both Experiments 1 and 2 (Houpt & Townsend, 2012). Further, the mean RT results for the contrast category also tended towards a pattern of coactivity (with some individual variation) in both Experiments 1 and 2 (see Figs. 9 and 12, respectively). Finally, although there is individual variation in the data, the computational modeling provides strong evidence that the majority of participants favored a strategy in which information was pooled into a single channel.

Implications for RT theories of categorization

This finding has implications for current RT theories of categorization and, in particular, the growing body of work utilizing the logical rule-based models. First, we add to the literature examining categorization decisions using visual stimuli by demonstrating that within-dimension luminance features are pooled together into a single decision channel. This finding is novel because it is not commensurate with the idea that spatially separated dimensions need to be resolved serially due to limits on the spatial aspects of visual attention as one might conclude from the results of Little et al., (2011). However, a limitation of the current study is the inability to characterize the specific effects of spatial attention and in particular, distinguish between space-based and object-based attention. Further investigation into spatial configuration is therefore an interesting avenue for future research. It should also be noted, that since the stimuli were displayed until a response, at least one participant may have adopted a serial strategy. With brief presentations, we would expect a stronger tendency toward coactivity since this would limit the ability to make several eye saccades which could induce serial processing. We therefore also highlight that pooling of information, while commonly used, may not be mandatory at the time scale of presentation that we used. Given the possible effect of time-scale on processing architecture, investigating decisions which are time limited as well as those which are time unlimited is therefore a worthwhile pursuit for future research

The key point of difference from earlier perceptual categorization work is that features that belong to the same feature type can be pooled across space whereas features of different dimension types need to be attended to independently. In terms of our banana example, this means that you can select the ripest banana by considering the bunch as a whole, but in order to compare both ripeness and price, you would need to attend separately to the price tag and the bunch. Along with recent results by Moneer et al., (2016) and Cheng et al., (2018), we highlight the important role that visual attention plays in perceptual categorization.

Implications for theories of visual attention

Visual search

In visual search, the difference between within-dimension and between-dimension features is central to the efficiency of the search. For instance, visual search for items comprising the conjunction of two between-dimension features (e.g., searching for a red vertical line among distractors which are red horizontal lines and green vertical lines) can still be guided via parallel feature guidance modules. However, guidance in search for feature conjunctions which comprise two within-dimension features (e.g., search for a red-green target among red-blue and blue-green distractors) is not as efficient. While a guidance module may be able to preferentially direct search to all red items, thus reducing the number of items to be searched, this is still not as efficient as search involving conjunctions of different feature types. In a series of experiments, Wolfe et al., (1990) showed that searches for conjunctions which were within-dimension (e.g., color × color and orientation × orientation) in a field of within-dimension distractors, were significantly less efficient than between-dimension searches (e.g., color × orientation) in a field of between-dimension distractors.

It is tempting to link the efficiency (i.e., speed and accuracy) with which the target is found, to the underlying architecture of processing. Indeed there are strong indications that an invariance of RT across set sizes is indicative of parallel processing (Wolfe, 2016; Townsend, 2016). On the other hand, it is well known that the increase in mean RT with increasing set size cannot be taken as an indicator of serial processing. The issue is that a limited capacity parallel model can yield identical inefficient set size functions to the serial model (see Townsend, 1971). Changes in processing due to a change in set size therefore give an indication of capacity (i.e., the efficiency at which a system can process information given varying workloads) which should be considered independently to that of questions of architecture (this concept is covered in further detail in the following section). Our present work uses a factorial combination of item difficulty in order to address the question of architecture directly thereby circumventing methodological issues with other methods (Little, Eidels, Houpt, & Yang, 2017). Although our focus is on perceptual categorization, our results can provide additional insight into other perceptual tasks like simple visual search, where the primary question is whether a target is present or not, instead of is this target a type A or a type B.

A limitation of the current methods, however, is that we are unable to locate the specific stage (i.e., pre-attentively or attentively) in which the pooling of information occurs. Guided Search (Wolfe 1994a, 2007) proposes that feature maps are created and combined into a master salience map at a pre-attentive parallel processing stage and that this map subsequently guides attention. Commensurate with this theory, it seems likely that, given they belong to the same feature-map, within-dimension features could be pooled pre-attentively forming a single signal which drives the decision-making process. However, it could also be that pooling occurs at an attentive stage. Indeed, given that highly similar perceptual operations can yield a variety of experimental results, it may be expected that different tasks such as visual search, identification, categorization, detection might also diverge. Investigating the locus of pooling of information, and more broadly investigating different task types utilizing the same experimental stimuli and ideally, the same participant pool, would be a worthwhile pursuit for developing a unified view of visual perception and cognition. This could be tested within the methodology of SFT but would require manipulations of salience thought to operate solely at the pre-attentive stage.

While in simple visual search the focus is on target presence or absence, for complex visual search tasks, target type may actually be of vital importance. For example, a radiologist may need to conduct a visual search of an x-ray to search for a potential cancerous tumor. This process may involve not only a search for a potential target, but also a categorization decision regarding the tumor. This decision may form a part of the search (does this part of the image constitute a malignant tumor or distracting information) or be somewhat independent from the search process itself (is this tumor, once found, malignant or benign?).

Fifić, Townsend, and Eidels (2008b) have successfully used SFT to investigate the processing architecture of the search process, and whether processing architecture changes under varying experimental manipulations. They investigated search for letters versus non-letters, while also accounting for how feature complexity (number of features per item) and target-distractor similarity may additionally affect processing architecture. Overall, they found that target-distractor similarity (as modulated by feature complexity) rather than linguistic composition changed the processing architecture of the search. They proposed this was likely achieved through positively interacting parallel channels.

The use of SFT in this instance is interesting as it builds on the foundation of work by Duncan and Humphreys (1989) who provide an alternative view to theories incorporating feature maps (Treisman & Gelade, 1980; Wolfe, 1994a, 2007). Their theory focuses on stimulus similarity as a determinant of efficiency in visual search paradigms. Here, search efficiency increases with increased similarity between non-targets, and decreases with increased similarity between targets and non-targets. Duncan and Humphreys (1989) propose that all items in the display are processed in a parallel first stage which provides a structured representation of the input. In this parallel stage, items are thought to be organized in a part-whole structure whereby items which share properties (such as same color, same shape, same motion, or even simply proximity) are linked together via gestalt grouping. This is followed by a selection process whereby the input from the parallel stage is matched against a template of the information needed to complete the search. Finally, information enters into visual short-term memory (VSTM) and thus reaches conscious awareness and allows for a response to be initiated.

As access to VSTM is limited, items must compete for entry. Increasing attentional weight to one structural unit, or gestalt group, must therefore naturally lead to a decrease in attentional weight to another. It is further assumed that items gain and lose attentional weighting together via a process called weight linkage. This means that items which are strongly grouped will tend to be either selected together or efficiently rejected together if they are non-targets which do not match the target template. Thus, search efficiency is determined by two factors: the degree to which targets and non-targets match the target template and the degree of similarity between items allowing for spreading suppression or activation of groups. For example, increasing non-target and target similarity will reduce search efficiency as the weight of each non-target depends on its match to the target template. Decreasing the similarity between non-targets will further reduce search efficiency as the opportunity for spreading suppression is decreased. Indeed, this effect of target-distractor similarity was shown to modulate processing architecture in Fifić et al. (2008b). When also considering the current work in the context of Duncan and Humphreys (1989), the notion that items of the same color would be grouped together as a whole is consistent with our finding of coactivity, however, as their theory does not include a temporal component, or specify an underlying architecture, it could also be seen to be consistent with other accounts.

The question of decision-making, independent from the search process itself, has been investigated by Wolfe et al., (1990) who had participants complete a “search” with a set size of one (i.e., a simple identification task). In contrast to a standard visual search where they found a processing advantage for between-dimension features, they found that there was no difference in RT for identifying whether a color × color conjunction was a target versus a color × orientation conjunction was a target. This suggests that there is no cost to identifying whether or not a conjunction is a target for within-dimension conjunctions versus conjunctions which comprise different features. However, this result alone does not address whether the individual colors in a color × color target are processed independently or not. Our results show that for many individuals, these color × color targets are treated as a single source of information. The current methods could therefore be usefully extended to fully characterizing decision-making in complex visual search task which necessitate target categorization, for example when asking “is this item a gun or a hair-dyer?” in baggage screening (Wolfe, Horowitz, & Kenner, 2005), or “is this tumor malignant or benign?” in cancer screening (Drew, Evans, Võ, Jacobson, & Wolfe, 2013).

The current findings, and SFT more generally, can therefore be used to inform the underlying architecture of decision-making in complex visual search. This applies to a variety of settings including security baggage-screening and visual search of medical images, as well as other complex searches such as foraging studies (Wolfe, 2013), and visual search of natural scenes (Wolfe, 1994b), familiar scenes (Hout & Goldinger, 2010), and in everyday life (where are my wallet and car-keys?; Wolfe, Alvarez, Rosenholtz, Kuzmova, & Sherman, 2011).Footnote 9

Encoding, selection, and VSTM

Within-dimension feature stimuli have also been investigated in studies of visual selection, visual encoding, and VSTM (see e.g., Huang, Treisman, & Pashler, 2007; Mance, Becker, & Liu, 2012; Sewell, Lilburn, & Smith, 2014). However, most models of visual attention do not explicitly speak to how decisions are actually made, and thus the current findings provide an interesting insight for these models. For example, this finding is consistent with Smith and Sewell (2013)’s conceptualization of visual information, perhaps represented in VSTM, feeding into a single diffusion process. In Smith and Sewell (2013)’s multi-stage model of visual attention, different stages of processing may be subject to different capacity limitations and, consequently, governed by different architectures. In their model, although the information retained by VSTM is determined by the selection and encoding stages which occur in parallel, with competitive interaction, the decision-making stage accumulates noisy samples from a single VSTM trace until a criterion is reached. Our coactive model embodies a similar set of assumptions. Smith and Sewell (2013)’s implementation of selection as a competitive parallel process is also consistent with theories of visual search (Wolfe, 1994a, 2007) which propose that attentional selection of color × color conjunction targets cannot occur efficiently as they belong to the same feature map and attention, therefore, cannot be effectively guided to the target location. In Smith and Sewell (2013)’s model, when performing tasks such as visual search, items which contain task-relevant attributes or features excite the “where” pathway of attentional selection and in turn, mutually inhibit each other via competitive interaction. These relevant attributes or features further self-excite, modulated by attention. In a color × color conjunction search task, distractor items also contain task relevant attributes and therefore compete with the target for selection, leading to inefficient search that can be completed only with the application of attention.

Smith and Sewell (2013) state that little is currently known about capacity limitations and the processing architecture of decision-making. The logical rules approach is useful in this regard as it provides a way to investigate decision-making independently of sensory and memory representations. In our task, the stimuli are presented supra-threshold and are available to participants until a response is made. Given this manipulation, sensory and memory load are held constant while only the decisional difficulty is varied factorially. Although our findings do not speak explicitly to items held in VSTM, they are nonetheless consistent with Smith and Sewell (2013)’s analysis of briefly presented, multi-element displays.

While our design is useful in that it holds constant memory load, it does this by not requiring the retention of any information at all, removing the need to encode information and hold it in VSTM. It is consequently difficult to know whether decision-making which is driven by information held in VSTM is equivalent to decision-making using stimuli which remain present in the visual scene. For example, in the current design, participants are able to accumulate information from the available scene until a decision is reached. In VSTM experiments, however, the information needed to make a decision must be accumulated from a potentially imperfect memory representation which could alter the decision strategy used.

We set as a goal for future research to investigate decisions in a VSTM task, again, holding constant sensory and memory load and factorially varying the difficulty of decision-making. Some progress has been made towards this using a SFT change detection paradigm (Yang, 2011; Yang, Hsu, Huang, & Yeh, 2011; Yang, Chang, & Wu, 2013). However, these investigations have only investigated single item displays and have systematically varied sensory information such as relative saliency of changes between dimensions and the probability of changes. In order to accurately isolate the decision-making stage for multi-element displays, environmental cues and stimulus properties that guide attention such as relative saliency of the changes and probability of changes should be held constant. Further, changes in two or more items should be investigated. This is the current focus of our research group.

Relationship to the race model inequality

Several studies (Mordkoff & Danek, 2011; Mordkoff & Yantis, 1993; Poom, 2009) have used an alternative measure, violations of the Race Model Inequality (Miller, 1982), to infer coactive processing of certain visual features and their location in space. Given our inference of coactivity using SFT in the present work, some further discussion of the connection to the RMI is warranted. The RMI is typically measured using a redundant target task (see Egeth & Mordkoff, 1991; Snodgrass & Townsend, 1980). In this task, two locations (e.g., left and right) are monitored for the onset of stimuli or targets. These targets can appear in one of the two locations, as well as both locations, or neither location. In one condition (an OR task), participants are instructed to respond in the affirmative whenever any target is present in either or both locations. RTs are compared between the double target displays (i.e., displays containing a redundant target) and single target displays. An RT advantage for detecting the redundant target could arise solely due to statistical facilitation in an independent parallel race model (i.e., the minimum time expected for the detection of two possible targets is smaller than the detection of any single target alone; Raab, 1962). On the other hand, faster redundant target detection times could also be due to what Miller (1982) refers to as “coactivity”, where both items contribute to a target present response. To distinguish these two accounts (Miller, 1982) formulated the RMI, which provides an upper limit on the speeding of responses which can be accounted for by statistical facilitation. According to Miller (1982) violation of the RMI therefore provides evidence for coactivity.

Mordkoff and Yantis (1993) used a modification of the redundant targets detection task to investigate single objects comprising two between-dimension features (i.e., color and shape), as well as these between-dimension features separated in space. This task is different to detection in that it requires participants to provide a stimulus identification in the presence of distracting information using a go/no-go response. For example in the first experiment, stimuli comprised of green and purple X’s and O’s and participants were required to respond to the presence of the color green, or the letter X (a green X therefore constituted the double target, whereas a purple O required no response). In Experiments 2 and 3, the features were separated in space. For example, in Experiment 3 shape was represented by a white letter presented at fixation, and color was represented as a colored border surrounding this shape. This was further compared in Experiments 4 and 5 to a within-dimension version of the experiment in which two letters or colors separated in space were presented either side of fixation (drawn from a pool of six, two of which were given target designations). Overall, they found a violation of the RMI for all versions of the experiment using between-dimension stimuli (regardless of location), but not for the within-dimension stimuli separated in space. They therefore concluded that decisions requiring the integration of between-dimension occurred coactively, whereas decisions requiring the integration of within-dimension interactions can be explained by statistical facilitation.

However, Feintuch and Cohen (2002), suggested that Mordkoff and Yantis’ (1993) findings could be accounted for by perceptual grouping (Duncan, 1984). That is, the between-dimension items which were thought to coactivate were actually being perceived as part of the same object, whereas items which did not coactivate (i.e., which happened to be within-dimension) were perceived as two separate objects. Feintuch and Cohen (2002) therefore suggested that grouping, rather than feature type was responsible for the coactivation of information in Mordkoff and Yantis’ work. In Feintuch and Cohen’s experiments, between and within-dimension items were presented as both separated in space (e.g., in the between-dimension condition, a color and shape were presented either side of fixation and in the within-dimension condition two colors were presented either side of fixation), and as parts of the same object (achieved simply by drawing an ellipse around the two items). They found that features which were considered as part of a single object (and therefore were contained in a single locus of attention) were processed coactively, regardless of feature type, whereas features which did not comprise a single object were not. This interpretation was later confirmed by Mordkoff and Danek (2011) in a follow up study.

While a cursory consideration of the findings of Feintuch and Cohen (2002) and Mordkoff and Danek (2011) seem counter to the current work, it is important to take a number of factors into account. A violation of the RMI is not isomorphic with the concept of coactivity used in SFT. In SFT, coactivity refers to pooling of information and can be conceptualized as an interactive model with a complete facilitation between decision-making channels. In Miller (1982), Mordkoff and Danek (2011), and Feintuch and Cohen’s (2002) work, coactivity is defined as any process violating the RMI (although pooling of information to form a decision is often also therefore inferred). However, there is now a large body of work showing that the RMI can be violated by models which decidedly do not show complete pooling: serial exhaustive models (Townsend and Nozawa, 1997), models violating the assumptions of the RMI (Cheng et al., 2017; Otto & Mamassian, 2016; Yang, Altieri, & Little, 2018), and for exhaustive models in the presence of distractors (Little et al. 2015, 2017). Consequently, the link between the RMI and architecture is not one-to-one and these concepts should be treated independently.

In SFT, violation of the RMI is instead isomorphic to the concept of super-capacity. In SFT, capacity refers to the efficiency of information processing when workload varies (i.e., the amount of information that needs to be processed increases or decreases; (Townsend & Ashby, 1983; Wenger & Townsend, 2000)). Capacity is also measured using a redundant target detection task. By using the expected minimum processing time of the parallel independent race model as a baseline, performance can be evaluated as either less efficient than expected or more efficient than expected (Townsend & Ashby, 1983; Townsend, Fific, & Neufeld, 2007; Wenger & Townsend, 2000). A violation of the RMI therefore indicates that workload efficiency is more efficient than expected under an independent parallel race model. Super capacity and violations of the RMI have recently been shown to be formally related: the RMI provides a bound on the super capacity expected from an unlimited capacity parallel system (Townsend & Eidels, 2011). The experiments of Mordkoff and Danek (2011) and Feintuch and Cohen (2002) can therefore be better interpreted in the context of the current paper as measures of workload capacity. These papers therefore provide evidence for super capacity processing for between- and within-dimension features comprising a single object, and unlimited-capacity processing for between- and within-dimension features separated in space. However, coactivity in the single object condition is only one explanation for the difference between these conditions.

Caution should also be exercised when interpreting these results as both non-target and single target displays in these experiments necessarily include irrelevant/distracting information and therefore do not constitute a pure single-target as per redundant target detection tasks. Cheng et al., (2018) and Little et al. (2015, 2017) have shown that utilizing capacity (and hence the RMI) in the presence of distractors is not straightforward due to distracting information potentially slowing responses for single target items leading to an overestimation of capacity and which can also lead to violations of the RMI. Nonetheless, Feintuch and Cohen’s (2002) finding that within-dimension stimuli do not violate the RMI warrants further investigation and characterizing the capacity of within-dimension stimuli independent from distracting information would be a worthwhile pursuit for future work.

Measuring the capacity of the current within-dimension stimuli would also be a worthwhile target for future work. The current work assumes channel independence, however, there is a possibility that the luminance level of one disc may affect the processing speed of the other disc. Eidels, Houpt, Altieri, Pei, and Townsend (2011) describe a series of parallel models which allowed for facilitatory and inhibitory interaction between the two channels and showed that a combination of SIC and capacity coefficients could allow for the identification of particular interactive systems. Measuring capacity would therefore allow for an even more comprehensive characterization of processing and would assist further investigations into the possibility of interactions between processing channels in the current work.

Conclusions

In summary, the current study adds to the body of literature investigating perceptual categorization and provides insight into the relationship between visual attention and decision-making when stimuli are comprised of within-dimension features. In particular, we have made the novel finding that within-dimension stimuli in separate locations are processed coactively by most participants, rather than in serial as seen with between-dimension stimuli in separate locations (Little et al., 2011; Fifić et al., 2010). Whether or not this generalizes to non-integral stimuli would be a worthy avenue for future research. The current work further emphasizes areas where theories of visual attention, categorization decision-making, and the logical rule models converge, and highlights avenues for synthesis between these theories. This same focus on featural information as a determinant of information processing is also a cornerstone of Anne Treisman’s work, demonstrating that the influence of her legacy spans well beyond the scope of visual search, and continues to inform other areas of cognitive science.