A long-standing question in cognitive psychology is how people make decisions about stimulus arrays composed of multiple elements. Such decisions are central to many studies of cued visual attention (Dosher & Lu, 2000), near-threshold visual search (Eckstein et al., 2000), and visual working memory (Vogel et al., 2006), and are important in other areas as well. In statistical decision terms, an array composed of multiple elements can be thought of as a single multidimensional, multiattribute stimulus, and the question then arises as to how people combine information across the elements, dimensions, or attributes of such a stimulus to make a decision. Decision accuracy in brief, near-threshold visual displays has been modeled successfully using multichannel generalizations of signal detection theory (see Palmer et al., 2000, for a review), but this work leaves open the question of whether the decision model can be extended to account for response time (RT) as well. There have been some recent successes in modeling RT in suprathreshold visual search tasks using sequential-sampling decision models (Moran et al., 2013; Schwarz & Miller, 2016), but the models involve sequential attention shifts, so they are not just models of the decision process and involve the additional complexity of distinguishing between parallel and serial processes (Dosher et al., 2010; Townsend & Nozawa, 1995; Thornton & Gilden, 2007).

In this article, we describe a new approach to modeling these kinds of decisions that was motivated by the perspective above—that a multielement stimulus array can be thought of as a single multidimensional stimulus whose dimensionality equals the number of elements in the array. To this end, we generalize the circular diffusion model of Smith (2016)—in a way foreshadowed in that article—and represent the decision process as a diffusion process within a hypersphere imbedded in an n-dimensional evidence accumulation space, whose dimensionality equals the number of elements in the display. The circular diffusion model can itself be thought of as a generalization of the diffusion decision model of Ratcliff (1978), which represents two-choice decision-making as diffusion on a line (i.e., a one-dimensional evidence space). Smith introduced the model to try to account for RTs and decision outcomes in continuous-report tasks, which are widely used in studies of visual working memory (van den Berg et al., 2014; Zhang & Luck, 2008). In continuous-report tasks, people make decisions about continuously distributed stimuli that are defined in a two-dimensional (2D) feature space (Prinzmetal et al., 1998), and express their decisions using an analog device like a mouse or a trackball. Although Smith’s model was developed to account for decisions that have a continuous range of outcomes (i.e., an infinite choice set), the model can account for categorical decisions as well, by partitioning the outcome space into discrete categories with decision bounds. When equipped with decision bounds in this way, the model can be viewed as a stochastic form of Ashby and Townsend’s (1986) general recognition theory. It is this form of the circular diffusion model that serves as the basis for the models we consider in this article. We do not attempt to elaborate the relationship between the circular diffusion model and general recognition theory here, but the interested reader should be aware of the theoretical links, both with other forms of stochastic general recognition theory (Ashby, 1989, 2000; Townsend et al., 2012) and with work that has extended general recognition theory into more complex spaces (Townsend et al., 2001, 2006).

The double-target detection task

The hyperspherical diffusion model is a general theoretical framework with potential applications in a variety of cognitive settings, several of which we consider subsequently. However, our immediate reason for developing the model was to try to characterize decision-making in tasks like the one shown in Fig. 1. The figure shows stimuli from a recent study of the double-target deficit by Corbett and Smith (2017). The task, which used a paradigm devised by Duncan (1980), required participants to detect digit targets among letter distractors in four-element arrays. The actual stimuli were rendered in a simplified stroke font to minimize orthographic cues that could distinguish between digits and letters, and were presented in noise for 200 ms and backwardly masked. The brief exposure duration precluded gaze refixations (Hallett, 1986) and minimized the possibility of any kind of serial search. The digits were drawn from the set 2–9; the letters from the set A, E, G, J, M, P, T, X. In the single-target task, there was either a single digit or no digit, and participants responded “target present” or “target absent” accordingly. In the double-target task, there were either two digits or one digit, and participants responded “target present” or “target absent” depending on whether they detected two digits or fewer than two. The double-target deficit is the finding that if people are required to detect a pair of simultaneous (auditory or visual) targets, their performance is appreciably worse than would be predicted from their performance when detecting either of the targets separately (Duncan, 1980, 1985; Moray, 1970a, 1970b; Sorkin & Pohlmann, 1973; Sorkin et al., 1973, 1976). The double-target deficit has been interpreted as evidence for two-stage theories of search, in which preattentive filtering processes impose a selection bottleneck on the stimuli that enter the decision process (Duncan, 1980; Hoffman, 1979; Smith & Sewell, 2013; Wolfe et al., 1989, 2010). Our interest in this article is not in the double-target deficit per se but in the processes involved in making these kinds of decisions. We therefore wanted to focus on the decision processes in the two forms of the task, uncontaminated, as much as possible, by differences in the preattentive selection processes they engage. To that end, we have chosen to focus on the experiment, from among the nine in Corbett and Smith’s study, that yielded the smallest double-target deficit. The reader is referred to the original article for further details.

Fig. 1 Double-target decision task used by Corbett and Smith (2017, Experiment 4a). In the single-target task, the target was a single digit in an array of letters. On target-absent trials, the array contained four letters. In the double-target task, the target was a pair of digits in an array of letters. On target-absent trials, the array contained three letters and one digit.

One way in which one could go about developing a response time model for the task in Fig. 1 is to consider sequential-sampling generalizations of the signal detection models that have previously been used to characterize accuracy in such tasks. The two main classes of signal detection model that have been influential in this area are independent detectors models and integration models (which include likelihood ratio models) (Eckstein et al., 2000; Kinchla, 1974; Palmer et al., 1993, 2000; Shaw, 1980, 1982; Shaw et al., 1983; Shimozaki et al., 2003; Smith, 1998; Verghese, 2001). Independent detectors models assume that separate decision mechanisms identify the stimulus at each location. The outputs of the mechanisms are then combined by Boolean operators to implement the appropriate decision rule. Integration models assume that the contents of each array location are summed or pooled (using either a linear or a nonlinear combination rule) to create a single decision variable that expresses the contents of the array as a whole (Footnote 1). Corbett and Smith (2017) found that both an independent detectors model and an integration model (in likelihood ratio form) provided a satisfactory account of response accuracy in the nine experiments in their study.

In an investigation that complements the present one, Corbett and Smith (submitted) considered sequential-sampling generalizations of these two models in which the component decision processes are represented by diffusion processes (Ratcliff, 1978; Ratcliff & McKoon, 2008). They focused specifically on the single-target task and, in agreement with Corbett and Smith (2017), found that either of two models could provide a fairly good account of the distributions of response times and choice probabilities in the four decision tasks investigated in that study. One model assumed that evidence about the contents of the four locations was pooled and accumulated by a single diffusion process. The other model assumed that decisions about the contents of the four locations were made by four independent diffusion processes whose outputs were combined to implement the decision rule. We chose to investigate the hyperspherical diffusion model as an alternative decision model for these tasks both because of its general theoretical interest as a model for multiattribute decision-making and because, to date, we have not been able to find a version of either the single-process or the multiple-process diffusion model that accounts for response times in the double-target task with a consistent and interpretable set of parameters.

Decision-making as diffusion in an (n − 1)-sphere in \(\mathbb {R}^{n}\)

Figure 2 shows the circular diffusion model of Smith (2016), its relationship to the diffusion model of Ratcliff (1978), and the way in which we generalize the circular model to higher dimensions in this article. The model conceptualizes decision-making as a process of noisy evidence accumulation in an n-dimensional Euclidean evidence space, \(\mathbb{R}^{n}\), to a decision boundary or response criterion. The evidence accumulation process is represented by an n-dimensional (vector-valued) Wiener diffusion process, \(\textbf{X}_{t}\), with independent components, \((X^{1}_{t}, X^{2}_{t}, \ldots, X^{n}_{t})\), all with infinitesimal standard deviation, \(\sigma\). In this notation, the subscript t denotes time and the superscripts \(1, 2, \ldots\) index the coordinates of the process; they are not powers. In the application to the double-target detection task in Fig. 1, the evidence space is four-dimensional, reflecting the number of noisy elements in the display. The decision criterion or decision boundary, which represents the evidence needed for a response, is an \((n-1)\)-dimensional hypersphere, \(S^{(n-1)}\), of radius a, imbedded in the n-dimensional Euclidean evidence space, \(\mathbb{R}^{n}\). In the circular model, the decision criterion is a circle, or 1-sphere, \(S^{1}\). In the 3D generalization of the model shown in Fig. 2c, the decision criterion is an “ordinary” sphere, \(S^{2}\), while in Ratcliff’s model the decision criteria are an isolated pair of points, a and \(-a\), which may be viewed as a 0-sphere, \(S^{0}\). (This parameterization differs from that of Ratcliff, who uses the “gambler’s ruin” form of the diffusion process (Feller, 1967), in which the upper bound is set to a and the lower bound to zero.)

Fig. 2 Decision-making as evidence accumulation in an (n − 1)-sphere in \(\mathbb{R}^{n}\). a Decision-making as diffusion on a line (Ratcliff, 1978). Evidence accumulation is modeled as a one-dimensional diffusion process on the real line, \(\mathbb{R}^{1}\) (the vertical axis), between absorbing boundaries (decision criteria) at −a and a. The isolated points a and −a are a zero-sphere, \(S^{0}\). The horizontal axis represents time. b Decision-making as diffusion in a 1-sphere, \(S^{1}\) (Smith, 2016). Evidence accumulation is modeled as a two-dimensional diffusion process on the interior of a disk of radius a, whose bounding circle represents the decision criterion. The process \(\textbf{X}_{t} = (X^{1}_{t}, X^{2}_{t})\) consists of two independent components, both with infinitesimal standard deviation \(\sigma\). The drift rate, \(\boldsymbol{\mu} = (\mu_{1}, \mu_{2})\), is vector-valued, with length, or norm, \(\|\boldsymbol{\mu}\|\), and phase angle \(\theta_{\boldsymbol{\mu}}\). The phase angle specifies the direction of the drift vector in polar coordinates. A decision is made when the process \(\textbf{X}_{t}\) reaches the criterion. The decision time, \(T_{a}\), is the time to reach the boundary and the response, \(\textbf{X}_{\theta}\), is the hitting point on the boundary. c Diffusion in a 2-sphere, \(S^{2}\), in \(\mathbb{R}^{3}\). The process \(\textbf{X}_{t} = (X^{1}_{t}, X^{2}_{t}, X^{3}_{t})\) consists of three independent components, all with infinitesimal standard deviation \(\sigma\). The drift rate is \(\boldsymbol{\mu} = (\mu_{1}, \mu_{2}, \mu_{3})\).

Figure 2a shows the main elements of Ratcliff’s (1978) diffusion model, which we generalize here. We have used the terminology of signal detection theory in the figure and denoted the two classes of stimuli as “signal” and “noise.” The model assumes that the information in the stimulus is encoded in the drift rate of the diffusion process, which determines the average rate at which evidence accumulates. The sign of the drift rate depends on the identity of the stimulus (positive for signals and negative for noise) and the magnitude of the drift rate determines the quality of the encoded information on a given trial. Evidence accumulation begins at stimulus onset and continues until the process hits one of the two decision boundaries. The first boundary hit determines the response—“signal” at the upper boundary and “noise” at the lower boundary—and the time taken to hit the boundary determines the decision time component of RT. The time taken to first hit a boundary and the probability of hitting the upper boundary before the lower boundary depend on the sign and magnitude of the drift rate and on the noise in the evidence accumulation process, which in turn depends on the infinitesimal standard deviation, \(\sigma\). Because of noise in the accumulation process, on some proportion of trials the process may terminate at the wrong boundary, and an error will be made. As Fig. 2a depicts, the model can predict choice probabilities (response accuracy) and distributions of RTs for correct responses and errors. In addition to the elements shown in Fig. 2a, the standard form of the model contains components of processing that represent various kinds of decision bias, which we discuss later. The properties of the model have been described in many places, including Ratcliff and McKoon (2008), Ratcliff and Smith (2004, 2015), Ratcliff et al. (2016), and Smith and Ratcliff (2015).

Figure 2b shows the two-dimensional (2D) generalization of the model of Smith (2016). Unlike the standard model, the circular model assumes that stimuli are defined in a 2D evidence space. The dimensions represent features of a stimulus that can vary independently, such as color, orientation, or spatial frequency and phase, or any pair of psychologically relevant feature dimensions of a decision task. The drift rate is then vector-valued, \(\boldsymbol{\mu}\), with components \(\mu_{1}\) and \(\mu_{2}\), which represent the values of the two features. Although it is not conceptually necessary, as discussed in the next section, there are powerful analytic advantages to be gained by making a strong symmetry assumption: that the evidence accumulation process starts at the center of the criterion circle, \(S^{1}\), of radius a.

As in the standard model, evidence accumulation begins at stimulus onset and continues until the process hits a point on the decision boundary. The time taken to reach the boundary determines the decision time and the point at which the process hits the boundary determines the response. The latter is continuously valued on the range \((0, 2\pi)\). The decision time, which we denote by \(T_{a}\), is the time for the Euclidean norm of the process, \(\sqrt{\sum_{i = 1}^{2} (X_{t}^{i})^{2}}\), to first equal a. The time \(T_{a}\) is referred to as the first passage time of the process through the boundary a. Because of the circular symmetry of the process, it is more convenient to work in polar coordinates than in Cartesian coordinates. In polar coordinates, the drift rate is defined by the drift norm, \(\|\boldsymbol{\mu}\| = \sqrt{\mu_{1}^{2} + \mu_{2}^{2}}\), and phase angle, \(\theta_{\boldsymbol{\mu}} = \arctan(\mu_{2}/\mu_{1})\). Psychologically, the drift norm represents the quality of the evidence in the stimulus and the phase angle represents its identity (i.e., the direction in which the drift vector points). Likewise, we denote the point at which the process exits the bounding circle, the hitting point, in polar coordinates as \(\textbf{X}_{\theta}\). If \(X_{T}^{1}\) and \(X_{T}^{2}\) denote the horizontal and vertical Cartesian coordinates of the process at the point at which it hits the boundary, then the phase angle of the hitting point is \(\Theta_{T} = \arctan\left(X_{T}^{2}/X_{T}^{1}\right)\).

As discussed by Smith (2016), all of the features of the standard diffusion model that have been found to be useful in fitting empirical data carry over to the 2D case. When there is no across-trial variability in drift rates or decision criteria, the model predicts that decision times will be independent of decision outcomes. When there is across-trial variability in drift rates, the model predicts an inverse relationship between RT and decision accuracy. When there is across-trial variability in the decision criterion (the value of a), the model predicts a direct relationship between RT and decision accuracy. These relationships are continuous counterparts of the slow errors and fast errors that the standard diffusion model predicts when there is across-trial variability in drift rates and starting points, respectively (Ratcliff & McKoon, 2008).

The model also predicts families of RT distributions whose shapes remain invariant as drift rates or decision criteria change. Technically, it predicts RT distributions whose quantiles form affine families. Let \(Q_{i,k}\) denote the kth distribution quantile in the ith condition and let T denote the RT. Symbolically, \(P_{i}[T \leq Q_{i,k}] = p_{k}\); that is, \(Q_{i,k}\) is the value of time that cuts off the fastest proportion \(p_{k}\) of the RTs in condition i. For any pair of conditions, i and j, the model predicts that \(Q_{i,k} = a_{i,j} Q_{j,k} + b_{i,j}\), where \(a_{i,j}\) and \(b_{i,j}\) are constants that depend on the pair of conditions being compared. Put informally, when the quantiles of one condition are plotted against those of another, they fall on a straight line. As we discuss in the section Treatment of Empirical Data, empirical RT distributions often exhibit this property to quite a high degree. In addition to these RT distribution properties, the model predicts that decision outcomes follow a von Mises distribution, which is a circular counterpart of the normal distribution (Fisher, 1993). Studies of visual working memory using continuous-report decision tasks have found that empirical decision outcomes are well described as mixtures of von Mises distributions (van den Berg et al., 2014; Zhang & Luck, 2008).
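
The affine-quantile property is easy to illustrate numerically. The following is a minimal numpy sketch, not a model fit: it builds two synthetic RT samples that differ only by a linear time transformation and recovers the constants \(a_{i,j}\) and \(b_{i,j}\) with a straight-line fit to the quantiles (the gamma-distributed RTs are an arbitrary stand-in, not a prediction of the model).

```python
import numpy as np

# Illustration of an affine quantile family: two synthetic RT samples that
# differ only by a linear time transformation, T_i = a*T_j + b.
rng = np.random.default_rng(1)
rt_j = rng.gamma(shape=2.0, scale=0.15, size=5000) + 0.3   # condition j (s)
rt_i = 1.4 * rt_j + 0.05                                   # condition i

p_k = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # quantile probabilities
q_j = np.quantile(rt_j, p_k)                # Q_{j,k}
q_i = np.quantile(rt_i, p_k)                # Q_{i,k}

# Fit Q_{i,k} = a_ij * Q_{j,k} + b_ij; near-zero residuals confirm affinity.
a_ij, b_ij = np.polyfit(q_j, q_i, deg=1)
print(a_ij, b_ij)                  # approximately 1.4 and 0.05
print(q_i - (a_ij * q_j + b_ij))   # approximately zero
```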

Figure 2c shows how the circular model can be generalized in a straightforward way to higher dimensions. For a 3D stimulus, the evidence accumulation process becomes three-dimensional, with components \(\textbf{X}_{t} = (X^{1}_{t}, X^{2}_{t}, X^{3}_{t})\), each with infinitesimal standard deviation \(\sigma\), and drift rate \(\boldsymbol{\mu} = (\mu_{1}, \mu_{2}, \mu_{3})\). The decision criterion then becomes an ordinary sphere, \(S^{2}\), with radius a, in the 3D evidence space \(\mathbb{R}^{3}\). As in Fig. 2b, a decision is made when \(\sqrt{\sum_{i = 1}^{3} (X_{t}^{i})^{2}} = a\), that is, at the first time at which the Euclidean norm of the evidence accumulation process equals the decision criterion. Although the process in higher dimensions is hard to visualize, the decision rule is simply defined. In four dimensions, which is the case that concerns us here, the accumulation process is \(\textbf{X}_{t} = (X^{1}_{t}, X^{2}_{t}, X^{3}_{t}, X^{4}_{t})\), the drift rate is \(\boldsymbol{\mu} = (\mu_{1}, \mu_{2}, \mu_{3}, \mu_{4})\), and a decision is made when \(\sqrt{\sum_{i = 1}^{4} (X_{t}^{i})^{2}} = a\).
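
Because the decision rule is just a threshold on the norm, the process is easy to simulate. The following is a minimal Monte Carlo sketch using an Euler–Maruyama discretization; the function name, step size, and drift vector are illustrative choices of ours, not values from the fits reported later.

```python
import numpy as np

def simulate_hypersphere_trial(mu, a=1.0, sigma=1.0, dt=1e-3, rng=None):
    """Simulate one 4D Wiener process from the origin until its Euclidean
    norm first reaches the criterion a; return (decision time, hit point)."""
    rng = rng or np.random.default_rng()
    x = np.zeros(4)
    t = 0.0
    while np.linalg.norm(x) < a:
        x += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(4)
        t += dt
    return t, a * x / np.linalg.norm(x)   # project the exit point onto S^3

mu = np.array([0.8, 0.8, -0.8, -0.8])     # signals at locations 1-2
t, hit = simulate_hypersphere_trial(mu)
print(t, np.sign(hit))                    # sign pattern gives the orthant
```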

Smith (2016) proposed the circular diffusion model as a model of decisions in continuous-report tasks, but it also provides a model for categorical decisions if the outcome space is partitioned into discrete categories with decision bounds. To see how this works, consider, first, a two-element version of the digits-among-letters task of Fig. 1, in which the display contains a pair of characters, each of which can be a letter or a digit. Following the terminology of Fig. 2, we will refer to digits as “signals” and letters as “noise.” Assume that the evidence encoded at each location varies (negative to positive) from strong evidence for a letter (noise) to strong evidence for a digit (signal). For a two-element display, there are four possible stimulus configurations, \(s_{1}s_{2}\), \(n_{1}s_{2}\), \(n_{1}n_{2}\), and \(s_{1}n_{2}\), where s and n denote signal and noise, and the subscripts denote the display locations, which we identify with the horizontal and vertical axes of the evidence space, \(\mathbb{R}^{2}\), respectively. For specified stimulus discriminability conditions (exposure, contrast, etc.), the drift vectors associated with these four stimulus configurations will be \((\mu, \mu)\), \((-\mu, \mu)\), \((-\mu, -\mu)\), and \((\mu, -\mu)\), respectively. These coordinate pairs represent drift vectors with phase angles of \(\pi/4\), \(3\pi/4\), \(5\pi/4\), and \(7\pi/4\) radians (i.e., oriented at \(45^{\circ}\), \(135^{\circ}\), \(225^{\circ}\), and \(315^{\circ}\)).

As shown in Fig. 3, an appropriate decision model for this task can be obtained by placing decision bounds that align with the cardinal axes, at 0, \(\pi/2\), \(\pi\), and \(3\pi/2\) radians. These bounds partition the continuous outcome space into four equal-size quadrants that define response regions, which we denote, going anticlockwise from zero, as \(R_{1}\), \(R_{2}\), \(R_{3}\), and \(R_{4}\). Any process terminating in one of the four quadrants is labeled with the associated response. These quadrants can be mapped via Boolean operators to the response rule for the task, as the sketch following this paragraph illustrates. For the single-target task of Fig. 1, quadrants \(R_{1}\), \(R_{2}\), and \(R_{4}\) map to the “target present” response and quadrant \(R_{3}\) maps to the “target absent” response. For the double-target task, quadrant \(R_{1}\) maps to the “target present” response and quadrants \(R_{2}\), \(R_{3}\), and \(R_{4}\) map to the “target absent” response.
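
A minimal sketch of this quadrant logic, assuming the anticlockwise numbering introduced above (the function names are ours, for illustration only):

```python
import numpy as np

def quadrant(x, y):
    """Map a hitting point on the bounding circle to a response region
    R1-R4, numbered anticlockwise from zero as in the text."""
    theta = np.arctan2(y, x) % (2 * np.pi)    # phase angle in [0, 2*pi)
    return int(theta // (np.pi / 2)) + 1      # quadrant index, 1..4

def single_target_response(q):
    # Any signal suffices: R1, R2, and R4 map to "target present".
    return "present" if q in (1, 2, 4) else "absent"

def double_target_response(q):
    # Both locations must be signals: only R1 maps to "target present".
    return "present" if q == 1 else "absent"

q = quadrant(-0.6, 0.8)   # noise at location 1, signal at location 2
print(q, single_target_response(q), double_target_response(q))
```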

Fig. 3 Circular (2D) diffusion model with decision bounds. The decision bounds are aligned with the cardinal axes and partition the outcome space into four response regions, R1, R2, R3, and R4, each of which is a quadrant of the outcome space. Processes finishing in the first quadrant lead to a “two signals” response; processes finishing in the third quadrant lead to a “two noise” response; and those finishing in the second and fourth quadrants lead to a “single signal” response. Processes that finish in the second quadrant are associated with detecting noise at location 1 and a signal at location 2; those that finish in the fourth quadrant are associated with detecting a signal at location 1 and noise at location 2.

For the task in Fig. 1, which used four-element displays, each element of which independently provides noisy evidence for the presence or absence of a digit, the natural way to conceptualize the decision process is as a 4D accumulation process in the evidence space \(\mathbb{R}^{4}\). As in the preceding paragraph, the outcome space, which is the surface of the hypersphere \(S^{3}\), can be partitioned into response regions with decision bounds. If the decision bounds are aligned with the cardinal axes of \(S^{3}\), then there are \(2^{4} = 16\) such response regions, which are the orthants of the hypersphere. (“Orthants” are higher-dimensional analogs of quadrants. For an ordinary sphere, the orthants are the octants of the sphere and can be visualized as eighths of a ball. For the 3-sphere, the orthants are mathematically well defined but difficult to visualize.) As in the 2D case, the orthants of the hypersphere map via Boolean operators to the response rule for the task; a sketch of this mapping follows below. Before we describe these mappings in detail, we outline the methods used to obtain response time and response outcome predictions for the model.
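
As a concrete illustration of the orthant-to-response mapping (a sketch under our sign-coding convention, with digits coded positive; the rules anticipate Table 1 below):

```python
from itertools import product

# Each orthant of S^3 corresponds to a sign pattern over the four display
# locations: +1 codes evidence for a digit (signal), -1 for a letter (noise).
for signs in product([+1, -1], repeat=4):
    n_signals = sum(s > 0 for s in signs)
    single = "present" if n_signals >= 1 else "absent"   # single-target rule
    double = "present" if n_signals >= 2 else "absent"   # double-target rule
    print(signs, single, double)
```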

The Girsanov theorem and the Bessel process

In order to apply the hyperspherical diffusion model to empirical data, we need predicted distributions of decision outcomes and decision times. This apparently formidable task is simplified by means of a theorem in stochastic processes known as the Girsanov theorem (or Cameron–Martin–Girsanov theorem) (Karatzas & Shreve, 1991; Rogers & Williams, 2000). The Girsanov theorem allows us to relate the properties of a diffusion process with nonzero drift rate (which is the process of theoretical interest) to those of the simpler zero-drift process. If we assume that evidence accumulation always begins at the origin of the evidence space, i.e., \(\textbf{X}_{0} = \boldsymbol{0}\), then all of the relevant properties of the zero-drift process are carried by the Euclidean distance process, \(R_{t}\),

$$ R_{t} = \sqrt{\sum\limits_{i = 1}^{4} ({X_{t}^{i}})^{2}}. $$
(1)

The Euclidean distance process \(R_{t}\) is known as the Bessel process (Borodin & Salminen, 1996, p. 297; Hamana & Matsumoto, 2013) and describes the distance of the 4D diffusion process, \(\textbf{X}_{t}\), from the origin at time t. The advantage of working with \(R_{t}\) rather than \(\textbf{X}_{t}\) is that it allows us to reduce a problem in four dimensions to a problem in one dimension. Smith (2016) describes in detail how the combination of the Girsanov theorem and the Bessel process can be used to derive predictions for the circular diffusion model. Together, they allow us to obtain simple analytic expressions for all of the relevant properties of the model required to fit data. Here we confine ourselves to stating the key results needed to extend the model to the four-dimensional case. To do so, we need some notation.

We denote by \(dP_{t}(a)\) the probability density function of the first passage time of the Bessel process through the decision boundary a at time t. (We have chosen to use differential notation as the most convenient way to state these results.) That is,

$$dP_{t}(a) = \frac{d}{dt} P[R_{t} \leq a| R_{s} < a, \; 0 \leq s < t]. $$

This relationship states that \(dP_{t}(a)\) is the time derivative of the first passage time probability of \(R_{t}\) through the decision boundary a. The condition on the right stipulates that \(R_{t}\) is less than a for all values of time less than t and is needed to ensure that we are dealing with the first boundary crossing rather than any later crossing.

For the 4D Bessel process, the probability density function \(dP_{t}(a)\) is given by the time derivative of Eq. 2.7 in Hamana and Matsumoto (2013) with \(\nu\) (a parameter in their notation that determines the dimensionality of the process) set to 1.0,

$$ dP_{t}(a) = \frac{\sigma^{2}}{2a^{2}} \sum\limits_{k = 1}^{\infty} \frac{j^{2}_{1,k}}{J_{2}(j_{1,k})} \exp\left( -\frac{j_{1,k}^{2}\sigma^{2}}{2a^{2}} t\right). $$
(2)

In this equation, \(J_{2}(x)\) is a second-order Bessel function of the first kind (Abramowitz & Stegun, 1965, p. 360) and \(j_{1,k}\), \(k = 1, 2, \ldots\), are the zeros of the first-order Bessel function \(J_{1}(x)\), whose values are readily computed (Appendix A). Like other infinite-series representations associated with diffusion processes, this series must be truncated for use in applications. In our application, we truncated the series at 50 terms, which is more than sufficient for convergence (see Smith, 2016, for further discussion and illustration of the properties of Bessel functions) (Footnote 2).
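
Equation 2 is straightforward to evaluate with standard special-function libraries. The following sketch uses scipy and truncates the series at 50 terms, as in our application; the function name is ours, and the final Riemann sum simply checks that the truncated density integrates to approximately one.

```python
import numpy as np
from scipy.special import jn_zeros, jv

def bessel_fpt_density(t, a, sigma, n_terms=50):
    """First-passage-time density dP_t(a) of the 4D Bessel process (Eq. 2),
    with the infinite series truncated at n_terms terms."""
    j1k = jn_zeros(1, n_terms)            # zeros of J_1
    coef = j1k**2 / jv(2, j1k)            # j_{1,k}^2 / J_2(j_{1,k})
    t = np.atleast_1d(t)
    decay = np.exp(-np.outer(t, j1k**2) * sigma**2 / (2 * a**2))
    out = sigma**2 / (2 * a**2) * (decay @ coef)
    return out if out.size > 1 else out[0]

t = np.linspace(0.01, 2.0, 2000)
f = bessel_fpt_density(t, a=1.0, sigma=1.0)
print(f.sum() * (t[1] - t[0]))   # Riemann sum: should be close to 1
```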

For us, the function of theoretical interest is the joint density of the event that the nonzero-drift process hits a designated point on the decision boundary at time t, conditional on this being the first such boundary crossing. In the circular model of Smith (2016), the hitting point on the boundary is specified by its phase angle, \(\theta\). For the 4D model, we work in hyperspherical coordinates (Misner et al., 1973, pp. 703–704; Weisstein, 2009, pp. 1877–1878), so we need three phase angles, \(\psi\), \(\phi\), and \(\theta\), to describe the point at which the process hits the bounding 3-sphere. The first coordinate, \(\theta\), \(0 \leq \theta < 2\pi\), describes the point (the azimuth) at which the process exits a disk oriented in the horizontal plane, like the one shown in Fig. 2b. The second coordinate, \(\phi\), \(0 \leq \phi < \pi\), identifies the elevation of a designated disk within a sphere like the one shown in Fig. 2c and is expressed as a deviation from the north pole. (It is helpful to visualize the volume enclosed by the sphere in Fig. 2c as being approximated by a stack of disks, whose diameters are maximal at the equator, \(\phi = \pi/2\), and decrease towards either pole. Then \(\theta\) identifies a point on the boundary of a disk and \(\phi\) identifies a particular disk within the stack.) The third coordinate, \(\psi\), \(0 \leq \psi < \pi\), similarly identifies the “elevation” of a sphere within a hypersphere and is also expressed as a deviation from the point of maximal elevation. Although it is difficult to visualize, \(\psi\) behaves mathematically in the same way as \(\phi\). Using these three phase angles, the point at which the process hits the bounding 3-sphere in \(\mathbb{R}^{4}\) can be specified using four Cartesian coordinates:

$$\begin{array}{@{}rcl@{}} {X_{T}^{4}} & = & a \cos \psi \\ {X_{T}^{3}} & = & a \sin \psi \cos \phi \\ {X_{T}^{2}} & = & a \sin \psi \sin \phi \cos \theta \\ {X_{T}^{1}} & = & a \sin \psi \sin \phi \sin \theta. \end{array} $$
(3)

Unlike the coordinate transformation for the circular model described earlier, the \(X_{T}^{1}\) coordinate in Eq. 3 refers to the vertical rather than the horizontal axis of the circle parameterized by \(\theta\). This way of ordering the coordinates is fairly standard when working with spherical and hyperspherical coordinate systems.
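
A minimal sketch of the mapping in Eq. 3 (the function name is ours); the only check is that the resulting point lies on the sphere of radius a.

```python
import numpy as np

def hyperspherical_to_cartesian(a, psi, phi, theta):
    """Cartesian coordinates of a point on the 3-sphere of radius a (Eq. 3).
    X^1 is the innermost (theta) coordinate, as noted in the text."""
    return np.array([
        a * np.sin(psi) * np.sin(phi) * np.sin(theta),   # X^1_T
        a * np.sin(psi) * np.sin(phi) * np.cos(theta),   # X^2_T
        a * np.sin(psi) * np.cos(phi),                   # X^3_T
        a * np.cos(psi),                                 # X^4_T
    ])

x = hyperspherical_to_cartesian(1.0, np.pi / 3, np.pi / 4, np.pi / 6)
print(x, np.linalg.norm(x))   # the norm equals the radius a
```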

In order to state our results compactly, we introduce the additional notation \(\boldsymbol{\Theta} = (\psi, \phi, \theta)\) to denote a vector whose three components are the three phase angles of the hitting point on the boundary. We then denote by \(d\tilde{P}_{t}(\boldsymbol{\Theta}_{T})\) the probability density that the process hits the bounding sphere at a point \(\boldsymbol{X}_{T}\) with phase angle \(\boldsymbol{\Theta}_{T}\) at time \(T_{a}\). We use a capital T to indicate that the hitting point and hitting time are both random variables that depend on the entire history of the evidence accumulation process, but omit the “a” from the notation for the hitting time to avoid double-subscripting. The tilde indicates that \(\tilde{P}_{t}\) is the probability distribution of the nonzero-drift process.

The Girsanov theorem states that the probability density functions \(dP_{t}(a)\) and \(d\tilde{P}_{t}(\boldsymbol{\Theta}_{T})\) are related to one another in a simple way, via an exponential martingale, \(\textbf{Z}_{T}(\textbf{X})\),

$$ d\tilde{P}_{t}(\boldsymbol{\Theta}_{T}) = \textbf{Z}_{T}(\textbf{X}) dP_{t}(a), $$
(4)

where

$$ \textbf{Z}_{T}(\textbf{X}) = \exp\left[\frac{1}{\sigma^{2}} \boldsymbol{\mu} \cdot \textbf{X}_{T} - \frac{1}{2\sigma^{2}} \|\boldsymbol{\mu}\|^{2} T\right] $$
(5)

and where \(\boldsymbol{\mu} \cdot \textbf{X}_{T} = \sum_{i = 1}^{4} \mu_{i} X_{T}^{i}\) is the dot product of the drift rate and a random vector \(\textbf{X}_{T}\) with norm a and phase angle \(\boldsymbol{\Theta}_{T}\). For a specified \(\boldsymbol{\Theta}_{T}\), the four components of \(\textbf{X}_{T}\) are given by Eq. 3.
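
Equations 4 and 5 translate directly into code. The sketch below reuses bessel_fpt_density from the sketch following Eq. 2; the drift vector and hitting point are arbitrary illustrative values.

```python
import numpy as np

def girsanov_martingale(mu, x_T, T, sigma=1.0):
    """Exponential martingale Z_T(X) of Eq. 5, evaluated at a hitting
    point x_T (Cartesian coordinates from Eq. 3) reached at time T."""
    return np.exp(np.dot(mu, x_T) / sigma**2
                  - 0.5 * np.dot(mu, mu) * T / sigma**2)

# Eq. 4: reweight the zero-drift Bessel density by the martingale to get
# the joint density of the hitting point and hitting time.
mu = np.array([0.8, 0.8, -0.8, -0.8])
x_T = np.array([0.5, 0.5, -0.5, -0.5])        # a point on S^3 with a = 1
T = 0.3
dP = bessel_fpt_density(T, a=1.0, sigma=1.0)  # from the sketch after Eq. 2
print(girsanov_martingale(mu, x_T, T) * dP)   # d~P_t(Theta_T)
```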

Martingales are fundamental to the modern theory of stochastic processes, but they will be unfamiliar to many readers. In brief, a martingale is a bounded stochastic process whose expected value remains constant (Footnote 3). For readers wishing to learn more about martingales, Karlin and Taylor (1975) and Williams (1991) are recommended starting places. For our purposes here, the significance of the function \(\textbf{Z}_{T}(\textbf{X})\) is that it allows us to derive the statistics of the nonzero-drift process from those of the 4D Bessel process.

The ratio of probability density functions

$$ \frac{d\tilde{P}_{t}(\boldsymbol{\Theta}_{T})}{dP_{t}(a)} = \textbf{Z}_{T}(\textbf{X}) $$
(6)

has an illuminating statistical interpretation. This ratio is known as the Radon–Nikodym derivative of the probability measure \(\tilde{P}_{t}\) with respect to the measure \(P_{t}\). The Radon–Nikodym derivative is similar to, but generalizes, the familiar likelihood ratio of statistics, from a discrete sequence of observations to a process in continuous time. Equation 6 states that the process \(\textbf{Z}_{T}(\textbf{X})\) can be interpreted as a likelihood ratio; specifically, it is the likelihood of obtaining the observed cumulative evidence process \(\textbf{X}_{t}\), \(0 \leq t \leq T\), given that the drift rate was \(\boldsymbol{\mu}\), versus the likelihood of obtaining the same process given that the drift rate was \(\textbf{0}\). Evidently, the process \(\textbf{Z}_{T}(\textbf{X})\) will be maximized when the dot product \(\boldsymbol{\mu} \cdot \textbf{X}_{T}\) is maximized, which will occur when the phase angles \(\boldsymbol{\Theta}_{\boldsymbol{\mu}}\) and \(\boldsymbol{\Theta}_{T}\) are the same, that is, when the process exits the bounding hypersphere at a point that coincides with the orientation of the drift vector. This point is the most likely hitting point for a process with drift vector \(\boldsymbol{\mu}\), and an observer (i.e., a decision maker) who reports this point as the identity of the stimulus is a maximum-likelihood observer.

Because the process \(\textbf{X}_{T}\) is guaranteed to terminate at the decision boundary in finite time (Mörters & Peres, 2010), the integral of the probability density \(d\tilde{P}_{t}(\boldsymbol{\Theta}_{T})\) over all possible hitting points and hitting times will be unity. Expanding \(\boldsymbol{\Theta}_{T}\) in components, this condition states that

$$ {\int}_{0}^{\infty}\! {\int}_{0}^{\pi}\! {\int}_{0}^{\pi}\! {\int}_{0}^{2\pi} \!d\tilde{P}_{t}(\psi, \phi, \theta) \, V(\psi, \phi, \theta) \; d\theta \, d\phi \, d\psi \, dt \,=\, 1, $$
(7)

where \(V(\psi, \phi, \theta)\) is the volume element,

$$V(\psi, \phi, \theta) = a^{3} \sin^{2}\psi \sin\phi $$

associated with the change of variables from Cartesian to hyperspherical coordinates (Misner et al., 1973; Weisstein, 2009). To understand the role of the volume element, it is helpful to visualize the integral over phase angle in Eq. 7 as a summation over a set of small 3D regions on the surface of the hypersphere, \(S^{3}\). For the 2D model, each of these regions is just a segment of arc of length \(a\,d\theta\) that forms the outer boundary of a wedge-shaped region falling between the phase angles \(\theta\) and \(\theta + d\theta\). These segments are the same length for all \(\theta\) and the volume element is simply a, the radius of the bounding circle. For the 4D model, the regions are 3D volumes whose size varies with phase angle. They are largest at the “equator” (\(\psi = \pi/2\), \(\phi = \pi/2\)) and decrease towards the “poles” (\(\psi = 0\), \(\phi = 0\) and \(\psi = \pi\), \(\phi = \pi\)). The volume element specifies how the size of these regions changes as the phase angles are varied in equal steps.
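
The normalization in Eq. 7 can be checked numerically. In the sketch below, which again reuses bessel_fpt_density, we assume the zero-drift case, for which \(\textbf{Z}_{T} = 1\) and the hitting point is uniform on \(S^{3}\), so the angular density is the constant \(1/(2\pi^{2}a^{3})\); making that uniform factor explicit is our assumption, not a statement from the text.

```python
import numpy as np

# Check of Eq. 7 for the zero-drift process: the angular density is the
# constant 1/(2*pi^2*a^3), the reciprocal of the surface area of S^3.
a, sigma = 1.0, 1.0
psi = np.linspace(0, np.pi, 2001)
phi = np.linspace(0, np.pi, 2001)
dpsi, dphi = psi[1] - psi[0], phi[1] - phi[0]

# The volume element factorizes, so the angular integral is a product.
angular = a**3 * (np.sin(psi)**2).sum() * dpsi \
               * np.sin(phi).sum() * dphi * 2 * np.pi
print(angular, 2 * np.pi**2 * a**3)           # both close to 19.74

t = np.linspace(0.01, 4.0, 4000)
time_mass = bessel_fpt_density(t, a, sigma).sum() * (t[1] - t[0])
print(time_mass * angular / (2 * np.pi**2 * a**3))   # close to 1
```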

Application to the double-target detection task

Like the circular model of Smith (2016), the 4D model of the preceding section is a model of continuous report. It predicts the distributions of decision times and decision outcomes for reporting the identity of the stimulus as a specific point in 4D space. It becomes a model of categorical decision-making when the surface of \(S^{3}\) is partitioned into response regions with decision bounds. The integral of Eq. 7, with suitably chosen limits of integration, then gives the probability of making the response associated with a specified region. For us, the regions of interest are the 16 orthants of \(S^{3}\). Psychologically, each orthant is associated with a unique stimulus configuration (i.e., an allocation of digits and letters to the four locations of the display).

Corbett and Smith’s (2017) study of double-target detection used a two-alternative decision task in which participants responded “target present” or “target absent,” according to the decision rule for the task. Table 1 shows how the orthants of the decision space for the 4D model map to the two response alternatives for their single-target and double-target tasks. The mappings assume that people respond “signal” (target present) if there is at least one signal associated with the response orthant in the single-target task and at least two signals in the double-target task; otherwise they respond “noise” (target absent). The decision time and decision outcome predictions are obtained by summing across all of the orthants that map to a given response. The decision time densities are the sums of the hitting time densities, conditioned on the response probabilities. Explicitly, denote by \(d\tilde{P}_{ijk}(t)\) the joint density of a response at time t in orthant ijk. By taking appropriate bounds of integration in Eq. 7, we obtain

$$\begin{array}{@{}rcl@{}} d\tilde{P}_{ijk}(t) &=& {\int}_{\psi_{i}}^{\psi_{i + 1}} {\int}_{\phi_{j}}^{\phi_{j + 1}} {\int}_{\theta_{k}}^{\theta_{k + 1}} d\tilde{P}_{t}(\psi, \phi, \theta)\\ &&\times\, V(\psi, \phi, \theta) \; d\theta \, d\phi \, d\psi, \end{array} $$
(8)

where \(\psi_{i} = i\pi/2\), \(i = 0, 1\); \(\phi_{j} = j\pi/2\), \(j = 0, 1\); and \(\theta_{k} = k\pi/2\), \(k = 0, 1, 2, 3\). The joint densities of target-present and target-absent responses in the single-target and double-target tasks are obtained by summing the joint densities in Eq. 8 across all of the orthants in Table 1 that map to the given response, \(\mathcal{R}\), for the particular task,

$$ d\tilde{P}_{\mathcal{R}}(t) = \sum\limits_{{ijk} \in \mathcal{R}} d\tilde{P}_{ijk}(t). $$
(9)

The probability of making response \(\mathcal{R}\), \(\tilde{P}_{\mathcal{R}}\), is obtained by integrating across all of the decision times associated with that response,

$$ \tilde{P}_{\mathcal{R}} = {\int}_{0}^{\infty} d\tilde{P}_{\mathcal{R}}(t) \, dt. $$
(10)
Table 1 Mapping of hitting-point phase angles and orthants to responses
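
In place of the triple integrals of Eqs. 8–10, the same quantities can be approximated by Monte Carlo simulation. The sketch below reuses simulate_hypersphere_trial from the earlier sketch and applies the Table 1 mapping for the double-target task; the drift vector is an arbitrary illustrative choice.

```python
import numpy as np

# Monte Carlo counterpart of Eqs. 8-10: estimate response probabilities and
# mean decision times by simulating trials and applying the orthant-to-
# response mapping of Table 1 for the double-target task.
rng = np.random.default_rng(2)
mu = np.array([0.8, 0.8, -0.8, -0.8])   # two digits, two letters (present)
times, present = [], []
for _ in range(2000):
    t, hit = simulate_hypersphere_trial(mu, rng=rng)   # earlier sketch
    n_signals = int((hit > 0).sum())                   # orthant sign pattern
    present.append(n_signals >= 2)                     # double-target rule
    times.append(t)
times, present = np.array(times), np.array(present)
print(present.mean())                                  # Eq. 10 for response R
print(times[present].mean(), times[~present].mean())   # mean RT by response
```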

Across-trial variability

The standard diffusion model (Ratcliff & McKoon, 2008) includes, in addition to within-trial diffusive variability, three sources of across-trial variability that have been found to be useful in fitting data: variability in drift rate, variability in starting point, and variability in nondecision time. These sources of variability have been implemented in third-party packages for fitting the model to data (Vandekerckhove & Tuerlinckx, 2007; Vogel et al., 2006; Wiecki et al., 2013). For our purposes, the most important of these is variability in drift rate. Variability in drift rate is assumed to arise because of trial-to-trial variability in the quality of the evidence provided by nominally equivalent stimuli. This variability may be due to variability in perception or memory, or in the operation of matching the encoded stimulus representation to the mental representations of the decision alternatives. We refer to these mental representations, by analogy with Lu and Dosher’s perceptual template model (Dosher & Lu, 2000; Lu & Dosher, 1998), as the decision template for the task.

In the standard diffusion model, drift rates are assumed to vary normally across trials with standard deviation \(\eta\). In higher dimensions the possibilities are richer, as discussed by Smith (2016), but here we try to remain close to the spirit of the standard diffusion model and assume that the four components of the drift rate \(\boldsymbol{\mu}\) are independently normally distributed with common standard deviation \(\eta\). When there is across-trial variability in drift rate (or any other model parameter), it is necessary to integrate (marginalize) across the decision time and decision outcome distributions for different values of the parameter to obtain predictions. To improve computational efficiency, we carried out this integration analytically rather than numerically. The first passage time joint density \(d\tilde{P}_{t}(\boldsymbol{\Theta}_{T})\) in Eq. 4 is the product of the exponential martingale \(\textbf{Z}_{T}(\textbf{X})\) in Eq. 5 and \(dP_{t}(a)\), the first passage time density for the Bessel process in Eq. 2. Because only the first of these terms depends on drift rate, the effect of across-trial variability in drift can be completely characterized by its effects on \(\textbf{Z}_{T}(\textbf{X})\). The value of \(\boldsymbol{\bar{Z}}_{T}(\textbf{X}; \boldsymbol{\nu}, \eta)\), the marginal form of \(\textbf{Z}_{T}(\textbf{X})\), is a function of \(\boldsymbol{\nu} = (\nu_{1}, \nu_{2}, \nu_{3}, \nu_{4})\), the mean of the across-trial distribution of drift rates, and the common standard deviation, \(\eta\),

$$\begin{array}{@{}rcl@{}} \boldsymbol{\bar{Z}}_{T}(\textbf{X}; \boldsymbol{\nu}, \eta) &=& \prod\limits_{i = 1}^{4} \frac{1}{\sqrt{(\eta/\sigma)^{2} T + 1}}\\ &&\exp\left\{-\frac{{\nu_{i}^{2}}}{2\eta^{2}} + \frac{[{X_{T}^{i}}(\eta/\sigma)^{2} + \nu_{i}]^{2}}{2\eta^{2}[(\eta/\sigma)^{2}T + 1]} \right\}. \end{array} $$
(11)

The details of this computation are described in Appendix A. As in Eq. 5, when carrying out computations using Eq. 11, the values of \(X_{T}^{i}\) are given by the coordinate mapping functions in Eq. 3 that relate the phase angles \(\psi\), \(\phi\), and \(\theta\) to the hitting points of the process on the surface of the hypersphere.
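
A direct transcription of Eq. 11 as a sketch (the function name and test values are ours). As \(\eta \rightarrow 0\), the result approaches the fixed-drift martingale of Eq. 5, which provides a useful numerical check.

```python
import numpy as np

def marginal_martingale(x_T, T, nu, eta, sigma=1.0):
    """Marginal martingale Z-bar of Eq. 11: Z_T(X) integrated over
    independent normal drift components with means nu and common
    across-trial standard deviation eta."""
    r2 = (eta / sigma)**2
    scale = 1.0 / np.sqrt(r2 * T + 1.0)
    expo = (-nu**2 / (2 * eta**2)
            + (x_T * r2 + nu)**2 / (2 * eta**2 * (r2 * T + 1.0)))
    return np.prod(scale * np.exp(expo))

x_T = np.array([0.5, 0.5, -0.5, -0.5])   # hitting point from Eq. 3
nu = np.array([0.8, 0.8, -0.8, -0.8])    # mean drift vector
print(marginal_martingale(x_T, T=0.3, nu=nu, eta=0.4))
# As eta -> 0 this approaches the fixed-drift Z_T(X) of Eq. 5:
print(marginal_martingale(x_T, T=0.3, nu=nu, eta=1e-4))
```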

The standard diffusion model also includes variability in the starting point of the evidence accumulation process. In Ratcliff’s (1978) gambler’s ruin notation, the decision boundaries are denoted 0 and a and the starting point is denoted z (0 < z < a) and is assumed to vary uniformly across trials with range \(s_{z}\) (\(s_{z} < a\)). Because of the spatial homogeneity (translation invariance) of the Wiener diffusion process, variability in starting point is equivalent to negatively correlated variability in decision criteria with a fixed starting point. Starting-point variability allows the model to predict the fast errors that are often found when stimulus discriminability is high and instructions emphasize speed of responding (Luce, 1986; Ratcliff & Smith, 2004). Smith (2016) showed that the circular diffusion model can similarly predict fast errors (i.e., positively correlated RT and decision accuracy) when there is across-trial variability in the radius of the criterion circle, a. However, in near-threshold psychophysical tasks like the ones used by Corbett and Smith (2017), errors are almost always slower than correct responses. For such tasks, we have found little evidence that fits are improved by including variability in starting points or decision criteria (Smith & Ratcliff, 2009; Smith et al., 2004) and, indeed, recovery of other model parameters can be impaired by including it in the model, suggesting that the starting point parameter is not well identified in these kinds of tasks. We therefore chose not to include it in our models here.

The third source of across-trial variability is in nondecision time. The standard diffusion model assumes that RT is the sum of independent decision and nondecision times. The latter is denoted \(T_{\text{er}}\) (for “time for encoding and response”) and is assumed to be uniformly distributed with range \(s_{t}\). This assumption is made mainly for convenience: If nondecision time variability is small relative to decision time variability, then the shape of the RT distribution will be almost completely determined by the distribution of decision times and the shape of the distribution of nondecision times will be largely immaterial. This assumption was questioned recently by Verdonck and Tuerlinckx (2016), who proposed a deconvolutional method for inferring the distribution of nondecision times that yields distributions with appreciable positive skewness. Because their method has yet to be widely applied and because we wished to stay as close as possible to the standard diffusion model, we continued to assume uniformly distributed nondecision times. For Corbett and Smith’s (2017) task, which was difficult and yielded long and variable RTs, this assumption seems a reasonable one and is not critical to the performance of the model.

Response bias and stimulus bias

Two kinds of decision bias have been identified in models of speeded two-choice decision-making: response bias and stimulus bias. Response bias is bias in the amount of evidence needed for a response and is implemented as a difference in the decision criteria for the two responses or, in the standard diffusion model, as asymmetry in the placement of the starting point between the two decision boundaries. Response bias is needed to account for performance in decision tasks in which the prior probabilities of the alternatives are unequal (Diederich & Busemeyer, 2006; Leite & Ratcliff, 2011; Mulder et al., 2012) and in tasks like the lexical decision task, in which there is asymmetry in the amounts of evidence needed to decide whether a character string is a word or not (Ratcliff & Smith, 2004). In contrast, many perceptual tasks exhibit a high degree of symmetry in the response probabilities and RT distributions for the two alternatives, especially when the prior probabilities of the stimulus alternatives are equal, and can be modeled without response bias (Ratcliff & Smith, 2004; Smith & Ratcliff, 2009; Smith et al., 2004).

In higher-dimensional spaces like those that arise in the 2D and 4D models, it is not obvious that starting point placement is the most appropriate way to think about response bias—at least not when the prior probabilities of the stimulus alternatives are equal. Pragmatically, the introduction of starting point bias breaks the symmetry on which the analytic solutions via the Girsanov theorem and Bessel process depend. Consequently, for reasons of computational efficiency if for no other reason, we are motivated to seek other ways of representing response bias in these models. A form of bias that seems particularly relevant to the tasks of Fig. 1 is bias in the placement of the decision boundaries on the surface of the hypersphere. An unbiased decision maker will align the boundaries with the cardinal axes of the hypersphere, partitioning it into equal-sized orthants. For the single-target task, we found that this assumption was too restrictive. As is apparent from Table 1, there is asymmetry in the decision rule for both forms of the task that is particularly pronounced in the single-target task: For this task, 15 of the 16 orthants are associated with a “signal” response and only one of them is associated with a “noise” response. This response rule strongly biases the model to make “signal” responses, even when the drift rates for signal and noise are the same. In order to allow for this kind of bias, we increased the size of the response region associated with “noise” responses by moving the decision boundaries symmetrically away from the category midpoint.

Stimulus bias is a difference in the rates at which evidence for the decision alternatives accumulates. It is conceptualized as a bias in the process that compares encoded stimulus representations to the decision template. Link and Heath (1975) proposed that sensory evidence is compared to a mental standard or referent, which determines the relative rates at which evidence for the two alternatives accumulates. Ashby (1983) proposed that stimulus bias could arise from bias in a log-likelihood computation. The standard diffusion model conceives of stimulus bias as arising from a drift criterion, which determines the relative drift rates for the two alternatives. Drift rate bias is needed to account for performance in recognition memory tasks, in which drift rates are affected by the prior probability that an item was on a studied list. The changes in drift rates across conditions in recognition memory are well described by changes in drift criterion (Ratcliff & Smith, 2004).

The decision template for multiattribute tasks

To characterize drift bias in the 4D model, it is useful to have a formal representation of the processes involved in the computation of drift rates. We coined the term “decision template” to refer to the mental representation of the stimulus configurations that are associated with the response alternatives in the task. In the 1D model, the decision template consists of a pair of alternatives; for the 4D model it potentially consists of 16, although only a subset of these alternatives was actually presented in Corbett and Smith’s (2017) task. These 16 alternatives correspond to the rows of Table 1 and consist of all possible assignments of signals (digits) and noise (letters) to the four locations of the array. The drift rate is a four-element vector that represents the encoded evidence for one of these 16 configurations on an experimental trial.

A natural way to represent the process of computing drift rates is as a linear operation in the vector space of possible stimulus configurations. Although this approach is too restrictive to accommodate the configuration-dependent biases we found empirically, it is useful to consider it first because it helps motivate a more appropriate representation. To this end, we define the configuration space of stimuli by associating with each of the 16 stimulus configurations a basis vector,

$$\begin{array}{@{}rcl@{}} \boldsymbol{u}_{i} & = & \left( u_{1}, u_{2}, u_{3}, u_{4}\right)^{\prime};\\ i & = & 1, \ldots, 16; \quad u_{1}, u_{2}, u_{3}, u_{4} \in \{0.500, -0.500\}, \end{array} $$
(12)

where, in keeping with our conventions for stimulus coding, we assume signals (digits) are coded as positive and noise (letters) are coded as negative. In words, \(\boldsymbol{u}_{i}\) is a four-element vector with components 0.500 or \(-0.500\), and the set of \(2^{4}\) such vectors enumerates all possible assignments of these values to components. The prime denotes the matrix transpose and the choice of 0.500 sets the norms of the basis vectors to unity, \(\|\boldsymbol{u}_{i}\| = 1\). The basis set in Eq. 12 is redundant in that it contains subsets of linearly dependent vectors, including the collinear pairs \(\boldsymbol{u}_{i}\) and \(-\boldsymbol{u}_{i}\), but the redundancy does not affect its algebraic properties in any essential way (Footnote 4).

A stimulus, \(\boldsymbol{s}\), of intensity I can be represented as a vector in this 4D space,

$$\boldsymbol{s} = I\boldsymbol{u}_{i} + \boldsymbol{\xi}, $$

for some i, where \(\boldsymbol{\xi}\) is a vector of normally distributed noise that characterizes the effects of encoding variability on a given trial. The vector space operation

$$ \boldsymbol{\mu} = \frac{1}{4}\sum\limits_{i = 1}^{16} (\boldsymbol{s} \cdot \boldsymbol{u}_{i}) \, \boldsymbol{u}_{i} = \boldsymbol{s}, $$
(13)

yields a veridical representation of the stimulus. In this representation, the dot product, \(\boldsymbol{s} \cdot \boldsymbol{u}_{i} = \|\boldsymbol{s}\|\|\boldsymbol{u}_{i}\| \cos\alpha\), computes the orthogonal projection of the stimulus vector on the basis vector \(\boldsymbol{u}_{i}\), where \(\alpha\) is the angle between the two vectors. The value of \(\boldsymbol{\mu}\) is a weighted sum of the basis vectors, with weights equal to the orthogonal projections of the stimulus vector on each of the basis vectors. The scaling factor of \(1/4\) compensates for the redundancy in the basis set and puts the computed drift rates on the same scale as the stimulus. The operation in Eq. 13 can be viewed as a vector-space generalization of the one assumed to underlie the computation of drift rates in the 1D diffusion model, which reflects the difference in the strength-of-match between a stimulus and the two decision alternatives. The drift rates in the 1D model can be regarded as a weighted sum of two collinear vectors of opposite sign.
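
The identity in Eq. 13 is easy to verify numerically. The following sketch enumerates the 16 basis vectors of Eq. 12 and checks that the operation returns the stimulus unchanged; the intensity and noise values are arbitrary.

```python
import numpy as np
from itertools import product

# The 16 basis vectors of Eq. 12: all sign patterns over the four display
# locations, with components +/- 0.500 so that each vector has unit norm.
U = np.array(list(product([0.5, -0.5], repeat=4)))   # shape (16, 4)

rng = np.random.default_rng(3)
I = 1.0                                        # stimulus intensity
s = I * U[3] + 0.1 * rng.standard_normal(4)    # noisy encoded stimulus

# Eq. 13: the drift rate as the scaled sum of projections onto the basis.
mu = 0.25 * (U @ s) @ U
print(np.allclose(mu, s))                      # True: the map is veridical
```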

In general, the drift rates will not be veridical representations of the stimuli, but may exhibit various forms of stimulus bias. The estimates of drift rates we obtained from fitting the model suggest they depend at least in part on the configural properties of the stimuli, that is, on the particular combination and arrangement of elements in the display. Specifically, our data suggest that participants were searching for particular patterns of stimulus elements that would allow them to classify the trial as “target present” or “target absent.” For the double-target task, these patterns consisted of either a pair of digits or a triplet of letters, indicating a target-present trial or a target-absent trial, respectively. For the single-target task the patterns were a single digit or a quartet of letters. The biases we found empirically seem to have been directed at emphasizing the contents of those display locations that carry information about these patterns while de-emphasizing other locations.

These kinds of configural biases imply that the metric and the categorical properties of stimuli are both important for how they are coded cognitively. The metric properties of a stimulus are those that identify it uniquely as a point in a four-dimensional perceptual space; the categorical properties are those that associate this point with one of two response categories. Such interactions between metric and categorical properties of stimuli recall the work of Fific et al. (2010), who showed how evidence accumulation rates in serial and parallel processing models depend on category structures defined by logical rules acting on the metric properties of an underlying stimulus space. Our data suggest that rule-based category structures induce stimulus biases that depend on the way in which particular sets of stimulus elements are mapped to responses. Such biases cannot be represented by applying a single biasing transformation uniformly to the result of the linear operation in Eq. 13, because they depend on the rule that maps a specific configuration of stimulus elements to a response.

We can formalize the ideas in the preceding paragraphs by thinking of the decision template, not as the set of all possible stimulus configurations for the task, but instead as a set of diagnostic patterns that participants actually use (or we infer they use) to distinguish between target-present and target-absent trials. Components of the decision template that do not represent one or other of these diagnostic patterns are set to zero (for “don’t care”). Conceptualized in this way, the set of diagnostic patterns we hypothesize forms the decision template for the double-target task can be represented as a set of ten normalized vectors, \(\boldsymbol{\upsilon}_{i}\), comprising the six element-wise permutations of

$$\boldsymbol{\upsilon}_{\mathrm{Double\; Present}} = \left( 0.707, \, 0.707, \, 0, \, 0\right)^{\prime} $$

and the four element-wise permutations of

$$\boldsymbol{\upsilon}_{\mathrm{Double \; Absent}} = \left( -0.577, \, -0.577, \, -0.577,\, 0\right)^{\prime}. $$

For the single-target task, the hypothesized decision template is a set of five normalized vectors comprising the four element-wise permutations of

$$\boldsymbol{\upsilon}_{\mathrm{Single \; Present}} = \left( 1, \, 0, \, 0, \, 0\right)^{\prime} $$

together with the single vector

$$\boldsymbol{\upsilon}_{\mathrm{Single \; Absent}} = \left( -0.500, \, -0.500, \, -0.500,\,-0.500 \right)^{\prime}. $$

Like the linear-space representation of Eq. 13, we can think of the dot product \((\boldsymbol{s}\cdot \boldsymbol{\upsilon}_{i})\) as characterizing the similarity between a stimulus \(\boldsymbol{s}\) and a template vector \(\boldsymbol{\upsilon}_{i}\), and the drift rate, \(\boldsymbol{\mu}\), as the outcome of a process in which the stimulus is simultaneously matched to every vector in the template. Now, however, instead of depending on the sum of the projections of the stimulus on the basis vectors in configuration space, the drift rate depends on the template vector that the stimulus most closely resembles,

$$ \begin{aligned} \boldsymbol{\mu} & = (\boldsymbol{s}\cdot \boldsymbol{\upsilon}_{i})\, \boldsymbol{\upsilon}_{i} + (1 - \beta) \left[\boldsymbol{s} - (\boldsymbol{s}\cdot \boldsymbol{\upsilon}_{i})\, \boldsymbol{\upsilon}_{i}\right] \\ i & = \underset{j}{\operatorname{argmax}} \, (\boldsymbol{s}\cdot \boldsymbol{\upsilon}_{j}); \qquad j = 1, 2, \ldots; \qquad 0 \leq \beta \leq 1. \end{aligned} \tag{14} $$

In Eq. 14, the “argmax” function picks out the index i at which similarity is maximized, that is, it identifies the template vector that most closely matches the stimulus. A biased drift rate, \(\boldsymbol{\mu}\), is obtained by orthogonally decomposing the stimulus vector into two components. The first, \((\boldsymbol{s}\cdot \boldsymbol{\upsilon}_{i})\,\boldsymbol{\upsilon}_{i}\), is the projection of the stimulus on the template vector \(\boldsymbol{\upsilon}_{i}\); the second, \(\boldsymbol{s} - (\boldsymbol{s}\cdot \boldsymbol{\upsilon}_{i})\, \boldsymbol{\upsilon}_{i}\), is the orthogonal complement of the projection. The stimulus vector, \(\boldsymbol{s}\), is the sum of these two components. The extent of the drift bias is controlled by the parameter \(\beta\), which varies between zero and one. When \(\beta\) is zero, there is no bias, \(\boldsymbol{\mu} = \boldsymbol{s}\), and the drift rate is a veridical representation of the stimulus. As \(\beta\) increases, the contribution of the orthogonal complement is reduced and the drift vector progressively approaches its projection onto the nearest template vector. The effect of this kind of bias is to de-emphasize those elements of the stimulus that are not part of one or other of the diagnostic patterns for the decision rule while preserving the values of the remaining elements.
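A minimal numerical sketch of the biased-template operation in Eq. 14 may make the decomposition concrete (our illustration; the template vectors are those hypothesized above for the double-target task):

```python
import itertools
import numpy as np

def biased_drift(s, templates, beta):
    """Biased-template drift rate of Eq. 14 (illustrative sketch).

    s         : 4-vector stimulus
    templates : array of normalized template vectors, one per row
    beta      : bias parameter in [0, 1]; 0 = veridical, 1 = full projection
    """
    sims = templates @ s                      # similarities (s . v_j)
    v = templates[np.argmax(sims)]            # best-matching template, v_i
    proj = (s @ v) * v                        # projection of s on v_i
    return proj + (1.0 - beta) * (s - proj)   # Eq. 14

# Hypothesized double-target template: six permutations of (0.707, 0.707, 0, 0)'
# and four permutations of (-0.577, -0.577, -0.577, 0)'.
present = set(itertools.permutations((0.707, 0.707, 0.0, 0.0)))
absent = set(itertools.permutations((-0.577, -0.577, -0.577, 0.0)))
templates = np.array(sorted(present | absent))           # 10 x 4

s = np.array([0.9, 0.8, -0.2, 0.1])           # noisy two-digit stimulus
print(biased_drift(s, templates, beta=0.0))   # equals s (no bias)
print(biased_drift(s, templates, beta=0.8))   # pulled toward (0.707, 0.707, 0, 0)'
```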

Computationally, the “argmax” function in Eq. 14 could be realized in a straightforward way by reformulating the model as a similarity-choice model, in which similarity is an exponentially decreasing function of the psychological distance between stimuli and similarities are summed across items or exemplars in psychological space (Navarro, 2007; Nosofsky, 1984; Shepard, 1987). In such models, when the exponential decay rate grows large, the summed similarity becomes dominated by the nearest item and the sum across items approximates the argmax function. For our purposes, there is little to be gained by elaborating the model in this way, but the theoretical connection to the larger class of similarity-choice models is nevertheless illuminating.
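The limiting behavior is easy to demonstrate: the summed-similarity weights form a softmax over (negative) distances, which concentrates on the nearest item as the decay rate grows. A toy sketch under our own made-up distances:

```python
import numpy as np

# Toy illustration (ours): summed exponential similarity across items
# approaches a pure argmax as the decay rate lam grows.
dists = np.array([0.9, 1.0, 2.5])        # psychological distances to three items
for lam in (1.0, 10.0, 100.0):
    sims = np.exp(-lam * dists)          # Shepard-type exponential similarity
    weights = sims / sims.sum()          # each item's share of summed similarity
    print(lam, np.round(weights, 3))     # mass concentrates on the nearest item
```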

Does dimensionality matter?

Notwithstanding our earlier argument that a four-dimensional evidence accumulation process is the natural setting in which to represent decisions about four-element displays, the reader may still be wondering whether the dimensionality of the model really matters. Specifically, are the predictions of the 2D and 4D models sufficiently different that assuming a model with the wrong dimensionality would affect our ability to account for data? Figure 4 shows that the dimensionality of the model does indeed have a significant effect on the RT distributions it predicts. The figure shows predicted response time distributions for zero-drift diffusion processes (Bessel processes) in two (S1) and four (S3) dimensions. The distribution for the 4D process was generated from Eq. 2 and that for the 2D process from Eq. 22 in Smith (2016), with the decision criterion \(a = 0.8\) and infinitesimal standard deviation \(\sigma = 1.0\) in both cases.

Fig. 4 Predicted response time distributions for zero-drift diffusion processes in two dimensions (S1, dashed line) and four dimensions (S3, solid line) for a = 0.8 and σ = 1.0

In comparison to the distribution for the 2D process, the distribution for the 4D process has a smaller mean and a smaller variance: processes in a 4D space tend to finish faster and to be less variable than those in a 2D space. The way to understand this relationship is by analogy with the properties of independent parallel processes. As is well known (e.g., Colonius, 1995, Eq. 6), when a decision depends on the fastest-finishing of a set of racing independent processes, RTs become progressively faster and less variable as the number of racing processes increases. This is because the probability that the system as a whole takes longer than a given time t to finish is the product of the probabilities that all of its components take longer than t to finish, and this (usually) decreases with the number of components for all t (Footnote 5). A multidimensional diffusion process with independent components behaves similarly. Increasing the number of dimensions increases the number of different ways in which the process can exit the bounding hypersphere. This increases its opportunities for finishing by time t, or conversely, decreases the probability that it will still be within the hypersphere at time t. This manifests itself as a reduction in the mean and variance of the RT distribution, as shown in Fig. 4. So the answer to our question is, yes, dimensionality does matter—in the sense that the relationship between the predicted RT distributions and the underlying model parameters is different in the 2D and 4D cases—which justifies our use of the more complex 4D model for the Corbett and Smith (2017) task.
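The dimensionality effect can also be seen by brute force. The following rough Monte Carlo sketch (ours; it uses Euler-Maruyama steps rather than the closed-form series of Eq. 2, and the step size trades accuracy for speed) simulates first-exit times from a disk and from a 4D hypersphere with the parameters of Fig. 4:

```python
import numpy as np

def exit_times(dim, a=0.8, sigma=1.0, n=5000, dt=5e-4, seed=0):
    """First-exit times of zero-drift diffusion from a radius-a hypersphere."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n, dim))                 # n paths starting at the origin
    t = np.zeros(n)
    alive = np.ones(n, dtype=bool)
    while alive.any():
        k = alive.sum()
        x[alive] += sigma * np.sqrt(dt) * rng.standard_normal((k, dim))
        t[alive] += dt
        alive &= np.linalg.norm(x, axis=1) < a   # paths freeze once they exit
    return t

t2, t4 = exit_times(2), exit_times(4)
print(f"2D: mean = {t2.mean():.3f}, sd = {t2.std():.3f}")   # slower, more variable
print(f"4D: mean = {t4.mean():.3f}, sd = {t4.std():.3f}")   # faster, less variable
```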

Method

The full details of the experimental method can be found in Corbett and Smith (2017, General Method and Experiment 4a). The stimuli were presented on a linearized (gamma-corrected) 21” CRT monitor driven by a Cambridge Research Systems ViSaGe framestore, which controlled all aspects of stimulus presentation and response timing. Responses were made on a Cambridge Research Systems infrared button box and were timed by the ViSaGe’s hardware clock. The ViSaGe runs asynchronously from the host computer and is non-interruptible, providing a high degree of timing accuracy. Six highly practiced participants each completed seven 480-trial experimental sessions, yielding 3,360 trials per participant in a 2 (task) \(\times\) 5 (stimulus contrast) factorial design. The two tasks (single target and double target) were alternated in four consecutive 120-trial blocks in each session. The task order was pseudorandomized across sessions to ensure that no more than four sessions began with the same task. Stimuli were presented in Gaussian noise for 200 ms and backwardly masked with high-contrast checkerboards. Stimulus contrasts and the locations of digit targets were randomized across trials within a block. Five levels of stimulus contrast were selected for each participant individually during practice to produce a range of accuracies in the single-target task that varied from just above chance to near ceiling. The same set of contrasts was used in the single-target and double-target tasks. Participants were encouraged to respond on each trial as accurately as possible and as soon as they had made a decision, but there was no emphasis on speed of responding. Trial-by-trial accuracy feedback was given via distinctive auditory tone pairs.

Treatment of experimental data

The primary experimental data we wish to explain are the proportions of correct responses and the RT distributions for correct responses and errors as functions of stimulus contrast and the decision task. We evaluated our models using methods similar to those described in Ratcliff and Smith (2004), which we have used in many other studies (e.g., Smith et al., 2004; Smith & Ratcliff, 2009; Smith et al., 2016). We summarized the information in the RT distributions using five quantiles: the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles. Five quantiles suffice to characterize the shape of the distribution while being relatively insensitive to contaminants and outliers (Ratcliff & Tuerlinckx, 2002). The 0.1 quantile characterizes the fastest responses in the distribution (the leading edge); the 0.5 quantile characterizes its central tendency (the median); and the 0.9 quantile characterizes the slowest responses (the tail).

We fitted our models both to the individual participant data and to group data obtained by averaging corresponding quantiles for the distributions of correct responses and errors across participants: \(\bar{Q}_{k} = ({\sum}_{i = 1}^{6}Q_{i,k})/6\). The conditions under which distributions can be safely quantile-averaged without distortion were identified by Thomas and Ross (1980). A distribution obtained from quantile-averaging a set of component distributions will belong to the same family as its components when the quantiles of the components form an affine family, that is, when they fall on a set of straight lines (Footnote 6).
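In code, the quantile-averaging step is a small computation; a schematic sketch (ours, with synthetic stand-in RTs):

```python
import numpy as np

# Five quantiles per participant, averaged across the six participants
# to form the group distribution summary, Q-bar_k.
probs = [0.1, 0.3, 0.5, 0.7, 0.9]
rts = [np.random.default_rng(i).gamma(3.0, 0.15, 500) + 0.25 for i in range(6)]
Q = np.array([np.quantile(rt, probs) for rt in rts])   # Q_{i,k}: 6 x 5
Q_bar = Q.mean(axis=0)                                 # group quantiles
print(np.round(Q_bar, 3))
```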

Figure 5 shows quantile-quantile (Q-Q) plots for individual participants in each of the ten conditions of Corbett and Smith’s (2017) experiment. The columns are the experimental tasks and the rows, top to bottom, represent increasing levels of stimulus contrast. To construct these plots, we plotted the quantiles of the marginal RT distributions for the individual participants, \(Q_{i,k}\), \(i = 1, \ldots, 6\), against the group average quantiles, \(\bar{Q}_{k}\). The heavy solid line is a plot of the values of \(\bar{Q}_{k}\) against itself, which is a straight line with unit slope. Although there are some individual deviations from linearity discernible in the figure, they are small and unsystematic, and the plot overall is remarkably linear.

Fig. 5 RT distribution quantiles, \(Q_{i,k}\), for the ith participant, k = (.1, .3, .5, .7, .9), plotted against the group average quantile, \(\bar{Q}_{k}\), across participants for single (S) and double (D) target conditions for five levels of stimulus contrast (1, …, 5). The plotting symbols denote the individual participants; the heavy line is the group average

Table 2 represents the same data in another way. The table shows the squared Pearson correlations (\(r^{2}\)) between the individual participant RT quantiles, \(Q_{i,k}\), and the average quantiles for the other participants, \(({\sum}_{j = 1, j \ne i}^{6}Q_{j,k})/5\), with participant i omitted. The smallest \(r^{2}\) in the table is .946 and most of the values are around .99. These figures imply that the conditions for valid quantile-averaging identified by Thomas and Ross (1980) are satisfied. The pattern of data in Fig. 5 and Table 2 is not an exceptional or remarkable one but is fairly typical of data from this kind of task (e.g., Ratcliff & McKoon, 2008, Fig. 8; Ratcliff & Smith, 2010, Fig. 20).

Table 2 Individual-group quantile correlation (\(r^{2}\)) statistics

One of the reasons for the success of the diffusion model as a model of two-choice decisions is that the quantiles of its predicted RT distributions also form affine families (Ratcliff & McKoon, 2008; Smith, 2016, Fig. 10). This is true for the 1D model and for models in higher dimensions as well. Consequently, the distributions predicted by the diffusion model have just the right shape to account for empirical data. Because empirical data and the model predictions have the same affine structure, model parameters estimated from group data and the averages of parameters estimated from individual data are usually in good agreement (Ratcliff et al., 2003, 2004, 2004).

We fitted our models to the data from the single-target and double-target tasks separately, because there were no strong a priori reasons to assume that any of the model parameters would be the same across tasks. To do so, we fitted the quantile-averaged group data and individual data by minimizing the likelihood-ratio Chi-square statistic (\(G^{2}\)),

$$G^{2} = 2\sum\limits_{i = 1}^{10}n_{i}\sum\limits_{j = 1}^{12} p_{ij} \log \left( \frac{p_{ij}}{\pi_{ij}}\right), $$

using the Matlab implementation of the Nelder-Mead Simplex algorithm (fminsearch). In this equation, \(p_{ij}\) and \(\pi_{ij}\) are, respectively, the observed and predicted probabilities (proportions) in the bins bounded by the quantiles (Footnote 7). The inner summation over j extends over the 12 bins formed by each pair of joint distributions of correct responses and errors. (There are five quantiles per distribution, resulting in six bins per distribution, or 12 bins in total for each distribution pair.) The outer summation over i extends across the five contrast levels and the two stimulus configurations (target-present and target-absent). The quantity \(n_{i}\) is the number of experimental trials in each stimulus condition (here \(n_{i} = 168\)). Because \(G^{2}\) computed on the joint distributions depends on the relative proportions of correct responses and errors, it characterizes goodness-of-fit to the distribution shapes and the choice probabilities simultaneously.
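A schematic implementation of the statistic may be useful (ours; the bin bookkeeping and the guard against empty bins are implementation choices, not part of the original fitting code):

```python
import numpy as np

def g_squared(n, p_obs, p_pred, eps=1e-12):
    """Likelihood-ratio Chi-square, G^2, as defined above (sketch).

    n      : trials per stimulus condition, shape (10,)
    p_obs  : observed bin proportions, shape (10, 12); rows sum to 1
    p_pred : model-predicted bin probabilities, same shape
    """
    n, p_obs, p_pred = map(np.asarray, (n, p_obs, p_pred))
    # eps guards empty bins; a bin with p_obs = 0 contributes essentially zero
    return 2.0 * np.sum(n[:, None] * p_obs * np.log((p_obs + eps) / (p_pred + eps)))
```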

We performed model selection using the Akaike Information Criterion (AIC), which penalizes models for complexity based on their number of free parameters. To allow for overdispersion (or underdispersion) in the data, we used a variant of the AIC called the QAIC (Burnham & Anderson, 2002, pp. 67–70), which adjusts the likelihood for dispersion prior to applying the penalty,

$$\text{QAIC} = G^{2} / q + 2m, $$

where q is the dispersion adjustment (see Results) and m is the number of free parameters in the model (Footnote 7). Many cognitive researchers prefer the alternative Bayesian Information Criterion (BIC) for model selection, which uses a sample-size-dependent complexity penalty, but Burnham and Anderson claim that the BIC often leads to underfitting. Consistent with their view, Oberauer and Lin (2017) recently reported that the AIC had better model-recovery properties than the BIC in a study of visual working memory. We also computed the BIC but, in agreement with Burnham and Anderson, found that it tended to prefer underfit models.

Parameterizing the 4D model

Mean drift rates

Because the 4D model is new and because there have been relatively few applications of diffusion models to tasks with multielement displays, we considered several highly parameterized versions of the model to try to understand its empirical properties. Our ultimate goal was to find a parsimonious parameterization that provided insight into the underlying psychological processes, particularly those associated with the computation of drift rates. We do not attempt to report all of the model variants we considered, but focus on the theoretically interesting ones. The most highly parameterized model has a total of 20 mean drift rates: one for each combination of the five levels of stimulus contrast, the two stimulus types (signal and noise; digits and letters), and the two trial types (target-present and target-absent). The possibility that drift rates for signal and noise stimuli might differ accords with previous treatments of stimulus bias in the literature and can be characterized using a variable drift criterion. However, the idea that drift rates might differ on target-present and target-absent trials has some deep theoretical implications that we consider at length subsequently.

The most striking of these, as was anticipated in the section on the decision template, is that the decision template is not local but configural. By a “local” decision template, we mean one in which the components of the drift rate arise from a perceptual matching operation performed simultaneously and separately at each location in the display. The result of this matching operation is a vector-valued quantity that characterizes the strength of evidence for a signal at each location. The drift criterion is a local computation of this kind. By a “configural” template, we mean one in which the strength of evidence for a signal at a given location also depends on the stimuli that are present at other locations. This is, to our knowledge, a new idea in the realm of diffusion process models and one that on first encounter may strike readers as counterintuitive.

Although it is novel in diffusion process settings, the idea that the global features of a display can influence how its elements are processed is a fairly common one in perceptual psychology and mental architecture studies. In perceptual psychology, this kind of nonlocality is expressed in the idea of “gist” processing, which proposes that the global properties of an object or a scene can affect the manner and efficiency with which its components are processed (Palmer, 1975). Mental architecture studies using systems factorial technology (Townsend & Nozawa, 1995) have shown that the configural properties of stimuli affect the speed with which their components are processed and the degree to which they can be processed in parallel (Fific et al., 2010; Moneer et al., 2016; Wenger & Townsend, 2006). In a similar vein, Brady and Tenenbaum (2012) found that visual working memory was better characterized by a model in which top-down configural structure affects the processing of individual stimulus elements than by one in which the elements were processed independently.

In mental architecture models, these kinds of nonlocal processing effects are represented mathematically by crosstalk between channels that process individual stimulus elements (Townsend et al., 2012). Such crosstalk constitutes a form of coactivation between channels (Miller, 1982; Townsend & Nozawa, 1995). Our estimated drift rates imply a similar kind of nonlocality in the computation of letter and digit drift rates on double-target trials. Parenthetically, we should note that while we have termed this “configural” processing, some authors prefer to reserve this term for stimuli that have recognizable higher-order or Gestalt structure, like features in a face (Wenger & Townsend, 2006) and use the term “contextual processing” (Palmer, 1975), for the more general situation in which processing of a stimulus element is affected by elements elsewhere in the display. We do not maintain this distinction rigidly and treat the terms as interchangeable.

Drift rate variability

Most applications of the 1D diffusion model assume that across-trial variability in drift rates, η, is constant across levels of stimulus difficulty within an experimental block when stimuli are viewed under identical conditions. Values of \(\eta\) may also vary within blocks if stimuli are viewed under different conditions, such as occurs when there is an across-trial manipulation of attention (e.g., Ratcliff, 2014; Ratcliff & Smith, 2004; Smith et al., 2016; Smith & Ratcliff, 2009). When brief near-threshold stimuli are used, however, like those used by Corbett and Smith (2017), there is a possibility that drift rate means and variances might covary with stimulus difficulty. This possibility arises from the idea that the drift rate mean and variance might both depend on Poisson-like neural coding of stimulus information in perception and visual working memory (Bays, 2014; Smith, 2010, 2015; Smith & McKenzie, 2011). We considered this possibility by allowing the mean drift rate and the drift variance both to vary with stimulus contrast: specifically, we allowed \(\eta\) to vary with contrast and stimulus condition (single- versus double-target trials). Allowing \(\eta\) to vary in this way did improve model fit, but the improvements were smaller than those associated with the configural encoding of drift rates.

Nondecision times

We also considered the possibility that nondecision times might vary as a function of stimulus difficulty. This possibility has been considered by a number of authors, including Ratcliff and Smith (2010), who studied two-choice decisions about letters presented in dynamic noise. They found large changes in the leading edge (the 0.1 quantile) of the RT distribution with changes in noise density. Donkin et al. (2009) reanalyzed data from Gould et al. (2007), who studied two-choice decisions about low-contrast grating patches, and found similar changes in the leading edge. Smith et al. (2014) showed that the RT distributions from these kinds of tasks could be successfully modeled by the integrated system model of Smith and Ratcliff (2009), which attributes changes in the leading edge of the RT distributions to the process of visual short-term memory formation that drives the decision process. Corbett and Smith’s (2017) study also used low-contrast stimuli presented in noise, so we considered models in which both the mean nondecision time, \(T_{\text{er}}\), and its variability, \(s_{t}\), varied with stimulus condition. We found no evidence that freeing the fits in this way improved them, so we do not consider models with stimulus-dependent nondecision times any further.

Results

Quantile-probability representation of experimental data

A compact way to represent the effects of experimental manipulations on the RT distributions for correct responses and errors is in a quantile-probability plot. These plots have been widely used in the literature since they were introduced by Ratcliff et al. (2001), but they may be unfamiliar to some readers, so Fig. 6 summarizes how such a plot is constructed. The quantile-probability plot shows how the distributions of RT for correct responses and errors and the associated accuracy vary as a function of stimulus difficulty, which in our task was manipulated via stimulus contrast. For each difficulty level, performance is summarized by the .1, .3, .5, .7, and .9 quantiles of the distributions of correct responses and errors and the associated accuracy, p. Figure 6a shows a comparison of a kernel-density estimate of an RT distribution with an equal-area histogram in which the bin boundaries are formed by the distribution quantiles. Each of the interior bins contains .2 of the probability mass; the remaining .2 of the mass is distributed equally between the two end bins, whose lower and upper boundaries are the .005 and .995 quantiles, respectively. Evidently, the distribution quantiles provide a good summary of the shape of the distribution.

Fig. 6 Quantile-probability plot. a RT distributions are summarized by equal-area histograms using the .1, .3, .5, .7, and .9 distribution quantiles as bin boundaries. The continuous curve is a kernel density estimate of the same distribution. b For each stimulus condition, the quantiles of the distribution of correct responses are plotted against the probability of a correct response, p, and the quantiles of the distribution of errors are plotted against the probability of an error, 1 − p. c Quantile-probability plot from an experiment with two stimulus conditions showing a slow-error structure. Each condition contributes a pair of distributions to the plot

To construct a quantile-probability plot, the quantiles of the distribution of correct responses are plotted against the probability of a correct response, p, and the quantiles of the distribution of errors are plotted against the probability of an error, \(1 - p\), as shown in Fig. 6b. Each stimulus condition contributes one pair of distributions to the plot; in general, there will be as many pairs of distributions in the plot as there are stimulus conditions. Figure 6c shows a quantile-probability plot from an experiment with two stimulus conditions and four distributions. The distributions on the right-hand side of the plot (light gray) are the distributions of correct responses and the distributions on the left (dark gray) are the distributions of errors. The innermost pair of distributions is from the hardest difficulty level (lowest contrast) and the outermost pair is from the easiest level (highest contrast).

The quantile-probability plot shows in a compact way how distribution shape and accuracy both change as stimulus difficulty is parametrically varied. Most of the changes in RT with changing stimulus difficulty are in the upper quantiles of the distribution (i.e., the median, .7, and .9 quantiles), as are most of the differences between correct responses and errors. The leading edge (the .1 quantile) shows comparatively little change with changing difficulty and comparatively little difference between correct responses and errors. The plot in Fig. 6c is canted upwards towards the upper left-hand side. This is the typical pattern of slow errors that is found in difficult tasks in which accurate responding is stressed. If there were no difference between correct responses and errors, then the plot would be symmetrical about its vertical midline. Performance in tasks in which speed and accuracy for the two decision alternatives are similar can be summarized using a single quantile-probability plot by combining correct responses and errors for the two alternatives in each stimulus condition. In tasks in which there are differences in RT or accuracy for the two alternatives, like those we consider here, separate plots are required. Performance under these circumstances can be characterized either in terms of the stimulus or the response; we have chosen to characterize it in terms of the stimulus.
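The construction is mechanical once the quantiles are in hand; a schematic sketch with synthetic data (ours, using made-up accuracies and gamma-distributed stand-in RTs):

```python
import numpy as np
import matplotlib.pyplot as plt

# Each condition contributes its correct-RT quantiles at x = p and its
# error-RT quantiles at x = 1 - p, as described above.
probs = [0.1, 0.3, 0.5, 0.7, 0.9]
rng = np.random.default_rng(0)
conditions = [0.70, 0.85, 0.95]                    # accuracy per difficulty level
fig, ax = plt.subplots()
for p in conditions:
    correct = rng.gamma(3.0, 0.15, 1000) + 0.25    # stand-ins for real RTs
    errors = rng.gamma(3.5, 0.17, 1000) + 0.25     # slow-error structure
    for q_c, q_e in zip(np.quantile(correct, probs), np.quantile(errors, probs)):
        ax.plot(p, q_c, "o", color="lightgray")    # correct responses (right side)
        ax.plot(1 - p, q_e, "o", color="dimgray")  # errors (left side)
ax.set_xlabel("response probability")
ax.set_ylabel("RT quantile (s)")
plt.show()
```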

The double-target task

The double-target task is more complex psychologically than the single-target task because it shows a greater asymmetry in performance across target-present and target-absent trials, but it was simpler to model because it did not require response bias, so we consider it first (Footnote 8). We begin by discussing the quantile-averaged group data and then report confirmatory fits to the individual participants. Table 3 summarizes the main models of interest for the double-target task and the associated \(G^{2}\) fit statistics and BICs. Table 4 gives a detailed breakdown of the parameterization of mean drift rates and drift rate variability for all of the models for the double-target and single-target tasks.

Table 3 Models for the double-target task
Table 4 Drift Rate Parameters

Theoretically, \(G^{2}\) is distributed as Chi-square with degrees of freedom equal to the degrees of freedom in the data minus the number of free parameters in the model (110 − m in the notation of Table 3). We do not assign p values to \(G^{2}\), but use it just as a rough guide to the quality of the fit. For \(G^{2}\) to be tested against the percentage points of the Chi-square distribution, the quantity \({\sum}_{i} n_{i}{\sum}_{j} p_{ij}\log \pi_{ij}\) (the model-dependent part of \(G^{2}\)) must be a likelihood based on independent and identically distributed observations. Individual data from cognitive tasks are typically statistically overdispersed (i.e., more variable than predicted) relative to the distributions ordinarily used to describe them (McCullagh & Nelder, 1989; Smith, 1998) because of parameter variability due to learning, fluctuations in attention, and across-trial sequential dependencies (Luce, 1986). As a result, fit statistics computed at the individual level are often larger than would be expected from an otherwise well-fitting model, even when it seems to capture the essential features of the data. Quantile-averaging of individual data can produce the opposite effect and yield fit statistics that are underdispersed relative to the theoretical Chi-square. In the absence of overdispersion or underdispersion, \(G^{2}\) or the Pearson Chi-square for a well-fitting model should be fairly close to its residual degrees of freedom.

Group fits

Figure 7 shows plots for two of the models from Table 3 (Model 1 and Model 2). The left-hand panels of the figure show RTs and accuracy for target-present (two-digit) trials; the right-hand panels show target-absent (single-digit) trials. The errors in the panels on the left are the target-absent responses on double-target trials (misses) and the errors in the panels on the right are the target-present responses on single-digit trials (false alarms). (The errors are the symbols to the left of the 0.5 point in each panel.) There is a marked asymmetry in the pattern of RTs and accuracy for the two kinds of trial and the overall shape of the quantile-probability functions is appreciably more complex than is typically found for decisions based on only a single stimulus (e.g., Ratcliff & Smith, 2004).

Fig. 7 Fits to the double-target task (group data). The left-hand panels are target-present (two-digit) trials; the right-hand panels are target-absent (one-digit) trials. a Model with 20 mean drift rates (Model 1); b model with ten mean drift rates (Model 2). The symbols denote distribution quantiles: circle = .1; square = .3; diamond = .5; inverted triangle = .7; upright triangle = .9. Light gray symbols are “target-present” responses; dark gray symbols are “target-absent” responses. The points on the x-axis are choice probabilities for the five levels of stimulus contrast. Choice probabilities increase with contrast for correct responses (to the right of the 0.5 point) and decrease with contrast for errors (to the left of the 0.5 point)

Model 1, shown in Fig. 7a, was the most complex of the models we considered. It had a total of 20 mean drift rates, \(\nu\), ten drift standard deviations, \(\eta\), a single decision criterion, a, and a single pair of nondecision time parameters, \(T_{\text{er}}\) and \(s_{t}\). The idea that the mean drift rates for individual digits and letters might vary both as a function of the elements themselves and of the number of similar elements in the display is an expression of the assumption that processing is inherently configural. As we foreshadowed earlier and discuss in detail later, we interpret the estimated drift rates from the model as evidence for a biased-template model, in which participants search for particular, diagnostic patterns of stimulus elements.

Overall, the model fits well and captures the main regularities in what is a challenging set of data. The only obvious misses are in the tail quantiles (.7 and .9 quantiles) for errors, but empirical estimates of tail quantiles have high variance because they are based on comparatively few trials, and one must be careful to avoid overinterpreting them when assessing model fit (Ratcliff & Tuerlinckx, 2002). Burnham and Anderson (2002) recommend using the goodness-of-fit of the most complex structural model in a set of candidate models to estimate the QAIC dispersion parameter q; we used Model 1 for this purpose. They also recommend not adjusting the likelihood when the estimated q is less than unity, but underdispersion is expected with quantile-averaged data, so use of an adjusted likelihood seems appropriate. We obtained an estimate of \(q = 44.9/77 = 0.583\), which we used to calculate the QAICs for all of the fits to group data in Table 3.

In addition to Model 1, we considered Model 2, which had ten fewer mean drift parameters. Our reasons for considering this model were both pragmatic and theoretical: Pragmatically, we were trying to reduce the number of free parameters, but the model is also of theoretical interest because it is the purest expression of the biased-template model, in which all of the stimulus information on target-present trials is carried by digits (signals) and all of the information on target-absent trials is carried by letters (noise). We were motivated to consider this model by the finding that mean drift rates for letters on target-present trials and for digits on target-absent trials in Model 1 were smaller and less systematic than those for the other two stimulus types (see Mean Drift Rates below). We therefore considered a model in which these ten drift rates were all set to zero (see Table 4 for details). The fit of Model 2 is shown in Fig. 7b. The \(G^{2}\) is some 29% worse than that of Model 1, but the QAIC differed by only a small amount, \({\Delta}_{i} = 2.3\). Burnham and Anderson (2002, p. 70) state that a model whose \({\Delta}_{i}\) is within two units of the best-fitting model has “substantial” support. When applied to our data, their criterion implies that a model with 20 free drift rates and one with ten free drift rates perform similarly. In comparison to Model 1, Model 2 is somewhat “stiffer” and consequently does not capture some of the fine-grained details of the data as well. Nevertheless, it captures most of the main qualitative features, with the exception of the concavity in the quantiles of the target-absent error distributions.

In Models 3 and 4, we attempted to reduce the number of free \(\eta\) parameters from ten to two, using the same sets of mean drift rates as in Models 1 and 2. Estimates of \(\eta\) for noise (letter) stimuli were appreciably smaller than those for signal (digit) stimuli, implying a need for separate parameters for these two classes of stimuli, but the changes in estimated \(\eta\) across stimulus contrasts were not large, so we considered models in which there were only two such parameters. As we noted previously, the majority of fits of the 1D diffusion model in the literature assume a single \(\eta\) value for different values of mean drift rate. Again, there was a theoretical as well as a pragmatic reason for considering these two models.

Model 2 assumed that letters on target-present trials and digits on target-absent trials do not contribute discriminative information to the decision process but only add contrast-dependent noise. This assumption is expressed in the combination of zero mean drift rates and contrast-dependent drift-rate variance for these two classes of stimulus elements. The picture implied by Model 2 is that these elements have a measurable but unsystematic effect on drift rates, in the sense that they affect RT but not response accuracy. Like Model 1, Model 3 assumed that nondiagnostic stimulus elements affect mean drift rates, but unlike Model 1, it assumed that drift rate variance is unaffected by stimulus contrast. Model 4 assumed that stimulus contrast affects neither the means nor the variances of the drift rates of nondiagnostic elements.

The \(G^{2}\) for Model 3 was similar to that of Model 2 and its QAIC was (by a small margin) the best of the three models and within two QAIC units of both of them, implying, by Burnham and Anderson’s (2002) criterion, “substantial support.” In contrast, the fit of Model 4 was poorer, and its QAIC difference from the best model was \({\Delta}_{i} = 9.0\). Burnham and Anderson (p. 70) classify the support for such a model as somewhere between “considerably less” and “essentially none.”

Another way of viewing the QAIC values is to convert the differences between models to Akaike weights (Burnham & Anderson, 2002, p. 75; Wagenmakers & Farrell, 2004),

$$w_{i}(\text{QAIC}) = \frac{\exp\left[-\frac{1}{2}{\Delta}_{i}(\text{QAIC})\right]} {{\sum}_{j = 1}^{k} \exp\left[-\frac{1}{2}{\Delta}_{j}(\text{QAIC})\right]}, $$

where the summation in the denominator extends over the k models in the candidate set. The Akaike weights can be interpreted (with caution) as the probability that Model i is the best model in the set. Table 3 shows the Akaike weights for the four models. The weights suggest that Models 1 and 3 are the most probable models, Model 2 is somewhat less probable, and Model 4 is highly improbable.
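In code, the QAIC and weight computations are a few lines (a sketch; Model 1’s \(G^{2}\) = 44.9, m = 33, and q = 0.583 are from the text, but the values for the other three models below are made-up placeholders):

```python
import numpy as np

def qaic(g2, q, m):
    """QAIC = G^2/q + 2m, as defined above."""
    return g2 / q + 2 * m

def akaike_weights(qaics):
    """Akaike weights from QAIC differences across a candidate set."""
    d = np.asarray(qaics) - np.min(qaics)   # Delta_i relative to the best model
    w = np.exp(-0.5 * d)
    return w / w.sum()

g2s, ms, q = [44.9, 58.0, 55.0, 70.0], [33, 23, 25, 15], 0.583
print(akaike_weights([qaic(g, q, m) for g, m in zip(g2s, ms)]))
```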

Individual fits

When the conditions for valid quantile-averaging are satisfied, the resulting group distributions have some of the nice properties of robust estimators. They identify essential structural regularities in the data while de-emphasizing the influence of outliers and other aberrations, and they serve as a corrective to any tendency to overfit and overinterpret the idiosyncrasies of individual participants. Nevertheless, it is important to assess whether a model can account for performance at the individual level and whether inferences made at the group level continue to hold. Individual participant data, as noted earlier, tend to be overdispersed relative to the associated theoretical sampling distributions. Smith (1998) used a method from McCullagh and Nelder (1989) and found that individual two-choice psychophysical data were overdispersed by a factor of 1.2 to 2.5 relative to the theoretical variance of the binomial distribution. Ratcliff and Childers (2015) reported that Chi-square fits of the 1D diffusion model to individual participants tend to vary between the critical value of Chi-square (roughly \(df + 2\sqrt{2\,df}\)) and twice the critical value, implying overdispersion of around 1.3 to 2.6, in agreement with Smith’s estimates.

Again using the fit of Model 1 as a criterion for overdispersion, the estimates of q suggest that the individual double-target data were more variable than is usually found in tasks involving only a single stimulus. The overdispersion estimates, which are given with the individual fit statistics in Table 6, ranged from 1.6 to 4.1. The lower part of Table 3 shows the average of the fit statistics for each of the four models. The average \(w_{i}(\text{QAIC})\) weights are averages obtained by computing Akaike weights for each participant individually. The individual weights are shown in Table 6. These weights suggest that, contrary to the group fit, the best fits at the individual level were provided by Model 4, which had ten free mean drift rates but only two drift variability parameters. Table 6 shows that the QAIC preferred Model 4 for five of the six participants and that, in all of these cases, the evidence was overwhelmingly in favor of Model 4. The estimated parameters for the group and individual participants for all four models are shown in Table 7.

To illustrate the range of variability among individuals, Fig. 8 shows the fits of Model 1 for the best-fit (S1) and worst-fit (S6) participants (Fig. 8a and b, respectively). These plots show the differences among participants and highlight the presence of structure in the RT distributions that is more complex than is usually found in decision tasks of this kind. Like the group data in Fig. 7, there are large differences between target-present and target-absent trials, which are particularly apparent on the target-absent trials for S6, whose false alarm rate at the lowest stimulus contrast exceeds 60%. The model is able to predict this pattern of performance because of the differences in the drift rates for noise and signal stimuli. Many large-sample cognitive studies would be inclined to exclude participants like S6 on the grounds of excessive bias, but we believe it is more principled to try to characterize challenging data of this kind than to exclude them.

Despite some obvious places where the model misfits the data, it nevertheless captures much of the essential structure fairly well. It fails to capture the tail quantiles for S1 on target-absent trials and the leading edge of S6’s target-absent error distributions (false alarms). The shape of the leading edge may be due to fast guess responses, which are outside the scope of the model. The model does capture the atypical double-concave “bird-in-flight” pattern of S1’s performance on target-present trials; this pattern of RTs indicates that the slowest correct and error responses occurred at intermediate levels of stimulus contrast. The model also captures the unusual convex bowing of the RT distribution quantiles for S6 on target-present trials and the marked asymmetry in S6’s performance across the two stimulus types. In sum, then, Model 1 appears to perform fairly well qualitatively and to capture the main features of the individual participant data.

As a contrast to Model 1, Fig. 8c shows the quantile-averaged fitted values of Model 4, which was the best individual model, together with the quantile-averaged group data. Although this model was preferred by the QAIC for most individual participants and again captures the main regularities of the data, qualitatively it appears somewhat underfit, in the sense that there are some aspects of the data that it is not accounting for. It overpredicts accuracy at the lowest contrast on target-present trials; it underpredicts accuracy at the highest contrast on target-absent trials; and it fails to capture the shapes of the upper RT distribution quantiles for these stimuli.

Fig. 8 Fits of Model 1 to double-target individual participant data. Best (a) and worst (b) individual fits of Model 1. Best: Participant S1, \(G^{2}\) = 126.7; worst: Participant S6, \(G^{2}\) = 314.5. c Quantile-averaged individual fits of Model 4; \(G^{2}\) is the mean across participants. Other details as for Fig. 6

Summary of double-target fits

Although there are points of disagreement between the fits to the group data and the individual data, modeling at either level provides strong evidence of a pronounced asymmetry in drift rates on target-present and target-absent trials. Model 1 allowed all 20 mean drift rates to vary; Model 2 allowed digit drift rates on target-present trials and letter drift rates on target-absent trials to vary and set all other drift rates to zero. For the group data, the QAICs for these two models were very similar; for the individual data, on average, the QAICs preferred Model 2. We also considered models in which the letter and digit drift rates had equal but opposite signs, but we have not reported them because the fits were so poor. At the individual level, a model of this kind provided a credible account of the data for one participant only.

We consider the magnitude of the estimated drift rates in a later section but, as we noted, the asymmetry in the drift rates on target-present and target-absent trials implies that the decision template is configural rather than local; it suggests that participants were searching for particular diagnostic patterns of stimulus elements. Our biased-template formulation represents an attempt to capture these regularities theoretically.

At the group level, similar fits were obtained from Model 2 and Model 3. Model 2 forced all of the mean drift rates for letter stimuli on target-present trials and digit stimuli on target-absent trials to zero, but allowed the drift rate variance to change with contrast. Model 3 allowed all 20 mean drift rates to vary but forced the drift rate variances of letters and digits to be the same at all contrasts. The fact that these two models performed similarly at the group level, and also, on average, at the individual level, implies that either representation can capture the main features of the data about equally well. That is, for the nondiagnostic parts of the decision template, contrast dependencies in drift-rate mean and in drift-rate variance have similar effects. This suggests that those components of the decision template that are not essential to distinguishing between target-present and target-absent trials are coded psychologically as “don’t care.” The speed and accuracy of responding is affected by across-trial variance in these parts of the decision template, but the value of the mean does not seem to be critical for performance. This contrasts with formal likelihood-ratio models of the double-target task (Corbett & Smith, 2017), in which positive and negative instances contribute equally to the joint likelihood.

The tradeoff between drift rate mean and variance seen at the group level was not as apparent at the individual level. On average, the QAICs for Models 2 and 3 were almost the same, but the QAIC-preferred model at the individual level was Model 4, in which the drift-rate means for nondiagnostic stimulus elements were zero and the drift-rate variances for letters and digits were constant. The fact that Model 4 was the model preferred by the QAIC suggests that across-trial variance in drift rates for the nondiagnostic parts of the decision template is not needed to account for the data—although, as Fig. 8c shows, the resulting model appears qualitatively underfit. In view of the different results for the group and individual fits, we draw no strong inferences about the need for contrast-dependent changes in \(\eta\) in the model. However, the fits clearly indicated the need for different values of \(\eta\) for letter and digit stimuli.

The single-target task

As we noted earlier, the single-target task was more complex to model than the double-target task because of the need to account for response bias, which we conceptualized as a change in the placement of the decision boundaries on the surface of the hypersphere. We can think of the associated decision regions, in which the boundaries no longer align with the cardinal axes, as “generalized orthants.” Corbett and Smith (submitted) found a need for a similar kind of bias in modeling the single-target conditions from Corbett and Smith’s (2017) study using independent parallel 1D diffusion processes. The need for bias arises in independent-channels models for the same reason as it does in the hyperspherical model: because there are many more ways in which processing can terminate with a target-present response than with a target-absent response. As we noted in the “Does dimensionality matter?” section, the hyperspherical diffusion model and parallel independent-channels models share similar properties, because both are composed of independent evidence-accumulation components.

Group fits

We considered four models for the single-target task, paralleling those for the double-target task, which varied in the number of drift rate means and variances. To accommodate the structural asymmetry of the decision rule, we assumed that the response regions were the generalized orthants of the hypersphere, in which the decision bounds need not align with the cardinal axes. In a 2D version of the task, like the one shown in Fig. 3, the target-absent trials are associated with the stimulus configuration nn and the drift vector \(\{-\mu, -\mu\}\). The target-absent responses are responses made in quadrant \(R_{3}\), with decision bounds \(\pi\) and \(3\pi/2\). We can introduce orthant bias into this model by allowing the decision bounds to vary between \(3\pi/4\) and \(7\pi/4\) as \(\pi - c\pi/4\) and \(3\pi/2 + c\pi/4\), with \(-1 < c < 1\). In other words, positive and negative values of c symmetrically increase or decrease the size of the target-absent response region, respectively. In the 4D model, we parameterized the other two phase angles in a similar way. The region corresponding to target-absent responses is orthant 15 of Table 1. To facilitate estimation, we treated c as a cumulative logistic function of an unbounded parameter, \(\lambda\), that could vary over the entire real line. Parameterized in this way, unbiasedness is represented by \(\lambda = 0\) and an orthant bias towards noise responses is associated with \(\lambda > 0\). When the drift rates for signal and noise stimuli are equal in magnitude and opposite in sign, an orthant bias of around \(\lambda = 0.5\) leads to similar frequencies of target-present and target-absent responses.
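A sketch of this parameterization in the 2D case may be helpful. We assume the rescaled logistic c = 2/(1 + e^{-λ}) − 1, which maps the real line onto (−1, 1) and gives c = 0 (no bias) at λ = 0; the text says only that c is a cumulative logistic function of λ, so the rescaling is our assumption:

```python
import numpy as np

def orthant_bounds(lam):
    """Biased bounds of the target-absent quadrant (illustrative sketch).

    Assumes c = 2*logistic(lam) - 1, so c is in (-1, 1) and c = 0 at lam = 0.
    """
    c = 2.0 / (1.0 + np.exp(-lam)) - 1.0
    lower = np.pi - c * np.pi / 4.0                # unbiased value: pi
    upper = 3.0 * np.pi / 2.0 + c * np.pi / 4.0    # unbiased value: 3*pi/2
    return lower, upper

print(orthant_bounds(0.0))   # (pi, 3*pi/2): the unbiased quadrant R3
print(orthant_bounds(0.5))   # wider target-absent region (bias toward noise)
```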

Table 5 summarizes the four models for the task with their associated parameters and Table 4 gives the details of the mapping of parameters to conditions. Models 5 to 8 correspond to Models 1 to 4 of Table 3 but, unlike those models, the maximum number of mean drift rates in the most highly parameterized model was 15 rather than 20. These were the mean drift rates for signals on target-present trials, noise on target-present trials, and noise on target-absent trials, one for each level of contrast. As for the double-target task, we considered models with ten and with two drift rate standard deviations. With the exception of the orthant bias, \(\lambda\), the other parameters are as for the double-target task.

Table 5 Models for the single-target task

The pattern of model fits for the single-target task is somewhat clearer than the corresponding pattern for the double-target task. There is an overwhelming preference for the simplest of the four models, which had ten \(\nu\) parameters and two \(\eta\) parameters. The model assumes separate mean drift rates for signal stimuli on target-present trials and for noise stimuli on target-absent trials, but all of the mean drift rates for noise stimuli on target-present trials were zero. Allowing these mean drift rates to vary freely produced no improvement in fit. As with the double-target task, allowing the drift rate standard deviation to vary with stimulus contrast produced no improvement over models in which there were just two drift rate standard deviations, one for signal stimuli and one for noise stimuli. Figure 9 shows quantile-probability plots of the fit of Model 8 to the group data, together with the best and worst individual fits.

Fig. 9 Fits of Model 8 to single-target group data and to the best and worst individual participants. a Quantile-averaged group data. Best (b) and worst (c) individual fits. Best: Participant S4, \(G^{2}\) = 136.0; worst: Participant S1, \(G^{2}\) = 272.2. Light gray symbols are target-present responses; dark gray symbols are target-absent responses

The estimated model parameters show evidence of the same kind of asymmetry in mean drift rates that we found previously. The asymmetry can be characterized by assuming that it reflects the properties of the decision template for the task: on target-present trials, the template consists of a single digit, with the rest of the slots coded as “don’t care”; on target-absent trials, it consists of a quartet of letters.

Individual fits

The individual participant fits show the same pattern as the group fits. The average \(G^{2}\), QAIC, and Akaike weights are shown in Table 5; the individual values of these statistics are shown in Table 6, and the associated parameter estimates are given in Table 8. For all six participants, the preferred model was Model 8, with individual \(w_{i}\) values ranging from .592 to .987. Both at the group and at the individual level, the preferred model was one with ten mean drift rates and two drift-rate standard deviations. We turn to the implications of the single-target and double-target fits in the following section.

Table 6 Individual model fits

Mean drift rates

Figure 10 shows the estimated mean drift rates from the individual fits, averaged across participants for the double-target and single-target tasks. For each task we have shown the estimates for the most and the least highly parameterized of the four models. The estimates for the remaining models, which we have not shown, were similar. The plots emphasize the asymmetry in stimulus information that drives the decision process on target-present and target-absent trials. On target-present trials (two-digit trials in the double-target task and single-digit trials in the single-target task), the diagnostic information is carried primarily by digit stimuli, whereas the reverse is true on target-absent trials. There is a further asymmetry in the dependence of mean drift rates on stimulus contrast: Mean drift rates for signal stimuli change much more rapidly with stimulus contrast than do those for noise stimuli. This asymmetry implies that individual digits carry a disproportionate amount of information about the stimulus identity.

Fig. 10 Mean drift rates, ν, averaged across participants for the double- and single-target tasks. a Model 1, double-target task; b Model 4, double-target task; c Model 5, single-target task; d Model 8, single-target task. a and b: 2s = signal stimuli, target-present trials; 2n = noise stimuli, target-present trials; 1s = signal stimuli, target-absent trials; 1n = noise stimuli, target-absent trials. c and d: 1s = signal stimuli, target-present trials; 1n = noise stimuli, target-present trials; 0n = noise stimuli, target-absent trials. Error bars are ±1 SEM

The one exception to this general observation is the nonmonotonic contrast dependence for signal stimuli (1s) for Model 1 in the double-target task in Fig. 10a. The fit of Model 1 to the group data in Fig. 7 showed the same nonmonotonicities as those in Fig. 10a, as did the fits of Model 3 (20ν, 2η) to the individual and group data, so the pattern appears to be fairly robust and not an artifact of this particular fit. What these estimates appear to be showing is that, at low contrasts, single digits provide weak evidence for a target-present response, but at high contrast, the sign of the mean drift rate reverses and a single digit instead begins to provide evidence for a target-absent response. At the highest level of contrast, the magnitude of the 1s drift rate exceeds the 1n drift rate, implying that the single digit target carries more evidence that the trial is a target-absent trial than does any of the three letter distractors individually. This kind of reversal seems paradoxical if drift rates are interpreted as depending only on the local properties of individual stimulus elements, but it becomes understandable if they also depend on their configuration: At high contrast, the best evidence for a trial being a single-target trial may be a single, clearly perceived digit. Gleitman and Jonides (1976, 1978) and Jonides and Gleitman (1972, 1976) reported evidence of categorical pop-out effects of this kind in visual search (Footnote 9).

Another way to depict the mean drift rates, as shown in Fig. 11, is in a polar plot. Although only two of the four components of the drift rates can be shown in such a plot, they suffice to characterize how the drift rates behave across stimulus conditions in the two tasks. In generating the plot, we have assumed that when two digits were presented they appeared in the first two positions (quadrant \(R_{1}\)) and that when a single digit was presented it appeared in the first position (quadrant \(R_{2}\)). The drift rates are shown as vectors whose lengths represent the norms of the mean drift rates, \(\boldsymbol {\nu }\), and whose directions represent their phase angles. The fact that the drift vectors for target-present stimuli in the double-target task appear collinear is a consequence of the polar plot, which does not show the third and fourth components of the drift rates, but the collinearity of the drift vectors for target-absent stimuli in the single-target task is an accurate reflection of the way in which we parameterized the drift rates (Table 4), using a single parameter at each level of contrast.

Fig. 11

Polar plot of the first two components of the mean drift vectors for double-target and single-target tasks. For both tasks, the collinear light gray vectors represent drift rates in displays in which the first two components were the same: two digits in the double-target task and two letters in the single-target task. The variable dark gray vectors represent drift rates for displays containing one digit and one letter. The figure assumes that in mixed displays of this kind, the digit is in the first position and the drift vector lies in quadrant \(R_{2}\). (In the mapping from polar to Cartesian coordinates of Eq. 3, \(X_{1}\) is the vertical axis and \(X_{2}\) is the horizontal axis.)

If the mean drift rates were veridical representations of stimulus contrast, then the drift vectors for target-absent stimuli in the double-target task and for target-present stimuli in the single-target task would have phase angles of \(3\pi /4\). In both tasks, however, these vectors are rotated away from veridical in a direction that increases the Euclidean distance between the two classes of stimuli. For the double-target task, this difference increases with stimulus contrast, but in the single-target task it is present at all levels of contrast and its magnitude is almost constant. Models 2 and 4 for the double-target task, and Models 6 and 8 for the single-target task, provide a way of characterizing the magnitude of the observed rotation theoretically. These four models assume that the mean drift rates of the nondiagnostic stimulus elements (letters on target-present trials and digits on target-absent trials) are zero. They predict that the phase angles of target-present drift-rate vectors in the single-target task will be \(\pi /2\) and that those of target-absent drift-rate vectors in the double-target task will be \(\pi \). The drift-rate vectors for the single-target task in Fig. 11 show that this property is close to being satisfied at all levels of stimulus contrast. This finding agrees with the strong and consistent support found for Model 8 in both the group and individual fits. Relative to Model 6, Model 8 makes the extra assumption that drift-rate variance is independent of stimulus contrast, which was also supported by the model fits.
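To make the phase-angle geometry concrete, the following minimal sketch (our illustration, not the fitting code used in the article) computes the norm and phase angle of a drift vector's first two components under the Fig. 11 convention, with \(X_{1}\) on the vertical axis and \(X_{2}\) on the horizontal axis. The drift values are made up so as to satisfy the constraints of the models that zero the nondiagnostic components.

```python
import numpy as np

def phase_and_norm(nu, vert=0, horiz=1):
    """Phase angle (radians, measured counterclockwise from the horizontal
    axis) and norm of the projection of a drift vector onto two components,
    plotted with component `vert` vertical and `horiz` horizontal."""
    theta = np.arctan2(nu[vert], nu[horiz]) % (2 * np.pi)
    return theta, float(np.hypot(nu[vert], nu[horiz]))

# Made-up drift vectors, not fitted estimates:
# single-target task, target-present: digit component positive, letters zero;
# double-target task, target-absent: digit component zero, letters negative.
for label, nu in [("single-target 1s", np.array([1.2, 0.0, 0.0, 0.0])),
                  ("double-target 1s", np.array([0.0, -0.9, -0.9, -0.9]))]:
    theta, r = phase_and_norm(nu)
    print(f"{label}: phase = {theta / np.pi:.2f} pi, norm = {r:.2f}")
# Prints phases of 0.50 pi and 1.00 pi, the values the constrained models predict.
```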

In comparison to the single-target task, the mean drift rate vectors for the double-target task are more variable. The phase angles for single-digit stimuli are less than \(\pi \) at the lowest level of contrast, almost exactly equal to \(\pi \) at the three intermediate levels of contrast, and greater than \(\pi \) at the highest level of contrast. The substantive hypothesis embodied in Models 2 and 4 is that the fit is not improved by allowing these phase angles to differ from \(\pi \). As we noted previously, the pictures obtained from the group and individual fits disagree on this particular point. The group fits imply that these phase angles are contrast dependent, but the individual fits imply that the phase angles are uniformly equal to \(\pi \).

The estimated phase angles in Fig. 11 help shed light on why we had difficulty in distinguishing between alternative models for the double-target task: Except at the highest level of contrast, where the phase angle is around \(5\pi /4\), all of the other phase angles are either close to or equal to \(\pi \). Figure 11 suggests that the lack of agreement between the group and individual fits largely comes down to what happens at high contrast, where perceptual clarity was high. As we suggested above, these conditions may have produced a categorical pop-out effect, leading to the otherwise paradoxical finding that a single, clearly perceived digit can provide more evidence for the trial being a target-absent trial than do any of the three letters individually. Apart from this point of disagreement between models, our results provide consistent support for the idea that nondiagnostic stimulus elements make either a small or a negligible contribution to mean drift rates.

Had the drift vectors for target-present and target-absent stimuli in the two tasks been oriented at \(180^{\circ }\) (\(\pi \) radians) to each other, this could have been interpreted as showing that the decision process projects the stimulus information onto a lower-dimensional subspace and that a simpler 1D diffusion model might be an appropriate one for the task. However, the drift vectors in Fig. 11 suggest that any dimension reduction is at best partial rather than complete and that the stimulus information continues to be represented psychologically in a higher-dimensional space. The drift vectors for the single-target task are more regular than those for the double-target task and are consistently oriented at around \(3\pi /4\) (\(135^{\circ }\)) apart, suggesting that the dominant component of the discriminative information used to decide between the two alternatives is encoded in a single stimulus dimension. This is consistent with Corbett and Smith's (submitted) finding that a 1D decision model can provide a satisfactory model for the single-target task.
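The link between the angular separation of the drift vectors and the possibility of dimension reduction can be illustrated with a short sketch (ours, with made-up numbers rather than fitted values): anti-collinear vectors would license a complete projection onto one dimension, whereas a separation of \(3\pi /4\) means the reduction is only partial.

```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between two drift vectors. An angle of pi means the
    vectors are anti-collinear, so projecting the evidence onto the line
    through them loses no discriminative information and a 1D diffusion
    model could suffice; smaller angles imply only partial reduction."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical single-target drift vectors (first two components only):
u = np.array([1.2, 0.0])    # target-present: digit component positive
v = np.array([-0.8, -0.8])  # target-absent: letter components negative
print(angle_between(u, v) / np.pi)  # ~0.75, i.e., about 3*pi/4 (135 degrees)
```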

The greater complexity of the drift-rate vectors for the double-target task suggests that any dimension reduction performed by the decision process, if it occurs, happens only at the highest level of stimulus contrast, where accuracy is around 85–90%. At lower levels of contrast, the drift vectors show evidence of being higher dimensional. The greater complexity of the drift-rate vectors for the double-target task is consistent with the greater complexity of the associated RT distributions in Figs. 7 and 8, and may explain why we have not been able to find a satisfactory parameterization of the 1D diffusion model for these data.

Discussion

The theory presented in this article seeks to provide a new way to conceptualize decisions about the contents of multielement arrays. Our orientation is to view a stimulus array composed of multiple elements as a single multidimensional stimulus, whose properties are represented mathematically by a point in a vector space. The hyperspherical model represents the process of making a decision about such an array as a single, multidimensional evidence accumulation process with independent components, each of which is a diffusion process. The outcome of the decision process is a description of the contents of the display as a configuration of digits and letters. We assume that the reduction in dimensionality needed to express the decision overtly as a two-choice response involves a response selection process (Sternberg, 1969) that begins once the evidence accumulation process reaches the decision boundary. The response selection process carries out the mapping from the orthants of the decision space to overt responses depicted in Table 1. The fits reported in the previous section show that the hyperspherical model provides a good account of the distributions of RT and the choice probabilities from both the double-target and single-target versions of the task. The fact that the model provides a good fit to our data implies that letter-digit discrimination in four-item arrays can be performed in parallel, without attention switching or any other kind of serial processing—a finding that agrees with the conclusions of Dosher et al. (2010), who studied parallel and serial processing in an orientation discrimination task.
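For readers who prefer a concrete rendering of this account, the following is a minimal simulation sketch of the decision stage under our assumptions: four independent diffusion components accumulate until the evidence vector reaches the hypersphere, and the orthant of the hitting point is then mapped to a response. The orthant-to-response rule below is a hypothetical stand-in for the Table 1 mapping, which is not reproduced here, and all parameter names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_trial(nu, radius=1.5, sigma=1.0, eta=0.0, dt=1e-3, max_t=10.0):
    """One trial of a 4D hyperspherical diffusion process (Euler-Maruyama).
    Evidence accumulates in four independent components until the norm of
    the evidence vector reaches the hypersphere radius; the orthant of the
    hitting point describes the display as digits (+) and letters (-)."""
    mu = nu + eta * rng.standard_normal(4)   # across-trial drift variability
    x = np.zeros(4)
    t = 0.0
    while np.linalg.norm(x) < radius and t < max_t:
        x += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(4)
        t += dt
    return t, x > 0                          # decision time and orthant

def select_response(orthant, n_digits_required=2):
    """Hypothetical response-selection rule standing in for Table 1:
    respond 'present' if enough locations are identified as digits."""
    return "present" if orthant.sum() >= n_digits_required else "absent"

# A target-present display in the double-target task: two digits, two letters.
t, orthant = simulate_trial(nu=np.array([0.8, 0.8, -0.2, -0.2]))
print(f"decision time = {t:.3f}, orthant = {orthant}, "
      f"response = {select_response(orthant)}")
```

The decision time returned by the sketch excludes nondecision components such as encoding and motor time, which would be added to obtain a predicted RT.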

The most striking and unexpected finding from our study was the evidence of what we have termed “configural” processing of four-item displays. We use this term to express the idea that the way in which an item at one location is processed depends on the contents of other locations, that is, on the configuration of the display as a whole. In the hyperspherical diffusion model, the rate of evidence accumulation is given by a vector-valued drift rate, \(\boldsymbol {\mu }\), whose components represent the evidence for a digit or a letter at each display location. Our fits showed that the components of drift rate depend on the number of similar elements (letters or digits) in the array. These kinds of dependencies could not arise if the components depended only on the local features of the display (i.e., on the contents of individual locations). We considered several models for the two tasks, which differed in the way in which they parameterized mean drift rates and drift-rate variance, but all of the viable alternatives assumed that the drift-rate components for digits in the double-target task depend on whether there were one or two digits in the display (Fig. 10a and b). We also considered models in which these components were constrained to be equal, but they performed so badly that we have not presented them.

One plausible source of configurality in our double-target task is the one identified by Corbett and Smith (2017) in their study of the double-target detection deficit. They ascribed the deficit to the attentional selection processes that control access to visual working memory, interacting with working memory capacity limitations. We noted earlier that letter-digit discrimination yielded one of the smaller double-target deficits found in Corbett and Smith’s study, but the effect was nevertheless present, as can be inferred by comparing the estimated target-present drift rates (filled squares) across tasks in Fig. 10. For both of the models shown, the mean drift rates for members of a pair of digits (2s) in the double-target task (Fig. 10a or b) are less than the corresponding drift rates for single digits (1s) in the single-target task (Fig. 10c or d). This difference implies that digits as members of a target pair are less well represented in memory than are single-digit targets in isolation.

The double-target deficit is one of the most compelling pieces of evidence supporting two-stage theories of search, of which the best-known is probably Wolfe’s guided search theory (Wolfe et al., 1989, 2010). Two-stage theories assume that attentional selection is carried out by preattentive filtering processes that select stimuli by matching them against a cognitive representation of the search targets. Stimuli that pass the filter have privileged access to later processing stages; nonmatching stimuli are either excluded or are represented in attenuated form. Corbett and Smith (2017) showed that the magnitude and the time course of the double-target deficit in their study were well described by the competitive interaction model of attention and decision-making of Smith and Sewell (2013), which implements preattentive filtering using competitive-interaction dynamics (Grossberg, 1980, 1988).

Preattentive filtering provides a natural account of the nonlocality in the drift rates we estimated in our models. A basic tenet of two-stage theories is that the quality of stimulus representations in the decision process depends not only on how well the stimuli are encoded perceptually but also on how well they match the cognitive representations of the search targets. Nonlocal interactions arise in models like that of Smith and Sewell (2013) because of competition among stimuli for access to later processing stages. Normalization is a process in which the activity in a mechanism representing a stimulus is reduced in proportion to the summed activity in other, related mechanisms (Carandini & Heeger, 2012). In Smith and Sewell's model, competition among stimuli for access to visual working memory leads to normalization of memory trace strengths. The trace strengths determine the drift rates in a diffusion decision model, which we modeled here as diffusion in a hypersphere. Indeed, one of our reasons for developing the hyperspherical model was to provide a decision model that is compatible with the parallel, interactive view of attentional selection in the Smith and Sewell model.
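The flavor of this normalization account can be conveyed with a short sketch (ours, not the Smith and Sewell (2013) implementation, which uses competitive dynamics rather than a closed-form rule). A generic divisive-normalization rule makes each item's trace strength, and hence its drift-rate component, depend on the whole display:

```python
import numpy as np

def normalize(strengths, sigma=1.0):
    """Generic divisive normalization (cf. Carandini & Heeger, 2012): each
    trace strength is divided by a semisaturation constant plus the summed
    activity of the pool. Functional form and constant are illustrative."""
    strengths = np.asarray(strengths, dtype=float)
    return strengths / (sigma + strengths.sum())

# Hypothetical raw strengths: digits at 2.0, letters at 0.5.
print(normalize([2.0, 0.5, 0.5, 0.5]))  # lone digit: normalized strength ~0.44
print(normalize([2.0, 2.0, 0.5, 0.5]))  # digit pair: each digit falls to ~0.33
```

The drop in each digit's normalized strength when a second digit is added is the kind of nonlocal, configural dependence seen in the 2s versus 1s drift rates in Fig. 10, and is one way of thinking about the double-target deficit.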

Our current findings extend the picture from Corbett and Smith's (2017) study by showing the configural nature of attentional selection. We characterized the goal of the double-target task as search for a pair of digits or a triplet of letters. These representations, because they involve combinations of stimulus elements rather than single elements in isolation, are configural rather than local, in our sense of the term. The fact that performance in our tasks could be characterized by models in which the means of the drift rates for the nondiagnostic parts of the decision template—letters on target-present trials and digits on target-absent trials—are uniformly zero is consistent with the idea that drift rates are a product of attentional filtering by a configural decision template. The means of stimulus elements that do not pass the filter are set to zero: They may contribute noise to the output of the filter, but they do not contribute to the information entering the decision process in any systematic way.

We have not attempted to relate the estimates of drift rate from our diffusion-model fits to any particular model of preattentive filtering; instead, we characterized them in general terms as the outputs of a biased decision template. To us, however, the most plausible interpretation of this kind of configuration-dependent bias is that it is a product of preattentive filtering, possibly via a process of competitive interaction like the one in the Smith and Sewell (2013) model. Indeed, as discussed by Corbett and Smith (submitted), the scaling relationships among the drift rates obtained from our model fits can potentially be explained by a process of trace-strength normalization acting on the output of a biased decision template like the one in Eq. (14). Viewed in this light, the “argmax” function in Eq. (14) can be interpreted as selecting the winner of a competition on the basis of a similarity-matching rule.
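As an informal illustration of what such a similarity-matching rule might look like, the sketch below implements a schematic argmax over inner-product similarities between a stimulus vector and two diagnostic patterns. It is a stand-in for Eq. (14), which is defined earlier in the article and not reproduced here; the patterns and the similarity measure are placeholders.

```python
import numpy as np

def match_template(stimulus, templates, labels):
    """Schematic similarity-matching rule: the response category whose
    diagnostic pattern has the highest inner-product similarity to the
    stimulus vector wins. A placeholder for the argmax in Eq. (14)."""
    sims = np.array([stimulus @ t for t in templates])
    return labels[int(np.argmax(sims))]

# Hypothetical diagnostic patterns for the double-target task: a digit pair
# (positive components) versus a letter triplet (negative components).
digit_pair = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
letter_triplet = np.array([0.0, -1.0, -1.0, -1.0]) / np.sqrt(3.0)
templates, labels = [digit_pair, letter_triplet], ["present", "absent"]

print(match_template(np.array([0.9, 0.8, -0.1, -0.2]), templates, labels))
print(match_template(np.array([-0.2, -0.7, -0.8, -0.9]), templates, labels))
# Prints 'present' for the digit-pair-like stimulus, 'absent' for the other.
```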

The limits of unitization

A basic tenet of the hyperspherical model is that a display composed of multiple elements is treated in a unitary way, as a vector-valued perceptual object about which a single decision is made. A question we have not addressed concerns the limit on the number of stimulus items that can be unitized in this way. Although we have no direct evidence that bears on this question, we think the maximum is likely to be around four items, which was, by design, the size of the stimulus displays used in Corbett and Smith's (2017) study. According to many estimates, four items is the item-capacity limit of visual working memory (Cowan, 2001; Luck & Vogel, 1997; Shibuya & Bundesen, 1988). The 4D diffusion model presupposes that items enter the decision process in parallel. In tasks that use brief stimulus displays, they must all be represented in visual working memory—although, as the preceding discussion highlighted, preattentive filtering processes may affect the means of the individual item representations and may even force them to zero.

Whether or not visual working memory actually has a hard item-capacity limit remains highly controversial (Oberauer & Lin, 2017; van den Berg et al., 2014), but even if no such limit exists, the costs associated with forming and maintaining item representations may increase sharply beyond four items. These costs have been conceptualized both as resource costs (Donkin et al., 2016) and as interference costs (Endress & Szabó, 2017; Oberauer & Lin, 2017). Perhaps coincidentally, four items is the usual estimate of the subitizing limit, that is, the size of the set of items whose cardinality can be apprehended in a single glance (Trick & Pylyshyn, 1994). It is also the size of the candidate set of items that later versions of guided search theory suggest can be searched in parallel (Wolfe et al., 2010).

As well as depending on the number of items in the display, the unitization of stimuli in the decision process may be affected by their spatial arrangement and physical separation. Mental architecture studies using systems factorial technology have found that whether stimuli composed of feature pairs are processed serially or in parallel depends in part on whether the features are spatially coincident or separated, with increased evidence of serial processing when they are more widely separated (Fific et al., 2010; Little et al., 2011; Moneer et al., 2016). Complementing these findings, attempts by Moran et al. (2013) and Schwarz and Miller (2016) to model visual search for targets in large displays using the diffusion decision model have had to postulate additional components of RT, attributable to serial attention switching and premature search termination, in order to fit data.

We characterized the decision process in the letter-digit task using a 4D diffusion model because a four-dimensional evidence space seems the natural setting in which to model decisions about four-item displays. Conceivably, it might be argued that even four dimensions is insufficient, because there were actually 16 different stimuli that could appear at each location and, if each stimulus were associated with its own evidence accumulator, then there would be \(16^{4}\) rather than \(2^{4}\) unique stimulus configurations (65,536 rather than 16) that the decision process would have to identify. Apart from the analytic intractability of this kind of model, which makes it a priori unappealing, we think there are good reasons for regarding it as unlikely. Corbett and Smith's (2017) use of letter-digit discrimination in their double-target deficit study was based on Duncan's (1980) article, which drew on earlier work by Jonides and Gleitman, who studied such discriminations in visual search. They found that categorically distinct stimuli like letters and digits need not be identified in order to be discriminated (Gleitman & Jonides, 1976; Jonides & Gleitman, 1976), and can be discriminated on the basis of their category membership alone (Jonides & Gleitman, 1972; Gleitman & Jonides, 1978). We suggested that the sign reversal of the mean drift rates for high-contrast, single-digit stimuli in the double-target task may be due to a pop-out effect of this kind.

Consistent with this interpretation, Corbett and Smith (submitted) found that letter-digit discrimination produced the same kinds of quantile-probability functions as three other decision tasks that used only single pairs of stimulus alternatives. There was no evidence of any difference among the RT distributions from these tasks that might have indicated a change in the dimensionality of the decision process in letter-digit discrimination. Taken together, these results suggest that it is appropriate to view letter-digit discrimination as a unidimensional rather than a multidimensional decision task and imply that a 4D vector space is the appropriate representation of a four-element version of it.

Is the hyperspherical model the best model?

It is not our intention at this point to claim that the 4D diffusion model is the best among potential candidate models for the task. As we noted in the Introduction, Corbett and Smith (submitted) successfully fitted a version of the 1D diffusion model to the data from the single-target task, but we have not yet been able to find a model for the double-target task that fits the data with a consistent and interpretable set of parameters. We were motivated to develop the 4D model as an alternative to existing models by the conjecture that decisions about multiattribute stimuli may involve multidimensional evidence accumulation processes and by the fact that the 4D diffusion model provides a theoretically elegant representation of processes of this kind. We have not yet attempted a systematic empirical comparison of the 4D model and plausible alternative models for all of the data reported in Corbett and Smith's (2017) study. While a large-scale evaluation of this kind would be useful, it is beyond the scope of this article and will form the subject of future work. Our immediate purpose here was to present the theory of the 4D model and to show that it provides a plausible account of performance in the double-target task, which is one of its areas of potential application.

The scope of the 4D model is, of course, much greater than we have been able to illustrate here. As we noted in the Introduction, the 2D model of Smith (2016) and its 4D generalization provide a stochastic version of Ashby and Townsend's (1986) general recognition theory, which seeks to explain how people make decisions about stimuli composed of multiple features. Cognitive tasks using these kinds of stimuli are widely studied (e.g., Soto & Ashby, 2015; Soto et al., 2015), and an analytically tractable model that can make both RT distribution and accuracy predictions is likely to have broad theoretical appeal. Another area of potential application is to studies of visual attention. As we noted earlier, in Wolfe's guided search theory of attention, preattentive filtering processes deliver a subset of candidate items to a decision process, which searches the subset to determine whether or not it contains a target. In early versions of guided search (Wolfe et al., 1989), the search of the candidate set was assumed to be serial, but in later versions (Wolfe et al., 2010), the search is assumed to be parallel and to be performed by multiple independent diffusion processes. Thornton and Gilden (2007) assumed a similar kind of parallel search model in their study of redundancy gains in visual search. The 4D diffusion model instantiates a parallel search mechanism of precisely this kind. It can therefore provide a theoretically principled and analytically tractable characterization of RT and accuracy from visual search tasks at a distributional level.

Conclusions

In this article, we have extended the circular diffusion model of Smith (2016) to higher dimensions. We have shown that the analytic properties of the RT distributions and accuracy of the circular model carry over straightforwardly to higher-dimensional spaces. We have shown that the resulting hyperspherical model, when equipped with categorical decision bounds, provides a model of multiattribute decision-making with a number of attractive theoretical and empirical properties. We argued that decisions about the contents of four-item displays can be viewed as unitary, multidimensional decisions about vector-valued stimulus objects and have shown that the RT distributions and accuracy from such a task are well described by the hyperspherical model. The success of the model in capturing these data is notable because they exhibit several unusual features not normally found in tasks with single stimuli, features that, to date, we have been unable to capture in a theoretically satisfying way using extensions of existing models of two-choice decision-making. Our model fits have highlighted the role of nonlocal, configural stimulus properties in determining the drift-rate vectors that characterize the rates of evidence accumulation. We showed that the resulting configural drift rates could be represented as the outcomes of a matching operation between stimuli and a decision template, conceived of as a set of diagnostic patterns that distinguish between the response categories. As well as providing a new model for decisions about multielement arrays, the hyperspherical model is likely to have a variety of applications in areas such as cognitive categorization and visual attention.