Numerosity judgment is one of the core cognitive abilities of human beings and primates (Hauser, Tsao, Garcia, & Spelke, 2003; Sawamura, Shima, & Tanji, 2002). Human subjects are far more sensitive to numerosity than to either density or area (Cicchini, Anobile, & Burr, 2016). This priority of sensitivities to numerosity stems from the earlier development of capacities for discriminating between large and small numbers, even in newborn babies (Izard, Streri, & Spelke, 2014). While the investigation of numerosity judgment in individual sensory modalities has become a hot topic, recent studies have started to tap into numerosity perception in the tactile modality and even across different sensory modalities (Gallace, Tan, & Spence, 2006, 2007, 2014; Spence, Nicholls, & Driver, 2001; Spence, Shore, & Klein, 2001). This line of investigation is important for both theoretical and practical reasons. On the one hand, for human operators, numerosity judgment is a general perceptual process for unimodal and bimodal displays. Revealing the role of (cross-modal) attention will potentially answer the current debate of whether numerosity judgment might rely on a unitary amodal system or, alternatively, hinge on a specific/individual sensory modality. On the other hand, modern intelligent interface design aims to reduce the attentional load for numerosity discrimination. The design of a friendly interface with a reduced workload of number perception will streamline performance (Ferris, Penfold, Hameed, & Sarter, 2006; Gallace et al., 2014; Ho, Reed, & Spence, 2006; Ho, Tang, & Spence, 2005; McCrickard & Chewar, 2003; Rabinowitz, Houtsma, Durlach, & Delhorne, 1987). However, the debate on the potential differential attentional mechanism for subitizing (discriminating numbers less than or equal to 4) and estimation (discriminating numbers larger than 4), that is, the dependency on estimated quantity, has not been fully resolved (Feigenson, Dehaene, & Spelke, 2004; Piazza, 2010).
Moreover, numerosity estimation ability is intrinsically related to the capacity of working memory (Spitzer, Fleck, & Blankenburg, 2014). It remains to be seen whether the precision of estimation is subject to task/modality-dependent working memory performance.

The process of nonverbal numerosity judgment mobilizes two systems (Piazza, 2010). Generally, when the number is small (i.e., less than or equal to 4), human observers can judge the numbers quickly and accurately (Carey, 2009) through the parallel individual system (in which each object is represented individually, i.e., subitizing). When the number is large (i.e., greater than 4), observers would rely on the approximate number system (i.e., estimation) and show more error when the estimated quantity becomes larger (Dehaene, 2011). Though the average cutoff point between the two systems is 4, recent studies have shown that this point is not fixed and may deviate around 4 (Hyde, 2011). Subitizing is often considered to be preattentive, but unlike estimation, it is very sensitive to attentional load, as it is necessary to form representations for each object individually (Anobile, Turi, Cicchini, & Burr, 2012; Burr, Anobile, & Turi, 2011; Burr, Turi, & Anobile, 2010; Hyde & Spelke, 2011; Piazza, Fumarola, Chinello, & Melcher, 2011). Estimation, on the other hand, is an overall representation of the estimated quantity and is less dependent on attentional resources (Burr et al., 2010). However, some studies have assumed that subitizing and estimation are not entirely independent processes. Anobile et al. (2012) found that regardless of the number of stimuli, the estimation mechanism will always play a role during numerosity perception. When the number is within the subitizing range, additional attentional resources can be used to facilitate fast and accurate judgment (Anobile et al., 2012). Burr et al. (2010) measured subjects’ accuracy and precision in making rapid numerosity judgments for target numbers spanning the subitizing and estimation ranges while manipulating attentional load. They found that in the high-load condition, Weber fractions were similar in the subitizing (2–4) and estimation (5–7) ranges, suggesting that preattentive mechanisms work at all ranges.
However, in the low-load and single-task conditions, the attentional mechanisms operating over the subitizing and estimation ranges were not identical (Burr et al., 2010). Recent brain-imaging studies also supported a potential dual subsystem of numerosity perception and the dependence of subitizing on attentional engagement. There is a partial overlap of the neural substrates subserving small and large numbers’ numerosity judgment. Brain regions (e.g., inferior intraparietal sulcus [inferior IPS]; right temporoparietal junction [RTPJ]) related to subitizing are found to be associated with attentional effects (Ansari, Lyons, van Eimeren, & Xu, 2007; Vetter, Butterworth, & Bahrami, 2011), whereas brain areas that are activated independently of the perception of numbers are not associated with attentional effects (Corbetta & Shulman, 2002).

Attentional processing is ubiquitous across different sensory modalities. The debate between the modality-specific and amodal attentional processing hypotheses has persisted for more than two decades (Anobile et al., 2012; Arrighi, Lunardi, & Burr, 2011; Driver & Spence, 2004). Both hypotheses have found support in empirical studies. For example, in Larsen’s experiments, subjects’ performance in identifying two concurrent stimuli (visually presented or spoken) was not impaired when they needed to complete both visual and auditory tasks simultaneously, as compared with their performance on the visual or auditory task alone (Larsen, McIlhagga, Baert, & Bundesen, 2003). In another instance, when observers reported visual contrast and auditory pitch, their performance in discriminating visual contrast (just noticeable difference [JND]) was not affected by a simultaneous pitch judgment task. However, when the interfering task and the target task were given in the same modality, their visual thresholds increased by a factor of two, and the auditory thresholds increased by a factor of four (Alais, Morrone, & Burr, 2006). The above studies favored a modality-specific model, with each sensory modality having its own independent but capacity-limited resources. Amodal attentional processing, however, assumes that the processing of events in one modality would deplete the attentional resources needed for processing events from another sensory modality. For instance, an electrophysiological study showed that when participants were asked to recognize complex, fuzzy stimuli, such as faces and voices, interference of audition with vision could be recorded. This cross-modal interference occurs at many levels, involving the activation of the fusiform gyrus, associative auditory areas (Brodmann’s area 22), and superior frontal gyrus (Joassin, Maurage, Bruyer, Crommelinck, & Campanella, 2004).
In a functional magnetic resonance imaging (fMRI) study, Hein, Alink, Kleinschmidt, and Muller (2007) asked participants to make simple auditory and visual decisions using a four-key button box. They found that even when observers did not need to make any competing responses, a simple auditory decision could still affect their visual processing at the neural level, with characteristic blood-oxygen-level-dependent (BOLD) signal changes in areas such as the prefrontal cortex, middle temporal cortex, and other visual cortical areas (Hein et al., 2007). This suggests that we have limited attentional capacity for dissimilar tasks, and events as dissimilar as visual and auditory decisions should recruit a similar “global neuronal workspace,” which causes audio-visual interference.

Attentional processing is tightly linked with working memory capacity (Dalton, Lavie, & Spence, 2009). The prominent perceptual load theory suggests that the degree of interference from processing irrelevant distractors hinges on individual working memory capacity, given that the attentional resources are fixed (Lavie, 2005). This theory applies not only to vision but also to the tactile modality. Working memory capacity influences the degree to which irrelevant distractors are processed. When a cognitive resource is large enough, working memory capacity will lead to a stronger influence of irrelevant distractors. Furthermore, in a multisensory context, recent research suggests that cortical areas that mediate working memory storage also process sensory signals (Katus, Grubert, & Eimer, 2017). Multisensory information is stored in distributed, perceptual brain areas and can, at the same time, be processed independently, suggesting that the maintenance of working memory, which consumes attentional resources, is a modality-specific process (Katus & Eimer, 2016).

In a cross-modal context, the numerosity perception for events in a target modality involves either the facilitation or interference effect from the concurrent attentional processing of task-irrelevant events from another modality. If the attentional resources are modality specific (i.e., modality-specific model), the performance of the similar numerosity discrimination in task-irrelevant modality (such as visual modality) would impose less, if any, interference of numerosity discrimination for the events in target modality (such as a tactile modality). If the attentional resources are distributed amodally, a centralized attentional processing would constrain the (re)distribution of attentional resources invested in each individual modality. Therefore, cross-modal interference will take place—performing a perceptual task in a single modality (such as auditory modality) will hamper the performance in another modality. Moreover, if the attentional processing is associated with a centralized working memory capacity, then the individual working memory capacity will influence/predict the performance of attentional processing, such as in numerosity discrimination. If attention-based maintenance of working memory representation is modality specific, the processing of numerosity discrimination in one modality (such as in tactile modality) would be largely independent of the working memory capacity associated with another modality (such as visual modality).

Therefore, to address the above hypotheses, we implemented tactile numerosity discrimination in unimodal or cross-modal (together with the visual modality) scenarios, with focused attention or divided attention. We examined the potential mechanisms of tactile subitizing versus tactile numerosity estimation and investigated the role of cross-modal attention in tactile numerosity perception. Moreover, we asked whether individual (visual/tactile) working memory capacity could influence the attentional processing in the target modality (tactile modality). Given that the demarcation point between subitizing and estimation could be flexible around 4, we implemented two main experiments by presenting different number ranges of tactile dots, and implemented two additional control tests to examine the capacities of visual and tactile working memory. The results showed that tactile subitizing (not estimation) is subject to cross-modal attentional processing, and the precision of tactile subitizing, with concurrent interference from visual subitizing, is correlated with the capacity of visual working memory rather than tactile working memory.

Experiment 1

Method

Participants

Sixteen naïve, right-handed participants (age range: 19–27 years, mean 21.2 years, nine females) took part in this experiment and received 80 CNY for their time, which took approximately 120 min to complete. All participants reported normal tactile perception and normal or corrected-to-normal vision. Informed consent was obtained from each participant. The study was approved by the Academic Affairs Committee of the School of Psychological and Cognitive Sciences at Peking University.

Apparatus and materials

We used a tactile stimulator (Tactile Stimulator Dual 24, Neuro Device Group Inc., Warsaw, Poland) in the experiments. The stimulator had two rectangular panels, each containing 24 pins (1 mm × 1 mm for each pin), with four pins in rows and six pins in columns. Driven by a pneumatic pumping system, each pin could be controlled to protrude or retract independently.

During the test, the participants placed their index fingers over the pins. A maximum of 12 pin taps would be given in a single trial. Those pin stimuli were located randomly across the upper or lower area of the rectangular panel, with eight potential configuration patterns (see Appendix, Fig. 11). Visual stimuli were five white dots presented on a black background. The size of each small dot was equal to the diameter of a single tactile pin. In 50% of trials, three dots were shown in the upper area and two in the lower area (i.e., UD task, which asked participants to discriminate whether there were more dots in the upper or lower area); the arrangement was reversed in the other 50% of trials. These stimuli were created in a pseudorandom way. For each given number there were eight different patterns of pins. The UD tactile stimuli were eight different patterns of five pins, and they were also pseudorandomly created (see Fig. 1, VT, for an example).
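The pseudorandom pattern generation described above can be illustrated with a minimal sketch. It is not the authors' MATLAB code. The function names, the 6 × 4 grid orientation, and the split of the panel into an upper and a lower half of three rows each are all assumptions made for illustration.

```python
import random

# Assumed layout: 24 pins arranged as 6 rows x 4 columns (hypothetical orientation).
ROWS, COLS = 6, 4
HALF = ROWS // 2  # rows 0-2 are treated as the "upper" area, rows 3-5 as "lower"


def make_nj_pattern(n_pins, rng):
    """Pick n_pins distinct pin positions anywhere on the panel (numerosity task)."""
    all_pins = [(r, c) for r in range(ROWS) for c in range(COLS)]
    return sorted(rng.sample(all_pins, n_pins))


def make_ud_pattern(more_in_upper, rng):
    """Pick five pins: three in one half of the panel and two in the other (UD task)."""
    upper = [(r, c) for r in range(HALF) for c in range(COLS)]
    lower = [(r, c) for r in range(HALF, ROWS) for c in range(COLS)]
    n_upper = 3 if more_in_upper else 2
    return sorted(rng.sample(upper, n_upper) + rng.sample(lower, 5 - n_upper))


rng = random.Random(0)  # fixed seed so the pseudorandom patterns are reproducible
nj_patterns = [make_nj_pattern(6, rng) for _ in range(8)]  # eight patterns for number 6
ud_pattern = make_ud_pattern(True, rng)
```

Seeding the generator is one simple way to obtain a fixed set of eight patterns per number that can be reused across participants.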

Fig. 1

Experiment conditions. Unisensory-focused attention (T): Participants were asked to make numerosity judgment (NJ) for the tactile stimuli presented to their right index finger. Unisensory-divided attention (TT): Participants had to estimate the number of tap dots on their right index finger and discriminate whether there were more tactile dots in the upper area than in the lower area on their left index finger. Cross-modal divided attention (VT): Participants reported the numbers of tactile pins and compared the numbers of visual dots across the upper and lower part of the (illusory) rectangle, which overlaid the tactile stimuli

Design and procedure

We adopted a within-participants design. Each participant came to our lab twice and completed a tactile-only (T) task, a tactile-tactile (TT) task, and a tactile-visual (VT) task (see Fig. 1).

Tactile stimuli and visual stimuli for numerosity judgment (NJ) were separately presented to participants’ right and left index fingers, and participants were required to report the numbers of both tactile pins and visual dots. In the T condition, a cueing beep was given to signal the start of the tactile pin taps. The taps were maintained for 1,500 ms and ended with the pins retracting, together with a signaling beep. Participants used a keyboard to report the estimated numbers of stimuli. The intertrial interval (ITI) was 2,000 ms. In the TT condition, the NJ and UD tactile stimuli (duration of 1,500 ms) were simultaneously presented to both of the participants’ index fingers. Participants not only reported the number of tap pins on their right index fingers but also made a two-alternative forced-choice (2-AFC) judgment of whether there were more or fewer pins in the upper area than in the lower area on their left index fingers. In the VT condition, analogous to the TT condition, in addition to reporting the estimated number of tap pins, participants made a 2-AFC judgment of whether the upper area had more visual dots than the lower area. However, the duration of the visual dots was set at 150 ms on the basis of a pilot test with subjective ratings, to maximize the possibility that the difficulty of estimating visual numerosity was comparable with that of the counterpart tactile task. The screen of the laptop was laid horizontally across the tactile panel so that the visual and tactile stimuli were maximally overlaid. A thin white line, marking the boundary between the upper and lower areas of the (imaginary) visual rectangle, was shown prior to the presentation of the visual dots. When the presentation of dots was over, a 200-ms backward mask was presented immediately (see Fig. 2).

Fig. 2

Experimental procedure. (T): Participants reported the number of the tactile pins. (TT): Tactile pin stimuli were given simultaneously to both index fingers. In addition to reporting the number on their right index fingers, participants made a two-alternative forced-choice (2-AFC) judgment of whether the number of pins in the upper area was larger than the number in the lower area on their left index fingers. (VT): While estimating the number of tactile pins on the right index finger, participants made a 2-AFC judgment of whether the upper area contained more visual elements than the lower area

Participants sat before the tactile stimulator and a laptop computer (Dell Precision M4800) that controlled the generation of stimuli, using a program written in MATLAB (MathWorks, Inc.) with the Psychophysics Toolbox (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007; Pelli, 1997). They wore sound-proof earplugs to prevent them from hearing the faint noise generated by the tactile device. Before the formal tasks, participants received practice, and they understood the tasks well within 20 trials.

Results

Because five participants’ Weber fractions were large (around 1.5), we used the other 11 participants’ data in the following analysis. Errors (mean of the absolute difference between the presented number of dots and the perceived numerosity) and Weber fractions for each participant in all the experimental conditions were calculated and then averaged (see Table 1 and Fig. 3, respectively). The average perceived numerosity against the presented number is shown in Fig. 4.
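The two per-participant measures can be sketched as follows. The error follows the definition given in the text (mean absolute difference between presented and reported numbers). The text does not define the Weber fraction, so the sketch assumes a common convention in the numerosity literature, the standard deviation of the responses divided by their mean; the function names and the example responses are hypothetical.

```python
from statistics import mean, stdev


def mean_abs_error(presented, responses):
    """Mean absolute difference between the presented number and each response."""
    return mean(abs(r - presented) for r in responses)


def weber_fraction(responses):
    """Assumed definition: sample SD of the responses divided by their mean."""
    return stdev(responses) / mean(responses)


# Hypothetical responses of one participant to six presentations of four pins.
responses_to_4 = [4, 4, 5, 3, 4, 4]
err = mean_abs_error(4, responses_to_4)
wf = weber_fraction(responses_to_4)
```

Under this convention the measures would be computed separately for each participant, condition, and presented number, and then averaged across participants.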

Table 1 Mean errors (with associated standard errors) for each task condition and each number in Experiment 1
Fig. 3

Average Weber fraction against presented number for Experiment 1. Error bars represent standard errors. (Color figure online)

Fig. 4

Average perceived numerosity against presented number for Experiment 1. Error bars represent standard errors. (Color figure online)

The patterns of Weber fractions differed across the three conditions. For the T and TT conditions, the Weber fraction was relatively stable at around 0.25 but fluctuated slightly between numbers 2 and 3. For the VT condition, Weber fractions were larger when the number of dots was less than 3 and kept decreasing as the number of dots increased (see Fig. 3). The main effect of number was significant, F(7, 70) = 3.120, p = .006, η2 = 0.238. The main effect of task condition was not significant, F(2, 20) = 3.032, p = .071, η2 = 0.233. The interaction between number and task condition was not significant, F(14, 140) = 1.595, p = .088, η2 = 0.138. Bonferroni-corrected comparisons showed that the Weber fraction for number 4 was larger than the one associated with number 6 (mean difference = 0.064, SEM = 0.014, p = .030). The Weber fraction for number 4 was also larger than the one associated with number 8 (mean difference = 0.104, SEM = 0.024, p = .045).
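As a reading aid, the effect sizes in this paragraph can be recovered from the F ratios and degrees of freedom using the standard identity for partial eta squared, η2p = (F × df_effect) / (F × df_effect + df_error). That these values are partial eta squared is an inference from the arithmetic, not something the paragraph states; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom taken from the Weber fraction analysis above.
eta_number = round(partial_eta_squared(3.120, 7, 70), 3)
eta_condition = round(partial_eta_squared(3.032, 2, 20), 3)
eta_interaction = round(partial_eta_squared(1.595, 14, 140), 3)
```

For the three tests in this paragraph, the recovered values agree with the reported 0.238, 0.233, and 0.138.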

The numerosity judgment was quite accurate in general. We implemented a repeated-measures ANOVA, with the factors numerosity (1–8) and task condition (T vs. TT vs. VT). This analysis revealed a significant main effect of numerosity, F(7, 70) = 19.630, p < .01, η2 = 0.663. There was no significant main effect of task condition, F(2, 20) = 2.066, p = .153, η2 = 0.171, or significant interaction, F(14, 140) = 0.948, p > .05, η2 = 0.044 (see Fig. 4).

For the data in the T condition, we implemented a repeated-measures ANOVA, with the factor numerosity (1–8). This analysis revealed a significant main effect of numerosity, F(7, 70) = 14.292, p < .001, η2 = 0.588. For the data in the TT condition, the main effect of numerosity was significant, F(7, 70) = 6.060, p < .001, η2 = 0.377. Likewise, the main effect of numerosity in the VT condition was significant, F(7, 70) = 7.661, p < .001, η2 = 0.434. A Bonferroni-corrected comparison showed that there was no significant difference between any two adjacent numbers, except for a significant difference between 2 and 3 in the T condition (mean difference = −0.475, SEM = 0.095, p = .016). However, for nonadjacent numbers, the Bonferroni-corrected comparison revealed differences between multiple pairs (e.g., there were significant differences between number 2 and numbers 5–7 in the T condition) (Fig. 5).

The accuracy of the interfering task was 0.68 (SD = 0.08) for the TT condition and 0.71 (SD = 0.08) for the VT condition. A paired-samples t test revealed that there was no significant difference between the two interfering tasks’ accuracies, t(10) = −1.62, p > .05.

Taken together, the data pattern in both the Weber fraction and distribution of numerosity indicated that the cutoff point between subitizing and estimation could be around number 5. However, the demarcation of number sensing could be flexible, depending on the number range we adopted. Therefore, in Experiment 2, we extended the numbers of taps (up to 12 dots).

Fig. 5

Errors for Experiment 1. Error bars represent standard errors. *p < .05. **p < .01

Experiment 2

To gain a better understanding of humans’ tactile numerosity judgment ability and explore whether individual (visual) working memory capacity could influence the attentional processing in the tactile modality, we implemented Experiment 2. In this experiment, we not only extended the numbers up to 12 but also carried out two additional tests to measure the working memory capacities of corresponding visual and tactile tasks.

Method

Participants

Thirty right-handed participants (19 females, 11 males) took part in this experiment, which took approximately 130 min, and they received 90 CNY for their time. The participants’ ages ranged from 18 to 25 years (mean age = 21.6 years), and all participants reported normal tactile perception and normal or corrected-to-normal vision. Informed consent was obtained from each participant. The study was approved by the Academic Affairs Committee of the School of Psychological and Cognitive Sciences at Peking University.

Design

The results of Experiment 1 suggested that the cutoff point between the subitizing range and the estimation range may be equal to or greater than 5. Experiment 2 was similar to Experiment 1, except for three differences. First, the critical range of the stimuli’s number changed from 1–8 to 1–12. Second, participants gave oral reports rather than using keyboard inputs, to minimize the interference of the same affordance (hands) during the task. Third, we measured participants’ visual and tactile working memory capacities (WMC).

During the visual WMC task, participants first saw an equation (single-digit addition or subtraction) and judged whether it was correct as quickly and accurately as possible within 3 seconds. Participants were then prompted to remember a capital letter appearing at the center of the screen. This process repeated four to eight times. Finally, participants were asked to report the whole letter sequence with keyboard inputs, without receiving any feedback. There were three practice-trial sequences and 15 formal-test sequences. The WMC task score was computed as the number of letters correctly reported.
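The scoring rule can be sketched as follows. Two points are assumptions rather than details given in the text: a letter counts as correct only when it is reported in its original serial position, and the 15 formal sequences consist of three sequences at each length from four to eight. The second assumption is at least consistent with the full score of 90 reported in the Results, since 3 × (4 + 5 + 6 + 7 + 8) = 90. The function names and the example sequences are hypothetical.

```python
def score_sequence(presented, reported):
    """Count letters reported in the correct serial position (assumed scoring rule)."""
    return sum(1 for p, r in zip(presented, reported) if p == r)


def max_score(lengths=range(4, 9), repeats=3):
    """Maximum total score, assuming `repeats` sequences at each sequence length."""
    return repeats * sum(lengths)


# Hypothetical trial: two of the four letters are recalled in the right position.
example = score_sequence("FKPQ", "FKQP")
full_score = max_score()
```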

Seventeen participants who took part in Experiment 2 also took part in both the visual and tactile WMC tasks. The procedure and timing protocol of the tactile WMC task were almost the same as those of the visual task, except that the stimuli were changed from visual letters to tactile figures, and participants gave their answers using numbers corresponding to the tactile figures (see Fig. 6). All participants had passed a screening test to become familiar with the tactile figures by receiving each figure four times, or practicing more if necessary.

Fig. 6

Eight kinds of tactile figures used in the tactile WMC task

Results

One participant’s judgment error (under the TT condition, for number 2) was outside three standard deviations (error = 1.630, M = 0.125, SD = 0.363), and two participants’ Weber fractions were larger than 2. Thus, these participants’ data were excluded from the analysis. The descriptive statistics of the tactile numerosity judgment task are given in Table 2. The average perceived numerosity against the presented number is shown in Fig. 8.

Table 2 Mean errors (with associated standard errors) for each task condition and each number in Experiment 2

For the Weber fraction, the main effect of number was significant, F(11, 286) = 23.649, p < .001, η2 = 0.476. The main effect of task condition was significant, F(2, 52) = 4.439, p = .017, η2 = 0.146. The Weber fraction associated with VT (0.143 ± 0.008) was larger than the one associated with T (0.119 ± 0.006), p = .044. The interaction between number and task condition was borderline significant, F(22, 572) = 1.546, p = .054, η2 = 0.056. Simple effect analysis showed that for T, the fraction was larger for number 4 than for number 3 (mean difference = 0.109, SEM = 0.028, p = .037), and the fraction was larger for number 4 than for number 5 (mean difference = 0.076, SEM = 0.015, p = .002). For VT, the fraction was larger for number 3 than for number 2 (mean difference = 0.134, SEM = 0.022, p < .001; see Fig. 7).

Fig. 7

Average Weber fraction against presented number for Experiment 2. Error bars represent standard errors. (Color figure online)

For the numerosity judgment, we adopted a repeated-measures ANOVA, with the factors numerosity (1–12) and task condition (T vs. TT vs. VT). This analysis revealed a significant interaction, F(22, 572) = 2.222, p = .001, η2 = 0.079, and a significant main effect of numerosity, F(11, 286) = 85.863, p < .001, partial η2 = 0.768. The main effect of task condition was not significant, F(2, 52) = 0.935, p = .399, η2 = 0.035. Simple effect analysis showed that the interaction arose at numbers 3 and 4. For number 3, the error associated with T was smaller than the one associated with VT (mean difference = −0.153, SEM = 0.045, p = .007). For number 4, the error associated with T was larger than the one in TT (mean difference = 0.421, SEM = 0.053, p < .001) and the one in VT (mean difference = 0.372, SEM = 0.060, p < .001; see Figs. 8 and 9).

Fig. 8

Average perceived numerosity against presented number for Experiment 2. Error bars represent standard errors. (Color figure online)

Fig. 9

Errors for Experiment 2. Error bars represent standard errors. *p < .05. **p < .01. ***p < .001

We then implemented a repeated-measures ANOVA, with the factors number size (small number: 1–4 vs. large number: 5–12) and task condition (T, TT, and VT). The interaction effect was significant, F(2, 52) = 4.625, p = .014, partial η2 = 0.151. The main effect of number size was significant, F(1, 26) = 78.68, p < .001, partial η2 = 0.752. However, the main effect of task condition was not significant, F(2, 52) = 0.399, p = .673, η2 = 0.015. The interaction effect arose in the small-number condition, where the error was higher for T (0.231 ± 0.019) than for TT (0.122 ± 0.021) and VT (0.174 ± 0.019), p < .001 and p < .05, respectively, and the error was higher for VT than for TT, p < .05, suggesting that the cross-modal dual task imposes attentional modulation on the subitizing task (see Fig. 10).

Fig. 10

Errors for Experiment 2 with sorted small and large ranges. Small range of numbers is 1–4; large range is 5–12. Error bars represent standard errors. *p < .05. **p < .01. ***p < .001

The accuracy of reporting the number in the interfering tasks was 0.54 (SD = 0.03) for the TT condition and 0.58 (SD = 0.07) for the VT condition. A paired-samples t test showed that accuracy differed between the two interfering tasks, t(26) = −2.62, p = .014.

Because of the small sample size, we used Spearman’s rho correlation. The full score of the two WMC tests was 90. For the visual WMC test, participants’ scores ranged from 64 to 88 (M = 78.47, SEM = 7.01), and for the tactile WMC test, participants’ scores ranged from 73 to 89 (M = 81.76, SEM = 4.45). The paired t test showed that the two tests were comparable in difficulty, t(16) = 1.557, p = .139.

We examined the correlations between the WMC tests’ scores (the number of letters/figures correctly reported) and the errors under different conditions (T, TT, and VT) and found no significant correlations. We further calculated the difference of errors between two different tasks conditions (i.e., the difference between T and TT) and explored the correlation between WMC and this kind of difference. The correlations between the WMC tests’ scores (the number of letters/figures correctly reported) and the difference of errors between two different task conditions (VT, TT) are shown in Table 3.

Table 3 Correlations between WMC tests’ scores and the difference of errors between different tasks

The above results revealed that there was a significant correlation between the visual WMC test scores and the difference of errors between the VT and TT conditions, but only when the number of stimuli was small (r = −0.662, p < .01), that is, in the subitizing task. There was no significant correlation between the tactile WMC scores and performance on the tactile numerosity judgment task.
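A rank correlation of this kind can be sketched in a few lines. The sketch computes Spearman’s rho as the Pearson correlation of average ranks, which is the standard definition and handles ties. The data below are hypothetical placeholders, not the participants’ scores, and the function names are illustrative.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: visual WMC scores and VT-minus-TT error differences.
wmc_scores = [64, 70, 75, 80, 84, 88]
error_diff = [0.20, 0.15, 0.16, 0.08, 0.05, 0.02]
rho = spearman_rho(wmc_scores, error_diff)
```

A negative rho in data of this shape would mean that higher WMC scores go with smaller differences in error between the two dual-task conditions.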

Discussion

The current findings suggest a dual system for tactile numerosity judgment, in which attentional modulation plays a differential role in sensing numerosity in the small and large ranges. Precision of sensing numerosity was impaired for dual-task operations in the subitizing range, but not in the estimation range. This result was in agreement with previous studies on number sensing for visual events, in which subitizing depends strongly on attentional resources while estimation of larger quantities depends less on attentional load (Anobile et al., 2012; Burr et al., 2011; Burr et al., 2010; Hyde & Wood, 2011; Piazza et al., 2011). We have now extended the dual attentional tasks to other modalities, by examining the effect that visual numerosity perception has on tactile subitizing/estimation. A cross-modal (divided) attentional interference effect was observed only in the subitizing task, when the number of stimuli was small, but not in the numerosity estimation task. Therefore, the dependency of subitizing on attention is general rather than specific to a given type of task or a particular modality (such as the visual or tactile modality).

The relation between attentional modulation and working memory capacities in the visual and tactile modalities showed an amodal influence on numerosity perception. With working memory tests, we observed a negative correlation between the relative numerosity judgment performance (VT vs. TT) and the score of visual WMC, rather than tactile WMC: the higher the visual working memory capacity, the smaller the difference in errors between the VT subitizing task and the TT subitizing task. This finding suggested that individuals with higher capacities of visual working memory allow distractor interference.

The current findings could be reconciled in the perceptual load framework model with the component of working memory load. The perceptual load hypothesis claims that, for the low perceptual load condition (mediated by working memory capacity), attentional resources spill over to process distractor information and result in competition for the control of perceptual responses between target and distractor information in working memory (Lavie, 1995, 2005; Lavie & de Fockert, 2003; Lavie & Tsal, 1994). The interaction between working memory (capacity) and perceptual load depends on the modality of information (Kim, Kim, & Chun, 2005; Konstantinou, Beal, King, & Lavie, 2014; Konstantinou & Lavie, 2013; Koshino, 2017; Koshino & Olid, 2015). For tactile subitizing, it requires fine attentional modulation (i.e., high perceptual load) and depends largely on otherwise larger working memory capacity; hence, we observed generally larger errors for subitizing in tactile numerosity discrimination (T task). However, with the TT dual tasks, the perceptual load is high in the same modality, and attentional resources are fully consumed by target processing. As a result, the concurrent distractors from the same modality (tactile events) are not/less likely processed (with low accuracy of reporting number for distractor T) and have less chance of entering into working memory for processing target tactile events (Koshino & Olid, 2015; Lavie, 2005), and we observed a relatively better performance for subitizing of target events in TT. On the other hand, when the perceptual load is high but across different modalities (with VT dual tasks), larger visual working memory capacity allows the proper processing of visual numerosity (with relatively better performance of reporting numerosity of distractor V), which intrudes on the attentional resources for processing target tactile events (with relatively worse performance of reporting numerosity of target T). 
Nevertheless, we should be cautious about the above arguments, which rest on two assumptions: that the total reservoir of attentional resources for numerosity processing is fixed, and that the perceptual load for processing target events is more subject to influence from a different sensory modality (de Fockert, Rees, Frith, & Lavie, 2001; Sandhu & Dyson, 2013, 2016; Tellinghuisen & Nowak, 2003). Moreover, the difficulty of numerosity perception may differ between the two interfering tasks (TT vs. VT), which could constrain our current account. More evidence is needed in further studies.

Alternatively, the above results can be understood according to the neuronal model of Dehaene and Changeux (1993), which depicts hierarchical processing (with five presumed levels) of numerical information. First, processing starts with receiving information from topographically organized input clusters (i.e., retinotopic for vision and somatotopic for touch), and then memory clusters help to maintain the input information from each modality. At this stage, limitations might differ for each sensory modality. Those limitations would lead to modality-specific working memory representations and cause differential precision in sensing tactile numerosity in the presence of distractors from the same modality or from a different modality (the visual modality). The modality-specific input clusters, however, then project to a common amodal location map, from which the transformed, modality-nonspecific (amodal) information is relayed to summation clusters. It is at this stage that attentional resources (modulation) play a differential role in subitizing and estimation of numbers. Finally, the preferred numerosity is represented and output in the numerosity clusters, with appropriate neuronal responses from the observer.

That said, we should be careful about the limitations of the above neuronal model, in which the exact stage of multisensory interaction, as well as the interface between working memory and attention, is not clear. Moreover, other limitations remain in the current study. One limitation is that we did not manipulate working memory load directly during the concurrent numerosity task. Additionally, a comparable design in which the main task is visual numerosity perception in the presence of tactile distractors would help to test the hypothesized interaction between working memory and perceptual load, within the scope of perceptual load theory as discussed above. This calls for further empirical studies.

Previous relevant studies did not maintain the same spatial locations for stimuli from different sensory modalities (Gallace et al., 2006, 2007; Gallace, Tan, & Spence, 2008; Gallace et al., 2014). Such differing spatial representations of cross-modal events would constrain the outcome of cross-modal perception and multisensory interaction (Pouget, Deneve, & Duhamel, 2002). In our study, we presented visual dots on a rectangular screen that was parallel to the surface of the table and above the tactile display, so that the locations of the visual dots were mostly consistent with the pins on the tactile interface. We believe this arrangement largely reduced spatial confounds in the perception of multisensory numerosity.

In sum, we found differential attentional mechanisms in tactile numerosity perception. Tactile subitizing, rather than tactile numerosity estimation, is subject to amodal attentional modulation. Moreover, visual working memory modulates tactile numerosity perception in an amodal manner and affects the precision of tactile subitizing in the presence of distractor interference from a different modality (the visual modality).