Everyday multitasking among three or more different tasks pervades domestic and working life, yet rarely is it the focus of empirical study or theory development. This form of multitasking refers to the management of several different ongoing tasks to be completed within a limited time. Tasks often must be interleaved efficiently to maximise performance (Burgess, 2000), and there may be an optimum order for their completion—for example, cooking a meal (e.g., Craik & Bialystok, 2006) or a time-limited shopping trip (e.g., Shallice & Burgess, 1991). Many occupations require multitasking—for example, emergency medicine and medical decision making (Chisholm, Dornfeld, Nelson, & Cordell, 2001; Law et al., 2005; van der Meulen et al., 2010), management (Seshadri & Shapira, 2001), or navigation (e.g., Garden, Cornoldi, & Logie, 2002; Law, Logie, & Pearson, 2006; Spiers & Maguire, 2006). We report here an investigation of how healthy young adults achieve everyday multitasking in a simulated environment. We start with a statistical model of everyday multitasking derived from studies of individuals with frontal-lobe damage (Burgess, Veitch, de Lacy Costello, & Shallice, 2000b). We assess the generality of this model, and develop it for healthy young adults using a new paradigm designed to address this ubiquitous requirement of everyday cognition.

Most current theories of cognition address specific components of the cognitive system, such as visual or auditory perception, focused or sustained attention, prospective memory, verbal or visuospatial working memory, episodic and semantic memory, task switching, and cognitive bottlenecks. These topics are experimentally tractable and draw on well-developed paradigms, allowing major advances in the understanding of each. Everyday multitasking, by contrast, requires the coordinated and strategic deployment of several different cognitive functions. It therefore presents a major challenge for paradigms and theories that focus on specific cognitive phenomena, which helps explain why it is so rarely studied in research on human cognition (Neisser, 1978). Thus, the research we describe is not concerned with understanding a particular cognitive function or with cognitive bottlenecks between conflicting tasks (e.g., Borst, Taatgen, & van Rijn, 2010b). Rather, it addresses how the cognitive system avoids such bottlenecks when performing multiple tasks (e.g., Craik & Bialystok, 2006). This requires new paradigms and the development of theories concerned with how various parts of the cognitive system operate in concert rather than in isolation.

Studies of multitasking typically focus on deficits following brain injury (e.g., Alderman, Burgess, Knight, & Henman, 2003; Miotto & Morris, 1998; Shallice & Burgess, 1991), on neuroanatomical correlates in healthy participants (Borst, Taatgen, Stocco, & van Rijn, 2010a; Burgess, Dumontheil, & Gilbert, 2007; Spiers & Maguire, 2006), or on highly trained experts such as in the military or aviation (e.g., Loukopoulos, Dismukes, & Barshi, 2009; Wickens, 2008). Everyday multitasking, as performed by untrained adults, is different from the microstructure of rapid switching between laboratory tasks (e.g., Koch, Gade, Schuch, & Philipp, 2010; Monsell, 2003) or from concurrent dual-task demands (e.g., Borst, Taatgen, Stocco, & van Rijn, 2010a; Borst, Taatgen, & van Rijn, 2010b; Logie, Cocchini, Della Sala, & Baddeley, 2004). Instead, it involves several subtasks that have different requirements, and participants decide themselves how to schedule subtask attempts. Other studies have assessed driving skills (e.g., Levy & Pashler, 2008; Strayer, Drews, & Crouch, 2006) or planning and implementation of subgoals, such as in the Tower of London task or simulated work (e.g., Hambrick, Oswald, Darowski, Rench, & Brou, 2010; Phillips, Gilhooly, Logie, Della Sala, & Wynn, 2003; Ward & Allport, 1997). However, these tasks do not address the broader demands of everyday multitasking, and despite its ubiquitous everyday requirement, there remains limited theoretical or empirical insight into how multitasking is achieved by healthy adults.

One theoretical approach to multitasking (Burgess et al., 2000b) was derived from studies of individuals with frontal-lobe damage, who often show impairments in everyday multitasking but intact performance on tests of attention, memory, and executive functions. This suggests that multitasking is not solely dependent on the latter functions. Burgess et al. (2000b) used the “Greenwich test,” in which participants switch between three manual subtasks (sorting beads, sorting tangled lines on paper, and constructing plastic Meccano). Their statistical model (Fig. 1) identified important, and largely independent, roles for retrospective memory and intentionality (prospective memory). Planning did not reliably contribute to the model. However, planning deficits only appeared for patients with lesions in the right dorsolateral prefrontal cortex. Damage to anterior regions—Brodmann areas 8, 9, and 10—did not affect planning but did affect task switching and task rule adherence. For these reasons, Burgess et al. (2000b) included planning in their structural equation model, but they had only one measure of planning, so the loading for the Plan construct was set at 1.0. Their model suggested that the Memory construct drives separate constructs for Plan and Intent.

Fig. 1

An illustration of the structural equation model of multitasking from Burgess et al. (2000b). Reproduced with minor changes to format from “The Cognitive and Neuroanatomical Correlates of Multitasking,” by P. W. Burgess, E. Veitch, A. de Lacy Costello, and T. Shallice, 2000b, Neuropsychologia, 38, p. 856. Copyright 2000 by Elsevier. Adapted with permission

Burgess, Simons, Coates, and Channon (2005a) suggested that planning is multifaceted and supported by a range of cognitive abilities. Phillips et al. (2003) noted that many healthy participants plan online during complex tasks rather than following a plan formed in advance. Therefore, it would also be important to identify the individual cognitive functions that contribute to advance and online planning. Burgess et al. (2000b) argued that forming and implementing a plan draws upon cognitive systems responsible for remembering instructions and rules for the multitasking paradigm. Therefore, planning and intentionality were represented downstream from retrospective memory in their model.

The roles for advance and online planning may be greater for subtasks with an optimum order, such as preparing a meal. Craik and Bialystok (2006) addressed this topic in a cognitive aging study using simulated breakfast making. Participants repeatedly set a simulated table by moving cutlery and plates on the computer screen, switching to alternate screens for starting and stopping preparation of foods with different cooking times. However, the reliance on prospective memory for starting and stopping foods, and the similarity of the subtasks, made this task less well suited for multitasking assessment, which we assume to involve a range of different cognitive functions, only one of which is prospective memory.

The multiple-errands test (MET; Alderman et al., 2003; Shallice & Burgess, 1991) has an optimum task order. Individuals with frontal-lobe damage and healthy controls planned and attempted several tasks in a real shopping mall, following as efficient a route as possible. Although the MET is close to real life and requires little or no initial practice, there are obvious drawbacks to experiments conducted in real-life settings (e.g., Bailey, Henry, Rendell, Phillips, & Kliegel, 2010): They are time consuming, and require transport for participants and consent from local businesses. The lack of experimental control can compromise participant safety and data reliability, and the tasks cannot easily be adapted for other clinical or research settings.

Simulated real-life tasks have been used for assessing brain-damaged patients, including planning on a map (e.g., Burgess et al., 2000a), moving furniture in virtual buildings (Morris, Kotitsa, & Bramham, 2005), or selecting stores in a video of a high street (e.g., Knight, Titov, & Crawford, 2006). Knight and Titov (2009) and others (e.g., Bailey et al., 2010; Burgess et al., 2000a) have noted the need for ecologically valid and scientifically robust tests of multitasking. McGeorge et al. (2001) created a virtual version of the MET, the Virtual Errands Test (VET), that retained many of the advantages of the real environment while achieving experimental control. The errands were tasks such as collecting a book or meeting a colleague. The VET was as sensitive as the real environment to executive dysfunction in brain-damaged patients (see also Rand, Basha-Abu Rukan, Weiss, & Katz, 2009). Thus, virtual environments may offer sufficient face and content validity for the assessment of multitasking in a tractable and controlled setting. However, these studies have focused on impairments associated with brain damage (Knight & Titov, 2009). Law et al. (2006) modified the VET to be challenging for healthy adults. However, the graphics were unrealistic, and performance assessment involved video-recording test sessions with subsequent manual scoring. The Edinburgh Virtual Errands Test (EVET), used in the present study, was developed from a widely available and inexpensive commercial games platform that permits nonprofit development of virtual environments. It is well suited to creating an environment for multitasking with realistic graphics and a smooth interface, as well as automatic recording of multiple performance measures.

In sum, little is known about how healthy young adults behave in everyday multitasking situations and what cognitive variables affect their performance. We addressed the cognition that underlies this important activity, using sets of multiple errands performed by healthy young adults in a realistic simulation of everyday multitasking. EVET involves a higher memory load than did Burgess et al.’s (2000b) study, because participants have to memorise an errand list. EVET requires a substantial degree of preplanning, because inefficient ordering of the errands may result in time expiring before all tasks are completed. It requires navigation around a virtual environment, so participants may draw on the resources of visuospatial working memory (Logie, 1995). Therefore, we expected that the Burgess et al. (2000b) model might require modification to account for multitasking performance in healthy participants.

One additional important aspect of cognition is working memory, and König, Bühner, and Mürling (2005; see also Hambrick, Oswald, Darowski, Rench, & Brou, 2010) found that working memory was a more important predictor of multitasking performance than fluid intelligence or attention. Baddeley and Logie (1999; Logie, in press) argued that working memory comprises multiple, domain-specific resources that are deployed selectively by participants according to task demands. This contrasts with the view that working memory is primarily a control system for focusing limited-capacity domain-general attention on currently activated contents of episodic or semantic memory (e.g., Cowan, 2005). Therefore, we used independent measures of verbal and spatial working memory and of verbal free recall to explore whether a single- or a multiple-factor model could best account for our multitasking data.

Participants completed an additional test of planning within EVET and a battery of cognitive assessments to investigate relationships between individual differences in multitasking and separate measures of retrospective and prospective memory, online planning, and spatial and verbal working memory. From the Burgess et al. (2000b) model and from our own previous work (e.g., Garden et al., 2002; Law et al., 2006; Logie, Baddeley, Mane, Donchin, & Sheptak, 1989), we expected that retrospective memory and prospective memory, but also online planning (rather than preplanning), spatial working memory, and verbal working memory would make independent contributions to shared variance with EVET performance.

In summary, our goals were (1) to add to understanding of everyday multitasking in young healthy adults; (2) to test the generality in healthy young adults of a statistical model originally developed to account for the multitasking impairments associated with frontal-lobe damage; (3) to test the hypothesis that multitasking among diverse subtasks draws on a range of different cognitive functions, not only a single, general-purpose attentional capacity; and (4) to develop a new methodology to address the above concerns, given that most current paradigms in experimental cognitive psychology would be ill suited to these goals.

Method

Participants

A total of 165 students at the University of Edinburgh (102 women, 63 men), mean age 19.59 years (SD = 2.43, range = 16–32), participated.

Tests and procedure

Participants first completed the EVET procedure and then five individual-difference measures, administered in the order described below. Except for the word recall task, all tasks were presented on a 42-cm colour monitor driven by a Dell XPS PC with an Intel Core Quad 2.33-GHz processor and a 1-GB ATI graphics card.

Edinburgh Virtual Errands Test

The EVET environment was created using the Hammer environment editor, supplied as a software development kit with the computer game Half-Life 2. It comprised a 3-D model of a four-storey building with five rooms along the left and right ends of each floor around a central stairwell, with two sets of stairs (one left, one right) and a central elevator. Figure 2 shows a screen shot of the concourse on the ground floor (floor zero).

Fig. 2

Screen shot of the ground floor concourse area of the Edinburgh Virtual Errands Test (EVET)

Participants explored the virtual environment using the standard keyboard—the keys “w,” “s,” “a,” and “d” for forward, backward, and left and right lateral movement, respectively, and the mouse to look in any direction. The “e” key was used for actions such as picking up objects or opening doors. Participant movement within the virtual building was automatically recorded at 10 Hz, represented as a series of x, y, z coordinates, with actions recorded and time stamped. Participants were to complete eight errands within 8 min (Table 1). Three of the errands were two-stage, requiring object collection and drop off. The remaining five required one action. Two of the tasks had time constraints (e.g., turn off cinema at 5:30). Sorting folders was the only open-ended EVET task. It could be performed at any time for as long as the participant chose during the 8-min test period. Participants were informed that folder sorting was no more important than other tasks, but that they should try to sort as many as possible. Half of the participants were given Errand List A and started on the ground floor between the stairwells. The other half were given Errand List B and started in the equivalent position on the top floor. Errands were given in a nonoptimal order for completion, and participants were asked to plan the optimal order before commencing EVET.
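To make the movement logging concrete, a minimal sketch of how such 10-Hz records might be represented and summarised is given below. The field names and record layout are illustrative assumptions, since the text does not specify the EVET log format; a record of this kind would be sufficient to derive the travel-time and movement-density measures reported in the Results.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical layout for one 10-Hz movement sample; the field names are our own
# illustration, as the actual EVET log format is not described in the text.
@dataclass
class LogSample:
    t: float          # time stamp (s) from the start of the 8-min test
    x: float          # position coordinates within the virtual building
    y: float
    z: float
    action: str = ""  # time-stamped action label, e.g. "pick_up" (hypothetical); empty for plain movement

def path_length(samples: List[LogSample]) -> float:
    """Total distance travelled, summed over successive 10-Hz samples."""
    return sum(
        ((b.x - a.x) ** 2 + (b.y - a.y) ** 2 + (b.z - a.z) ** 2) ** 0.5
        for a, b in zip(samples, samples[1:])
    )
```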

Table 1 EVET errand lists (A and B)

First, each participant read the EVET instructions detailing the task, building layout, and rules. The experimenter described the layout of the building prior to the practice session. The building rules were to use the left stairs for travelling down and the right stairs for travelling up, to avoid entering non-task-related rooms, and to avoid picking up non-task-related objects. Participants practiced using the controls to move around the building to complete five practice errands: object collection and delivery, button pressing, unlocking the stairwell door with a key code, and folder sorting. This also allowed participants to familiarise themselves with the building. None of the practice errands were used in the main testing session. The practice session took approximately 5 min.

Next, participants studied their allocated errand list for 2 min, followed by free recall, then 5 min of further study and a test of cued recall. The total number of errands correct from free and cued recall comprised the errand list memory score, equivalent to a measure classified as "Learn" by Burgess et al. (2000b). Finally, participants were provided with a schematic building map and a copy of the errand list, and were asked to indicate the order in which they planned to perform each errand to achieve maximum efficiency in task completion. They were informed that they could change their plan during the actual test. Once the plan was complete, the task list and the written plan were removed, and participants were asked again to verbally recall the errand list and the building rules, with any mistakes corrected. This process was repeated until the list was recalled with 100% accuracy. This last procedure was implemented to minimise the chances that participants would fail to complete errands in EVET because they could not recall all of the errands.

Participants next performed the EVET test for 8 min (neither the task list nor the plan was present during the test). On completion, they were asked to recall which errands they had attempted or had failed to complete, and any building rules that they had broken (see Table 2). This assessed whether participants could recall all of the actions that they had performed in EVET and was equivalent to the measure labelled "Recount" by Burgess et al. (2000b). It was followed by free recall of the complete errand list, regardless of whether or not all of the errands had been completed. Participants were then cued about any errands that they had omitted in this posttest free recall. A point was awarded for each errand correctly recalled across the two methods, to generate a measure equivalent to the "Remember" variable in Burgess et al. (2000b). All participants were then presented with the alternative errand list (Set B if they had used Set A, and vice versa), with a fresh sheet showing the layout of each floor, and were asked to generate another plan for the order of errands in this alternative list. This was used as a second measure of planning within the context of EVET, but without performing EVET a second time. "EVET travel time" indicated the total amount of time each participant had spent travelling in the EVET building, excluding time spent inside rooms. This measure was intended to index efficiency of navigation through the building. Finally, the EVET score indexed errand completion efficiency: points were awarded for errand completion and were deducted for breaking the building rules (Table 2).

Table 2 Number of bonus points added to and penalty points deducted from EVET scores, based on percentages of participants performing within the ranges shown in the table

Word recall task

This task required participants to recall orally, in any order, twelve words read out by the experimenter, one word per second, following the standardised procedure in Capitani, Della Sala, Logie, and Spinnler (1992). This was repeated for five different lists (60 maximum). It was used as an independent measure of retrospective memory.

Working memory verbal span

This measure was based on working memory sentence span, as developed by Baddeley, Logie, Nimmo-Smith, and Brereton (1985) and Duff and Logie (2001). Participants verified each of a set of sentences as they appeared consecutively on a computer monitor, and were asked to memorize the last word of each sentence. At the end of the set of sentences, they were asked to orally recall, in order, the sentence-final words. The task began with sequences of two sentences, with each set size repeated three times, after which the set size increased by one. All participants continued, regardless of performance, until the maximum set size (seven sentences) was completed. All sentences were presented for 3 s, preceded by a fixation cross for 1 s. Total correct recall of the sentence-final words was calculated as a proportion of the maximum possible (81).

Working memory spatial span

This was based on a task from Shah and Miyake (1996). The participants verified whether block capital letters that appeared consecutively were shown in their normal configuration or as a mirror image. Each letter was shown in a different orientation within a circular area. Participants were instructed to memorize the orientation of each letter for recall at the end of each set. The task began with sequences of two letters, with each set size repeated three times, after which the set size increased by one. All participants continued, regardless of performance, until the maximum set size (five letters) was completed. All letters were presented for 3 s and preceded by a fixation cross for 1 s. Total correct recall was calculated as a proportion of the maximum recall score (70).

Travelling salesman task

This task presented nine coloured target shapes along the bottom of the screen, the first labelled "Start/End" and the rest "Target Locations." The nine target shapes for each trial were also placed at random locations within a 5 × 5 array in the main section of the screen, and differently coloured shapes were placed as distractors in the remaining 16 locations. Participants were asked to click on the target locations in the order that formed the shortest possible path connecting the start shape with each of the targets in the array, finally returning to the start location. As each target in the array was clicked, it was marked to show that it had been visited and disappeared from the target list at the bottom of the screen, indicating which targets remained to be visited. When all nine target locations had been visited and the participant had returned to the start point, the next array was presented, using a different selection of nine target shapes from the full set of 25 and with the targets shown in new random locations. There were two practice trials, each with nine different targets: the first showed only the nine targets in random locations, with no distractors; the second showed nine targets along with 16 distractors within the full 5 × 5 array. The optimum distance for each array was calculated using an algorithm for travelling salesman problems (Kirk, 2007). Each participant completed a total of 10 arrays, and there was only one possible optimum solution for each combination of array and target set. For each completed array, the proportion by which the participant's path exceeded the optimum distance was calculated, and the mean proportion across the 10 arrays comprised the score.
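The study used Kirk's (2007) travelling salesman routine to obtain the optimum path; purely as an illustration of how the deviation score is defined, the sketch below brute-forces the optimum, which is feasible for arrays of this size. The coordinate representation and function names are our own assumptions.

```python
from itertools import permutations
from math import dist  # Euclidean distance between two points (Python 3.8+)

def tour_length(start, order):
    """Length of a closed tour: start -> targets in 'order' -> back to start."""
    points = [start, *order, start]
    return sum(dist(a, b) for a, b in zip(points, points[1:]))

def tsp_deviation(start, targets, clicked_order):
    """Proportion by which the participant's path exceeds the optimum path.

    Brute force is feasible here because each array contains only a handful of
    free targets beyond the Start/End shape (e.g., 8 targets -> 8! orders).
    """
    optimum = min(tour_length(start, perm) for perm in permutations(targets))
    actual = tour_length(start, clicked_order)
    return (actual - optimum) / optimum

# The task score would then be the mean deviation across the 10 completed arrays:
# score = sum(tsp_deviation(s, t, c) for s, t, c in arrays) / len(arrays)
```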

Breakfast task

This was used as a measure of prospective memory ability. It was devised by Craik and Bialystok (2006), who kindly provided a copy of the program. In this task, participants had to switch between setting plates and cutlery on a virtual table on the computer screen and starting and stopping the virtual cooking of eggs, coffee, pancakes, sausages, and toast, each with a different cooking time ranging from 120 to 330 s. The goal was for all foods to complete cooking at the same time. Each of the five foods was shown on its own screen, separate from the table being set. To start cooking, the participant clicked on an icon for the food with the longest cooking time (330 s). This took them to a screen for that food, where they clicked to start a timer displaying the progression of cooking time for that food. They then had to return to setting the virtual table until it was time to start the food with the next longest cooking time (240 s). This continued until the time at which all of the foods should be ready, when the participant had to visit each screen to stop the cooking of each food. Participants practiced with two foods. Craik and Bialystok reported a range of different measures of cooking performance, which, in our data, were highly correlated. Our chosen outcome measure was the average deviation between the actual start time for each food and the time at which that food should have started. Given that the task primarily involved prospective memory for starting each of the foods at the correct time while engaged in another task (table setting), it was taken as an independent measure of prospective memory ability. We chose this task, rather than established laboratory measures of prospective memory, because it was a simulation of everyday prospective memory, and our overall goal was to investigate simulations of everyday complex cognition in multitasking.
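As an illustration of this outcome measure only, the sketch below computes ideal start times under the assumption that all foods should finish together, anchored to the moment the longest-cooking food is started; the exact scoring convention used by Craik and Bialystok may differ, and the variable names are hypothetical.

```python
def ideal_start_times(cook_times, first_start=0.0):
    """Ideal start time for each food so that all foods finish together.

    Assumes the food with the longest cooking time is started at `first_start`,
    so the common finish time is first_start + max cooking time.
    """
    finish = first_start + max(cook_times.values())
    return {food: finish - t for food, t in cook_times.items()}

def mean_start_deviation(cook_times, actual_starts, first_start=0.0):
    """Average absolute deviation (s) between actual and ideal start times.

    Absolute deviation is an assumption about how 'deviation' is aggregated.
    """
    ideal = ideal_start_times(cook_times, first_start)
    return sum(abs(actual_starts[f] - ideal[f]) for f in cook_times) / len(cook_times)
```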

Results

Twelve of our participants were unable to complete either the breakfast task or the travelling salesman task because of technical problems. Their data were excluded, leaving a final total of 153 participants (95 women, 58 men) for subsequent analysis.

EVET is a novel task and a novel paradigm, so there were no clear precedents for generating scores to be used as dependent measures. We describe below the rationale for the scores and the procedure used to generate them. Our aim was to generate indicator variables that were as close as possible to those considered by Burgess et al. (2000b), with additional variables intended to assess the contributions to multitasking of cognitive functions that had not been considered in the earlier study.

EVET score

The overall EVET score was derived from a weighted scoring procedure devised to emphasise task efficiency. Following Burgess et al. (2000b), the general principle was to award points for tasks completed and to deduct points for rule breaks. Errands that could be completed at any time were awarded one point for each successful action (maximum two points for two-part errands), yielding a maximum action score of 8. Finding the door code and unlocking the stair door counted as one action for this purpose. The time-restricted (cinema and meeting) and open-ended (folder sort) tasks were weighted on a five-point scale (0, 1, 2, 3, and 4), yielding a possible maximum bonus of 12 points, which were added to the action score, for a potential overall maximum of 20 points. There were no obvious a priori criteria for allocating bonus points. Rather than generate arbitrary criteria, we based the allocation on inspection of the frequency distribution of raw scores on each of these measures (Table 2). The rationale was to generate a distribution of scores that fairly reflected the distribution of actual performance across participants on each task. We acknowledge that the precise cut-off criteria for allocating scores to adjacent bonus categories remain somewhat arbitrary, but given that the same criteria were applied across all participants, we have no grounds for suspecting that this procedure resulted in systematic bias in the overall EVET score or distorted its use as a measure of individual differences.

A similar scoring procedure was used for breaks of the EVET rules, except that penalty points were deducted for going up the "down" stairs (and vice versa), for entering rooms that were not part of the errand list, and for picking up objects that were not part of the errand list. We allocated penalty points (0, −1, −2, −3, or −4) on the basis of inspection of the frequency distribution of each error type (Table 2), to fairly reflect the error performance across participants. For example, over 80% of participants broke the stair rule once, so this attracted zero penalty. Across the three error types, the maximum possible penalty score was −12, which was combined with the action-plus-bonus score for successful performance of errands in EVET. A negative score was possible if a participant failed to complete most of the errands and incurred a large penalty score, although this never occurred within our sample.
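Putting the components together, the overall score can be summarised as in the sketch below. The band assignments themselves come from the Table 2 cut-offs, which are not reproduced here; the function and parameter names are our own illustration.

```python
def evet_score(actions_completed, bonus_bands, penalty_bands):
    """Combine the EVET score components described above.

    actions_completed: 0-8 points for successful errand actions.
    bonus_bands: 0-4 band (from Table 2) for each of the cinema, meeting,
        and folder-sorting tasks (maximum bonus 12).
    penalty_bands: 0-4 band (from Table 2) for each of the three rule-break
        types, deducted from the total (maximum penalty 12).
    Overall range: -12 to +20.
    """
    assert 0 <= actions_completed <= 8
    bonus = sum(bonus_bands)      # max 3 x 4 = 12
    penalty = sum(penalty_bands)  # max 3 x 4 = 12
    return actions_completed + bonus - penalty
```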

EVET learn score (“Learn”)

This was the sum of the free and cued recall scores obtained prior to starting the EVET test, with points based on the number of elements in each errand. For example, "Pick up computer in G4 and take to T7" contained three elements (the item, the collection point, and the delivery point), so it was worth 3 points. For free recall, 23 points were available, and for cued recall, 14 points (1 point was removed from each errand because of the cue), making a possible maximum of 37 points.

EVET plan efficiency (“Plan”)

The score for plan efficiency was different from that used for the Greenwich test (Burgess et al., 2000b). There are more possible permutations of order for the EVET subtasks, and, unlike in the Greenwich test, inefficient ordering might result in failure to complete all errands within the time limit. Assessment of the efficiency of individual EVET plans involved comparison with an optimum plan for each set of errands. There were no clear a priori criteria for identifying the optimum plan. Therefore, we examined the rank order in which errands were actually completed by each of the 5 highest-scoring participants (scoring 19 or 20) for each errand list. The rank order correlations between these 5 individuals ranged from .791 to .955. We therefore used the average rank order as the optimum plan for each list, against which to compare the efficiency of the order in which errands were completed by individual participants. For each participant's errand order, 1 point was awarded for each errand that was in the same serial position as in the optimum plan. Where serial positions mismatched, a point was awarded if a pair of errands was in the same sequential order as in the optimum plan, even if, as a pair, they were completed earlier or later in the sequence. This ensured that credit was given for partial use of the planned errand sequence. The maximum possible overlap score was 11, and each individual overlap score was divided by 11 to derive a plan efficiency score. We defined "EVET pretest plan" as the efficiency score for the plan participants had made before they attempted the EVET, and "EVET posttest plan" as the efficiency score of the plan that participants made at the end of the EVET procedure for the alternative errand set.
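One possible reading of this overlap rule is sketched below; the treatment of the pairwise credit is our interpretation of the verbal description, not the authors' scoring code. The plan-follow score described next uses an analogous rule, with the participant's pretest plan as the reference order and a denominator based on the number of errands actually completed.

```python
def overlap_score(order, optimum):
    """Plan efficiency under one reading of the rule described above:
    a point for each errand in the same serial position as in the optimum
    order and, where positions mismatch, a point for an adjacent pair of
    errands that also appears as an adjacent pair, in the same order, in the
    optimum plan. Errands are hashable identifiers (e.g., strings)."""
    score = 0
    optimum_pairs = set(zip(optimum, optimum[1:]))
    for i, errand in enumerate(order):
        if i < len(optimum) and optimum[i] == errand:
            score += 1                                   # exact position match
        elif i + 1 < len(order) and (errand, order[i + 1]) in optimum_pairs:
            score += 1                                   # preserved local order
    return score / len(optimum)                          # e.g., divide by 11 for EVET
```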

EVET follow score (“Follow”)

This was designed to measure the correspondence between the pretest plan and the order in which participants actually completed the errands. It was based on allocating 1 point for each errand that was completed in the same sequential position as planned, and a point if a pair of errands was completed in the same sequential order, but not in the same overall sequential position, as on the pretest plan, following a scoring procedure analogous to that for the plan efficiency score. Each individual score was divided by the total possible points that could have been gained from the number of tasks actually completed by that participant, to index how closely participants followed their initial plans.

Recall EVET actions (“Recount”)

After completion of EVET, a point was awarded for each errand or type of rule break that participants recalled actually carrying out; this was equivalent to the Burgess et al. (2000b) Recount variable. Only the number of rule-break types was recorded: for example, if a participant had gone down the "up" stairs 5 times, they were asked to recall whether this had happened on at least one occasion. The maximum possible score was 26.

EVET remember score (“Remember”)

Here, the scoring procedure for the Learn measure was repeated but was based on free and cued recall of the errand list after completing the EVET.

Additional measures

The six measures above were designed to be comparable to those considered by Burgess et al. (2000b): Score, Learn, Plan, Plan follow, Recount, and Remember. We then included additional measures that we hypothesised would address the broader theoretical constructs that might be incorporated within or added to the Burgess model (EVET travel time, verbal and spatial working memory span, travelling salesman task, EVET posttest plan, word recall, and breakfast task). The two planning measures (EVET pretest plan and EVET posttest plan) were intended to allow free estimation of the error variance of the planning construct indicators in our structural equation model. This had not been possible for Burgess et al. (2000b), who included only one indicator of planning.

Scores were converted to percentages for analysis. Descriptive statistics and intercorrelations are shown in Table 3.

Table 3 Descriptive statistics and correlation matrix of EVET performance and predictive measures

Analyses

First, we examined which of the five independent measures contributed unique variance to the prediction of EVET score, using multiple regression with backwards stepwise elimination (Table 4). As expected from the Burgess et al. (2000b) model, retrospective memory (word recall) was a significant predictor of EVET score. However, unlike in the earlier model, our measure of prospective memory (the breakfast task) did not reliably share variance with EVET, nor did it correlate with any other variables. Like the breakfast task, EVET involves unique items for each of the errands, and so it is not straightforward to calculate split-half reliability across equivalent test items. However, Table 3 shows significant correlations between several different measures of EVET—Travel Time, Learn, Recount, Remember, and Plan Follow—indicating a reasonable level of internal consistency. EVET score also correlated with established independent measures of word recall and of verbal and spatial working memory.
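The backwards elimination procedure can be sketched as follows (illustrative Python using statsmodels; the column names are hypothetical, and the exact software and removal criterion used for Table 4 are not specified in the text).

```python
import statsmodels.api as sm

def backwards_stepwise(df, outcome, predictors, alpha=0.05):
    """Backwards stepwise elimination by p-value: start with all predictors
    and repeatedly drop the least significant one until all remaining
    predictors meet the criterion. A generic sketch, not the exact routine
    used in the present study."""
    remaining = list(predictors)
    while remaining:
        X = sm.add_constant(df[remaining])
        model = sm.OLS(df[outcome], X).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:
            return model                 # all remaining predictors significant
        remaining.remove(worst)          # drop the weakest predictor and refit
    return None

# Hypothetical usage with illustrative column names:
# backwards_stepwise(data, "evet_score",
#                    ["word_recall", "verbal_wm", "spatial_wm",
#                     "travelling_salesman", "breakfast"])
```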

Table 4 Results of multiple regression with backwards stepwise elimination to assess the contributions to common variance between scores on the Edinburgh Virtual Errands Test and scores on five different measures of mental ability, as described in the text

Our independent measure of planning ability (the travelling salesman task) was a significant predictor, and so, unlike Burgess et al. (2000b), we included planning in the model on statistical grounds rather than inferring its contribution indirectly from associations with brain lesion sites. Unlike our other independent measures, the travelling salesman task was novel, but its repeated arrays allowed calculation of internal reliability, which revealed a Cronbach’s α of .737. Spatial working memory span was also a reliable predictor, indicating that it made a contribution that was independent of word recall and of planning.

Verbal working memory did not make an independent contribution in this initial analysis. Although Table 3 shows that this measure correlated with verbal free recall, when verbal working memory was forced into the regression equation before verbal free recall, the verbal working memory measure did not share significant variance with EVET (Table 4 notes).

To further examine the role of planning in EVET performance, we explored the partial correlations between pretest plan efficiency (scored relative to the optimum plan for the relevant errand list), the plan-follow score, and the EVET score. Controlling for plan following, there was a significant correlation between pretest plan efficiency and EVET performance (partial r = .16, p = .04), but there was a stronger correlation between EVET performance and the efficiency with which participants followed their plan (partial r = .32, p < .01), controlling for pretest plan efficiency. Including the interaction between pretest plan efficiency and plan following added no significant contribution to the prediction of EVET score, so these appear to be separate main effects.
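For completeness, a partial correlation of this kind corresponds to correlating the residuals of the two variables after each has been regressed on the controlled variable, as in the sketch below (the variable names in the usage comment are hypothetical).

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, control):
    """Partial correlation between x and y, controlling for a third variable:
    residualise x and y on `control` by least squares, then correlate the
    residuals. Significance testing would additionally adjust the degrees of
    freedom for the controlled variable."""
    x, y, control = map(np.asarray, (x, y, control))
    design = np.column_stack([np.ones_like(control), control])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)[0]   # partial r

# e.g., partial_corr(pretest_plan_efficiency, evet_score, plan_follow)
```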

A graphical demonstration of the importance of plan following can be seen in Figure 3, which illustrates the length of time participants spent at each specific set of x, y, z coordinates in the 3-D space. Based on the plan-following measure, participants were split into upper (N = 38) and lower (N = 37) quartiles, and their movements across all four floors during the 8-min EVET test were characterised using kernel density estimation. In the figure, a peak indicates that participants remained in that particular position for a period of time, whereas the absence of a peak indicates movement through the virtual building. It is clear from Figure 3A that participants who did not follow their plan spent more time travelling along the building corridors (lower peaks). Figure 3B, in contrast, shows that participants who adhered to their plan had a strong tendency to focus their time in the folder sorting room (high peak on lower right side), with corridor time and movement kept to a minimum.
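The density estimation underlying Figure 3 can be sketched per floor as follows; the two-dimensional simplification, the Gaussian kernel with Scott's-rule bandwidth, and the grid resolution are our illustrative choices, whereas the reported analysis characterised movement across all four floors.

```python
import numpy as np
from scipy.stats import gaussian_kde

def movement_density(x, y, grid_size=100):
    """Kernel density estimate of pooled x, y position samples for one floor.

    Peaks mark positions occupied for longer periods; flat regions indicate
    movement through the building.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    kde = gaussian_kde(np.vstack([x, y]))            # Gaussian kernel, Scott's rule
    xs = np.linspace(x.min(), x.max(), grid_size)
    ys = np.linspace(y.min(), y.max(), grid_size)
    gx, gy = np.meshgrid(xs, ys)
    density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
    return gx, gy, density
```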

Fig. 3

Kernel density estimates of participant movement for lower (A) and upper (B) quartile plan followers

An exploratory factor analysis confirmed our theoretical expectations in identifying the three factors that were included in the Burgess et al. (2000b) model, labelled here as “Memory,” “Plan,” and “Intent.” It was clear that the breakfast task had no relationship with any of the factors, and so could not be used as an indicator. The EVET Learn measure was not easily identified with a single factor (first and second factor loadings, .39 and .34, respectively), and so was also excluded.

Structural equation modelling (SEM) with maximum likelihood estimation was carried out using EQS 6.1 (Bentler, 2004) to assess the fit of the Burgess et al. (2000b) model to the EVET data. The following model fit indices were used: χ2 (Bollen, 1989), which tests the hypothesis that an unconstrained model fits the covariance/variance matrix better than the proposed model (nonsignificant values indicate good fit but are unusual in empirical research with large sample sizes); Bentler’s (1990) comparative fit index (CFI), which compares the proposed model with a null model that assumes all variables are uncorrelated (values between .90 and .95 are acceptable); and the root-mean-square error of approximation (RMSEA), a measure of closeness of fit, with values below .08 indicating reasonable model fit and values below .05 indicating good fit.
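For reference, these two descriptive fit indices are commonly computed as follows (standard formulations, not quoted from the text), where M denotes the proposed model, 0 the null model, and N the sample size:

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max\!\left(\chi^{2}_{M} - df_{M},\, 0\right)}{df_{M}\,(N-1)}},
\qquad
\mathrm{CFI} = 1 - \frac{\max\!\left(\chi^{2}_{M} - df_{M},\, 0\right)}{\max\!\left(\chi^{2}_{0} - df_{0},\; \chi^{2}_{M} - df_{M},\; 0\right)}
```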

The Burgess et al. (2000b) structural framework (Fig. 1) positioned Memory upstream from both Plan and Intent. Fit indices indicated a reasonable fit to our data: χ2(42, N = 153) = 74.09, p < .01, CFI = .87, RMSEA = .07 (range = .04–.10). Modification indices were used to evaluate possible model changes towards improved fit. On the basis of these modification index recommendations and their theoretical relevance, a path between Plan and Intent was included, which had not been possible for Burgess et al. (2000b), given that they had only one index for planning. This second structural model (Fig. 4) therefore assumed that the Intent factor was predicted by both the Memory and Plan factors, with a weak link between Memory and Plan. Fit indices suggested a better fit for this second model: χ2(41, N = 153) = 64.91, p < .01, CFI = .90, RMSEA = .06 (range = .03–.09). A chi-square difference test (χ2 Model 1 – χ2 Model 2, df Model 1 – df Model 2) showed a significant difference between the models, χ2(1) = 9.18, p < .01. This indicated that the second model offered a significantly better fit, supporting the relationships between the Plan and Intent latent variables and between Memory and Intent, with Memory and Plan only weakly related.
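The difference test reduces to comparing the change in χ2 against a χ2 distribution with the change in degrees of freedom; a minimal check using the values reported above:

```python
from scipy.stats import chi2

# Chi-square difference test for the nested models reported above
# (chi-square and df values taken from the text).
chi2_model1, df_model1 = 74.09, 42
chi2_model2, df_model2 = 64.91, 41

delta_chi2 = chi2_model1 - chi2_model2   # 9.18
delta_df = df_model1 - df_model2         # 1
p = chi2.sf(delta_chi2, delta_df)        # well below .01, favouring Model 2
print(f"chi-square({delta_df}) = {delta_chi2:.2f}, p = {p:.4f}")
```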

Fig. 4

Our proposed model of EVET multitasking. Selected model fit indices: χ2(41, N = 153) = 64.91, p < .01; comparative fit index = .90, RMSEA = .06 (range = .03–.09), Akaike information criterion = −17.09

Discussion

Our primary goal was to investigate the cognitive factors that contribute to a realistic simulation of everyday multitasking. This was intended to address a major lacuna in our understanding of complex cognition in which several cognitive functions are required to operate in a coordinated fashion, rather than focus on each function in isolation. We drew on previous research on cognitive impairments that are sequelae of frontal-lobe damage and developed a novel paradigm using a virtual environment (EVET) and novel behavioural measures to investigate the general cognitive principles of multitasking in the healthy young adult brain.

The only existing statistical model of everyday multitasking (Burgess et al., 2000b) was used as an initial theoretical framework. However, that framework was based on a set of table-top tasks carried out by individuals with frontal-lobe damage and older controls, with a limited range of measures and only one measure of planning. This model was broadly useful in explaining our data from simulated multiple-errand multitasking with young healthy adults, but it did require modification. We identified separate constructs for memory, preplanning, and plan implementation. This suggests that memory and preplanning may involve different cognitive functions, both of which are required for this form of multitasking in young, healthy adults. This separation between memory and planning is also consistent with the findings by Shallice and Burgess (1991) and Burgess et al. (2000b) that planning can be selectively impaired in individuals with frontal-lobe damage, while leaving other cognitive functions, including memory, intact. Further, our results are consistent with the Burgess et al. (2000a) suggestion that planning is not a unitary construct, and we identified separate constructs for preplanning and for online planning during task performance. However, these latter two factors were not independent of one another. Participants who formed an efficient plan also tended to achieve a higher EVET score. In addition, participants who closely followed their plans during EVET performance also tended to achieve higher scores than those who modified their plans online. This was true regardless of whether their initial plan was poor or was close to the optimum. Given that any errors arising from a change of plan would necessarily occur after that change, this suggests that modifying a plan online, in an attempt to make it more efficient at the point of the change, might have disrupted performance of the errands that remained to be completed.

Our final model comprised three factors (Memory, Plan, and Intent), as did the Burgess et al. (2000b) model. However, in the latter model, the Plan construct was added on neuroanatomical grounds, and Memory was thought to drive Plan and Intent as separate constructs. Also, all of the tasks in the Greenwich test were visible and could be performed in any order. In the present study with the EVET, the Plan construct was added on statistical grounds, and a better fit was obtained when Memory did not drive the Plan construct, but when both drove the Intent construct, for which there were several indicator variables. Moreover, participants were required to memorise a list of errands and to create a plan of the order for their completion, intended to simulate the everyday requirement of scheduling a list of tasks. These operations might be expected to require resources from memory and planning, so the separate sets of indicator variables for Memory and Plan suggest that these two constructs reflect different cognitive functions that act together but have little mutual dependence. EVET required participants to maintain their internal list of delayed intentions (the plan) and then to realise those intentions within the virtual environment, presumably requiring frequent consultation of the plan. Clearly, high-scoring participants followed their plans rather than deviating from them online. This argument is consistent with the model in Figure 4, which illustrates strong links between the Intent construct and the indicator variables "EVET remember," "EVET plan follow," and "EVET recount." "EVET remember" refers to the ability to remember the errand list after EVET completion, whereas a high score on "EVET plan follow" requires participants to keep track of which errands have been completed and to remember to perform the remaining errands in the planned order. So, a combination of good memory for the errands with consultation and updating of the current representation of the position on the errand list appears to contribute to the Intent construct. A high score on "EVET recount" suggests that participants had an accurate memory for the actions that they had actually performed during EVET, so they could recall which errands had been completed and which had not. This is consistent with the idea that, as they were performing EVET, they could update a representation of which errands from their planned sequence had been completed and which had not. We would then expect that participants who effectively consulted and followed their plans would have shorter EVET travel times, with fewer deviations from the planned route—hence the negative loading for "EVET travel time"—and would complete more errands as a result (EVET score). These considerations suggest that a factor reflecting plan following should draw on the resources of memory and of preplanning. This is apparent in the final model, which includes a path between Plan and Intent; this was possible because we included two indicator variables for planning.

Also intriguing was that the travelling salesman problem loaded on Intent rather than on Plan. Our model therefore suggests that this task is an indicator variable for plan implementation. A detailed analysis of the travelling salesman task is beyond the scope of this research. However, previous studies of the Tower of London task and of chess have shown that most planning for complex tasks takes place online rather than in advance (e.g., Phillips, Wynn, McPherson, & Gilhooly, 2001; Saariluoma, 1995). Yet we found that sticking with an original plan resulted in better scores than did modifying plans during EVET performance. It is possible that participants created a planned order for the errands but then did not preplan the actual moves between errands. For example, when moving around the virtual building, they might have found shorter or more efficient routes between the locations they had to visit for each errand, so the errand order remained as planned, but the movements between errands were modified online. This would also be consistent with EVET travel time loading on the Intent variable.

The model we propose does not exclude the possibility that other models, based on alternative theoretical rationales, might offer a better fit. However, the results support our suggestion that multitasking requires the operation of different cognitive functions acting in concert. One possible account driven by our overall theoretical rationale is that different cognitive functions are required in order to support performance, and they tend to correlate because they act together to achieve a common goal, not because they reflect the operation of a single construct such as a general attentional resource. This adds to our confidence that use of complex paradigms such as EVET, together with a theoretical rationale that assumes the coordinated operation of multiple cognitive functions to achieve task goals, can generate robust and interpretable data to address the complex cognition that supports everyday multitasking.

The multiple regression analysis suggested that overall EVET score was predicted by independent measures of retrospective memory, of visuospatial planning (the travelling salesman problem), and of spatial working memory. It may be that a participant’s ability to manipulate spatial information is a key factor for this particular type of multitasking; this may be less important in the Greenwich test, where the subtasks are in full view of the participants. No additional unique variance in EVET scores was explained by an independent measure of prospective memory (simulated breakfast making) or by verbal working memory. The lack of a contribution from verbal working memory appeared to be due, at least in part, to collinearity with the measure of retrospective memory, suggesting that these two measures reflect a common, domain-specific cognitive function that is distinct from visuospatial working memory and from planning, a pattern consistent with the multicomponent framework for working memory (Baddeley & Logie, 1999; Logie, in press).

It is striking that performance on the breakfast task (Craik & Bialystok, 2006) was unrelated to overall EVET performance. Craik and Bialystok demonstrated that their task showed clear effects of cognitive ageing and could detect an advantage for older bilingual compared with older monolingual individuals. Participants had to monitor the progress of a number of foods that cook at different rates and stop them cooking at the appropriate time. This is very different from a situation in which participants have a specific, planned order in which they are attempting a memorised list of quite diverse tasks, and this could be a crucial difference between EVET and the breakfast task. It might also be the case that young, healthy participants correct their prospective memory errors by returning to complete "forgotten errands" within the 8-min period. In that case, the EVET score might be insensitive to individual variation in prospective memory performance within EVET. However, this is speculative, and the general issue of prospective memory in multitasking using the EVET clearly merits further investigation. Detailed consideration of this issue is beyond the scope of the present research and has been addressed elsewhere (Trawley, Law, & Logie, 2011).

A possible criticism is that the novelty of EVET brings with it uncertainty about its reliability. This is an issue with any novel paradigm, particularly one that is more complex than is common in studies of human cognition. However, if we are to understand how human cognition deals with everyday complexity, the experimental paradigms will have to be complex as well as robust. The results of the present study illustrate the utility of exploring this form of complex cognition using multiple errands in a virtual environment, retaining a high degree of experimental control together with a degree of realism. Some confidence in the reliability of our paradigm comes from the degree of internal consistency between the different measures of EVET performance and from the robust relationships with more established measures of retrospective memory and working memory. Nevertheless, further assessment of its reliability would be useful in future studies.

One possible limitation of the implementation of EVET in this study is that the results might not generalise to other scenarios. However, with minor modifications to the procedure, EVET could be used to explore a wide range of research questions—for example, when multiple errands are carried out without preplanning, or when plans are interrupted. We have completed experiments of this kind that will be reported elsewhere. Unlike the original multiple-errands test (Shallice & Burgess, 1991), the same environment can be used in a range of different laboratories, and the environment and the data extraction utilities are freely available for not-for-profit research on request from the authors.

This study has developed and demonstrated the utility of a novel methodology for studying everyday multitasking. It has added insight into how different cognitive functions act in concert to achieve complex cognitive goals, rather than focusing on each function in isolation. We report new findings regarding the relationship between preplanning and plan following, as well as the relationships among three constructs (Memory, Planning, and Intent) in multitasking by healthy young adults, thereby substantially developing research that has focused previously on the cognitive impact of frontal-lobe damage. The study has also demonstrated that this form of complex cognition can be addressed successfully, yielding insights into how human cognition can meet and manage the multiple requirements of daily living.