Improving scientific reasoning and argumentation is a central aim of science education (Engelmann et al., 2016; OECD, 2013). Consequently, science education has moved toward more inquiry-based learning approaches. Learning from inquiry can be more effective than direct instruction when appropriately guided (Lazonder & Harmsen, 2016). In such approaches, students typically use computer simulations to explore scientific concepts by testing hypotheses, conducting experiments, and evaluating data.

Inquiry learning can improve scientific reasoning by having students “act like scientists”, thereby improving their learning of the content and the corresponding scientific processes (Abd-El-Khalick et al., 2004). However, students might struggle with inquiry learning because they lack 1) the scientific reasoning skills to conduct experiments or 2) the self-regulation abilities that are particularly important for navigating complex learning environments like simulations. The present study tested the effectiveness of two types of guidance—video modeling examples and metacognitive prompts. The video modeling examples provided integrated instruction in scientific reasoning and self-regulated learning. The metacognitive prompts aimed to further ensure the use of self-regulation processes by prompting students to monitor their scientific reasoning activities during inquiry. To our knowledge, this is the first study to develop an intervention aimed at simultaneously fostering scientific reasoning and self-regulation processes in an integrated way and to test its effectiveness at both the process and the product level (i.e., hypothesis and argumentation quality). To show the intervention’s effectiveness at the process level, we applied two statistical methods that have so far been used only sparingly in educational research—epistemic network analysis (ENA) and process mining—to analyze the conjoint and sequential use of both types of processes.

Theoretical Framework

Scientific Reasoning and Argumentation

Scientific reasoning and argumentation skills are essential for comprehending and evaluating scientific findings (Engelmann et al., 2016; Pedaste et al., 2015). These skills refer to understanding how scientific knowledge is created, the methods science uses, and the validity of scientific findings (Fischer et al., 2014). Scientific reasoning and argumentation are defined as a set of eight epistemic activities, applicable across scientific domains (extending beyond the natural sciences, see Renkl, 2018 for a similar discussion)—problem identification, questioning, hypothesis generation, construction and redesign of artefacts, evidence generation, evidence evaluation, drawing conclusions, and communicating and scrutinizing (Fischer et al., 2014; Hetmanek et al., 2018). During problem identification, a problem representation is built, followed by questioning, during which specific research questions are identified. Hypothesis generation is concerned with formulating potential answers to the research question, based on prior evidence and/or theoretical models. To test the generated hypothesis, an artefact can be constructed and later revised based on the evidence. Evidence is generated using controlled experiments, observations, or deductive reasoning to test the hypothesis. An important strategy for correct evidence generation is the control-of-variables strategy (CVS; Chen & Klahr, 1999), which postulates that only the variable of interest should be manipulated while all other variables are held constant. The generated evidence is evaluated with respect to the original theory. Next, multiple pieces of evidence are integrated to draw conclusions and revise the original claim.
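
To make CVS concrete, the following minimal sketch (in Python, with hypothetical variable names loosely based on the simulations used later in this study) checks whether two experimental runs form a controlled comparison, that is, whether they differ in exactly one variable.

```python
def is_controlled_comparison(run_a: dict, run_b: dict) -> bool:
    """Return True if the two runs differ in exactly one variable (CVS):
    only the variable of interest is manipulated, everything else is constant.
    Assumes both runs record the same set of variables."""
    changed = [var for var in run_a if run_a[var] != run_b[var]]
    return len(changed) == 1

# Hypothetical settings for a photosynthesis-like simulation:
baseline   = {"light_intensity": 50, "temperature": 25, "co2_level": 400}
controlled = {"light_intensity": 80, "temperature": 25, "co2_level": 400}  # only light varies
confounded = {"light_intensity": 80, "temperature": 30, "co2_level": 400}  # two variables vary

print(is_controlled_comparison(baseline, controlled))  # True  -> unconfounded experiment
print(is_controlled_comparison(baseline, confounded))  # False -> confounded experiment
```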

Argumentation can be considered a consequence of scientific reasoning because the generated evidence is used to draw conclusions about scientific issues (Engelmann et al., 2016). We measured argumentation quality using the claim-evidence-reasoning (CER) framework, which breaks down an argument into a claim, evidence, and reasoning (McNeill et al., 2006). The claim answers the research question, the evidence is the data provided to support the claim, and the reasoning is the justification of why the evidence supports the claim. Finally, in the last of the epistemic activities, findings are scrutinized and communicated to a broader audience.

Students and adults often struggle with argumentation (Koslowski, 2012; Kuhn, 1991; McNeill, 2011) and scientific reasoning (de Jong & van Joolingen, 1998). However, scientific reasoning and argumentation can be improved with instruction and practice (Osborne et al., 2004), for example, using inquiry learning.

Computer-Supported Inquiry Learning

Students can use online simulations to actively learn about scientific concepts and the inquiry process (Zacharia et al., 2015). During inquiry, students apply some or all of the aforementioned scientific reasoning processes (van Joolingen & Zacharia, 2009). Using online simulations, students can conduct multiple experiments in a short amount of time and investigate concepts that are otherwise difficult to explore (e.g., evolution). More importantly, computer-supported inquiry learning environments provide unique opportunities for learning, such as multiple representations and non-linear presentation of information (de Jong, 2006; Furtak et al., 2012).

Students’ active engagement during inquiry learning can pose cognitive and metacognitive challenges for them (Azevedo, 2005; Scheiter & Gerjets, 2007). A lack of understanding of the scientific phenomenon or insufficient inquiry skills (e.g., the inability to generate a testable hypothesis or to design an unconfounded experiment) can pose cognitive challenges. Moreover, students can experience metacognitive challenges because they need to self-regulate their inquiry process (Hadwin & Winne, 2001; Pintrich, 2000).

Self-Regulated Learning

The importance of metacognition (and self-regulation) for successful scientific reasoning was stressed more than 20 years ago (White & Frederiksen, 1998; Schunk & Zimmerman, 1998). Self-regulated learning is an active, temporal, and cyclical process (Zimmerman, 2013), during which learners set goals and monitor, regulate, and control their cognition, motivation, and behavior to meet those goals (Boekaerts, 1999; Pintrich, 1999). Metacognition is the cognitive component of self-regulated learning (Zimmerman & Moylan, 2009) and is predominantly concerned with the monitoring and regulation of learning (Nelson & Narens, 1990). Monitoring refers to students’ ability to accurately judge their own learning. It provides the basis for regulation, that is, students’ selection and application of learning strategies. Self-regulation is particularly important for successful inquiry learning (Chin & Brown, 2000; Kuhn et al., 2000; Omarchevska et al., 2021; Reid et al., 2003; White et al., 2009). For instance, students need to monitor whether they are manipulating the correct variables or how much data they need before drawing a conclusion. In a fine-grained analysis of students’ self-regulation and scientific reasoning processes, monitoring during scientific reasoning activities was associated with higher argumentation quality (Omarchevska et al., 2021).

Because of the fundamental importance of accurate monitoring, we assessed metacognitive monitoring accuracy in relation to hypothesis and argumentation quality using retrospective confidence judgements (Busey et al., 2000). Moreover, we assessed students’ academic self-concept and interest, two motivational factors that can influence self-regulation (Hidi & Ainley, 2008; Ommundsen et al., 2005). Interest is a psychological state with both affective and cognitive components, as well as a predisposition to re-engage with the content in the future (Hidi & Renninger, 2006). Interest is positively associated with understanding, effort, and perseverance (Hidi, 1990), as well as with the maintenance of self-regulation (e.g., goal setting, use of learning strategies; Renninger & Hidi, 2019). Academic self-concept is a person’s perceived ability in a domain (e.g., biology; Marsh & Martin, 2011), which is positively related to effort (Huang, 2011), interest (Trautwein & Möller, 2016), achievement (Marsh & Martin, 2011), and self-regulation strategies (Ommundsen et al., 2005). Therefore, we controlled for students’ interest and academic self-concept.

Guidance During Computer-Supported Inquiry Learning

Guidance during inquiry can support students’ learning both cognitively and metacognitively by tailoring the learning experience to their needs during specific phases of inquiry (Quintana et al., 2004). Guidance can be provided using process constraints, performance dashboards, prompts, heuristics, scaffolds, or direct presentation of information (de Jong & Lazonder, 2014). Furthermore, guidance should also aim to support self-regulated learning (Zacharia et al., 2015), as self-regulation has been shown to be important for successful inquiry learning (Omarchevska et al., 2021). Therefore, combining scientific reasoning and self-regulation instruction might be beneficial for teaching scientific reasoning. However, only a few studies have investigated whether supporting self-regulation during inquiry improves learning (Lai et al., 2018; Manlove et al., 2007, 2009). Last, more research on the effects of combining different types of guidance is needed (Lazonder & Harmsen, 2016; Zacharia et al., 2015). Therefore, we used video modeling examples to support scientific reasoning and self-regulation in an integrated way; in addition, metacognitive prompts were implemented to further support monitoring.

Video Modeling Examples

The rationale for using video modeling examples is rooted in theories of example-based learning, in which learners acquire new skills by seeing examples of how to perform them correctly. Novice learners can benefit from studying a detailed step-by-step solution to a task before attempting to solve a problem themselves (Renkl, 2014; van Gog & Rummel, 2010). Studying worked examples reduces unnecessary cognitive load and frees up working memory resources so learners can build a problem-solving schema (Cooper & Sweller, 1987; Renkl, 2014). Example-based learning has been studied from a cognitive perspective (cognitive load theory; Sweller et al., 2011) and from a social-cognitive perspective (social learning theory; Bandura, 1986). From a cognitive perspective, most research has focused on the effects of text-based worked examples, whereas social-cognitive studies have focused on (video) modeling examples (cf. Hoogerheide et al., 2014). Video modeling examples integrate features of worked examples and modeling examples (van Gog & Rummel, 2010); they often include a screen recording of the model’s problem-solving behavior combined with verbal explanations of the problem-solving steps (McLaren et al., 2008; van Gog, 2011; van Gog et al., 2009).

Video modeling examples can support inquiry and learning about scientific reasoning principles (Kant et al., 2017; Mulder et al., 2014). Watching video modeling examples before or instead of an inquiry task led to performing more controlled experiments, indicating that students can learn an abstract concept like controlling variables (CVS) using video modeling examples (Kant et al., 2017; Mulder et al., 2014). Outside the context of inquiry learning, using video modeling examples to train self-regulation skills (self-assessment and task selection) improved students’ learning outcomes in a similar task (Kostons et al., 2012; Raaijmakers et al., 2018a, 2018b) but this outcome did not transfer to a different domain (Raaijmakers et al., 2018a).

Nevertheless, most studies have focused on supporting either scientific reasoning (e.g., CVS; Kant et al., 2017; Mulder et al., 2014) or self-regulation during inquiry learning (Manlove et al., 2007). Likewise, video modeling research has also focused on either supporting scientific reasoning (Kant et al., 2017; Mulder et al., 2014) or self-regulation (Raaijmakers et al., 2018a, 2018b). These studies have investigated scientific reasoning and self-regulation separately, whereas video modeling examples may be particularly suitable for integrating instruction in both scientific reasoning and self-regulated learning. Scientific reasoning principles can be easily demonstrated by showing how to conduct experiments correctly. Providing verbal explanations of the model’s thought processes can be used to integrate self-regulated learning principles into instruction on scientific reasoning. For example, explaining the importance of planning for designing an experiment is one way to integrate these two constructs. Metacognitive monitoring can be demonstrated by having the model make a mistake, detect it, and then correct it (vicarious failure; Hartmann et al., 2020). In contrast to previous research focused on task selection and self-assessment skills (Kostons et al., 2012; Raaijmakers et al., 2018a, 2018b), we investigated the effectiveness of video modeling examples for training and transfer of other self-regulation skills—planning, monitoring, and control. Moreover, we studied whether a video modeling intervention that integrates scientific reasoning and self-regulation instruction would improve inquiry learning. To ensure that participants engaged with the videos constructively (Chi & Wylie, 2014), we supplemented the video modeling examples with knowledge integration principles, which involve “a dynamic process of linking, connecting, distinguishing, and structuring ideas about scientific phenomena” (Clark & Linn, 2009, p. 139). To further support self-regulated learning, we tested the effectiveness of combining video modeling examples with metacognitive prompts.

Metacognitive Prompting

Metacognitive prompts are instructional support tools that guide students to reflect on their learning and focus their attention on their thoughts and understanding (Lin, 2001). Prompting students to reflect on their learning can help activate their metacognitive knowledge and skills, which should enhance learning and transfer (Azevedo et al., 2016; Bannert et al., 2015). Metacognitive prompts support self-regulated learning by reminding students to execute specific metacognitive activities like planning, monitoring, evaluation, and goal specification (Bannert, 2009; Fyfe & Rittle-Johnson, 2016). Metacognitive prompts are effective for supporting students’ self-regulation (Azevedo & Hadwin, 2005; Dori et al., 2018) and hypothesis development (Kim & Pedersen, 2011) in computer-supported learning environments.

Even though providing support for self-regulated learning improves learning and academic performance on average (Belland et al., 2015; Zheng, 2016), some studies did not find beneficial effects of metacognitive support on learning outcomes (Mäeots et al., 2016; Reid et al., 2017). To understand why, it is necessary to consider the learning processes of students (Engelmann & Bannert, 2019). Process data could help determine whether students engaged in the processes as intended by the intervention or identify students who failed to do so. For instance, process data can provide further insights on the influence of prompts on the learning process (Engelmann & Bannert, 2019; Sonnenberg & Bannert, 2015).

Modeling Learning Processes

In the following, we will introduce two highly suitable methods for studying the interaction between scientific reasoning and self-regulation processes—epistemic network analysis (Shaffer, 2017) and process mining (van der Aalst, 2016). These methods go beyond the traditional coding-and-counting approaches by providing more insight into the co-occurrences and sequences of learning processes.

Epistemic Network Analysis

Epistemic network analysis (ENA; Shaffer, 2017) is a novel method for modeling the temporal associations between cognitive and metacognitive processes during learning. In ENA, “the structure of connections among cognitive elements is more important than the mere presence or absence of these elements in isolation” (Shaffer et al., 2016, p. 10). Therefore, it is essential to consider not only individual learning processes but also the preceding and following processes. ENA measures the structure and strength of connections between processes, based on their temporal co-occurrence, and visualizes them in dynamic network models (Shaffer et al., 2016). The advantage of ENA is that the temporal patterns of individual connections can be easily captured and compared between individuals. Using ENA in an exploratory think-aloud study (Omarchevska et al., 2021), we found that students who monitored during scientific reasoning activities achieved higher argumentation quality than their peers who did not. These findings demonstrated the added value of studying the temporal interaction between scientific reasoning and self-regulation processes and its effects on argumentation quality. The present study builds upon these findings by studying the effects of an intervention on learning processes as revealed not only by ENA, but also by process mining.

Process Mining

Process mining is a suitable method for modeling and understanding self-regulation processes (Bannert et al., 2014; Engelmann & Bannert, 2019; Roll & Winne, 2015). Process mining is a form of educational data mining that uses event data to discover process models. Process models reveal the sequences of learning events, which provides insights into the sequential relationships between cognitive and metacognitive processes (Engelmann & Bannert, 2019).

In educational research, process mining has been used to discover different student profiles and their learning processes in relation to their grades (Romero et al., 2010). Furthermore, process mining has provided additional insights into the sequential structure of self-regulated learning processes (Bannert et al., 2014; Sonnenberg & Bannert, 2015, 2019). However, process mining techniques have not yet been used to model the relationship between scientific reasoning and self-regulated learning processes. Therefore, we used process mining to identify sequential relationships between scientific reasoning and self-regulation processes and combined it with ENA for a comprehensive analysis of the interaction between the two types of processes, as each method has its unique benefits.

Process mining does not provide a statistical comparison between different process models, which ENA offers. Conversely, ENA does not provide information about the direction of a relationship and does not capture when the same process is performed several times in a row, whereas process mining provides information about the direction of a path as well as about self-loops. To our knowledge, this is the first study to combine both methods and to use them to test the effects of educational interventions at the process level.

The Present Study

This study tested the effects of two types of guidance—video modeling examples and metacognitive prompts—on scientific reasoning performance and self-regulation during inquiry learning. Participants engaged in an inquiry training task and a transfer task using two computer simulations. Screen captures and think-aloud protocols were used to collect scientific reasoning and self-regulation process data. Effects of the intervention were expected to occur for 1) scientific reasoning ability as measured with a multiple-choice test, 2) scientific reasoning and self-regulation processes, and 3) the products of scientific reasoning, namely, the quality of the generated hypotheses and of the argumentation provided to justify decisions regarding the hypotheses. We preregistered (https://aspredicted.org/vs43g.pdf) the following research questions and hypotheses:

RQ1) Can video modeling and metacognitive prompts improve scientific reasoning ability?

In line with Kant et al. (2017), we hypothesized that students in the two video modeling conditions (video modeling plus prompts, VMP; video modeling only, VM) would have higher scientific reasoning posttest scores than the control group (H1a). Because of the benefits of providing metacognitive support (Azevedo & Hadwin, 2005), we hypothesized that the VMP condition would further outperform the VM condition (H1b).

RQ2) What are the immediate effects of video modeling and metacognitive prompts while working on an inquiry training task at the product level (hypothesis and argumentation quality) and process level (scientific reasoning and self-regulation)?

In line with Mulder et al. (2014), we hypothesized that students in the two VM conditions would have higher hypothesis and argumentation quality (H2a) than the control group in the training task. In line with Kim and Pedersen (2011), we hypothesized that the VMP condition would further outperform the VM condition (H2b).

RQ3) Do the effects of video modeling and metacognitive prompts on scientific reasoning products and processes transfer to a novel task?

In line with van Gog and Rummel (2010), we hypothesized that students in the two VM conditions would have higher hypothesis and argumentation quality (H3a) than the control group in the transfer task. In line with Bannert et al. (2015), we hypothesized that the VMP condition would outperform the VM condition (H3b).

Additionally, we explored the process models of participants’ scientific reasoning and self-regulation processes in different conditions using ENA and process mining in the two tasks. Moreover, we explored participants’ monitoring accuracy for hypothesis and argumentation quality in both tasks.

Method

Participants and Design

Participants were 127 university students from Southern Germany (26 males; M_age = 24.3 years, SD = 4.81). Participants had an academic background in science (n = 40), humanities (n = 43), law (n = 11), social science (n = 20), or other fields (n = 14). Participation in the experiment was voluntary and informed consent was obtained from all participants. The study was approved by the local ethics committee (2019/031). The experiment lasted 1 h and 30 min and participants received a monetary reward of 12 Euros.

The experiment had a one-factorial design with three levels, and participants were randomly assigned to one of three conditions. In the first condition (VMP, n = 43), participants watched video modeling examples (VM) before working with the virtual experiments and received metacognitive prompts (P) during the training phase (see Fig. 1). In the second condition (VM, n = 43), participants watched the same video modeling examples without receiving metacognitive prompts during the training phase. In the third condition (control, n = 41), participants engaged in an unguided inquiry task with the same virtual experiment that was used in the video modeling examples; however, they received neither video modeling instruction nor metacognitive prompts.

Fig. 1
figure 1

The design and procedure of the experiment, divided into three phases, and the corresponding virtual experiment used in each phase

A priori power analysis using G*Power (Faul et al., 2007) determined the required sample size to be 128 participants (Cohen’s f = 0.25, power = 0.80, α = 0.05) for contrast analyses. Effect size calculations were based on previous research using video modeling examples to enhance scientific reasoning (Kant et al., 2017). Data from one participant were not recorded due to technical issues, resulting in a sample size of 127.
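
As a rough cross-check of this figure, the power of a single-df contrast among three groups can be computed from the noncentral F distribution. The sketch below (Python/SciPy) uses one common parameterization (noncentrality λ = f²·N, error df = N − 3) and is an approximation of, not a substitute for, the G*Power calculation.

```python
from scipy import stats

def contrast_power(n_total: int, f: float = 0.25, alpha: float = 0.05, groups: int = 3) -> float:
    """Power of a single-df planned contrast in a between-subjects design,
    computed from Cohen's f via the noncentral F distribution."""
    df1, df2 = 1, n_total - groups
    lam = (f ** 2) * n_total                    # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, df1, df2)   # critical F value under H0
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)

n = 10
while contrast_power(n) < 0.80:                 # smallest N reaching 80% power
    n += 1
print(n, round(contrast_power(n), 3))           # ~128 participants under these assumptions
```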

Materials and Procedure

Phase 1—Instruction

We first assessed demographic information, conceptual knowledge, and academic interest and self-concept (Fig. 1). During the instruction, participants either watched video modeling examples (intervention groups) or engaged in an unguided inquiry learning task using the simulation Archimedes’ Principle (control group, Fig. 2). In this simulation, a boat is floating in a tank of water. The boat’s dimensions and weight and the liquid’s density can be varied. When the boat sinks, the displaced liquid overflows into a cylinder. In this way, students can investigate Archimedes’ principle, which states that the upward buoyant force exerted on a body immersed in a fluid is equal to the weight of the fluid that the body displaces.

Fig. 2
figure 2

The Archimedes’ Principle simulation. Copyright (2021) by Explore Learning. Reprinted with permission

In the two video modeling conditions, participants watched 3 non-interactive videos (lasting 3 min each on average). The videos were screen captures recorded using Camtasia Studio, which showed a female model’s interactions with the Archimedes’ Principle simulation. The model was thinking aloud and explaining the different steps of scientific reasoning, but she was not visible in the videos.

To engage participants with the videos, knowledge integration principles were used (Clark & Linn, 2009). Before watching each video, participants’ ideas about the topic of each video modeling example were elicited (e.g., “When conducting a scientific experiment, what is important to keep in mind before you start collecting data?”). After watching each video, participants noted down the most important points and compared their first answer to what was explained in the video, which engaged them in reflection (Davis, 2000).

In the first video, the model explained problem identification and hypothesis generation. She explained how to formulate a research question and a testable hypothesis. Then, she developed her own hypothesis, which she later tested.

In the second video, the model explained planning a scientific experiment and the control of variables strategy (CVS). To demonstrate CVS, we used a coping model (van Gog & Rummel, 2010), who initially made a mistake by manipulating an irrelevant variable, which she then corrected, and explained that manipulating irrelevant variables can lead to confounded results, thereby also demonstrating metacognitive monitoring.

In the last video, evidence generation, evidence evaluation, and drawing conclusions were modeled by conducting an experiment to test the hypothesis. Data were systematically collected and presented in a graph. The model explained the importance of conducting multiple experiments to not draw conclusions prematurely, which also modeled metacognitive monitoring and control. Last, she evaluated the evidence and drew conclusions.

In the control condition, participants worked with the same virtual experiment used in the videos without receiving guidance. They answered the same research question as the model in the videos. To keep time on task similar between conditions, participants had 10 min to work on the task.

Phase 2—Training Task

Participants were first instructed to think aloud, that is, to say everything that came to their mind without worrying about the formulation. Participants were given a short practice task (“Your sister's husband's son is your children’s cousin. How is he related to your brother? Please say anything you think out loud as you get to an answer.”). Participants watched a short video about photosynthesis, which served to re-activate their conceptual knowledge. Then, they solved the training inquiry task using the simulation Photosynthesis and an experimentation sheet. In Photosynthesis (see Fig. 3), the rates of photosynthesis (measured by oxygen production) are inferred by manipulating different variables (e.g., light intensity).

Fig. 3
figure 3

The Photosynthesis simulation. Copyright (2021) by Explore Learning. Reprinted with permission

Participants were asked to answer the following research question: “How does light intensity influence oxygen production during photosynthesis?”. They wrote down their hypothesis, collected data using the simulation and answered the research question on the experimentation sheet. We asked participants to support their answers with evidence. Hypothesis and argumentation quality were coded from these answers. Participants made retrospective confidence judgments regarding their hypothesis (“How confident are you that you have a testable hypothesis?”) and their final answer (“How confident are you that you have answered the research question correctly?”).

Participants in the VM and in the control condition solved the task using only the Photosynthesis simulation and the worksheet. In the VMP condition, students additionally received 3 metacognitive prompts (see Table 1), which asked them to monitor specific scientific reasoning activities. Each prompt asked participants to rate their confidence on a scale from 0 to 100. The first two prompts were presented as pop-up messages during the training task after 3 and 9 min, respectively. The third prompt was visible after participants finished the training task and gave the option to go back and conduct more experiments.

Table 1 The metacognitive prompts used in the video modeling and prompting condition

Phase 3—Transfer Task

In the transfer task, all participants worked with the Energy Conversion in a System simulation (see Fig. 4). First, participants read a short text about the law of conservation of energy, which provided them with the conceptual knowledge necessary to use the simulation. In Energy Conversion in a System, participants could manipulate the quantity and the initial temperature of the water in a beaker. The water is heated by a falling cylinder attached to a rotating propeller that stirs the water; the mass and height of the cylinder can be adjusted. The change in the water’s temperature is measured as energy is converted from one form to another.

Fig. 4
figure 4

The Energy Conversion in a System simulation. Copyright (2021) by Explore Learning. Reprinted with permission

The transfer task had an identical structure to the training task and was delivered through the experimentation sheet. Participants were asked to use the simulation to answer the question “How does changing the water’s initial temperature and the water’s mass affect the change in temperature?”. Participants investigated the influence of two variables (water mass and water temperature) and noted down their results in a table with four columns (water mass, initial water temperature, final water temperature, change in temperature), which provided further guidance. Retrospective confidence judgments were provided for the hypothesis and the final answer. The task was the same in all conditions.

Measures

Conceptual Knowledge

Conceptual knowledge in photosynthesis (e.g., “What is the function of the chloroplasts?”) and energy conversion (e.g., “The law of conservation of energy states that…”) was assessed prior to the experiment using 5 multiple-choice items with 5 answer options for each topic. Each question had one correct answer and “I do not know” was one of the answer options. Both scales had low internal consistency (photosynthesis, Cronbach’s α = 0.63; energy conversion, Cronbach’s α = 0.15), because they assessed prior understanding of independent facets related to photosynthesis and energy conversion. Therefore, computing internal consistency for such scales might not be appropriate (Stadler et al., 2021).

Academic Self-Concept and Interest in Science

The academic self-concept scale comprised 5 items rated on a Likert scale (Cronbach’s α = 0.93) ranging from 1 (I do not agree at all) to 4 (I completely agree) (Grüß-Niehaus, 2010; Schanze, 2002). An example item of the scale is “I quickly learn new material in natural sciences.”. Likewise, interest in science was assessed using a 5-item Likert scale (Cronbach’s α = 0.94) ranging from 1 (I do not agree at all) to 4 (I completely agree) (Wilde et al., 2009). An example item of the scale is “I am interested in learning something new in natural sciences.”.

Scientific Reasoning Ability

Scientific reasoning ability was assessed using 12 items from a comprehensive instrument (Hartmann et al., 2015; Krüger et al., 2020) that assessed the skills research question formulation (4 items), hypothesis generation (4 items), and experimental design (4 items). For each skill, we chose easy, medium, and difficult questions, based on data obtained by the authors of the instrument. Since the test was originally developed for pre-service science teachers, we chose items that did not rely on prior content knowledge. The questions matched the domains of the inquiry tasks (biology, physics). The scale had low internal consistency (Cronbach’s α = 0.31), most likely because the test assessed three independent skills (cf. Stadler et al., 2021).

Scientific Reasoning Products

Hypothesis Quality

To assess hypothesis quality, we developed a coding scheme which scored participants’ hypotheses based on their testability (0–2) and correctness (0–2), adding up to a maximum score of 4, see Table 2. Due to the complexity of the coding scheme, we used consensus ratings (Bradley et al., 2007) for the scoring (initial inter-rater agreement: Krippendorff’s α = 0.65). Two raters independently scored all hypotheses (N = 127) and then discussed all disagreements until a consensus was reached.

Table 2 Coding scheme used for assessing hypothesis quality

Argumentation Quality

Argumentation quality was assessed by coding participants’ answers to the research questions. The claim, the evidence, and the reasoning were each scored between 0 and 2, adding up to a maximum score of 6 (McNeill et al., 2006), see Table 3. We adapted the coding scheme from McNeill et al. (2006) to the context of our study. Participants were given an extra point when they evaluated their hypothesis in their final answer. Again, we used consensus ratings (initial agreement: Krippendorff’s α = 0.67) for the scoring.

Table 3 Coding scheme used for assessing argumentation quality

Scientific Reasoning and Self-Regulation Processes

We used screen captures and think aloud protocols to assess scientific reasoning and self-regulation processes, which were coded using Mangold INTERACT® (Mangold International GmbH, Arnstorf, Germany; version 9.0.7). Using INTERACT, the audio (think aloud) and video (screen captures) can be coded simultaneously. First, two experienced raters independently coded 20% of the videos (n = 25) and reached perfect agreement (κ = 1.00); therefore, each rater coded half of the remaining videos. Due to technical issues with the audio recording, the process data analysis was conducted on a smaller sample (N = 88; n_VMP = 29, n_VM = 37, n_control = 22).

The raters used the coding scheme in Table 4, which was previously used by Omarchevska et al. (2021) to code scientific reasoning, self-regulation, and the use of cognitive strategies. Scientific reasoning processes were coded from both data sources, whereas self-regulation processes and the use of cognitive strategies were coded from the think aloud protocols only. Regarding scientific reasoning, we focused on the epistemic activities problem identification, hypothesis generation, evidence generation, evidence evaluation, and drawing conclusions (Fischer et al., 2014). As measures of self-regulation, we coded the processes of planning (goal setting), monitoring (goal progress, information relevance), and control. We also coded the use of cognitive strategies, namely, activation of prior knowledge and self-explaining.

Table 4 Coding scheme used for assessing scientific reasoning, self-regulation, and cognitive strategies processes

Monitoring Accuracy

Monitoring accuracy was measured by calculating bias scores based on the match between the confidence judgment scores and the corresponding performance scores for hypothesis and argumentation quality (4 monitoring accuracy scores per participant). Since confidence judgment scores ranged from 0 to 100, whereas hypothesis quality scores ranged from 0 to 4 and argumentation quality scores from 0 to 6, the quality scores were rescaled to range between 0 and 100. Monitoring accuracy was computed by subtracting the rescaled hypothesis and argumentation quality scores from the confidence judgment scores (Baars et al., 2014). Positive scores indicate overestimation, negative scores indicate underestimation, and scores close to zero indicate accurate monitoring (Baars et al., 2014).
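
A minimal sketch of this bias computation (hypothetical scores; the rescaling assumes the maximum scores of 4 and 6 of the two coding schemes described above):

```python
def bias(confidence_0_100: float, score: float, max_score: float) -> float:
    """Monitoring bias = confidence judgment (0-100) minus performance rescaled to 0-100.
    Positive = overestimation, negative = underestimation, ~0 = accurate monitoring."""
    return confidence_0_100 - (score / max_score) * 100

# Hypothetical participant: hypothesis quality 3 of 4, argumentation quality 4 of 6
print(bias(80, 3, 4))   # 80 - 75.0  ->  +5.0  (slight overestimation)
print(bias(40, 4, 6))   # 40 - 66.7  -> -26.7  (underestimation)
```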

Data Analyses

To test our preregistered hypotheses, we used contrast analyses: Contrast 1 (0.5, 0.5, -1) compared the VMP and the VM conditions to the control condition and Contrast 2 (1, -1, 0) compared the VMP and the VM conditions to each other. We applied the Bonferroni correction for multiple tests which resulted in α = 0.025 for all contrast analyses. Benchmarks for effect sizes were: η2 = 0.01, 0.06, 0.14 and d = 0.20, 0.50, 0.80 for small, medium, and large effects, respectively.
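
For illustration, a planned contrast can be computed from group means and the pooled within-group error term as sketched below (synthetic data with hypothetical group means; the analyses reported here were run on the actual data and, for scientific reasoning ability, additionally included a covariate).

```python
import numpy as np
from scipy import stats

def planned_contrast(groups, weights):
    """t-test for a planned contrast across independent groups,
    using the pooled within-group variance (MSE) as the error term."""
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([np.mean(g) for g in groups])
    ss_within = sum(((np.asarray(g) - m) ** 2).sum() for g, m in zip(groups, means))
    df_error = n.sum() - k
    mse = ss_within / df_error
    w = np.asarray(weights, dtype=float)
    estimate = (w * means).sum()                   # contrast value L = sum(w_i * mean_i)
    se = np.sqrt(mse * (w ** 2 / n).sum())         # standard error of L
    t = estimate / se
    p = 2 * stats.t.sf(abs(t), df_error)           # two-sided p value
    return t, p, df_error

rng = np.random.default_rng(1)
vmp, vm, control = rng.normal(3.0, 1, 43), rng.normal(3.0, 1, 43), rng.normal(2.4, 1, 41)
# Contrast 1: both video modeling conditions vs. control; Contrast 2: VMP vs. VM
print(planned_contrast([vmp, vm, control], [0.5, 0.5, -1]))
print(planned_contrast([vmp, vm, control], [1, -1, 0]))
```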

We first compared the groups in the scientific reasoning ability posttest using contrast analysis (H1a, H1b) with scientific reasoning ability pretest as a covariate. We then applied contrast analyses to compare hypothesis and argumentation quality in the training (H2a, H2b) and the transfer task (H3a, H3b). We explored monitoring accuracy in the two tasks using one-way ANOVAs. To explore the training and transfer effects on the process level, we used ENA and process mining.

Epistemic Network Analysis

ENA is a modeling tool that quantifies the strength of co-occurrence between codes within a conversation (Shaffer, 2017). A conversation is defined as a set of lines that are related to each other. ENA quantifies the co-occurrences between different codes in a conversation and visualizes them in network graphs. Hence, the strength of the co-occurrences can be visually and statistically compared between groups. Because the number of coded events may vary between units, the networks are first normalized before being subjected to a dimensional reduction. ENA uses singular-value decomposition to perform the dimensional reduction, which produces orthogonal dimensions that maximize the variance explained by each dimension (Shaffer et al., 2016). The positions of the networks’ centroids, which correspond to the mean position of all points in a network, can be compared. The strength of individual connections can be compared using network difference graphs, which subtract the corresponding connection weights of different networks.
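
The following deliberately simplified sketch illustrates the core idea behind ENA—accumulating normalized code co-occurrence vectors per unit and reducing them via singular-value decomposition. It is not the ENA Web Tool's implementation, which includes additional steps (e.g., a means rotation), and all data shown are hypothetical.

```python
import numpy as np
from itertools import combinations

# Toy coded think-aloud data: one list of code sets per participant (unit),
# each set = codes observed on one coded line/segment (hypothetical).
units = {
    "p01": [{"hypothesis_generation"}, {"planning", "evidence_generation"},
            {"monitoring", "evidence_evaluation"}, {"drawing_conclusions"}],
    "p02": [{"evidence_generation"}, {"evidence_generation", "self_explaining"},
            {"evidence_evaluation"}, {"drawing_conclusions"}],
}
codes = sorted({c for lines in units.values() for line in lines for c in line})
pairs = list(combinations(codes, 2))

def cooccurrence_vector(lines, window=2):
    """Count, within a moving window of coded lines, how often each pair of codes
    co-occurs; normalize to unit length so units with more lines stay comparable."""
    counts = np.zeros(len(pairs))
    for i in range(len(lines)):
        stanza = set().union(*lines[max(0, i - window + 1): i + 1])
        for j, (a, b) in enumerate(pairs):
            if a in stanza and b in stanza:
                counts[j] += 1
    norm = np.linalg.norm(counts)
    return counts / norm if norm > 0 else counts

X = np.vstack([cooccurrence_vector(lines) for lines in units.values()])
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)   # dimensional reduction
points = X_centered @ Vt.T[:, :2]                            # each unit's position in the reduced space
print(dict(zip(units, np.round(points, 3))))
```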

We used the ENA Web Tool (version 1.7.0) (Marquart et al., 2018) to compare the epistemic networks of participants in the VMP and VM conditions to the control condition in the training and the transfer tasks. Participants served as units of analysis and the two tasks as conversations. We used the coded process data from the screen captures and think aloud protocols as input for the analysis. We used the codes problem identification, hypothesis generation, evidence generation, evidence evaluation, drawing conclusions, planning, monitoring, control, and self-explaining (see Tables 6 and 7 for the training and transfer tasks, respectively). Due to the very low frequency of activation of prior knowledge, this code was excluded from all process analyses. Due to the low frequency of planning in the control condition, planning was also excluded from the analysis of the control condition in both tasks.

Process Mining

We used process mining to model the sequences of scientific reasoning and self-regulation processes in the different conditions. We used the HeuristicsMiner algorithm (Weijters et al., 2006), as implemented in the ProM framework version 5.2 (Verbeek et al., 2010), to mine the sequences in which participants engaged in these processes during the training and the transfer tasks. The HeuristicsMiner algorithm is well suited for educational data mining because it deals well with noise and presents the most frequent behavior found in an event log without focusing on specifics and exceptions (i.e., infrequent behavior) (Weijters et al., 2006). The dependencies among the processes in an event log are represented in a heuristic net.

The heuristic net indicates the dependency and frequency of a relationship between two events (Weijters et al., 2006). In the heuristic net, the boxes represent the processes and the arcs connecting them represent the dependency between them. Dependency (0–1) represents the certainty of the relation between two events, with values closer to 1 indicating stronger dependency. The frequency reflects how often a transition between two events occurred. On each arrow, the dependency is shown on top and the frequency below it. An arc pointing back to the same box indicates a self-loop, showing that a process was observed multiple times in a row (Sonnenberg & Bannert, 2015). To ease generalizability to other studies using the HeuristicsMiner (e.g., Engelmann & Bannert, 2019; Sonnenberg & Bannert, 2015), we kept the recommended default threshold values, namely, dependency threshold = 0.90, relative-to-best threshold = 0.05, and positive observations threshold = 10. The dependency threshold determines the cutoff for including dependency relations in the output model, and the positive observations threshold determines the minimum number of observed sequences required for inclusion in the output model (Sonnenberg & Bannert, 2015; Weijters et al., 2006). For a detailed description of the HeuristicsMiner, the reader is referred to Weijters et al. (2006) and Sonnenberg and Bannert (2015). We used the same process data and exclusion criteria as in the ENA.
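
To illustrate the dependency measure underlying the heuristic net, the sketch below computes it from direct-succession counts in a toy event log, following the formula in Weijters et al. (2006), and applies the dependency and positive-observations thresholds named above. The full HeuristicsMiner (e.g., relative-to-best pruning, long-distance relations) is not reproduced here, and the traces are hypothetical.

```python
from collections import Counter

# Toy event log: one coded process sequence (trace) per participant (hypothetical)
traces = [
    ["problem_identification", "hypothesis_generation", "planning",
     "evidence_generation", "evidence_evaluation", "drawing_conclusions"],
    ["hypothesis_generation", "planning", "evidence_generation",
     "drawing_conclusions", "evidence_generation", "evidence_evaluation"],
]

# Direct-succession counts |a > b|
succ = Counter((t[i], t[i + 1]) for t in traces for i in range(len(t) - 1))

def dependency(a: str, b: str) -> float:
    """Dependency a => b as in HeuristicsMiner: (|a>b| - |b>a|) / (|a>b| + |b>a| + 1);
    for self-loops (a == b): |a>a| / (|a>a| + 1)."""
    ab, ba = succ[(a, b)], succ[(b, a)]
    return ab / (ab + 1) if a == b else (ab - ba) / (ab + ba + 1)

# Keep an arc only if it clears the thresholds used in this study
DEPENDENCY_THRESHOLD, MIN_OBSERVATIONS = 0.90, 10
def keep_arc(a: str, b: str) -> bool:
    return dependency(a, b) >= DEPENDENCY_THRESHOLD and succ[(a, b)] >= MIN_OBSERVATIONS

print(round(dependency("planning", "evidence_generation"), 2))   # 0.67 on this toy log
print(keep_arc("planning", "evidence_generation"))                # False (toy counts too small)
```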

Results

Control Variables

A MANOVA with the control variables (conceptual knowledge, interest, academic self-concept) revealed no differences between conditions, F < 1, see Table 5 for descriptive statistics.

Table 5 Descriptive statistics (and SDs) for the control and dependent variables, and monitoring accuracy (bias)

Scientific Reasoning Ability

A contrast analysis with scientific reasoning ability (pretest) as a covariate revealed no group differences in the posttest scores (Contrast 1: β = 0.04, p = 0.57, d = 0.22; Contrast 2: β = -0.02, p = 0.76, d = 0.). Thus, there was no support for H1a and H1b.

Effects on the Training Task

Product Level

Contrast 1 showed that the VMP and VM conditions had higher hypothesis quality, t(124) = 2.60, p = 0.010, d = 0.49, and higher argumentation quality, t(124) = 2.75, p = 0.007, d = 0.52, than the control group, see Fig. 5. Contrast 2 showed no significant differences between the VMP and VM conditions in hypothesis quality, p = 0.75, d = 0.02, or argumentation quality, p = 0.91, d = 0.02. These findings indicate that while video modeling improved scientific inquiry at the product level (H2a), there was no significant added benefit of the metacognitive prompts in the training task (H2b). The groups did not differ in monitoring accuracy regarding hypothesis quality, p = 0.20, η2 = 0.03, or argumentation quality, p = 0.20, η2 = 0.03.

Fig. 5
figure 5

The immediate training effects of video modeling (VM) and metacognitive prompts (P) on hypothesis and argumentation quality. Error bars show standard errors

Process Level

We used ENA and process mining to model the connections between scientific reasoning and self-regulation processes. We first compared the VMP and VM conditions to the control condition and then the two VM conditions to each other. The frequencies of the processes are provided in Table 6.

Table 6 The total frequencies (and percentages) of process events per condition in the training task

ENA: VMP and VM vs. Control. Along the X axis, a two-sample t-test showed that the position of the control group centroid (M = 0.33, SD = 0.76, N = 22) was significantly different from the VMP and VM centroid (M = -0.11, SD = 0.84, N = 66; t(39.23) = -2.32, p = 0.03, d = 0.54). Along the Y axis, the position of the control group’s centroid (M = -0.21, SD = 0.74, N = 22) was not significantly different from the VMP and VM centroid (M = 0.07, SD = 0.73, N = 66; t(35.90) = -1.52, p = 0.14, d = 0.37).

Thicker green lines in Fig. 6 illustrate stronger co-occurrences of Monitoring and Planning with Evidence Evaluation, Evidence Generation, Hypothesis Generation, and Drawing Conclusions in the two VM conditions than in the control condition. Participants in the control condition (purple) were Self-Explaining during Evidence Generation and Evidence Evaluation more often than participants in the VM conditions. The difference between the centroids’ positions on the X axis results from stronger connections between scientific reasoning and self-regulation processes in the VMP and VM networks. Taken together, participants in the VM conditions were self-regulating during scientific reasoning activities more frequently than participants in the control condition.

Fig. 6
figure 6

The epistemic network difference between the VMP and VM (left, green) conditions and the control (right, purple) condition in the training task

ENA: VMP vs. VM. Two-sample t-tests revealed no differences between the positions of the VMP and VM centroids along either the X axis, t(55.53) = 0.52, p = 0.60, d = 0.13, or the Y axis, t(59.54) = -0.42, p = 0.67, d = 0.11. Thus, the epistemic networks of the two groups with video modeling were similar (for details see online supplementary materials).

Process Mining. The process model of the two video modeling conditions in the training task (Fig. 7) illustrates a very strong dependency between Problem Identification and Hypothesis Generation. Participants started their inquiry by identifying the problem and then generating a hypothesis to investigate it. Next, there were strong sequential relationships between Evidence Generation and Planning: participants more frequently planned before generating evidence, and a strong reciprocal relationship indicates that they also planned after evidence generation. Similar reciprocal relationships were observed between Evidence Generation, Self-Explaining, and Evidence Evaluation, and between Drawing Conclusions and Evidence Generation. Monitoring and Control were not related to specific scientific reasoning processes, and both processes had strong self-loops, which indicates that they were performed several times in a row. This finding implies that the dependencies of Monitoring and Control to other events were not high enough for these links to be included in the process model. It could also indicate that Monitoring and Control were not related to one specific process but rather were (weakly) connected to several other (scientific reasoning) processes.

Fig. 7
figure 7

The process model of the VMP and VM conditions combined (n = 66) in the training task. The top number (referring to the arrows) represents the dependency (0–1) and the number below represents the frequency of each sequence. The numbers in the boxes represent the frequency of each process

In contrast, in the control condition (Fig. 8), Problem Identification, Hypothesis Generation, and Monitoring were disconnected from other processes and no self-loops were observed. Similar to the two video modeling conditions, reciprocal links between Evidence Generation and Drawing Conclusions and between Self-Explaining and Evidence Evaluation were discovered. However, in contrast with the two video modeling conditions, the reciprocal links between Self-Explaining and Evidence Evaluation were disconnected from Evidence Generation.

Fig. 8
figure 8

The process model of the control condition (n = 22) in the training task. The top number (referring to the arrows) represents the dependency (0–1) and the number below represents the frequency of each sequence. The numbers in the boxes represent the occurrence of each process

Effects on the Transfer Task

Product Level

Contrast 1 indicated that VMP and VM had higher hypothesis quality than the control condition, t(124) = 3.06, p = 0.003, d = 0.58, but they did not differ significantly from each other (Contrast 2, t(124) = -0.22, p = 0.83, d = 0.05), see Fig. 9. The three groups did not differ in argumentation quality (Contrast 1, t(124) = 0.09, p = 0.93, d = 0.02; Contrast 2, t(124) = 1.09, p = 0.28, d = 0.22). These findings provide partial support for H3a in that video modeling helped students to generate high-quality hypotheses during scientific inquiry but did not improve their argumentation. There was no significant added benefit of metacognitive prompts in the transfer task (H3b).

Fig. 9
figure 9

The transfer effects of video modeling (VM) and metacognitive prompts (P) on hypothesis and argumentation quality. Error bars show standard errors

The three groups differed significantly in monitoring accuracy regarding hypothesis quality, F(2, 122) = 5.68, p = 0.004, η2 = 0.09. Post hoc comparisons with Bonferroni corrections indicated that participants in the control condition overestimated their hypothesis quality, compared to the VMP, p = 0.006, and the VM condition, p = 0.028. There were no differences in monitoring accuracy for argumentation quality, p = 0.20, η2 = 0.02.

Process Level

Descriptive statistics are reported in Table 7.

Table 7 The total frequencies (and percentages) of process events per condition in the transfer task

ENA: VMP and VM vs. Control. Two-sample t-tests revealed no differences between the position of the VMP and VM centroid and that of the control condition along either the X axis, t(34.29) = -0.63, p = 0.53, d = 0.16, or the Y axis, t(34.90) = -1.43, p = 0.16, d = 0.36. Thus, the epistemic networks of the groups with and without video modeling were similar (for details see online supplementary materials).

ENA: VMP vs. VM. Two-sample t-tests showed no significant differences between the centroids of the VMP and VM conditions along the X axis, t(57.37) = -1.62, p = 0.11, d = 0.40, or the Y axis, t(54.76) = -0.43, p = 0.67, d = 0.11. These findings indicate that there were no significant differences between the epistemic networks of participants in the two video modeling conditions in the transfer task (for details see online supplementary materials).

Discussion

The present study investigated the effectiveness of video modeling examples and metacognitive prompts for improving scientific reasoning during inquiry learning with respect to students’ scientific reasoning ability, self-regulation and scientific reasoning processes, and hypothesis and argumentation quality. We used two types of process analyses, ENA and process mining, to illustrate the interaction between self-regulated learning and scientific reasoning processes.

Our findings on the product level provide partial support for our hypotheses that video modeling improved hypothesis and argumentation quality in the training task (H2a) and hypothesis quality in the transfer task (H3a). Likewise, on the process level, we found more sequential relationships between scientific reasoning and self-regulated learning processes in the two video modeling conditions than in the control condition. We found no added benefit of the metacognitive prompts on either the product (H2b, H3b) or the process level.

We observed no effects on scientific reasoning ability (H1a, H1b), which is likely because of the identical pre- and posttest items. Participants’ behavior during the posttest and the significantly shorter time-on-task suggested that many participants rushed through the posttest and did not attempt to solve it again, thereby attesting to the importance of identifying alternative, more process-oriented measures for scientific reasoning. Therefore, we refrain from drawing conclusions about the influence of video modeling examples and metacognitive prompts on scientific reasoning ability and earmark this aspect of our study as a limitation.

Theoretical Contributions

First, video modeling examples were beneficial for improving participants’ hypothesis and argumentation quality in the training task and for improving hypothesis quality in the transfer task. Therefore, video modeling examples presented prior to inquiry tasks can support students during inquiry learning and positively affect their hypothesis and argumentation quality. Previous research found positive effects of video modeling examples on performance and inquiry processes compared to solving an inquiry task without such examples (Kant et al., 2017), and our findings extend these effects to hypothesis and argumentation quality. This provides further support for the benefits of observational learning (Bandura, 1986) for teaching complex procedural skills like scientific reasoning.

Second, in addition to learning outcome measures, we provide a fine-grained analysis of students’ scientific reasoning and self-regulation processes. Prior research has stressed the importance of self-regulation during complex problem-solving activities (e.g., Azevedo et al., 2010; Bannert et al., 2015) like scientific reasoning (e.g., Manlove et al., 2009; Omarchevska et al., 2021; White et al., 2009). Our findings illustrate the importance of self-regulation processes during scientific reasoning activities and the role of video modeling in supporting this relationship. Specifically, students who watched video modeling examples were monitoring and planning during evidence generation and evidence evaluation more frequently than the control group. We contribute to the literature on self-regulated inquiry learning by showing that an integrated instruction of self-regulation and scientific reasoning resulted in more connections between scientific reasoning processes, more self-regulation, and higher hypothesis and argumentation quality. However, to ensure that an integrated instruction of scientific reasoning and self-regulation is more beneficial than an isolated one, future research should compare video modeling examples that teach self-regulation and scientific reasoning separately to our integrated instruction. Furthermore, we primarily relied on (meta)cognitive aspects of self-regulated learning. Future research should also investigate motivational influences (Smit et al., 2017), which have been shown to be relevant for learning from video modeling examples (Wijnia & Baars, 2021).

Modeling the sequential use of scientific reasoning processes highlights the value of a process-oriented perspective on inquiry learning. ENA showed a densely connected network of scientific reasoning and self-regulation processes for students in the experimental conditions. This corroborates previous findings on the relationship between scientific reasoning and self-regulation processes and argumentation quality (Omarchevska et al., 2021) and extends it to hypothesis quality as well. The video modeling examples in this study were designed based on our previous findings regarding self-regulation during scientific reasoning, which further highlights the value of video modeling examples as a means of providing such integrated instruction. In previous work, video modeling examples were successful in teaching either self-regulation (self-assessment and task selection skills; Raaijmakers et al., 2018a, 2018b) or scientific reasoning (CVS; Kant et al., 2017; Mulder et al., 2014). Our findings extend these benefits to planning, monitoring, and the epistemic activities of scientific reasoning (Fischer et al., 2014). Raaijmakers et al. (2018a, 2018b) reported mixed findings regarding transfer. In our case, the intervention transferred only with respect to hypothesis quality, which was explicitly modeled in the videos, but did not enhance argumentation quality. The combination of process and learning outcome analyses demonstrated that 1) modeling hypothesis generation principles resulted in higher hypothesis quality and 2) integrating self-regulation and scientific reasoning instruction resulted in a conjoint use of scientific reasoning and self-regulation processes for students who watched the video modeling examples.

Our findings confirm the value of inquiry models (e.g., Fischer et al., 2014; Klahr & Dunbar, 1988; Pedaste et al., 2015) not only for describing inquiry processes but also for informing interventions, as was done in this study. At the same time, our findings suggest that current theoretical models describing scientific inquiry as a set of cognitive processes (Fischer et al., 2014; Klahr & Dunbar, 1988; Pedaste et al., 2015) need to be augmented with respect to metacognition. A framework integrating metacognition in the context of online inquiry was proposed by Quintana et al. (2005); it focuses on supporting metacognition during information search and synthesis to answer a research question. The present study provides further evidence for planning, monitoring, and control as important self-regulation processes for scientific reasoning and inquiry learning. Moreover, we tested the cyclical assumption of self-regulation, meaning that performance on one task can provide feedback for the learning strategies used in a follow-up task (Panadero & Alonso-Tapia, 2014). Our process analyses showed that emphasizing one component of self-regulated learning (e.g., monitoring) in the instruction can result in recursive use of that component in similar learning situations. This resulted in near transfer, from applying the concepts taught in the video modeling examples to a new context (the training task), but also in medium transfer, since the transfer task involved a new context and the additional challenge of manipulating a second variable. The cyclical nature of self-regulated learning is often assumed (Zimmerman, 2013), but it is rarely tested in successive learning experiences.

Methodological Contributions

We used two innovative process analyses—ENA and process mining—to investigate how scientific reasoning and self-regulation processes interacted for students who either watched video modeling examples or engaged in unguided inquiry learning prior to the inquiry tasks. Previous research on scientific reasoning using process data has often relied on coding-and-counting methods, which use process frequencies to explain learning outcomes (e.g., Kant et al., 2017). A drawback of coding-and-counting methods is that when the frequencies of processes are analyzed in isolation, important information about the relationships between processes is lost (Csanadi et al., 2018; Reimann, 2009). For example, Kant et al. (2017) counted and compared the number of controlled and confounded experiments between conditions. While this analysis yielded important information about inquiry learning, it did not reveal how participants engaged in these processes. Learning processes do not occur in isolation, and studying their co-occurrence and sequential flow can show how they are related to each other and provide a more comprehensive view of learning (Csanadi et al., 2018; Shaffer, 2017). Therefore, we used ENA and process mining to model the sequential relationships between scientific reasoning and self-regulation processes. Second, we integrated the findings from both methods, which can help to overcome the drawbacks of each individual method. Such an approach is of interest to researchers analyzing learning processes as mediators between the learning opportunities offered and students’ learning outcomes. We illustrate the potential of advanced methods that go beyond the isolated frequencies of single processes and instead account for their interplay with other processes.
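
To make this distinction concrete, the following minimal Python sketch contrasts isolated frequency counts with co-occurrence counts accumulated over a moving window of coded utterances. The codes, the example sequence, and the window size are purely illustrative, and the actual ENA procedure (e.g., as implemented in the rENA package) additionally normalizes the resulting adjacency vectors and projects them into a low-dimensional space, which is not shown here.

```python
from collections import Counter
from itertools import combinations

# Illustrative coded utterance sequence from one hypothetical student protocol.
sequence = ["planning", "hypothesis_generation", "monitoring",
            "evidence_generation", "monitoring", "evidence_evaluation",
            "drawing_conclusions"]

# Coding-and-counting view: isolated frequencies, no relational information.
frequencies = Counter(sequence)

def windowed_cooccurrence(codes, window=4):
    """Count how often two codes appear together within a moving window,
    a simplified version of the accumulation idea underlying ENA."""
    counts = Counter()
    for i in range(len(codes)):
        window_codes = set(codes[max(0, i - window + 1):i + 1])
        for pair in combinations(sorted(window_codes), 2):
            counts[pair] += 1
    return counts

print(frequencies)
print(windowed_cooccurrence(sequence))
```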

ENA statistically compares the position of the centroids and the strength of co-occurrences between conditions, whereas such comparisons are difficult with process mining (Bolt et al., 2017). ENA found no differences between the two video modeling conditions; therefore, we compared the strength of connections between processes in the video modeling conditions to that in the control condition. The two video modeling conditions differed significantly from the control condition in the training task, and these differences were reflected in the epistemic networks of the two groups—scientific reasoning processes co-occurred more frequently with self-regulation processes in the video modeling conditions than in the control condition.

ENA revealed that planning co-occurred with evidence generation more frequently in the experimental conditions, whereas process mining showed that planning most frequently occurred prior to evidence generation, a directional relationship that could not be determined using ENA. Planning before generating evidence during inquiry learning is beneficial because it helps students to first consider which experiment they want to conduct and which variables they want to manipulate, rather than manipulating variables at random to test their hypothesis. The importance of planning during evidence generation was explained in the video modeling examples; showing that participants applied this principle during their inquiry could explain the effectiveness of the video modeling examples. Furthermore, ENA showed stronger relationships between drawing conclusions and evidence generation in the experimental conditions. Process mining found the same reciprocal relationship and showed that evidence generation was most frequently followed by drawing conclusions. This relationship corresponds to frameworks of scientific reasoning (e.g., Fischer et al., 2014) and inquiry (Pedaste et al., 2015). In conclusion, ENA provides information about the strength of the co-occurrence between different processes, whereas process mining provides information about their direction.
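
A correspondingly minimal sketch of the directional information that process mining adds is given below. The coded sequence is again hypothetical, and dedicated process-mining algorithms derive dependency measures and filter infrequent paths rather than relying on raw counts alone; the sketch only illustrates why ordered counts preserve direction while symmetric co-occurrence counts do not.

```python
from collections import Counter

# Illustrative coded sequence from one hypothetical student protocol.
sequence = ["planning", "evidence_generation", "drawing_conclusions",
            "planning", "evidence_generation", "evidence_evaluation",
            "drawing_conclusions"]

# Directly-follows counts: ordered pairs of consecutive codes, so
# "planning -> evidence_generation" is counted separately from
# "evidence_generation -> planning", preserving direction.
directly_follows = Counter(zip(sequence, sequence[1:]))

for (source, target), count in sorted(directly_follows.items()):
    print(f"{source} -> {target}: {count}")
```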

Process mining identifies self-loops, indicating that the same process is performed several times in a row, whereas ENA does not consider loops. The process models identified monitoring self-loops but no strong dependencies between monitoring and other processes. In ENA, monitoring was connected to several scientific reasoning activities in the video modeling conditions. In the control condition, monitoring showed no self-loops and was disconnected from other activities in the process models; likewise, no relationships between monitoring and other processes were observed in the ENA. In conclusion, our findings show that a combination of both methods provides a more comprehensive analysis of the interaction between scientific reasoning and self-regulation processes. Integrating the findings of both approaches can compensate for the drawbacks of each method and provide information about global differences between the groups (ENA), the strength of co-occurrence between specific processes (ENA), the direction of sequential relationships between processes (process mining), and self-loops of individual processes (process mining).
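
Self-loops can be illustrated in the same hypothetical fashion by collapsing a coded sequence into runs of identical consecutive codes; runs longer than one correspond to the self-loops that the process models report, whereas a co-occurrence representation such as ENA discards them.

```python
from itertools import groupby

# Illustrative coded sequence containing repeated monitoring episodes.
sequence = ["monitoring", "monitoring", "evidence_generation",
            "monitoring", "monitoring", "monitoring", "drawing_conclusions"]

# Collapse the sequence into runs of identical consecutive codes;
# runs longer than one are self-loops (the same process repeated in a row).
runs = [(code, len(list(group))) for code, group in groupby(sequence)]
self_loops = [(code, length) for code, length in runs if length > 1]

print(runs)        # [('monitoring', 2), ('evidence_generation', 1), ...]
print(self_loops)  # [('monitoring', 2), ('monitoring', 3)]
```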

Educational Implications

Based on our findings, we can recommend the use of video modeling examples to teach scientific reasoning. Students who watched video modeling examples generated higher-quality hypotheses in a training and a transfer task, and higher-quality arguments in the training task, compared with students who engaged in unguided inquiry learning. Furthermore, video modeling examples enhanced self-regulation during scientific reasoning activities: students who watched video modeling examples planned and monitored their scientific reasoning processes more frequently than students in the control group. These benefits were found despite the rather short instruction time of the video modeling examples (10 min), attesting to their efficiency. Our video modeling examples not only demonstrated how to perform unconfounded experiments but also provided narrative explanations of the model’s thought processes; thereby, self-regulation instruction was integrated within scientific reasoning instruction. Teachers can easily create video modeling examples and use them in science education. Indeed, teachers are increasingly using instructional videos on online platforms such as YouTube or Khan Academy. However, such instructional videos typically focus on teaching content, whereas we provided evidence that video modeling examples can also be used to convey specific learning strategies like scientific reasoning.

Limitations and Future Directions

Several limitations to our findings should be considered. First, since the scientific reasoning pre- and posttest items were identical and participants simply repeated their answers from pre- to posttest, we could not reliably assess the effectiveness of the intervention on scientific reasoning ability. Therefore, the effects of video modeling examples and metacognitive prompts on scientific reasoning ability should be investigated in future research. Nevertheless, the present study showed that video modeling examples improved scientific reasoning products and processes, which one might argue are more meaningful measures than declarative scientific reasoning knowledge. Second, due to technical difficulties with the audio recording, the process data analyses were conducted with a smaller sample. A replication of the process analyses with a larger dataset would be beneficial for confirming the effectiveness of video modeling examples at the process level.

Third, the intervention effects were only partly visible in the transfer task, in which only hypothesis quality was improved. This could indicate that the benefits of video modeling examples are not robust enough to yield transferable knowledge. Alternatively, conceptual knowledge for the transfer task was lower than for the training task; therefore, applying the learned processes to a more complex task could have been challenging for learners. Last, only hypothesis generation was explicitly modeled in the videos. Future research should test these different explanations for the lack of transfer regarding argumentation quality, for example, with a longer delay between the training and the transfer tasks.

Furthermore, no benefits of the metacognitive prompts were observed at the product or process level. One explanation is that participants in the video modeling conditions were already self-regulating sufficiently in the training task, as indicated by the process data. This finding suggests that the video modeling examples were sufficient for fostering self-regulation in the training task, so providing the metacognitive prompts in the training task might have been unnecessary. One direction for future research would be to provide the metacognitive prompts in the transfer task to further support self-regulation after the intervention has taken place. Second, prior research on supporting monitoring and control has suggested that active generative tasks during (Mazzoni & Nelson, 1995) or after learning (Schleinschok et al., 2017; van Loon et al., 2014) can improve monitoring accuracy. Furthermore, providing metacognitive judgements was only effective for learning after retrieval (Ariel et al., 2021). Therefore, having students generate a written response to the prompts or retrieve information from the video modeling examples before responding to them might increase the prompts’ effectiveness.

Conclusions

The present study provided evidence on the effectiveness of video modeling examples for improving scientific reasoning processes and products. Watching video modeling examples improved hypothesis and argumentation quality in the training task and hypothesis quality in the transfer task. Students who watched video modeling examples also self-regulated more during scientific reasoning activities, as indicated by the process analyses. Thus, an integrated instruction of self-regulation and scientific reasoning resulted in a conjoint use of these processes. The present study thereby provided evidence for the effectiveness of video modeling examples at both the product and process level, using fine-grained analyses of the co-occurrence and sequence of scientific reasoning and self-regulation processes. Our findings are also applicable outside the context of science education, as video modeling examples can be used to support other task-specific and self-regulatory processes in an integrated manner. Likewise, our methodological approach, combining process and product measures, can be applied to other questions in educational psychology.