Introduction

Inquiry learning environments in science invite students to engage actively in a variety of processes that relate to knowledge development, such as orienting themselves, formulating hypotheses, experimenting, and drawing conclusions (de Jong 2006). Empirical studies invariably reveal that, without scaffolding, students fail to engage adequately in exploratory activities, test variables and conditions randomly and incompletely, and rarely restudy assignments that have been completed incorrectly (e.g., de Jong and van Joolingen 1998; Hagemans et al. 2013; Mayer 2004). Therefore, it is not surprising that pure inquiry learning has been found to be less conducive to learning than guided-inquiry learning or direct instruction (e.g., Alfieri et al. 2011; Eysink et al. 2009; Kirschner et al. 2006; Scalise et al. 2011).

Considerable effort has been spent on creating effective support in inquiry learning environments in science. A large majority of these scaffolds address cognitive and meta-cognitive processes (Hagemans et al. 2013). Attention to motivational scaffolding is scarce. This is somewhat surprising given that motivation, “the process whereby goal-directed activity is instigated and sustained” (Pintrich and Schunk 2002, p. 5), can be considered a precondition of successful inquiry learning. Learning is at risk when students exert little effort because they lack confidence in their ability to complete inquiry activities successfully and fail to perceive the relevance of task engagement (Wang et al. 2008). This study investigates whether student motivation in a science inquiry learning environment can be enhanced via motivational support from an animated pedagogical agent (APA). In addition, we investigate whether the APA influences learning.

A long-standing concern in science education is its limited attractiveness to female students; girls have been found to be under-represented in science classrooms (Ceci et al. 2009). Two factors that contribute to this gender difference are task relevance and self-efficacy for science, for which girls tend to have lower appraisals than boys (Ackerman et al. 2013; Nagy et al. 2010; Wigfield and Eccles 2000; Yeung et al. 2010). These findings suggest that the presence of a motivational APA in a science inquiry learning environment could be especially helpful for female students.

Animated pedagogical agents and student motivation

APAs are software agents that guide users through virtual (computer-based) environments. They are commonly presented as an image and voice. These images come in many different guises, including humans (e.g., AutoTutor—Graesser and McNamara 2010), animals (e.g., Herman the Bug—Moreno et al. 2001), and inanimate objects (e.g., Microsoft’s clippy—Haake 2009). Guidance in the form of an APA is expected to have several benefits.

Besides being potentially beneficial for learning, APAs can decrease anxiety and direct the student’s attention to key elements (Clark and Choi 2005; Gulz 2005). APAs can also humanize the user experience in virtual environments. Indeed, some of the first APAs were introduced into these environments specifically for the purpose of making the users’ interactions with the system more life-like (e.g., André et al. 1996; Bates 1994; Cassell 2000; Lester et al. 1997; Paiva and Machado 1998; Picard and Klein 2002). Finally, APAs can also prime a social interaction schema that can positively influence student motivation (e.g., André et al. 1999; Atkinson 2002; Choi and Clark 2006; Domagk 2010; Frechette and Moreno 2010; Moreno et al. 2000; Moreno et al. 2001; Moreno et al. 2010; Moundridou and Virvou 2002; Plant et al. 2009).

A subset of empirical studies on APAs have included a design type that is of particular interest for education, namely, a voice-only variant (e.g., André et al. 1999; Atkinson 2002; Choi and Clark 2006; Dunsworth and Atkinson 2007; Lusk and Atkinson 2007; Moreno et al. 2000). The main argument for trying such a scaled-down version of an APA is that the agent’s voice might be enough to personalize the students’ experience with the system, and implementing an embodied APA has added costs in terms of time and money. However, the outcomes of voice-only studies have been mixed.

A large set of design features for image and voice APAs has been examined. Many of these belong to what Moreno (2005) calls the agent’s social features, or external properties. They consist of the aspects of the visual and auditory presence of the APA that are hypothesized to “make the learning experience more interesting, believable, or natural” (p. 508). Empirical studies of these properties have explored the influence of the APA’s age, gender, ethnicity, politeness, responsiveness, dynamism, and visual appeal, among others (e.g., André et al. 1996; Arroyo et al. 2011, 2009; Baylor 2011; Baylor and Kim 2004; Baylor and Ryu 2003; Baylor et al. 2004; Craig et al. 2002; Domagk 2010; Kim et al. 2007; Lester et al. 1997; Moreno 2004; Moreno et al. 2000; Plant et al. 2009; Wang et al. 2008; Woolf et al. 2010). Given our special interest in gender and science education, we summarize below the empirical studies that have investigated gender and motivation in relation to APAs in virtual science learning environments.

The influence of external APA properties on female students’ motivation for science was studied by Rosenberg-Kima et al. (2008) in two consecutive experiments. Study 1 investigated the hypothesis that an APA can be a more persuasive social model when the agent is embodied and not merely a voice. The study further tested Bandura’s (1997) claim that model-target similarity increases the impact of the model, by systematically varying the APA’s gender. Female undergraduate students received a 20-min presentation on female engineers by a male or female APA with or without visual presence. The findings bore out the prediction that the students would be more motivated by the embodied APA. The science interest and self-efficacy beliefs of female students who had both seen and heard the APA increased more than for those who had only heard the voice. However, the predicted advantage of the female over the male APA was not found.

Study 2 further investigated the model-target similarity claim, with the APA made more peer-like or expert-like in different ways. The APAs’ gender, age and “coolness” (i.e., type of clothing and hairstyle) were systematically varied. Age and coolness, but not gender, were found to positively affect self-efficacy and interest. The young and cool-looking APA was significantly more effective for self-efficacy than the old and uncool APA. Only a trend in that direction was found for interest. The agent’s gender affected only the students’ stereotype beliefs. The female APA helped counter the students’ traditional science stereotypes significantly more than did the male APA.

These findings were partially replicated in a study involving female undergraduate students of different races (i.e., black or white) and APAs that systematically varied in race and gender (Rosenberg-Kima et al. 2010). That is, the most persuasive APA for improving the self-efficacy and science interest of female black students was a female black agent. For female white students, interest was positively affected by the white APA, and a marginally significant stereotype-reducing effect of the female APA was found.

Plant et al. (2009) investigated the influence of an APA and its gender (male, female, or no agent) on the science motivation, beliefs and performance of male and female middle school students who received a 20-min presentation on female engineers. A main effect of APA gender was found for stereotyping. Students, especially boys, significantly weakened their traditional gender stereotypes about science with the female APA. A significant main effect of the agent’s presence on self-efficacy was also found. Finally, the female APA had a significantly higher effect on science interest than did the male APA. These findings led the authors to conclude that the female APA was more effective on most measures, and that both males and females benefitted equally from her presence.

External properties are obviously important for the design of an APA. However, according to Moreno (2005) it is just as important, if not more so, to consider as well the internal properties, the actions or instructional methods, of an APA. These properties are primarily communicated through the APA messages, which for a motivational APA should focus on facilitating or enhancing student motivation.

The design and effectiveness of the internal properties of such a motivational APA have rarely been studied. The research literature offers only a handful of empirical studies that have manipulated APA messages to influence student motivation. None of these studies concerned science education. Some empirical studies have, however, investigated gender and motivation in relation to APAs in virtual math learning environments. We describe these studies below because they represent the closest relevant research in this area: math motivation raises gender issues similar to those in science (e.g., Else-Quest et al. 2013; Ferry et al. 2000).

Arroyo et al. (2009) investigated the influence of motivational APAs in two consecutive studies. Participants in study 1 were high school students enrolled in math classes. The instructional material consisted of an adaptive tutoring system for math. There were three conditions: no APA, male, or female APA. The embodied APA took on the role of learning companion. He or she commented on the student’s answer to a math problem. These messages were based on Dweck’s (2007) recommendations about disregarding success and valuing effort. For instance, if a student gave a correct answer to a problem in which he or she had invested little effort, the APA would give a message that diminished the student’s feelings of ability and stimulated testing of the student’s boundaries (i.e., “That was good, however I prefer harder questions.”). And if a student gave a correct answer after exerting high effort, the APA would be very complimentary (i.e., “Hey, congratulations! Your effort paid off, you got it right.”). The experiment yielded no significant effects of condition on task relevance, self-efficacy (the authors use the term self-concept), and learning. The participants in study 2 were female undergraduate students taking a math class. Students were randomly assigned to the math tutoring system with a male or female APA who gave similar motivational messages as in study 1. The findings showed that there was a significant effect of condition on motivation and learning outcomes. The female students in the study had higher appraisals for self-efficacy and higher learning outcomes when they had been working with a male APA.

In a later study (Arroyo et al. 2011), the APA’s messages were based on Weiner’s (1979) attribution theory. ‘Attribution training’ and ‘effort-affirmation’ interventions were now linked with the students’ problem-solving stages. Attribution training messages tried to address the students’ beliefs about success or failure. Effort-affirmation messages acknowledged, and sometimes praised, correct solutions. Participants in this study were high school students. The study included conditions with a male APA, a female APA or no agent for the math tutorial. Significant effects for both the presence and gender of the agent were found. The APA significantly increased math interest during training. In addition, the APA increased self-efficacy and reduced anxiety for the female participants, but not the males. Moreover, the female APA yielded significantly higher gains (post–pre) on students’ perceptions of task relevance and self-efficacy than did the male APA. No effect of condition on learning outcomes was found.

All in all, empirical research on the external properties of APAs shows that students’ science motivation can be positively affected by the embodied presence (i.e., image and voice) of an agent. In addition, these studies suggest that the impact of the APA on female students’ motivation can be maximized by presenting the agent as female, young, attractive and cool (see also Baylor 2011). The study by Plant et al. (2009) suggests that such a design “may benefit both genders equally” (p. 214). Empirical research on the internal properties of APAs in the related domain of mathematics has also revealed benefits from the presence of a motivational agent. However, these studies provide equivocal support for employing a female agent. Whereas an earlier study by Arroyo et al. (2009) favored the presence of a male APA for female students, a later study (2011) favored the female agent for a mixed population for enhancing student motivation.

Experimental design and research questions

The primary aim of this study is to find out whether student motivation in a science inquiry learning environment can be enhanced with a motivational APA. In addition, we look at the influence of such an APA on learning. Three conditions are compared: control (no image and no voice), voice (no image), and agent (image and voice). In the control condition, students work with the basic version of the inquiry learning environment. In the voice condition, the agent’s voice is added to this environment. In the agent condition, students see and hear the APA in the environment. Special attention is given to the issue of gender because research on science learning repeatedly reports lower motivation for girls (e.g., Catsambis 1995; Lau and Roeser 2002; Mattern and Schau 2002; Osborne et al. 2003; Yeung et al. 2010).

The APA’s motivational focus was based upon the expectancy-value model of achievement motivation (Eccles and Wigfield 2002). According to this model, task values and expectancies for success are important predictors of behavior. Task values have to do with incentives or reasons for task engagement. A commonly used term for the concept that refers to these values is task relevance, which can be defined as a person’s valuation, interest, and commitment to achieving a particular goal (Pintrich and Schunk 2002). Expectancies for success are beliefs that affect goal setting, activity choice, and willingness to expend effort and persistence. An important construct for these expectancies is self-efficacy, which refers to the student’s belief in his or her own capacity to succeed at specific tasks (Bandura 1997).

Keller’s (2010) attention, relevance, confidence, and satisfaction (ARCS) model was used to design the agent’s messages. More specifically, the APA’s messages were based on the design guidelines for the relevance and confidence components of the ARCS model, which address perceptions of task relevance and self-efficacy. Several (non-agent) studies have reported significant effects of ARCS strategies on these motivational constructs (e.g., Feng and Tuan 2005; Huett et al. 2008; Keller and Suzuki 2004; Loorbach et al. 2007, 2006; Newby 1991; Song and Keller 2001). The APA’s messages were designed to inform students about task relevance by addressing goal orientation, motive matching and familiarity. In addition, self-efficacy was addressed through messages on learning requirements, opportunities for success and personal control (see “Methods” section). Students in both the agent and voice conditions received the same audible messages, but in the voice condition students did not see the accompanying image of the APA speaking the messages.

Bandura (1997) indicates that model-target similarity increases the impact of the model. This suggests that it is preferable to use a female agent because female students are more likely to need motivational support for science learning. For this reason, and also because most of the reviewed studies favored the female agent, we presented a female APA in the science inquiry learning environment.

The research questions of the study are as follows:

  • Research question 1: Does time, condition or gender affect motivation?

Students completed a motivation questionnaire in which they appraised task relevance and self-efficacy before, during and after training. The following hypotheses based on the studies reviewed were tested:

Hypothesis 1

There will be a positive overall change in motivation over time.

Hypothesis 2

The strongest increase in motivation will be found in the agent condition, followed by the voice condition, and, finally, the control condition.

Hypothesis 3

There will be a greater increase in girls’ motivation than boys’ motivation in all conditions.

  • Research question 2: Does gender affect appraisal of the agent as a model?

After training was completed, students in the agent condition, who both saw the APA and heard its messages, rated the quality of the agent as a model. Because the APA was designed to be especially appealing to girls, the following hypothesis was tested:

Hypothesis 4

Girls will give higher ratings for the agent as a model than boys.

  • Research question 3: Does time, condition or gender affect knowledge development?

Before and after training students completed a knowledge test, so that learning gains could be assessed. Earlier research has already shown that the learning environment used in this study can itself yield significant learning gains (Hagemans et al. 2013). Based on the assumption that motivation influences learning, the following hypotheses were tested:

Hypothesis 5

There will be an overall learning gain over time.

Hypothesis 6

The strongest learning gain will be found in the agent condition, followed by the voice condition and, finally, the control condition.

Hypothesis 7

There will be a greater learning gain for girls than boys in all conditions.

Methods

Participants

Participants were 61 students (mean age 14.7 years; range 13–16) from four third-year classrooms in a secondary school in the Netherlands. Students were randomly assigned to conditions. Stratification was used in order to have similar distributions for classroom and gender within each condition. There were 10 boys and 11 girls in the agent condition, and 10 boys and 10 girls in both the voice and the control conditions. The inquiry learning environment dealt with kinematics, a required component in the science curriculum for the participants. When the study began kinematics had already been introduced, but the particular topic addressed by the inquiry learning environment, uniformly accelerated motion, had not yet been covered.
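The stratified random assignment described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `stratified_assign`, the round-robin allocation within each classroom-by-gender stratum, and the fixed seed are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

CONDITIONS = ["control", "voice", "agent"]


def stratified_assign(students, seed=0):
    """Assign students to conditions, balancing within each stratum.

    `students` is a list of (student_id, classroom, gender) tuples.
    Returns a dict mapping student_id -> condition.
    This is a hypothetical helper, not the procedure used in the study.
    """
    rng = random.Random(seed)

    # Group students by stratum (classroom and gender).
    strata = defaultdict(list)
    for student_id, classroom, gender in students:
        strata[(classroom, gender)].append(student_id)

    assignment = {}
    offset = 0  # rotates the starting condition so leftovers spread evenly
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)  # random order within the stratum
        for i, student_id in enumerate(members):
            assignment[student_id] = CONDITIONS[(offset + i) % len(CONDITIONS)]
        offset += len(members)
    return assignment


# Hypothetical roster: 61 students spread over 4 classrooms.
roster = [(i, i % 4, "girl" if (i // 4) % 2 else "boy") for i in range(61)]
groups = stratified_assign(roster, seed=1)
print({c: list(groups.values()).count(c) for c in CONDITIONS})
```

Dealing shuffled students out in rotation keeps group sizes within one of each other, both overall and inside every stratum, which is the property the stratification is meant to secure.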

Materials

Learning environment

The participants worked with an inquiry learning environment that was created in SimQuest (de Jong et al. 2005). This environment, called motion, covers three topics from the physics domain of kinematics, namely ‘displacement and time’, ‘speed and velocity’, and ‘acceleration’. These topics are addressed in nine to ten assignments each, with a total of 29 assignments. All information is presented in Dutch.

When students select an assignment in the motion environment, a simulation interface opens up (see Fig. 1). Students can begin by reading the assignment description (B1). In the simulation (part A), students can manipulate input values of variables (A2). Pressing start (A3) displays the effect of their manipulation in real time in the graphs (A1), the output display (A4), and the animation of the car (A5). At any time, a student can stop experimenting with the simulation, select an answer (B2), and press the answer button (B4). After an answer is selected, a text box opens that gives feedback on its correctness. For incorrect answers the feedback also includes an explanation or hint.

Fig. 1 Interface for the motion environment; control and voice conditions

Agent condition

In the agent condition, the inquiry learning environment displayed a female APA called Emma (see Fig. 2). Emma was created by linking SimQuest with Elckerlyc (van Welbergen et al. 2010) and Loquendo (http://www.loquendo.com). Emma’s voice used the standard female Dutch voice of the Loquendo program. Only Emma’s face and a small part of her upper body were visible. This is in line with the recommendation to crop an image of a person just below the shoulders to create a pleasing picture (Agrawala et al. 2011). Emma had the role of a fellow student, rather than a tutor or teacher. This choice follows Bandura’s (1997) suggestion of model-target similarity and the finding that (non-agent) peer students positively influence motivation and learning (Griffin and Griffin 1998, in Kim and Baylor 2006). Emma’s visual appearance was modeled in such a way that she would be a believable peer learner, being of similar age to the participants and looking ‘cool’. In Emma’s design we were restricted by the available Elckerlyc models, however. Emma’s body movements and facial expressions were fully programmed in the Elckerlyc software. Although it was technically possible to dynamically change Emma’s behavior based on input from the students, we chose to ‘hard code’ her behavior to make sure Emma’s reactions were the same for all participants. Emma’s facial expressions (e.g., neutral, happy, or sad) and head movements were attuned to her messages. Her expressions, eye-blinks, and movements were neutral between messages. Emma automatically delivered an audible message 2 s after the student had opened an assignment, and 1 s after an assignment had been answered. Messages at each point could equally address task relevance and self-efficacy.

Fig. 2 Interface for the motion environment; agent condition

The APA’s messages were based on Keller’s (2010) ARCS model, which describes a wide set of strategies for increasing perceptions of task relevance and self-efficacy beliefs. Keller indicates that task relevance can be affected by dealing with three concepts and their associated process questions. For goal orientation the central question is “How can the APA best meet the student’s needs?” Motive matching strategies revolve around the question “How can the APA link her messages to the student’s needs and personal interests?” For familiarity the leading question is “How can the APA tie the instructions to the students’ experiences?” The suggested strategies for all three task relevance processes were adopted in designing the motivational APA.

For self-efficacy the core issue is how the messages can boost students’ feelings that they have the ability to succeed. Keller’s three self-efficacy related concepts are: learning requirements, opportunities for success, and personal control. For learning requirements the central question is “How can the APA assist in building positive expectations for success?” Opportunities for success revolve around the question “How can the APA enhance the students’ beliefs in their competence?” For personal control the leading question is “How can the APA convince the students that their success is based upon their efforts and abilities?” Just as for task relevance, all suggested strategies for these processes were adopted in designing the motivational APA.

Two pilot studies were conducted prior to the experiment, in order to find out (1) whether motivational support was needed in the motion environment, and (2) whether the agent’s messages could increase students’ perceptions of task relevance and self-efficacy. The first pilot indicated that students probably would benefit from motivational support, as there were low initial appraisals for task relevance of the science topics, and students, especially girls, frequently expressed insecurity while working with the motion environment (e.g., “I don’t know how to do that”, and “I don’t think I can do that”). The motivational messages were subsequently tested in a second pilot. This pilot indicated that the messages improved students’ appraisals of task relevance and self-efficacy.

The embodied agent visually displayed a broad range of emotions and feelings. The agent’s messages were accordingly extended with words or sentences that matched these moods (compare Baylor and Kim 2004; Clore and Palmer 2009; Dehn and Van Mulken 2000). Examples of motivational sentences include “very annoying” and “I am curious”. Motivational words such as “boring”, “cool”, “love to”, “handy”, “odd”, and “cute” were loosely based on a validated list of 500 words from a motivational lexicon (Ortony et al. 1987).

An example of one of Emma’s comments concerning the assignment is: “Oh, this one looks difficult, let’s take some time to look at it”. Emma’s reaction to an answer given by the student could be directed to the (in)correctness of the answer (e.g., “That’s right!” or “Oh, that’s not correct; well, no worries, it’s just the first assignment”) or could be a reaction to the textual feedback given by SimQuest (e.g., “Oh, that’s useful to know” or “let’s read the feedback”).

Voice condition

This condition was the same as the agent condition, with the exception of the visibility of the agent. In the voice condition, only Emma’s audio messages were presented. There was no embodied agent.

Control condition

In this condition the APA’s image and motivational messages were absent. Students neither saw nor heard the agent. The control condition consisted of the basic version of the inquiry learning environment, which has feedback on correctness and hints provided, but no explicit motivational scaffolding.

Questionnaires and tests

A paper-based task relevance and self-efficacy (TRSE) questionnaire measured task relevance and self-efficacy before, during and after the learning task. The TRSE consisted of an example assignment similar in style to the assignments in the inquiry learning environment, followed by the questions “How relevant are these assignments in your opinion?” (task relevance) and “How well do you think you will do on these assignments?” (self-efficacy). Answers on the task relevance scale ran from “not relevant” at one end of the scale to “extremely relevant” at the other. For self-efficacy the anchors were “extremely poorly” and “extremely well”. Students answered each question by selecting a point on a 10-point Likert scale. To ensure that students were in a good position to appraise their initial level of motivation, they were asked for their ratings for initial task relevance and self-efficacy (TR before and SE before) only after having worked on the introductory assignment for the first topic in the inquiry learning environment. Likewise, the two administrations of the TRSE during training occurred after the introductory assignment for the second (TR during-1 and SE during-1) and third topics (TR during-2 and SE during-2). Immediately after completing training, the last TRSE (TR after and SE after) was completed.

A paper-based agent questionnaire (AgentQ) asked the students to appraise the agent. The questionnaire consisted of 11 questions regarding the qualities of the APA’s comments as a model for the students (e.g., “Emma said what I also thought.” and “I felt just like Emma did.”). Answers to these questions ran from “do not agree” at one end of the scale to “agree” at the other. Students answered each question by selecting a point on a 10-point Likert scale. The AgentQ was administered only in the agent condition. Reliability was good, with a Cronbach α of 0.89.

A paper-and-pencil pre-test and post-test were used to assess the students’ knowledge development. Both tests contained 27 multiple-choice items with four answer alternatives. Test items in the pre-test and post-test measured the same underlying constructs that were covered in the three topics of the inquiry learning environment. Only their textual presentation varied, as did the order of presentation. A score of one point was awarded for each correct answer on the test. An incorrect answer was awarded zero points. The maximum score for each test was 27 points. Reliability of the post-test was satisfactory, with a Cronbach α of 0.66.
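The scoring rule and the reliability coefficient mentioned above can be sketched in Python. This is an illustration under stated assumptions, not the authors' analysis script: the helper names `score_items` and `cronbach_alpha` and the toy answer data are hypothetical, and the sample variance (n − 1 denominator) is one common convention.

```python
def score_items(answers, key):
    """Award one point per correct answer and zero otherwise."""
    return [1 if a == k else 0 for a, k in zip(answers, key)]


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student item-score lists."""
    n_items = len(scores[0])

    def variance(values):
        # Sample variance with an (n - 1) denominator.
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    # Variance of each item across students, and of the total scores.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answer key and responses for a 4-item test.
key = "ABCD"
responses = ["ABCD", "ABCA", "BBCA", "BCDA", "ABDD"]
item_scores = [score_items(r, key) for r in responses]
print([sum(row) for row in item_scores])  # total score per student
print(cronbach_alpha(item_scores))
```

For items scored 0 or 1, as here, this coefficient coincides with the Kuder-Richardson 20 formula.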

Procedure

About 1 week before the training, students took the knowledge pre-test, for which a maximum time of 20 min was given. During training, a maximum of three students at a time worked individually on computers with the inquiry learning environment. Students first received a 5-min introduction, in which they were told what the training session would involve and how to use the inquiry learning environment. Students in the control condition received 35 min for their training (pilot studies had revealed that this afforded students ample time for engaging actively with all 29 assignments in the motion environment). Students in the voice and agent conditions received an additional 5 min to compensate for the time spent listening to the APA. Students in these conditions received Emma’s messages via headphones. The students completed the TRSE once near the beginning of the training, and twice during training. Directly after training was completed, students again completed the TRSE. In addition, students in the agent condition responded to the AgentQ. Finally, students took the post-test, for which the maximum completion time was 20 min.

Data analyses

Repeated measures ANOVAs with condition and gender as fixed factors were conducted to assess changes in motivation and knowledge over time. Two girls in the control condition and one girl in the voice condition did not complete the TRSE during-2 measure, so these participants had missing values for TR during-2 and SE during-2. The ANOVA for agent ratings included only gender as a fixed factor. In all analyses, the significance level was set at an α of 0.05 (two-tailed). Trends (0.10 > p > 0.05) are reported for findings that were in the predicted direction. Cohen’s (1988) d-statistic is reported for effect size. Effect sizes tend to be qualified as small for d = 0.2, medium for d = 0.5 and large for d = 0.8.

Results

Does time, condition, or gender affect motivation?

Table 1 shows that the scores for task relevance nearly all hovered slightly above the scale midpoint at all four measurement points. Repeated measures ANOVAs with condition and gender as fixed factors revealed only a trend for time for the first interval (before to TR during-1), F(1,55) = 3.12, p = 0.08, d = 0.14. There was a slight increase in students’ appraisals of task relevance after they had begun working in the inquiry learning environment. No effects of condition or gender on task relevance were found at any point in time.

Table 1 Means (standard deviations) for task relevance changes over time for boys and girls by condition

The scores for self-efficacy are shown in Table 2. Repeated measures ANOVAs with condition and gender as fixed factors revealed a significant effect for time, F(3,53) = 13.18, p < 0.01. Detailed analyses revealed a significant increase in self-efficacy over the first time interval (before to SE during-1), F(1,55) = 29.71, p < 0.01, d = 0.51. This level remained stable during and after training. Compared to their initial appraisals, the scores for self-efficacy after training were significantly higher (before to after), F(1,55) = 29.47, p < 0.01, d = 0.51.

Table 2 Means (standard deviations) for self-efficacy over time for boys and girls by condition

A significant difference in self-efficacy for gender favoring boys was found at each of the four measurement points. This gender difference was evident at the outset of the study (before), F(1,55) = 26.3, p < 0.01, d = 1.29, and was also found at the first measurement point during training (SE during-1), F(1,55) = 23.27, p < 0.01, d = 1.26. At the next measurement point (SE during-2), the gender difference was reduced, but still significant, F(1,52) = 8.05, p < 0.01, d = 0.75. At the final measurement point (after), the gender difference was still present, F(1,55) = 10.17, p < 0.01, d = 0.78. No main effects of condition were found at any point in time.

In addition, for self-efficacy there was a statistically significant interaction between time, condition, and gender, F(6,108) = 2.20, p = 0.048. Detailed analyses revealed that this interaction took place during training (SE during-1 to SE during-2), F(2,55) = 4.45, p = 0.016. The interaction is illustrated in Figs. 3 and 4. Figure 3 shows that the self-efficacy of boys decreased during training in the experimental conditions, but rose in the control condition. For girls, the pattern was the opposite. Figure 4 shows that girls’ self-efficacy rose in the experimental conditions across these measurement points and dropped in the control condition.

Fig. 3 Self-efficacy appraisals of boys by condition over time

Fig. 4 Self-efficacy appraisals of girls by condition over time

All in all, the findings indicate that when students began to work with the inquiry learning environment, their appraisals of self-efficacy rose significantly. They maintained that higher level during and after training. The significant interaction between time, condition, and gender during training is in line with the prediction of a differential effect of treatment for girls and boys. No significant effects of time, condition, or gender were found for task relevance appraisals.

Does gender affect appraisal of the agent as a model?

The quality of the agent as a model was rated overall at about the scale midpoint (M = 4.79, SD = 1.84). An ANOVA with gender as a fixed factor and model ratings for the agent as dependent variable showed a significant effect for gender, F(1,20) = 4.87, p = 0.04, d = 0.95. Girls rated the agent as a better model than boys (see Table 3). Girls’ ratings of the agent were about half a standard deviation above the scale midpoint (M = 5.57, SD = 1.23). For boys the mean was about half a standard deviation below that point (M = 3.94, SD = 2.08).
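The reported effect size for this gender difference can be checked from the means and standard deviations given above. The following is a minimal sketch, not the authors' procedure: it assumes that d is Cohen's d based on the pooled standard deviation and that the two groups were of equal size, neither of which is stated in this section. The function name `cohens_d` and the group sizes in the usage note are illustrative assumptions.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b, n_a=None, n_b=None):
    """Standardized mean difference using the pooled standard deviation.

    If group sizes are omitted, equal group sizes are ASSUMED and the
    pooled variance reduces to the simple average of the two variances.
    """
    if n_a is None or n_b is None:
        pooled_var = (sd_a ** 2 + sd_b ** 2) / 2
    else:
        pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Agent-as-model ratings as reported in the text: girls vs. boys.
d_agent = cohens_d(5.57, 1.23, 3.94, 2.08)
print(round(d_agent, 2))  # 0.95
```

Under these assumptions the result agrees with the reported d = 0.95. If the actual group sizes were unequal, they could be passed as `n_a` and `n_b`, and the value would shift slightly.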

Table 3 Means (standard deviation) for agent appraisal for boys and girls

Does time, condition, or gender affect learning?

Table 4 shows the pretest and posttest knowledge scores. An ANOVA with condition and gender as fixed factors and the test scores as dependent variable showed a significant effect of time, F(1,55) = 60.23, p < 0.01, d = 1.10. There was no main effect of condition or gender, and there was also no interaction effect. In all conditions, boys and girls made significant progress.

Table 4 Means (standard deviation) for pre-test scores, post-test scores and knowledge gain for boys and girls by condition

Discussion and conclusion

At the outset, the issue was raised whether motivational support could be a meaningful supplementary scaffold to the motion environment, serving to increase student motivation and, in its wake, learning. In addition, we directed this support especially to female students because they tend to be under-represented in science classrooms, perhaps as a result of their lower appraisals for task relevance and self-efficacy for science (Ackerman et al. 2013; Ceci et al. 2009; Yeung et al. 2010). This question now leads us back to the rationale for the development of science inquiry learning environments.

According to Edelson et al. (1999) one reason for developing such environments was that the students’ tasks needed to be made more meaningful. Inquiry learning environments could make abstract concepts more concrete, and by providing a valuable context, could position students better to acquire, clarify, and apply an understanding of scientific concepts. The other reason was that they should afford students more opportunities for self-directed learning. More specifically, they should enable students to engage in question-driven activities to personally experience and appreciate scientific inquiry.

The greater part of research on science inquiry learning environments has focused on their contributions to cognition and learning. Relatively little attention has gone to their impact on motivation. This is surprising, because the opportunities for self-regulation that these environments offer are also likely to be affected by, and affect, student motivation.

One might expect the opportunity for self-regulated action afforded by inquiry learning environments to be attractive for students who like setting their own learning goals and initiating and sustaining the actions needed to achieve them. These students may feel tempted to pose their own questions, to engage in solving realistic problems, to conduct experiments, and to reflect on their outcomes. Other students may be daunted by the very same prospect. They may perceive the possibilities for self-regulated action as threatening rather than inviting, and concern themselves with goals of maintaining well-being rather than with trying to achieve learning goals (see Boekaerts and Corno 2005). Students who are uncertain about their domain-specific competencies may be especially vulnerable to this downside of the affordances for self-regulated learning offered in inquiry learning environments.

We thought that the female participants in the experiment would mainly fall in the latter category and, therefore, would benefit more than the males from motivational scaffolding. In line with other research on self-efficacy in science (e.g., Catsambis 1995; Lau and Roeser 2002; Mattern and Schau 2002; Osborne et al. 2003; Yeung et al. 2010), the girls in our study were found to start with significantly lower self-efficacy than the boys. However, the data further showed that the absolute level of these appraisals left room for improvement for both boys and girls. In short, the conditions seemed favorable for an agent to affect student motivation, and perhaps to benefit the girls somewhat more than the boys.

The beneficial effects of the motivational support provided by the APA in the science inquiry learning environment presumably depended in part on the agent’s qualities as a model. In human–computer-interaction studies, ratings of these qualities are often considered a litmus test for the design of an animated person. In educational studies, such ratings mainly revolve around the question whether the APA addresses the students’ feelings and thoughts well enough to be effective. The findings in our study indicated that the girls’ ratings for the agent were above the scale midpoint. This suggests that there was a positive match that could have made the agent an effective motivator. On the other hand, boys rated the agent significantly lower than girls, and their scores were below the scale midpoint. Although the agent questionnaire concentrated on the APA’s internal qualities, the possibility cannot be ruled out that the boys’ ratings also depended on their valuation of the agent’s external properties. This is an issue for future research.

A factor that may have weakened the impact of the APA on girls as well as boys is the use of computer-generated messages. The APA messages consisted of a combination of pre-programmed and generated sentences. Only the former were expressed vocally with emotion; newly constructed sentences sounded more monotonous. This technical limitation of the APA voice may have had a negative effect on the APA appraisals and may have diminished her impact on the students (compare Dehn and Van Mulken 2000).

The data revealed that all students felt substantially more confident about their ability to handle similar assignments after training than before. This finding is important because studies repeatedly indicate that self-efficacy has a significant influence on (later) achievement in science (e.g., Areepattamannil et al. 2011; Bandura 1997, 2012; Britner 2008; Kaya and Rice 2009; Lavonen and Laaksonen 2009). In our study, girls increased their self-appraisals more than boys, but their ratings still remained significantly lower than those of the boys.

The hypothesis that girls would especially benefit from the presence of a motivational APA was supported by the finding of a significant interaction for time, condition and gender for self-efficacy during training. More specifically, this outcome pointed to a differential effect of the experimental manipulation when students moved from the second to the final topic in the inquiry learning environment. It was at this moment that girls increased their self-efficacy appraisals in both the agent and voice conditions and decreased them in the control condition, whereas for boys exactly the opposite pattern was obtained.

However, all in all the experiment indicated that the control condition did not do much worse or better than the experimental conditions on the various measures of motivation recorded in the study. In addition, there was only weak support for the hypothesis that girls would especially benefit from the presence of a motivational APA in the learning environment. Apart from the significant interaction during training, girls in the experimental conditions did not improve their perceptions of task relevance or self-efficacy more than boys.

The motivational APAs in the studies by Arroyo et al. (2011) and Baylor et al. (2004) did not affect learning. In our study, we likewise found no difference in learning gains between conditions. Two linked factors can perhaps explain the absence of the anticipated positive effect of the APA on learning. One factor is the finding that students in the control condition were able to make significant progress on their own. This condition is indicative of the baseline level of change that one should expect the inquiry learning environment to achieve. Although the test scores left room for further improvement, such additional incremental effects may be difficult to come by. Another factor is that motivation affects learning only indirectly. It is a mediator that influences facets of task engagement such as goal setting, task choice, effort, and persistence (e.g., Järvelä et al. 2008; Lau and Roeser 2002; Vollmeyer and Rheinberg 2006). In turn, these mediators influence learning. The absence of a learning effect is, therefore, perhaps best seen as a signal that students were not sufficiently motivated by the APA. Even the boys’ significantly higher level of self-efficacy, relative to the girls, was not sufficient to prompt a level of task engagement that yielded greater learning gains. Perhaps (in line with the expectancy-value model) both self-efficacy and task relevance need to be high to prompt this type of productive engagement.

To conclude, in developing a motivational APA that can influence student motivation and learning in science inquiry learning environments, considerable attention must be paid to the agent’s design. That this is no easy matter was illustrated in the present study as well as earlier when we reviewed the empirical studies on motivational APAs. Another signal of the complexity of such a design project comes from two recent meta-analyses on APAs. Both reviews mention the fact that there are currently not enough studies on APAs to afford a meta-analysis of the agent’s effects on student motivation. In addition, these reviews arrive at different conclusions about the effects of APAs on learning. Schroeder et al. (2013) reported a small, but positive and significant effect of using APAs on learning. In contrast, Heidig and Clarebout (2011) found that “existing studies on pedagogical agents … draw a discouraging picture” (p. 30) and concluded that the issue of their effectiveness is still largely an open question in which “most prominently the design of the pedagogical agent” (p. 52) has to be taken into account. The different conclusions drawn in these meta-analyses on the educational benefits of APAs derive to some degree from the complexities of designing the agent’s multi-faceted features. Alternatively, one could also say that research on motivational APAs is promising, but still faces the challenge of designing agents that adequately address the intricacies of the sources and nature of motivational constructs such as task relevance and self-efficacy.