Comparing artificial intelligence and human coaching goal attainment efficacy

  • Nicky Terblanche ,

    Roles Conceptualization, Investigation, Software, Writing – original draft

    nickyt@usb.ac.za,

    Affiliation University of Stellenbosch Business School, Cape Town, South Africa

  • Joanna Molyn,

    Roles Conceptualization, Data curation, Project administration, Writing – review & editing

    Affiliation University of Oxford Brookes, Oxford, United Kingdom

  • Erik de Haan,

    Roles Conceptualization, Writing – review & editing

    Affiliations Ashridge Centre for Coaching, Hult International Business School, Berkhamsted (Herts.), United Kingdom, VU University Amsterdam, Amsterdam, The Netherlands

  • Viktor O. Nilsson

    Roles Data curation, Formal analysis, Writing – review & editing

    Affiliation Ashridge Centre for Coaching, Hult International Business School, Berkhamsted (Herts.), United Kingdom

Abstract

The history of artificial intelligence (AI) is filled with hype and inflated expectations. Notwithstanding, AI is finding its way into numerous aspects of humanity including the fast-growing helping profession of coaching. Coaching has been shown to be efficacious in a variety of human development facets. The application of AI in a narrow, specific area of coaching has also been shown to work. What remains uncertain is how the two compare. In this paper we compare two equivalent longitudinal randomised control trial studies that measured the increase in clients’ goal attainment as a result of having received coaching over a 10-month period. The first study involved human coaches and the replication study used an AI chatbot coach. In both studies, human coaches and the AI coach were significantly more effective in helping clients reach their goals compared to the two control groups. Surprisingly, however, the AI coach was as effective as human coaches at the end of the trials. We interpret this result using AI and goal theory and present three significant implications: AI coaching could be scaled to democratize coaching; AI coaching could grow the demand for human coaching; and AI could replace human coaches who use simplistic, model-based coaching approaches. At present, AI’s lack of empathy and emotional intelligence makes human coaches irreplicable. However, understanding the efficacy of AI coaching relative to human coaching may promote the focused use of AI, to the significant benefit of society.

Introduction

Since its inception in the 1950s, artificial intelligence (AI) has seen several periods of growth and decline, casting doubt on its actual versus claimed efficacy [1]. Lately, renewed interest in AI has led to numerous novel applications of this technology, including in healthcare and helping professions such as psychology and coaching [2–4]. In this paper, coaching is defined as a one-on-one structured conversation between a coach and client with the aim of facilitating sustainable change for the individual and potentially other stakeholders [5]. Coaching is a late entrant to the application of AI, and AI’s role and efficacy in coaching remain largely under-researched.

Coaching is an important helping profession. It is a fast-growing multi-billion dollar per year industry [6] and has grown substantially both in practice and research in the last decade [7]. Numerous coaching meta-studies have made a clear case for its efficacy [8–13]. There is a strong link between successful coaching outcomes and the relationship and bond between the coach and client, with convincing evidence that the coach-client relationship is the most significant factor in coaching success [14–16]. Nevertheless, the current limitations of AI, especially relating to true human intelligence and emotions [17], cast doubt over the ability of an AI coach to currently compete with a human coach.

Recent studies on the application of AI in psychology, however, have suggested that AI could be effective in certain domains of promoting human wellbeing. Fulmer et al. [18], for example, used an AI agent based on cognitive behavioral therapy (CBT) to reduce self-identified symptoms of depression and anxiety in college students. They concluded that AI could serve as a cost-effective and accessible therapeutic agent. Greer et al. [19] found that young adult cancer patients had reduced anxiety compared to a control group after using a positive psychology-based AI coach for four weeks. These findings suggest that, while AI lacks true human intelligence and emotions, positive outcomes are possible even in practices that have traditionally relied on a strong human connection. This might potentially also be the case for coaching.

One of the primary focal areas of coaching, and what sets it apart from other helping professions, is assisting clients with goal attainment [20, 21]. Understanding the efficacy of AI coaching compared to human coaching in the domain of goal attainment therefore seems like a reasonable starting point for AI coaching research. This leads us to ask the following research question: In a similar setting, how does AI coaching compare to human coaching efficacy in relation to client goal attainment?

In this paper, we investigate this question by presenting a comparison of the two studies on goal attainment coaching: the first involving human coaches and the second an AI coach. We interpret the results in terms of the current state of AI and goal theory. We also discuss the way these results may pave the way for aspects of coaching to be made more widely available and the implication for coaches and the coaching industry. Given the continued growth of coaching as a helping profession and its proven efficacy, understanding how AI could play a role in scaling and democratizing this service is an important research area.

Current capabilities of artificial intelligence

AI has seen several false starts, mostly because of exaggerated claims of progress and ability that inevitably led to disappointment. An example is Marvin Minsky, the father of AI, who back in 1967 stated that “Within a generation … the problem of creating artificial intelligence will substantially be solved” [22]. AI has experienced a few “winters” where these types of exaggerations led to withdrawal of funding and the collapse of interest in AI research and development [1]. However, the recent resurgence in AI interest appears to be more sustainable as AI is focused on specific specialist areas in line with current AI capabilities, and shows promise in areas such as decision-making processes [23].

AI is defined as “the broad collection of technologies, such as computer vision, language processing, robotics, robotic process automation and virtual agents that are able to mimic cognitive human functions” [24 p4]. However, in order to understand AI’s realistic capabilities, it is important to distinguish between three types of AI: (i) Artificial narrow intelligence (ANI) refers to systems that can perform a specific task in a narrow context, such as a self-driving car; (ii) Artificial general intelligence (AGI) refers to systems that have intelligence similar to human intelligence; and (iii) Artificial super intelligence (ASI) refers to systems that can outperform human intelligence [25–27]. AGI and ASI do not currently exist, and acknowledging this fact creates a more realistic picture of what AI, and specifically ANI, can accomplish [28].

For the foreseeable future AI entities will remain unconscious machines that can at best support humans in complex, specific tasks [17]. This implies that ANI systems will be highly specialised and skilled in specific tasks and may even outperform humans in these narrow focus areas [29]. Perhaps instead of waiting for true AGI, multiple narrow AI applications could be interconnected to collaboratively perform tasks in a synergistic manner, possibly with utility beyond what a single ANI application could do [17]. AI is not yet poised to completely replace humans; however, the improved ability of AI and its increased use in the helping professions suggest we need to investigate more closely the relationship between AI and human interaction. In the highly human-centric context of coaching, the human-AI relationship becomes critical.

Human-AI interactions and relationships in the context of coaching

An indication of the growing importance of human-AI interaction is the emergence of studies augmenting current human-computer interaction (HCI) theory dedicated to human-centered AI (HCAI) [30] and human-AI interaction (HAII) [31]. The focus of AI development seems to be shifting away from pure scientific and academic exploration to useful applications that also consider human factors [30]. Human factors include the creation of AI systems that have social benefits and consider the ethical implications of AI. They also include the consideration of the role of humans in the AI ecosystem and awareness of the need for a more human-centred approach [32, 33]. The advancement of AI, combined with the focus on placing humans at the centre, has led to the development of new AI roles, ranging from being purely assistive to helping with team collaboration [34, 35]. The fast-growing area of AI-assisted decision-making, for example, requires clear boundaries on human versus AI authority and accountability. This is observed in the healthcare industry context, where decisions on patient care and diagnosis can have life or death consequences. As healthcare professionals team up with AI, there is a real danger that the “third wheel” effect (additional, potentially redundant or confusing opinions) may decrease combined human and AI effectiveness [23].

The present study is not focused on augmented human plus AI interaction since the AI coach used operates autonomously from a human coach. However, the AI coach’s sole task is to interact with (coach) a human client. Therefore, the interaction and especially the relationship between the AI and human remain important. There are several suggested ways to create AI coaches that focus on strong human-AI relationships.

Of primary concern is the need for the AI to have social ability, demonstrate credibility and context awareness and be proactive in assisting clients [36]. It is also important that the AI coach strives to embody the aspects that make human coaching effective, including demonstrating trust, empathy, transparency, predictability, reliability, ability, benevolence, and integrity. To create a strong AI-human relationship these aspects can be operationalized as suggested in Table 1 (see Terblanche [4] for a detailed discussion).

Table 1. AI design practices to support strong coach-coachee relationships.

https://doi.org/10.1371/journal.pone.0270255.t001

Potential benefits and ethical challenges in AI coaching

The application of AI in the helping professions and in coaching specifically holds numerous potential benefits. In the related field of psychology AI offers new modes of treatment, the ability to reach currently excluded populations, improve patient response and free up limited resources such as highly trained psychologists [51]. These same advantages apply to coaching.

The benefits of coaching are well researched and several meta-studies have shown that coaching can help people with various aspects including: performance and skills; wellbeing; coping; work attitudes; goal-directed self-regulation; improved work/life balance; psychological and social competencies; self-awareness and assertiveness, increased confidence; developing relationships, networks and interpersonal skills; adapting to change more effectively; helping to set and achieve goals; role clarity; and changing behaviors [9, 10, 13]. However, not everyone has access to a coach, especially in less affluent societies. In Africa, for example, the average cost of an organisational coach is approximately 100 USD per session, which puts it out of reach of many [52]. The problem is not only cost. There is a dire shortage of skilled coaches in many parts of the world. Of the more than 40,000 coaches registered with the International Coaching Federation, fewer than 2,000 are in Africa [53]. It seems that currently most of humanity is excluded from the benefits of professional coaching, even though there are calls for coaching to be viewed as a social process that could benefit currently marginalised groups [54]. AI potentially holds the key to expanding the reach of coaching. The ability of AI to scale and provide basic coaching services at a vastly reduced cost could overcome these current limitations, possibly democratizing coaching to the significant benefit of society.

The use of AI in coaching raises ethical concerns. These include prevention of harm, lack of guidance on developing ethical AI, respect and protection of client autonomy, transparency in the use of algorithms, bias, and data ownership [4, 51]. For AI coaching to be widely accepted and trusted, these ethical challenges must be addressed by stakeholders [36].

Goal theory

An important theoretical foundation of this paper is goal theory as applied in coaching. Goal theory is well established and widely used due to a history of empirical research and application. It is in essence an approach explaining the need to establish goals as an intrinsic motivation where a relationship exists between goal difficulty, level of performance, and effort involved [55]. Goal theory is supported by five principles regarding goal setting: clarity (specific and clear); challenge (sufficiently difficult); commitment (buy-in from onset); feedback (regular stock-taking on progress); and complexity (not too complex) [56]. Goals are “internal representations of desired states or outcomes” [57 p388]. Goal setting and attainment have been shown to have a positive effect on workplace performance [56]. Goal attainment has also been linked to positive emotions and increased wellbeing [58–60].

Various factors influence people’s goal attainment success. A study by Klein and Fishbach [61] showed that disrupting the expectations of goal attainment may lead to reduced satisfaction and lower goal evaluation, even though the goal is eventually achieved. Other factors that could influence goal attainment include the experience of power, whereby people who feel less powerful are less motivated to reach their goals when the goal seems far away [62]. People from cultures where personal honor is important may also delay their goal pursuit if they receive a threat to their moral reputation, such as being called a liar [63].

Certain actions can enhance goal attainment, including the writing down of goals, measuring goals and having specific time frames attached to them, and making a public commitment to someone regarding the goal [55]. There are various types of goals, for example, proximal and distal goals [64].

Goal theory is used extensively in coaching as an underlying mechanism to facilitate self-regulation [64]. During coaching, individuals with the help of their coach set goals, develop and execute action plans, monitor progress and change either goals or action plans based on feedback and progress [65]. Coaching in particular provides the monitoring function that helps to translate goals into actions, which in turn leads to progress [66].

Several empirical studies have shown that coaching is effective in improving goal attainment. In a randomized control trial (RCT) study, Grant et al. [67] found that four coaching sessions over a 10-week period led the intervention group to a significantly higher level of goal attainment compared to the control group. Zimmerman and Antoni [68] analyzed 33 coaching dyads using longitudinal multilevel analyses and found that clients experienced increased goal attainment. Losch et al. [69] compared individual coaching, self-coaching and group training and found that individual coaching was effective and superior in helping leaders achieve their goals.

As goal theory is intrinsic to coaching and since coaching has been shown to improve goal attainment, it emerged as an appropriate theory to investigate the impact of AI coaching on goal attainment in the present study.

Coach maturity

With the coaching industry growing rapidly, a diverse range of people are attracted to become coaches for reasons including the promise of increased freedom, a balanced lifestyle, self-control and a reprieve from corporate politics, bureaucracy and pressure [70]. Coaching is an unregulated industry, which implies that coaches enter the profession with various levels of training and experience [71], ranging from no training at all to post-graduate degrees [72]. The result is that coaches practice at different levels of coach maturity [73]. The notion of coach maturity is an important consideration given that AI has taken over the jobs of some people [74], suggesting that human coaches who operate at a low level of complexity may be rivalled by AI coaching.

Megginson and Clutterbuck [73] distinguished among four levels of coach maturity. At the lowest level coaches follow a “models-based” approach where they are typically more interested in following a set, mechanistic process rather than exploring the complexities of the client’s world. They are “doing coaching to the client”. This type of coaching is typical of novice coaches who rely on the coaching skills and techniques they had been trained in initially [72]. On the second level, “process-based” coaches follow a slightly more flexible approach using an expanded but still limited set of tools and techniques. They are “doing coaching with the client”. On the third level, “philosophy-based” coaches apply a broader mind-set to the client’s situation and practice reflection before and after coaching sessions. The top level, “systemic eclectic”, is acquired through much experience and allows a coach to exhibit a sensitive, intelligent approach to the client situation and utilize the most appropriate approach given the context. They are part of the system in which the coaching occurs. This coach maturity model was summarized by Drake [75 p143]: “…as novices they learn the rules, as intermediates they break the rules, as masters they change the rules and as artisans they transcend the rules”.

This coach maturity model is testimony to how humans can integrate knowledge and apply learning across domains, allowing navigation of complex situations. While AI is currently incapable of this, the fact that ANI can perform specific tasks on a level of human competency and beyond [29] suggests that the lowest level of coach maturity (models-based) is potentially within the ability of a well-designed narrow AI system.

Methodology

The two studies we compare were both longitudinal RCT designs. The studies were designed with the CONSORT guidelines for RCT research in mind and these guidelines were adhered to as far as possible [76]. Both studies were conducted over ten months with different groups of participants. Study 1 ran from October 2017 to July 2018 and consisted of a control group and a human coach group where participants received coaching from a human coach. Study 2 ran from November 2019 to August 2020 and consisted of a control group and an intervention group where participants received coaching from an AI chatbot coach. The second study was conducted after the first one because the AI coach was only created in 2019 after the completion of the first study. The same data collection instruments were used in both studies.

The research was approved by the ethics committee of a London-based University, project reference UREC/19.1.5.6. Written informed consent was obtained from all participants in both studies as per the requirements of the ethics approval.

Participants

Participants in both studies were recruited via email from a business school in the United Kingdom. Their fields of study included business management, economics, marketing, tourism, events management and logistics. Participants in Study 1 were randomly allocated into two different groups: Control 1 (n = 105) and Human coaching (n = 105). For Study 2, participants were allocated into Control 2 (n = 134) and AI coaching (n = 134) groups. In total over the two studies, 327 participants successfully submitted data over all eight time-points, which were used for the data analysis. See Table 2 for group numbers, demographic distribution, and mean scores of the dependent variable used in this study.

Table 2. Goal attainment means of the four participant groups across the eight time-points.

https://doi.org/10.1371/journal.pone.0270255.t002

Capturing the placebo effect in research requires subjects to not be aware whether they receive treatment or not [77]. In these two studies we were unable to investigate the placebo effect as coached students were given access to a human coach or the AI coach (Vici) almost immediately after the start of the experiment. The students in the control group were also aware of not having access to human or AI coaching. To address outcomes expectancy [78], the control groups received a fact sheet that provided information about goal attainment, psychological wellbeing, resilience and perceived stress at the start of the trials. They were also asked to think of and specify goals they wanted to achieve over the next ten months. In RCT studies there is a danger of a nocebo effect where participants in the control group have negative outcome expectations because they are aware of not receiving the intervention. In the present study we believe this was managed based on the research of Colloca et al. [79], who had found that a higher number of exposures to trial conditioning correlated with longer duration of nocebo responses. In our study, control group participants were only “conditioned” (made aware of not receiving coaching and given information sheets) at the start of the trials. The relatively long duration of the trials (10 months) likely also helped to diminish the nocebo effect.

Procedure

Participants completed surveys at eight time-points using an online survey platform. The first survey was a baseline survey before the participants had been allocated to any of the conditions. The baseline survey collected demographic data and the dependent variables, and asked participants to specify two goals that they aimed to work on over the course of the project. These goals were supposed to be something challenging that was either new or something that had been difficult for them to achieve in the past. After the baseline survey had been completed, the participants completed a monthly survey for six months in which they were asked to rate the success and the difficulty of their two goals. The monthly survey was distributed to Control group 1, Control group 2, and the AI coaching group. The participants of the Human coaching group received their survey after they had conducted their monthly coaching session. The eighth survey was distributed to all the participants three months after the last monthly survey.

Each student had unique login details to their survey, allowing the participants to be reminded of their goals and allowing the administrators of the survey to send out reminders to the participants when submissions were missing.

Human coaching

Students were coached by professional coaches trained and qualified in a relational model of coaching by a UK-based institution. Their qualifications included at least the Ashridge Executive Coach Accreditation and the European Individual Award (EIA) by the European Mentoring & Coaching Council at Senior Practitioner level. The 105 coaches were on average 50 years old and had at least three years of business coaching experience. Coaches and students were matched randomly with each coach assigned to one student only. Students had six one-hour coaching sessions over a period of six months, one session per month. All sessions were conducted via Skype. There was no prescription on the topic or content of the coaching sessions. Coaches and participants had complete freedom to decide how they wanted to use the session, which topics and goals they wanted to set and pursue and which homework tasks between sessions needed to be completed. All participants had to participate in all six coaching sessions to remain part of the study.

AI coaching

Applying the principle of narrow AI to coaching suggests the creation of a form of artificial narrow intelligence (ANI) that can perform one specific coaching task, rather than an attempt to create a machine replica of a human coach. The AI coach used in this study, Coach Vici, was based on expert system (ANI) principles using chatbot technology. The sole purpose of the chatbot was to help participants with goal attainment. Expert systems are considered a form of narrow AI and are described as complex software programs based on specialized knowledge, able to provide acceptable solutions to individual problems in a narrow topic area [80, 81]. Chatbots in turn are computer programs that interact with users via natural language either through text, voice, or both [82].

Vici is a custom-developed text-based chatbot deployed on the Telegram instant messaging platform. The chatbot was developed using the Designing AI Coach (DAIC) framework that recommends merging aspects of strong human coaching relationships with chatbot design best practices and using proven, evidence-based coaching theories as foundation [4]. In line with these recommendations, Vici was designed to facilitate goal attainment according to goal theory [55]. Vici had two types of text-based conversations with users. In the first type of conversation, Vici helped users to specify realistic goals by questioning them on the importance, feasibility and impact of their stated goals. Vici then helped users to commit to achievable actions that would help them reach their goals. In the second type of conversation, users would check in with Vici to report on their goal and action progress, reflect on obstacles that prevented them from progressing and change their action plans if necessary. These conversations assisted users to monitor the progress of their goals and actions. Vici also helped users to distinguish between proximal (< 6 months) and distal (> 6 months) goals [83]. Vici was available 24/7 to the experimental group and they could use it as often as they wanted, but at least once a month. A detailed analysis of the AI coach usage is presented in the Results. Fig 1 shows examples of interactions with Coach Vici.

Measures

Goal attainment

Grant et al. [67] developed a goal attainment measure which was adapted for the purpose of this study. The goal attainment measure contained self-reported scores of how successful the participants perceived they had been in achieving their goals and how difficult they perceived their goals to be. The success score was measured on 11 points, where each point represented 0%, 10%, 20%, etc. up to 100%. The difficulty score was measured using a seven-point Likert scale ranging from ‘Very easy’ to ‘Very difficult’.

The overall goal attainment score was then calculated by multiplying the success and the difficulty score for each goal separately and averaging the resulting scores for the two different goals to create an average goal attainment score.
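The scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' scoring script: it assumes that success is coded as a percentage (0, 10, ..., 100) and difficulty as an integer from 1 (‘Very easy’) to 7 (‘Very difficult’). The source does not state the numeric coding, and the function names are hypothetical.

```python
from statistics import mean


def goal_score(success_pct: int, difficulty: int) -> int:
    """Score one goal as success multiplied by difficulty.

    Assumed coding (not stated in the source):
    success_pct in {0, 10, ..., 100}, difficulty in 1..7.
    """
    if success_pct not in range(0, 101, 10):
        raise ValueError("success must be one of 0, 10, ..., 100")
    if not 1 <= difficulty <= 7:
        raise ValueError("difficulty must be between 1 and 7")
    return success_pct * difficulty


def goal_attainment(goals):
    """Average the per-goal scores over a participant's goals."""
    return mean(goal_score(success, difficulty) for success, difficulty in goals)


if __name__ == "__main__":
    # Hypothetical participant: goal 1 is 50% achieved with difficulty 4,
    # goal 2 is 70% achieved with difficulty 6.
    print(goal_attainment([(50, 4), (70, 6)]))
```

Under this coding, a difficult goal that is partly achieved can score higher than an easy goal that is fully achieved, which is the point of weighting success by difficulty.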

Results

To assess the implications of coaching on goal attainment, a 4x8 Mixed Factorial ANOVA was conducted using the four different groups (Control 1, Control 2, Human coach group and AI coach group) as grouping variable and their eight self-reported measures of goal attainment as dependent variables. A power analysis using G*Power 3.1.9.7 [84] was conducted to determine the effect size required to identify a statistically significant interaction between four groups over eight time-points. A Mixed Factorial ANOVA with a within-between interaction of 327 participants, a power of 0.95 and alpha level of 0.05 indicated that the effect of the coaching intervention would have to be above ηp² = .014 to identify a significant interaction.

The Mixed Factorial ANOVA indicated a statistically significant interaction of group and time, F(13.18, 1296.36) = 2.35, p = .004, ηp² = .023. To break down this interaction, the development of goal attainment was first analysed within each group using separate Repeated Measures ANOVAs over the eight time-points. The Repeated Measures ANOVAs indicated that all groups had a significant development of goal attainment over time. Both control groups behaved similarly, and the trend of the development remained similar over the two time-points. The first control group showed an effect size of ηp² = .16, p < .001 and the second control group showed an effect size of ηp² = .11, p < .001. Both control groups significantly developed their goal attainment from baseline to time-point 4 (p = .005 and p = .003 respectively), but kept it at a stable level for the remaining time-points.

Furthermore, the effect sizes over the eight time-points of the two experiment groups were markedly higher than those of the control groups (Human coach group: ηp² = .265, p < .001; AI coach group: ηp² = .269, p < .001). The first significant development of goal attainment was identified at time-point 4 for both experiment groups (p = .01 and p = .004 respectively), and goal attainment kept increasing significantly through time-point 7 and time-point 8 for both experiment groups.

Bonferroni-corrected multiple comparisons were used to test the difference between each of the four groups at each time-point. The tests indicated that a significant difference between the Human coach group and Control group 2 could be identified at time-point 5, where the Human coach group had a significantly higher goal attainment score (p = .047) than Control group 2. This effect increased with time, and both the Human coach group and the AI coach group showed significant effects compared to the control groups at time-point 7 (Human coach group–Control 1, p = .002; Human coach group–Control 2, p = .008; AI coach group–Control 1, p = .022; AI coach group–Control 2, p = .048). However, the goal attainment of the two experiment groups never differed significantly from each other throughout the experiment.
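For readers unfamiliar with the Bonferroni correction, the following Python sketch shows the general idea: with four groups there are six pairwise comparisons at a given time-point, and each raw p-value is multiplied by the number of comparisons (capped at 1). The raw p-values below are hypothetical, and the source does not specify which family of comparisons the authors' software corrected over, so this is an assumption-laden illustration rather than a reconstruction of the reported values.

```python
from itertools import combinations

GROUPS = ["Control 1", "Control 2", "Human coach", "AI coach"]


def pairwise_comparisons(groups):
    """Return every unordered pair of groups."""
    return list(combinations(groups, 2))


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    pairs = pairwise_comparisons(GROUPS)
    # Hypothetical uncorrected p-values, one per pair, for one time-point.
    raw_p = [0.60, 0.001, 0.004, 0.002, 0.008, 0.70]
    for (a, b), p_adj in zip(pairs, bonferroni_adjust(raw_p)):
        verdict = "significant" if p_adj < 0.05 else "not significant"
        print(f"{a} vs {b}: adjusted p = {p_adj:.3f} ({verdict})")
```

The correction is conservative: a comparison has to clear a stricter bar the more comparisons are made, which reduces the chance of a false positive among the six pairs.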

The effect size of ηp² = .023, which was higher than the critical effect size (ηp² = .014) according to the power analysis, indicated that the sample size for this study was sufficient. These findings further indicate that the participants successfully increased their goal attainment over the time of the study. As shown in Fig 2, receiving coaching had a positive impact on the development of goal attainment for the participants, but both formats of coaching appeared to have had a very similar effect.

We also analyzed the usage of the AI coach in terms of usage frequency of the chatbot to identify any potential within-group differences in Study 2. We were able to identify a significant difference in development of goal attainment when splitting the frequency of usage into two equal groups based on their median usage (6 AI coaching sessions), t(73) = -2.24, p = 0.028, d = 0.52. The lower usage group had an average increase in goal attainment of 17.62 (sd = 32.50) compared to 37.62 (sd = 34.16) in the higher usage group. This suggests that more frequent use of the AI coach led to higher goal attainment.
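A median split followed by an independent-samples t-test and Cohen's d can be sketched as follows. The data are hypothetical and the code makes two assumptions that the source does not confirm: participants at exactly the median usage are placed in the lower group, and the t statistic and d both use the pooled standard deviation. It is not intended to reproduce the reported values.

```python
from math import sqrt
from statistics import mean, median, variance


def median_split(usage, gain):
    """Split goal attainment gains into lower and higher usage groups.

    Assumption: ties at the median go to the lower group.
    """
    cut = median(usage)
    low = [g for u, g in zip(usage, gain) if u <= cut]
    high = [g for u, g in zip(usage, gain) if u > cut]
    return low, high


def pooled_t_and_d(a, b):
    """Return (t, Cohen's d, degrees of freedom) for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    diff = mean(a) - mean(b)
    t = diff / sqrt(pooled_var * (1 / na + 1 / nb))
    d = diff / sqrt(pooled_var)
    return t, d, df


if __name__ == "__main__":
    # Hypothetical data: number of AI coaching sessions and change in
    # goal attainment score for eight participants.
    sessions = [1, 2, 3, 4, 5, 6, 7, 8]
    gains = [10, 20, 30, 40, 50, 60, 70, 80]
    low, high = median_split(sessions, gains)
    t, d, df = pooled_t_and_d(low, high)
    print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

A negative t here simply reflects that the lower usage group is entered first and has the smaller mean gain, which matches the sign convention of the reported statistic.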

To understand the nature of goals across the two studies and four groups, two of the authors independently analysed the first goal in both studies to assess the theme of the goal, the type of outcome (concrete or vague goal) and whether the goal was proximal (<6 months) or distal (>6 months). The inter-rater reliability of the analysis indicated a very high similarity between the reviewers on all three categories, with Cohen’s kappa of κ = .95, p < .001 for the theme of the goal, κ = .94 for the outcome and κ = .91, p < .001 for the timeline of the goal.
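Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance. The sketch below is a minimal standard-library implementation with hypothetical ratings; it does not compute the p-values reported above, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical timeline codes assigned by two reviewers to four goals.
    reviewer_1 = ["proximal", "distal", "distal", "proximal"]
    reviewer_2 = ["proximal", "distal", "proximal", "proximal"]
    print(cohens_kappa(reviewer_1, reviewer_2))
```

Values near 1, such as those reported for the three coding categories, indicate that the reviewers agreed far more often than chance alone would predict.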

The themes of the goals that were identified related to the participants’ studies (38%), self-development (22%), career (18%), health and wellbeing (16%), other (2%), finances (2%), family (1%) and property or car (1%). Most of the goals were concrete and measurable (60%), for example, “To gain overall mark of 75% in study year 1”, and 58% of the goals were long-term focused (>6 months). Furthermore, the proportions of goal themes, outcome types and goal proximity did not differ significantly among the four groups.

The type of outcome and the proximity of the goals were added as covariates into the model, but neither had a significant impact on the development of goal attainment (outcome: F(7, 2037) = 1.46, p > .05, ηp2 = .005; proximity: F(7, 2037) = 2.06, p > .05, ηp2 = .007). The themes were analysed separately because of their large variety, but no significant differences between the themes were found in the development of goal attainment, F(7, 334) = .80, p > .05, ηp2 = .017. These findings indicate that the individual differences in the goals among the participants did not impact the development of goal attainment over time in this study.
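
As a worked check on the effect sizes above, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two covariate tests reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Covariate tests as reported in the text: F(7, 2037) = 1.46 and F(7, 2037) = 2.06.
outcome_effect = round(partial_eta_squared(1.46, 7, 2037), 3)
proximity_effect = round(partial_eta_squared(2.06, 7, 2037), 3)
```

Both values round to the reported effect sizes (.005 and .007), which makes it clear how small these effects are relative to the error variance.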

Discussion

This paper investigates the performance of an AI chatbot coach relative to human coaches in terms of client goal attainment by comparing two longitudinal RCT studies, the second being a replication study of the first. In both studies the experimental groups who had received either human coaching (Study 1) or AI coaching (Study 2) had significantly higher goal attainment than the control groups. A surprising result is that the AI coach rivalled the human coaches in participant goal attainment with a similar outcome at the end of the study after ten months (Human coach group: ηp2 = .265, p < .001; AI coach group: ηp2 = .269, p < .001). Using goal theory, we attempt to explain this result and we discuss three important implications: (i) the possibility of democratizing coaching; (ii) the way AI coaching could enlarge the need for human coaches; and (iii) a warning to coaches to enhance their coaching praxis.

Goal theory states that there is a higher level of goal attainment if goals are clear, there is buy-in from the onset, regular feedback is provided on progress, and the goals are sufficiently challenging and not too complex. Practically this translates to writing down goals, having specific time frames associated with a goal, measuring progress and making a commitment to someone about completing the goal [55]. Most coaching approaches have at their core the notion of goal attainment. In fact, this is what sets coaching apart from other helping professions [64] and explains why human coaches were able to help participants increase their goal attainment. Goal theory and how it is typically practised by human coaches was used to create the AI coach, Vici. The AI coach helped participants write down their goals by typing them into the application, asked questions to test for the feasibility and level of realism of the goals, and went even further by helping participants create an action plan to reach their goals. Human coaches would be able to engage in a more complex and nuanced discussion about goals and one would therefore expect human coaches to outperform the AI coach, which employs a scripted conversation. Although the AI coach lacked nuanced intelligence, it had an advantage over human coaches due to the rigorous and consistent way it executed goal theory. The human coaches could decide which aspects of goal theory they implemented in each session and in fact, some of the coaches may not have been well versed in goal theory. For example, in the human coaching sessions the coaches may occasionally have forgotten to ask about goal progress or potentially did not keep an explicit record of goal progress, as it is up to each coach to decide how to manage the coaching intervention. The AI coach, however, was programmed to always enquire about goal progress and keep a record for reference to share with the participant.

Furthermore, human coaches may have asked participants to verbalize their goals instead of writing them down, whereas the AI coach required the participants to type their goals into the application, as writing down a goal has been shown to increase goal attainment [55]. It seems therefore that the rigorous and mechanistic execution of goal theory by the AI coach and its inability to deviate from a set process (deviations that could in fact distract human coaches) compensated for its lack of human intelligence.

The results (Fig 2) show that the AI coach trailed the human coaches slightly throughout the eight measurements up to the last time-point (T8). Measures were taken monthly between T1 and T7 with a final follow-up measure (T8) after three months. Between T7 and T8, human coach participants did not receive any more coaching, whereas the AI coaching participants were free to keep using the AI coach. This could explain why goal attainment of human coaching participants declined between T7 and T8, but kept increasing for the AI coach group to ultimately equal the human coach group. The convenience and constant availability of the AI coach probably assisted in its performance relative to human coaches.

An important predictor of coaching success is the readiness of the coachee [85]. Due to the randomised nature of this study, we can assume that in both studies participants were equally open and ready for coaching. Being receptive to coaching relates to a person’s state at a particular time of day such as their energy levels, mental alertness, and general physical state. In the human coaching group, sessions were scheduled in advance and because two people are involved in the logistics, one can assume that at times appointments were honoured despite the coach or participant not being in an optimal state for the engagement, potentially negatively affecting the efficacy of that session. In the AI coaching group, the participants had complete freedom to decide when to have a conversation with the chatbot, which may have contributed to a more optimal engagement. While we did not explicitly measure these variables, we suggest that this convenience factor may have helped the AI coach to perform well compared to human coaches. Additionally, the AI coach was available 24/7 and participants could use it as often as they chose.

The results indicate that participants who used the AI coach more often had higher goal attainment. Human coaching is expensive and therefore participants in Study 1 only had one session per month. There was no extra cost associated with additional AI coaching sessions. This underscores two of the main advantages of AI coaching–its scalability and cost effectiveness compared to human coaching. The superior availability and use of the AI coach compared to the human coaches could therefore also explain why the AI coaching group performed so well relative to the human coaching group.

The implications of these results are three-fold. Firstly and most importantly, this presents the possibility of democratization of coaching. A number of coaching efficacy meta-studies have shown coaching to be effective in helping people develop, grow and achieve their goals. Coaching is however reserved for a select few due to its cost and the limited availability of coaches, especially in low-income geographies such as Africa. Even in organizations, individual coaching is usually reserved for managers and senior leaders. The results from this comparative study suggest that AI coaching, when implemented to have a specific focus in line with the current capabilities of narrow AI, is an affordable and scalable alternative to certain aspects of human coaching. The benefits of coaching could therefore be made available to many more people than is currently the case.

The second implication relates to the coaching industry. Many people are concerned that AI threatens their job security [86]. Coaches may therefore rightly be concerned that AI coaches, such as Vici, pose a threat to their livelihood. The opposite may in fact be true. If AI can help democratize coaching, more first-time users of coaching services would be exposed to the benefits of coaching. Due to the limited abilities of AI, at some point users of AI coaching services may have the need for more advanced and intelligent human coaching. We believe this broadening awareness of and exposure to coaching through AI could in fact create more opportunities for human coaches. Human coaches should view AI coaching as an opportunity, not a threat, in line with the findings of a recent study [74].

The final implication relates to coaches and their praxis. The efficacy of the AI coach in this study suggests that coaches who operate at a low level of coach maturity [73, 75] could be replaced by AI coaches. Therefore, human coaches need to evaluate their coach maturity and invest the necessary resources to improve their coaching knowledge and skills to ensure that they offer their clients a valuable and relevant service. Humans currently and for the foreseeable future will outperform AI in terms of context awareness, transference of learning and higher order complex sense-making [17]. Coaches should ensure that they embody these uniquely human forms of intelligence in their coaching praxis.

Limitations and future research

Participants in both studies were undergraduate students. This implies that the results may not generalise to other populations; however, the effects observed are still valid given that similar groups of participants were used in both studies. Measurements were performed by means of self-scores by participants, which may introduce the possibility of self-score bias. These limitations are offset to some degree by the relatively large sample size and the longitudinal, RCT research design.

In terms of future research, other narrowly focused AI coaches that specialise in one specific coaching aspect such as wellness, self-awareness or emotional intelligence should be created and tested in replication studies similar to this goal-attainment study. This would help us understand what other coaching aspects can be automated. Should some of these other coaching aspects yield positive results in an AI implementation, the possibility of creating a composite AI coach consisting of an amalgamation of these narrow AI capabilities should be researched. While general AI is not yet possible, perhaps the sum of numerous narrow AI coaching capabilities could create a synergetic AI coaching effect.

Conclusion

Uniquely human characteristics such as emotional intelligence and empathy allow human coaches to build bonds with their clients that no AI can currently rival. This comparison study however shows that AI coaches that focus on a narrow aspect of coaching and are based on fundamental, proven theories may very well rival human coaches in that specific coaching aspect. While AI coaches will not out-perform human coaches as a whole any time soon, these specific applications of coaching could democratize coaching and make its benefits available to a much wider audience, while at the same time potentially growing the demand for human coaches through exposing more people to the benefits of coaching.

References

  1. Haenlein M, Kaplan A. A brief history of artificial intelligence: On the past, present, and future of artificial intelligence. Calif Manage Rev. 2019;61(4):5–14.
  2. Gaffney H, Mansell W, Tai S. Conversational agents in the treatment of mental health problems: mixed-method systematic review. JMIR mHealth. 2019;6(10):e14166. pmid:31628789
  3. Lattie EG, Adkins EC, Winquist N, Stiles-Shields C, Wafford QE, Graham AK. Digital mental health interventions for depression, anxiety, and enhancement of psychological well-being among college students: systematic review. JMIR. 2019;21(7):e12869. pmid:31333198
  4. Terblanche N. A design framework to create Artificial Intelligence Coaches. Int J Evid Based Coach Mentor. 2020;18(2):152–65.
  5. Bachkirova T, Cox E, Clutterbuck D. Introduction. In: Cox E, Bachkirova T, Clutterbuck D, editors. The Complete Handbook of Coaching. Thousand Oaks: SAGE; 2014. p. 1–20.
  6. International Coach Federation (ICF). 2020 ICF Global Coaching Study. 2020 [Internet]. Available from: https://coachingfederation.org/research/global-coaching-study
  7. De Haan E. What works in executive coaching: Understanding outcomes through quantitative research and practice-based evidence. Oxfordshire: Routledge; 2021.
  8. Athanasopoulou A, Dopson S. A systematic review of executive coaching outcomes: Is it the journey or the destination that matters the most? Leadersh Q. 2018;29(1):70–88.
  9. Blackman A, Moscardo G, Gray DE. Challenges for the theory and practice of business coaching: A systematic review of empirical evidence. Hum Resour Dev Rev. 2016;15(4):459–86.
  10. De Haan E, Nilsson VO. Does executive coaching work? A meta-analysis based only on randomized controlled trials. J Occup Organ Psychol. In press.
  11. Grover S, Furnham A. Coaching as a developmental intervention in organisations: A systematic review of its effectiveness and the mechanisms underlying it. PloS One. 2016;11(7):e0159137. pmid:27416061
  12. Jones RJ, Woods SA, Guillaume YR. The effectiveness of workplace coaching: A meta analysis of learning and performance outcomes from coaching. J Occup Organ Psychol. 2016;89(2):249–77.
  13. Theeboom T, Beersma B, van Vianen AE. Does coaching work? A meta-analysis on the effects of coaching on individual level outcomes in an organizational context. J Posit Psychol. 2014;9(1):1–18.
  14. De Haan E, Grant AM, Burger Y, Eriksson PO. A large-scale study of executive and workplace coaching: The relative contributions of relationship, personality match, and self-efficacy. Consult Psychol J Pract Res. 2016;68(3):189–207.
  15. Graßmann C, Scholmerich F, Schermuly CC. The relationship between working alliance and client outcomes in coaching: A meta-analysis. Human Relations. 2020;73(1):35–58.
  16. McKenna DD, Davis SL. Hidden in plain sight: The active ingredients of executive coaching. Ind Organ Psychol. 2009;2(3):244–60.
  17. van de Boer-Visschedijk GC, Blankendaal RAM, Boonekamp RC, Eikelboom AR. Human-versus Artificial Intelligence. Front Artif Intell Appl. 2021;4. pmid:33981990
  18. Fulmer R, Joerin A, Gentile B, Lakerink L, Rauws M. Using psychological artificial intelligence (Tess) to relieve symptoms of depression and anxiety: randomized controlled trial. JMIR mHealth. 2018;5(4):e64. pmid:30545815
  19. Greer S, Ramo D, Chang YJ, Fu M, Moskowitz J, Haritatos J. Use of the chatbot “vivibot” to deliver positive psychology skills and promote well-being among young people after cancer treatment: randomized controlled feasibility trial. JMIR mHealth and uHealth. 2019;7(10):e15018. pmid:31674920
  20. Grant AM. An integrated model of goal-focused coaching: An evidence-based framework for teaching and practice. Int Coach Psychol Rev. 2012;7(2):146–65.
  21. Grant AM. Autonomy support, relationship satisfaction and goal focus in the coach-coachee relationship: Which best predicts coaching success? Coaching. 2014;7(1):18–38.
  22. French RM. The Turing Test: the first 50 years. Trends Cogn Sci. 2000;4(3):115–22. pmid:10689346
  23. Triberti S, Durosini I, Pravettoni G. A “third wheel” effect in health decision making involving artificial entities: A psychological perspective. Front Public Health. 2020;8:117. pmid:32411641
  24. Bughin J, Hazan E. The new spring of artificial intelligence: A few early economics. VoxEU.org. [Internet]. 2017, August 21. Available from: https://voxeu.org/article/new-spring-artificial-intelligence-few-early-economics
  25. Bostrom N. Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press; 2014.
  26. Shanahan M. The Technological Singularity. Cambridge, Massachusetts: MIT Press; 2015.
  27. Siau KL, Yang Y. Impact of Artificial Intelligence, Robotics, and Machine Learning on Sales and Marketing. Midwest United States for Information Systems Conference Proceedings (MWAIS 2017), 48. [Internet]. Available from: http://aisel.aisnet.org/mwais2017/48; 2017.
  28. Panetta K. Widespread artificial intelligence, biohacking, new platforms and immersive experiences dominate this year’s Gartner Hype Cycle. Gartner. [Internet]. 2018. Available from: https://www.gartner.com/smarterwithgartner/5-trends-emerge-in-gartner-hype-cycle-for-emerging-technologies-2018/
  29. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of go without human knowledge. Nature. 2017;550(7676):354. pmid:29052630
  30. Xu W, Dainoff M, Ge L, Gao Z. Transitioning to human interaction with AI systems: New challenges and opportunities for HCI professionals to enable human-centered AI. Int J Hum-Comput Interact. 2021.
  31. Sundar SS. Rise of machine agency: A framework for studying the psychology of human-AI interaction (HAII). J Comput-Mediat Commun. 2020;25(1):74–88.
  32. Xu W. Toward human-centered AI: A perspective from human-computer interaction. Interactions. 2019;26(4):42–6.
  33. Zheng N, Liu Z, Ren P, et al. Hybrid-augmented intelligence: Collaboration and cognition. Front Inform Technol Electron Eng. 2017;18(2):153–79.
  34. Brill JC, Cummings ML, Evans AW III, Hancock PA, Lyons JB, Oden K. Navigating the Advent of Human-Machine Teaming. Proceedings of the Human Factors and Ergonomics Society Annual Meeting; 2018;62(1):455–59. Los Angeles, CA: SAGE.
  35. O’Neill T, McNeese N, Barron A, Schelble B. Human-autonomy teaming: A review and analysis of the empirical literature. Hum Factors. 2020;Oct. pmid:33092417
  36. Kamphorst BA. E-coaching systems: What they are, and what they aren’t. Personal and Ubiquitous Computing. 2017;21(4):625–32.
  37. Ciechanowski L, Przegalinska A, Magnuski M, Gloor PA. In the shades of the uncanny valley: An experimental study of human-chatbot interaction. Future Gener Comput Syst. 2019;92(March):539–48.
  38. Bakker D, Kazantzis N, Rickwood D, Rickard N. Mental health smartphone apps: Review and evidence-based recommendations for future developments. JMIR mHealth. 2016;3(1):e7. pmid:26932350
  39. Shum H, He X, Li D. From Eliza to XiaoIce: challenges and opportunities with social chatbots. Front Inform Technol Electron Eng. 2018;19(1):10–26.
  40. Thies IM, Menon N, Magapu S, et al. How do you want your chatbot? An exploratory Wizard-of-Oz study with young, urban Indians. IFIP Conference on Human-Computer Interaction, 2017 September 25–29; Mumbai, India; 2017. p. 441–59.
  41. Araujo T. Living up to the chatbot hype: The influence of anthropomorphic design cues and communicative agency framing on conversational agent and company perceptions. Comput Hum Behav. 2018;85:183–89.
  42. Lovejoy J. The UX of AI. 2019. [Internet]. Available from: https://design.google/library/ux-ai/.
  43. Lee S, Choi J. Enhancing user experience with conversational agent for movie recommendation: Effects of self-disclosure and reciprocity. Int J Hum Comput Stud. 2017;103:95–105.
  44. Neururer M, Schlögl S, Brinkschulte L, Groth A. Perceptions on authenticity in chat bots. Multimodal Technol Interact. 2018;2(3).
  45. Sjödén B, Silvervarg A, Haake M, Gulz A. Extending an Educational Math Game with a Pedagogical Conversational Agent: Facing design challenges. In: De Wannemacker S, Clarebout G, De Causmaecker P, editors. Interdisciplinary Approaches to Adaptive Learning: A Look at the Neighbours. Berlin: Springer; 2011. p. 116–30.
  46. Chaves AP, Gerosa MA. Single or Multiple Conversational Agents? An Interactional Coherence Comparison. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018 April; Montreal, Canada; 2018.
  47. Geissler H, Hasenbein M, Kanatouri S, Wegener R. E-Coaching: Conceptual and empirical findings of a virtual coaching programme. International Journal of Evidence Based Coaching and Mentoring [Internet]. 2014;12(2):165–86. Available from: https://radar.brookes.ac.uk/radar/items/585eb4f9-19ce-49e1-b600-509fde1e18c0/1/.
  48. Poepsel MA. The impact of an online evidence-based coaching program on goal striving, subjective well-being, and level of hope. Capella University. 2011. [Internet]. Available from: https://pqdtopen.proquest.com/doc/872553863.html.
  49. Tallyn E, Fried H, Gianni R, et al. The Ethnobot: Gathering Ethnographies in the Age of IoT. CHI ’18: CHI Conference on Human Factors in Computing Systems, 2018 April; Montreal, Canada. 2018.
  50. Jain M, Kumar P, Kota R, Patel SN. Evaluating and Informing the Design of Chatbots. DIS ’18: Designing Interactive Systems Conference, 2018 June; Hong Kong, China. 2018.
  51. Fiske A, Henningsen P, Buyx A. Your robot therapist will see you now: Ethical implications of embodied artificial intelligence in psychiatry, psychology, and psychotherapy. J Medical Internet Res. 2019;21(5):e13216. pmid:31094356
  52. Terblanche N, Passmore J, Myburgh J. African organisational coaching practice: Exploring approaches used, and the factors influencing coaches’ fees. S Afr J Bus Manag. 2021;52(1):a2395.
  53. International Coach Federation (ICF). 2020 ICF Global Coaching Study: Executive summary. 2020. [Internet]. Available from: https://coachfederation.org/app/uploads/2020/09/FINAL_ICF_GCS2020_ExecutiveSummary.pdf
  54. Shoukry H, Cox E. Coaching as a social process. Manag Learn. 2018;49(4):413–28.
  55. Locke EA, Latham GP. Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. Am Psychol. 2002;57(9):705–17. pmid:12237980
  56. Locke EA, Latham GP. A theory of goal setting & task performance. Upper Saddle River, NJ: Prentice-Hall; 1990.
  57. Austin JT, Vancouver JB. Goal constructs in psychology: Structure, process, and content. Psychol Bull. 1996;120(3):338–75.
  58. Koestner R, Lekes N, Powers TA, Chicoine E. Attaining personal goals: Self-concordance plus implementation intentions equals success. J Pers Soc Psychol. 2002;83(1):231–44. pmid:12088128
  59. Niemiec CP, Ryan RM, Deci EL. The path taken: Consequences of attaining intrinsic and extrinsic aspirations in post-college life. J Res Pers. 2009;43(3):291–306. pmid:20161160
  60. Sonnentag S. Performance, well-being, and self-regulation. In: Sonnentag S, editor. Psychological management of individual performance. Hoboken, NJ: Wiley; 2002. p. 405–24.
  61. Klein N, Fishbach A. Feeling good at the right time: Why people value predictability in goal attainment. J Exp Soc Psychol. 2014;55:21–30.
  62. Schmid PC. Power reduces the goal gradient effect. J Exp Soc Psychol. 2020;90:104003.
  63. Günsoy C, Joo M, Cross SE, Uskul AK, Gul P, Wasti SA, et al. The influence of honor threats on goal delay and goal derailment: A comparison of Turkey, Southern US, and Northern US. J Exp Soc Psychol. 2020;88:103974.
  64. Grant AM. An Integrative Goal-Focused Approach to Executive Coaching. In: Stober DR, Grant AM, editors. Evidence based coaching handbook: Putting best practices to work for your clients. Hoboken, NJ: John Wiley & Sons Inc; 2006. p. 153–92.
  65. Carver CS, Scheier MF. On the self regulation of behaviour. Cambridge, UK: Cambridge University Press; 1998.
  66. Harkin B, Webb TL, Chang BPI, Prestwich A, Conner M, Kellar I, et al. Does monitoring goal progress promote goal attainment? A meta-analysis of the experimental evidence. Psychol Bull. 2016;142(2):198–229. pmid:26479070
  67. Grant AM, Curtayne L, Burton G. Executive coaching enhances goal attainment, resilience and workplace well-being: a randomized controlled study. J Posit Psychol. 2009;4(5):396–407.
  68. Zimmermann LC, Antoni CH. Problem-specific coaching interventions influence goal attainment via double-loop learning. Zeitschrift für Arbeits-und Organisationspsychologie A&O. 2018(September).
  69. Losch S, Traut-Mattausch E, Mühlberger MD, Jonas E. Comparing the effectiveness of individual coaching, self-coaching, and group training: How leadership makes the difference. Front Psychol. 2016;7:629. pmid:27199857
  70. Terblanche NHD, Jock RJ, Ungerer M. Creating and maintaining a commercially viable executive coaching practice in South Africa. South Afr J Entrep Small Bus Manag. 2019;11(1):a192.
  71. Bozer G, Sarros JC, Santora JC. Academic background and credibility in executive coaching effectiveness. Personnel Review. 2014;43:881–97.
  72. Bachkirova T, Smith CL. From competencies to capabilities in the assessment and accreditation of coaches. Int J Mentor. 2015;13(2):123–40.
  73. Megginson D, Clutterbuck D. Further techniques for coaching and mentoring. New York: Routledge; 2010.
  74. Bhargava A, Bester M, Bolton L. Employees’ perceptions of the implementation of robotics, artificial intelligence, and automation (RAIA) on job satisfaction, job security, and employability. J Technol Behav Sci. 2021;6(1):106–13.
  75. Drake DB. What do coaches need to know? Using the Mastery Window to assess and develop expertise. Coaching. 2011;4(2):138–55.
  76. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. Trials. 2010;11(1):1–8. pmid:20332509
  77. Wampold BE, Imel ZE. The great psychotherapy debate: The evidence for what makes psychotherapy work. 2nd ed. New York: Routledge; 2015.
  78. Colagiuri B, Smith CA. A systematic review of the effect of expectancy on treatment responses to acupuncture. Evid. 2012:1–12. pmid:22203882
  79. Colloca L, Petrovic P, Wager TD, Ingvar M, Benedetti F. How the number of learning trials affects placebo and nocebo responses. Pain®. 2010;151(2):430–39. pmid:20817355
  80. Chen Y, Hsu C, Liu L, Yang S. Constructing a nutrition diagnosis expert system. Expert Syst Appl. 2012;39(2):2132–56.
  81. Telang PR, Kalia AK, Vukovic M, Pandita R, Singh MP. A conceptual framework for engineering chatbots. IEEE Internet Comput. 2018;22(6):54–9.
  82. Chung K, Park RC. Chatbot-based healthcare service with a knowledge base for cloud computing. Cluster Computing. 2019;22(1):S1925–S1937.
  83. Latham GP, Locke EA. New developments in and directions for goal-setting research. Eur Psychol. 2007;12(4):290–300.
  84. Faul F, Erdfelder E, Lang AG, Buchner A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods. 2007;39:175–91. pmid:17695343
  85. MacKie D. The effects of coachee readiness and core self-evaluations on leadership coaching outcomes: A controlled trial. Coaching. 2015;8(2):120–36.
  86. West DM. What happens if robots take the jobs? The impact of emerging technologies on employment and public policy. Centre for Technology Innovation at Brookings, Washington DC. 2015(October). [Internet] Available from: https://www.brookings.edu/wp-content/uploads/2016/06/robotwork.pdf