Elsevier

Computers in Human Behavior

Volume 49, August 2015, Pages 245-250
Real conversations with artificial intelligence: A comparison between human–human online conversations and human–chatbot conversations

https://doi.org/10.1016/j.chb.2015.02.026

Highlights

  • How does communication change when speaking to intelligent agents over humans?

  • We compared 100 IM conversations to 100 exchanges with the chatbot Cleverbot.

  • People used more, but shorter, messages when communicating with chatbots.

  • People also used more restricted vocabulary and greater profanity with chatbots.

  • People can easily adapt their language to communicate with intelligent agents.

Abstract

This study analyzed how communication changes when people communicate with an intelligent agent as opposed to with another human. We compared 100 instant messaging conversations to 100 exchanges with the popular chatbot Cleverbot along seven dimensions: words per message, words per conversation, messages per conversation, word uniqueness, and use of profanity, shorthand, and emoticons. A MANOVA indicated that people communicated with the chatbot for longer durations (but with shorter messages) than they did with another human. Additionally, human–chatbot communication lacked much of the richness of vocabulary found in conversations among people, and exhibited greater profanity. These results suggest that while human language skills transfer easily to human–chatbot communication, there are notable differences in the content and quality of such conversations.

Introduction

Artificial intelligence’s (A.I.) efforts in the last half century to model human language use by computers have not been wildly successful. While the idea of using human language to communicate with computers holds merit, A.I. scientists have, for decades, underestimated the complexity of human language, in both comprehension and generation. The obstacle for computers is not just understanding the meanings of words, but understanding the endless variability of expression in how those words are collocated in language use to communicate meaning.

Nonetheless, decades later, we can find an abundance of natural language interaction with intelligent agents on the internet, from airline reservation systems to merchandise catalogs, suggesting that humans have little or no difficulty transferring their language skills to such applications. Because so much of this communication occurs through digital technology rather than in person, computer-mediated communication (or “CMC”) has become a prominent area of research in which to explore this simulation of natural human language.

One of the most popular forms of CMC today, particularly among adolescents and teenagers, is instant messaging (IM) (Tagliamonte & Denis, 2008). While many specialized applications enable instant messaging, the service is also provided through many other popular media, such as multiplayer online games, email clients, and social networking websites (Varnhagen et al., 2009).

Several studies have compared IM and other forms of CMC to other forms of language. Ferrara, Brunner, and Whittemore (1991) determined that CMC possesses uniquely distinguishing linguistic features that display qualities of both written and spoken dialogue. Compared to other standard forms of communication, CMC’s most distinctive trait is its unique, shortened-form language of acronyms and abbreviations, and an informal discursive style that is similar to face-to-face spoken language (Werry, 1996). CMC differs from spoken communication, however, in its lack of cues from features such as body language, communicative pauses, and vocal tones (Hentschel, 1998). Despite this absence of cues, however, CMC has been found to be able to communicate emotion as well as or better than face-to-face communication (Derks, Fischer, & Bos, 2008).

Although CMC has been compared to other forms of communication, few studies have compared different forms of CMC to one another. Perhaps the most noteworthy of these studies is Baron’s (2007) comparison of the linguistic characteristics of IM and text (or SMS) messages – another form of CMC – among American college students, which found that the average text message contained more words, characters, sentences, abbreviations, and contractions than the average instant message.

To our knowledge, however, no research has investigated the linguistic characteristics of a different form of CMC: chatbot communication. Chatbots, or chatterbots, are another widespread domain of CMC. Chatbots are “machine conversation system[s] [that] interact with human users via natural conversational language” (Shawar & Atwell, 2005, p. 489). Users interact with these applications primarily to engage in small talk. Functionally, their approach to natural language processing is an extension of the same technique used in Weizenbaum’s ELIZA (Weizenbaum, 1966). A variety of new chatbot architectures and technologies (e.g., Ultra Hal, ALICE, Jabberwacky, Cleverbot) have arisen recently, each attempting to simulate natural human language more accurately and thoroughly (Carpenter, n.d. a; Shawar & Atwell, 2007; Wallace et al., 2003; Zabaware, n.d.).

Despite the popularity of chatbots today, we are not aware of any research analyzing how humans converse with them, particularly from a linguistic perspective. Several extant studies on chatbots have focused on developing or improving their ability to interpret and respond meaningfully to human language: one study examined a chatbot’s ability to respond correctly when faced with common CMC features like abbreviations and overlapping utterances from multiple speakers (Shawar & Atwell, 2005), while another examined a chatbot’s robustness when faced with unconventional linguistic features from non-native ESL speakers (such as misspellings and incorrect word order) (Coniam, 2008). Another area of research has focused on evaluating users’ attribution of human qualities or personality traits to the chatbots they converse with, and how that may lead to greater disclosure in medical, research, or therapeutic settings (Hasler et al., 2013; Holtgraves et al., 2007; Lortie & Guitton, 2011). Lortie and Guitton (2011) specifically investigated how judges go about distinguishing between humans and computers when interacting in a formal Turing Test. They tracked several descriptive and cognitive parameters along with indicators of interest using the Linguistic Inquiry and Word Count program (LIWC; Pennebaker, Chung, Ireland, Gonzales, & Booth, 2007). Their results suggested that judges’ determinations of humanness were associated with communication that contains more words per message, a higher percentage of articles, and a higher percentage of words that were longer than six letters. Such communication, however, is biased by the goal-directed behavior of judges trying to figure out if they are talking to a computer.

Current study

The purpose of this study was to investigate how users’ explicit and implicit transfer and expectations of human language are manifested in human–computer interaction. Specifically, we sought to answer an unexplored question in the fields of both computer-mediated communication and chatbot development: do humans communicate differently when they know their conversational partner is a computer as opposed to another human being?

To accomplish this, we compared 100 random human IM conversations

Participants and procedure

After obtaining approval to conduct this research from the Institutional Review Board, we collected data from two different communication sources. For the human–human conversations we collected IM conversations from a convenience sample of undergraduate and graduate students at a liberal arts college in the mid-Atlantic region. We asked participants to submit unaltered conversations from past IM sessions. All participant names and screen names were replaced with unique user numbers, ensuring

Linguistic variables

The following are the results of the MANOVA for the seven dependent variables. Table 1 presents the means, standard deviations, and subsequent univariate ANOVA results analyzing the seven variables by communication type. Of the seven variables, four were statistically significant: messages per conversation, words per message, type/token ratio, and frequency of profanity. There was a significantly greater number of human–chatbot messages and profanity in these messages, but with
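As an illustration of how the length and vocabulary measures named in this section can be computed for a single conversation, the following is a minimal sketch. It is not the authors' procedure: the tokenizer, the function names, and the example messages are hypothetical, and the type/token ratio is taken in its standard sense of unique words divided by total words. Profanity, shorthand, and emoticon counts are omitted because they depend on word lists that are not reproduced here.

```python
import re
from typing import Dict, List

# Assumed tokenizer (not from the paper): lowercase runs of letters,
# digits, and apostrophes count as words.
_WORD = re.compile(r"[a-z0-9']+")


def tokenize(message: str) -> List[str]:
    """Split one message into lowercase word tokens."""
    return _WORD.findall(message.lower())


def conversation_measures(messages: List[str]) -> Dict[str, float]:
    """Compute length and vocabulary measures for one conversation.

    `messages` holds the messages sent by one participant, in order.
    """
    tokens = [tok for msg in messages for tok in tokenize(msg)]
    n_messages = len(messages)
    n_words = len(tokens)
    return {
        "messages_per_conversation": n_messages,
        "words_per_conversation": n_words,
        # Guard against empty conversations to avoid division by zero.
        "words_per_message": n_words / n_messages if n_messages else 0.0,
        # Type/token ratio: distinct words (types) over total words (tokens).
        "type_token_ratio": len(set(tokens)) / n_words if n_words else 0.0,
    }


# Hypothetical example conversation, invented for illustration only.
example = ["hi there", "how are you", "hi again"]
measures = conversation_measures(example)
```

In this sketch a lower type/token ratio indicates a more repetitive, restricted vocabulary. Because the ratio tends to fall as a text gets longer, conversations of very different lengths may need a length-adjusted variant before they are compared.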

Discussion

The purpose of this study was to compare human to human conversations with human to chatbot conversations. One hypothesis was that the average human would send fewer messages and write fewer words per message when sending messages to chatbots than when communicating with other humans. While messages sent to chatbots did contain fewer words per message than those sent to another person, as predicted, people were actually inclined to send more than twice as many messages to chatbots compared to

Limitations

There were several limitations to this study. Because our human–human corpus was derived from conversations submitted by volunteers, it is difficult to assess how representative they were of natural human conversation. Although we ensured complete anonymity to all volunteers, there is no guarantee that the conversations submitted were not first censored or otherwise altered in some way. Additionally, the human volunteers were not restricted to submitting a single conversation, so our corpus of

Future research

Follow-up research could include a parts of speech analysis that would allow for a detailed comparison of the grammatical structures of the conversations in each corpus. This analysis should isolate individual sentences in both seriatim fragmented messages and messages containing multiple sentences. Another area of research could investigate the differences in conversation length by analyzing the amount of physical time spent in each conversation as well as the length of silences between

Conclusion

Whenever we interact with our environment or with other people, we put to use our adaptive processes. This is how we can walk in our house in the dark, search for items in an unfamiliar grocery store, or converse with a stranger. While these adaptive processes allow us to successfully negotiate the novel and unexpected events in our lives, they nonetheless come with a cost. Such situations require that we pay more attention, draw more from the history of our experiences, and be ready to change

Conflict of Interest

The authors declare that they have no conflict of interest.

References (32)

  • D. Craig

    Instant messaging: The language of youth literacy

    The Booth Prize Essays

    (2003)
  • D. Driscoll

    The Ubercool morphology of internet gamers: A linguistic analysis

    Undergraduate Research Journal for the Human Sciences

    (2002)
  • C.A. Ferguson

    Toward a characterization of English foreigner talk

    Anthropological Linguistics

    (1975)
  • K. Ferrara et al.

    Interactive written discourse as an emergent register

    Written Communication

    (1991)
  • Gianotto, A. (n.d.). BanBuilder banned – Word list generator....
  • E. Hentschel

    Communication on IRC

    Linguistik Online

    (1998)