Review
Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis

https://doi.org/10.1016/j.paid.2017.12.018

Highlights

  • This is a meta-analysis on the use of social media data to predict Big 5 traits.

  • We investigate use of different digital footprints including text and pictures.

  • Accuracy of prediction is consistent across Big 5 traits.

  • Use of multiple types of digital footprints increases prediction accuracy.

Abstract

The growing use of social media among Internet users produces a vast and new source of user-generated ecological data, such as textual posts and images, which can be collected for research purposes. The increasing convergence between social and computer sciences has led researchers to develop automated methods to extract and analyze these digital footprints to predict personality traits. These social media-based predictions can then be used for a variety of purposes, including tailoring online services to improve user experience, enhancing recommender systems, and serving as a possible screening and implementation tool for public health. In this paper, we conduct a series of meta-analyses to determine the predictive power of digital footprints collected from social media over Big 5 personality traits. Further, we investigate the impact of different types of digital footprints on prediction accuracy. Results of analyses show that the predictive power of digital footprints over personality traits is in line with the standard “correlational upper-limit” for behavior to predict personality, with correlations ranging from 0.29 (Agreeableness) to 0.40 (Extraversion). Overall, our findings indicate that accuracy of predictions is consistent across Big 5 traits, and that accuracy improves when analyses include demographics and multiple types of digital footprints.

Introduction

Social media and social network sites have become increasingly popular; currently about 2 billion people worldwide have a Facebook account, and over 1.25 billion users access Facebook on a daily basis (Statista, 2017). Similarly, Twitter averages about 328 million active users (Statista, 2017), with about 100 million daily users (Aslam, 2017). Social media has revolutionized how people interact with each other, has become a virtually unavoidable avenue for social interactions, and is a place where users present themselves to the world by creating an online profile. Every day, millions of people express their immediate thoughts, emotions, and beliefs by writing, posting, and sharing content on social media, which is then viewable by the user's online social network. Evidence also suggests that content generated and shared on social media user profiles represents an extension of “one's self” and reflects the actual personality of individual users rather than projecting their most desirable traits (Back et al., 2010, Seidman, 2013). Consequently, the interactive nature of social media coupled with its ever-increasing utilization results in a naturally occurring, immense, ecologically valid dataset of online human activity, or digital footprints, consisting of information shared by users on their social media profiles - e.g., personal information about age, gender orientation, place of residence, as well as shared texts, pictures, and videos (Madden, Fox, Smith, & Vitak, 2007). These digital footprints can be recorded, and have been previously analyzed by researchers from diverse disciplines, including computer science, public health, and social sciences (e.g., De Choudhury et al., 2013, De Choudhury et al., 2014, Eichstaedt et al., 2015, Gosling et al., 2011, Matz and Netzer, 2017, Padrez et al., 2015, Settanni and Marengo, 2015).
In particular, the human migration to social media has steered psychologists toward studying existing relationships between digital footprints and psychological characteristics (Kosinski, Matz, Gosling, Popov, & Stillwell, 2015). The emergence of, and access to, these large user data sets has reshaped the way social science researchers use content analysis to study psychological characteristics and has resulted in the convergence of social and computer sciences. This interdisciplinary work has allowed researchers not only to gain insights from studying human behaviors on social media, but also to predict psychological characteristics and behaviors based on automated data mining and the analysis of digital footprints (Schwartz & Ungar, 2015).

Personality has been regarded as one of the most important topics in psychological research (Li et al., 2014, Ozer and Benet-Martinez, 2006). Research has shown that personality may be predictive of many aspects of life, including academic success (e.g., Komarraju, Karau, & Schmeck, 2009), job performance (e.g., Judge et al., 1999, Neal et al., 2012), social status (e.g., Anderson, John, Keltner, & Kring, 2001), health (e.g., Soldz & Vaillant, 1999), success in romantic relationships (e.g., Donnellan et al., 2004, Donnellan et al., 2005), political attitudes (e.g., Gerber, Huber, Doherty, Dowling, & Ha, 2010), subjective well-being (e.g., Hayes & Joseph, 2003), and online behaviors (e.g., Wang, 2013). While several models to describe personality exist, one of the most well researched, well regarded, and widely accepted theoretical frameworks of personality is the five-factor (or Big 5) model, comprised of openness to new experiences, conscientiousness, extraversion, agreeableness and neuroticism (McCrae and Costa, 1987, McCrae and John, 1992). Big 5 traits have been shown to be significantly associated with users' behaviors on social media. For example, individuals with high extraversion have been characterized by higher levels of activity on social media (e.g., Blackwell et al., 2017, Kuss and Griffiths, 2011), and have a greater number of friends (Kosinski, Bachrach, Kohli, Stillwell, & Graepel, 2014) than introverted individuals. Individuals with high neuroticism are more prone to self-disclose hidden aspects of themselves, use social media as a passive way to learn about others (Seidman, 2013), and use more negative words in their posts, or ‘status updates’ (Schwartz et al., 2013). On the other hand, agreeable individuals tend to use fewer swear words and express positive emotions more frequently in their posts (Schwartz et al., 2013), and are more likely to post pictures expressing a positive mood (Liu, Preotiuc-Pietro, Samani, Moghaddam, & Ungar, 2016). 
Individuals with high conscientiousness appear to be cautious in managing their social media profiles; they tend to post fewer pictures (Amichai-Hamburger & Vinitzky, 2010), express fewer “Likes”, and engage in less group activity on social media (Kosinski et al., 2014). Furthermore, individuals with high openness tend to have larger networks (Quercia, Lambiotte, Stillwell, Kosinski, & Crowcroft, 2012), and “Like” more content found on social media (Bachrach, Kosinski, Graepel, Kohli, & Stillwell, 2012) than individuals low on the trait.

Driven by increasing evidence of the presence of links between personality and online behaviors, researchers have begun exploring the use of digital footprints left by people on social media to infer the Big 5 traits. Researchers in this field have generally employed a common research design consisting of four steps: (1) the administration of self-report questionnaires to assess personality traits of social media users, (2) the collection of digital footprints from users' social media profiles, (3) the processing of these digital footprints to extract single or multiple features to be employed in predictive models, and (4) the evaluation of the accuracy of personality predictions based on these features. However, studies vary in terms of the type of digital footprints (e.g., text, pictures, Likes, user activity, which may be examined separately or in combination) and the social media platforms (e.g., Facebook, Twitter, Instagram, YouTube) examined. For instance, Schwartz et al. (2013) investigated the feasibility of predicting personality traits based on textual features extracted from Facebook status updates using topic-modeling techniques. Similarly, Liu et al. (2016) and Qiu, Lin, Ramsay, and Yang (2012) both analyzed language/text used on Twitter to build predictive models for the Big 5 traits. Gao et al. (2013), Li et al. (2014), and Wei et al. (2017) inferred the Big 5 traits using samples from the Sina Weibo microblog, albeit using different combinations of digital footprints (activity vs. activity + language vs. activity + language + pictures) in their analyses. Additionally, Kosinski, Stillwell, and Graepel (2013) and Youyou, Kosinski, and Stillwell (2015) explored Big 5 personality predictions based on Facebook Likes.

Findings emerging from these studies are heterogeneous with respect to the accuracy of prediction for each personality trait. For instance, using “Likes” data extracted from Facebook, Kosinski et al. (2013) found prediction accuracy to vary significantly across traits, with openness being the easiest to predict. Conversely, Li et al. (2014) analyzed user activity statistics from the Sina Weibo microblog and achieved similar prediction accuracy across all Big 5 personality traits, while Skowron, Tkalčič, Ferwerda, and Schedl (2016) analyzed language + user features from users of both Twitter and Instagram and found a high prediction accuracy for conscientiousness, but a relatively low prediction accuracy for agreeableness. Even though many studies have been conducted on the subject, this area of psychological research is still quite young, which in part explains the lack of uniformity in the employed research methods. For example, studies vary widely in sample size, type of digital footprints analyzed, and social media platform used for data collection. Given this state of psychological research on social media, there is a need to synthesize and summarize the existing literature in order to evaluate the accuracy of these approaches and to recommend best methods for personality prediction from social media.
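
As a rough illustration of the four-step design described above, the following Python sketch simulates self-reported trait scores and a single extracted footprint feature, fits a one-predictor linear model on half of the users, and reports accuracy as the Pearson correlation between predicted and self-reported scores in the held-out half. Everything here is an assumption made for illustration: the data are synthetic, the function names (`simulate_users`, `fit_line`, `holdout_accuracy`) are hypothetical, and the reviewed studies use far richer features and models.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate_users(n, seed=0):
    """Stand-in for steps 1-3: synthetic (feature, self_report) pairs.

    HYPOTHETICAL data: the feature is a noisy linear function of the trait.
    """
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        trait = rng.gauss(3.0, 0.7)                # step 1: self-reported score
        feature = 10 * trait + rng.gauss(0, 14)    # steps 2-3: extracted feature
        users.append((feature, trait))
    return users


def holdout_accuracy(users):
    """Step 4: correlation between predicted and observed scores, out of sample."""
    half = len(users) // 2
    train, test = users[:half], users[half:]
    slope, intercept = fit_line([f for f, _ in train], [t for _, t in train])
    predicted = [slope * f + intercept for f, _ in test]
    observed = [t for _, t in test]
    return pearson_r(predicted, observed)


if __name__ == "__main__":
    r = holdout_accuracy(simulate_users(1000))
    print(f"held-out prediction accuracy (Pearson r): {r:.2f}")
```

Evaluating on held-out users rather than on the training half keeps the reported correlation from being inflated by overfitting, which matters more as the number of extracted features grows.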

The ability to use digital footprints to accurately predict personality traits may offer a rapid, cost-effective alternative to surveys that can reach larger populations, which can be beneficial for academic, health-related, and commercial purposes. With respect to academic research, the development of automated procedures to measure personality would make it possible to reach larger samples and to obtain measures potentially less prone to social-desirability bias. Furthermore, personality traits have also been shown to act as potential risk and protective factors for many health-related outcomes (Booth-Kewley and Vickers, 1994, Raynor and Levine, 2009, Widiger and Oltmanns, 2017), and to influence beliefs about health (e.g., Hill & Gick, 2011). Therefore, the ability to distinguish online users based on their personality profiles could be leveraged in order to tailor techniques aimed at improving the efficacy of health-related messages (Gale et al., 2015, Lawson et al., 2007, Neeme et al., 2015, Rimer and Kreuter, 2006) and individual interventions (Chapman et al., 2014, Franks et al., 2009) directed at online populations, and thus assist in the effective implementation of public health policies (Chapman et al., 2011, Hengartner et al., 2016). With respect to commercial applications, knowledge about individuals' personalities can allow for the enhancement and personalization of recommender systems in order to improve user experiences (Bachrach et al., 2012, Farnadi et al., 2016). Also, social media sites, online advertisers, e-commerce retailers, and e-learning websites may be tailored to individual personality so as to present information in ways that will be better received by users (Bachrach et al., 2012, Gao et al., 2013, Golbeck et al., 2011, Kosinski et al., 2013, Markovikj et al., 2013).

The aim of the current study is to conduct a series of meta-analyses to estimate the mean predictive value of digital footprints on each of the Big 5 personality traits. Further, we aim to study whether the use of different types of digital footprints influences the accuracy of personality prediction, and whether data from different social media platforms lead to different results. Lastly, we will check for possible bias in effect size estimates due to study quality.
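
To make the idea of a "mean predictive value" concrete, the sketch below pools study-level correlations with a random-effects model: each correlation is Fisher z-transformed, between-study variance is estimated with the DerSimonian-Laird method, and the pooled estimate is back-transformed to r. This is one common approach and is offered only as an assumption; the text above does not state which pooling model was used. The function name `pool_correlations` and the (r, n) values in the usage example are hypothetical and are not taken from the included studies.

```python
import math


def pool_correlations(studies, z_crit=1.96):
    """Random-effects pooling of correlations (DerSimonian-Laird).

    studies: list of (r, n) pairs, one per study, with n > 3.
    Returns the pooled r, its confidence interval, tau^2, and Cochran's Q.
    """
    z = [math.atanh(r) for r, _ in studies]        # Fisher z transform
    v = [1.0 / (n - 3) for _, n in studies]        # sampling variance of z
    w = [1.0 / vi for vi in v]                     # fixed-effect weights
    sw = sum(w)

    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(studies)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    w_re = [1.0 / (vi + tau2) for vi in v]         # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(z_re - z_crit * se), math.tanh(z_re + z_crit * se)),
        "tau2": tau2,
        "q": q,
    }


if __name__ == "__main__":
    # HYPOTHETICAL (r, n) pairs for a single trait.
    example = [(0.20, 150), (0.35, 300), (0.50, 90)]
    result = pool_correlations(example)
    print(f"pooled r = {result['r']:.2f}, "
          f"95% CI = ({result['ci'][0]:.2f}, {result['ci'][1]:.2f}), "
          f"tau^2 = {result['tau2']:.3f}")
```

Running the same function separately on subsets of studies (for example, grouped by type of digital footprint or by platform) gives a simple way to compare pooled estimates across moderators, although a formal moderator test would require additional steps not shown here.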

Literature search

To identify relevant studies on the relationships between Big 5 personality traits and digital footprints, we followed the literature search strategies proposed by Durlak and Lipsey (1991). We conducted a broad literature search in databases from various disciplines (Scopus, ISI Web of Science, PubMed, and ProQuest) using multiple groups of keywords. The first group of keywords referred to social media platforms, namely: myspace, facebook, instagram, twitter, youtube, photobucket,

Overview of included studies

In total, we identified 24 papers focusing on the analysis of digital footprints extracted from social media and Big 5 personality traits. Selected papers included 28 studies in which Big 5 personality traits were assessed using versions of the Big 5 Personality Inventory and IPIP measures. Nineteen studies obtained their samples from Facebook, five from Twitter, and three from the Sina Weibo micro-blogging site, and one used a combined sample from Instagram and Twitter. Twenty studies analyzed a single

Discussion

To our knowledge, this is the first meta-analysis aimed at summarizing findings from studies investigating the predictability of Big 5 personality traits based on digital footprints automatically extracted from social media. Our first aim was to estimate the mean predictive value of digital footprints over each trait. Overall, prediction of Big 5 traits based on the analysis of digital footprints from social media ranged from 0.29 (agreeableness) to 0.40 (extraversion), with no significant

Limitations of the study

The present study is not without limitations. First, given the relatively low number of studies investigating diverse social media platforms and the heterogeneity of both the features analyzed and the analytical approaches employed in the studies included in the analysis, we could not perform a thorough comparison of the accuracy of personality prediction across specific social media platforms. The diverse usage, or activities, users partake in while engaging in specific types of social media

Conclusions

Overall, the present meta-analysis demonstrates that Big 5 personality traits can be inferred using digital footprints extracted from social media with remarkable accuracy. The ability to make distinct but similarly accurate predictions of Big 5 traits allows for the identification of social media users with different personality profiles. This information is of utmost relevance since it can be beneficial for research, commercial, and public health purposes. First, the ability to assess

References (101)

  • E.M. Hill et al.

    The big five and cervical screening barriers: Evidence for the influence of conscientiousness, extraversion and openness

    Personality and Individual Differences

    (2011)
  • M. Komarraju et al.

    Role of the Big Five personality traits in predicting college students' academic motivation and achievement

    Learning and Individual Differences

    (2009)
  • S.C. Matz et al.

    Using big data as a window into consumers' psychology

    Current Opinion in Behavioral Sciences

    (2017)
  • L. Qiu et al.

    You are what you tweet: Personality expression and perception on Twitter

    Journal of Research in Personality

    (2012)
  • R. Rosenthal

    The file drawer problem and tolerance for null results

    Psychological Bulletin

    (1979)
  • G. Seidman

    Self-presentation and belonging on Facebook: How personality influences social media use and motivations

    Personality and Individual Differences

    (2013)
  • S. Soldz et al.

    The Big Five personality traits and the life course: A 45-year longitudinal study

    Journal of Research in Personality

    (1999)
  • J.A. Sterne et al.

    Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis

    Journal of Clinical Epidemiology

    (2001)
  • C. Anderson et al.

    Who attains social status? Effects of personality and physical attractiveness in social groups

    Journal of Personality and Social Psychology

    (2001)
  • S. Aslam

    Twitter by the numbers: Stats, demographics & fun facts

    (2017)
  • Y. Bachrach et al.

    Personality and patterns of Facebook usage

  • M.D. Back et al.

    Facebook profiles reflect actual personality, not self-idealization

    Psychological Science

    (2010)
  • B. Barrett

    You should go check Facebook's new privacy settings

    Wired

    (2016)
  • C.B. Begg et al.

    Operating characteristics of a rank correlation test for publication bias

    Biometrics

    (1994)
  • S. Booth-Kewley et al.

    Associations between major domains of personality and health behavior

    Journal of Personality

    (1994)
  • M. Borenstein et al.

    Meta-regression

  • N.A. Bowling et al.

    Workplace harassment from the victim's perspective: A theoretical model and meta-analysis

    (2006)
  • C. Cadwalladr

    The great British Brexit robbery: How our democracy was hijacked

  • F. Celli et al.

    Automatic personality and interaction style recognition from Facebook profile pictures

  • B.P. Chapman et al.

    Personality-informed interventions for healthy aging: Conclusions from a National Institute on Aging work group

    Developmental Psychology

    (2014)
  • B.P. Chapman et al.

    Personality and longevity: Knowns, unknowns, and implications for public health and personalized medicine

    Journal of Aging Research

    (2011)
  • Cisco

    Cisco visual networking index: Global mobile data traffic forecast update, 2016–2021

    (2017)
  • N. Confessore et al.

    Data firm says ‘secret sauce’ aided trump: Many scoff

  • M. De Choudhury et al.

    Social media as a measurement tool of depression in populations

  • M. De Choudhury et al.

    Characterizing and predicting postpartum depression from shared Facebook data

  • M.B. Donnellan et al.

    Personality, family history, and competence in early adult romantic relationships

    Journal of Personality and Social Psychology

    (2005)
  • J.A. Durlak et al.

    A practitioner's guide to meta-analysis

    American Journal of Community Psychology

    (1991)
  • S. Duval et al.

    Trim and fill: A simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis

    Biometrics

    (2000)
  • J.C. Eichstaedt et al.

    Psychological language on Twitter predicts county-level heart disease mortality

    Psychological Science

    (2015)
  • G. Farnadi et al.

    Computational personality recognition in social media

    User Modeling and User-Adapted Interaction

    (2016)
  • P. Franks et al.

    Five factor model personality factors moderated the effects of an intervention to enhance chronic disease management self-efficacy

    British Journal of Health Psychology

    (2009)
  • R. Fu et al.

    Conducting quantitative synthesis when comparing medical interventions

  • C.R. Gale et al.

    Cognitive ability and personality as predictors of participation in a national colorectal cancer screening programme: The English Longitudinal Study of Ageing

    Journal of Epidemiology and Community Health

    (2015)
  • R. Gao et al.

    Improving user profile with personality traits predicted from social media content

  • A.S. Gerber et al.

    Personality and political attitudes: Relationships across issue domains and political contexts

    American Political Science Review

    (2010)
  • J. Golbeck

    Predicting personality from social media text

    AIS Transactions on Replication Research

    (2016)
  • J. Golbeck et al.

    Predicting personality with social media

  • S.D. Gosling et al.

    Manifestations of personality in online social networks: Self-reported Facebook-related behaviors and observable profile information

    Cyberpsychology, Behavior, and Social Networking

    (2011)
  • M.S. Hershcovis

    “Incivility, social undermining, bullying… oh my!”: A call to reconcile constructs within workplace aggression research

    Journal of Organizational Behavior

    (2011)
  • J.E. Hunter et al.

    Meta-analysis: Cumulating research findings across studies

    (1982)