Review
Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis
Introduction
Social media and social network sites have become increasingly popular; currently about 2 billion people worldwide have a Facebook account, and over 1.25 billion users access Facebook on a daily basis (Statista, 2017). Similarly, Twitter averages about 328 million active users (Statista, 2017), with about 100 million daily users (Aslam, 2017). Social media has revolutionized how people interact with each other, has become a virtually unavoidable avenue for social interactions, and is a place where users present themselves to the world by creating an online profile. Every day, millions of people express their immediate thoughts, emotions, and beliefs by writing, posting, and sharing content on social media, which is then viewable by the user's online social network. Evidence also suggests that content generated and shared on social media user profiles represents an extension of “one's self” and reflects the actual personality of individual users rather than a projection of their most desirable traits (Back et al., 2010, Seidman, 2013). Consequently, the interactive nature of social media, coupled with its ever-increasing utilization, results in a naturally occurring, immense, ecologically valid dataset of online human activity, or digital footprints, consisting of information shared by users on their social media profiles, e.g., personal information about age, gender orientation, and place of residence, as well as shared texts, pictures, and videos (Madden, Fox, Smith, & Vitak, 2007). These digital footprints can be recorded, and have been analyzed by researchers from diverse disciplines, including computer science, public health, and social sciences (e.g., De Choudhury et al., 2013, De Choudhury et al., 2014, Eichstaedt et al., 2015, Gosling et al., 2011, Matz and Netzer, 2017, Padrez et al., 2015, Settanni and Marengo, 2015).
In particular, the human migration to social media has steered psychologists toward studying the relationships between digital footprints and psychological characteristics (Kosinski, Matz, Gosling, Popov, & Stillwell, 2015). The emergence of, and access to, these large user data sets has reshaped the way social science researchers use content analysis to study psychological characteristics and has resulted in the convergence of social and computer sciences. This interdisciplinary work has allowed researchers not only to gain insights from studying human behaviors on social media, but also to predict psychological characteristics and behaviors based on automated data mining and the analysis of digital footprints (Schwartz & Ungar, 2015).
Personality has been regarded as one of the most important topics in psychological research (Li et al., 2014, Ozer and Benet-Martinez, 2006). Research has shown that personality may be predictive of many aspects of life, including academic success (e.g., Komarraju, Karau, & Schmeck, 2009), job performance (e.g., Judge et al., 1999, Neal et al., 2012), social status (e.g., Anderson, John, Keltner, & Kring, 2001), health (e.g., Soldz & Vaillant, 1999), success in romantic relationships (e.g., Donnellan et al., 2004, Donnellan et al., 2005), political attitudes (e.g., Gerber, Huber, Doherty, Dowling, & Ha, 2010), subjective well-being (e.g., Hayes & Joseph, 2003), and online behaviors (e.g., Wang, 2013). While several models to describe personality exist, one of the most well-researched, well-regarded, and widely accepted theoretical frameworks of personality is the five-factor (or Big 5) model, comprising openness to new experiences, conscientiousness, extraversion, agreeableness, and neuroticism (McCrae and Costa, 1987, McCrae and John, 1992). Big 5 traits have been shown to be significantly associated with users' behaviors on social media. For example, individuals with high extraversion show higher levels of activity on social media (e.g., Blackwell et al., 2017, Kuss and Griffiths, 2011) and have a greater number of friends (Kosinski, Bachrach, Kohli, Stillwell, & Graepel, 2014) than introverted individuals. Individuals with high neuroticism are more prone to self-disclose hidden aspects of themselves, to use social media as a passive way to learn about others (Seidman, 2013), and to use more negative words in their posts, or ‘status updates’ (Schwartz et al., 2013). On the other hand, agreeable individuals tend to use fewer swear words and express positive emotions more frequently in their posts (Schwartz et al., 2013), and are more likely to post pictures expressing a positive mood (Liu, Preotiuc-Pietro, Samani, Moghaddam, & Ungar, 2016).
Individuals with high conscientiousness appear to be cautious in managing their social media profiles; they tend to post fewer pictures (Amichai-Hamburger & Vinitzky, 2010), express fewer “Likes”, and engage in less group activity on social media (Kosinski et al., 2014). Furthermore, individuals with high openness tend to have larger networks (Quercia, Lambiotte, Stillwell, Kosinski, & Crowcroft, 2012), and “Like” more content found on social media (Bachrach, Kosinski, Graepel, Kohli, & Stillwell, 2012) than individuals low on the trait. Driven by increasing evidence of links between personality and online behaviors, researchers have begun exploring the use of digital footprints left by people on social media to infer the Big 5 traits. Researchers in this field have generally employed a common research design consisting of four steps: (1) the administration of self-report questionnaires to assess personality traits of social media users; (2) the collection of digital footprints from users' social media profiles; (3) the processing of these digital footprints to extract single or multiple features to be employed in predictive models; and (4) the evaluation of the accuracy of personality predictions based on these features. However, studies vary in terms of the type of digital footprints (e.g., text, pictures, Likes, user activity, which may be examined separately or in combination) and the social media platforms (e.g., Facebook, Twitter, Instagram, YouTube) examined. For instance, Schwartz et al. (2013) investigated the feasibility of predicting personality traits based on textual features extracted from Facebook status updates using topic-modeling techniques. Similarly, Liu et al. (2016) and Qiu, Lin, Ramsay, and Yang (2012) both analyzed language/text used on Twitter to build predictive models for the Big 5 traits. Gao et al. (2013), Li et al. (2014), and Wei et al. (2017) inferred the Big 5 traits using samples from the Sina Weibo microblog, albeit using different combinations of digital footprints (activity vs. activity + language vs. activity + language + pictures) in their analyses. Additionally, Kosinski, Stillwell, and Graepel (2013) and Youyou, Kosinski, and Stillwell (2015) explored Big 5 personality predictions based on Facebook Likes. Findings emerging from these studies are heterogeneous with respect to the accuracy of prediction for each personality trait. For instance, using “Likes” data extracted from Facebook, Kosinski et al. (2013) found prediction accuracy to vary significantly across traits, with openness being the easiest to predict. Conversely, Li et al. (2014) analyzed user activity statistics from the Sina Weibo microblog and achieved similar prediction accuracy across all Big 5 personality traits, and Skowron, Tkalčič, Ferwerda, and Schedl (2016) analyzed language + user features from users of both Twitter and Instagram and found a high prediction accuracy for conscientiousness, but a relatively low prediction accuracy for agreeableness. Even though many studies have been conducted on the subject, this area of psychological research is still quite young, which in part explains the lack of uniformity in the research methods employed. For example, studies vary widely in sample size, type of digital footprints analyzed, and social media platform used for data collection. Given this heterogeneity, there is a need to synthesize and summarize the existing literature in order to evaluate the accuracy of these predictions and to recommend best methods for personality prediction from social media.
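The four-step design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the users, their three numeric "footprint" features, and their self-report scores are synthetic, the weights in `simulate_users` are arbitrary, and the linear model fitted by gradient descent is only a stand-in for whatever predictive model a given study used. The point is the evaluation logic in step 4, where accuracy is expressed as the Pearson correlation between predicted and self-reported trait scores on held-out users.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_users(n, rng):
    """Stand-in for steps 1-3: synthetic features and a synthetic self-report score."""
    true_w = [0.5, 0.3, 0.0]  # arbitrary illustrative weights, not empirical values
    X, y = [], []
    for _ in range(n):
        feats = [rng.gauss(0, 1) for _ in true_w]
        score = sum(w * f for w, f in zip(true_w, feats)) + rng.gauss(0, 1)
        X.append(feats)
        y.append(score)
    return X, y


def fit_linear(X, y, lr=0.05, epochs=300):
    """Least-squares linear model fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b


def predict(model, X):
    """Apply a fitted (weights, intercept) pair to new feature vectors."""
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]


rng = random.Random(0)
X, y = simulate_users(600, rng)
# Step 4: fit on one group of users, evaluate on users the model has not seen.
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]
model = fit_linear(X_train, y_train)
accuracy_r = pearson_r(predict(model, X_test), y_test)
print(f"held-out r = {accuracy_r:.2f}")
```

Evaluating on held-out users rather than on the training users matters here: a correlation computed on the same users used for fitting would tend to overstate prediction accuracy.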
The ability to use digital footprints to accurately predict personality traits may represent a rapid, cost-effective alternative to surveys and make it possible to reach larger populations, which can be beneficial for academic, health-related, and commercial purposes. With respect to academic research, the development of automated procedures to measure personality would make it possible to reach larger samples and to obtain measures potentially less prone to social-desirability bias. Furthermore, personality traits have also been shown to act as potential risk and protective factors for many health-related outcomes (Booth-Kewley and Vickers, 1994, Raynor and Levine, 2009, Widiger and Oltmanns, 2017), and to influence beliefs about health (e.g., Hill & Gick, 2011). Therefore, the ability to distinguish online users based on their personality profiles could be leveraged to tailor techniques aimed at improving the efficacy of health-related messages (Gale et al., 2015, Lawson et al., 2007, Neeme et al., 2015, Rimer and Kreuter, 2006) and individual interventions (Chapman et al., 2014, Franks et al., 2009) directed at online populations, and thus assist in the effective implementation of public health policies (Chapman et al., 2011, Hengartner et al., 2016). With respect to commercial applications, knowledge about individuals' personalities can allow for the enhancement and personalization of recommender systems in order to improve user experiences (Bachrach et al., 2012, Farnadi et al., 2016). Also, social media sites, online advertisers, e-commerce retailers, and e-learning websites may tailor their content to individual personality and present information in ways that will be better received by users (Bachrach et al., 2012, Gao et al., 2013, Golbeck et al., 2011, Kosinski et al., 2013, Markovikj et al., 2013).
The aim of the current study is to conduct a series of meta-analyses to estimate the mean predictive value of digital footprints for each of the Big 5 personality traits. Further, we aim to examine whether the use of different types of digital footprints influences the accuracy of personality prediction, and whether data from different social media platforms lead to different results. Lastly, we check for possible bias in effect size estimates due to study quality.
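To make the idea of a "mean predictive value" concrete, the sketch below pools study-level correlations into a single estimate. It is an illustration under stated assumptions, not a description of the procedure used in this paper: the excerpt does not specify the pooling model, so a common choice is assumed here (Fisher's z transformation with a DerSimonian-Laird random-effects estimate of between-study variance). The function name `pool_correlations` and the study values in the usage example are hypothetical.

```python
from math import atanh, tanh, sqrt


def pool_correlations(rs, ns):
    """Random-effects (DerSimonian-Laird) pooling of correlations via Fisher's z.

    rs: per-study correlations between predicted and self-reported trait scores
    ns: per-study sample sizes (each must be greater than 3)
    """
    zs = [atanh(r) for r in rs]              # Fisher z transform of each r
    vs = [1.0 / (n - 3) for n in ns]         # sampling variance of each z
    w = [1.0 / v for v in vs]                # fixed-effect (inverse-variance) weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]    # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    return {
        "r": tanh(z_re),                     # pooled estimate, back on the r scale
        "ci": (tanh(z_re - 1.96 * se), tanh(z_re + 1.96 * se)),
        "tau2": tau2,
        "q": q,
    }


# Hypothetical study-level results, for illustration only.
example_rs = [0.25, 0.35, 0.40, 0.30]
example_ns = [200, 1500, 800, 350]
pooled = pool_correlations(example_rs, example_ns)
print(f"pooled r = {pooled['r']:.2f}, 95% CI = "
      f"({pooled['ci'][0]:.2f}, {pooled['ci'][1]:.2f})")
```

The same function could be applied separately to subsets of studies (for example, grouped by type of digital footprint or by platform) to obtain subgroup estimates, which is one simple way to explore the moderator questions raised above.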
Literature search
To identify relevant studies on the relationships between Big 5 personality traits and digital footprints, we followed the literature search strategies proposed by Durlak and Lipsey (1991). We conducted a broad literature search in databases from various disciplines, i.e., Scopus, ISI Web of Science, PubMed, and ProQuest, using multiple groups of keywords. The first group of keywords referred to social media platforms, namely: myspace, facebook, instagram, twitter, youtube, photobucket,
Overview of included studies
In total, we identified 24 papers focusing on the analysis of digital footprints extracted from social media and Big 5 personality traits. Selected papers included 28 studies in which Big 5 personality traits were assessed using versions of the Big 5 Personality Inventory and IPIP measures. Nineteen studies obtained their samples from Facebook, five from Twitter, three from the Sina Weibo microblogging site, and one article used a combined sample from Instagram and Twitter. Twenty studies analyzed a single
Discussion
To our knowledge, this is the first meta-analysis aimed at summarizing findings from studies investigating the predictability of Big 5 personality traits based on digital footprints automatically extracted from social media. Our first aim was to estimate the mean predictive value of digital footprints over each trait. Overall, prediction of Big 5 traits based on the analysis of digital footprints from social media ranged from 0.29 (agreeableness) to 0.40 (extraversion), with no significant
Limitations of the study
The present study is not without limitations. First, given the relatively low number of studies investigating diverse social media platforms and the heterogeneity of both the features analyzed and the analytical approaches employed in the studies included in the analysis, we could not perform a thorough comparison of the accuracy of personality prediction across specific social media platforms. The diverse usage, or activities, users partake in while engaging in specific types of social media
Conclusions
Overall, the present meta-analysis demonstrates that Big 5 personality traits can be inferred using digital footprints extracted from social media with remarkable accuracy. The ability to make distinct but similarly accurate predictions of Big 5 traits allows for the identification of social media users with different personality profiles. This information is of utmost relevance since it can be beneficial for research, commercial, and public health purposes. First, the ability to assess
References (101)
- et al. (2010). Social network use and personality. Computers in Human Behavior.
- et al. (2017). Extraversion, neuroticism, attachment style and fear of missing out as predictors of social media use and addiction. Personality and Individual Differences.
- et al. (2004). The Big Five and enduring marriages. Journal of Research in Personality.
- et al. (2003). The diagnostic odds ratio: A single indicator of test performance. Journal of Clinical Epidemiology.
- et al. (2006). The international personality item pool and the future of public domain personality measures. Journal of Research in Personality.
- et al. (1998). Demographic variables and personality: The effects of gender, age, education, and ethnic/racial status on self-descriptions of personality attributes. Personality and Individual Differences.
- et al. (2003). A very brief measure of the Big-Five personality domains. Journal of Research in Personality.
- et al. (2016). Deep learning for visual understanding: A review. Neurocomputing.
- et al. (2003). Big 5 correlates of three measures of subjective well-being. Personality and Individual Differences.
- et al. (2016). Big Five personality traits may inform public health policy and preventive medicine: Evidence from a cross-sectional and a prospective longitudinal epidemiologic study in a Swiss community. Journal of Psychosomatic Research.
- The big five and cervical screening barriers: Evidence for the influence of conscientiousness, extraversion and openness. Personality and Individual Differences.
- Role of the Big Five personality traits in predicting college students' academic motivation and achievement. Learning and Individual Differences.
- Using big data as a window into consumers' psychology. Current Opinion in Behavioral Sciences.
- You are what you tweet: Personality expression and perception on Twitter. Journal of Research in Personality.
- The file drawer problem and tolerance for null results. Psychological Bulletin.
- Self-presentation and belonging on Facebook: How personality influences social media use and motivations. Personality and Individual Differences.
- The Big Five personality traits and the life course: A 45-year longitudinal study. Journal of Research in Personality.
- Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology.
- Who attains social status? Effects of personality and physical attractiveness in social groups. Journal of Personality and Social Psychology.
- Twitter by the numbers: Stats, demographics & fun facts.
- Personality and patterns of Facebook usage.
- Facebook profiles reflect actual personality, not self-idealization. Psychological Science.
- You should go check Facebook's new privacy settings. Wired.
- Operating characteristics of a rank correlation test for publication bias. Biometrics.
- Associations between major domains of personality and health behavior. Journal of Personality.
- Meta-regression.
- Workplace harassment from the victim's perspective: A theoretical model and meta-analysis.
- The great British Brexit robbery: How our democracy was hijacked.
- Automatic personality and interaction style recognition from Facebook profile pictures.
- Personality-informed interventions for healthy aging: Conclusions from a National Institute on Aging work group. Developmental Psychology.
- Personality and longevity: Knowns, unknowns, and implications for public health and personalized medicine. Journal of Aging Research.
- Cisco visual networking index: Global mobile data traffic forecast update, 2016–2021.
- Data firm says ‘secret sauce’ aided trump: Many scoff.
- Social media as a measurement tool of depression in populations.
- Characterizing and predicting postpartum depression from shared Facebook data.
- Personality, family history, and competence in early adult romantic relationships. Journal of Personality and Social Psychology.
- A practitioner's guide to meta-analysis. American Journal of Community Psychology.
- Trim and fill: A simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics.
- Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science.
- Computational personality recognition in social media. User Modeling and User-Adapted Interaction.
- Five factor model personality factors moderated the effects of an intervention to enhance chronic disease management self-efficacy. British Journal of Health Psychology.
- Conducting quantitative synthesis when comparing medical interventions.
- Cognitive ability and personality as predictors of participation in a national colorectal cancer screening programme: The English Longitudinal Study of Ageing. Journal of Epidemiology and Community Health.
- Improving user profile with personality traits predicted from social media content.
- Personality and political attitudes: Relationships across issue domains and political contexts. American Political Science Review.
- Predicting personality from social media text. AIS Transactions on Replication Research.
- Predicting personality with social media.
- Manifestations of personality in online social networks: Self-reported Facebook-related behaviors and observable profile information. Cyberpsychology, Behavior, and Social Networking.
- “Incivility, social undermining, bullying… oh my!”: A call to reconcile constructs within workplace aggression research. Journal of Organizational Behavior.
- Meta-analysis: Cumulating research findings across studies.