Introduction

Within electronic markets, more and more recommendation systems are employed to improve the preselection of available products and services (Adomavicius and Tuzhilin 2005). Determining a user’s preferences is an important precondition for effectively running these automatic recommendation systems (Xiao and Benbasat 2007). Personality theorists claim that a user’s personality traits have a substantial influence on preferences and, subsequently, on behavior. The human personality significantly influences the way people think, feel and, especially, behave (Barrick and Mount 1991; Judge et al. 1999). Personality traits are defined as “endogenous, stable, hierarchically structured basic dispositions governed by biological factors such as genes and brain structures” (Romero et al. 2009, p. 535). These traits remain quite stable over the entire lifetime and across varying situations (Costa and McCrae 1992; Romero et al. 2009), which is why a user’s personality is a good starting point for predicting user behavior, especially in electronic markets, where digitized information for mining a user’s personality is frequently available (e.g., Blachnio et al. 2013; Kosinski et al. 2014). Every day, people upload hundreds of millions of photos to Facebook and write messages, publish interests and activities, and post on walls. At the same time, hundreds of millions of tweets are published daily on Twitter (Buettner and Buettner 2016).

Information for mining a user’s personality is widely available in online social networks (OSN) such as Facebook (Ortigosa et al. 2011, 2014), LinkedIn (Faliagka et al. 2012a, 2012b, 2014) or Renren (Bai et al. 2012). In order to exploit the knowledge nuggets in OSNs for predicting a user’s personality and, subsequently, product preferences in electronic markets, I propose a framework that comprises retrieving the personality-relevant OSN features, predicting the user’s personality and recommending products. My personality-based product recommender (PBPR) framework is based on the conceptual framework for social media application development originally introduced by Ngai et al. (2015a), in which personality (traits) theory offers a well-established basis for application development. Ngai et al. (2015b) emphasized that “personality traits are often taken to be one of the fundamental theories explaining the characteristics affecting users’ subsequent behavior” (p. 34).

With the PBPR framework I contribute to theory-based IT-artefacts incorporating big data and social media analytics in electronic markets; such artefacts can significantly help businesses in electronic markets to create added value. The most important contributions of this work are:

1. Proposing and evaluating a personality-based framework that analyzes social networks for product recommendation.

2. Based on a systematic literature review, presenting the stable and substantial relationships between specific online social network indicators and the Big Five personality traits.

3. Showing that the Personality Prediction Engine outperforms prior approaches in terms of personality completeness, and outperforms a random control in terms of accuracy, sensitivity, specificity, precision and negative predictive value.

4. Showing that the Product Recommender Engine substantially outperforms the random control group in terms of accuracy (minimized error score).

The paper is organized as follows: next, I present the research methodology before providing an overview of the research background, including personality-based consumer behavior in (electronic) markets, personality mining initiatives and personality-based product recommender systems. After that, the personality-mining based product recommender framework is proposed and the evaluation results of its instantiation are presented. Finally, the results are discussed, before I conclude with limitations and future research.

Methodology

Design science methodology (cf. Hevner et al. 2004) is used to develop the PBPR framework as the IT-artefact. The IT-artefact is based on two theories: (a) the Five Factor personality theory of Goldberg (1990) and Costa and McCrae (1992) and (b) the product personality – human personality congruence theory by Govers and Schoormans (2005). These established theories will be used to “make the building process more disciplined, rigorous, and transparent” (Hevner and Chatterjee 2010, p. 56). Both theories will be explained in detail in the research background section.

The personality prediction part of the IT-artefact will be evaluated using an empirical dataset, captured through an online questionnaire, that comprises personality traits and indicators from the online social network XING. The product recommendation part of the IT-artefact will be evaluated using empirical data comprising personality traits and coffeemaker preferences, which were recorded during a laboratory experiment.

Research background

Personality and consumer behavior in (electronic) markets

Marketing researchers have long analyzed the impact of the human personality on product preferences and buying decisions. These scholars found substantial correlations between personality traits and preferred products such as mouthwash, alcoholic drinks, automobiles etc. (Kassarjian 1971). Grubb and Grathwohl summarized in their review that prior research “demonstrate the existence of some relationship between personality of the consumers and the products they consume” (Grubb and Grathwohl 1967, p. 23). Kassarjian (1971) came to the same conclusion in his review.

Despite psychologists’ and marketing researchers’ insights into the significant impact of personality on (consumer) behavior (e.g., Grubb and Grathwohl 1967; Kassarjian 1971; Barrick and Mount 1991), IT/IS research largely ignored this factor for a long time (Wang et al. 2012c). However, recent IT/IS research has turned towards personality as a potential predictor of IT usage patterns (Devaraj et al. 2008; McElroy et al. 2007; Junglas et al. 2008; Venkatesh and Windeler 2012). McElroy et al. (2007) directly tested the effect of personality on internet use in general. The results supported the use of personality as an explanatory factor, showing that a meaningful part of the variance in IS use can be explained by the Big Five personality traits. Devaraj et al. (2008) demonstrated the potential utility of incorporating personality into IT/IS research in the context of technology acceptance and use, and Wang et al. (2012c) extended this work to the context of IS continuance. Junglas et al. (2008) revealed the important role of personality traits in perceptions of privacy to explain behavioral intentions towards adopting location-based IT services. Venkatesh and Windeler (2012) analyzed the impact of the Big Five traits on team technology use and found a positive influence of Agreeableness, Conscientiousness, Extraversion and Openness to Experience on technology use.

While empirically oriented IS/IT scholars have acknowledged the significant impact of personality on the anticipated (consumer) behavior in electronic markets, personality theory-based IT-artefacts are still largely absent. However, implementing such IT-artefacts in electronic markets could be very useful for all market participants, since behavioral uncertainties and transaction costs could be reduced, leading to higher market efficiency and helping to avoid market failure (Arrow 1969; Akerlof 1970). This additional behavioral information could be used to substantially change or fine-tune supply and demand in electronic markets, e.g., by bundling or price-tuning.

Prior research on electronic markets has shown that incorporating personal data into electronic market mechanisms is very useful, for example for customer acquisition (Kazienko et al. 2013) or better pricing (Rayna et al. 2015). The conceptual framework by Ngai et al. (2015a) opens the way to developing social network applications for electronic markets based on a user’s personality, a well-established theoretical basis for assessing consumer behavior. Ngai et al. also emphasized that “personality traits are often taken to be one of the fundamental theories explaining the characteristics affecting users’ subsequent behavior” (Ngai et al. 2015b, p. 34). In addition, IS scholars have pointed to the opportunity of analyzing large volumes of big data in order to improve knowledge about partners in electronic markets (Alt and Klein 2011; Alt and Zimmermann 2014; Akter and Wamba 2016), which opens up a wide range of customer relationship management applications in electronic markets (Ngai et al. 2009).

Determining a user’s personality and mining initiatives

The most commonly used model to describe personality is the Five Factor Model (FFM) of Goldberg (1990) and Costa and McCrae (1992), which describes and measures human personality as the result of mainly biologically determined “basic tendencies”: Openness to Experience, Conscientiousness, Extraversion, Agreeableness and Neuroticism, commonly known as the Big Five (Costa and McCrae 1992). A user’s personality was traditionally captured by questionnaires such as the Big Five Inventory (BFI, John et al. 1991). However, in recent years IS scholars and psychologists have found evidence that a user’s personality can be mined directly from social networks.

Three research groups have mainly worked on social network based personality mining: Ortigosa et al. (2011, 2014) on Facebook, Faliagka et al. (2012a, 2012b, 2014) on LinkedIn, and Bai et al. (2012) on Renren. Mining Facebook data, Ortigosa et al. (2011, 2014) predicted the personality trait neuroticism with an accuracy above 63 percent (classification trees, J48, C4.5 algorithm). Comparing different techniques, they emphasized that classification trees achieved the best results (Ortigosa et al. 2011, p. 565). Faliagka et al. (2012a, 2012b, 2014) also achieved only moderate results using linear regression, regression trees (M5) and support vector machines to analyze LinkedIn data. Similarly, Bai et al. (2012) reported that they tested “many classification algorithms such as Naive Bayesion (NB), Support Vector Machine (SVM), Decision Tree and so on” (Bai et al. 2012, p. 5). By considering only the two extreme personality cases (no middle group), they reached a two-class classification accuracy of above 69 percent in their Renren analysis. In addition to these research groups, Kosinski et al. (2013) showed correlations between Facebook Likes and specific personal attributes such as personality traits and also presented a simple linear model for personality prediction (Kosinski et al. 2014).

Personality-based product recommender systems

Recommender systems have gained a lot of attention since the advent of the internet. Previous designs for recommender systems have mainly focused on user preference information (e.g., user ratings), content-based information (e.g., item prices) and collaborative information (e.g., recommendations of friends). Personality as a main driver of buying behavior has been largely neglected. However, very recent research on recommender systems has turned to personality-based approaches. For example, Rana and Jain (2015) emphasized this potential in their recent overview (“personality attributes ... could then be implemented in recommender system[s]” (Rana and Jain 2015, p. 143)). Concerning the use of personality information in recommender systems, Cantador and Fernández-Tobías (2014) state that “there is plenty of room for alternative, more sophisticated methods” (Cantador and Fernández-Tobías 2014, p. 43).

In fact, a few researchers have sketched initial personality-based approaches. For instance, Hu and Pu (2010) proposed a general method that infers users’ music preferences from their personalities. Wu et al. (2013) presented a strategy that explicitly embeds a user’s personality, as a moderating factor, to adjust the item’s degree of diversity within multiple recommendations. Fernández-Tobías and Cantador (2015) presented a study comparing collaborative filtering methods enhanced with user personality traits and showed that incorporating personality information improves the accuracy of recommendations. Hu and Pu (2011) aimed to address the so-called cold-start problem by incorporating a user’s personality into the collaborative filtering framework. The cold-start problem refers to the difficulty of recommending products when no prior information (e.g., ratings) about a user is available.

As stated above, the relationship between personality and consumer behavior is not new. Many decades ago, marketing scholars found substantial correlations between personality traits and preferred products such as mouthwash, alcoholic drinks, automobiles etc. (Kassarjian 1971). Nowadays, however, it is possible to predict a user’s personality from large social network data. That is why mining a user’s personality appears very promising for designing future recommender systems. Consequently, analyzing social network footprints opens up new business opportunities for personality-based recommender systems.

Combining personality mining and personality-based product recommendation can substantially improve electronic markets, as addressed in the following.

A personality-mining based product recommender framework

I applied the framework of Ngai et al. (2015a) towards a personality-mining based product recommender framework (see Fig. 1).

Fig. 1 Framework for social media application development introduced by Ngai et al. (2015a, p. 790)

The personality-mining based recommender framework consists of three engines, which handle the retrieval of the personality-relevant online social network features, the prediction of the user’s personality and the product recommendation (Fig. 2).

Fig. 2 The personality-mining based product recommender framework

In its essence the proposed framework (IT-artefact) uses personality traits theory to predict user preferences from trait-induced social media data traces.
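The data flow through the three engines can be sketched as a simple composition of functions. The following Python sketch is illustrative only: the function `run_pbpr_pipeline`, the type aliases and the toy engines in the usage part are hypothetical placeholders, not the framework's actual implementation.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical data types exchanged between the three engines.
RawProfile = Dict[str, float]      # raw OSN indicators, e.g. {"contacts": 0.8}
FeatureVector = List[float]        # standardized vector for the predictor
Personality = Dict[str, float]     # Big Five scores in [0, 1]


def run_pbpr_pipeline(
    raw: RawProfile,
    feature_order: Sequence[str],
    transform: Callable[[RawProfile, Sequence[str]], FeatureVector],
    predict: Callable[[FeatureVector], Personality],
    recommend: Callable[[Personality], List[str]],
) -> List[str]:
    """Chain the three engines: transform -> predict personality -> recommend."""
    features = transform(raw, feature_order)  # Retrieval & Transformation Engine
    personality = predict(features)           # Personality Prediction Engine
    return recommend(personality)             # Product Recommender Engine


# --- toy stand-ins for the engines (illustration only) ---
def to_vector(raw: RawProfile, order: Sequence[str]) -> FeatureVector:
    return [raw.get(name, 0.0) for name in order]


def toy_predict(x: FeatureVector) -> Personality:
    # Placeholder: extraversion simply equals the normalized contact count.
    return {"extraversion": x[0]}


def toy_recommend(p: Personality) -> List[str]:
    return ["product_a"] if p["extraversion"] >= 0.5 else ["product_b"]


print(run_pbpr_pipeline({"contacts": 0.8}, ["contacts"],
                        to_vector, toy_predict, toy_recommend))
```

Passing the engines in as callables keeps each one replaceable, which mirrors the framework's separation into independently evaluated engines.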

Retrieval & transformation engine

The Retrieval & Transformation Engine retrieves the specific online social network indicators from various social networks (see Tables 1 and 2) and transforms the data into standardized vectors. Every social network offers a specific application programming interface for information retrieval (e.g., the Twitter API). Additionally, or if no API is available, data can be retrieved via public search engines such as Google’s X-Ray Search Engine.

Table 1 Social networks containing personality-relevant indicators
Table 2 Stable and substantial relationships between online social network indicators and Big Five traits. ++/+: (very) positive correlation; −−/−: (very) negative correlation

Personality prediction engine

Ngai et al. (2015a, 2015b) proposed using personality (traits) theory as a well-established basis for social network application development. The human personality is characterized and measured through personality traits, which are defined as “endogenous, stable, hierarchically structured basic dispositions governed by biological factors such as genes and brain structures” (Romero et al. 2009, p. 535). These traits remain quite stable over an entire lifetime and across varying situations (Costa and McCrae 1992; Romero et al. 2009). Personality significantly influences the way people think, feel and, especially, behave (e.g., Barrick and Mount 1991; Judge et al. 1999). Because of its significant impact on behavior, several models for capturing personality exist; the most important related theories are the psychoanalytical personality theory of Sigmund Freud, the personality theory of C. G. Jung, the personality theory of Carl Rogers and the Three Factor Theory of Hans J. Eysenck. The most commonly used model to describe personality is the Five Factor Model (FFM) of Goldberg (1990) and Costa and McCrae (1992), which is also regarded as the state-of-the-art measurement model for personality (Barrick and Mount 1991; Gosling et al. 2003; Judge et al. 1999; McCrae and Costa 1999; Romero et al. 2009). The FFM describes and measures human personality as the result of mainly biologically determined “basic tendencies”: Openness to Experience, Conscientiousness, Extraversion, Agreeableness and Neuroticism, commonly known as the Big Five (Costa and McCrae 1992). The corresponding “Five Factor Theory on Personality” (FFT) uses the Big Five to explain a significant part of human behavior (Costa and McCrae 1992) and has been successfully applied to various research domains. Barrick and Mount (1991), for example, predict job performance by means of the Big Five, while Judge et al. (1999) explain career success with reference to the Big Five.

Researchers found relationships between online social network usage and a user’s personality. The early work by Rosengren (1974) had already pointed to the relationship between individual and social characteristics and the use of mass media. His paradigm was later widely confirmed as relevant for modern social (mass) media as well. Besides the strong focus on a user’s personality, a lot of research exists on other personality-related constructs in a broader sense, such as user preferences and attitudes (e.g., research on self-disclosure in online social networks (Krasnova et al. 2010)).

However, focusing on personality in its narrower definition, the relevant research on social media dates from the last few years. Several scholars have examined the influence of personality on the use of online social media, and personality is deemed a predictor of a person’s social media use. Many papers cover the relationship between social media usage and different personality traits (e.g., the Big Five, narcissism and self-esteem). Quite stable relationships have been found between the FFM-based personality traits and some specific social media features/data:

Extraverted people have a higher need for social affiliation/personal communication (Costa and McCrae 1992) and for strategic self-presentation (Seidman 2013; Krämer and Winter 2008), and as a result they have more satisfying/stable friendships (McCrae and Costa 1999) than introverts. Extraverts are more likely to use social media in general (Correa et al. 2010; Gosling et al. 2011; Hughes et al. 2012; Lin et al. 2012; Ryan and Xenos 2011). Researchers found positive relationships between extraversion and the number of contacts (e.g., Aharony 2013; Amichai-Hamburger and Vinitzky 2010; Gosling et al. 2011; Hall and Pennington 2013; Ivcevic and Ambady 2012; Martin et al. 2012; Moore and McElroy 2012; Tazghini and Siedlecki 2013; Wang et al. 2012b; Winter et al. 2014), the number of pictures posted (Gosling et al. 2011; Muscanell and Guadagno 2012), the number of status updates (Garcia and Sikström 2014), and the usage frequency (Michikyan et al. 2014).

People with lower Neuroticism values are high in self-esteem and have less pessimistic attitudes than those with higher Neuroticism values (McCrae and Costa 1999). Because they feel less isolated and experience less psychological distress (Costa and McCrae 1992), emotionally stable individuals are less likely to use social media in general (Correa et al. 2010; Hughes et al. 2012). Usage intensity has also been found to be positively correlated with Neuroticism: individuals with low Neuroticism values spend less time on social media (Moore and McElroy 2012; Ryan and Xenos 2011), update their status less often (Wang et al. 2012b), belong to fewer groups (Skues et al. 2012) and are less addicted to social media usage (Karl et al. 2010).

People who are high in Openness to Experience have broad interests and seek novelty (McCrae and Costa 1999). Therefore, Openness to Experience is regarded as correlating positively with social media use (Amichai-Hamburger and Vinitzky 2010; Correa et al. 2010; Hughes et al. 2012). Individuals who score high on Openness to Experience also show higher social media usage intensity. They spend more time on social media (Skues et al. 2012), have more friends (Gosling et al. 2011; Skues et al. 2012), play more games (Wang et al. 2012b) and are more active (Ross et al. 2009) than individuals low on Openness to Experience.

Conscientious people make long-term plans, are diligent and have organized support networks (McCrae and Costa 1999). Social media could be seen as a sort of distraction for conscientious people (Hughes et al. 2012), but there are contradictory findings on the relationship between Conscientiousness and social media usage. Conscientious individuals are less likely to use social media (Ryan and Xenos 2011) and also spend less time on social media (Gosling et al. 2011; Ryan and Xenos 2011; Wilson et al. 2010).

Agreeable people are friendly, kind, sympathetic and warm (Costa and McCrae 1992) and tend to be trusting, sympathetic and cooperative (Amichai-Hamburger and Vinitzky 2010). Individuals high on Agreeableness have more pictures on their social media profile (Ivcevic and Ambady 2012), give more information about their activities and interests (Ivcevic and Ambady 2012; Wang 2013), view their own and others’ pages more often (Gosling et al. 2011), have more posts from their friends on their wall (Ivcevic and Ambady 2012) and comment more often on social networking sites (Wang et al. 2012b). On the other hand, individuals high on Agreeableness use fewer page features (Amichai-Hamburger and Vinitzky 2010), have fewer back-and-forth conversations (Ivcevic and Ambady 2013) and are less likely to become addicted to social media (Karl et al. 2010).

In summary, many weaker and stronger correlations between online social network features and a user’s personality have been found. However, in order to predict a user’s personality effectively, it is important to know which OSN features are the most predictive. Based on an extensive literature review (see Footnote 1) of personality-related social network research, the stable and substantial relationships between specific online social network indicators and the Big Five personality traits are summarized in Table 2.

The Personality Prediction Engine uses machine learning approaches to predict a user’s personality. The digital footprints of humans in online social networks contain substantial information for accurately predicting a wide range of personal attributes, including personality traits. For example, Kosinski et al. (2013) showed correlations between Facebook Likes and specific personal attributes such as personality traits. As presented above, such correlations between specific OSN features and personality traits were also found in other social networks (see also Table 2). The Personality Prediction Engine uses these correlations (i.e., the specific online social network indicators) to predict a user’s personality (cf. Ortigosa et al. 2011, 2014; Bai et al. 2012; Faliagka et al. 2014; Kosinski et al. 2014).
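A deliberately minimal sketch of such a predictor, in the spirit of the classification trees used by Ortigosa et al. (2011), is a depth-1 tree (decision stump) that predicts whether a user is high (1) or low (0) on one trait from [0;1]-normalized OSN features. It also computes the evaluation measures named in this paper (accuracy, sensitivity, specificity, precision and negative predictive value) from the confusion matrix. The binary high/low labelling, the function names and the toy data are assumptions for illustration; a real engine would use a full tree learner (e.g., C4.5/J48) and cross-validation rather than in-sample evaluation.

```python
from typing import Dict, List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, label if value > threshold)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Exhaustively search the single feature/threshold split with the most correct labels."""
    best: Stump = (0, 0.0, 1)
    best_correct = -1
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for high_label in (0, 1):
                correct = sum(
                    (high_label if row[j] > t else 1 - high_label) == label
                    for row, label in zip(X, y)
                )
                if correct > best_correct:
                    best, best_correct = (j, t, high_label), correct
    return best


def predict_stump(stump: Stump, X: Sequence[Sequence[float]]) -> List[int]:
    j, t, high_label = stump
    return [high_label if row[j] > t else 1 - high_label for row in X]


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")  # undefined when the denominator is 0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix measures with 1 = 'high on the trait' as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, b in pairs if a == 1 and b == 1)
    tn = sum(1 for a, b in pairs if a == 0 and b == 0)
    fp = sum(1 for a, b in pairs if a == 0 and b == 1)
    fn = sum(1 for a, b in pairs if a == 1 and b == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Toy data: two normalized features (e.g. number of contacts, profile views).
X = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.9, 0.1]]
y = [0, 0, 1, 1]
stump = fit_stump(X, y)
print(stump, binary_metrics(y, predict_stump(stump, X)))
```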

Product recommender engine

Based on the user’s personality, the Product Recommender Engine offers suitable products (or services) to the user. This engine uses the relationships between personality-based consumer preferences and the product’s characteristics (cf. the product personality – human personality congruence by Govers and Schoormans (2005)). Consumer products not only have a functional utility but also a symbolic meaning (Wells et al. 1957). This symbolic meaning, which refers to the product itself and is described with human personality characteristics, forms the product personality (Govers and Schoormans 2005).

Products can also be seen as symbols by which people convey something about themselves to themselves (self-concept) and to others (Solomon 1983). That part of the symbolic meaning which can be described with human personality characteristics is called product personality (Jordan 1997). Marketing scholars showed that self-congruence is an important factor in directing consumer preferences (Sirgy 1982). Consumers prefer products “with a symbolic meaning that is consistent with their self-concept” (Govers and Schoormans 2005, p. 190).

For example, people scoring high on Agreeableness prefer products that can be characterized by agreeable attributes such as cheerful, relaxed, pretty or cute, and definitely not provocative.

The human personality is typically measured using specific instruments such as the Big Five Inventory (BFI, John et al. 1991), its short version (BFI-S, Hahn et al. 2012) or the Ten Item Personality Inventory (TIPI, Gosling et al. 2003). The product’s characteristics are measured using the Product Personality Scale (PPS, Mugge et al. 2009).
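One way to operationalize congruence is to represent both the user and each product as a profile on the same five dimensions and to rank products by their distance to the user, the smallest distance being the lowest error score. The sketch below assumes, hypothetically, that PPS ratings have already been mapped onto the Big Five dimensions and normalized to [0;1]; the Euclidean distance and the product names are illustrative choices, not the method specified in this paper.

```python
import math
from typing import Dict, List, Tuple

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")
Profile = Dict[str, float]  # trait -> score in [0, 1]


def congruence_error(user: Profile, product: Profile) -> float:
    """Euclidean distance between user and product profiles (0 = perfect congruence)."""
    return math.sqrt(sum((user[t] - product[t]) ** 2 for t in TRAITS))


def rank_products(user: Profile, products: Dict[str, Profile]) -> List[Tuple[str, float]]:
    """Return (product, error) pairs, most congruent product first."""
    scored = [(name, congruence_error(user, prof)) for name, prof in products.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical example: a highly agreeable user and two fictitious coffeemakers.
neutral = {t: 0.5 for t in TRAITS}
user = {**neutral, "agreeableness": 0.9}
products = {
    "cute_coffeemaker": {**neutral, "agreeableness": 0.9},
    "provocative_coffeemaker": {**neutral, "agreeableness": 0.1},
}
print(rank_products(user, products))
```

In this toy case the agreeable user is matched with the "cute" product, in line with the Agreeableness example above.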

Evaluation of the personality-mining based product recommender framework

To avoid interference between the evaluation results of the three engines, I evaluate each engine within the framework separately.

Evaluation of the retrieval & transformation engine

The Retrieval & Transformation Engine connects to various online social networks via their specific Application Programming Interfaces (APIs). As described in Table 1, personality-relevant information can be extracted from various online social networks. For instance, Twitter offers an API to retrieve the numbers of tweets/messages (e.g., GET direct_messages and GET direct_messages/sent), followers (GET followers/ids) and friends (GET friends/ids). The intersection of follower and friend IDs can be interpreted as the user’s (number of) contacts. The (number of) faux pas (dirty words) within a specific time frame can be analyzed via Twitter’s Search API in conjunction with R’s text mining package.
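Once the follower and friend ID lists and the message texts have been retrieved, deriving the two indicators mentioned above is straightforward. The sketch below works on already-fetched Python lists and plain strings (it makes no API calls); the dirty-word lexicon is a hypothetical placeholder, and simple regular-expression tokenization stands in for R's text mining package.

```python
import re
from typing import Iterable, Set


def count_contacts(follower_ids: Iterable[int], friend_ids: Iterable[int]) -> int:
    """Mutual connections: accounts that follow the user and are followed back."""
    return len(set(follower_ids) & set(friend_ids))


def count_faux_pas(texts: Iterable[str], lexicon: Set[str]) -> int:
    """Count lower-cased word tokens that appear in a (hypothetical) dirty-word lexicon."""
    count = 0
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in lexicon:
                count += 1
    return count


# Illustrative inputs; real IDs would come from GET followers/ids and GET friends/ids.
print(count_contacts([11, 22, 33, 44], [33, 44, 55]))
print(count_faux_pas(["Darn, what the heck!", "Nice day"], {"darn", "heck"}))
```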

Facebook also offers a powerful API for retrieving, e.g., the number of contacts (friend list) or wall postings (GET feed, GET posts).

In addition, career-oriented social network sites such as LinkedIn or XING have implemented feature-rich APIs. For example, the XING API makes it possible to retrieve user profiles (GET /v1/users/:id), including the user’s profile photo, employment status, language skills, etc. It is also possible to get the list of groups the given user belongs to (GET /v1/users/:user_id/groups), to retrieve messages (GET /v1/users/:user_id/conversations) or to obtain the (number of) contacts (GET /v1/users/:user_id/contacts).

Besides the powerful data access via these APIs, it must be noted that social network operators usually restrict access by rate and/or time limits. In addition, API standards change regularly. That is why retrieving data via public search engines, such as Google’s X-Ray Search Engine, is an alternative if an API is not available.

Before loading the data into the Personality Prediction Engine all features will be normalized to [0;1].
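The text states only that features are normalized to [0;1]; min-max scaling is one common way to do this and is assumed in the sketch below. The bounds are fitted once on reference data and reused for new users, with values outside the fitted range clipped so that the output stays within [0;1].

```python
from typing import Sequence, Tuple

Bounds = Tuple[float, float]


def fit_bounds(values: Sequence[float]) -> Bounds:
    """Learn the minimum and maximum of one feature from reference data."""
    return min(values), max(values)


def scale(x: float, bounds: Bounds) -> float:
    """Min-max scale x into [0, 1]; clip values outside the fitted range."""
    lo, hi = bounds
    if hi == lo:
        return 0.0  # constant feature: no spread to scale
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


contacts = [10, 20, 30]  # hypothetical raw contact counts
b = fit_bounds(contacts)
print([scale(v, b) for v in contacts], scale(45, b))  # [0.0, 0.5, 1.0] 1.0
```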

Instantiation and evaluation of the personality prediction engine

Next, the instantiation and evaluation of the Personality Prediction Engine on the basis of a XING dataset are presented. XING is an important career-oriented online social network site in Europe. In order to preserve data privacy (cf. Spiekermann and Acquisti 2015) during the evaluation of the Personality Prediction Engine, I did not collect OSN features directly but asked participants to provide this specific information knowingly.

Description of the empirical XING dataset and sample quality

Working professionals studying extra-occupationally at our university were recruited. The participants were invited electronically to take part in a survey concerning social networks; the call for participation, including a link to the online questionnaire, was sent out across our Germany-wide university. Please note that our university specializes in extra-occupational MBA and Bachelor programs whose students all have work experience.

Based on the most predictive personality indicators presented in Table 2, I used those XING features that participants can look up on their own profiles. The resulting features are shown in Table 3.

Table 3 Measured items for XING usage features

The personality traits were captured with the Ten Item Personality Inventory (TIPI) from Gosling et al. (2003) using a 5-point Likert scale (r_TIPI = 0.72) and normalized to [0;1]. Finally, demographics (gender and age) were requested.
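The TIPI consists of ten items, two per trait, one of which is reverse-scored (Gosling et al. 2003). The sketch below scores the ten answers on the 5-point scale used here, reverse-codes items as (6 − answer), averages each item pair and maps the mean linearly from [1;5] to [0;1]. The exact normalization formula and the derivation of Neuroticism as 1 − Emotional Stability are assumptions for illustration.

```python
from typing import Dict, Sequence

# TIPI item keying (Gosling et al. 2003): (item number, reverse-scored?)
TIPI_KEYS = {
    "extraversion": ((1, False), (6, True)),
    "agreeableness": ((2, True), (7, False)),
    "conscientiousness": ((3, False), (8, True)),
    "emotional_stability": ((4, True), (9, False)),
    "openness": ((5, False), (10, True)),
}


def score_tipi(answers: Sequence[int], scale_max: int = 5) -> Dict[str, float]:
    """Score ten TIPI answers (1..scale_max) and normalize each trait to [0, 1]."""
    if len(answers) != 10 or not all(1 <= a <= scale_max for a in answers):
        raise ValueError("expected ten answers between 1 and scale_max")
    scores = {}
    for trait, items in TIPI_KEYS.items():
        values = [
            (scale_max + 1 - answers[i - 1]) if reverse else answers[i - 1]
            for i, reverse in items
        ]
        mean = sum(values) / len(values)
        scores[trait] = (mean - 1) / (scale_max - 1)  # map [1, scale_max] to [0, 1]
    # Assumption: Neuroticism is the mirror image of Emotional Stability.
    scores["neuroticism"] = 1 - scores.pop("emotional_stability")
    return scores


print(score_tipi([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))
```

Because `scale_max` is a parameter, the same function also covers the original 7-point TIPI format.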

Sample quality

760 completed questionnaires were received. Participants comprised 395 individuals (∼52 %) with a personal XING profile and 365 (∼48 %) without any profile or activity on XING. Since I aim to evaluate the Personality Prediction Engine, i.e., the possibility of predicting a user’s personality from social media data, I only include the 395 participants who have a XING profile in the analysis. Of these 395 individuals, 189 (∼48 %) were female and 206 (∼52 %) male. The age pattern was as follows: 4 participants (∼1.0 %) were below 20 years old; 259 participants (∼65.6 %), the majority, were between 21 and 30; 92 participants (∼23.3 %) between 31 and 40; 32 participants (∼8.1 %) between 41 and 50; 7 participants (∼1.8 %) between 51 and 60; and finally 1 participant (∼0.3 %) was 61 or older. 45 (∼11.4 %) of the 395 XING users are active daily users of the platform, 98 (∼24.8 %) use it on a weekly basis, 74 (∼18.7 %) use XING several times per month, 154 (∼39.0 %) at least once a month and 24 (∼6.1 %) never use this social network.
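The reported shares can be recomputed directly from the counts. The snippet below does so for the usage-frequency distribution of the 395 XING users; the counts are those given in the text.

```python
# Usage-frequency counts of the 395 XING users, as reported in the text.
usage = {
    "daily": 45,
    "weekly": 98,
    "several times per month": 74,
    "at least once a month": 154,
    "never": 24,
}
n = sum(usage.values())
shares = {label: round(100 * count / n, 1) for label, count in usage.items()}
print(n, shares)
```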

Compared to the personality traits of the general population, I observed similar trait patterns by gender, but slightly higher conscientiousness and extraversion values in my sample (Table 4).

Table 4 Comparison of the [0;1]-normalized TIPI results (general population) and my sample, by gender

Evaluation results

The machine learning analyses were run in the R x64 3.2.2 environment (R Core Team 2015) on an HP Z840 workstation with 128 GB RAM.

In a first step, the relationships between the Big Five personality traits and the specific XING usage features are analyzed; the results are shown in Table 5.

Table 5 Significant Spearman-Rho correlations between Big Five traits and XING features (¹ p < 0.05, ² p < 0.01, ³ p < 0.001; n = 395)
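Spearman's rho is the Pearson correlation computed on ranks, with tied values receiving the average of the ranks they span; this matters here because many XING features are binary or ordinal. The sketch below implements this in plain Python; it does not compute p-values, and it is not the R routine used in the analysis. For cross-checking, R's `cor.test(x, y, method = "spearman")` or SciPy's `scipy.stats.spearmanr` compute the same coefficient together with a significance test. The example data are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant variable")
    return cov / (sx * sy)


# Hypothetical example: extraversion scores vs. a binary XING feature.
extraversion = [0.2, 0.4, 0.6, 0.8, 1.0]
premium = [0, 0, 1, 0, 1]
print(round(spearman_rho(extraversion, premium), 3))
```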

The positive relationship discovered between openness and I17 (XING premium membership) has not been directly investigated before, but positive relationships between novel Facebook features and openness have consistently been found (e.g., Hughes et al. 2012; Skues et al. 2012). The positive relationship between openness and I20 (number of contacts) has not been reported for online social networks, but it has been found for offline networks (e.g., Lang et al. 1998).

The positive relationship between conscientiousness and the number of contacts was also found by Amichai-Hamburger and Vinitzky (2010) on Facebook and, offline, between conscientiousness and network centrality (Liu and Ipe 2010). Correlations between conscientiousness and both I7 (advantageous offers) and I22 (page views from others) have not been evaluated on online social networks before. However, the latter result is in line with prior research (e.g., Amichai-Hamburger and Vinitzky 2010; Liu and Ipe 2010) revealing that conscientious people tend to have more contacts and thus potentially more people clicking on their profile page. The former result confirms the general negative relationship between conscientiousness and compulsive buying (Wang and Yang 2008).

I also found a significant positive correlation between extraversion and XING usage in general (I1), which confirms the findings of Correa et al. (2010) and Jenkins-Guarnieri et al. (2012). The positive relationships between extraversion and various XING features (I4, I6, I15, I17) are also known from other online social networks (e.g., Moore and McElroy 2012; Gosling et al. 2011; Ryan and Xenos 2011; Martin et al. 2012). One of the best-replicated findings concerns the positive relationship between extraversion and the number of contacts (I20; e.g., Amichai-Hamburger and Vinitzky 2010; Moore and McElroy 2012; Thalmayer et al. 2011; Pollet et al. 2011). The correlation between extraversion and I22 (page views from others) has not been directly evaluated on online social networks before, but this result can be explained by the fact that people scoring high on extraversion tend to have more (online) social contacts, which enlarges the pool of people potentially clicking on their profile page.

The negative relationships found between agreeableness and some XING profile-related information fields (I 10, I 11, I 15) have also been found in other online social networks. For example, Amichai-Hamburger and Vinitzky (2010) found a negative relationship between agreeableness and the uploading of personal information on Facebook. In addition, the negative relationships between agreeableness and XING groups (I 19, I 21) had already been found by Gosling et al. (2011) on Facebook.

What is surprising is the negative correlation between neuroticism and XING usage intensity (I 1, I 4, I 12, I 15, I 20, I 21). People scoring high on neuroticism are low in self-esteem and have more pessimistic attitudes than those who are emotionally stable (McCrae and Costa 1999). Because they feel more isolated and experience more psychological distress (Costa and McCrae 1992), neurotic individuals are more likely to use social media in general (Correa et al. 2010; Hughes et al. 2012). Usage intensity has also been found to be positively correlated with neuroticism: neurotic individuals spend more time on social media (Moore and McElroy 2012; Ryan and Xenos 2011), update their status more often (Wang et al. 2012b), belong to more groups (Skues et al. 2012) and are more addicted to social media usage (Karl et al. 2010). That is why research largely suggests neuroticism to be a positive predictor of social media usage and its intensity (Correa et al. 2010; Amichai-Hamburger and Vinitzky 2010; Hughes et al. 2012). However, the negative correlations between neuroticism and XING usage intensity found in my study may be explained by the fact that XING is a career-oriented social networking site mainly used for business and job search purposes, not for private purposes such as building and maintaining friendships (Buettner 2016a). Since the prior neuroticism-related investigations were only concerned with private-oriented online social networks (Facebook, MySpace, etc.), future research should investigate the role of usage purpose (business or private).

As shown in Table 5, only a few significant correlations between specific XING features and the personality traits can be found, and all of them are weak (cf. Cohen 1988).

However, a critical mass of weak relationships could still have a good level of predictive power. That is why I applied machine learning algorithms for personality trait prediction. Based on the TIPI results, I built two mean-balanced classes for each personality trait. For machine learning and evaluation purposes, I split the n = 395 sample into a training partition (n_T = 261) and an evaluation partition (n_E = 134).
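As a minimal sketch of this preprocessing step, the Python snippet below binarizes trait scores at the sample mean and draws a random 261/134 partition of the 395 participants. Splitting exactly at the mean, assigning scores equal to the mean to the lower class, and the fixed random seed are assumptions for illustration; the text does not specify how the mean-balanced classes or the partition were formed beyond the group sizes.

```python
import random
from statistics import mean


def mean_split(scores):
    """Binarize trait scores: 1 = above the sample mean, 0 = at or below it (assumed rule)."""
    m = mean(scores)
    return [1 if s > m else 0 for s in scores]


def train_eval_split(ids, n_train, seed=42):
    """Randomly partition participant ids into a training and an evaluation set."""
    rng = random.Random(seed)  # fixed seed only to make the split reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical TIPI extraversion scores for five participants (mean = 4.2)
print(mean_split([2.5, 4.0, 5.5, 6.0, 3.0]))  # [0, 0, 1, 1, 0]

# Split the n = 395 participants into n_T = 261 and n_E = 134
train_ids, eval_ids = train_eval_split(range(395), n_train=261)
print(len(train_ids), len(eval_ids))  # 261 134
```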

To evaluate whether a user's personality can be predicted from online social network features, I applied generalized linear modeling (GLM; Dobson and Barnett 2008), implemented in the R x64 3.2.2 environment. In this general linear personality model, the personality trait response y_i (i = 1, …, 5) is modelled by a linear function of the explanatory social media indicators x_j (j = 1, …, p) plus an error term:

$$ y_{i} = \beta_{0} + \beta_{1} x_{1i} + \dots + \beta_{p} x_{pi} + \epsilon_{i} $$
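The following Python sketch fits the linear model above by ordinary least squares and turns the fitted value into a class label with a 0.5 cut-off. It is an assumption-laden stand-in, not the author's R code: the text does not state the GLM family, link function or decision threshold (in R, such a model would typically be fitted with `glm()`), and the data below are synthetic.

```python
import numpy as np


def fit_linear_model(X, y):
    """OLS estimate of beta_0..beta_p in y = beta_0 + sum_j beta_j * x_j + eps."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict_class(beta, X, threshold=0.5):
    """Assign class 1 when the fitted value reaches the (assumed) threshold."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return (X1 @ beta >= threshold).astype(int)


# Synthetic stand-in for p = 3 XING indicators and a 0/1 trait class
rng = np.random.default_rng(0)
X_train = rng.normal(size=(261, 3))
noise = rng.normal(scale=0.5, size=261)
y_train = (X_train[:, 0] - 0.5 * X_train[:, 2] + noise > 0).astype(float)

beta = fit_linear_model(X_train, y_train)
X_eval = rng.normal(size=(134, 3))
y_hat = predict_class(beta, X_eval)
print(beta.round(2), y_hat[:10])
```

A linear probability model is the simplest reading of the equation; a logistic GLM would be a natural alternative for a binary response.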

I subsequently evaluated the machine learning outputs in terms of accuracy (ACC), sensitivity (true positive rate, TPR), specificity (SPC), precision (positive predictive value, PPV) and negative predictive value (NPV) as quality criteria. Results are shown in Table 6.
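The five quality criteria follow directly from the confusion matrix of predicted versus actual trait classes. A small Python sketch, assuming class 1 (e.g., "high" on a trait) is treated as the positive class; undefined ratios are returned as NaN:

```python
def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def quality_criteria(y_true, y_pred):
    """Confusion-matrix criteria for a binary classifier (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "ACC": _ratio(tp + tn, len(pairs)),
        "TPR": _ratio(tp, tp + fn),  # sensitivity
        "SPC": _ratio(tn, tn + fp),  # specificity
        "PPV": _ratio(tp, tp + fp),  # precision
        "NPV": _ratio(tn, tn + fn),
    }


print(quality_criteria([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 0, 0, 0]))
# {'ACC': 0.875, 'TPR': 1.0, 'SPC': 0.75, 'PPV': 0.8, 'NPV': 1.0}
```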

Table 6 Quality criteria of the generalized linear model

Discussion of evaluation results

In line with prior research, I found a few significant correlations between specific social media usage features and users' personality traits (see Table 5). It is also in line with prior research that all of these significant correlations are small. However, despite this small number of correlations, I could predict all of the five personality traits with a predictive gain between 23.2 and 41.8 percent by applying a generalized linear model, which means that the social media platform XING does in fact contain fruitful data for personality mining. In addition, my model outperforms prior personality prediction approaches based on linear models, such as the work by Kosinski et al. (2013, 2014). Furthermore, on average over all of the big five personality traits, the model performs well in terms of accuracy, specificity, precision and negative predictive value (see Table 6). In summary, it is in principle possible to comprehensively determine a user's personality from social media data.

Instantiation and evaluation of the personality-based product recommender engine

To test the personality-based Product Recommender Engine, I designed a system that ranks eight coffeemakers and evaluated this recommender system in an experimental setting. The experiment took place in a professional human-computer interaction laboratory. To avoid disturbance factors, the laboratory room was controlled for lighting conditions and temperature. The lighting conditions were kept constant, since only artificial light was used and the windows were professionally covered.

Apparatus and test procedure

To rigorously evaluate the effectiveness of product recommendations based on personality congruency theory, I chose a two-group design (algorithm group with treatment vs. control group without treatment; between-subject, completely randomized, double-blind; cf. Kirk 2013). Using G*Power version 3.1.9.2 (Faul et al. 2007), I calculated an a priori sample size of 62 participants (one-tailed, Cohen's d = 0.85, significance level α = 0.05), who were subsequently recruited to take part in a laboratory experiment.
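The reported sample size can be roughly cross-checked with the normal approximation for a one-tailed two-sample t-test, n per group ≈ 2(z_{1-α} + z_{1-β})²/d² + z_{1-α}²/4, where the last term is a common small-sample correction. The statistical power is not stated in the text; the sketch assumes 1 - β = 0.95, under which it gives 31 participants per group, i.e. 62 in total. G*Power itself uses the exact noncentral t distribution, so this is only an approximation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.95, one_tailed=True):
    """Approximate per-group n for a two-sample t-test (normal approximation).

    power=0.95 is an assumption; the study text does not report it.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_tailed else z(1 - alpha / 2)
    z_beta = z(power)
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2  # large-sample formula
    n += z_alpha ** 2 / 4                     # small-sample correction term
    return ceil(n)


n = n_per_group(0.85)
print(n, 2 * n)  # 31 62
```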

Each participant was asked to fill out a personality questionnaire and to rate the coffeemakers with respect to specific characteristics. The participant's personality was measured using the short version BFI-S (Hahn et al. 2012) of the Big Five Inventory (BFI; John et al. 1991). The coffeemaker characteristics were measured with the product personality scale (PPS) of Mugge et al. (2009), which is based on the product personality system of Govers and Schoormans (2005). All items were presented in random order.

Next, the eight coffeemakers were presented in a ranking order generated by the computer program. For the control group, the ranking order was generated randomly. For the experimental group, the ranking order followed the product personality congruence idea of Govers and Schoormans (2005) and was based on minimizing the Euclidean distance between the participant's personality (BFI-S) and the product personality (PPS) over the three traits extraversion (E), agreeableness (A) and conscientiousness (C), see formula 1.

$$ \text{Eucl. distance} \!= \!\sqrt{(\text{E}^{\text{BFI-S}}\,-\,\text{E}^{\text{PPS}})^{2} \,+\,(\text{A}^{\text{BFI-S}}\,-\,\text{A}^{\text{PPS}})^{2} \,+\, (\text{C}^{\text{BFI-S}}\,-\,\text{C}^{\text{PPS}})^{2}} $$
(1)

The coffeemaker with the smallest Euclidean distance was presented as rank one, the coffeemaker with the second smallest Euclidean distance as rank two and so on (see Fig. 3).
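The congruence-based ranking and the random control ranking can be sketched in a few lines of Python. The trait scores below are hypothetical values normalized to [0;1], only three coffeemakers are shown instead of eight, and it is assumed that BFI-S and PPS scores are on a common scale so that the distance in formula 1 is meaningful.

```python
import math
import random

TRAITS = ("E", "A", "C")  # extraversion, agreeableness, conscientiousness


def euclidean_distance(person, product):
    """Formula 1: distance between BFI-S and PPS scores over E, A and C."""
    return math.sqrt(sum((person[t] - product[t]) ** 2 for t in TRAITS))


def personality_ranking(person, products):
    """Algorithm group: product names sorted by ascending distance (rank 1 first)."""
    return sorted(products, key=lambda name: euclidean_distance(person, products[name]))


def random_ranking(products, rng=None):
    """Control group: a uniformly random order of the same products."""
    rng = rng or random.Random()
    order = list(products)
    rng.shuffle(order)
    return order


# Hypothetical normalized scores
participant = {"E": 0.7, "A": 0.6, "C": 0.8}
coffeemakers = {
    "coffeemaker_A": {"E": 0.7, "A": 0.6, "C": 0.7},  # distance 0.10
    "coffeemaker_B": {"E": 0.2, "A": 0.5, "C": 0.4},  # distance ~0.65
    "coffeemaker_C": {"E": 0.6, "A": 0.9, "C": 0.8},  # distance ~0.32
}
print(personality_ranking(participant, coffeemakers))
# ['coffeemaker_A', 'coffeemaker_C', 'coffeemaker_B']
```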

Fig. 3 Ranking order [1 – most recommended, 8 – least recommended]

Next, the participants were asked whether the ranking order fitted their preferences and, if it did not, to correct it by moving the products into the order matching their own preferences (see Fig. 4).

Fig. 4 Please correct the ranking order by mouse movement if the ranking does not fit your preferences [1 – most recommended, 8 – least recommended]

The allocation of a participant to the control group (n_c = 32) or the experimental group (n_e = 30) was managed randomly and automatically by the computer program. To avoid any experimenter-expectancy effects, neither the participant nor the laboratory assistant knew this allocation (double-blind experiment).

Evaluation results

Sixty-two participants (26 female, 36 male) aged from 19 to 61 years (M = 33.8, SD = 8.8) took part in the experiment. The algorithm group did not differ significantly from the control group with respect to room temperature, age, sex or health status (p > 0.05; see also Table 7).

Table 7 Group characteristics: room temperature, age, sex and health status distribution

The two groups also did not differ significantly in their BFI-S scores (p > 0.05 for all 15 items; see also Table 8).

Table 8 BFI-S results, normalized to [0;1]

To evaluate the power of the personality-based algorithm, the error score of each participant was calculated. For each rank movement between the initially presented ranking order and the corrected/accepted ranking order, the error score increased by one unit (see formula 2).

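$$ \text{Error score} = \sum_{n=1}^{N} \left| p_{n}^{\text{recommended}} - p_{n}^{\text{corrected}} \right| $$
(2)

where n indexes the ranked coffeemakers (N = 8) and p_n^recommended and p_n^corrected denote the rank position of coffeemaker n in the recommended and in the corrected ranking order, respectively.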
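Formula 2 is the sum of absolute rank displacements (also known as Spearman's footrule distance). A minimal Python sketch, assuming each ranking is given as a list of product identifiers ordered from rank 1 to rank N:

```python
def error_score(recommended, corrected):
    """Formula 2: sum over products of |recommended rank - corrected rank|."""
    corrected_pos = {item: rank for rank, item in enumerate(corrected, start=1)}
    return sum(
        abs(rank - corrected_pos[item])
        for rank, item in enumerate(recommended, start=1)
    )


print(error_score(list("ABCDEFGH"), list("ABCDEFGH")))  # 0 (ranking accepted)
print(error_score(list("ABCDEFGH"), list("BACDEFGH")))  # 2 (one adjacent swap)
print(error_score(list("ABCDEFGH"), list("HGFEDCBA")))  # 32 (full reversal)
```

Note that swapping two adjacent products costs two units, because both products move by one rank; for N = 8, a fully reversed ranking yields the maximum score of 32.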

The comparison of the error scores between the algorithm (experimental) group and the control group is shown in Table 9, and the corresponding boxplots are presented in Fig. 5.

Fig. 5 The algorithm group significantly outperforms the control group

Table 9 Comparison of error scores between the algorithm group and the control group

The error scores within the algorithm group are significantly lower than in the control group (T = 4.48, p < 0.001). The corresponding effect size (Cohen's d = 1.1) is large (cf. Cohen 1988). Moreover, the error scores are negatively correlated with the participants' satisfaction with the product recommendation ranking order (r = -0.727, p < 0.001).
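Assuming that T = 4.48 is a pooled-variance (Student) two-sample t statistic, Cohen's d can be recovered from it as d = T·sqrt(1/n_e + 1/n_c). With n_e = 30 and n_c = 32 this gives d ≈ 1.14, consistent with the reported d = 1.1. The Python sketch below shows this conversion together with the computation of both statistics from raw error scores; the pooled-variance test is an assumption, since the text does not name the exact t-test variant.

```python
import math
from statistics import mean, stdev


def pooled_t_and_d(a, b):
    """Student's two-sample t (pooled variance) and Cohen's d for samples a and b."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    d = (mean(a) - mean(b)) / math.sqrt(pooled_var)
    t = d / math.sqrt(1 / na + 1 / nb)
    return t, d


def d_from_t(t, n1, n2):
    """Convert a pooled two-sample t statistic into Cohen's d."""
    return t * math.sqrt(1 / n1 + 1 / n2)


print(round(d_from_t(4.48, 30, 32), 2))  # 1.14
```

In a replication, `pooled_t_and_d` would be called with the control group's error scores as `a` and the algorithm group's as `b`, so that a positive t indicates lower errors for the algorithm group.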

Discussion of evaluation results

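As shown in Fig. 5, the personality-based product recommendation algorithm substantially outperforms the randomized ranking order. In addition, since the error scores are strongly negatively correlated with the participants' satisfaction with the ranking order, the lower error scores of the algorithm group indicate higher satisfaction with the recommended ranking.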

These results indicate that a user's personality information can be usefully exploited for product recommendations.

Discussion

Theoretical implications

From a theoretical point of view, I contribute to IS research by proposing a theory-based IT-artefact incorporating big data and social media analytics in electronic markets, which is based on the Five Factor personality theory of Goldberg (1990) and Costa and McCrae (1992) and on the product personality – human personality congruency theory by Govers and Schoormans (2005). In addition, the artefact is based on the conceptual framework for social media application development introduced by Ngai et al. (2015a). I used these established theories and the existing framework to "make the [design science] building process more disciplined, rigorous, and transparent" (Hevner and Chatterjee 2010, p. 56).

This artefact may also help to deepen our understanding of personality-driven human behavior within electronic markets. Once the artefact is deployed, large amounts of personality-relevant data and actual buying behavior can be collected over time. These data can be used to evaluate the product personality congruency theory in more detail, which may contribute to marketing research.

Furthermore, this work also contributes to personality research. Psychology scholars have found stable relationships between the big five personality traits and the use of various online social networks such as Facebook (Kao and Craigie 2014; Kern et al. 2014), Twitter (Gou et al. 2014; Mohammad and Kiritchenko 2015), YouTube (Biel and Gatica-Perez 2013; Aran et al. 2014), MySpace (Muscanell and Guadagno 2012; Balmaceda et al. 2014), Renren (Yu and Wu 2010; Wang et al. 2012a) and LinkedIn (Faliagka et al. 2012b; Loiacono et al. 2012). I extended this research line to the XING social network, where I also found significant, albeit weak, relationships to the big five personality traits.

In addition, with the Personality Prediction Engine I present a way to measure an individual's personality unobtrusively using non-self-reported measures (online social network data). This may also be of interest for personality research. While scholars have already found empirical evidence for predicting a user's personality from online social network data for Facebook (Ortigosa et al. 2011, 2014; Kosinski et al. 2013, 2014), LinkedIn (Faliagka et al. 2012a, 2012b) and Renren (Bai et al. 2012), I not only demonstrate personality mining for another social network but also show that all of the big five personality traits can be predicted, rather than just a few of them. For example, when mining Facebook data, Ortigosa et al. (2011, 2014) predicted the trait neuroticism with an accuracy above 63 percent. However, also using a Facebook sample, Kosinski et al. (2014) could only predict the big five personality traits with an accuracy between 5 and 31 percent. Using LinkedIn data, Faliagka et al. (2012a, 2012b) predicted the trait extraversion with an accuracy between 28 and 65 percent. By considering only the two extreme personality cases (no middle group), Bai et al. (2012) reached a two-class classification accuracy of 70 to 72 percent for each of the big five personality traits in their Renren analysis. My accuracy levels (62 to 71 percent) are in line with those of Ortigosa et al. (2011, 2014), Faliagka et al. (2012a, 2012b) and Bai et al. (2012), but I am able to predict all of the big five traits at these accuracy levels, not just one trait or extreme personality cases.

Applying a generalized linear model to the unique XING dataset yielded a predictive gain between 23.2 and 41.8 percent for personality trait prediction. These evaluation results of the Personality Prediction Engine show that it is possible to predict a user's personality comprehensively from online social network data. The Personality Prediction Engine outperforms prior approaches in terms of personality completeness.

Practical implications

From a practical point of view, the PBPR framework may be useful for improving product recommendations in electronic markets. While psychology and marketing scholars have recognised the influence of human personality on product preferences (see the product personality congruency theory by Govers and Schoormans 2005), marketing practitioners usually do not have enough information about the individual personality traits of their customers to automatically derive customer preferences from them. But determining customer preferences is an important condition for effectively running automatic recommendation systems (Adomavicius and Tuzhilin 2005; Xiao and Benbasat 2007). With the PBPR framework I show that it is possible to determine the big five personality traits, and subsequently the product preferences, from online social network data alone.

The evaluation results for the Product Recommender Engine are promising. The engine substantially outperforms the random control group in terms of accuracy (lower error scores): the error scores within the algorithm group were significantly lower than in the control group (T = 4.48, p < 0.001). Moreover, the error scores were negatively correlated with the participants' satisfaction with the product recommendation ranking order (r = -0.727, p < 0.001). These results suggest that the approach may help businesses in electronic markets to create added value by improving product recommendations, for example by preselecting available products and services.

Conclusion

I applied the conceptual framework for social media application development originally introduced by Ngai et al. (2015a) to product recommendation, drawing on the Five Factor personality theory of Goldberg (1990) and Costa and McCrae (1992) and on the product personality – human personality congruency theory by Govers and Schoormans (2005). Consequently, I proposed a personality-based product recommender framework that analyzes large social networks and evaluated it with a unique XING dataset and a unique coffeemaker dataset. The evaluation results suggest that the framework can create substantial added value by improving product recommendations in electronic markets.

Since this framework is built on a sound theoretical basis (Five Factor personality theory of Goldberg (1990) and Costa and McCrae (1992), product personality congruency theory by Govers and Schoormans (2005), conceptual framework by Ngai et al. (2015a)), I contribute to theory-based IT-artefacts incorporating big data and social media analytics in electronic markets. In addition, I contribute to personality and marketing research.

Limitations and future work

I evaluated the Personality Prediction Engine and the Product Recommender Engine separately in order to avoid interference between the evaluation results of the two engines. In addition, to avoid privacy violations, I did not use personal data retrieved from social networks without the knowledge of the persons concerned, and I took great care to preserve data privacy (cf. Spiekermann and Acquisti 2015) during the evaluation of the engines. However, the use of potentially slightly biased self-reported data (i.e., OSN features and personality traits) for evaluation purposes may be a limitation.

Following the guidelines by Kirk (2013), I observed not only age and sex as typical control variables (Campbell 1957; Wohlwill 1970) but also the participants' personality, their state of health and the room temperature as experiment-specific controls. I did not find any significant differences between the experimental and the control group (p > 0.05, cf. Tables 7 and 8). Following the argumentation of Chapanis (1967), other unobserved factors probably balance each other out, but they may also mutually reinforce each other. Since I did not control for variables other than age, sex, personality, health status and room temperature, future work should try to replicate this study. Replication is the most effective means of preventing disturbing influences by uncontrolled or unobserved variables (Kirk 2013).

In future work, I will systematically evaluate other machine learning approaches, such as tree-based models, for the Personality Prediction Engine by applying Max Kuhn's caret package. Furthermore, I will evaluate additional product personality – human personality congruence measures (cf. Govers and Schoormans 2005) as alternatives to the Euclidean distance proposed here (cf. Eq. 1).

The negative relationships revealed between neuroticism and XING usage intensity are probably also a good starting point for future work concerning the role of online social network usage purpose (business or private).

Furthermore, the Personality Prediction Engine can also be used to evaluate the fit between an applicant's personality and the organizational culture during e-recruiting activities in crowdsourcing markets (Buettner 2014, 2015).

Last but not least, future work should apply the proposed framework to various electronic market (structure) settings and concurrently retrieve data from different social networks to improve the Personality Prediction Engine.