New perspectives on gray sheep behavior in E-commerce recommendations

https://doi.org/10.1016/j.jretconser.2019.02.018

Abstract

With the exponential rise in the volume of data being generated, personalization based on recommender systems has become an important part of the digital marketing strategy of E-commerce companies. Recommender systems also help these companies in cross-selling, up-selling and increasing customer loyalty. However, the presence of certain users with eccentric tastes, known as gray sheep users, reduces the overall efficiency of recommender systems. Hence, identifying them and removing them from the computation is critical for more efficient recommendations. This work presents approaches based on psychographic models that identify gray sheep users with improved performance. It also studies gray sheep behavior across different domains and contexts, and introduces the idea of gray sheep items.

Introduction

The world of digital media is changing at a phenomenal pace (Ryan, 2016), and the digital marketing strategies that companies use to reach their customers demand more focused, interactional, quantifiable and personalized approaches (Adomavicius and Tuzhilin, 2005, Lamberton and Stephen, 2016, Nisar et al., 2018). We are living in an age of data deluge, where information overload is an overwhelming problem for consumers making purchase decisions in online electronic commerce applications (Yan et al., 2016). For decisions such as which movie to watch, which book to read, which song to hear, or which food to order online, users are flooded with a wide range of choices. E-commerce companies, too, need a built-in mechanism that presents only the relevant products and services to the customer, both to improve the user experience and to increase overall sales. To solve this ubiquitous problem of information overload, recommender systems (RS) are used, which primarily predict the level of inclination a user might have towards a particular item that the user has not previously used (Baier and Stüber, 2010, Lu et al., 2015). Such personalized recommendations also help companies in up-selling, cross-selling and increasing overall customer loyalty (Das et al., 2007, Fleder et al., 2010, Hosanagar and Fleder, 2013, Walter et al., 2012). Since the seminal weaving of the information tapestry around two and a half decades ago (Goldberg et al., 1992), RS have come a long way in terms of their design and implementation (Bouzekri et al., 2019). Nevertheless, they continue to be an important area of research, attracting the attention of scholars across academia and industry. This sustained interest can also be traced to their practical significance in implementing digital marketing strategy and their proven success in E-commerce and other online systems where information retrieval and filtering are required (Amatriain et al., 2016, Krzywicki et al., 2015, Schafer et al., 1999, Vargas et al., 2016).

Content-based filtering (CBF) and collaborative filtering (CF) are two popular techniques used in designing RS (Valcarce et al., 2018; Wang et al., 2018). CBF makes recommendations to a user by analyzing the attributes of the items that the user has previously rated. CF works on the fundamental assumption that “if user X and Y rate some items similarly, or have similar behaviors, they will rate or act on other items similarly”, and it is the most successful and widely used approach in RS (Ahn, 2008, Choi and Suh, 2013, Kim and Ahn, 2008).

A typical CF framework involves a two-dimensional user-item matrix, also known as the utility matrix, that consists of the ratings given by users to items. In a real-world scenario, each user generally rates only a fraction of the items present in the system, so most of the user-item pairs in the utility matrix remain unrated. From this matrix, a set of similar people, or nearest neighbors, is found for the user to whom the recommendations are to be made. For neighborhood creation, a wide variety of similarity measures have been proposed, among which similarities based on cosine (Breese et al., 1998), Spearman correlation (Ahn, 2008) and the Jaccard coefficient (Al-Shamri, 2014) are among the most common.
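The neighborhood step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the toy `ratings` data, the function names, and the choice to compute cosine similarity only over co-rated items are all assumptions made for the example.

```python
import math

# Hypothetical toy utility matrix: user -> {item: rating}.
# Missing keys correspond to unrated user-item pairs (the sparse part).
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 1},
    "u3": {"i1": 1, "i2": 5, "i4": 4},
    "u4": {"i2": 5, "i3": 1, "i4": 5},
}

def cosine_similarity(a, b):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(a, b):
    """Jaccard coefficient on the sets of rated items (rating values ignored)."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

def nearest_neighbors(user, ratings, k, sim=cosine_similarity):
    """Return the k users most similar to `user` as (user_id, similarity) pairs."""
    scores = [
        (other, sim(ratings[user], ratings[other]))
        for other in ratings
        if other != user
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

neighbors = nearest_neighbors("u1", ratings, k=2)
print(neighbors)
```

Restricting the cosine to co-rated items is only one of several conventions; with very few shared items it can overstate similarity, which is one reason sparse rating data makes neighborhood creation fragile.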

Despite the widespread success of CF and its applications in commercially deployed RS (Najafabadi et al., 2017), CF suffers from some fundamental challenges, such as data sparsity (Guo et al., 2017, Pan et al., 2010), cold start (Fernández-Tobias et al., 2019, Lika et al., 2014), the gray sheep (GS) problem (Ghazanfar and Prügel-Bennett, 2014, Ghorbani and Novin, 2016) and shilling attacks (Tong et al., 2018). Although the utility matrix is vast, most of its elements are blank because those user-item pairs are unrated in real-world scenarios. This happens because, out of the hundreds of thousands of available items, a typical user might have come across and rated only a very small number. Such sparsity in the data available for computation reduces the quality of recommendations and may also lead to overfitting. Closely related to it is the cold start problem, which occurs for a new user who has never interacted with the system before or has rated very few items (Camacho and Alves-Souza, 2018). The lack of historical data for such users poses a significant challenge to CF in finding their neighbors and generating suitable recommendations.

To overcome the challenges of data sparsity and cold start, one of the most popular approaches is the application of transfer learning (TL) using auxiliary data sources (Pan and Yang, 2010, Pan, 2015). TL is a machine learning paradigm that uses knowledge learned from one task in a source domain to solve a different but related task in a target domain. This is particularly useful when high-quality training data are scarce (Zhang et al., 2018). In the context of RS, the auxiliary information for TL comes from various sources, such as users’ demographic information, personality profiles, psychographic factors and social media data, as well as from other RS, as in the case of cross domain RS.

Another fundamental issue with CF is the GS problem. CF does not work with equal efficiency for all users. Among the entire user set, there are two distinctive kinds of users, white sheep (WS) and GS (Su and Khoshgoftaar, 2009). WS are users who have a high correlation with many other users, whereas GS are users who have a low correlation coefficient with almost all other users. Not only do GS users themselves benefit little from the system, but their presence also diminishes the quality of the recommendations made to the rest of the users. Hence, to ensure better overall efficiency of CF based RS in executing digital marketing strategy, the identification and removal of such GS users from the system is an important issue in CF that attracts researchers’ attention.
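The WS/GS distinction can be illustrated with a small ratings-only sketch. It is not the psychographic approach proposed in this work; it simply labels users by their Pearson correlation with everyone else. The thresholds (`low`, `high`), the shares (`gs_share`, `ws_share`) and the toy data are hypothetical values chosen for the example.

```python
import math

def pearson(a, b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def label_sheep(ratings, low=0.2, high=0.5, gs_share=0.9, ws_share=0.5):
    """Label each user 'gray', 'white' or 'other'.

    gray:  correlation below `low` with at least `gs_share` of the other users
    white: correlation above `high` with at least `ws_share` of the other users
    All four cut-offs are illustrative assumptions, not values from the paper.
    """
    labels = {}
    for user in ratings:
        corrs = [pearson(ratings[user], ratings[o]) for o in ratings if o != user]
        if not corrs:
            labels[user] = "other"
            continue
        low_share = sum(c < low for c in corrs) / len(corrs)
        high_share = sum(c > high for c in corrs) / len(corrs)
        if low_share >= gs_share:
            labels[user] = "gray"
        elif high_share >= ws_share:
            labels[user] = "white"
        else:
            labels[user] = "other"
    return labels

# Hypothetical toy data: u4 rates against the trend of the other three users.
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 2, "i4": 1},
    "u2": {"i1": 4, "i2": 5, "i3": 1, "i4": 2},
    "u3": {"i1": 5, "i2": 5, "i3": 2, "i4": 1},
    "u4": {"i1": 1, "i2": 5, "i3": 5, "i4": 2},
}

labels = label_sheep(ratings)
print(labels)
```

Dropping the users labeled `gray` before neighborhood creation is the removal step the paragraph refers to; how aggressively to set the cut-offs is a design choice that trades coverage against recommendation quality for the remaining users.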

The existing approaches for identifying GS users rely only on explicit ratings, which are usually sparse, as discussed earlier in this section, to measure the similarity between users. Relying solely on ratings ignores the implicit taste of each user, which can be learned from auxiliary sources such as product reviews, social network data, and personality profiles (Chen et al., 2015). Hence, existing techniques for GS identification are unable to capture the full difference in the tastes and preferences of users, which is their major shortcoming. To overcome this shortcoming, the current work explores a TL based approach that uses various models based on the personality of each user to identify the GS users in the system. Furthermore, to demonstrate the effectiveness of this approach empirically, we have used recent advances in natural language processing (NLP), which make the techniques more practical and scalable. To the best of our knowledge, this is the first work that envisages the use of personality based user models for GS identification and empirically compares the performance of this approach with existing practices in the available literature.

In recent times, cross domain RS (Zhang et al., 2017) and context aware RS (Panniello et al., 2014) have also become increasingly popular. In cross domain RS, knowledge from one domain, such as movies, is used to make recommendations in a different domain, such as books (Li et al., 2009a). Context aware RS provide real-time personalized recommendations by utilizing contextual information about the individual at that moment (Braunhofer et al., 2014, Verbert et al., 2010).

A comprehensive literature review of RS and their applications revealed four research gaps. Although the extant literature suggests that GS identification makes use of the utility matrix of ratings, psychographic factors have never been modeled to identify GS users. As the psychographic factors derived in RS do not suffer from the sparsity problem, the first formal research question (RQ) investigated in this paper is:

RQ1. Can GS users be identified using psychographic models?

Until now, all previous studies have focused on GS user identification only in a single domain and without any contextual information. Moreover, the consistency of GS behavior across different domains and contexts has remained unexplored, which leads to the following two research questions.

RQ2. Is GS behavior unique to an individual domain, or is it consistent across domains?

RQ3. Is GS behavior unique to a specific context, or is it consistent across different contexts?

While the previous three research questions relate to GS users, our work also explores the concept of a GS item, whose removal may help increase the efficiency of recommendations for the remaining items in the system. This leads to the fourth research question, formally stated as:

RQ4. Do items exhibit GS behavior?

The major contributions of this work are as follows:

  • i. Modeling of psychographic factors for GS identification has been proposed and experimentally compared with the existing state-of-the-art technique.
  • ii. GS behavior across different domains has been explored and empirically studied.
  • iii. Consistency of GS behavior across different contexts has been examined.
  • iv. A novel concept of the GS item has been introduced, and the existence of such items has been empirically demonstrated.

The remainder of the paper is structured as follows. Section 2 provides an exhaustive review of the existing literature on GS users and traditional collaborative filtering. It also explains the need for a personality-based user modeling approach to GS user identification and describes cross domain RS and context aware RS. Section 3 presents the modeling techniques for GS identification, and Section 4 describes the experimental work. Section 5 summarizes the results of the experiments, and Section 6 presents a detailed discussion along with the theoretical contributions and managerial implications of this work. Section 7 concludes with the limitations and future directions of this research.

Section snippets

Related work

In this section, we first discuss the origin and evolution of modern RS and the various approaches to them. Then, we present a detailed literature review of psychographic frameworks used for TL in RS; cross domain RS and context aware RS; the GS user problem; and the motivation behind the research questions or hypotheses of this research.

Modeling

A typical CF based recommendation approach consists of three distinct parts. The first is the creation of the utility matrix. The second is finding the neighbors most similar to the active user, for whom the recommendations are to be generated. The third is combining the preferences of the neighborhood users with the active user's own preferences to predict the ratings of items that the active user has not previously rated.
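The third part, combining neighbor preferences with the active user's own, can be sketched as follows. The formula used here is the common textbook mean-centered weighted average, which is an assumption on our part and not necessarily the predictor used in this work; the `similarity` values and the toy `ratings` are hypothetical.

```python
def mean_rating(user_ratings):
    """Average of the ratings a user has given."""
    return sum(user_ratings.values()) / len(user_ratings)

def predict(active, item, ratings, similarity, k=2):
    """Predict the active user's rating for an unrated item.

    Prediction = active user's mean rating
                 + similarity-weighted average of how far each of the k most
                   similar neighbors rated the item above or below their own mean.
    """
    candidates = [
        (similarity[active].get(other, 0.0), other)
        for other in ratings
        if other != active and item in ratings[other]
    ]
    # Keep only positively correlated neighbors, most similar first.
    top = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    base = mean_rating(ratings[active])
    if not top:
        return base  # no usable neighbors: fall back to the user's own mean
    num = sum(s * (ratings[o][item] - mean_rating(ratings[o])) for s, o in top)
    den = sum(abs(s) for s, _ in top)
    return base + num / den

# Step 1: hypothetical toy utility matrix (u1 has not rated i3).
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i2": 2, "i3": 5},
    "u3": {"i1": 5, "i2": 4, "i3": 3},
}
# Step 2: similarities of the active user to the others, assumed precomputed.
similarity = {"u1": {"u2": 0.9, "u3": 0.5}}

# Step 3: combine neighbor deviations with the active user's mean.
predicted = predict("u1", "i3", ratings, similarity)
print(round(predicted, 3))
```

Mean-centering is one way of bringing the active user's own preferences into the prediction: it corrects for users who rate systematically high or low before their opinions are pooled.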

A utility matrix with ratings, denoted by R, is a p × q matrix, with p

Experimentation

In order to empirically study the four research questions identified in Section 1, we conducted four different experiments as described in the following sub-sections.

Results

In this section, we present the results of the experiments performed to answer the four research questions. For the first experiment, which addresses RQ1, i.e., “Can GS users be identified using psychographic models?”, the mean absolute error (MAE) of the collaborative filtering model was computed based on the similarity calculated using equation (ii) for all 1000 users and was found to be 0.7808378. This acts as the baseline figure for the various GS identification approaches in our work, with each model

Discussion and contribution

In this section, we provide a detailed discussion of the results along with theoretical contributions and managerial implications of this research work.

Identification of GS users is an important area of research and practice in RS, aimed at increasing the quality of recommendations to the non-GS users who constitute the majority of online consumers in the system. This study explores four research questions related to GS behavior in RS. GS users are users with eccentric taste and preferences, they

Conclusion

RS are essential tools for achieving personalization when implementing digital marketing strategy. By presenting personalized recommendations to each individual user, and thereby enabling managers to cross-sell, up-sell and increase overall customer loyalty, RS provide an essential input to digital marketing. Achieving this calls for a high degree of accuracy in making recommendations to users. Unfortunately, there are users in the system with very poor similarity to the

References (150)

  • F. Colace et al. A collaborative user-centered framework for recommending items in online social networks. Comput. Hum. Behav. (2015)
  • X. Cui. In-and extra-role knowledge sharing among information technology professionals: the five-factor model perspective. Int. J. Inf. Manag. (2017)
  • S.C. Grunert et al. Values, environmental attitudes, and buying of organic foods. J. Econ. Psychol. (1995)
  • G. Guo et al. Resolving data sparsity by multi-type auxiliary implicit feedback for recommender systems. Knowl.-Based Syst. (2017)
  • S. Kamboj et al. Examining branding co-creation in brand communities on social media: applying the paradigm of stimulus-organism-response. Int. J. Inf. Manag. (2018)
  • K.J. Kim et al. A recommender system using GA K-means clustering in an online shopping market. Expert Syst. Appl. (2008)
  • M.V. Kosti et al. Personality, emotional intelligence and work preferences in software engineering: an empirical study. Inf. Softw. Technol. (2014)
  • A. Krzywicki et al. Collaborative filtering for people-to-people recommendation in online dating: data analysis and user trial. Int. J. Hum. Comput. Stud. (2015)
  • J. Lee et al. Predicting positive user responses to social media advertising: the roles of emotional appeal, informativeness, and creativity. Int. J. Inf. Manag. (2016)
  • B. Lika et al. Facing the cold start problem in recommender systems. Expert Syst. Appl. (2014)
  • J. Lu et al. Recommender system application developments: a survey. Decis. Support Syst. (2015)
  • A. Mild et al. An improved collaborative filtering approach for predicting cross-category purchases based on binary market basket data. J. Retail. Consum. Serv. (2003)
  • N. Misirlis et al. Social media metrics and analytics in marketing--S3M: a mapping literature review. Int. J. Inf. Manag. (2018)
  • M.K. Najafabadi et al. Improving the accuracy of collaborative filtering recommendations using clustering and association rules mining on implicit data. Comput. Hum. Behav. (2017)
  • T.M. Nisar et al. Sports clubs' use of social media to increase spectator interest. Int. J. Inf. Manag. (2018)
  • Adamopoulos, P., Todri, V., 2015. Personality-based recommendations: evidence from Amazon.com, In: Proceedings of the...
  • G. Adomavicius et al. Context-aware recommender systems
  • G. Adomavicius et al. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. (2005)
  • Amatriain, X., Basilico, J., Way, A., 2016. Past, present, and future of recommender systems: an industry perspective,...
  • Badenes, H., Bengualid, M.N., Chen, J., Gou, L., Haber, E., Mahmud, J., Nichols, J.W., Pal, A., Schoudt, J., Smith,...
  • R. Bandari et al. The pulse of news in social media: forecasting popularity. Comput. Sci. (2012)
  • Basu, C., Hirsh, H., Cohen, W., 1998. Recommendation as classification: Using social and content-based information in...
  • Boratto, L., Carta, S., 2010. State-of-the-art in group recommendation and new approaches for automatic identification...
  • Boratto, L., Spano, L.D., Carta, S., Fenu, G., 2016b. Workshop on engineering human-computer interaction in recommender...
  • R. Boudon. The Origin of Values: Essays in the Sociology and Philosophy of Beliefs (2001)
  • M. Braunhofer et al. STS: a context-aware mobile recommender system for places of interest. CEUR Workshop Proc. (2014)
  • Breese, J., Heckerman, D., Kadie, C., 1998. Empirical analysis of predictive algorithms for collaborative filtering,...
  • J.M. Burger. Personality--Theory and Research (1986)
  • Burke, R., Adomavicius, G., Guy, I., Krasnodebski, J., Pizzato, L., Zhang, Y., Abdollahpouri, H., 2017. VAMS 2017:...
  • L.A.G. Camacho et al. Social network data to alleviate cold-start in recommender system: a systematic review. Inf. Process. Manag. (2018)
  • I. Cantador et al. A multilayer ontology-based hybrid recommendation model. AI Commun. (2008)
  • Cantador, I., Cremonesi, P., 2014. Tutorial on cross-domain recommender systems. In: Proceedings of the 8th ACM...
  • Chen, J., Hsieh, G., Mahmud, J., Nichols, J., 2014. Understanding Individuals' Personal Values from Social Media Word...
  • L. Chen et al. Recommender systems based on user reviews: the state of the art. User Model User-Adapt. Interact. (2015)
  • Claypool, M., Gokhale, A., Miranda, T., Gokhale, A., Murnikov, P., Netes, D., Sartin, M., 1999. Combining content-based...
  • J. Crowther. Oxford Advanced Learner's Dictionary of Current English (1995)
  • Das, A.S., Datar, M., Garg, A., Rajaram, S., 2007. Google news personalization: scalable online collaborative...
  • P.J. Denning. ACM President's Letter: electronic junk. Commun. ACM (1982)
  • T. Devos et al. Conflicts among human values and trust in institutions. Br. J. Soc. Psychol. (2002)
  • Durkheim, E., 1951. Suicide, a study in sociology. Glencoe, IL. Free Press (1987,...