Measuring design-level information quality in online reviews

https://doi.org/10.1016/j.elerap.2018.05.010

Highlights

  • A model to estimate the design contextual information in online reviews is presented.

  • Extracted nouns are linked to design features to ensure relevancy.

  • The model evaluates three information components: content, complexity, and relevancy.

  • The results confirm that a substantial amount of useful product design information exists in online reviews.

Abstract

Online product reviews are an important type of user-generated content. For product designers, they offer valuable information that identifies consumer likes, dislikes and desires. We investigate the volume and quality of product design information available in online reviews and introduce the design-level information quality (DLIQ) measure. DLIQ is indicative of the design contextual information stored in the online reviews for a given product. Three separate information components are evaluated: content, complexity, and relevancy. Key determinants of DLIQ are the number of reviews, sentences, words, noun words and feature matching noun words in a review database. DLIQ is formulated as an index and indicates information content relative to a sample of products. For a sample of ten products, RapidMiner was used to mine and illustrate the DLIQ. A validation test with a survey group confirms DLIQ is a reliable predictor of actionable design information in the review database.

Introduction

Online product reviews are an increasingly important type of user-generated online content. For product designers they offer potentially valuable information that could identify consumer likes, dislikes and desires. Today, hundreds of consumer reviews for a wide range of products and services are recorded and publicly available on retailer, manufacturer and independent review websites. The reviews allow consumers to share their product and service experiences and are used by buyers to make purchase decisions (Anwer et al., 2010). Online reviews have rapidly evolved into an influential peer-to-peer opinion sharing platform and are now a key determinant of success in the online marketplace. It is therefore important for product designers to investigate these online reviews and integrate the embedded knowledge into the design process. This integration would allow designers to efficiently incorporate consumer trends, needs, and opinions into the design review process. Presently, this integration is very limited, primarily because designers are unable to quantify the knowledge potential and thus justify the investigation. Further, the vast amount of review information and its widespread dissemination make it challenging for designers to collect the relevant information, interpret and summarize it, and organize it into a usable format. Traditionally, designers have used focus groups, interviews, surveys and customer service data to collect consumer feedback and identify exciting features and disappointments. Similar information is extractable from online reviews, and the feasibility has been reported in the literature (Yu et al., 2012; Yagci and Das, 2015; Yagci, 2014; Buyukozkana et al., 2007; Liu et al., 2013). Additionally, the reason for the review sentiment can be reliably identified, providing additional value to designers (Lee and Kim, 2017).

Yagci and Das (2015) showed how online review data is introducing large volumes of unstructured consumer experience information into the product knowledge space. In a survey of design engineers, they found significant consensus that design intelligence extracted from online reviews is complementary to, and different from, that obtained via traditional methods. In many instances this knowledge may contradict or be unknown to designers. Frequently, though, designers are skeptical of the utility and validity of the online review knowledge. The free text nature of online reviews is not readily interpreted by designers, and the knowledge extraction process is often not obvious. Ghose and Ipeirotis (2007) observe that manufacturers are challenged to understand the true underlying quality information in online reviews. Further, Liu et al. (2013) caution that simply using the scaled evaluations of online reviews could mislead designers in identifying truly valuable and insightful opinions from a designer's perspective. There is therefore a need to quantify the available design information in online reviews on a standard scale. Design teams can then more reliably estimate the likelihood that design contextual information will be successfully retrieved. Designers would be motivated to overcome their skepticism, and appropriately allocate tactical resources to the online review knowledge analysis effort.

Presently, online reviews are used by two groups: first by product buyers to investigate the opinions of other buyers, and second by marketing departments to evaluate consumer sentiment (positive to negative). Online reviews are a form of consumer intelligence, often including product use knowledge that is unknown to product designers and manufacturers. There is therefore a need to extend the utility of online reviews into the product designer group, and this research is motivated by the need for tools and models which facilitate this extension. Traditional design processes are focused on technology-driven innovation, but addition of a consumer-driven innovation component can lead to better and more successful products. Online reviews provide a conduit for this innovation by providing immediate and free access to feedback from hundreds, and possibly thousands, of customers. Liu et al. (2013) found that an efficient and timely design review process, with a focus on customer needs, is critical to market success. Our approach to accessing the consumer intelligence embedded in online reviews is to mine the reviews for specific product intelligence that can lead to better designs. For example, one may learn from the reviews that the image stabilization feature in a digital camera malfunctions in humid conditions. Frequently, such issues are not easily identified by consumer focus groups due to small group sizes and short study periods (Smithson, 2000; Button et al., 2013). Wang et al. (2018) have emphasized that online product reviews provide a good and reliable channel for not only understanding customer product needs but also analyzing a product's competition in the market.

Yagci and Das (2015) introduced a method to efficiently extract or mine design intelligence from unstructured web reviews. The extraction process is complex and time-consuming, and there is no guarantee their method will generate meaningful intelligence. Ideally then, an evaluation of the intelligence likely to be mined can justify the effort and motivate designers to integrate the method in the design process. The primary contribution of this research is the development of a quantitative measure which evaluates the design-level intelligence likely to be mined from a given set of online reviews. The evaluation investigates the quantity and quality of the mined intelligence and proposes the design-level information quality (DLIQ) measure. DLIQ is indicative of the design contextual information that is stored in the online reviews for a given product and likely to be efficiently extracted by the application of a mining method. The measure identifies product features mentioned in the online reviews and then evaluates the information attributes to assess quantity and quality (i.e., number of words, sentences, product features, etc.). The DLIQ measure contributes to the extension of web review intelligence into the product design domain by confirming the information value of the reviews in the context of product features. The measure supports the adoption and further development of web review intelligence mining methods.

Design information is referenced by features, and feature-based opinion mining is widely reported in the literature. Feature-based opinion mining can be performed at either the sentence or the document level, and typically follows three main subtasks: identifying product features, identifying opinions (e.g., positive or negative sentiment) regarding the product features, and summarizing the results across features and opinions. Consider the following review of an automobile model: “High quality brakes, stops quickly even when road is wet. Since the vehicle has a low front bumper, in time they all get easily damaged, especially in driveways and bumps.” In this review, brake and front bumper are the two product features mentioned, and high quality and damaged are the associated opinion words. In the first case the sentiment is positive, while in the second it is negative.
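The three subtasks can be illustrated with a minimal rule-based pass over the example review above. The feature list and sentiment lexicon here are illustrative assumptions for this sketch, not a published method:

```python
# Illustrative feature list and sentiment lexicon (assumed, not from the paper).
FEATURES = {"brakes", "front bumper"}
SENTIMENT = {"high quality": "positive", "damaged": "negative"}

def extract_feature_opinions(review):
    """Return (feature, opinion word, polarity) triples, paired per sentence."""
    triples = []
    for sentence in review.lower().replace("-", " ").split("."):
        feats = [f for f in sorted(FEATURES) if f in sentence]
        opins = [(w, p) for w, p in SENTIMENT.items() if w in sentence]
        # Naive pairing: every feature in a sentence gets every opinion word
        # found in the same sentence.
        for feat in feats:
            for word, polarity in opins:
                triples.append((feat, word, polarity))
    return triples

review = ("High quality brakes, stops quickly even when road is wet. "
          "Since the vehicle has a low front bumper, in time they all get "
          "easily damaged, especially in driveways and bumps.")
print(extract_feature_opinions(review))
# → [('brakes', 'high quality', 'positive'), ('front bumper', 'damaged', 'negative')]
```

Real systems replace the fixed lexicon with part-of-speech tagging and learned sentiment scores, but the sentence-level feature-opinion pairing is the same.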

In general, opinion mining research can be divided into two groups: the first focuses on opinion information while the second focuses on feature information (Binali et al., 2009; Li and Yang, 2005). Recently, an increasing number of researchers have proposed different methods to solve opinion mining problems and proposed different opinion-oriented information seeking systems (Hatzivassiloglou and Wiebe, 2000; Li, 2010; Hu and Liu, 2004a, 2004b; Anwer et al., 2010; Koen et al., 2002; Ding et al., 2008; Chen et al., 2015; Lee et al., 2017).

Turney (2002) introduced a semantic orientation approach, applying Web-based pointwise mutual information statistics to determine review polarity. Information content was calculated using hit counts from the reviews of multiple domains. Lee and Kim (2017) offered a semi-supervised sentiment labeling approach that adds predicted data to the training corpus in order to improve the initial classifier in self-training. Many researchers have examined the sentiment orientation of opinion words at different granularity levels, such as words, sentences, and entities in online reviews (Hu and Liu, 2004a, 2004b; Park and Lee, 2011; Mudambi and Schuff, 2010; Chechik et al., 2005; Lee and Kim, 2017; Lee et al., 2017). However, this information is not always useful to product designers who are looking to initiate design changes motivated by the opinions.

In addition, there has been limited research on measuring the design-level value of online reviews. Feature-based opinion mining is the starting point for such a measure, and recently there has been much reported activity. Zhen et al. (2014) identified opinion features by exploiting the difference in feature statistics between a domain-specific corpus and a domain-independent corpus. Jin and Liu (2010) introduced the helpfulness prediction technique, which focuses on how to connect customer reviews to product designer ratings. They applied a regression algorithm to predict an online review's helpfulness. The system has two phases. First, it creates the connection between the customer review and the designer rating with the help of the training set. Second, it extracts features from the reviews to aid in predicting helpfulness, including linguistic features, product features, and information quality (accuracy, timeliness, comparability, coverage, and relevance).

Park and Lee (2011) focused on how to design and utilize an online customer center to support new product concept generation. They introduced a decision support system that identifies customer needs and materializes them into R&D targets in the new product development process. Their system consists of four stages: extracting consumer reviews from the website; extracting a keyword list of the entire document set together with keyword frequencies; classifying the keywords into several groups based on customers' expressed needs by applying the k-means clustering algorithm; and mapping customer needs to product specifications. A good review of feature-based opinion mining methods is by Liu (2012), who also defined a quintuple model to describe aspects in a document. A method to extract the required aspects is presented by Hu and Liu (2004a, 2004b).
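The keyword-frequency and clustering stages can be sketched in miniature. The review corpus, the initial centers, and the two-cluster choice below are invented, and the one-dimensional k-means over keyword frequencies is a deliberate simplification of the published system:

```python
from collections import Counter

# Invented toy corpus; the published system's data and details are not reproduced.
reviews = [
    "battery life is short", "battery drains fast", "battery dies quickly",
    "nice screen", "screen is bright",
    "zoom works",
]

# Keyword list of the whole corpus with frequencies (stage 2).
freq = Counter(w for r in reviews for w in r.split())

def kmeans_1d(values, centers, iters=10):
    """Plain 1-D k-means: assign each value to its nearest center,
    then recompute each center as the mean of its cluster."""
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(v - c))
            clusters[nearest].append(v)
        centers = [sum(vs) / len(vs) for vs in clusters.values() if vs]
    return centers

# Cluster keywords into frequent vs. rare groups, k = 2 (stage 3).
centers = sorted(kmeans_1d(list(freq.values()), centers=[1.0, 3.0]))
frequent = {w for w, n in freq.items() if abs(n - centers[-1]) < abs(n - centers[0])}
print(frequent)  # → {'battery'}
```

The frequent-keyword group would then be mapped to product specifications in the final stage.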

Other related work includes the design-feature-opinion-cause (DFOC) method by Yagci and Das (2015). DFOC integrates the evaluation of unstructured online reviews into the structured product design process. The key data element in DFOC is an online review and its associated opinion polarity. The DFOC method: (1) identifies a set of design features that are of interest to the product design community; (2) mines the online review database to identify which features are of significance to customer evaluations; (3) extracts and estimates the sentiment or opinion of the set of significant features; and (4) identifies the likely design cause of the opinion.
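Steps (1)-(3) of a DFOC-style pass can be illustrated with a toy sketch. Everything here (feature set, lexicon, reviews, significance threshold) is invented for illustration, and step (4), cause identification, is omitted:

```python
from collections import Counter

# Step (1): invented design feature set and sentiment lexicon.
DESIGN_FEATURES = {"battery", "screen", "zoom"}
LEXICON = {"great": 1, "sharp": 1, "dies": -1, "dim": -1}

reviews = [
    "battery dies fast but the screen is sharp",
    "great zoom, screen a bit dim",
    "battery dies after a day",
]

def toks(review):
    """Lowercase word tokens with commas stripped."""
    return review.lower().replace(",", " ").split()

# Step (2): keep features mentioned often enough to matter (threshold: 2).
mentions = Counter(w for r in reviews for w in toks(r) if w in DESIGN_FEATURES)
significant = {f for f, n in mentions.items() if n >= 2}

# Step (3): net opinion score per significant feature, summed over the
# lexicon words that co-occur in the same review.
opinion = {f: sum(LEXICON[w] for r in reviews if f in toks(r)
                  for w in toks(r) if w in LEXICON)
           for f in significant}
print(sorted(significant), sorted(opinion.items()))
# → ['battery', 'screen'] [('battery', -1), ('screen', 0)]
```

The negative net score for "battery" is the kind of signal that step (4) would then trace back to a design cause.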

The literature indicates the dominant theme in information measurement is entropy. Entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to Shannon entropy (Shannon, 1948), which quantifies the expected value of the information contained in a message, usually in units such as bits. It defines information as a purely quantitative measure of communication exchanges. Entropy assigns a score of uncertainty to a stochastic variable (Banerjee, 2007). Shannon defines the entropy of a discrete random variable X, with possible states x_1, x_2, …, x_n, as

H(X) = \sum_{i=1}^{n} p(x_i) \log_2 \frac{1}{p(x_i)} = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)

where p(x_i) = Pr(X = x_i) is the probability of the ith outcome of X. This formula implies that the more entropy a system has, the more information we can potentially gain once we know the outcome of the experiment. Maji and Pal (2010) used Shannon's entropy approach to measure the relevance and redundancy of features. Lee et al. (2017) adopted entropy theory to measure the ratio of the review text.
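As a concrete illustration of the definition above, the empirical entropy of a token sample can be computed directly from the observed frequencies (the token samples are invented):

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Empirical Shannon entropy -sum p(x) * log2 p(x), in bits."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A fair two-outcome source carries exactly one bit per symbol.
print(shannon_entropy(["heads", "tails"]))  # 1.0
# An invented noun sample, as might be drawn from a review database:
# p(battery) = 0.5, p(screen) = p(lens) = 0.25, so H = 1.5 bits.
print(shannon_entropy(["battery", "screen", "lens", "battery"]))  # 1.5
```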

We investigate the amount of design-level information available in online reviews from the perspective of product designers. To develop the DLIQ measure, a Shannon entropy approach and the principle of maximum entropy were adopted. From the above equation, we know that entropy H(X) is a function of p(x_i). Maximum entropy occurs when all states are equally probable: p(x_i) = 1/n for all i, so that \sum_{i=1}^{n} p(x_i) = 1. Substituting into H(X) gives H(X) = -\log_2(1/n) = \log_2 n. Similar to other approaches (Burch and Gull, 1983; Wu, 2003), in the DLIQ measure p(x_i) represents the frequency of key parameters, specifically nouns, in the extracted review database.
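A short numerical check of the maximum-entropy step (with invented noun lists) confirms that a uniform distribution over n distinct nouns attains log2 n bits, while a skewed distribution falls below it:

```python
import math
from collections import Counter

def entropy_bits(tokens):
    """Empirical entropy of the token-frequency distribution, in bits."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Four distinct nouns, each equally frequent: entropy attains its
# maximum, log2(4) = 2.0 bits.
uniform = ["battery", "screen", "lens", "zoom"]
# The same four nouns with a skewed frequency profile carry less information.
skewed = ["battery"] * 5 + ["screen", "lens", "zoom"]
print(entropy_bits(uniform))                # 2.0
print(entropy_bits(skewed) < math.log2(4))  # True
```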

Section snippets

Mining design contextual features

The basic axiom of product design is governed by the relationship: Functional Requirement → Design Feature → Customer Satisfaction. Product designers are thus singularly focused on design features, which are the defining parameters of their specific product. A design feature is an attribute or characteristic of the design that is a controllable variable for the design community and a satisfaction focus for the customer community. Features link designers and users. In an online review design

Design level information quality measure

Product designers want an estimate of the relative amount of design-level-information available in online reviews, so that they can allocate analytical resources accordingly. Specifically, the need is for a measure that evaluates the quantity and quality of the information that can be expertly extracted. Further, the measure should be applicable to a wide variety of product classes (e.g., electronics, automobiles, service domain). The proposed DLIQ measure provides this estimate and consists of

Application to a set of products

The DLIQ measure described in Eq. (10) was applied to a set of M = 10 specific products and services. Each of these belongs to a class of products and services that are commonly reviewed online, and a review database was efficiently created. A key criterion in selecting the products was that M be a good representation of the product population, since it will determine the benchmarks DLIQCont(M), DLIQCplx(M), and DLIQRelv(M). While the DLIQ measure is robust enough to evaluate products with a

Conclusion

This research investigated the volume and quality of product design information available in online reviews and introduced the DLIQ measure. The key contribution is a measure which evaluates the availability of design-level information quality in online reviews where the design knowledge is bounded within a sentence. The evaluation is structured into three components: content, complexity, and relevancy. Key determinants of content are the number of reviews and the length of the reviews as

References (53)

  • Y. Zhang et al.

    Concept extraction and e-commerce applications

    Electron. Commer. Res. Appl.

    (2013)
  • N. Anwer et al.

    Feature Based Opinion Mining of Online Free Format Customer Reviews using Frequency Distribution and Bayesian Statistics

    (2010)
  • P. Banerjee

    Measuring the quality of information in clustering protocols for sensor networks

    Article no. 29 in Proceedings of the 3rd International Conference on Wireless Internet, Institute for Computer Sciences

    (2007)
  • H. Binali et al.

    A state of the art opinion mining and its application domain

  • K.S. Button et al.

    Power failure: why small sample size undermines the reliability of neuroscience

    Nat. Rev. Neurosci.

    (2013)
  • G. Buyukozkana et al.

    Integration of Internet and online-based tools in new product development process

    Prod. Plann. Control: Manage. Operat.

    (2007)
  • G. Chechik et al.

    Information bottleneck for Gaussian variables

    J. Mach. Learn. Res.

    (2005)
  • X. Ding et al.

    A holistic lexicon-based approach to opinion mining

  • G. Ganu et al.

    USRA: User Review Structure Analysis: Understanding Online Reviewing Trends

    (2010)
  • A. Ghose et al.

    Designing Novel Review Ranking Systems: Predicting the Usefulness and Impact of Reviews

    (2007)
  • Z. Hai et al.

    Identifying features in opinion mining via intrinsic and extrinsic domain relevance

    IEEE Trans. Knowl. Data Eng.

    (2013)
  • V. Hatzivassiloglou et al.

    Effects of Adjective Orientation and Gradability on Sentence Subjectivity

    (2000)
  • M. Hu et al.

    Mining opinion features in customer reviews

  • M. Hu et al.

    Mining and summarizing customer reviews

    Proceedings 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    (2004)
  • E. Hullermeier et al.

    Ranking by pairwise comparison: A note on risk minimization

    Proceedings of the IEEE International Conference on Fuzzy Systems

    (2004)