
Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base

Authors:
Hao Xin, Hong Kong University of Science and Technology, Hong Kong, China
Rui Meng, BNU-HKBU United International College, Zhuhai, China
Lei Chen, Hong Kong University of Science and Technology, Hong Kong, China

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, May 2018, Pages 1349–1361
https://doi.org/10.1145/3183713.3183732

Published: 27 May 2018

ABSTRACT

Knowledge base construction (KBC) has recently become an active and timely topic, driven by the growing need for large-scale knowledge bases (KBs) in applications such as semantic search and question answering (QA) systems, for example the Google Knowledge Graph and the IBM Watson QA system. Existing KBs mainly encode facts about the world, e.g., city area and company product, which are regarded as objective knowledge, whereas subjective knowledge, which is frequently mentioned in Web queries, has been neglected. Subjective knowledge has no documented ground truth; instead, the truth relies on people's dominant opinion, which can be solicited from online crowd workers. In this work, we propose a framework for subjective KB construction that takes advantage of knowledge from both the crowd and existing KBs. The framework has two stages: core subjective KB construction and subjective KB enrichment. First, we build a core subjective KB mined from existing KBs, in which every instance has rich objective properties. Then, we populate the core subjective KB with instances extracted from existing KBs, leveraging the crowd to annotate the subjective property of these instances. To optimize the crowd annotation process, we formulate subjective KB enrichment as a cost-aware instance annotation problem and propose two instance annotation algorithms: adaptive instance annotation and batch-mode instance annotation. We evaluate our framework on real knowledge bases and a real crowdsourcing platform. The experimental results show that our framework can derive high-quality subjective knowledge facts from existing KBs and crowdsourcing techniques.
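
The abstract does not spell out the annotation algorithms, so the following is only an illustrative sketch of the two ideas it names: taking the crowd's dominant opinion as the truth, and choosing which instances to annotate under a cost budget. It assumes a simple majority vote and a generic benefit-per-cost greedy heuristic; it is not the paper's adaptive or batch-mode algorithm, and the function names, the benefit scores, and the costs are all hypothetical.

```python
from collections import Counter


def dominant_opinion(votes):
    """Return (label, support) for the most frequent crowd answer.

    Assumption: the "dominant opinion" is approximated by a plain
    majority vote; support is the fraction of workers who agree.
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


def select_within_budget(candidates, budget):
    """Greedily pick instances to send to the crowd.

    candidates: list of (instance, benefit, cost) tuples, where benefit
    and cost are hypothetical, caller-supplied estimates.
    Instances are considered in decreasing benefit-per-cost order and
    taken whenever they still fit in the remaining budget.
    """
    chosen = []
    remaining = budget
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    for instance, _benefit, cost in ranked:
        if cost <= remaining:
            chosen.append(instance)
            remaining -= cost
    return chosen


if __name__ == "__main__":
    # Three workers judge whether a city is "romantic" (made-up votes).
    print(dominant_opinion(["yes", "yes", "no"]))
    # Made-up candidates with (benefit, cost) estimates and a budget of 3.
    print(select_within_budget([("a", 0.9, 3), ("b", 0.5, 1), ("c", 0.4, 2)], 3))
```

A ratio-based greedy rule is a common baseline for budgeted selection; an adaptive variant would re-estimate the benefit scores after each batch of crowd answers before selecting again.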


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
2. Information systems
  1. World Wide Web
    1. Web applications
      1. Crowdsourcing

Published in
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
General Chairs:
Gautam Das, University of Texas at Arlington, USA
Christopher Jermaine, Rice University, USA
Philip Bernstein, Microsoft Research, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
crowdsourcing
knowledge base construction
subjective knowledge
Qualifiers
- research-article
Acceptance Rates
SIGMOD '18 Paper Acceptance Rate: 90 of 461 submissions, 20%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- Total Citations: 8
- Total Downloads: 716
- Downloads (Last 12 months): 21
- Downloads (Last 6 weeks): 4