ABSTRACT
Knowledge base construction (KBC) has recently become a timely topic. It is driven by the growing demand for large-scale knowledge bases (KBs) in applications such as semantic search and question answering (QA) systems, e.g., the Google Knowledge Graph and the IBM Watson QA system. Existing KBs mainly encode objective facts about the world, e.g., the area of a city or the products of a company. Subjective knowledge, which is frequently mentioned in Web queries, has been neglected. Subjective knowledge has no documented ground truth. Instead, the truth depends on people's dominant opinion, which can be solicited from online crowd workers. In this work, we propose a framework for subjective KB construction that takes advantage of knowledge from both the crowd and existing KBs. The framework has two stages: core subjective KB construction and subjective KB enrichment. First, we build a core subjective KB mined from existing KBs, in which every instance has rich objective properties. Then, we populate the core subjective KB with instances extracted from existing KBs, and the crowd is leveraged to annotate the subjective properties of these instances. To optimize the crowd annotation process, we formulate subjective KB enrichment as a cost-aware instance annotation problem. We propose two instance annotation algorithms: adaptive instance annotation and batch-mode instance annotation. We evaluate our framework on real knowledge bases and a real crowdsourcing platform. The experimental results show that the proposed framework derives high-quality subjective knowledge facts from existing KBs and crowdsourcing.
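The following minimal Python sketch shows one way a dominant opinion might be solicited at bounded cost. It aggregates binary crowd votes by simple majority and stops collecting votes once the leading answer can no longer be overtaken within a per-instance vote budget. The function name `dominant_opinion`, the yes/no encoding of votes, and the `max_votes` budget are hypothetical assumptions made for illustration. This is a generic early-stopping majority vote, not the adaptive or batch-mode annotation algorithm proposed in the paper.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def dominant_opinion(votes: Iterable[bool], max_votes: int) -> Tuple[Optional[bool], int]:
    """Hypothetical helper: majority vote over yes/no crowd answers with early stopping.

    Votes are consumed one at a time (each one assumed to cost one unit).
    Collection stops as soon as the current leader's margin exceeds the
    number of votes still allowed by the budget, so further votes cannot
    change the outcome. Returns (label or None on a tie, votes used).
    """
    counts: Counter = Counter()
    used = 0
    for vote in votes:
        if used >= max_votes:  # budget exhausted
            break
        counts[vote] += 1
        used += 1
        remaining = max_votes - used
        lead = abs(counts[True] - counts[False])
        if lead > remaining:  # the outcome is already decided
            return counts[True] > counts[False], used
    if counts[True] == counts[False]:
        return None, used  # no dominant opinion within the budget
    return counts[True] > counts[False], used


# Usage: with a budget of 5 votes, three agreeing "yes" votes decide the label.
label, cost = dominant_opinion([True, True, True, False, False], max_votes=5)
```

In the usage line, `label` is `True` and `cost` is `3`. After three "yes" votes, the two remaining votes in the budget cannot overturn the majority.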
Index Terms
- Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base