DOI: 10.1145/3183713.3183732

Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base

Published: 27 May 2018

ABSTRACT

Knowledge base construction (KBC) has become a timely research topic, driven by the growing demand for large-scale knowledge bases (KBs) in applications such as semantic search and question answering, exemplified by the Google Knowledge Graph and the IBM Watson QA system. Existing KBs mainly focus on encoding objective facts about the world, e.g., the area of a city or the products of a company, whereas subjective knowledge, which is frequently mentioned in Web queries, has been largely neglected. Subjective knowledge has no documented ground truth; instead, the truth reflects people's dominant opinion, which can be solicited from online crowd workers. In this work, we propose a KBC framework for subjective knowledge base construction that takes advantage of knowledge from both the crowd and existing KBs. The framework has two stages: core subjective KB construction and subjective KB enrichment. First, we build a core subjective KB mined from existing KBs, in which every instance has rich objective properties. Then, we populate the core subjective KB with additional instances extracted from existing KBs, leveraging the crowd to annotate the subjective properties of these instances. To optimize the crowd annotation process, we formulate subjective KB enrichment as a cost-aware instance annotation problem and propose two instance annotation algorithms: an adaptive instance annotation algorithm and a batch-mode instance annotation algorithm. We evaluate our framework on real knowledge bases and a real crowdsourcing platform; the experimental results show that the proposed framework derives high-quality subjective knowledge facts from existing KBs and crowdsourcing.
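To make the cost-aware, adaptive instance annotation idea concrete, the following is a minimal sketch, not the authors' published algorithm: the `crowd_label` stub, the `expected_benefit` heuristic, and majority-vote aggregation are all illustrative assumptions about how instances from existing KBs might be selected and labeled by crowd workers under a fixed budget.

```python
import random
from collections import Counter

# Hypothetical sketch of a cost-aware, adaptive instance annotation loop.
# The selection heuristic, the crowd stub, and the aggregation rule are
# illustrative assumptions, not the algorithm described in the paper.

def crowd_label(instance, n_workers=3):
    """Stub for asking n_workers crowd workers whether the instance has the
    subjective property (e.g., 'is this restaurant romantic?')."""
    return [random.choice([True, False]) for _ in range(n_workers)]

def expected_benefit(instance, core_kb):
    """Toy utility: prefer instances that share more objective properties
    with the core subjective KB, assuming they enrich it the most."""
    return len(instance["properties"] & core_kb["known_properties"])

def adaptive_annotation(candidates, core_kb, budget, cost_per_instance=3):
    """Greedily annotate the most promising instance until the budget runs out."""
    enriched = []
    remaining = list(candidates)
    while remaining and budget >= cost_per_instance:
        # Adaptive step: re-rank the remaining instances on every round.
        best = max(remaining, key=lambda inst: expected_benefit(inst, core_kb))
        remaining.remove(best)
        votes = crowd_label(best, n_workers=cost_per_instance)
        budget -= cost_per_instance
        # Majority vote decides the subjective property value.
        label, _ = Counter(votes).most_common(1)[0]
        enriched.append((best["name"], label))
    return enriched

if __name__ == "__main__":
    core_kb = {"known_properties": {"cuisine", "price", "location"}}
    candidates = [
        {"name": "Cafe A", "properties": {"cuisine", "price"}},
        {"name": "Bar B", "properties": {"location"}},
    ]
    print(adaptive_annotation(candidates, core_kb, budget=6))
```

A batch-mode variant, as named in the abstract, would presumably select a set of instances per round rather than one at a time; the greedy single-instance loop above is only the simplest adaptive baseline.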


      • Published in

        SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
        May 2018
        1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713

        Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 27 May 2018


        Qualifiers

        • research-article

        Acceptance Rates

SIGMOD '18 Paper Acceptance Rate: 90 of 461 submissions, 20%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
