DOI: 10.1145/3366423.3380284
research-article

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

Published: 20 April 2020

ABSTRACT

Given a small set of seed entities (e.g., “USA”, “Russia”), corpus-based set expansion aims to induce, from a given corpus, an extensive set of entities that share the same semantic class (Country in this example). Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, because no negative sets are provided explicitly, these methods suffer from semantic drift caused by expanding the seed set freely without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets as negative sets that are closely related to the target set of the user’s interest, and then performs multiple-set co-expansion, which extracts discriminative features by comparing the target set with the auxiliary sets. This forms multiple cohesive sets that are distinct from one another, thus resolving the semantic drift issue. In this paper, we demonstrate that generating auxiliary sets guides the expansion of the target set away from the ambiguous areas around its border with the auxiliary sets, and we show that Set-CoExpan significantly outperforms strong baseline methods.
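
The co-expansion idea in the abstract can be illustrated with a minimal Python sketch. It is not the Set-CoExpan implementation: the feature table, the entity names beyond the two seeds, and the functions `discriminative_features`, `score`, and `co_expand` are all hypothetical, and the auxiliary seed is supplied by hand, whereas the abstract describes generating auxiliary sets automatically. The sketch only shows how comparing a target set with an auxiliary (negative) set can filter out shared context features before candidates are ranked.

```python
from collections import Counter

# Toy, hand-made context features per entity (hypothetical data, not from the paper).
FEATURES = {
    "USA": {"president of __", "capital of __", "visited __"},
    "Russia": {"president of __", "capital of __", "border with __"},
    "France": {"president of __", "capital of __", "visited __"},
    "Japan": {"capital of __", "border with __", "visited __"},
    "Texas": {"governor of __", "capital of __", "visited __"},
    "Florida": {"governor of __", "capital of __", "visited __"},
    "Alaska": {"governor of __", "capital of __", "border with __"},
}


def discriminative_features(own_seeds, other_seeds):
    """Keep features seen with this set's seeds but never with the other set's seeds."""
    own = Counter(f for e in own_seeds for f in FEATURES[e])
    other = {f for e in other_seeds for f in FEATURES[e]}
    return {f: n for f, n in own.items() if f not in other}


def score(entity, weights):
    """Sum the weights of the discriminative features the entity matches."""
    return sum(w for f, w in weights.items() if f in FEATURES[entity])


def co_expand(target_seeds, auxiliary_seeds):
    """Assign each unlabeled entity to whichever set scores it strictly higher."""
    target_w = discriminative_features(target_seeds, auxiliary_seeds)
    aux_w = discriminative_features(auxiliary_seeds, target_seeds)
    target, auxiliary = list(target_seeds), list(auxiliary_seeds)
    for entity in sorted(FEATURES):
        if entity in target or entity in auxiliary:
            continue
        t, a = score(entity, target_w), score(entity, aux_w)
        if t > a:
            target.append(entity)
        elif a > t:
            auxiliary.append(entity)
        # Ties are left unassigned: the entity sits on the ambiguous border.
    return target, auxiliary


target, auxiliary = co_expand(["USA", "Russia"], ["Texas"])
print(target)
print(auxiliary)
```

In this toy setup, features such as "capital of __" and "visited __" occur with both the Country seeds and the auxiliary seed, so they are discarded. Without the auxiliary set they would count toward the target set and could pull state-like entities into it, which is the kind of drift the abstract describes.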

Published in

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423

Copyright © 2020 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
