ABSTRACT
Given a small set of seed entities (e.g., “USA”, “Russia”), corpus-based set expansion aims to induce an extensive set of entities that share the same semantic class (Country in this example) from a given corpus. Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, because no negative sets are provided explicitly, these methods suffer from semantic drift: the seed set is expanded freely, without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets, closely related to the target set of the user’s interest, to serve as negative sets. It then performs multiple-set co-expansion, which extracts discriminative features by comparing the target set with the auxiliary sets and forms multiple cohesive sets that are distinct from one another, thus resolving the semantic drift issue. In this paper, we demonstrate that generating auxiliary sets guides the expansion of the target set away from the ambiguous areas along its border with the auxiliary sets, and we show that Set-CoExpan significantly outperforms strong baseline methods.
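The idea of co-expanding a target set against an auxiliary set can be illustrated with a toy sketch. This is not the Set-CoExpan implementation. The `CONTEXTS` table, the function names, and the weighting rule are all hypothetical, and the auxiliary set is supplied by hand here, whereas the framework generates it automatically. The sketch only shows why features shared with a neighbouring set should be discounted when ranking candidates.

```python
# Hypothetical toy data: each entity maps to the context patterns it occurs in.
CONTEXTS = {
    "USA": {"president of _", "capital of _", "travel to _"},
    "Russia": {"president of _", "capital of _", "travel to _"},
    "France": {"president of _", "capital of _", "travel to _"},
    "Texas": {"governor of _", "capital of _", "travel to _"},
    "Ohio": {"governor of _", "capital of _", "travel to _"},
    "Florida": {"governor of _", "capital of _", "travel to _"},
}


def feature_weights(target, auxiliary):
    """Weight each target feature by how much more often it occurs in the
    target set than in the auxiliary entities (illustrative rule only)."""
    weights = {}
    target_features = set().union(*(CONTEXTS[e] for e in target))
    for feature in target_features:
        in_target = sum(feature in CONTEXTS[e] for e in target) / len(target)
        in_aux = (
            sum(feature in CONTEXTS[e] for e in auxiliary) / len(auxiliary)
            if auxiliary
            else 0.0
        )
        weights[feature] = max(in_target - in_aux, 0.0)
    return weights


def score(entity, weights):
    """Sum the weights of the features the candidate entity exhibits."""
    return sum(w for f, w in weights.items() if f in CONTEXTS[entity])


def co_expand(seed_sets):
    """Assign each unassigned entity to the set whose discriminative
    features it matches best; entities with zero score stay unassigned."""
    sets = {name: list(seeds) for name, seeds in seed_sets.items()}
    assigned = {e for members in sets.values() for e in members}
    for entity in sorted(CONTEXTS):
        if entity in assigned:
            continue
        best_name, best_score = None, 0.0
        for name, members in sets.items():
            # All other sets act as negative (auxiliary) evidence.
            others = [e for n, m in sets.items() if n != name for e in m]
            s = score(entity, feature_weights(members, others))
            if s > best_score:
                best_name, best_score = name, s
        if best_name is not None:
            sets[best_name].append(entity)
    return sets


# Expanding the target alone: generic features pull in "Florida" (drift).
alone = co_expand({"country": ["USA", "Russia"]})

# Co-expanding with an auxiliary set keeps the two classes apart.
guided = co_expand({"country": ["USA", "Russia"], "state": ["Texas", "Ohio"]})
```

In this toy setting, expanding the Country seeds alone admits “Florida” because it shares generic contexts with the seeds. Adding the hand-picked State set cancels those shared contexts, so only the discriminating pattern decides membership.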
Index Terms
- Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion