Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

Authors:
Jiaxin Huang (University of Illinois Urbana-Champaign)
Yiqing Xie (The Hong Kong University of Science and Technology)
Yu Meng (University of Illinois Urbana-Champaign)
Jiaming Shen (University of Illinois Urbana-Champaign)
Yunyi Zhang (University of Illinois Urbana-Champaign)
Jiawei Han (University of Illinois Urbana-Champaign)

WWW '20: Proceedings of The Web Conference 2020, April 2020, Pages 2188–2198. https://doi.org/10.1145/3366423.3380284

Published: 20 April 2020

ABSTRACT

Given a small set of seed entities (e.g., “USA”, “Russia”), corpus-based set expansion aims to induce from a given corpus an extensive set of entities that share the same semantic class (Country in this example). Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, because no negative sets are provided explicitly, these methods suffer from semantic drift caused by expanding the seed set freely without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets as negative sets that are closely related to the target set of the user’s interest, and then performs multiple-set co-expansion, which extracts discriminative features by comparing the target set with the auxiliary sets. This forms multiple cohesive sets that are distinct from one another, thus resolving the semantic drift issue. In this paper we demonstrate that generating auxiliary sets guides the expansion of the target set away from the ambiguous areas around its border with the auxiliary sets, and we show that Set-CoExpan significantly outperforms strong baseline methods.
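To make the role of negative sets concrete, the following is a minimal toy sketch, not the Set-CoExpan implementation. It assumes hand-made 2-D vectors, a hand-specified auxiliary set (the paper generates auxiliary sets automatically), and plain cosine similarity in place of the discriminative features the abstract describes. The names `VECTORS`, `expand_freely`, `co_expand`, and the labels "Country" and "State" are all hypothetical and chosen only for illustration.

```python
import math

# Hand-made toy vectors (assumption): axis 0 ~ "country-like" contexts,
# axis 1 ~ "state-like" contexts. A real system would learn features from a corpus.
VECTORS = {
    "USA": [0.9, 0.4], "Russia": [1.0, 0.1],
    "Texas": [0.2, 1.0], "Illinois": [0.1, 1.0],
    "China": [1.0, 0.2], "Canada": [0.9, 0.3],
    "California": [0.5, 0.9], "Florida": [0.3, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_similarity(entity, members, vectors):
    """Average cosine similarity between an entity and the members of a set."""
    return sum(cosine(vectors[entity], vectors[m]) for m in members) / len(members)

def expand_freely(seeds, candidates, vectors, k):
    """Unguided expansion: take the k candidates most similar to the seeds."""
    ranked = sorted(candidates,
                    key=lambda c: mean_similarity(c, seeds, vectors),
                    reverse=True)
    return ranked[:k]

def co_expand(sets, candidates, vectors):
    """Guided expansion: each candidate joins the set it is most similar to,
    so auxiliary sets absorb entities that would otherwise drift into the target."""
    expanded = {name: list(members) for name, members in sets.items()}
    for c in candidates:
        best = max(sets, key=lambda name: mean_similarity(c, sets[name], vectors))
        expanded[best].append(c)
    return expanded

candidates = ["China", "Canada", "California", "Florida"]
free = expand_freely(["USA", "Russia"], candidates, VECTORS, k=3)
guided = co_expand(
    {"Country": ["USA", "Russia"], "State": ["Texas", "Illinois"]},
    candidates,
    VECTORS,
)
print(sorted(free))
print(guided["Country"])
print(guided["State"])
```

With these toy vectors, the unguided top-3 expansion admits "California" into the country set, whereas co-expansion against the auxiliary state set assigns it to "State". This mirrors, in a very simplified form, the semantic-drift problem and the guiding effect of auxiliary sets that the abstract describes.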


Index Terms

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
  2. Information systems applications

Index terms have been assigned to the content through auto-classification.


Published in
WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423
Editors:
Yennun Huang (Academia Sinica, Taiwan)
Irwin King (The Chinese University of Hong Kong, Hong Kong)
Tie-Yan Liu (Microsoft Research Asia, China)
Maarten van Steen (University of Twente, Netherlands)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Bootstrap Methods
Semantic Computing
Set Expansion
Web Mining
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
