DOI: 10.1145/3394486.3403145
research-article

STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths

Published: 20 August 2020

ABSTRACT

Taxonomies are important knowledge ontologies that underpin numerous applications on a daily basis, but many taxonomies used in practice suffer from the low coverage issue. We study the taxonomy expansion problem, which aims to expand existing taxonomies with new concept terms. We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion. To generate natural self-supervision signals, STEAM samples mini-paths from the existing taxonomy, and formulates a node attachment prediction task between anchor mini-paths and query terms. To solve the node attachment task, it learns feature representations for query-anchor pairs from multiple views and performs multi-view co-training for prediction. Extensive experiments show that STEAM outperforms state-of-the-art methods for taxonomy expansion by 11.6% in accuracy and 7.0% in mean reciprocal rank on three public benchmarks. The code and data for STEAM can be found at https://github.com/yueyu1030/STEAM.
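The self-supervision idea in the abstract (sample mini-paths from the existing taxonomy, then predict where a query term attaches on an anchor mini-path) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' implementation: the toy taxonomy, the function names `mini_paths` and `attachment_examples`, the fixed path length, and the way positive and negative examples are chosen are all hypothetical. The multi-view feature learning and co-training steps are not shown.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple

# Toy taxonomy as parent -> children (hypothetical data, not from the paper).
TAXONOMY: Dict[str, List[str]] = {
    "science": ["physics", "biology"],
    "physics": ["optics", "mechanics"],
    "biology": ["genetics"],
    "optics": ["laser physics"],
}

Example = Tuple[str, List[str], Optional[int]]


def mini_paths(taxonomy: Dict[str, List[str]], length: int) -> Iterator[List[str]]:
    """Enumerate every downward path that has exactly `length` nodes."""

    def extend(path: List[str]) -> Iterator[List[str]]:
        if len(path) == length:
            yield list(path)
            return
        for child in taxonomy.get(path[-1], []):
            yield from extend(path + [child])

    nodes = set(taxonomy) | {c for cs in taxonomy.values() for c in cs}
    for start in sorted(nodes):
        yield from extend([start])


def attachment_examples(
    taxonomy: Dict[str, List[str]], length: int, rng: random.Random
) -> List[Example]:
    """Build (query, anchor_path, label) triples from the taxonomy itself.

    label is the index of the anchor node that is the query's parent,
    or None when the query attaches nowhere on the path (assumed scheme).
    """
    parent = {c: p for p, cs in taxonomy.items() for c in cs}
    all_nodes = sorted(set(taxonomy) | set(parent))
    examples: List[Example] = []
    for path in mini_paths(taxonomy, length):
        on_path = set(path)
        # Positives: known children of anchor nodes that lie off the path.
        for i, node in enumerate(path):
            for child in taxonomy.get(node, []):
                if child not in on_path:
                    examples.append((child, path, i))
        # One negative: a term that is off the path and whose parent is too.
        negatives = [
            n for n in all_nodes
            if n not in on_path and parent.get(n) not in on_path
        ]
        if negatives:
            examples.append((rng.choice(negatives), path, None))
    return examples


if __name__ == "__main__":
    for query, anchor, label in attachment_examples(TAXONOMY, 3, random.Random(0)):
        print(f"{query!r} -> {anchor} at position {label}")
```

Each triple is a free training example: the label comes from edges already present in the taxonomy, so no manual annotation is needed. A classifier over query-anchor pairs would then be trained on such triples.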

    Published in

      KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      August 2020
      3664 pages
      ISBN: 9781450379984
      DOI: 10.1145/3394486

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
