research-article

Neural Networks for Entity Matching: A Survey

Published: 21 April 2021

Abstract

Entity matching is the problem of identifying which records refer to the same real-world entity. It has been actively researched for decades, and a variety of approaches have been developed. Even today, it remains a challenging problem with considerable room for improvement. In recent years, new methods have emerged that build on deep learning techniques for natural language processing.

In this survey, we present how neural networks have been used for entity matching. Specifically, we identify which steps of the entity matching process existing work has targeted using neural networks, and provide an overview of the different techniques used at each step. We also discuss the contributions of deep learning to entity matching compared to traditional methods, and propose a taxonomy of deep neural networks for entity matching.


        • Published in

          ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 3
          June 2021
          533 pages
          ISSN: 1556-4681
          EISSN: 1556-472X
          DOI: 10.1145/3454120

          Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 21 April 2021
          • Accepted: 1 December 2020
          • Revised: 1 September 2020
          • Received: 1 March 2020

          Qualifiers

          • research-article
          • Refereed
