Neural Networks for Entity Matching: A Survey

Authors:
Nils Barlaug, Cognite and NTNU, Trondheim, Norway (ORCID: 0000-0003-4618-9702)
Jon Atle Gulla, NTNU

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 3, Article No. 52, pp. 1–37. https://doi.org/10.1145/3442200

Published: 21 April 2021

Abstract

Entity matching is the problem of identifying which records refer to the same real-world entity. It has been actively researched for decades, and a variety of approaches have been developed. Even today, it remains a challenging problem with considerable room for improvement. In recent years, new methods based on deep learning techniques for natural language processing have emerged.

In this survey, we present how neural networks have been used for entity matching. Specifically, we identify which steps of the entity matching process existing work has targeted using neural networks, and provide an overview of the techniques used at each step. We also discuss the contributions of deep learning to entity matching compared to traditional methods, and propose a taxonomy of deep neural networks for entity matching.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Data management systems
    1. Information integration
      1. Entity resolution

Published in

ACM Transactions on Knowledge Discovery from Data Volume 15, Issue 3
June 2021
533 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3454120

Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 April 2021
- Accepted: 1 December 2020
- Revised: 1 September 2020
- Received: 1 March 2020
Author Tags
Deep learning
entity matching
entity resolution
record linkage
data matching
Qualifiers
- research-article
- Refereed