Abstract
Entity matching is the problem of identifying which records refer to the same real-world entity. It has been actively researched for decades, and a variety of approaches have been developed. Even today, it remains a challenging problem with considerable room for improvement. In recent years, new methods based on deep learning techniques for natural language processing have emerged.
In this survey, we present how neural networks have been used for entity matching. Specifically, we identify which steps of the entity matching process existing work has targeted using neural networks, and provide an overview of the different techniques used at each step. We also discuss the contributions of deep learning to entity matching compared to traditional methods, and propose a taxonomy of deep neural networks for entity matching.
Index Terms
- Neural Networks for Entity Matching: A Survey