Abstract
Measuring the similarity of addresses quickly has become an urgent need in many fields, including financial anti-fraud. Traditional string-based similarity methods do not handle this task well. Taking the hierarchical nature of addresses into account, we construct a framework for calculating the similarity of Chinese addresses. First, each address string is split and its parts are annotated with the proper level by an LM-LSTM-CRF model; sub-string-level similarities are then calculated. Finally, the similarity scores are combined by BP neural networks. The framework has achieved good results in practice on financial anti-fraud tasks.
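The per-level comparison step described above can be illustrated with a minimal Python sketch. It assumes the address has already been split and tagged, so the LM-LSTM-CRF tagger is not shown. The level names in `LEVELS`, the length-normalized edit-distance similarity, and the `combine` function with hand-picked weights are all illustrative assumptions rather than the authors' implementation; in particular, `combine` is only a stand-in for the trained BP neural network.

```python
from typing import Dict, List

# Hypothetical level names; the abstract does not list the actual tag set.
LEVELS: List[str] = ["province", "city", "district", "road", "building"]


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def level_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; two empty strings are treated as identical
    (an assumption made for this sketch)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_vector(x: Dict[str, str], y: Dict[str, str]) -> List[float]:
    """One similarity score per level, in the order given by LEVELS."""
    return [level_similarity(x.get(lv, ""), y.get(lv, "")) for lv in LEVELS]


def combine(scores: List[float], weights: List[float]) -> float:
    """Weighted mean of the per-level scores.
    Placeholder for the trained BP network, not the paper's model."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


if __name__ == "__main__":
    # Pre-tagged example addresses (made up for illustration).
    addr_a = {"province": "北京市", "city": "北京市", "district": "海淀区",
              "road": "中关村大街", "building": "27号"}
    addr_b = {"province": "北京市", "city": "北京市", "district": "海淀区",
              "road": "中关村南大街", "building": "27号"}
    scores = similarity_vector(addr_a, addr_b)
    print(scores)
    print(combine(scores, [1.0] * len(LEVELS)))
```

Comparing level by level keeps a difference in the road name from being diluted by the long shared province and city prefix, which is the motivation for exploiting the address hierarchy instead of scoring the whole string at once.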
This work is supported by a joint project of Beijing Normal University and Credit Harmony Research, and in part by the National Natural Science Foundation of China under grant 71701018. Jing Liu, Jianbin Wang and Changqing Zhang contributed equally to this work.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, J. et al. (2019). Chinese Address Similarity Calculation Based on Auto Geological Level Tagging. In: Lu, H., Tang, H., Wang, Z. (eds) Advances in Neural Networks – ISNN 2019. ISNN 2019. Lecture Notes in Computer Science, vol 11555. Springer, Cham. https://doi.org/10.1007/978-3-030-22808-8_42
DOI: https://doi.org/10.1007/978-3-030-22808-8_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22807-1
Online ISBN: 978-3-030-22808-8
eBook Packages: Computer Science, Computer Science (R0)