
Named Entity Recognition by Using XLNet-BiLSTM-CRF


Abstract

Named entity recognition (NER) is the basis for many natural language processing (NLP) tasks, such as information extraction and question answering, and its accuracy directly affects the results of those downstream tasks. Most current methods are implemented with neural networks; however, word vectors learned from a small dataset cannot describe unusual, previously unseen entities accurately, so the results are not sufficiently precise. Recently, the pre-trained model XLNet has yielded satisfactory results on many NLP tasks, but integrating XLNet embeddings into existing NLP pipelines is not straightforward. In this paper, a new neural network model is proposed to improve the effectiveness of NER by combining pre-trained XLNet, a bidirectional long short-term memory network (BiLSTM), and a conditional random field (CRF). The pre-trained XLNet model is used to extract sentence features, which are then fed into the classic BiLSTM-CRF NER architecture. In addition, the superiority of XLNet for NER tasks is demonstrated. We evaluate our model on the CoNLL-2003 English and WNUT-2017 datasets and show that XLNet-BiLSTM-CRF obtains state-of-the-art results.
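To make the pipeline concrete, the following is a minimal sketch of such an architecture in PyTorch, assuming the Hugging Face transformers XLNetModel and the third-party pytorch-crf package. It illustrates the general XLNet-BiLSTM-CRF design rather than the authors' exact implementation (their code is available in the repository linked below).

```python
# Minimal sketch of an XLNet-BiLSTM-CRF tagger, not the authors' exact code.
# Requires: pip install torch transformers pytorch-crf
import torch.nn as nn
from transformers import XLNetModel
from torchcrf import CRF

class XLNetBiLSTMCRF(nn.Module):
    def __init__(self, num_tags: int, lstm_hidden: int = 256,
                 pretrained: str = "xlnet-base-cased"):
        super().__init__()
        self.xlnet = XLNetModel.from_pretrained(pretrained)
        d_model = self.xlnet.config.d_model  # 768 for xlnet-base-cased
        # BiLSTM re-encodes the contextual XLNet features.
        self.bilstm = nn.LSTM(d_model, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Per-token emission scores over the tag set, consumed by the CRF.
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        # Contextual token features from the pre-trained XLNet encoder.
        feats = self.xlnet(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        feats, _ = self.bilstm(feats)
        scores = self.emissions(feats)
        # Note: pytorch-crf expects right-padded batches (the mask must
        # start with 1s), whereas XLNet tokenizers pad on the left by default.
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence per sentence.
        return self.crf.decode(scores, mask=mask)
```

The CRF layer is what turns per-token classification into sequence labeling: it scores entire tag sequences via learned transition scores, so implausible transitions (e.g. I-PER immediately after B-ORG under a BIO scheme) are penalized jointly rather than corrected afterwards.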


Availability of Data and Material

Two public datasets are used in the experiments: the CoNLL-2003 English dataset and the WNUT-2017 dataset.

Code Availability Statement

The code and datasets can be found at https://github.com/dadadaray/xlnet-bilstm-crf.

Notes

  1. AE is the abbreviation of autoencoder, a kind of neural network used in semi-supervised and unsupervised learning.

  2. AR is the abbreviation of autoregressive. A language model that predicts the next word from the preceding context is called an AR language model (see the factorization sketch after these notes).

  3. GPT is the abbreviation of Generative Pre-Training, a typical autoregressive language model.

  4. Available at https://github.com/zihangdai/xlnet.
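As a side note on footnote 2, the left-to-right factorization an AR language model uses can be written out explicitly; XLNet's contribution is to maximize this likelihood in expectation over permutations of the factorization order rather than only left-to-right:

```latex
% Left-to-right autoregressive factorization of a sequence x = (x_1, ..., x_T):
% each token is predicted from all tokens that precede it.
p(x) = \prod_{t=1}^{T} p\bigl(x_t \mid x_1, \ldots, x_{t-1}\bigr)
```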


Acknowledgements

This research is supported by the National Natural Science Foundation of China under Grants No. 61672102, No. 61073034, No. 61370064, and No. 60940032; the National Social Science Foundation of China under Grant No. BCA150050; the Program for New Century Excellent Talents in University of the Ministry of Education of China under Grant No. NCET-10-0239; and the Open Project Sponsorship of the Beijing Key Laboratory of Intelligent Communication Software and Multimedia under Grant No. ITSM201493. This work is also financially supported by the Beijing Advanced Innovation Center for Materials Genome Engineering (Nos. 21-2019-02629 and 21-2020-02902).

Author information

Corresponding author

Correspondence to Depeng Dang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xue Jiang: Co-first author.


About this article


Cite this article

Yan, R., Jiang, X. & Dang, D. Named Entity Recognition by Using XLNet-BiLSTM-CRF. Neural Process Lett 53, 3339–3356 (2021). https://doi.org/10.1007/s11063-021-10547-1

