Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging

Zhou, Houquan; Li, Yang; Li, Zhenghua; Zhang, Min

arXiv:2203.10315 (cs)

[Submitted on 19 Mar 2022]

Abstract: In recent years, large-scale pre-trained language models (PLMs) have made extraordinary progress in most NLP tasks. In unsupervised POS tagging, however, few works utilize PLMs, and those that do fail to achieve state-of-the-art (SOTA) performance. The most recent SOTA results were obtained by a Gaussian HMM variant proposed by He et al. (2018). Because an HMM is a generative model, it makes very strong independence assumptions, which makes it very difficult to incorporate contextualized word representations from PLMs. In this work, we propose, for the first time, a neural conditional random field autoencoder (CRF-AE) model for unsupervised POS tagging. The discriminative encoder of CRF-AE can straightforwardly incorporate ELMo word representations. Moreover, inspired by feature-rich HMMs, we reintroduce hand-crafted features into the decoder of CRF-AE. Finally, experiments clearly show that our model outperforms previous state-of-the-art models by a large margin on the Penn Treebank and the multilingual Universal Dependencies treebank v2.0.
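
The encoder/decoder split described above can be illustrated with a minimal Python sketch of a generic CRF autoencoder objective for one sentence. This is not the authors' implementation. It assumes a linear-chain CRF encoder whose per-position tag scores (`enc_emit`) would, in the paper's setting, come from a neural scorer over ELMo representations but are simply hard-coded numbers here. It also assumes a decoder that reconstructs each word from its tag with a log-linear model over hand-crafted features. The feature templates in `features`, the toy vocabulary, and all function names are hypothetical illustrations. The quantity computed is log sum_y p(y | x) * prod_i p(x_i | y_i).

```python
import math
from typing import Dict, List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def features(word: str) -> List[str]:
    """Hypothetical hand-crafted feature templates (identity, suffix, word shape)."""
    feats = ["word=" + word.lower(), "suffix2=" + word[-2:].lower()]
    if word[:1].isupper():
        feats.append("capitalized")
    if any(ch.isdigit() for ch in word):
        feats.append("has_digit")
    if "-" in word:
        feats.append("has_hyphen")
    return feats


def decoder_log_probs(vocab: Sequence[str],
                      weights: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """log p(word | tag) from a per-tag log-linear model normalized over `vocab`.

    weights[k] maps feature name -> weight for tag k (missing features weigh 0).
    Returns table[k][word].
    """
    table = []
    for w_k in weights:
        scores = {v: sum(w_k.get(f, 0.0) for f in features(v)) for v in vocab}
        log_z = logsumexp(list(scores.values()))
        table.append({v: s - log_z for v, s in scores.items()})
    return table


def forward(emit: List[List[float]], trans: List[List[float]]) -> float:
    """Log-sum over all tag sequences of a linear-chain score.

    emit[i][k]: score of tag k at position i; trans[j][k]: score of tag j -> tag k.
    """
    alpha = list(emit[0])
    for i in range(1, len(emit)):
        # The comprehension reads the previous alpha before it is reassigned.
        alpha = [logsumexp([alpha[j] + trans[j][k] for j in range(len(alpha))]) + emit[i][k]
                 for k in range(len(emit[i]))]
    return logsumexp(alpha)


def crf_ae_log_likelihood(words: Sequence[str], enc_emit: List[List[float]],
                          trans: List[List[float]],
                          dec_table: List[Dict[str, float]]) -> float:
    """log sum_y p(y | x) * prod_i p(x_i | y_i); every word must be in the decoder vocab."""
    num_tags = len(trans)
    # Fold the decoder's log p(x_i | y_i) into the encoder's per-position scores.
    joint_emit = [[enc_emit[i][k] + dec_table[k][w] for k in range(num_tags)]
                  for i, w in enumerate(words)]
    # Numerator (encoder + reconstruction) minus the CRF log partition log Z(x).
    return forward(joint_emit, trans) - forward(enc_emit, trans)


if __name__ == "__main__":
    words = ["The", "dog", "runs"]
    vocab = ["The", "dog", "runs", "cat", "3-D"]  # toy vocabulary containing every word
    enc_emit = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]  # stand-in for ELMo-based scores, 2 tags
    trans = [[0.2, -0.1], [0.0, 0.4]]
    weights = [{"capitalized": 1.0, "suffix2=he": 0.5},
               {"suffix2=ns": 1.2, "word=dog": 0.7}]
    dec = decoder_log_probs(vocab, weights)
    print(crf_ae_log_likelihood(words, enc_emit, trans, dec))
```

Because the decoder scores each word only through its own tag, its log-probabilities can be added to the encoder's per-position scores, so one forward algorithm yields both the numerator and the partition function Z(x). Training would maximize this quantity summed over an unlabeled corpus. The paper's actual features, normalization, and training details may differ from this sketch.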

Comments:	Accepted to Findings of ACL 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2203.10315 [cs.CL]
	(or arXiv:2203.10315v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2203.10315

Submission history

From: Houquan Zhou
[v1] Sat, 19 Mar 2022 12:33:38 UTC (45 KB)