ISCA Archive Interspeech 2021

WavBERT: Exploiting Semantic and Non-Semantic Speech Using Wav2vec and BERT for Dementia Detection

Youxiang Zhu, Abdelrahman Obyat, Xiaohui Liang, John A. Batsis, Robert M. Roth

In this paper, we exploit semantic and non-semantic information from patients' speech data using Wav2vec and Bidirectional Encoder Representations from Transformers (BERT) for dementia detection. We first propose a basic WavBERT model that extracts semantic information from speech data using Wav2vec and analyzes that information using BERT for dementia detection. Because the basic model discards non-semantic information, we propose extended WavBERT models that convert the output of Wav2vec into the input of BERT so that non-semantic information is preserved for dementia detection. Specifically, we determine the locations and lengths of inter-word pauses using the number of blank tokens from Wav2vec, where the threshold for setting the pauses is automatically generated via BERT. We further design a pre-trained embedding conversion network that converts the output embedding of Wav2vec to the input embedding of BERT, enabling the fine-tuning of WavBERT with non-semantic information. Our evaluation on the ADReSSo dataset showed that the WavBERT models achieved the highest accuracy of 83.1% in the classification task, the lowest Root-Mean-Square Error (RMSE) score of 4.44 in the regression task, and a mean F1 of 70.91% in the progression task. These results confirm the effectiveness of WavBERT models that exploit both semantic and non-semantic speech information.
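
The pause-detection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes frame-level CTC tokens with `_` as the blank symbol and `|` as the word delimiter, and it uses fixed, hypothetical thresholds (`short_pause`, `long_pause`) where the paper generates the threshold automatically via BERT. The function name `transcribe_with_pauses` and the choice of `,` and `.` as pause marks are likewise assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"      # assumed CTC blank symbol
WORD_SEP = "|"   # assumed word-delimiter symbol


def transcribe_with_pauses(frames, short_pause=3, long_pause=6):
    """Collapse frame-level CTC tokens into words and mark inter-word pauses.

    Blank frames between the last character of one word and the first
    character of the next are counted. The count is mapped to "," (short
    pause) or "." (long pause) using hypothetical fixed thresholds.
    """
    words = []       # finished words and pause marks, in order
    current = []     # characters of the word being built
    gap_blanks = 0   # blank frames seen since the last character

    # groupby merges consecutive repeats, which is the CTC collapse step.
    for token, run in groupby(frames):
        run_length = len(list(run))
        if token == BLANK:
            gap_blanks += run_length
        elif token == WORD_SEP:
            if current:
                words.append("".join(current))
                current = []
        else:
            if not current and words:
                # First character of a new word: classify the gap before it.
                if gap_blanks >= long_pause:
                    words.append(".")
                elif gap_blanks >= short_pause:
                    words.append(",")
            current.append(token)
            gap_blanks = 0

    if current:
        words.append("".join(current))
    return " ".join(words)


# Made-up frame sequence: 4 blank frames after "hi", 7 blank frames after "to".
frames = (
    ["h", "h", "_", "i", "_", "|"]
    + ["_"] * 3
    + ["t", "o", "|"]
    + ["_"] * 7
    + ["y", "o", "u"]
)
print(transcribe_with_pauses(frames))
```

The resulting text, with pause marks placed between words, is the kind of input a text model such as BERT could consume. How the pause marks are chosen and how the threshold is derived in WavBERT is not specified in the abstract, so those parts of the sketch are placeholders.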


doi: 10.21437/Interspeech.2021-332

Cite as: Zhu, Y., Obyat, A., Liang, X., Batsis, J.A., Roth, R.M. (2021) WavBERT: Exploiting Semantic and Non-Semantic Speech Using Wav2vec and BERT for Dementia Detection. Proc. Interspeech 2021, 3790-3794, doi: 10.21437/Interspeech.2021-332

@inproceedings{zhu21e_interspeech,
  author={Youxiang Zhu and Abdelrahman Obyat and Xiaohui Liang and John A. Batsis and Robert M. Roth},
  title={{WavBERT: Exploiting Semantic and Non-Semantic Speech Using Wav2vec and BERT for Dementia Detection}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3790--3794},
  doi={10.21437/Interspeech.2021-332}
}