Subjective Performance Evaluation of Single-channel Speaker-conditioned Target Speaker Extraction Algorithms for Complex Acoustic Scenes

Conference: Speech Communication - 15th ITG Conference
20.09.2023-22.09.2023 in Aachen

doi:10.30420/456164019

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Sinha, Ragini; Scherer, Ann-Christin; Rollwage, Christian; Rennies, Jan (Fraunhofer Institute for Digital Media Technology IDMT, Oldenburg Branch for Hearing, Speech and Audio Technology HSA, Germany)
Doclo, Simon (Fraunhofer Institute for Digital Media Technology IDMT, Oldenburg Branch for Hearing, Speech and Audio Technology HSA, Germany & Dept. of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, Germany)

Abstract:
This study investigates the performance of speaker-conditioned target speaker extraction algorithms. While previous studies mostly focused on instrumental measures, this paper evaluates two algorithms using three subjective performance measurement methods: paired comparison, speech intelligibility measurement, and categorically scaled listening effort. The subjective evaluations with 15 normal-hearing subjects for different mixtures show a clear benefit of the time-domain-based algorithm over both the magnitude-based algorithm and the unprocessed mixtures: it is clearly preferred in direct comparisons and yields significantly lower listening effort and better intelligibility. The time-domain-based algorithm also improves speech reception thresholds (SRTs) compared to the unprocessed mixtures, even though the unprocessed reference SRTs were already very low. In contrast, the magnitude-based algorithm shows no improvement over the unprocessed mixtures in any evaluation method.