Evaluating the predictions of objective intelligibility metrics for modified and synthetic speech

doi:10.1016/j.csl.2015.06.002

Computer Speech & Language

Volume 35, January 2016, Pages 73-92

https://doi.org/10.1016/j.csl.2015.06.002

Highlights

• Algorithmically modified speech is used to assess objective intelligibility metrics.
• Reduced predictive power of the metrics for the given speech is demonstrated.
• Metrics show two opposite predictive patterns in fluctuating and stationary maskers.
• The glimpse proportion metric is extended.

Abstract

Several modification algorithms that alter natural or synthetic speech with the goal of improving intelligibility in noise have been proposed recently. A key requirement of many modification techniques is the ability to predict intelligibility, both offline during algorithm development, and online, in order to determine the optimal modification for the current noise context. While existing objective intelligibility metrics (OIMs) have good predictive power for unmodified natural speech in stationary and fluctuating noise, little is known about their effectiveness for other forms of speech. The current study evaluated how well seven OIMs predict listener responses in three large datasets of modified and synthetic speech which together represent 396 combinations of speech modification, masker type and signal-to-noise ratio. The chief finding is a clear reduction in predictive power for most OIMs when faced with modified and synthetic speech. Modifications introducing durational changes are particularly harmful to intelligibility predictors. OIMs that measure masked audibility tend to over-estimate intelligibility in the presence of fluctuating maskers relative to stationary maskers, while OIMs that estimate the distortion caused by the masker to a clean speech prototype exhibit the reverse pattern.

Introduction

Spoken language applications using recorded natural¹ or synthetic speech can be made more robust through algorithmic speech modification. Unlike traditional speech enhancement techniques (e.g., Hu and Loizou, 2004, Martin, 2005, Chen et al., 2006, Srinivasan et al., 2007) which focus on the noise-corrupted speech signal, the speech modification approach (e.g., Sauert and Vary, 2006, Bonardo and Zovato, 2007, Yoo et al., 2007, Brouckxon et al., 2008, Tang and Cooke, 2010) alters the clean speech signal prior to output or transmission. A recent evaluation (Cooke et al., 2013b) demonstrated that speech modification can result in intelligibility gains in noise equivalent to increases of more than 5 dB in output level.

A key ingredient in the design of effective modification strategies is the estimation of listener performance at frequent intervals during the development cycle. However, while subjective intelligibility scores remain the ultimate reference, continuous behavioural testing during algorithm design is usually infeasible. An alternative is to use objective intelligibility metrics (OIMs) to predict listener scores. OIMs not only avoid the need for extensive subjective testing, but can also be used at the core of the algorithm optimisation process. A number of speech modification algorithms (e.g., Sauert and Vary, 2010a, Tang and Cooke, 2011, Taal et al., 2013, Valentini-Botinhao et al., 2014) have been developed and optimised based on maximising intelligibility predictions made by OIMs such as the Speech Intelligibility Index (SII; ANSI, 1997) or the glimpse proportion metric (GP; Cooke, 2006).

OIMs have been motivated by two distinct approaches to account for the effect of noise on speech. In addition to the aforementioned SII and GP metrics, the Articulation Index (AI; French and Steinberg, 1947, Fletcher and Galt, 1950, Kryter, 1962a, Kryter, 1962b), and the extended Speech Intelligibility Index (ESII; Rhebergen and Versfeld, 2005) focus on quantifying the masked audibility of speech in the presence of noise. On the other hand, techniques such as the Normalised-Covariance Measure (NCM; Holube and Kollmeier, 1996, Ma et al., 2009), the Christiansen–Pedersen–Dau metric (henceforth referred to as CPD for brevity; Christiansen et al., 2010) and the Short-Time Objective Intelligibility metric (STOI; Taal et al., 2010) correlate representations of the clean reference speech and the speech-plus-noise signal in an attempt to measure the distortion caused by the masker. Another distortion-based approach is the Coherence Speech Intelligibility Index (CSII) proposed by Kates and Arehart (2005). The CSII measures the similarity between clean and noisy speech using magnitude-square coherence (Carter et al., 1973, Kates, 1992) which quantifies the degree to which the output of a system is linearly related to its input.

Both audibility- and distortion-based approaches target spectro-temporal regions least affected by the noise, but differ in their assumptions. While techniques based on audibility require separated estimates of speech and noise in order to estimate masking, distortion-based OIMs assume that human listeners possess a template of the clean speech which is compared to the incoming noisy speech.
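The difference in required inputs can be made concrete with a toy sketch. It is not an implementation of any of the seven metrics: the function names, the 3 dB local-SNR threshold, the example levels and the use of a plain Pearson correlation on envelopes are all assumptions chosen for illustration. The audibility-style score needs speech and noise levels separately, while the distortion-style score needs the clean reference and the speech-plus-noise mixture.

```python
import math


def audibility_score(speech_db, noise_db, threshold_db=3.0):
    """Fraction of time-frequency cells where speech exceeds noise by more
    than threshold_db. Needs speech and noise levels as separate inputs."""
    margins = [
        s - n
        for speech_row, noise_row in zip(speech_db, noise_db)
        for s, n in zip(speech_row, noise_row)
    ]
    if not margins:
        raise ValueError("at least one time-frequency cell is required")
    return sum(1 for m in margins if m > threshold_db) / len(margins)


def distortion_score(clean_env, noisy_env):
    """Pearson correlation between a clean reference envelope and the
    speech-plus-noise envelope. Needs the clean signal as a template."""
    n = len(clean_env)
    if n != len(noisy_env) or n < 2:
        raise ValueError("envelopes must have equal length of at least 2")
    mean_c = sum(clean_env) / n
    mean_y = sum(noisy_env) / n
    cov = sum((c - mean_c) * (y - mean_y) for c, y in zip(clean_env, noisy_env))
    var_c = sum((c - mean_c) ** 2 for c in clean_env)
    var_y = sum((y - mean_y) ** 2 for y in noisy_env)
    if var_c == 0 or var_y == 0:
        return 0.0  # a flat envelope carries no correlation information
    return cov / math.sqrt(var_c * var_y)


# Hypothetical levels in dB: 2 frequency bands x 3 time frames.
speech_levels = [[60.0, 50.0, 40.0], [55.0, 65.0, 45.0]]
noise_levels = [[50.0, 50.0, 50.0], [50.0, 50.0, 50.0]]
audible_fraction = audibility_score(speech_levels, noise_levels)

# Hypothetical single-band envelopes: clean speech and the noisy mixture.
clean_envelope = [0.1, 0.8, 0.3, 0.9]
mixture_envelope = [0.4, 0.9, 0.5, 1.0]
similarity = distortion_score(clean_envelope, mixture_envelope)
```

In this sketch the audibility-style score never sees the mixture and the distortion-style score never sees the noise on its own, which mirrors the differing assumptions of the two families.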

When an OIM is employed as the objective function to be maximised, the predictive accuracy of the OIM is critical in determining the validity and effectiveness of the optimisation process. Most of the OIMs mentioned above have been evaluated with recorded natural speech or speech processed by noise reduction techniques. Relatively few studies have investigated their predictive power for modified natural speech or synthetic speech in noise: most OIMs were originally proposed to predict the intelligibility of distorted natural speech, for distortions caused by additive noise together with artefacts introduced by suppression algorithms applied to the noisy speech signal.

Predicting the intelligibility impact of modification algorithms is likely to be challenging since the most successful methods (in terms of improving masked intelligibility) modify the signal in diverse domains – durational and spectral/formant – and possibly through non-linear operations. While the alterations benefit intelligibility, they may also introduce artefacts to the speech signal, leading to degraded speech quality. Nevertheless, the relation between speech intelligibility and quality is complex, and factors such as listening effort and loudness interact. Intelligibility and quality are not simply negatively or positively correlated, especially across listeners (Preminger and Tasell, 1995). For synthetic speech it might be expected that the OIMs’ task is even more challenging because the natural speech reference signal is not available, i.e., distortions introduced by the text-to-speech (TTS) system cannot be taken into account. Consequently, predicting the intelligibility of poor quality synthetic speech may be even more difficult.

In two initial studies, which concerned solely the ability of OIMs to predict the masked intelligibility of modified and synthetic speech regardless of the perceptual speech quality, we observed a large reduction in the predictive accuracy of several OIMs on modified and synthetic speech relative to unmodified speech (Tang and Cooke, 2011, Valentini-Botinhao et al., 2011). The current study extends these pilots to a larger range of objective intelligibility metrics and includes behavioural data from recent extensive evaluations of 30 forms of modified and synthetic speech (Cooke et al., 2013a, Cooke et al., 2013b). Specifically, we evaluate the performance of one standard (SII) and six recent objective intelligibility metrics (ESII, GP, NCM, CSII, CPD, STOI) in predicting subjective intelligibility scores for both modified and synthetic speech in additive noise. The evaluation makes use of three datasets which together contain 396 combinations of speech modification, masker type and signal-to-noise ratio (SNR). The seven metrics are introduced in Section 2 while Section 3 describes the evaluation datasets. The outcome of a comparison of model predictions against behavioural data from large-scale listening tests is presented in Section 4.

Speech Intelligibility Index (SII)

SII and AI share a common underlying idea: speech intelligibility depends on the audibility of the signal in each frequency band. The AI can be expressed as a function of the masking level, represented by the SNR ( ${SNR}_{f}^{AI}$ ) in each frequency channel, as

$AI = \sum_{f = 1}^{F} W_{f} \cdot {SNR}_{f}^{AI}, \qquad \sum_{f = 1}^{F} W_{f} = 1$

where $W_{f}$ denotes the band importance function (BIF) in channel $f$ and ${SNR}_{f}^{AI}$ is a value in the interval [0, 1] based on a piecewise-linear transformation of the actual SNR level ${SNR}_{f}$ in band $f$: the SNR is clipped to the range [−15, 15] dB and then shifted and scaled,

${SNR}_{f}^{AI} = \frac{\min (15, \max (- 15, {SNR}_{f})) + 15}{30}$
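As a minimal sketch of the AI computation, the following assumes that the clipped band SNR is shifted and scaled as (SNR + 15) / 30 so that it lies in [0, 1], as the text requires. The function names and the four-band example with uniform weights are hypothetical; real band importance functions come from the relevant standard, not from this example.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB to [0, 1]: clip to [-15, 15], then shift and scale.
    The (x + 15) / 30 mapping is an assumption consistent with the [0, 1] range."""
    clipped = min(15.0, max(-15.0, snr_db))
    return (clipped + 15.0) / 30.0


def articulation_index(band_snrs_db, band_importance):
    """Weighted sum of per-band audibility; the weights must sum to 1."""
    if len(band_snrs_db) != len(band_importance):
        raise ValueError("one weight per band is required")
    if abs(sum(band_importance) - 1.0) > 1e-9:
        raise ValueError("band importance weights must sum to 1")
    return sum(w * band_audibility(s) for w, s in zip(band_importance, band_snrs_db))


# Hypothetical four-band example with uniform weights (not a standard BIF).
example_snrs = [-20.0, 0.0, 7.5, 30.0]
example_weights = [0.25, 0.25, 0.25, 0.25]
example_ai = articulation_index(example_snrs, example_weights)
```

Bands at or below −15 dB contribute nothing and bands at or above +15 dB contribute their full weight, so only the range in between changes the index.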

Datasets

The OIMs described above were evaluated based on listeners’ responses to speech from three datasets (Table 1). One – natural – consists of unmodified and modified natural speech. A second dataset, tts, contains speech generated by an HMM-based synthesiser. The third dataset, hurricane, is made up of both natural and synthetic speech. Further details of the listening tests are provided in the articles mentioned in Table 1.

Objective intelligibility predictions

All OIMs were evaluated by inspecting both the Pearson correlation coefficient ρ between mean listener scores and the raw output of the metric, and the standard deviation of the error σ_e, computed as

$\sigma_{e} = \sigma_{d} \cdot \sqrt{1 - \rho^{2}}$

where σ_d is the standard deviation of subjective intelligibility scores for a given experimental condition. Statistical comparisons among dependent correlations were conducted using a method described in Meng et al. (1992) based on Chi-squared tests on z-transformed scores.
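The two evaluation quantities can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and data are hypothetical, and the sample standard deviation is assumed for σ_d because the text does not say which estimator was used. The comparison of dependent correlations is not covered.

```python
import math
import statistics


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def error_standard_deviation(listener_scores, rho):
    """sigma_e = sigma_d * sqrt(1 - rho^2)."""
    # Assumption: sigma_d is the sample standard deviation of the scores.
    sigma_d = statistics.stdev(listener_scores)
    # Guard against tiny negative values caused by rounding when |rho| is near 1.
    return sigma_d * math.sqrt(max(0.0, 1.0 - rho * rho))


# Hypothetical data: mean listener scores (%) and raw metric outputs per condition.
listener_scores = [20.0, 60.0, 40.0, 80.0]
metric_outputs = [0.1, 0.2, 0.3, 0.4]
rho = pearson_correlation(metric_outputs, listener_scores)
sigma_e = error_standard_deviation(listener_scores, rho)
```

A higher ρ and a lower σ_e both indicate better agreement between the metric and listeners; σ_e expresses the residual spread in the same units as the listener scores.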

Discussion

Compared to model-listener correlations reported in the literature for unmodified natural speech or speech processed by noise reduction techniques, the current study highlights a clear reduction in the performance of a representative range of OIMs for modified and synthetic speech. One contributing factor for most OIMs is their inability to predict intelligibility across different maskers, especially for stationary versus highly fluctuating maskers. Additionally, many OIMs were adversely

Conclusions

In the current study state-of-the-art OIMs that provide good predictions of natural speech performed less well for modified and synthetic speech, especially for those modifications introducing temporal changes. While many OIMs produced reasonable estimates for modified speech in the presence of single masker types, across-noise predictions were generally poor. Methods motivated by masked audibility tended to over-estimate intelligibility for fluctuating maskers and under-estimate

Acknowledgements

This study was supported by the LISTA Project (http://listening-talker.org), funded by the Future and Emerging Technologies programme within the 7th Framework Programme for Research of the European Commission, FET-Open Grant Number 256230. We thank Yannis Stylianou for sharing a MATLAB implementation of ESII, and Cees Taal for making the MATLAB implementation of STOI available online for free access. The implementation of SII is available online at http://www.sii.to while MATLAB implementations

References (87)

C. Christiansen et al., Prediction of speech intelligibility based on an auditory preprocessing model, Speech Commun. (2010)
M. Cooke et al., Evaluating the intelligibility benefit of speech modifications in known noise conditions, Speech Commun. (2013)
M.P. Cooke et al., Robust automatic speech recognition with missing and unreliable acoustic data, Speech Commun. (2001)
A. Gomez et al., Improving objective intelligibility prediction by combining correlation and coherence based methods with a measure based on the negative distortion ratio, Speech Commun. (2012)
J. Ma et al., SNR loss: a new objective measure for predicting the intelligibility of noise-suppressed speech, Speech Commun. (2011)
C. Valentini-Botinhao et al., Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion, Comput. Speech Lang. (2014)
H. Zen et al., Statistical parametric speech synthesis, Speech Commun. (2009)
ANSI S3.5, Methods for the Calculation of the Speech Intelligibility Index (1997)
V. Aubanel et al., Information-preserving temporal reallocation of speech in the presence of fluctuating maskers
T. Barnwell, Correlation analysis of subjective and objective measures for speech quality
D. Bonardo et al., Speech synthesis enhancement in noisy environments
A.R. Bradlow et al., Speaking clearly for learning-impaired children: sentence perception in noise, J. Speech Hear. Res. (2003)
H. Brouckxon et al., An overview of the VUB entry for the 2012 Hurricane Challenge
H. Brouckxon et al., Time and frequency dependent amplification for speech intelligibility enhancement in noisy environments
G.C. Carter et al., Estimation of the magnitude-squared coherence function via overlapped fast Fourier transform processing, IEEE Trans. Audio Electroacoust. (1973)
J. Chen et al., New insights into the noise reduction Wiener filter, IEEE Trans. Audio Speech Lang. Process. (2006)
L.-H. Chen et al., DNN-based stochastic postfilter for HMM-based speech synthesis
M. Cooke, Modelling Auditory Processing and Organisation (1993)
M. Cooke, A glimpsing model of speech perception in noise, J. Acoust. Soc. Am. (2006)
M. Cooke et al., An audio–visual corpus for speech perception and automatic speech recognition, J. Acoust. Soc. Am. (2006)
M. Cooke et al., Intelligibility-enhancing speech modifications: the Hurricane Challenge
T. Dau et al., A quantitative model of the “effective” signal processing in the auditory system. I. Model structure, J. Acoust. Soc. Am. (1996)
D. Erro et al., Implementation of simple spectral techniques to enhance the intelligibility of speech using a harmonic model
D. Erro et al., Statistical synthesizer with embedded prosodic and spectral modifications to generate highly intelligible speech in noise
H. Fletcher et al., The perception of speech and its relation to telephony, J. Acoust. Soc. Am. (1950)
N.R. French et al., Factors governing the intelligibility of speech sounds, J. Acoust. Soc. Am. (1947)
E. Godoy et al., Increasing speech intelligibility via spectral shaping with frequency warping and dynamic range compression plus transient enhancement
N. Hodoshima et al., Improving syllable identification by a preprocessing method reducing overlap-masking in reverberant environments, J. Acoust. Soc. Am. (2006)
I. Holube et al., Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model, J. Acoust. Soc. Am. (1996)
Y. Hu et al., Evaluation of objective quality measures for speech enhancement, IEEE Trans. Audio Speech Lang. Process. (2008)
Y. Hu et al., Speech enhancement based on wavelet thresholding the multitaper spectrum
Y. Hu et al., Evaluation of objective measures for speech enhancement
ISO 389-7, Acoustics – Reference Zero for the Calibration of Audiometric Equipment – Part 7: Reference Threshold of Hearing Under Free-field and Diffuse-field Listening Conditions (2006)
J. Kates et al., Coherence and the speech intelligibility index, J. Acoust. Soc. Am. (2005)
J.M. Kates, On using coherence to measure distortion in hearing aids, J. Acoust. Soc. Am. (1992)
S. King et al., The Blizzard Challenge 2010 (2010, September)
K. Kokkinakis et al., Evaluation of objective measures for quality assessment of reverberant speech
J.C. Krause et al., Acoustic properties of naturally produced clear speech at normal speaking rates, J. Acoust. Soc. Am. (2004)
K.D. Kryter, Methods for the calculation and use of the Articulation Index, J. Acoust. Soc. Am. (1962)
K.D. Kryter, Validation of the articulation index, J. Acoust. Soc. Am. (1962)
R. Kubichek et al., Advances in objective voice quality assessment
P.C. Loizou, Speech Enhancement: Theory and Practice (2013)
J. Ma et al., Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions, J. Acoust. Soc. Am. (2009)

^☆: This paper has been recommended for acceptance by Roger K. Moore.
