Evaluation of deep-learning speech recognizer in reverberation
Authors
Welch, Marcus
Issue Date
2025-08
Type
Electronic thesis
Thesis
Language
en_US
Keywords
Architecture
Alternative Title
Abstract
In the last decade, automatic speech recognition (ASR) systems have improved dramatically, even achieving human parity or superhuman recognition accuracy in some cases.
This is due in part to the widespread adoption of deep-learning techniques. As this technology
continues to better approximate the human auditory system, it becomes more valuable
as a stand-in for human test subjects. In this study, OpenAI’s Whisper, an industry-leading
ASR system with a deep-learning architecture, is assessed in reverberation in an effort to map
predictive speech intelligibility measures onto measured intelligibility, at a scale too large for
listening tests with human subjects. Various measured and synthetic room impulse responses
with different levels of early and late reverberant energy are used to test the ASR
system. These impulse responses are assessed with various established speech intelligibility
prediction measures used in architectural acoustics, including the popular
speech transmission index (STI) and clarity (C50). The word error rates (WERs) produced
by the ASR system give an objective measurement of recognizer accuracy, and allow a step
toward the unification of these predictive metrics onto a much more understandable common
scale. For the measured, binaural room impulse responses, WERs were shown to correlate
very strongly with reverberation time (r_left = 0.9979, r_right = 0.998) and STI (r_left = −0.9783,
r_right = −0.9669). The correlation with clarity (C50) was weaker, but still strong
(r_left = −0.7841, r_right = −0.7509). In the synthetic room impulse response test, WERs
correlated less strongly with reverberation time (r = 0.7524), but very strongly with clarity
and STI (r = −0.9896 and r = −0.9545, respectively).
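The two quantities the abstract relies on, word error rate and the correlation coefficient r, can be illustrated with a minimal sketch. This is not the thesis code. It assumes WER is the word-level edit distance divided by the number of reference words and that r is the Pearson coefficient. The function names word_error_rate and pearson_r are hypothetical, and no text normalization (case, punctuation) is applied.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, one WER value would be computed per room impulse response (comparing the recognizer's transcript of the reverberated speech against the reference text), and pearson_r would then be applied to the list of WERs paired with the corresponding reverberation time, STI, or C50 values.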
Description
August 2025
School of Architecture
Full Citation
Publisher
Rensselaer Polytechnic Institute, Troy, NY