Quantifying the degradation of automatic speech recognition for reverberant environments
Author
Secules, Stephen D.
Other Contributors
Braasch, Jonas; Calamia, Paul T.; Xiang, Ning
Date Issued
2008-08
Subject
Acoustics
Degree
MS
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author. Attribution-NonCommercial-NoDerivs 3.0 United States
Abstract
In general, automatic speech recognition (ASR) algorithms are designed for use with clean, anechoic signals. Although ASR systems approach human intelligibility for clean signals, it is widely acknowledged that their recognition accuracy degrades dramatically for reverberant signals. Little is known about the specific character of this degradation, its primary causes (e.g. room geometry, reverberation strength), or its effects (e.g. blurring of syllables, plosives, consonants). This study aims to quantify the degradation of speech recognition accuracy for reverberant signals using a black-box experiment that varies reverberation characteristics and observes the resulting recognition accuracy. The methodology tests two speech recognition platforms on a standard recognition task for human speech perception that uses small vocabulary sets of similar-sounding words. A range of reverberant settings was simulated by convolving the speech with an impulse response. The artificial reverberant settings were generated with an image source model of small rooms with simple geometries, varying the average room absorption. The impulse response was also divided into various energy balances of early reflections and late energy to determine the contribution of each component to the degradation. The recognizers had the lowest recognition accuracy under reverberation for words that differed only in their final consonants. The degradation caused by early reflections alone was smaller than the overall room effect; however, the overall degradation with respect to the absorption coefficient was well predicted by the strength of the reverberant tail. The results were compared with those of prior research.
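The early/late manipulation described above can be illustrated with a minimal sketch. It is not the thesis code: the toy impulse response (a direct sound plus exponentially decaying noise) stands in for the image source model, the 50 ms early/late boundary is an assumed value, and all function names are hypothetical.

```python
import math
import random


def synthetic_impulse_response(fs, rt60_s, length_s, seed=0):
    """Toy room response: unit direct sound plus exponentially decaying noise.

    Assumption: this replaces the image source model, which is not reproduced here.
    """
    rng = random.Random(seed)
    n = int(fs * length_s)
    # Amplitude falls by 60 dB (a factor of 1000) over rt60_s seconds.
    decay = math.log(1000.0) / (rt60_s * fs)
    h = [0.3 * rng.uniform(-1.0, 1.0) * math.exp(-decay * i) for i in range(n)]
    h[0] = 1.0  # direct sound
    return h


def split_early_late(h, fs, boundary_ms=50.0):
    """Split h into zero-padded early and late parts so that early + late == h."""
    k = min(len(h), int(fs * boundary_ms / 1000.0))
    early = h[:k] + [0.0] * (len(h) - k)
    late = [0.0] * k + h[k:]
    return early, late


def rebalance(h, fs, early_gain, late_gain, boundary_ms=50.0):
    """Scale the early and late parts independently and recombine them."""
    early, late = split_early_late(h, fs, boundary_ms)
    return [early_gain * e + late_gain * l for e, l in zip(early, late)]


def early_to_late_db(h, fs, boundary_ms=50.0):
    """Early-to-late energy ratio in dB (the clarity index C50 at 50 ms)."""
    early, late = split_early_late(h, fs, boundary_ms)
    e_early = sum(v * v for v in early)
    e_late = sum(v * v for v in late)
    if e_late == 0.0:
        return math.inf  # no late energy at all
    return 10.0 * math.log10(e_early / e_late)


def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Usage: reverberate a short test tone with a tail attenuated by half.
fs = 4000
h = synthetic_impulse_response(fs, rt60_s=0.4, length_s=0.3)
h_weak_tail = rebalance(h, fs, early_gain=1.0, late_gain=0.5)
dry = [math.sin(2.0 * math.pi * 440.0 * i / fs) for i in range(100)]
wet = convolve(dry, h_weak_tail)
```

In an experiment of this kind, each rebalanced response would be convolved with the recorded test words before they are passed to the recognizer, so that accuracy can be plotted against the early-to-late energy ratio.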
The reported results will help to characterize the problem for the automatic speech recognition community, and serve as a model for further precise investigation of the effects of room acoustics on developed algorithms.
Description
August 2008; School of Architecture
Department
School of Architecture
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
CC BY-NC-ND. Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.