    Quantifying the degradation of automatic speech recognition for reverberant environments

    Author
    Secules, Stephen D.
    View/Open
    11447_Abstract_official.pdf (60.00Kb)
    11448_Thesis_062808.pdf (2.451Mb)
    Other Contributors
    Braasch, Jonas; Calamia, Paul T.; Xiang, Ning
    Date Issued
    2008-08
    Subject
    Acoustics
    Degree
    MS
    Terms of Use
    This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
    URI
    https://hdl.handle.net/20.500.13015/558
    Abstract
    In general, automatic speech recognition (ASR) algorithms are designed for use with clean, anechoic signals. Although ASR systems approach human recognition performance on clean signals, it is widely acknowledged that a major weakness of speech recognition systems is that their accuracy degrades dramatically for reverberant signals. Little is known about the specific character of this degradation, its primary causes (e.g., room geometry, reverberation strength), or its effects (e.g., blurring of syllables, plosives, and consonants). This study aims to precisely quantify the degradation of speech recognition accuracy for reverberant signals, using a black-box experiment that varies the reverberation characteristics and observes the resulting recognition accuracy. Two speech recognition platforms were tested on a standard recognition task from human speech perception research, using small vocabulary sets of similar-sounding words. A range of reverberant conditions was simulated by convolving the speech signals with room impulse responses. These impulse responses were generated with an image-source model of small rooms with simple geometries, varying the average room absorption. Each impulse response was also divided into early reflections and late energy at various energy balances, to determine how much each component contributes to the loss of recognition accuracy. Under reverberation, the recognizers were least accurate for words that differed only in their final consonants. Early reflections alone degraded recognition accuracy less than the full room response did; however, the overall degradation as a function of the absorption coefficient was well predicted by the strength of the reverberant tail. The results were compared with those of prior research.
The reported results will help characterize the problem for the automatic speech recognition community and can serve as a model for further precise investigation of how room acoustics affect existing recognition algorithms.
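The abstract describes three signal-processing steps: generating room impulse responses with an image-source model while varying absorption, splitting each response into early and late energy, and convolving speech with the result. The sketch below illustrates those steps in Python with NumPy. It is not the author's code, and it rests on assumptions that are not taken from the thesis: a rectangular (shoebox) room, a single frequency-independent absorption coefficient alpha on all six walls converted to a pressure reflection coefficient of sqrt(1 - alpha) in the style of Allen and Berkley's image method, image delays rounded to whole samples, a speed of sound of 343 m/s, and a 50 ms early/late boundary (the conventional C50 split for speech). All function names are hypothetical.

```python
import math

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def _axis_images(s, length, length_s):
    """Image coordinates and reflection counts along one room axis."""
    # Enough image orders that every omitted image arrives after length_s.
    max_order = int(np.ceil(SPEED_OF_SOUND * length_s / (2 * length))) + 1
    n = np.arange(-max_order, max_order + 1)
    # Unmirrored images sit at s + 2nL, mirrored images at -s + 2nL.
    pos = np.concatenate([s + 2 * n * length, -s + 2 * n * length])
    refl = np.concatenate([2 * np.abs(n), np.abs(n - 1) + np.abs(n)])
    return pos, refl


def image_source_rir(room, src, mic, alpha, fs=16000, length_s=0.5):
    """Shoebox-room impulse response from a simplified image-source model.

    Assumes one frequency-independent absorption coefficient `alpha` on all
    six walls and rounds every image delay to the nearest sample.
    """
    beta = math.sqrt(1.0 - alpha)  # pressure reflection coefficient per wall
    (xs, rx), (ys, ry), (zs, rz) = [
        _axis_images(s, L, length_s) for s, L in zip(src, room)
    ]
    # Broadcast the three axes to cover every combination of image indices.
    dist = np.sqrt(
        (xs - mic[0])[:, None, None] ** 2
        + (ys - mic[1])[None, :, None] ** 2
        + (zs - mic[2])[None, None, :] ** 2
    )
    n_refl = rx[:, None, None] + ry[None, :, None] + rz[None, None, :]
    delay = np.rint(dist / SPEED_OF_SOUND * fs).astype(int)
    amp = beta ** n_refl / (4 * np.pi * dist)  # wall losses and 1/r spreading
    n_samples = int(length_s * fs)
    keep = delay < n_samples
    rir = np.zeros(n_samples)
    np.add.at(rir, delay[keep], amp[keep])  # images sharing a sample add up
    return rir


def split_early_late(rir, fs, boundary_ms=50.0):
    """Split an RIR into early and late parts at direct sound + boundary_ms."""
    onset = int(np.flatnonzero(rir)[0])  # the direct path arrives first
    split = onset + int(round(boundary_ms * 1e-3 * fs))
    early, late = rir.copy(), rir.copy()
    early[split:] = 0.0
    late[:split] = 0.0
    return early, late


def clarity_db(early, late):
    """Early-to-late energy ratio in dB (C50 for a 50 ms boundary)."""
    return 10.0 * np.log10(np.sum(early**2) / np.sum(late**2))


def reverberate(dry, rir):
    """Simulate a reverberant signal by convolving dry speech with the RIR."""
    return np.convolve(dry, rir)


if __name__ == "__main__":
    fs = 16000
    for alpha in (0.1, 0.3, 0.6):
        rir = image_source_rir((5.0, 4.0, 3.0), (1.0, 1.5, 1.2),
                               (3.5, 2.5, 1.6), alpha, fs)
        early, late = split_early_late(rir, fs)
        print(f"alpha={alpha}: C50 = {clarity_db(early, late):.1f} dB")
```

Raising alpha should raise the early-to-late ratio, since late images have undergone more wall reflections and lose energy faster. Varying the energy balance between the two components could be sketched by re-summing `early + g * late` for a chosen gain `g` before convolution. For long speech recordings, `scipy.signal.fftconvolve` is a faster alternative to `np.convolve`.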
    Description
    August 2008; School of Architecture
    Department
    School of Architecture
    Publisher
    Rensselaer Polytechnic Institute, Troy, NY
    Relationships
    Rensselaer Theses and Dissertations Online Collection
    Access
    Users may download and share copies with attribution in accordance with a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. No commercial use or derivatives are permitted without the explicit approval of the author.
    Collections
    • RPI Theses Online (Complete)
    • RPI Theses Open Access
