Immersive soundscape reconstruction using contextualized visual recognition with deep neural network

Authors
Huang, Mincong
Other Contributors
Braasch, Jonas
Xiang, Ning
Krueger, Ted (Theodore Edward), 1954-
Issue Date
2020-08
Keywords
Architectural sciences
Degree
MS
Terms of Use
This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
Abstract
The use of visual environments to generate corresponding acoustic environments has been of interest in audiovisual fusion research. The scope of existing work is currently limited to user-centered virtual environments with high computational demands. In this work, an immersive soundscape rendering system is developed using machine-learning-based visual recognition techniques. The system utilizes a hand-crafted panoramic image dataset whose contents are identified using pre-trained neural network models for semantic segmentation and object detection. The recognition process extracts spatial information about sound-generating elements in visual environments, which is used to position and orient virtual sound sources and to locate corresponding content in pre-assembled audio datasets consisting of both synthetic sounds and pre-recorded audio. This process facilitates a plausible audiovisual rendering schema that can be presented both in binaural format and in the Collaborative-Research Augmented Immersive Virtual Environment Laboratory (CRAIVE-Lab) at Rensselaer Polytechnic Institute. This work aims to situate and enhance audiovisual fusion in a human-scale, immersive context.
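The abstract's detection-to-spatialization step can be illustrated with a minimal sketch; this is not the thesis's implementation, and the detector choice (torchvision's pre-trained Faster R-CNN), the class-to-sound mapping, the confidence threshold, the file name `panorama.jpg`, and the equirectangular projection assumption are all illustrative.

```python
# Hedged sketch: locate sound-generating elements in an equirectangular
# panorama with a pre-trained object detector, then map each detection's
# pixel position to an azimuth/elevation pair for a virtual sound source.
# All specifics (model, classes, threshold) are assumptions, not the
# author's method.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Hypothetical subset of COCO classes treated as sound-generating.
SOUNDING_CLASSES = {1: "person", 3: "car", 18: "dog"}

# Pre-trained detector (torchvision >= 0.13; older versions use
# pretrained=True instead of weights="DEFAULT").
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

panorama = Image.open("panorama.jpg").convert("RGB")  # equirectangular image
width, height = panorama.size

with torch.no_grad():
    pred = model([to_tensor(panorama)])[0]

sources = []
for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score < 0.7 or label.item() not in SOUNDING_CLASSES:
        continue
    # Bounding-box center in pixel coordinates.
    cx = (box[0] + box[2]).item() / 2.0
    cy = (box[1] + box[3]).item() / 2.0
    # Equirectangular projection: x spans 360 deg azimuth, y spans 180 deg
    # elevation, so pixel position maps linearly to direction.
    azimuth = (cx / width) * 360.0 - 180.0
    elevation = 90.0 - (cy / height) * 180.0
    sources.append((SOUNDING_CLASSES[label.item()], azimuth, elevation))

# Each (label, azimuth, elevation) triple could then drive a spatial
# audio renderer, e.g. by selecting a matching clip from an audio dataset.
for name, az, el in sources:
    print(f"{name}: azimuth {az:+.1f} deg, elevation {el:+.1f} deg")
```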
Description
August 2020
School of Architecture
Department
School of Architecture
Publisher
Rensselaer Polytechnic Institute, Troy, NY
Relationships
Rensselaer Theses and Dissertations Online Collection
Access
Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.