Immersive soundscape reconstruction using contextualized visual recognition with deep neural network

Authors
Huang, Mincong
Issue Date
2020-08
Type
Electronic thesis
Language
ENG
Keywords
Architectural sciences
Abstract
The use of visual environments to generate corresponding acoustic environments has been of interest in audiovisual fusion research. The scope of such work, however, is currently limited to user-centered virtual environments with high computational demands. In this work, an immersive soundscape rendering system is developed using machine-learning-based visual recognition techniques. The system utilizes a hand-crafted panoramic image dataset whose contents are identified using pre-trained neural network models for semantic segmentation and object detection. The recognition process extracts the spatial information of sound-generating elements in the visual environment, which is used to position and orient virtual sound sources and to locate corresponding content in pre-assembled audio datasets consisting of both synthetic sounds and pre-recorded audio. This process facilitates a plausible audiovisual rendering schema that can be presented both in binaural format and at the Collaborative-Research Augmented Immersive Virtual Environment Laboratory (CRAIVE-Lab) at Rensselaer Polytechnic Institute. This work intends to situate and enhance audiovisual fusion in a human-scale, immersive context.
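
As a rough illustration of the recognition-to-spatialization step the abstract describes, the sketch below segments an equirectangular panorama with a pre-trained model and converts the pixel centroid of one detected sound-generating class into an azimuth/elevation pair for a virtual sound source. The model choice (torchvision's DeepLabV3), the class index, and the equirectangular mapping are illustrative assumptions, not the thesis's actual pipeline.

    # Minimal sketch: panoramic semantic segmentation -> virtual source direction.
    # Assumptions (not from the thesis): torchvision's DeepLabV3 pre-trained on
    # COCO/VOC labels, class 7 ("car") as the sound-generating element, and a
    # standard equirectangular pixel-to-angle mapping.
    import numpy as np
    import torch
    from PIL import Image
    from torchvision import transforms
    from torchvision.models.segmentation import deeplabv3_resnet101

    def source_direction(panorama_path, target_class=7):
        """Return (azimuth, elevation) in degrees for the centroid of one
        target class in an equirectangular panorama, or None if absent."""
        model = deeplabv3_resnet101(weights="DEFAULT").eval()
        img = Image.open(panorama_path).convert("RGB")
        x = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])(img).unsqueeze(0)
        with torch.no_grad():
            labels = model(x)["out"].argmax(1)[0].numpy()  # (H, W) class map
        ys, xs = np.nonzero(labels == target_class)
        if len(xs) == 0:
            return None  # element not present in this scene
        h, w = labels.shape
        # Equirectangular mapping: column -> azimuth, row -> elevation.
        azimuth = (xs.mean() / w) * 360.0 - 180.0
        elevation = 90.0 - (ys.mean() / h) * 180.0
        return azimuth, elevation

The returned direction could then drive a binaural renderer or a loudspeaker panner such as the one in the CRAIVE-Lab; that downstream step is omitted here.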
Description
August 2020
School of Architecture
Publisher
Rensselaer Polytechnic Institute, Troy, NY