Simultaneous Execution of Dereverberation, Denoising, and Speaker Separation Using a Neural Beamformer for Adapting Robots to Real Environments

Daichi Nagano, Kazuo Nakazawa

研究成果: Article査読

抄録

It remains challenging for robots to accurately per-form sound source localization and speech recognition in a real environment with reverberation, noise, and the voices of multiple speakers. Accordingly, we propose “U-TasNet-Beam,” a speech extraction method for extracting only the target speaker’s voice from all ambient sounds in a real environment. U-TasNet-Beam is a neural beamformer comprising three ele-ments: a neural network for removing reverberation and noise, a second neural network for separating the voices of multiple speakers, and a minimum variance distortionless response (MVDR) beamformer. Experiments with simulated data and recorded data show that the proposed U-TasNet-Beam can improve the accuracy of sound source localization and speech recognition in robots compared to the conventional methods in a noisy, reverberant, and multi-speaker environ-ment. In addition, we propose the spatial correlation matrix loss (SCM loss) as a loss function for the neural network learning the spatial information of the sound. By using the SCM loss, we can improve the speech extraction performance of the neural beamformer.

本文言語English
ページ(範囲)1399-1410
ページ数12
ジャーナルJournal of Robotics and Mechatronics
34
6
DOI
出版ステータスPublished - 2022 12月

ASJC Scopus subject areas

  • コンピュータ サイエンス(全般)
  • 電子工学および電気工学

フィンガープリント

「Simultaneous Execution of Dereverberation, Denoising, and Speaker Separation Using a Neural Beamformer for Adapting Robots to Real Environments」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル