TY - GEN
T1 - Joint analysis of acoustic events and scenes based on multitask learning
AU - Tonami, Noriyuki
AU - Imoto, Keisuke
AU - Niitsuma, Masahiro
AU - Yamanishi, Ryosuke
AU - Yamashita, Yoichi
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP19K20304.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Acoustic event detection and scene classification are major research tasks in environmental sound analysis, and many methods based on neural networks have been proposed. Conventional methods have addressed these tasks separately; however, acoustic events and scenes are closely related to each other. For example, in the acoustic scene "office," the acoustic events "mouse clicking" and "keyboard typing" are likely to occur. In this paper, we propose multitask learning for the joint analysis of acoustic events and scenes, which shares the parts of the networks that hold information on acoustic events and scenes in common. By integrating the two networks, we also expect information on acoustic scenes to improve the performance of acoustic event detection. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves the performance of acoustic event detection by 10.66 percentage points in terms of the F-score, compared with a conventional method based on a convolutional recurrent neural network.
AB - Acoustic event detection and scene classification are major research tasks in environmental sound analysis, and many methods based on neural networks have been proposed. Conventional methods have addressed these tasks separately; however, acoustic events and scenes are closely related to each other. For example, in the acoustic scene "office," the acoustic events "mouse clicking" and "keyboard typing" are likely to occur. In this paper, we propose multitask learning for the joint analysis of acoustic events and scenes, which shares the parts of the networks that hold information on acoustic events and scenes in common. By integrating the two networks, we also expect information on acoustic scenes to improve the performance of acoustic event detection. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves the performance of acoustic event detection by 10.66 percentage points in terms of the F-score, compared with a conventional method based on a convolutional recurrent neural network.
KW - Acoustic event detection
KW - acoustic scene classification
KW - convolutional recurrent neural network
KW - multitask learning
UR - http://www.scopus.com/inward/record.url?scp=85078021593&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078021593&partnerID=8YFLogxK
U2 - 10.1109/WASPAA.2019.8937196
DO - 10.1109/WASPAA.2019.8937196
M3 - Conference contribution
AN - SCOPUS:85078021593
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
SP - 338
EP - 342
BT - 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
Y2 - 20 October 2019 through 23 October 2019
ER -