TY - GEN
T1 - LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding
T2 - 22nd ACM International Conference on Multimodal Interaction, ICMI 2020
AU - Wang, Yanan
AU - Wu, Jianming
AU - Huang, Jinfa
AU - Hattori, Gen
AU - Takishima, Yasuhiro
AU - Wada, Shinya
AU - Kimura, Rui
AU - Chen, Jie
AU - Kurihara, Satoshi
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/10/21
Y1 - 2020/10/21
N2 - Group cohesiveness reflects the level of intimacy that people feel with each other, and a dialogue robot that can understand group cohesiveness would help promote human communication. However, group cohesiveness is a complex concept that is difficult to predict from image pixels alone. Inspired by the fact that humans intuitively associate the linguistic knowledge accumulated in their brains with the visual images they see, we propose a linguistic knowledge injectable deep neural network (LDNN) that builds a visual model (visual LDNN) for predicting group cohesiveness, one that can automatically associate the linguistic knowledge hidden behind images. LDNN consists of a visual encoder and a language encoder, and applies domain adaptation and linguistic knowledge transition mechanisms to transfer linguistic knowledge from a language model to the visual LDNN. We train LDNN by adding descriptions to the training and validation sets of the Group AFfect Dataset 3.0 (GAF 3.0), and test the visual LDNN without any descriptions. Comparing the visual LDNN with various fine-tuned DNN models and three state-of-the-art models on the test set, the results demonstrate that the visual LDNN not only improves the performance of the fine-tuned DNN model, achieving an MSE very close to that of the state-of-the-art model, but is also a practical and efficient method that requires relatively little preprocessing. Furthermore, ablation studies confirm that LDNN is an effective way to inject linguistic knowledge into visual models.
AB - Group cohesiveness reflects the level of intimacy that people feel with each other, and a dialogue robot that can understand group cohesiveness would help promote human communication. However, group cohesiveness is a complex concept that is difficult to predict from image pixels alone. Inspired by the fact that humans intuitively associate the linguistic knowledge accumulated in their brains with the visual images they see, we propose a linguistic knowledge injectable deep neural network (LDNN) that builds a visual model (visual LDNN) for predicting group cohesiveness, one that can automatically associate the linguistic knowledge hidden behind images. LDNN consists of a visual encoder and a language encoder, and applies domain adaptation and linguistic knowledge transition mechanisms to transfer linguistic knowledge from a language model to the visual LDNN. We train LDNN by adding descriptions to the training and validation sets of the Group AFfect Dataset 3.0 (GAF 3.0), and test the visual LDNN without any descriptions. Comparing the visual LDNN with various fine-tuned DNN models and three state-of-the-art models on the test set, the results demonstrate that the visual LDNN not only improves the performance of the fine-tuned DNN model, achieving an MSE very close to that of the state-of-the-art model, but is also a practical and efficient method that requires relatively little preprocessing. Furthermore, ablation studies confirm that LDNN is an effective way to inject linguistic knowledge into visual models.
KW - affective computing
KW - human interaction
KW - machine learning for multimodal interaction
KW - multimodal fusion and representation
UR - http://www.scopus.com/inward/record.url?scp=85096668417&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096668417&partnerID=8YFLogxK
U2 - 10.1145/3382507.3418830
DO - 10.1145/3382507.3418830
M3 - Conference contribution
AN - SCOPUS:85096668417
T3 - ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction
SP - 343
EP - 350
BT - ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction
PB - Association for Computing Machinery, Inc
Y2 - 25 October 2020 through 29 October 2020
ER -