LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding

Yanan Wang, Jianming Wu, Jinfa Huang, Gen Hattori, Yasuhiro Takishima, Shinya Wada, Rui Kimura, Jie Chen, Satoshi Kurihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Group cohesiveness reflects the level of intimacy that people feel with each other, and a dialogue robot that can understand group cohesiveness would help promote human communication. However, group cohesiveness is a complex concept that is difficult to predict from image pixels alone. Inspired by the fact that humans intuitively associate linguistic knowledge accumulated in the brain with the visual scenes they see, we propose a linguistic knowledge injectable deep neural network (LDNN) that builds a visual model (visual LDNN) for predicting group cohesiveness, one that automatically associates images with the linguistic knowledge hidden behind them. LDNN consists of a visual encoder and a language encoder, and applies domain adaptation and a linguistic knowledge transition mechanism to transfer linguistic knowledge from a language model to the visual LDNN. We train LDNN by adding textual descriptions to the training and validation sets of the Group AFfect Dataset 3.0 (GAF 3.0), and test the visual LDNN without any descriptions. Comparing the visual LDNN with various fine-tuned DNN models and three state-of-the-art models on the test set, the results demonstrate that the visual LDNN not only improves on the fine-tuned DNN models, achieving an MSE very close to that of the state-of-the-art models, but is also practical and efficient, requiring relatively little preprocessing. Furthermore, ablation studies confirm that LDNN is an effective method for injecting linguistic knowledge into visual models.
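To make the training scheme described in the abstract concrete, below is a minimal PyTorch sketch of the idea, not the authors' implementation: the layer shapes, the loss weighting, and the upstream feature extractors (a CNN backbone for images, a sentence encoder such as BERT for the descriptions) are all assumptions for illustration. The key point it shows is that a transfer term aligns visual features with linguistic features during training, so that at test time the visual branch can be used alone, without any description.

```python
import torch
import torch.nn as nn

class LDNN(nn.Module):
    """Sketch: two encoders mapped into a shared space with one regression head."""
    def __init__(self, img_dim=2048, txt_dim=768, shared_dim=512):
        super().__init__()
        # assumes pre-extracted image features and description embeddings
        self.visual_enc = nn.Sequential(nn.Linear(img_dim, shared_dim), nn.ReLU())
        self.language_enc = nn.Sequential(nn.Linear(txt_dim, shared_dim), nn.ReLU())
        self.head = nn.Linear(shared_dim, 1)  # cohesiveness score regressor

    def forward(self, img_feat, txt_feat=None):
        z_v = self.visual_enc(img_feat)
        pred_v = self.head(z_v).squeeze(-1)
        if txt_feat is None:               # test time: visual branch only
            return pred_v
        z_l = self.language_enc(txt_feat)
        pred_l = self.head(z_l).squeeze(-1)
        return pred_v, pred_l, z_v, z_l

def training_loss(model, img_feat, txt_feat, label, alpha=0.5):
    pred_v, pred_l, z_v, z_l = model(img_feat, txt_feat)
    # both branches regress the cohesiveness label ...
    task = nn.functional.mse_loss(pred_v, label) + nn.functional.mse_loss(pred_l, label)
    # ... while the transfer term pulls visual features toward linguistic ones
    transfer = nn.functional.mse_loss(z_v, z_l)
    return task + alpha * transfer

# toy usage with random tensors; GAF 3.0 cohesion labels span levels 0-3
model = LDNN()
img, txt, y = torch.randn(4, 2048), torch.randn(4, 768), torch.rand(4) * 3
training_loss(model, img, txt, y).backward()   # train with descriptions
score = model(torch.randn(1, 2048))            # infer without descriptions
```

In this sketch the weighting `alpha` between the regression and transfer losses is a hypothetical hyperparameter; the paper's actual domain adaptation and knowledge transition mechanisms are more involved than a single alignment term.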

Original language: English
Title of host publication: ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery, Inc
Pages: 343-350
Number of pages: 8
ISBN (Electronic): 9781450375818
DOIs
Publication status: Published - 2020 Oct 21
Event: 22nd ACM International Conference on Multimodal Interaction, ICMI 2020 - Virtual, Online, Netherlands
Duration: 2020 Oct 25 - 2020 Oct 29

Publication series

Name: ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction

Conference

Conference: 22nd ACM International Conference on Multimodal Interaction, ICMI 2020
Country/Territory: Netherlands
City: Virtual, Online
Period: 20/10/25 - 20/10/29

Keywords

  • affective computing
  • human interaction
  • machine learning for multimodal interaction
  • multimodal fusion and representation

ASJC Scopus subject areas

  • Hardware and Architecture
  • Human-Computer Interaction
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
