VAE-Based Adversarial Multimodal Domain Transfer for Video-Level Sentiment Analysis

Yanan Wang, Jianming Wu, Kazuaki Furumai, Shinya Wada, Satoshi Kurihara

Research output: Contribution to journal, Article (peer-reviewed)

3 Citations (Scopus)


Video-level sentiment analysis is a challenging task that requires systems to obtain discriminative multimodal representations capable of capturing differences in sentiment across modalities. However, because different modalities follow diverse distributions and unified multimodal labels are not always suitable for unimodal learning, the distance between unimodal representations grows, which prevents systems from learning discriminative multimodal representations. In this paper, to obtain more discriminative multimodal representations that further improve system performance, we propose VAE-based adversarial multimodal domain transfer (VAE-AMDT) and train it jointly with a multi-attention module to reduce the distance between unimodal representations. We first use a variational autoencoder (VAE) to make the visual, linguistic, and acoustic representations follow a common distribution, and then introduce adversarial training to transfer all unimodal representations into a joint embedding space. Finally, we fuse the modalities in this joint embedding space via the multi-attention module, which consists of self-attention, cross-attention, and triple-attention, to highlight important sentiment representations over time and across modalities. Our method improves the state-of-the-art F1-score by 3.6% on the MOSI dataset and by 2.9% on the MOSEI dataset, demonstrating its efficacy in obtaining discriminative multimodal representations for video-level sentiment analysis.
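The abstract names two ingredients of VAE-AMDT: a VAE that pulls each unimodal representation toward a common distribution, and adversarial training that makes the modalities indistinguishable in a joint embedding space. The following is a minimal sketch of the loss terms such a scheme typically combines, written with the Python standard library only. It assumes a standard normal prior N(0, I) as the common distribution, a hypothetical linear modality discriminator (`discriminator_probs`), and a uniform-target confusion loss for the encoders; the abstract does not specify the prior, the network architectures, or the exact adversarial objective, so this is an illustration under those assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

# Domain labels for the adversarial discriminator (index = modality id).
MODALITIES = ("visual", "linguistic", "acoustic")


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)).

    ASSUMPTION: N(0, I) plays the role of the shared distribution; applying
    this term to every modality's encoder pulls all of them toward one prior.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_probs(z: Sequence[float],
                        weights: Sequence[Sequence[float]]) -> List[float]:
    """HYPOTHETICAL linear discriminator: one weight vector per modality,
    returning the predicted probability that z came from each modality."""
    return softmax([sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in weights])


def discriminator_loss(probs: Sequence[float], true_modality: int) -> float:
    """Cross-entropy the discriminator minimizes: recognize the source modality."""
    return -math.log(probs[true_modality])


def encoder_adversarial_loss(probs: Sequence[float]) -> float:
    """Cross-entropy against a uniform target, minimized by the encoders.

    ASSUMPTION: this confusion loss stands in for the paper's adversarial
    objective. It is smallest (log K) when the discriminator's prediction is
    uniform, i.e. when the modalities are indistinguishable in the joint space.
    """
    k = len(probs)
    return -sum(math.log(p) for p in probs) / k


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[0.5, -0.2], [-0.3, 0.4], [0.1, 0.1]]  # hypothetical discriminator
    mu, logvar = [0.3, -0.1], [-0.2, 0.1]  # hypothetical encoder output, one clip
    z = reparameterize(mu, logvar, rng)
    probs = discriminator_probs(z, weights)
    print("KL term:", round(kl_to_standard_normal(mu, logvar), 4))
    print("discriminator loss:",
          round(discriminator_loss(probs, MODALITIES.index("linguistic")), 4))
    print("encoder adversarial loss:", round(encoder_adversarial_loss(probs), 4))
```

The encoder objective targets a uniform modality prediction rather than simply negating the discriminator loss, which is unbounded; this is a common GAN-style confusion loss, chosen here for illustration. In a full system these terms would typically be added to a reconstruction loss and to the sentiment loss computed from the fused multi-attention output, and optimized with a deep learning framework rather than by hand.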

Original language: English
Pages (from-to): 51315-51324
Number of pages: 10
Journal: IEEE Access
Publication status: Published - 2022

Keywords

  • Adversarial training
  • Domain adaptation
  • Multimodal representation learning
  • Variational auto-encoder (VAE)

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
  • Electrical and Electronic Engineering
