Latent-Space Data Augmentation for Visually-Grounded Language Understanding

Aly Magassouba, Komei Sugiura, Hisashi Kawai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


This is an extended version of a selected paper from JSAI 2019. In this paper, we study data augmentation for visually grounded language understanding in the context of a picking task. A typical picking task consists of predicting a target object specified by an ambiguous instruction, e.g., "Pick up the yellow toy near the bottle." We show that existing methods for understanding such instructions can be improved by data augmentation. More specifically, MTCM [1] and MTCM-GAN [2] achieve better results when data augmentation is applied to latent-space features rather than raw features. Additionally, our results show that latent-space data augmentation improves network accuracy more than regularization methods.
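The paper's models (MTCM and MTCM-GAN) are not reproduced here; as a minimal sketch of the general idea of latent-space augmentation, one could perturb an encoder's latent feature vector with Gaussian noise to generate extra training samples, instead of transforming the raw image or text input. The function name, noise model, and parameters below are illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def augment_latent(z, n_copies=4, sigma=0.1):
    """Generate augmented samples by adding Gaussian noise to a latent
    feature vector (augmentation in latent space, not on raw inputs)."""
    z = np.asarray(z, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_copies, z.shape[-1]))
    return z + noise  # shape: (n_copies, latent_dim)

# Example: a 16-dimensional latent feature from some visual/linguistic encoder
z = rng.normal(size=16)
augmented = augment_latent(z)
print(augmented.shape)  # (4, 16)
```

Each augmented vector stays close to the original in latent space, so it plausibly corresponds to the same target object while still diversifying the training set.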

Original language: English
Title of host publication: Advances in Artificial Intelligence - Selected Papers from the Annual Conference of Japanese Society of Artificial Intelligence, JSAI 2019
Editors: Yukio Ohsawa, Katsutoshi Yada, Takayuki Ito, Yasufumi Takama, Eri Sato-Shimokawara, Akinori Abe, Junichiro Mori, Naohiro Matsumura
Number of pages: 9
ISBN (Print): 9783030398774
Publication status: Published - 2020
Externally published: Yes
Event: 33rd Annual Conference of the Japanese Society for Artificial Intelligence, JSAI 2019 - Niigata, Japan
Duration: 2019 Jun 4 – 2019 Jun 7

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 1128 AISC
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365


Conference: 33rd Annual Conference of the Japanese Society for Artificial Intelligence, JSAI 2019


Keywords

  • Domestic service robots
  • Multimodal language understanding

ASJC Scopus subject areas

  • Control and Systems Engineering
  • General Computer Science


