Can Machines and Humans Use Negation When Describing Images?

Yuri Sato, Koji Mineshima

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Can negation be depicted? It has been claimed in various areas, including philosophy, cognitive science, and AI, that depicting negation through visual expressions such as images and pictures is challenging. Recent empirical findings have shown that humans can indeed understand certain images as expressing negation, whereas this ability is not exhibited by machine learning models trained on image data. To elucidate the computational ability underlying the understanding of negation in images, this study first focuses on the image captioning task, specifically the performance of models pre-trained on large linguistic and image datasets for generating text from images. Our experiment demonstrates that a state-of-the-art model achieves some success in generating consistent captions from images, particularly for photographs rather than illustrations. However, when it comes to generating captions containing negation from images, the model is not as proficient as humans. To further investigate the performance of machine learning models in a more controlled setting, we conducted an additional analysis using a Visual Question Answering (VQA) task. This task enables us to specify where in the image the model should focus its attention when answering a question. In this setting, the model’s performance improved. These results shed light on the disparities in attentional focus between humans and machine learning models.

Original language: English
Title of host publication: Human and Artificial Rationalities - 2nd International Conference, HAR 2023, Proceedings
Editors: Jean Baratgin, Baptiste Jacquet, Hiroshi Yama
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 39-47
Number of pages: 9
ISBN (Print): 9783031552441
DOIs
Publication status: Published - 2024
Event: 2nd International Conference on Human and Artificial Rationalities, HAR 2023 - Paris, France
Duration: 2023 Sept 19 to 2023 Sept 22

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14522 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Human and Artificial Rationalities, HAR 2023
Country/Territory: France
City: Paris
Period: 23/9/19 to 23/9/22

Keywords

  • cognitive science
  • grounding
  • image captioning
  • machine learning
  • negation
  • visual question answering

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
