Multi-modal action segmentation in the kitchen with a feature fusion approach

Shunsuke Kogure, Yoshimitsu Aoki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a multi-modal action segmentation approach that uses three modalities, (i) video, (ii) audio, and (iii) thermal, to classify cooking behavior in the kitchen. We assume that all three modalities carry features relevant to cooking. However, no public dataset contains all three, so we built an original dataset with frame-level annotations. We then examined the usefulness of multi-modal features for action segmentation and analyzed the effect of each modality using three evaluation metrics. Compared with using images alone, accuracy, edit distance, and F1 score improved by up to about 1%, 2%, and 8%, respectively.
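The abstract does not state how the three modalities are fused or exactly how the metrics are computed, so the following is only a minimal sketch under stated assumptions. It assumes early fusion by concatenating per-frame feature vectors, and it uses the frame-wise accuracy and segmental edit score as they are commonly defined in action segmentation work. The function names and the label strings in the usage note are hypothetical, and the segmental F1 score is omitted for brevity.

```python
from itertools import groupby


def fuse_features(video, audio, thermal):
    """Early fusion (an assumption): concatenate per-frame feature vectors."""
    if not (len(video) == len(audio) == len(thermal)):
        raise ValueError("all modalities must cover the same number of frames")
    return [v + a + t for v, a, t in zip(video, audio, thermal)]


def frame_accuracy(pred, truth):
    """Fraction of frames whose predicted label matches the annotation."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def to_segments(labels):
    """Collapse a frame-level label sequence into its ordered segment labels."""
    return [label for label, _ in groupby(labels)]


def edit_score(pred, truth):
    """1 minus the Levenshtein distance between segment sequences, normalized."""
    p, t = to_segments(pred), to_segments(truth)
    prev = list(range(len(t) + 1))  # distances from the empty prefix of p
    for i, p_label in enumerate(p, start=1):
        curr = [i]
        for j, t_label in enumerate(t, start=1):
            cost = 0 if p_label == t_label else 1
            curr.append(min(prev[j] + 1,          # delete a predicted segment
                            curr[j - 1] + 1,      # insert a missing segment
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return 1.0 - prev[-1] / max(len(p), len(t), 1)
```

For example, with the hypothetical annotation `["cut", "cut", "cut", "stir", "stir", "pour"]` and prediction `["cut", "cut", "stir", "stir", "stir", "pour"]`, one of six frames is wrong, so the frame-wise accuracy is 5/6, while the segment order (cut, stir, pour) is identical, so the edit score is 1.0. This illustrates why the two metrics can move independently: the edit score penalizes over-segmentation and ordering errors rather than boundary offsets.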

Original language: English
Title of host publication: Fifteenth International Conference on Quality Control by Artificial Vision
Editors: Kenji Terada, Akio Nakamura, Takashi Komuro, Tsuyoshi Shimizu
Publisher: SPIE
ISBN (Electronic): 9781510644267
DOIs
Publication status: Published - 2021
Event: 15th International Conference on Quality Control by Artificial Vision - Tokushima, Virtual, Japan
Duration: 2021 May 12 → 2021 May 14

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 11794
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 15th International Conference on Quality Control by Artificial Vision
Country/Territory: Japan
City: Tokushima, Virtual
Period: 21/5/12 → 21/5/14

Keywords

  • Action Segmentation
  • Computer Vision
  • Dataset Construction
  • Machine Learning
  • Multi-modal Learning

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
