Multi-modal action segmentation in the kitchen with a feature fusion approach

Shunsuke Kogure, Yoshimitsu Aoki

Research output: Conference contribution

Abstract

In this paper, we propose a multi-modal action segmentation approach that uses three modalities, (i) video, (ii) audio, and (iii) thermal, to classify cooking behavior in the kitchen. We assume that all three modalities carry features related to cooking. However, no public dataset contains all three, so we built an original dataset with frame-level annotations. We then examined the usefulness of multi-modal features for action segmentation and analyzed the effect of each modality using three evaluation metrics. Compared with using images alone, accuracy, edit distance, and F1 score improved by up to about 1%, 2%, and 8%, respectively.
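
The abstract names its metrics without defining them. As a minimal sketch, assuming "accuracy" means frame-wise accuracy and "edit distance" means the segment-level edit score commonly reported for action segmentation, the two could be computed as below. The function names and the example labels are hypothetical and are not taken from the paper or its dataset.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def frame_accuracy(predicted, reference):
    """Fraction of frames whose predicted label matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same number of frames")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_score(predicted, reference):
    """Segment-level edit score in [0, 1]; 1 means identical segment order."""
    pred_segments = to_segments(predicted)
    ref_segments = to_segments(reference)
    longest = max(len(pred_segments), len(ref_segments))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred_segments, ref_segments) / longest


# Hypothetical labels for eight frames (illustrative only).
reference = ["cut", "cut", "cut", "stir", "stir", "boil", "boil", "boil"]
predicted = ["cut", "cut", "stir", "stir", "stir", "boil", "cut", "boil"]

accuracy = frame_accuracy(predicted, reference)
score = edit_score(predicted, reference)
```

In this toy example the single mislabeled frame inside the "boil" segment costs little frame-wise accuracy but splits one segment into three, which the edit score penalizes more heavily. That sensitivity to over-segmentation is why the two metrics are usually reported together.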

Original language: English
Title of host publication: Fifteenth International Conference on Quality Control by Artificial Vision
Editors: Kenji Terada, Akio Nakamura, Takashi Komuro, Tsuyoshi Shimizu
Publisher: SPIE
ISBN (electronic): 9781510644267
DOI
Publication status: Published - 2021
Event: 15th International Conference on Quality Control by Artificial Vision - Tokushima, Virtual, Japan
Duration: 12 May 2021 to 14 May 2021

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 11794
ISSN (print): 0277-786X
ISSN (electronic): 1996-756X

Conference

Conference: 15th International Conference on Quality Control by Artificial Vision
Country/Territory: Japan
City: Tokushima, Virtual
Period: 21/5/12 to 21/5/14

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
