TY - GEN
T1 - Multi-modal action segmentation in the kitchen with a feature fusion approach
AU - Kogure, Shunsuke
AU - Aoki, Yoshimitsu
N1 - Publisher Copyright: © 2021 SPIE.
PY - 2021
Y1 - 2021
N2 - In this paper, we propose a "Multi-modal Action Segmentation approach" that uses three modalities, (i) video, (ii) audio, and (iii) thermal, to classify cooking behavior in the kitchen. These three modalities are assumed to provide features related to cooking. However, no public dataset contains all three modalities. We therefore built an original dataset with frame-level annotations. We then examined the usefulness of action segmentation using multi-modal features and analyzed the effect of each modality using three evaluation metrics. As a result, accuracy, edit distance, and F1 score improved by up to about 1%, 2%, and 8%, respectively, compared with the case where only images were used.
AB - In this paper, we propose a "Multi-modal Action Segmentation approach" that uses three modalities, (i) video, (ii) audio, and (iii) thermal, to classify cooking behavior in the kitchen. These three modalities are assumed to provide features related to cooking. However, no public dataset contains all three modalities. We therefore built an original dataset with frame-level annotations. We then examined the usefulness of action segmentation using multi-modal features and analyzed the effect of each modality using three evaluation metrics. As a result, accuracy, edit distance, and F1 score improved by up to about 1%, 2%, and 8%, respectively, compared with the case where only images were used.
KW - Action Segmentation
KW - Computer Vision
KW - Dataset Construction
KW - Machine Learning
KW - Multi-modal Learning
UR - http://www.scopus.com/inward/record.url?scp=85112431796&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112431796&partnerID=8YFLogxK
U2 - 10.1117/12.2591752
DO - 10.1117/12.2591752
M3 - Conference contribution
AN - SCOPUS:85112431796
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Fifteenth International Conference on Quality Control by Artificial Vision
A2 - Terada, Kenji
A2 - Nakamura, Akio
A2 - Komuro, Takashi
A2 - Shimizu, Tsuyoshi
PB - SPIE
T2 - 15th International Conference on Quality Control by Artificial Vision
Y2 - 12 May 2021 through 14 May 2021
ER -