TY - GEN
T1 - Alleviating over-segmentation errors by detecting action boundaries
AU - Ishikawa, Yuchi
AU - Kasai, Seito
AU - Aoki, Yoshimitsu
AU - Kataoka, Hirokatsu
N1 - Funding Information:
Computational resource of AI Bridging Cloud Infrastructure (ABCI) provided by National Institute of Advanced Industrial Science and Technology (AIST) was used.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/1
Y1 - 2021/1
N2 - We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF). Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB). The long-term feature extractor provides shared features for the two branches with a wide temporal receptive field. The ASB classifies video frames with action classes, while the BRB regresses the action boundary probabilities. The action boundaries predicted by the BRB refine the output from the ASB, which results in a significant performance improvement. Our contributions are three-fold: (i) We propose a framework for temporal action segmentation, the ASRF, which divides temporal action segmentation into frame-wise action classification and action boundary regression. Our framework refines frame-level hypotheses of action classes using predicted action boundaries. (ii) We propose a loss function for smoothing the transition of action probabilities, and analyze combinations of various loss functions for temporal action segmentation. (iii) Our framework outperforms state-of-the-art methods on three challenging datasets, offering an improvement of up to 13.7% in terms of segmental edit distance and up to 16.1% in terms of segmental F1 score. Our code is publicly available.
AB - We propose an effective framework for the temporal action segmentation task, namely an Action Segment Refinement Framework (ASRF). Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB). The long-term feature extractor provides shared features for the two branches with a wide temporal receptive field. The ASB classifies video frames with action classes, while the BRB regresses the action boundary probabilities. The action boundaries predicted by the BRB refine the output from the ASB, which results in a significant performance improvement. Our contributions are three-fold: (i) We propose a framework for temporal action segmentation, the ASRF, which divides temporal action segmentation into frame-wise action classification and action boundary regression. Our framework refines frame-level hypotheses of action classes using predicted action boundaries. (ii) We propose a loss function for smoothing the transition of action probabilities, and analyze combinations of various loss functions for temporal action segmentation. (iii) Our framework outperforms state-of-the-art methods on three challenging datasets, offering an improvement of up to 13.7% in terms of segmental edit distance and up to 16.1% in terms of segmental F1 score. Our code is publicly available.
UR - http://www.scopus.com/inward/record.url?scp=85113888695&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113888695&partnerID=8YFLogxK
U2 - 10.1109/WACV48630.2021.00237
DO - 10.1109/WACV48630.2021.00237
M3 - Conference contribution
AN - SCOPUS:85113888695
T3 - Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
SP - 2321
EP - 2330
BT - Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
Y2 - 5 January 2021 through 9 January 2021
ER -