TY - GEN
T1 - In-Plane Rotation-Aware Monocular Depth Estimation Using SLAM
AU - Saito, Yuki
AU - Hachiuma, Ryo
AU - Yamaguchi, Masahiro
AU - Saito, Hideo
N1 - Publisher Copyright:
© 2020, Springer Nature Singapore Pte Ltd.
PY - 2020
Y1 - 2020
AB - Estimating accurate depth from an RGB image in any environment is a challenging task in computer vision. Recent learning-based methods using deep Convolutional Neural Networks (CNNs) have produced plausible results, but these conventional methods perform poorly on scenes captured under pure camera rotation, such as in-plane rolling. This movement perturbs learning-based methods because the gravity direction serves as a strong prior for CNN depth estimation (i.e., the top region of an image tends to have a relatively large depth, whereas the bottom region tends to have a small depth). To overcome this crucial weakness of CNN-based depth estimation, we propose a simple but effective refinement method that incorporates in-plane roll alignment using camera poses from monocular Simultaneous Localization and Mapping (SLAM). For the experiments, we used public datasets and also created our own dataset composed mostly of in-plane roll camera movements. Evaluation results on these datasets show the effectiveness of our approach.
KW - Convolutional Neural Network
KW - Monocular depth estimation
KW - Simultaneous Localization and Mapping
UR - http://www.scopus.com/inward/record.url?scp=85090040625&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090040625&partnerID=8YFLogxK
DO - 10.1007/978-981-15-4818-5_23
M3 - Conference contribution
AN - SCOPUS:85090040625
SN - 9789811548178
T3 - Communications in Computer and Information Science
SP - 305
EP - 317
BT - Frontiers of Computer Vision - 26th International Workshop, IW-FCV 2020, Revised Selected Papers
A2 - Ohyama, Wataru
A2 - Jung, Soon Ki
PB - Springer
T2 - International Workshop on Frontiers of Computer Vision, IW-FCV 2020
Y2 - 20 February 2020 through 22 February 2020
ER -