Abstract
We propose a novel three-dimensional (3D) convolution method, cv3dconv, for detecting spatiotemporal features in videos. By assuming that the camera moves at a constant velocity, it reduces the number of sum-of-products operations in 3D convolution by a factor of thousands. We observed that a specific class of video sequences, such as those captured by an in-vehicle camera, can be well approximated by piecewise-linear movements of 2D features along the temporal dimension. Our principal finding is that a 3D kernel representing constant-velocity motion can be decomposed into a convolution of a 2D kernel representing the shapes and a 3D kernel representing the velocity. We derived an efficient recursive algorithm for this class of 3D convolution that is exceptionally well suited to sparse data; in addition, the parameterized, decomposed representation imposes a structured regularization along the temporal direction. We experimentally verified the validity of our approximation using a controlled dataset, and we showed the effectiveness of cv3dconv on a visual odometry estimation task using real event-camera data captured in urban road scenes.
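The decomposition stated in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' cv3dconv implementation or their recursive algorithm: it assumes integer pixel velocities, periodic (circular) image boundaries, and a single output frame, and all function and variable names are hypothetical. It builds a 3D kernel by translating a 2D shape kernel at a constant velocity, then checks that correlating a video with that 3D kernel gives the same response as correlating each frame with the 2D shape kernel and summing the results along the velocity line.

```python
import numpy as np


def correlate2d_circular(frame, kernel2d):
    """Circular 2D cross-correlation: out[y, x] = sum_ij frame[y+i, x+j] * kernel2d[i, j]."""
    out = np.zeros_like(frame, dtype=float)
    for i, j in np.ndindex(kernel2d.shape):
        if kernel2d[i, j] != 0.0:
            out += kernel2d[i, j] * np.roll(frame, shift=(-i, -j), axis=(0, 1))
    return out


def constant_velocity_kernel(shape2d, vy, vx, n_frames, height, width):
    """3D kernel whose slice at frame t is shape2d translated by (t*vy, t*vx) pixels."""
    base = np.zeros((height, width))
    kh, kw = shape2d.shape
    base[:kh, :kw] = shape2d
    return np.stack(
        [np.roll(base, shift=(t * vy, t * vx), axis=(0, 1)) for t in range(n_frames)]
    )


def response_direct(video, kernel3d):
    """Direct 3D correlation (one output frame): sum over t of per-frame 2D correlations."""
    return sum(correlate2d_circular(video[t], kernel3d[t]) for t in range(video.shape[0]))


def response_decomposed(video, shape2d, vy, vx):
    """Decomposed form: 2D shape response per frame, then accumulation along the velocity."""
    out = np.zeros(video.shape[1:])
    for t in range(video.shape[0]):
        shape_response = correlate2d_circular(video[t], shape2d)
        # Shifting the kernel by t*v is equivalent to reading the response at y + t*v.
        out += np.roll(shape_response, shift=(-t * vy, -t * vx), axis=(0, 1))
    return out


rng = np.random.default_rng(0)
video = rng.standard_normal((4, 8, 8))    # (frames, height, width), synthetic data
shape2d = rng.standard_normal((3, 3))     # hypothetical 2D shape kernel
vy, vx = 1, 2                             # assumed integer velocity in pixels per frame

kernel3d = constant_velocity_kernel(shape2d, vy, vx, 4, 8, 8)
direct = response_direct(video, kernel3d)
decomposed = response_decomposed(video, shape2d, vy, vx)
max_abs_diff = float(np.max(np.abs(direct - decomposed)))
```

In this sketch the two responses agree up to floating-point rounding, so `max_abs_diff` is a quick check of the equivalence. The decomposed form evaluates only the small 2D kernel per frame; the operation-count reduction reported in the abstract comes from the paper's recursive algorithm, which this sketch does not attempt to reproduce.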
Original language | English |
---|---|
Title of host publication | Proceedings - 2018 International Conference on 3D Vision, 3DV 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 343-351 |
Number of pages | 9 |
ISBN (Electronic) | 9781538684252 |
Publication status | Published - 2018 Oct 12 |
Event | 6th International Conference on 3D Vision, 3DV 2018 - Verona, Italy; Duration: 2018 Sept 5 → 2018 Sept 8 |
Other
Other | 6th International Conference on 3D Vision, 3DV 2018 |
---|---|
Country/Territory | Italy |
City | Verona |
Period | 2018 Sept 5 → 2018 Sept 8 |
Keywords
- 3D convolution
- Constant velocity
- Fourier transform
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Computer Vision and Pattern Recognition