TY - GEN
T1 - Residual learning of video frame interpolation using convolutional LSTM
AU - Suzuki, Keito
AU - Ikehara, Masaaki
N1 - Publisher Copyright: © 2021 IEEE
PY - 2020
Y1 - 2020
N2 - Video frame interpolation aims to generate intermediate frames between the original frames. This produces videos with a higher frame rate and creates smoother motion. Many video frame interpolation methods first estimate the motion vector between the input frames and then synthesize the intermediate frame based on the motion. However, these methods rely on the accuracy of the motion estimation step and fail to accurately generate the interpolated frame when the estimated motion vectors are inaccurate. Therefore, to avoid the uncertainties caused by motion estimation, this paper proposes a method that directly generates the intermediate frame. Since two consecutive frames are relatively similar, our method takes the average of these two frames and utilizes residual learning to learn the difference between the average of these frames and the ground truth middle frame. In addition, our method uses Convolutional LSTMs and four input frames to better incorporate spatiotemporal information. This neural network can be easily trained end-to-end without difficult-to-obtain data such as optical flow. Our experimental results show that the proposed method can perform favorably against other state-of-the-art frame interpolation methods.
AB - Video frame interpolation aims to generate intermediate frames between the original frames. This produces videos with a higher frame rate and creates smoother motion. Many video frame interpolation methods first estimate the motion vector between the input frames and then synthesize the intermediate frame based on the motion. However, these methods rely on the accuracy of the motion estimation step and fail to accurately generate the interpolated frame when the estimated motion vectors are inaccurate. Therefore, to avoid the uncertainties caused by motion estimation, this paper proposes a method that directly generates the intermediate frame. Since two consecutive frames are relatively similar, our method takes the average of these two frames and utilizes residual learning to learn the difference between the average of these frames and the ground truth middle frame. In addition, our method uses Convolutional LSTMs and four input frames to better incorporate spatiotemporal information. This neural network can be easily trained end-to-end without difficult-to-obtain data such as optical flow. Our experimental results show that the proposed method can perform favorably against other state-of-the-art frame interpolation methods.
UR - http://www.scopus.com/inward/record.url?scp=85110494622&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85110494622&partnerID=8YFLogxK
U2 - 10.1109/ICPR48806.2021.9412470
DO - 10.1109/ICPR48806.2021.9412470
M3 - Conference contribution
AN - SCOPUS:85110494622
T3 - Proceedings - International Conference on Pattern Recognition
SP - 1499
EP - 1504
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Pattern Recognition, ICPR 2020
Y2 - 10 January 2021 through 15 January 2021
ER -