TY - GEN
T1 - Training Large Kernel Convolutions with Resized Filters and Smaller Images
AU - Fukuzaki, Shota
AU - Ikehara, Masaaki
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Convolution is an essential component of neural networks for vision tasks. Convolutions over large receptive fields are better suited than small kernel convolutions for aggregating global features in convolutional neural networks. However, large kernel convolutions require substantial computation and memory, which slows the training of neural networks. We therefore propose training the convolution weights on small images while resizing the convolution filters accordingly. Although this idea shortens the time needed to train the filters, applying it naively causes severe degradation. In this paper, we introduce four techniques that suppress this degradation: weight scaling, removing Batch Normalization, defining a minimum resolution, and training with images of various sizes. In our experiments, we apply our proposals to train an image classification model based on RepLKNet-B on the CIFAR-100 image classification task. Training with our proposals is approximately eight times faster than conventional training at the target spatial scale while maintaining accuracy.
AB - Convolution is an essential component of neural networks for vision tasks. Convolutions over large receptive fields are better suited than small kernel convolutions for aggregating global features in convolutional neural networks. However, large kernel convolutions require substantial computation and memory, which slows the training of neural networks. We therefore propose training the convolution weights on small images while resizing the convolution filters accordingly. Although this idea shortens the time needed to train the filters, applying it naively causes severe degradation. In this paper, we introduce four techniques that suppress this degradation: weight scaling, removing Batch Normalization, defining a minimum resolution, and training with images of various sizes. In our experiments, we apply our proposals to train an image classification model based on RepLKNet-B on the CIFAR-100 image classification task. Training with our proposals is approximately eight times faster than conventional training at the target spatial scale while maintaining accuracy.
KW - computer vision
KW - convolutional neural network
UR - http://www.scopus.com/inward/record.url?scp=85179753964&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179753964&partnerID=8YFLogxK
U2 - 10.1109/GCCE59613.2023.10315578
DO - 10.1109/GCCE59613.2023.10315578
M3 - Conference contribution
AN - SCOPUS:85179753964
T3 - GCCE 2023 - 2023 IEEE 12th Global Conference on Consumer Electronics
SP - 32
EP - 33
BT - GCCE 2023 - 2023 IEEE 12th Global Conference on Consumer Electronics
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th IEEE Global Conference on Consumer Electronics, GCCE 2023
Y2 - 10 October 2023 through 13 October 2023
ER -