TY - GEN
T1 - Horizontal division of deep learning applications with all-to-all communication on a multi-FPGA system
AU - Yamauchi, Yugo
AU - Ahmed, Akram Ben
AU - Hironaka, Kazuei
AU - Iizuka, Kensuke
AU - Amano, Hideharu
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - Although convolutional neural networks (CNNs) offer plenty of parallelism, traditional layer-by-layer task division designs for multi-FPGA systems have two problems: (1) The computational load differs from layer to layer, so the execution time is dominated by the heaviest layer. (2) Each FPGA must be designed independently, which means that various configuration files must be designed, generated, and managed. To address these problems, we propose a horizontal division method that allows a single design to be used for every FPGA. All layers of the target CNN are divided in the horizontal direction, and a set of layers is implemented on each FPGA. This reduces design time as well as management costs for execution. Also, since the weight data can be separated, local memory usage can be reduced. The apparent disadvantage of this method is that it requires all-to-all data communication between FPGA boards, so it is not suitable for traditional multi-FPGA systems with a simple linear network. Here, we apply the method to FiC (Flow-in-Cloud), which has a powerful network that enables efficient broadcasting. A simple CNN, LeNet, and a matrix multiplication representing a more practical fully connected layer are implemented on the FiC prototype. In the evaluation, LeNet on 8 FPGAs ran 7.5 times faster than on a single FPGA and 12.6 times faster than optimized software on a high-end CPU.
AB - Although convolutional neural networks (CNNs) offer plenty of parallelism, traditional layer-by-layer task division designs for multi-FPGA systems have two problems: (1) The computational load differs from layer to layer, so the execution time is dominated by the heaviest layer. (2) Each FPGA must be designed independently, which means that various configuration files must be designed, generated, and managed. To address these problems, we propose a horizontal division method that allows a single design to be used for every FPGA. All layers of the target CNN are divided in the horizontal direction, and a set of layers is implemented on each FPGA. This reduces design time as well as management costs for execution. Also, since the weight data can be separated, local memory usage can be reduced. The apparent disadvantage of this method is that it requires all-to-all data communication between FPGA boards, so it is not suitable for traditional multi-FPGA systems with a simple linear network. Here, we apply the method to FiC (Flow-in-Cloud), which has a powerful network that enables efficient broadcasting. A simple CNN, LeNet, and a matrix multiplication representing a more practical fully connected layer are implemented on the FiC prototype. In the evaluation, LeNet on 8 FPGAs ran 7.5 times faster than on a single FPGA and 12.6 times faster than optimized software on a high-end CPU.
KW - CNN
KW - FPGA
KW - multiFPGA
UR - http://www.scopus.com/inward/record.url?scp=85102190662&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102190662&partnerID=8YFLogxK
U2 - 10.1109/CANDARW51189.2020.00060
DO - 10.1109/CANDARW51189.2020.00060
M3 - Conference contribution
AN - SCOPUS:85102190662
T3 - Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
SP - 277
EP - 281
BT - Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
Y2 - 24 November 2020 through 27 November 2020
ER -