TY - GEN
T1 - Implementing a large application (LSTM) on the multi-FPGA system
T2 - 22nd IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019
AU - Yamauchi, Yugo
AU - Musha, Kazusa
AU - Amano, Hideharu
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5/23
Y1 - 2019/5/23
N2 - To cope with the computational cost and energy required by recent deep learning technology, domain-specific systems have been used in the cloud so that they can be shared efficiently by many application developers. Although GPUs (Graphics Processing Units) and more specialized systems such as TPUs (Tensor Processing Units) are widely used, FPGAs have been attracting attention, especially for their power efficiency and flexibility. Since energy efficiency is one of the most important issues in cloud computing, much research on using FPGAs in the cloud has been reported [3], and commercial systems, including Amazon's F1 instances, are available. However, the performance of a single FPGA, even one in the cloud, is bounded by its resource limits. Thus, to implement a large deep learning application, we must either use an expensive high-end FPGA or adopt a lightweight algorithm that sacrifices throughput and accuracy. To address this problem, the project 'Power-saving AI engine and platform with heterogeneous engine integrated cloud', supported by NEDO, started developing a large-scale AI system called Flow-in-Cloud (FiC) [4]. FiC consists of a number of economical middle-scale FPGAs interconnected by a high-bandwidth network. From the viewpoint of an HLS (High Level Synthesis) programmer, these FPGAs can be handled as if they were a single large FPGA, so a large-scale deep learning model can be implemented without worrying about the resources of any single FPGA. FiC is managed by Flow-OS and shared efficiently by many users. Although FiC is designed to build a heterogeneous computing system, the current prototype consists of multiple FPGA boards called 'FiC-SW', each of which provides both switching and computing capabilities. Here, as a case study of such energy-efficient multi-FPGA computing, we implemented the inference part of Long Short Term Memory (LSTM) [1].
AB - To cope with the computational cost and energy required by recent deep learning technology, domain-specific systems have been used in the cloud so that they can be shared efficiently by many application developers. Although GPUs (Graphics Processing Units) and more specialized systems such as TPUs (Tensor Processing Units) are widely used, FPGAs have been attracting attention, especially for their power efficiency and flexibility. Since energy efficiency is one of the most important issues in cloud computing, much research on using FPGAs in the cloud has been reported [3], and commercial systems, including Amazon's F1 instances, are available. However, the performance of a single FPGA, even one in the cloud, is bounded by its resource limits. Thus, to implement a large deep learning application, we must either use an expensive high-end FPGA or adopt a lightweight algorithm that sacrifices throughput and accuracy. To address this problem, the project 'Power-saving AI engine and platform with heterogeneous engine integrated cloud', supported by NEDO, started developing a large-scale AI system called Flow-in-Cloud (FiC) [4]. FiC consists of a number of economical middle-scale FPGAs interconnected by a high-bandwidth network. From the viewpoint of an HLS (High Level Synthesis) programmer, these FPGAs can be handled as if they were a single large FPGA, so a large-scale deep learning model can be implemented without worrying about the resources of any single FPGA. FiC is managed by Flow-OS and shared efficiently by many users. Although FiC is designed to build a heterogeneous computing system, the current prototype consists of multiple FPGA boards called 'FiC-SW', each of which provides both switching and computing capabilities. Here, as a case study of such energy-efficient multi-FPGA computing, we implemented the inference part of Long Short Term Memory (LSTM) [1].
UR - http://www.scopus.com/inward/record.url?scp=85067132502&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85067132502&partnerID=8YFLogxK
U2 - 10.1109/CoolChips.2019.8721333
DO - 10.1109/CoolChips.2019.8721333
M3 - Conference contribution
AN - SCOPUS:85067132502
T3 - IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings
BT - IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 April 2019 through 19 April 2019
ER -