TY - GEN
T1 - The design and implementation of scalable deep neural network accelerator cores
AU - Sakamoto, Ryuichi
AU - Takata, Ryo
AU - Ishii, Jun
AU - Kondo, Masaaki
AU - Nakamura, Hiroshi
AU - Ohkubo, Tetsui
AU - Kojima, Takuya
AU - Amano, Hideharu
N1 - Funding Information:
ACKNOWLEDGMENTS This work was supported by JSPS KAKENHI S Grant Number 25220002. This work was also supported by the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with CADENCE Corporation and SYNOPSYS Corporation.
Publisher Copyright:
© 2017 IEEE.
PY - 2018/3/26
Y1 - 2018/3/26
N2 - Due to recent advances in Deep Neural Network (DNN) technologies, recognition and inference applications are expected to run on mobile embedded systems. Developing high-performance and power-efficient DNN engines has become one of the important challenges for embedded systems. Since DNN algorithms and structures are frequently updated, flexibility and performance scalability to deal with various types of networks are crucial requirements of DNN accelerator design. In this paper, we describe the architecture and LSI design of a flexible and scalable CNN accelerator called SNACC (Scalable Neuro Accelerator Core with Cubic integration), which consists of several processing cores, on-chip memory modules, and a ThruChip Interface (TCI). We evaluate the scalability of SNACC with detailed simulation, varying the number of cores and the off-chip memory access bandwidth. The results show that the energy efficiency of the accelerator is highest in the eight-core configuration with 500 MB/s off-chip bandwidth.
AB - Due to recent advances in Deep Neural Network (DNN) technologies, recognition and inference applications are expected to run on mobile embedded systems. Developing high-performance and power-efficient DNN engines has become one of the important challenges for embedded systems. Since DNN algorithms and structures are frequently updated, flexibility and performance scalability to deal with various types of networks are crucial requirements of DNN accelerator design. In this paper, we describe the architecture and LSI design of a flexible and scalable CNN accelerator called SNACC (Scalable Neuro Accelerator Core with Cubic integration), which consists of several processing cores, on-chip memory modules, and a ThruChip Interface (TCI). We evaluate the scalability of SNACC with detailed simulation, varying the number of cores and the off-chip memory access bandwidth. The results show that the energy efficiency of the accelerator is highest in the eight-core configuration with 500 MB/s off-chip bandwidth.
KW - 3D-Integration
KW - Accelerator
KW - CNN
KW - LSI Design
UR - http://www.scopus.com/inward/record.url?scp=85049741277&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049741277&partnerID=8YFLogxK
U2 - 10.1109/MCSoC.2017.29
DO - 10.1109/MCSoC.2017.29
M3 - Conference contribution
AN - SCOPUS:85049741277
T3 - Proceedings - IEEE 11th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2017
SP - 13
EP - 20
BT - Proceedings - IEEE 11th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2017
Y2 - 18 September 2017 through 20 September 2017
ER -