TY - JOUR
T1 - QUEST
T2 - Multi-purpose log-quantized DNN inference engine stacked on 96-MB 3-D SRAM using inductive coupling technology in 40-nm CMOS
AU - Ueyoshi, Kodai
AU - Ando, Kota
AU - Hirose, Kazutoshi
AU - Takamaeda-Yamazaki, Shinya
AU - Hamada, Mototsugu
AU - Kuroda, Tadahiro
AU - Motomura, Masato
N1 - Funding Information:
Manuscript received April 30, 2018; revised July 28, 2018 and September 4, 2018; accepted September 11, 2018. Date of publication October 15, 2018; date of current version January 14, 2019. This paper was approved by Guest Editor Wim Dehaene. This work was supported by JST ACCEL, Japan, under Grant JPMJAC1502. (Corresponding author: Kodai Ueyoshi.) K. Ueyoshi, K. Ando, K. Hirose, S. Takamaeda-Yamazaki, and M. Motomura are with the Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan (e-mail: ueyoshi.kodai.6a@ist.hokudai.ac.jp).
Acknowledgement:
The authors would like to thank Profs. T. Asai, M. Ikebe, E. Sano, and M. Arita from Hokkaido University for their invaluable support. This work was partially supported by JST ACCEL Grant Number JPMJAC1502, Japan.
Publisher Copyright:
© 1966-2012 IEEE.
PY - 2019/1
Y1 - 2019/1
N2 - QUEST is a programmable multiple instruction, multiple data (MIMD) parallel accelerator for general-purpose state-of-the-art deep neural networks (DNNs). It features die-to-die stacking of eight SRAM dies (96 MB in total, 28.8-GB/s bandwidth, three-cycle latency) using an inductive coupling technology called the ThruChip interface (TCI). Stacking SRAMs instead of DRAMs promises lower memory access latency and simpler hardware, which helps balance the memory capacity, latency, and bandwidth that cutting-edge DNNs demand at a high level. QUEST also introduces log-quantized programmable bit-precision processing, enabling faster computation of larger DNNs in a 3-D module; log quantization sustains higher recognition accuracy at low bitwidths than linear quantization. The prototype QUEST chip is fabricated in 40-nm CMOS technology and achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.
AB - QUEST is a programmable multiple instruction, multiple data (MIMD) parallel accelerator for general-purpose state-of-the-art deep neural networks (DNNs). It features die-to-die stacking of eight SRAM dies (96 MB in total, 28.8-GB/s bandwidth, three-cycle latency) using an inductive coupling technology called the ThruChip interface (TCI). Stacking SRAMs instead of DRAMs promises lower memory access latency and simpler hardware, which helps balance the memory capacity, latency, and bandwidth that cutting-edge DNNs demand at a high level. QUEST also introduces log-quantized programmable bit-precision processing, enabling faster computation of larger DNNs in a 3-D module; log quantization sustains higher recognition accuracy at low bitwidths than linear quantization. The prototype QUEST chip is fabricated in 40-nm CMOS technology and achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.
KW - Accelerator
KW - deep learning
KW - deep neural networks (DNNs)
KW - logarithmic-quantized neural networks
KW - processor architecture
UR - http://www.scopus.com/inward/record.url?scp=85055053908&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055053908&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2018.2871623
DO - 10.1109/JSSC.2018.2871623
M3 - Article
AN - SCOPUS:85055053908
SN - 0018-9200
VL - 54
SP - 186
EP - 196
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 1
M1 - 8492341
ER -