TY - GEN
T1 - A 2.72GOPS/11mW low power reconfigurable accelerator with a highly parallel datapath consisting of combinatorial circuits in 65nm CMOS
AU - Ozaki, N.
AU - Yasuda, Y.
AU - Saito, Y.
AU - Ikebuchi, D.
AU - Kimura, M.
AU - Amano, H.
AU - Nakamura, H.
AU - Usami, K.
AU - Namiki, M.
AU - Kondo, M.
PY - 2011
Y1 - 2011
N2 - CMA (Cool Mega-Array) is a highly energy-efficient reconfigurable accelerator for battery-driven mobile devices. It consists of a large processing element (PE) array without memory elements, onto which the data-flow graph of the running application is mapped; a small, simple programmable micro-controller for data management; and a data memory. Unlike traditional coarse-grained reconfigurable processors, in which each PE provides registers and context memory, CMA reduces power consumption by eliminating the power spent on switching hardware contexts, on storing intermediate data in registers, and on distributing the clock to those registers. Although the data-flow graph mapped on the PE array is static during execution, various application programs can be implemented by making the best use of the flexible data management instructions of the micro-controller. When the delay time of the PE array is shorter than the data handling time of the micro-controller, the supply voltage of the PE array is scaled down to reduce power consumption without degrading performance. Conversely, when the delay time of the PE array is longer, wave pipelining is applied to enhance the performance of the PE array. A prototype CMA chip (CMA-1) with an 8 × 8 PE array and a 24-bit data width was fabricated as a 2.1 × 4.2 mm die in 65 nm CMOS technology and achieves a sustained performance of 2.5 GOPS at 11.2 mW. This energy efficiency is comparable to that of the most energy-efficient accelerators reported to date.
AB - CMA (Cool Mega-Array) is a highly energy-efficient reconfigurable accelerator for battery-driven mobile devices. It consists of a large processing element (PE) array without memory elements, onto which the data-flow graph of the running application is mapped; a small, simple programmable micro-controller for data management; and a data memory. Unlike traditional coarse-grained reconfigurable processors, in which each PE provides registers and context memory, CMA reduces power consumption by eliminating the power spent on switching hardware contexts, on storing intermediate data in registers, and on distributing the clock to those registers. Although the data-flow graph mapped on the PE array is static during execution, various application programs can be implemented by making the best use of the flexible data management instructions of the micro-controller. When the delay time of the PE array is shorter than the data handling time of the micro-controller, the supply voltage of the PE array is scaled down to reduce power consumption without degrading performance. Conversely, when the delay time of the PE array is longer, wave pipelining is applied to enhance the performance of the PE array. A prototype CMA chip (CMA-1) with an 8 × 8 PE array and a 24-bit data width was fabricated as a 2.1 × 4.2 mm die in 65 nm CMOS technology and achieves a sustained performance of 2.5 GOPS at 11.2 mW. This energy efficiency is comparable to that of the most energy-efficient accelerators reported to date.
UR - http://www.scopus.com/inward/record.url?scp=84856701361&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856701361&partnerID=8YFLogxK
U2 - 10.1109/ISICir.2011.6131929
DO - 10.1109/ISICir.2011.6131929
M3 - Conference contribution
AN - SCOPUS:84856701361
SN - 9781612848648
T3 - 2011 International Symposium on Integrated Circuits, ISIC 2011
SP - 579
EP - 584
BT - 2011 International Symposium on Integrated Circuits, ISIC 2011
T2 - 2011 International Symposium on Integrated Circuits, ISIC 2011
Y2 - 12 December 2011 through 14 December 2011
ER -