TY - JOUR
T1 - Mapping-Aware Kernel Partitioning Method for CGRAs Assisted by Deep Learning
AU - Kojima, Takuya
AU - Ohwada, Ayaka
AU - Amano, Hideharu
N1 - Funding Information:
This work is supported in part by the JSPS KAKENHI under Grant 19J21493 and in part by JST CREST under Grant JPMJCR19K1, Japan. This work was also partially based on results obtained from a Project JPNP16007 commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Publisher Copyright:
© 1990-2012 IEEE.
PY - 2022/5/1
Y1 - 2022/5/1
N2 - Coarse-grained reconfigurable architectures (CGRAs) provide high energy efficiency with word-level programmability rather than the bit-level programmability of devices such as FPGAs. The coarser reconfigurability brings about higher energy efficiency and reduces the complexity of compiler tasks compared to FPGAs. However, the application mapping process for CGRAs is still time-consuming. When the compiler tries to map a large and complicated application data-flow graph (DFG) onto the reconfigurable fabric, it tends to result in inefficient resource use or to fail in mapping. In case of failure, the compiler must divide the DFG into several sub-DFGs and go back to the same flow. In this work, we propose a novel partitioning method based on a genetic algorithm to eliminate the unmappable DFGs and improve the mapping quality. In order not to generate unmappable sub-DFGs, we also propose an estimation model which predicts the mappability and resource requirements using a DGCNN (Deep Graph Convolutional Neural Network). The genetic algorithm with this model can seek the most resource-efficient mapping without the back-end mapping process. Our model can predict the mappability with more than 98% accuracy and resource usage with a negligible error for two studied CGRAs. Besides, the proposed partitioning method demonstrates 53-75% memory savings, 1.28-1.39x higher throughput, and better mapping quality than three comparative approaches.
AB - Coarse-grained reconfigurable architectures (CGRAs) provide high energy efficiency with word-level programmability rather than the bit-level programmability of devices such as FPGAs. The coarser reconfigurability brings about higher energy efficiency and reduces the complexity of compiler tasks compared to FPGAs. However, the application mapping process for CGRAs is still time-consuming. When the compiler tries to map a large and complicated application data-flow graph (DFG) onto the reconfigurable fabric, it tends to result in inefficient resource use or to fail in mapping. In case of failure, the compiler must divide the DFG into several sub-DFGs and go back to the same flow. In this work, we propose a novel partitioning method based on a genetic algorithm to eliminate the unmappable DFGs and improve the mapping quality. In order not to generate unmappable sub-DFGs, we also propose an estimation model which predicts the mappability and resource requirements using a DGCNN (Deep Graph Convolutional Neural Network). The genetic algorithm with this model can seek the most resource-efficient mapping without the back-end mapping process. Our model can predict the mappability with more than 98% accuracy and resource usage with a negligible error for two studied CGRAs. Besides, the proposed partitioning method demonstrates 53-75% memory savings, 1.28-1.39x higher throughput, and better mapping quality than three comparative approaches.
KW - CGRA
KW - Coarse-grained reconfigurable architecture
KW - deep learning
KW - genetic algorithm
KW - graph partitioning
UR - http://www.scopus.com/inward/record.url?scp=85113852897&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113852897&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2021.3107746
DO - 10.1109/TPDS.2021.3107746
M3 - Article
AN - SCOPUS:85113852897
SN - 1045-9219
VL - 33
SP - 1213
EP - 1230
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 5
ER -