TY - GEN
T1 - Data Rearrange Unit for Efficient Data Computation in Embedded Systems
AU - Mamiya, Akiyuki
AU - Yamasaki, Nobuyuki
N1 - Funding Information:
This research is supported by the Adaptable and Seamless Technology Transfer Program through Target-driven R&D (ASTEP, AS2815003R) from the Japan Science and Technology Agency (JST).
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Recently, demand for computation-intensive applications such as convolutional neural networks (CNNs) has been increasing. In these applications, valid data for computation are allocated at non-contiguous addresses, so a common burst memory access pattern yields low spatial locality of valid data per access. As a result, data-parallel execution units degrade in throughput, as computation resources are wasted on invalid data. This is especially a problem in embedded systems, where power-consumption constraints demand high computation efficiency. In this paper, we introduce the Data Rearrange Unit (DRU), a hardware unit that rearranges computation data to increase the spatial locality of valid data. The DRU drastically reduces the main memory access rate, increasing computation efficiency and reducing power consumption. We demonstrate the effectiveness of the DRU by implementing it on the RMTP SoC [1], [2], improving convolution throughput on a data-parallel execution unit by up to 94 times while increasing the total cell area by only about 13%.
AB - Recently, demand for computation-intensive applications such as convolutional neural networks (CNNs) has been increasing. In these applications, valid data for computation are allocated at non-contiguous addresses, so a common burst memory access pattern yields low spatial locality of valid data per access. As a result, data-parallel execution units degrade in throughput, as computation resources are wasted on invalid data. This is especially a problem in embedded systems, where power-consumption constraints demand high computation efficiency. In this paper, we introduce the Data Rearrange Unit (DRU), a hardware unit that rearranges computation data to increase the spatial locality of valid data. The DRU drastically reduces the main memory access rate, increasing computation efficiency and reducing power consumption. We demonstrate the effectiveness of the DRU by implementing it on the RMTP SoC [1], [2], improving convolution throughput on a data-parallel execution unit by up to 94 times while increasing the total cell area by only about 13%.
KW - data rearrange
KW - data-parallel
KW - embedded-systems
KW - neural network
UR - http://www.scopus.com/inward/record.url?scp=85124135014&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124135014&partnerID=8YFLogxK
U2 - 10.1109/CANDARW53999.2021.00024
DO - 10.1109/CANDARW53999.2021.00024
M3 - Conference contribution
AN - SCOPUS:85124135014
T3 - Proceedings - 2021 9th International Symposium on Computing and Networking Workshops, CANDARW 2021
SP - 101
EP - 106
BT - Proceedings - 2021 9th International Symposium on Computing and Networking Workshops, CANDARW 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Symposium on Computing and Networking Workshops, CANDARW 2021
Y2 - 23 November 2021 through 26 November 2021
ER -