TY - JOUR
T1 - An analytical network performance model for SIMD processor CSX600 interconnects
AU - Nishikawa, Yuri
AU - Koibuchi, Michihiro
AU - Yoshimi, Masato
AU - Miura, Kenichi
AU - Amano, Hideharu
PY - 2011/1/1
Y1 - 2011/1/1
N2 - One of the essential factors in efficiently implementing and tuning applications on an SIMD many-core processor is familiarity with the architecture and performance of its network-on-chip (NoC). This paper focuses on modeling the end-to-end latency of a one-dimensional SIMD many-core processor. To study the precise and practical characteristics of the actual end-to-end latency of modern SIMD many-core processors, this work analyzes the performance of Swazzle and ClearConnect, both of which are one-dimensional NoCs of ClearSpeed's CSX600, an SIMD processor consisting of 96 Processing Elements (PEs). Evaluation and analysis results show that (1) the number of PEs used, (2) the size of the transferred data, and (3) the data alignment of shared memory are the dominant factors in the network performance of the CSX600. Based on these observations, we built a model for computing communication time. Using the model, we estimated the best- and worst-case latencies for traffic patterns taken from several parallel application benchmarks. Finally, we confirmed that the actual communication times of the benchmarks fall between the best- and worst-case values.
AB - One of the essential factors in efficiently implementing and tuning applications on an SIMD many-core processor is familiarity with the architecture and performance of its network-on-chip (NoC). This paper focuses on modeling the end-to-end latency of a one-dimensional SIMD many-core processor. To study the precise and practical characteristics of the actual end-to-end latency of modern SIMD many-core processors, this work analyzes the performance of Swazzle and ClearConnect, both of which are one-dimensional NoCs of ClearSpeed's CSX600, an SIMD processor consisting of 96 Processing Elements (PEs). Evaluation and analysis results show that (1) the number of PEs used, (2) the size of the transferred data, and (3) the data alignment of shared memory are the dominant factors in the network performance of the CSX600. Based on these observations, we built a model for computing communication time. Using the model, we estimated the best- and worst-case latencies for traffic patterns taken from several parallel application benchmarks. Finally, we confirmed that the actual communication times of the benchmarks fall between the best- and worst-case values.
KW - Many-core processor
KW - Networks-on-chip (NoCs)
KW - SIMD
UR - http://www.scopus.com/inward/record.url?scp=78650271799&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650271799&partnerID=8YFLogxK
U2 - 10.1016/j.sysarc.2010.10.004
DO - 10.1016/j.sysarc.2010.10.004
M3 - Article
AN - SCOPUS:78650271799
SN - 1383-7621
VL - 57
SP - 146
EP - 159
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
IS - 1
ER -