TY - JOUR
T1 - SIMD vectorization for the Lennard-Jones potential with AVX2 and AVX-512 instructions
AU - Watanabe, Hiroshi
AU - Nakagawa, Koh M.
N1 - Funding Information:
The authors would like to thank S. Mitsunari and H. Noguchi for fruitful discussions. This work was supported by JSPS, Japan KAKENHI Grant Number 15K05201 and by the MEXT, Japan project as “Exploratory Challenge on Post-K Computer” (Frontiers of Basic Science: Challenging the Limits). The computations were carried out using the facilities of the Information Technology Center of the University of Tokyo and the Institute for Solid State Physics of the University of Tokyo.
Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2019/4
Y1 - 2019/4
AB - This work describes the SIMD vectorization of the force calculation for the Lennard-Jones potential with the Intel AVX2 and AVX-512 instruction sets. Since the force-calculation kernel of the molecular dynamics method involves indirect memory access, the data layout is one of the most important factors in vectorization. We find that, with appropriate vectorization and optimizations, an Array of Structures (AoS) with padding performs better than a Structure of Arrays (SoA). In particular, AoS with 512-bit width exhibits the best performance among the architectures. While the performance difference between AoS and SoA is significant when vectorizing with AVX2, it is minor with AVX-512. The effect of other optimization techniques, such as software pipelining combined with vectorization, is also discussed. We present benchmark results on three CPU architectures: Intel Haswell (HSW), Knights Landing (KNL), and Skylake (SKL). On HSW, vectorization improves performance by about 42% compared with the code optimized without vectorization. On KNL, the hand-vectorized codes perform 34% better than the codes vectorized automatically by the Intel compiler. On SKL, the code vectorized with AVX2 performs slightly better than the code vectorized with AVX-512.
KW - AVX-512
KW - AVX2
KW - Molecular Dynamics Simulation
KW - SIMD Vectorization
KW - Xeon Phi
UR - http://www.scopus.com/inward/record.url?scp=85056810085&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056810085&partnerID=8YFLogxK
U2 - 10.1016/j.cpc.2018.10.028
DO - 10.1016/j.cpc.2018.10.028
M3 - Article
AN - SCOPUS:85056810085
SN - 0010-4655
VL - 237
SP - 1
EP - 7
JO - Computer Physics Communications
JF - Computer Physics Communications
ER -