TY - GEN
T1 - Acceleration of the aggregation process in a Hall-thruster simulation using Intel FPGA SDK for OpenCL
AU - Noda, Hiroyuki
AU - Sakai, Ryotaro
AU - Miyajima, Takaaki
AU - Fujita, Naoyuki
AU - Amano, Hideharu
N1 - Funding Information:
The present study was supported in part by the JST/CREST program entitled "Research and Development on Unified Environment of Accelerated Computing and Interconnection for Post-Petascale Era" in the research area of "Development of System Software Technologies for post-Peta Scale High Performance Computing".
Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/6/7
Y1 - 2017/6/7
N2 - The Full Particle-In-Cell (Full-PIC) method is a numerical simulation technique used in the research and development of Hall-thrusters, a type of electric propulsion engine. It treats ions, neutrons, and electrons as particles and is highly accurate compared with other methods, which treat them as a fluid. However, it incurs a large computational cost. The Japan Aerospace Exploration Agency (JAXA) is developing a software package called NSRU-Full-PIC that implements this method. One of the important computing tasks in NSRU-Full-PIC is the aggregation process, which causes Read-After-Write (RAW) hazards and hence makes parallel computation difficult. In this paper, we tackle this problem by introducing a reduction operation with an FPGA accelerator. We use Intel’s mid-range SoC, the Arria 10, which embeds floating-point DSPs for high-performance numerical computation. Intel FPGA SDK for OpenCL is available for this platform for easy offloading of complex tasks. We implemented four types of reduction kernels and compared their performance. As a result, in our fastest implementation, Read-16-Vect, the aggregation process becomes 76.4 times faster than the single-thread version on an ARM Cortex-A9 1.5 GHz and 14.1 times faster than that on a Xeon E5-2660 2.9 GHz. In this implementation, we achieved 93.5% of theoretical performance with optimized FPGA resources.
AB - The Full Particle-In-Cell (Full-PIC) method is a numerical simulation technique used in the research and development of Hall-thrusters, a type of electric propulsion engine. It treats ions, neutrons, and electrons as particles and is highly accurate compared with other methods, which treat them as a fluid. However, it incurs a large computational cost. The Japan Aerospace Exploration Agency (JAXA) is developing a software package called NSRU-Full-PIC that implements this method. One of the important computing tasks in NSRU-Full-PIC is the aggregation process, which causes Read-After-Write (RAW) hazards and hence makes parallel computation difficult. In this paper, we tackle this problem by introducing a reduction operation with an FPGA accelerator. We use Intel’s mid-range SoC, the Arria 10, which embeds floating-point DSPs for high-performance numerical computation. Intel FPGA SDK for OpenCL is available for this platform for easy offloading of complex tasks. We implemented four types of reduction kernels and compared their performance. As a result, in our fastest implementation, Read-16-Vect, the aggregation process becomes 76.4 times faster than the single-thread version on an ARM Cortex-A9 1.5 GHz and 14.1 times faster than that on a Xeon E5-2660 2.9 GHz. In this implementation, we achieved 93.5% of theoretical performance with optimized FPGA resources.
UR - http://www.scopus.com/inward/record.url?scp=85040673526&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040673526&partnerID=8YFLogxK
U2 - 10.1145/3120895.3120915
DO - 10.1145/3120895.3120915
M3 - Conference contribution
AN - SCOPUS:85040673526
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 8th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2017
PB - Association for Computing Machinery
T2 - 8th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2017
Y2 - 7 June 2017 through 9 June 2017
ER -