TY - GEN
T1 - Accelerating Spark RDD operations with local and remote GPU devices
AU - Ohno, Yasuhiro
AU - Morishima, Shin
AU - Matsutani, Hiroki
PY - 2016/7/2
Y1 - 2016/7/2
N2 - Apache Spark is a distributed processing framework for large-scale data sets, where intermediate data sets are represented as RDDs (Resilient Distributed Datasets) and stored in memory distributed over machines. To accelerate its various computation-intensive operations, such as reduction and sort, we focus on GPU devices. We modified the Spark framework to invoke CUDA kernels when computation-intensive operations are called. RDDs are transformed into array structures and transferred to GPU devices when necessary. Although we need to cache RDDs in GPU device memory as much as possible in order to hide the data transfer overhead, the number of local GPU devices mounted in a host machine is limited. In this paper, we propose to use remote GPU devices which are connected to a host machine via a PCI-Express over 10Gbps Ethernet technology. To mitigate the data transfer overhead for remote GPU devices, we propose three RDD caching policies for local and remote GPU devices. We implemented various reduction programs (e.g., Sum, Max, LineCount) and transformation programs (e.g., SortByKey, PatternMatch, WordConversion) using local and remote GPU devices for Spark. Evaluation results show that Spark with GPU outperforms the original software by up to 21.4x. We also evaluate the RDD caching policies for local and remote GPU devices and show that a caching policy that minimizes the data transfer amount for remote GPU devices achieves the best performance.
AB - Apache Spark is a distributed processing framework for large-scale data sets, where intermediate data sets are represented as RDDs (Resilient Distributed Datasets) and stored in memory distributed over machines. To accelerate its various computation-intensive operations, such as reduction and sort, we focus on GPU devices. We modified the Spark framework to invoke CUDA kernels when computation-intensive operations are called. RDDs are transformed into array structures and transferred to GPU devices when necessary. Although we need to cache RDDs in GPU device memory as much as possible in order to hide the data transfer overhead, the number of local GPU devices mounted in a host machine is limited. In this paper, we propose to use remote GPU devices which are connected to a host machine via a PCI-Express over 10Gbps Ethernet technology. To mitigate the data transfer overhead for remote GPU devices, we propose three RDD caching policies for local and remote GPU devices. We implemented various reduction programs (e.g., Sum, Max, LineCount) and transformation programs (e.g., SortByKey, PatternMatch, WordConversion) using local and remote GPU devices for Spark. Evaluation results show that Spark with GPU outperforms the original software by up to 21.4x. We also evaluate the RDD caching policies for local and remote GPU devices and show that a caching policy that minimizes the data transfer amount for remote GPU devices achieves the best performance.
KW - Apache Spark
KW - CUDA
KW - GPU
KW - PCIe over 10GbE
KW - RDD
UR - http://www.scopus.com/inward/record.url?scp=85018520965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018520965&partnerID=8YFLogxK
U2 - 10.1109/ICPADS.2016.0108
DO - 10.1109/ICPADS.2016.0108
M3 - Conference contribution
AN - SCOPUS:85018520965
T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
SP - 791
EP - 799
BT - Proceedings - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
A2 - Liao, Xiaofei
A2 - Lovas, Robert
A2 - Shen, Xipeng
A2 - Zheng, Ran
PB - IEEE Computer Society
T2 - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Y2 - 13 December 2016 through 16 December 2016
ER -