TY - GEN
T1 - An FPGA-based low-latency network processing for spark streaming
AU - Nakamura, Kohei
AU - Hayashi, Ami
AU - Matsutani, Hiroki
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016
Y1 - 2016
N2 - Low-latency stream data processing is a key enabler for on-line data analysis applications, such as detecting anomaly conditions and change points from stream data continuously generated from sensors and networking services. Existing stream processing frameworks are classified into micro-batch and one-at-a-time processing methodology. Apache Spark Streaming employs the micro-batch methodology, where data analysis is repeatedly performed for a series of data arrived during a short time period, called a micro batch. A rich set of data analysis libraries provided by Spark, such as machine learning and graph processing, can be applied for the micro batches. However, a drawback of the micro-batch processing methodology is a high latency for detecting anomaly conditions and change points. This is because data are accumulated in a micro batch (e.g., 1 sec length) and then data analysis is performed for the micro batch. In this paper, we propose to offload one-at-a-time methodology analysis functions on an FPGA-based 10Gbit Ethernet network interface card (FPGA NIC) in cooperation with Spark Streaming framework, in order to significantly reduce the processing latency and improve the processing throughput. We implemented word count and change-point detection applications on Spark Streaming with our FPGA NIC, where a one-at-a-time methodology analysis logic is implemented. Experiment results demonstrates that the word count throughput is improved by 22x and the change-point detection latency is reduced by 94.12% compared to the original Spark Streaming. Our approach can complement the existing micro-batch methodology data analysis framework with ultra low latency one-at-a-time methodology logic.
AB - Low-latency stream data processing is a key enabler for on-line data analysis applications, such as detecting anomaly conditions and change points from stream data continuously generated from sensors and networking services. Existing stream processing frameworks are classified into micro-batch and one-at-a-time processing methodology. Apache Spark Streaming employs the micro-batch methodology, where data analysis is repeatedly performed for a series of data arrived during a short time period, called a micro batch. A rich set of data analysis libraries provided by Spark, such as machine learning and graph processing, can be applied for the micro batches. However, a drawback of the micro-batch processing methodology is a high latency for detecting anomaly conditions and change points. This is because data are accumulated in a micro batch (e.g., 1 sec length) and then data analysis is performed for the micro batch. In this paper, we propose to offload one-at-a-time methodology analysis functions on an FPGA-based 10Gbit Ethernet network interface card (FPGA NIC) in cooperation with Spark Streaming framework, in order to significantly reduce the processing latency and improve the processing throughput. We implemented word count and change-point detection applications on Spark Streaming with our FPGA NIC, where a one-at-a-time methodology analysis logic is implemented. Experiment results demonstrates that the word count throughput is improved by 22x and the change-point detection latency is reduced by 94.12% compared to the original Spark Streaming. Our approach can complement the existing micro-batch methodology data analysis framework with ultra low latency one-at-a-time methodology logic.
UR - http://www.scopus.com/inward/record.url?scp=85015245393&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015245393&partnerID=8YFLogxK
U2 - 10.1109/BigData.2016.7840876
DO - 10.1109/BigData.2016.7840876
M3 - Conference contribution
AN - SCOPUS:85015245393
T3 - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
SP - 2410
EP - 2415
BT - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
A2 - Ak, Ronay
A2 - Karypis, George
A2 - Xia, Yinglong
A2 - Hu, Xiaohua Tony
A2 - Yu, Philip S.
A2 - Joshi, James
A2 - Ungar, Lyle
A2 - Liu, Ling
A2 - Sato, Aki-Hiro
A2 - Suzumura, Toyotaro
A2 - Rachuri, Sudarsan
A2 - Govindaraju, Rama
A2 - Xu, Weijia
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE International Conference on Big Data, Big Data 2016
Y2 - 5 December 2016 through 8 December 2016
ER -