TY - GEN
T1 - Accelerating Distributed Deep Reinforcement Learning by In-Network Experience Sampling
AU - Furukawa, Masaki
AU - Matsutani, Hiroki
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - A computing cluster that interconnects multiple compute nodes is used to accelerate distributed reinforcement learning based on DQN (Deep Q-Network). In distributed reinforcement learning, Actor nodes acquire experiences by interacting with a given environment and a Learner node optimizes their DQN model. Since the amount of data transferred between Actor and Learner nodes increases with the number of Actor nodes and their experience size, the communication overhead between them is one of the major performance bottlenecks. In this paper, their communication performance is optimized by using DPDK (Data Plane Development Kit). Specifically, a DPDK-based low-latency experience replay memory server is deployed between Actor and Learner nodes interconnected with a 40GbE (40Gbit Ethernet) network. Evaluation results show that, as a network optimization technique, kernel bypassing by DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%. As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication latencies for prioritized experience sampling by 21.9% to 29.1%.
AB - A computing cluster that interconnects multiple compute nodes is used to accelerate distributed reinforcement learning based on DQN (Deep Q-Network). In distributed reinforcement learning, Actor nodes acquire experiences by interacting with a given environment and a Learner node optimizes their DQN model. Since the amount of data transferred between Actor and Learner nodes increases with the number of Actor nodes and their experience size, the communication overhead between them is one of the major performance bottlenecks. In this paper, their communication performance is optimized by using DPDK (Data Plane Development Kit). Specifically, a DPDK-based low-latency experience replay memory server is deployed between Actor and Learner nodes interconnected with a 40GbE (40Gbit Ethernet) network. Evaluation results show that, as a network optimization technique, kernel bypassing by DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%. As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication latencies for prioritized experience sampling by 21.9% to 29.1%.
KW - Deep Q-Network
KW - Distributed deep reinforcement learning
KW - DPDK
KW - In-network computing
UR - http://www.scopus.com/inward/record.url?scp=85129666611&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129666611&partnerID=8YFLogxK
U2 - 10.1109/PDP55904.2022.00020
DO - 10.1109/PDP55904.2022.00020
M3 - Conference contribution
AN - SCOPUS:85129666611
T3 - Proceedings - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022
SP - 75
EP - 82
BT - Proceedings - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022
A2 - Gonzalez-Escribano, Arturo
A2 - Garcia, Jose Daniel
A2 - Torquati, Massimo
A2 - Skavhaug, Amund
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022
Y2 - 9 March 2022 through 11 March 2022
ER -