TY - GEN
T1 - An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning
AU - Watanabe, Hirohisa
AU - Tsukada, Mineto
AU - Matsutani, Hiroki
N1 - Funding Information:
This work was partially supported by JST CREST Grant Number JPMJCR20F2, Japan.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - DQN (Deep Q-Network) is a method for performing Q-learning in reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network fit within a certain range and the reinforcement learning remains stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. Evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
AB - DQN (Deep Q-Network) is a method for performing Q-learning in reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network fit within a certain range and the reinforcement learning remains stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. Evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
KW - FPGA
KW - OS-ELM
KW - On-device learning
KW - Reinforcement learning
KW - Spectral normalization
UR - http://www.scopus.com/inward/record.url?scp=85114421888&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114421888&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW52791.2021.00022
DO - 10.1109/IPDPSW52791.2021.00022
M3 - Conference contribution
AN - SCOPUS:85114421888
T3 - 2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021
SP - 96
EP - 103
BT - 2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021
Y2 - 17 May 2021
ER -