TY - GEN
T1 - Power management of wireless sensor nodes with coordinated distributed reinforcement learning
AU - Shresthamali, Shaswot
AU - Kondo, Masaaki
AU - Nakamura, Hiroshi
N1 - Funding Information:
This work was partially supported by JSPS KAKENHI Grant Numbers 18J20946 and 17H01708, and by Japan Science and Technology Agency (JST) CREST Grant Number JPMJCR1785.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Energy Harvesting Wireless Sensor Nodes (EHWSNs) require adaptive energy management policies for uninterrupted, perpetual operation in their physical environments. Contemporary online Reinforcement Learning (RL) solutions take an unrealistically long time exploring the environment before converging on working policies. Our work accelerates learning by partitioning the state-space for simultaneous exploration by multiple agents. We achieve this with a novel coordinated ε-greedy method, implemented via Distributed RL (DiRL) in an EHWSN network. Our simulation results show a four-fold increase in state-space penetration and a reduction in the time to reach optimal operation by more than an order of magnitude (50x). We also propose methods to reduce the instances of disastrous outcomes associated with learning and exploration; in simulations of a real-world scenario, this cuts node downtimes by one-third.
AB - Energy Harvesting Wireless Sensor Nodes (EHWSNs) require adaptive energy management policies for uninterrupted, perpetual operation in their physical environments. Contemporary online Reinforcement Learning (RL) solutions take an unrealistically long time exploring the environment before converging on working policies. Our work accelerates learning by partitioning the state-space for simultaneous exploration by multiple agents. We achieve this with a novel coordinated ε-greedy method, implemented via Distributed RL (DiRL) in an EHWSN network. Our simulation results show a four-fold increase in state-space penetration and a reduction in the time to reach optimal operation by more than an order of magnitude (50x). We also propose methods to reduce the instances of disastrous outcomes associated with learning and exploration; in simulations of a real-world scenario, this cuts node downtimes by one-third.
KW - Deep Reinforcement Learning
KW - Distributed Reinforcement Learning
KW - ε-greedy exploration
KW - Energy Harvesting Wireless Sensor Nodes
KW - Energy Neutral Operation
KW - Internet of Things
KW - Reinforcement Learning
UR - http://www.scopus.com/inward/record.url?scp=85081159609&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081159609&partnerID=8YFLogxK
U2 - 10.1109/ICCD46524.2019.00092
DO - 10.1109/ICCD46524.2019.00092
M3 - Conference contribution
AN - SCOPUS:85081159609
T3 - Proceedings - 2019 IEEE International Conference on Computer Design, ICCD 2019
SP - 638
EP - 647
BT - Proceedings - 2019 IEEE International Conference on Computer Design, ICCD 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 37th IEEE International Conference on Computer Design, ICCD 2019
Y2 - 17 November 2019 through 20 November 2019
ER -
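
The abstract describes coordinated ε-greedy exploration: the state-space is partitioned so that each agent explores only its own slice while every agent's updates feed one shared policy. What follows is a minimal sketch of that idea in Python, not the authors' implementation; the toy battery dynamics, the state/action sizes, the slice-based partitioning, and all hyperparameters are assumptions for illustration only.

# Minimal sketch (hypothetical, not the paper's code): coordinated epsilon-greedy
# exploration in which the state-space is split among several agents, each
# exploring its own slice, with all updates applied to one shared Q-table.
import random

N_STATES = 20        # e.g., discretized battery levels (assumed)
N_ACTIONS = 4        # e.g., duty-cycle settings (assumed)
N_AGENTS = 4         # nodes exploring in parallel
EPSILON, ALPHA, GAMMA = 0.2, 0.1, 0.95   # hyperparameters chosen for the toy

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # shared Q-table

def partition(agent_id):
    # Coordinated partitioning: agent i owns a contiguous slice of states,
    # so the agents' exploration covers disjoint regions simultaneously.
    width = N_STATES // N_AGENTS
    return range(agent_id * width, (agent_id + 1) * width)

def toy_step(state, action):
    # Hypothetical dynamics: battery level drifts randomly; reward favors
    # matching the duty cycle to the current energy level.
    next_state = max(0, min(N_STATES - 1, state + random.choice([-1, 0, 1])))
    reward = -abs(action - state * N_ACTIONS // N_STATES)
    return next_state, reward

def act(state, my_slice):
    # Explore (epsilon-greedy) only inside this agent's own partition;
    # elsewhere, exploit the shared policy learned by the other agents.
    if state in my_slice and random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

for episode in range(500):
    for agent_id in range(N_AGENTS):
        my_slice = partition(agent_id)
        state = random.choice(list(my_slice))   # start inside own slice
        for _ in range(50):
            action = act(state, my_slice)
            next_state, reward = toy_step(state, action)
            # Standard Q-learning update applied to the shared table
            best_next = max(Q[next_state])
            Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
            state = next_state

print("Greedy action per state:",
      [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])

Because each agent's ε-greedy randomness is confined to its own slice, the agents jointly cover far more of the state-space per unit time than a single explorer would, which is the intuition behind the abstract's reported gains in state-space penetration and convergence time.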