TY - GEN
T1 - Tug-of-war model for multi-armed bandit problem
AU - Kim, Song Ju
AU - Aono, Masashi
AU - Hara, Masahiko
PY - 2010
Y1 - 2010
N2 - We propose a model - the "tug-of-war (TOW) model" - to conduct unique parallel searches using many nonlocally correlated search agents. The model is based on the property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a "nonlocal correlation" among the branches, i.e., volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision making in the case of a dilemma. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward sum with incompatible demands. Our model can efficiently manage this "exploration-exploitation dilemma" and exhibits good performances. The average accuracy rate of our model is higher than those of well-known algorithms such as the modified ε-greedy algorithm and modified softmax algorithm.
AB - We propose a model - the "tug-of-war (TOW) model" - to conduct unique parallel searches using many nonlocally correlated search agents. The model is based on the property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a "nonlocal correlation" among the branches, i.e., volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision making in the case of a dilemma. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward sum with incompatible demands. Our model can efficiently manage this "exploration-exploitation dilemma" and exhibits good performances. The average accuracy rate of our model is higher than those of well-known algorithms such as the modified ε-greedy algorithm and modified softmax algorithm.
KW - Amoeba-based computing
KW - Bio-inspired computation
KW - Multi-armed bandit problem
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=79956331440&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956331440&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-13523-1_10
DO - 10.1007/978-3-642-13523-1_10
M3 - Conference contribution
AN - SCOPUS:79956331440
SN - 3642135226
SN - 9783642135224
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 69
EP - 80
BT - Unconventional Computation - 9th International Conference, UC 2010, Proceedings
T2 - 9th International Conference on Unconventional Computation, UC 2010
Y2 - 21 June 2010 through 25 June 2010
ER -