TY - GEN
T1 - Causal bandits with propagating inference
AU - Yabe, Akihiro
AU - Hatano, Daisuke
AU - Sumita, Hanna
AU - Ito, Shinji
AU - Kakimura, Naonori
AU - Fukunaga, Takuro
AU - Kawarabayashi, Ken Ichi
N1 - Publisher Copyright:
© 35th International Conference on Machine Learning, ICML 2018.All Rights Reserved.
PY - 2018
Y1 - 2018
N2 - Bandit is a framework for designing sequential experiments, where a learner selects an arm A ϵ A and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms |A|, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the Ω(√\A\/T) simple-regret lower-bound; however, their algorithms work only when the interventions A are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves O(√γ∗ log(|A|T)/T) regret bound, where γ∗ is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then γ∗ = O(N2), where N is the number of nodes.
AB - Bandit is a framework for designing sequential experiments, where a learner selects an arm A ϵ A and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms |A|, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the Ω(√\A\/T) simple-regret lower-bound; however, their algorithms work only when the interventions A are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves O(√γ∗ log(|A|T)/T) regret bound, where γ∗ is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then γ∗ = O(N2), where N is the number of nodes.
UR - http://www.scopus.com/inward/record.url?scp=85057298452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057298452&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057298452
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 8761
EP - 8781
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -