Causal bandits with propagating inference

Akihiro Yabe, Daisuke Hatano, Hanna Sumita, Shinji Ito, Naonori Kakimura, Takuro Fukunaga, Ken Ichi Kawarabayashi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Citations (Scopus)

Abstract

Bandit is a framework for designing sequential experiments, where a learner selects an arm A ϵ A and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms |A|, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the Ω(√\A\/T) simple-regret lower-bound; however, their algorithms work only when the interventions A are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves O(√γ log(|A|T)/T) regret bound, where γ is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then γ = O(N2), where N is the number of nodes.

Original languageEnglish
Title of host publication35th International Conference on Machine Learning, ICML 2018
EditorsJennifer Dy, Andreas Krause
PublisherInternational Machine Learning Society (IMLS)
Pages8761-8781
Number of pages21
ISBN (Electronic)9781510867963
Publication statusPublished - 2018
Event35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: 2018 Jul 102018 Jul 15

Publication series

Name35th International Conference on Machine Learning, ICML 2018
Volume12

Other

Other35th International Conference on Machine Learning, ICML 2018
Country/TerritorySweden
CityStockholm
Period18/7/1018/7/15

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'Causal bandits with propagating inference'. Together they form a unique fingerprint.

Cite this