TY - JOUR
T1 - Improved regret bounds for bandit combinatorial optimization
AU - Ito, Shinji
AU - Hatano, Daisuke
AU - Sumita, Hanna
AU - Takemura, Kei
AU - Fukunaga, Takuro
AU - Kakimura, Naonori
AU - Kawarabayashi, Ken-ichi
N1 - Funding Information:
∗This work was supported by JST, ERATO, Grant Number JPMJER1201, Japan. †This work was supported by JST, ACT-I, Grant Number JPMJPR18U5, Japan. ‡This work was supported by JST, PRESTO, Grant Number JPMJPR1759, Japan. §This work was supported by JSPS, KAKENHI, Grant Number JP18H05291, Japan.
Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.
PY - 2019
Y1 - 2019
N2 - Bandit combinatorial optimization is a bandit framework in which, in each round, a player chooses an action from a given finite set A ⊆ {0, 1}^d and incurs a loss equal to the inner product of the chosen action and an unobservable loss vector in R^d. In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. [8] obtained a regret lower bound of Ω(√(dk^3 T / log T)), where k is the maximum ℓ1-norm of the action vectors and T is the number of rounds. This lower bound was achieved by considering a continuous, strongly correlated distribution of losses. Our main contribution is to improve this bound to Ω(√(dk^3 T)), removing the √(log T) factor, which can be done by means of strongly correlated losses with binary values. This bound yields better regret bounds for three specific examples of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit. In particular, the bound obtained for the bandit ranking in the present study addresses an open problem raised in [8]. In addition, we demonstrate that the problem becomes easier without correlations among the entries of the loss vectors. In fact, if each entry of the loss vector is an independent random variable, then one can achieve a regret of Õ(√(dk^2 T)), which is √k times smaller than the lower bound above. These results indicate that correlation among losses is the cause of the large regret.
AB - Bandit combinatorial optimization is a bandit framework in which, in each round, a player chooses an action from a given finite set A ⊆ {0, 1}^d and incurs a loss equal to the inner product of the chosen action and an unobservable loss vector in R^d. In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. [8] obtained a regret lower bound of Ω(√(dk^3 T / log T)), where k is the maximum ℓ1-norm of the action vectors and T is the number of rounds. This lower bound was achieved by considering a continuous, strongly correlated distribution of losses. Our main contribution is to improve this bound to Ω(√(dk^3 T)), removing the √(log T) factor, which can be done by means of strongly correlated losses with binary values. This bound yields better regret bounds for three specific examples of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit. In particular, the bound obtained for the bandit ranking in the present study addresses an open problem raised in [8]. In addition, we demonstrate that the problem becomes easier without correlations among the entries of the loss vectors. In fact, if each entry of the loss vector is an independent random variable, then one can achieve a regret of Õ(√(dk^2 T)), which is √k times smaller than the lower bound above. These results indicate that correlation among losses is the cause of the large regret.
UR - http://www.scopus.com/inward/record.url?scp=85090170188&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090170188&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85090170188
SN - 1049-5258
VL - 32
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
Y2 - 8 December 2019 through 14 December 2019
ER -