TY - JOUR
T1 - Theory and algorithms for shapelet-based multiple-instance learning
AU - Suehiro, Daiki
AU - Hatano, Kohei
AU - Takimoto, Eiji
AU - Yamamoto, Shuji
AU - Bannai, Kenichi
AU - Takeda, Akiko
N1 - Publisher Copyright:
© 2020 Massachusetts Institute of Technology.
PY - 2020/8/1
Y1 - 2020/8/1
N2 - We propose a new formulation of multiple-instance learning (MIL), in which a unit of data consists of a set of instances called a bag. The goal is to find a good classifier of bags based on the similarity with a “shapelet” (or pattern), where the similarity of a bag with a shapelet is the maximum similarity between the shapelet and the instances in the bag. In previous work, some of the training instances have been chosen as shapelets with no theoretical justification. In our formulation, we use all possible, and thus infinitely many, shapelets, resulting in a richer class of classifiers. We show that the formulation is tractable, that is, it can be reduced through linear programming boosting (LPBoost) to difference of convex (DC) programs of finite (in fact, polynomial) size. Our theoretical result also gives justification to the heuristics of some previous work. The time complexity of the proposed algorithm depends heavily on the size of the set of all instances in the training sample. To apply the algorithm to data containing a large number of instances, we also propose a heuristic variant that preserves the theoretical guarantee. Our empirical study demonstrates that our algorithm works uniformly well for shapelet learning tasks on time-series classification and for various MIL tasks, with accuracy comparable to existing methods. Moreover, we show that the proposed heuristics allow us to obtain these results in reasonable computational time.
AB - We propose a new formulation of multiple-instance learning (MIL), in which a unit of data consists of a set of instances called a bag. The goal is to find a good classifier of bags based on the similarity with a “shapelet” (or pattern), where the similarity of a bag with a shapelet is the maximum similarity between the shapelet and the instances in the bag. In previous work, some of the training instances have been chosen as shapelets with no theoretical justification. In our formulation, we use all possible, and thus infinitely many, shapelets, resulting in a richer class of classifiers. We show that the formulation is tractable, that is, it can be reduced through linear programming boosting (LPBoost) to difference of convex (DC) programs of finite (in fact, polynomial) size. Our theoretical result also gives justification to the heuristics of some previous work. The time complexity of the proposed algorithm depends heavily on the size of the set of all instances in the training sample. To apply the algorithm to data containing a large number of instances, we also propose a heuristic variant that preserves the theoretical guarantee. Our empirical study demonstrates that our algorithm works uniformly well for shapelet learning tasks on time-series classification and for various MIL tasks, with accuracy comparable to existing methods. Moreover, we show that the proposed heuristics allow us to obtain these results in reasonable computational time.
UR - http://www.scopus.com/inward/record.url?scp=85088272063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088272063&partnerID=8YFLogxK
U2 - 10.1162/neco_a_01297
DO - 10.1162/neco_a_01297
M3 - Article
C2 - 32521217
AN - SCOPUS:85088272063
SN - 0899-7667
VL - 32
SP - 1580
EP - 1613
JO - Neural Computation
JF - Neural Computation
IS - 8
ER -