TY - JOUR
T1 - Empirical geodesic graphs and CAT(k) metrics for data analysis
AU - Kobayashi, Kei
AU - Wynn, Henry P.
N1 - Funding Information:
JST PRESTO, JSPS KAKENHI Grant Numbers 16K02843 and 26280009.
Funding Information:
Funding was provided by JST, PRESTO (JPMJPR14E3) and JSPS, KAKENHI (26280009, 16K02843), Japan. The first author would like to thank Masayuki Sakai, Takaaki Koike and Tatsuhiro Aoshima for their excellent computation and visualization of the results. He also thanks Reiko Miyaoka and Hiroshi Kokubu for their helpful and encouraging advice.
Publisher Copyright:
© 2019, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2020/2/1
Y1 - 2020/2/1
AB - A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. For a probability distribution, the length along a path between two points can be defined as the amount of probability mass accumulated along the path. The geodesic, then, is the shortest such path and defines a geodesic metric. Such metrics are transformed in a number of ways to produce parametrised families of geodesic metric spaces, empirical versions of which allow computation of intrinsic means and associated measures of dispersion. These reveal properties of the data, based on geometry, such as those that are difficult to see from the raw Euclidean distances. Examples of application include clustering and classification. For certain parameter ranges, the spaces become CAT(0) spaces and the intrinsic means are unique. In one case, a minimal spanning tree of a graph based on the data becomes CAT(0). In another, a so-called “metric cone” construction allows extension to CAT(k) spaces. It is shown how to empirically tune the parameters of the metrics, making it possible to apply them to a number of real cases.
KW - CAT(0)
KW - Cluster analysis
KW - Curvature
KW - Extrinsic mean
KW - Intrinsic mean
KW - Non-parametric analysis
UR - http://www.scopus.com/inward/record.url?scp=85078791383&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078791383&partnerID=8YFLogxK
U2 - 10.1007/s11222-019-09855-3
DO - 10.1007/s11222-019-09855-3
M3 - Article
AN - SCOPUS:85078791383
SN - 0960-3174
VL - 30
SP - 1
EP - 18
JO - Statistics and Computing
JF - Statistics and Computing
IS - 1
ER -