TY - JOUR
T1 - Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates
AU - Kato, Ryo
AU - Hoshino, Takahiro
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Numbers JP26285151, 18H03209, 16H02013, 16H06323. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu; http://adni.loni.usc.edu.) As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Publisher Copyright:
© 2019, The Institute of Statistical Mathematics, Tokyo.
PY - 2020/6/1
Y1 - 2020/6/1
AB - Issues regarding missing data are critical in observational and experimental research. Recently, for datasets with mixed continuous–discrete variables, multiple imputation by chained equations (MICE) has been widely used, although MICE may yield severely biased estimates. We propose a new semiparametric Bayesian multiple imputation approach that can handle both continuous and discrete variables. This enables us to overcome a shortcoming of MICE: its conditional models must satisfy a strong condition (known as compatibility) to guarantee that the resulting estimators are consistent. Our simulation studies show that the coverage probability of the 95% interval calculated using MICE can be less than 1%, while the MSE of the proposed method can be less than one-fiftieth of that of MICE. We applied our method to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and the results are consistent with those of previous studies that used panel data other than the ADNI database, whereas existing methods such as MICE yielded inconsistent results.
KW - Full conditional specification
KW - Missing data
KW - Multiple imputation
KW - Probit stick-breaking process mixture
KW - Semiparametric Bayes model
UR - http://www.scopus.com/inward/record.url?scp=85062883998&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062883998&partnerID=8YFLogxK
U2 - 10.1007/s10463-019-00710-w
DO - 10.1007/s10463-019-00710-w
M3 - Article
AN - SCOPUS:85062883998
SN - 0020-3157
VL - 72
SP - 803
EP - 825
JO - Annals of the Institute of Statistical Mathematics
JF - Annals of the Institute of Statistical Mathematics
IS - 3
ER -