Application of Machine Learning to Identify Clustering of Cardiometabolic Risk Factors in U.S. Adults.
Diabetes Technol Ther. 2019 Apr 10.
Authors: Liao X, Kerr D, Morales J, Duncan I
AIMS: This study compares machine learning methods with traditional parametric statistical analysis (logistic regression) for investigating the relationship between risk factors and cardiometabolic risk (the risk of diabetes and cardiovascular disease) in U.S. adults, using cross-sectional data from participants in a wellness improvement program.
METHODS: Logistic regression was used to estimate the relationship between individual risk factors (predictors) and cardiometabolic risk. Supervised machine learning methods were used to predict risk and to rank the variables by importance. A clustering method was used to identify subpopulations of interest. Predictors were divided into nonmodifiable and modifiable factors.
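As a rough illustration of how a fitted risk model can yield a variable-importance ranking, the sketch below uses permutation importance scored by AUC. This is an assumption for illustration only: the abstract does not say which importance measure the authors used. The scoring rule `toy_risk_score`, the variable names, and the six records are all hypothetical, and cyclic shifts stand in for random shuffles so the result is reproducible.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, feature_names):
    """Rank features by the mean AUC drop when one column is permuted.

    Each column is permuted with every non-trivial cyclic shift
    (a deterministic stand-in for random shuffling).
    """
    baseline = auc(labels, [predict(r) for r in rows])
    n = len(rows)
    drops = {}
    for name in feature_names:
        column = [r[name] for r in rows]
        total = 0.0
        for shift in range(1, n):
            shifted = column[shift:] + column[:shift]
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shifted)]
            total += baseline - auc(labels, [predict(r) for r in permuted])
        drops[name] = total / (n - 1)
    # Largest drop first: the model leans most on that variable.
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)


def toy_risk_score(row):
    """Hypothetical risk rule (not the study's model); ignores height_cm."""
    return row["bmi"] - 5 * row["exercise_days"]


# Hypothetical toy records; label 1 = elevated cardiometabolic risk.
rows = [
    {"bmi": 34, "exercise_days": 0, "height_cm": 170},
    {"bmi": 31, "exercise_days": 1, "height_cm": 165},
    {"bmi": 29, "exercise_days": 1, "height_cm": 180},
    {"bmi": 27, "exercise_days": 4, "height_cm": 175},
    {"bmi": 24, "exercise_days": 3, "height_cm": 160},
    {"bmi": 22, "exercise_days": 5, "height_cm": 185},
]
labels = [1, 1, 1, 0, 0, 0]

baseline_auc = auc(labels, [toy_risk_score(r) for r in rows])
ranking = permutation_importance(
    toy_risk_score, rows, labels, ["bmi", "exercise_days", "height_cm"]
)
for name, drop in ranking:
    print(f"{name}: mean AUC drop {drop:.3f}")
```

A variable the model never reads (here `height_cm`) shows no AUC drop at all, which is the property that makes this kind of ranking useful for separating informative predictors from uninformative ones.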
RESULTS: The population comprised 217,254 adults, of whom 8.1% had diabetes. Using logistic regression, six variables were identified as negatively related and eleven as positively related to cardiometabolic risk. Three supervised machine learning classifiers (random forest, gradient boosting, and bagging) were applied, with an average AUC of 0.806. Each classifier also produced a ranking of variable importance. Four subgroups were identified with a k-medoid clustering algorithm; these were distinguished mainly by gender and diabetes status.
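The k-medoid step can be sketched in a few lines. The version below is a minimal illustration, not the authors' implementation: it assumes a Gower-style distance for mixed categorical and numeric variables, a deterministic farthest-first start, and a simple alternate-assign-and-update loop. The function names, the variable names, and the eight records are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Mean per-variable dissimilarity: range-scaled gap for numeric
    variables, 0/1 mismatch for categorical ones."""
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            total += abs(a[key] - b[key]) / numeric_ranges[key]
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def k_medoids(records, k, numeric_keys, max_iter=50):
    """Alternate between assigning records to the nearest medoid and
    re-picking each cluster's medoid. Assumes at least k distinct records."""
    n = len(records)
    ranges = {}
    for key in numeric_keys:
        values = [r[key] for r in records]
        ranges[key] = (max(values) - min(values)) or 1.0
    dist = [[gower_distance(records[i], records[j], ranges)
             for j in range(n)] for i in range(n)]

    # Farthest-first start: begin at record 0, then repeatedly add the
    # record that is farthest from all medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        )

    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest medoid for every record.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: the member with the smallest total distance
        # to the rest of its cluster becomes the new medoid.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


# Hypothetical toy records, two per gender/diabetes combination.
records = [
    {"gender": "F", "diabetes": "no", "age": 35},
    {"gender": "F", "diabetes": "no", "age": 40},
    {"gender": "F", "diabetes": "yes", "age": 60},
    {"gender": "F", "diabetes": "yes", "age": 65},
    {"gender": "M", "diabetes": "no", "age": 38},
    {"gender": "M", "diabetes": "no", "age": 42},
    {"gender": "M", "diabetes": "yes", "age": 58},
    {"gender": "M", "diabetes": "yes", "age": 62},
]

medoids, labels = k_medoids(records, k=4, numeric_keys=["age"])
for c, m in enumerate(medoids):
    print(f"cluster {c}: medoid record {records[m]}")
```

Because each medoid is an actual record rather than an averaged centroid, every subgroup can be described by a representative participant, which is convenient when the variables are categorical, as gender and diabetes status are.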
CONCLUSIONS: The study illustrates that machine learning is an important addition to traditional logistic regression for identifying cardiometabolic risk factors, ranking their importance, and highlighting the potential for lifestyle- and medication-based interventions at the individual level.
PMID: 30969131 [PubMed – as supplied by publisher]