Development and validation of a two-stage machine learning model for personalised type 2 diabetes screening in the All of Us Research Program and UK Biobank

By: Khattab, A.; Chen, S.-F.; Sadaei, H. J.; Wineinger, N. E.; Torkamani, A.

Objective

To develop and externally validate a two-stage machine learning framework that integrates polygenic risk and clinical variables for early identification of individuals at risk of developing type 2 diabetes.

Methods

We conducted a prospective prediction study using data from the All of Us Research Program for model development and the UK Biobank for external validation. Two models were constructed. Stage 1 used gradient-boosted decision trees (XGBoost) with cross-validation, automated hyperparameter optimisation and class weighting to predict 5-year incident type 2 diabetes from demographic, clinical and polygenic predictors. Stage 2 added glycated haemoglobin or fasting glucose measurements to refine the risk estimates. Models were interpreted using SHapley Additive exPlanations (SHAP) values and permutation importance, with logistic regression and random forest models serving as comparators. Discrimination of all models was compared using the DeLong test.
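Discrimination in this kind of study is usually summarised by the area under the receiver operating characteristic curve (AUROC): the probability that a randomly chosen person who develops the outcome receives a higher predicted risk than a randomly chosen person who does not. The following is a minimal sketch of that definition in plain Python, not the authors' code. The function name `auroc`, the toy labels and both score lists are hypothetical, chosen only for illustration, and the sketch does not implement the DeLong test itself.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random case outranks a random non-case.

    A tie between a case and a non-case counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed type 2 diabetes within 5 years.
labels = [0, 0, 1, 0, 1, 1]
# Hypothetical predicted risks from a model without laboratory values...
stage1_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
# ...and from a model that also sees a laboratory measurement.
stage2_scores = [0.05, 0.30, 0.45, 0.15, 0.90, 0.75]

print(round(auroc(labels, stage1_scores), 3))  # 8 of 9 case/non-case pairs ranked correctly
print(round(auroc(labels, stage2_scores), 3))  # all 9 pairs ranked correctly
```

The pairwise loop is quadratic in cohort size, so it only suits small examples; a rank-based implementation would be needed for cohorts of this scale. The DeLong test builds on the same pairwise comparisons to estimate the variance of the difference between two correlated AUROCs.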

Results

The Stage 1 model achieved an area under the receiver operating characteristic curve (AUROC) of 0.81 in All of Us and 0.82 in UK Biobank, performing significantly better than the phenotype-only model in UK Biobank (DeLong p=1.05×10⁻⁷⁶). Higher polygenic risk quartiles were associated with increased incidence of type 2 diabetes in both cohorts (global χ² p

Conclusion

A two-stage machine learning framework that integrates genetic and clinical information can support personalised screening for type 2 diabetes across diverse populations. The approach demonstrated robust performance across cohorts and offers a practical structure for early risk identification.
