by Faten Al-hussein, Laleh Tafakori, Mali Abdollahian, Khalid Al-Shali
Type 2 diabetes (T2D) is a chronic condition affecting millions of people worldwide. A robust model for predicting the number of new T2D cases can support precise monitoring and effective intervention strategies. This study aims to predict the number of new T2D cases per month in Saudi Arabia and to identify the Key Performance Indicators (KPIs) associated with T2D using four count regression models: Poisson Regression (PR), Negative Binomial Regression (NBR), Poisson Inverse Gaussian Regression (PIGR), and Bell Regression (BR). De-identified data from 1,000 patients with T2D in Saudi Arabia were used to develop the models. The full models, which include the recommended KPIs, were compared using the coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), 10-fold cross-validation (CV-10), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). The most significant KPIs identified by the full models were then used to develop reduced models. The full NBR model outperformed the other models, with R² = 0.88, RMSE = 0.93, MAE = 0.69, CV-10 = 1.21, AIC = 873.23, and BIC = 880. The reduced NBR model, which retains only the five most influential variables (marital status, age, body mass index (BMI), total cholesterol (TC), and high-density lipoprotein (HDL)), achieved R² = 0.84, RMSE = 1.10, MAE = 0.86, CV-10 = 1.37, AIC = 899, and BIC = 910, and also outperformed the other reduced models. The Likelihood Ratio Test (LRT) showed no significant difference between the full and reduced NBR models (p = 0.694), supporting the adequacy of the reduced model. By relying on only five significant KPIs, the proposed reduced model can help healthcare providers develop effective, targeted strategies to reduce the rising number of T2D cases in Saudi Arabia while monitoring fewer indicators.
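For readers who want to see how the comparison criteria named above are typically computed, the following is a minimal sketch in Python using only the standard library. It is not the authors' code: the function names are hypothetical, the formulas are the textbook definitions (AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L, LRT statistic = 2(ln L_full − ln L_reduced) referred to a chi-square distribution), and the LRT assumes the reduced model is nested in the full one.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted counts."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def aic(log_lik, n_params):
    """Akaike information criterion from a maximized log-likelihood."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion from a maximized log-likelihood."""
    return n_params * math.log(n_obs) - 2 * log_lik

def chi2_sf(x, df):
    """Upper-tail chi-square probability, computed from the series
    expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(log_lik_full, log_lik_reduced, df_diff):
    """Return (statistic, p-value) for nested models; df_diff is the
    number of parameters dropped from the full model."""
    stat = 2.0 * (log_lik_full - log_lik_reduced)
    return stat, chi2_sf(stat, df_diff)
```

In this sketch, a large LRT p-value means the dropped predictors do not significantly improve the fit, which is the sense in which a reduced model is judged adequate. Any inputs passed to these functions (log-likelihoods, parameter counts, predictions) would have to come from fitted models; none are supplied here because the abstract does not report them.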