To identify and explore variable groups and individual predictors of long sickness absence beyond well-known predictors such as service use and previous sickness absence, using machine learning, explainable artificial intelligence methods and a submodel approach.
Retrospective study of prospectively collected registry data on sickness absences and of responses to a questionnaire used in health examinations.
Electronic medical record data of one large occupational health service provider in Finland.
11 533 employees of various occupations who, between 2011 and 2019, had at least once completed a health questionnaire that could be linked to service usage data and who had not had their initial health check within 1 year before or 3 months after completing the questionnaire.
To identify predictors of at least one long sickness absence period (≥30 days) during a 2-year follow-up.
The highest area under the receiver operating characteristic curve (AUROC) values among the submodel groups were for the sickness absence and service use submodels (0.68–0.74). The AUROC values ranged from 0.55 to 0.67 for the submodels in the sociodemographic factors, health habits or diseases data category and from 0.55 to 0.67 for the questionnaire data submodels. The AUROC value of the ensemble model that combined all submodels was 0.79 (95% CI 0.788 to 0.794).
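As an illustration of how submodel and ensemble AUROC values relate, the following is a minimal sketch rather than the study's implementation. It computes AUROC as the probability that a randomly chosen case with a long sickness absence receives a higher risk score than a randomly chosen case without one, and combines two submodels by simple averaging. The averaging rule, the function names and the toy data are all assumptions made for illustration; how the study combined its submodels is not stated here.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(submodel_scores: Sequence[Sequence[float]]) -> List[float]:
    """Combine submodels by averaging their risk scores per person.

    Simple averaging is an assumption, not the study's stated method.
    """
    n_people = len(submodel_scores[0])
    n_models = len(submodel_scores)
    return [sum(m[i] for m in submodel_scores) / n_models for i in range(n_people)]


# Hypothetical toy data: 1 = at least one long sickness absence in follow-up.
labels = [0, 0, 1, 1, 0, 1]
absence_scores = [0.2, 0.4, 0.6, 0.3, 0.1, 0.9]        # hypothetical submodel
questionnaire_scores = [0.3, 0.2, 0.4, 0.7, 0.5, 0.6]  # hypothetical submodel

ensemble_scores = average_ensemble([absence_scores, questionnaire_scores])

print("absence submodel AUROC:", auroc(labels, absence_scores))
print("questionnaire submodel AUROC:", auroc(labels, questionnaire_scores))
print("ensemble AUROC:", auroc(labels, ensemble_scores))
```

In this toy data each submodel misranks a different pair of people, so the averaged scores separate the two groups better than either submodel alone. This is the general reason an ensemble can reach a higher AUROC than any of its submodels.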
The most important factors predicting long sickness absences based on the submodels were reported pain, number of symptoms and diseases, body mass index and short sleep duration. Additionally, several work and mental health-related variables increased the risk of long sickness absence.
Variables other than service use and previous sickness absence increase the accuracy of predicting long sickness absence and provide information for planning interventions that could have a beneficial impact on work disability risk.