Retrospective validation of an artificial intelligence system for diagnostic assessment of prostate biopsies on the ProMort cohort: study protocol

By: Ji, X.; Zelic, R.; Aspegren, O.; Mulliqi, N.; Fiorentino, M.; Giunchi, F.; Molinaro, L.; Boman, S. E.; Szolnoky, K.; Liu, L. X.; Pettersson, A.; Vincent, P. H.; Eklund, M.; Akre, O.; Kartasalo, K. (December 25, 2025 at 05:45)

Introduction

Prostate cancer diagnosis and treatment planning depend on accurate histopathological assessment of needle biopsies, particularly through the Gleason scoring system. Because grading is inherently subjective, assessments vary between pathologists, which can lead to suboptimal patient management decisions. These reproducibility challenges extend beyond Gleason scoring to other critical diagnostic and prognostic markers, including cancer volume quantification and the detection of cribriform morphology patterns and perineural invasion. Artificial intelligence (AI) applications in digital pathology have emerged as promising tools for improving diagnostic consistency and accuracy, and recent research has shown that automated systems can match expert-level performance in prostate biopsy evaluation. Nevertheless, comprehensive validation studies have revealed limitations in model generalisability when systems are deployed across different clinical environments and patient populations. Recent systematic reviews have found widespread risk-of-bias limitations and insufficient external validation in AI diagnostic studies, highlighting the need for accumulated evidence of generalisability before clinical implementation. Rigorous external validation with preregistered protocols, using independent datasets from diverse clinical settings, remains essential to establish the reliability and safety of AI-assisted prostate pathology systems.

Methods and analysis

This study protocol establishes a framework for the retrospective external validation of an AI system developed for prostate biopsy assessment, to be conducted on the case-control samples of the ProMort study (1998-2015) within the National Prostate Cancer Register of Sweden. The primary aim is to evaluate the AI model’s diagnostic accuracy and Gleason grading performance on datasets that are completely independent of any model development cohort or previously used validation cohort. The diversity of the validation samples, which span multiple geographic regions, temporal collection periods and reference standards, allows evaluation of model robustness across varied clinical contexts. Secondary aims are to evaluate AI performance in cancer length estimation and in the detection of cribriform patterns and perineural invasion. The protocol sets out procedures for data collection, reference standard clarification and prespecified statistical analyses, to support comprehensive validation and reliable performance assessment. The study design conforms to two established reporting guidelines, the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) and the Standards for Reporting Diagnostic Accuracy Studies using Artificial Intelligence (STARD-AI), and to recognised best practices for AI validation in medical imaging.
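The abstract does not name the statistics behind "diagnostic accuracy" and "Gleason grading performance". As an illustration only, the sketch below assumes two measures that are common in this kind of validation: sensitivity and specificity for cancer detection, and quadratically weighted Cohen's kappa for agreement on an ordinal grade scale. The function names, the encoding of grades as integers 0 to k-1, and the example data are hypothetical and are not taken from the protocol.

```python
def sensitivity_specificity(reference, predicted):
    """Sensitivity and specificity for binary labels (True = cancer).

    `reference` is the pathologist reference standard and `predicted`
    is the AI output; both are equal-length sequences of booleans.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def quadratic_weighted_kappa(reference, predicted, n_categories):
    """Cohen's kappa with quadratic weights for ordinal grades.

    Grades are assumed (hypothetically) to be encoded as integers
    0 .. n_categories - 1, for example 0 = benign, 1-5 = grade groups.
    """
    k = n_categories
    n = len(reference)
    # Confusion matrix: rows = reference grade, columns = predicted grade.
    observed = [[0] * k for _ in range(k)]
    for r, p in zip(reference, predicted):
        observed[r][p] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        return float("nan")  # undefined when all ratings fall in one category
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Made-up example data, not study results.
    ref_cancer = [True, True, False, False]
    ai_cancer = [True, False, False, False]
    print(sensitivity_specificity(ref_cancer, ai_cancer))

    ref_grade = [0, 1, 2, 3, 4, 5]
    ai_grade = [0, 1, 2, 3, 4, 5]
    print(quadratic_weighted_kappa(ref_grade, ai_grade, n_categories=6))
```

Quadratic weights penalise a disagreement by the squared distance between the two grades, so a two-step discrepancy counts four times as much as a one-step discrepancy. The metrics actually used in the study are those prespecified in the full protocol.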

Ethics and dissemination

Data collection and usage were approved by the Swedish Regional Ethics Review Board and the Swedish Ethical Review Authority (permits 2012/1586-31/1, 2016/613-31/2, 2019-01395, 2019-05220). The study adheres to the Declaration of Helsinki principles, and findings will be made available in open access peer-reviewed publications.
