To test the agreement and usability of a novel quality appraisal tool: A MeaSurement Tool to Assess systematic Reviews of Prognostic Factor studies (AMSTAR-PF).
Observational study.
14 appraisers of varied experience levels and backgrounds, including undergraduate, master’s and PhD students, postgraduate researchers, research fellows and clinicians.
Eight systematic reviews were rated by all appraisers using AMSTAR-PF.
Planned measures were intrapair and inter-pair agreement (Cohen’s and Fleiss’ kappa), time taken to complete an appraisal and time to reach consensus. Interrater agreement was added as a further measure, and Gwet’s agreement coefficient was calculated and presented because of its greater stability across agreement levels. The percentage of intrapair ratings that were identical or one category apart was also presented.
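For readers unfamiliar with these statistics, the following is a minimal sketch of how chance-corrected agreement between two appraisers might be computed. It is not the study's analysis code. It assumes the unweighted form of Gwet's coefficient (AC1), since the variant used is not specified here, and the four integer-coded ordinal categories and the example ratings are hypothetical.

```python
from collections import Counter


def observed_agreement(a, b):
    """Proportion of items on which the two raters chose the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b, categories):
    """Cohen's kappa: chance agreement from each rater's own marginals.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(a)
    po = observed_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    pe = sum((ca[k] / n) * (cb[k] / n) for k in categories)
    return (po - pe) / (1 - pe)


def gwet_ac1(a, b, categories):
    """Gwet's AC1 for two raters (unweighted form; an assumption here)."""
    n = len(a)
    q = len(categories)
    po = observed_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Average of the two raters' marginal proportions for each category.
    pi = [(ca[k] + cb[k]) / (2 * n) for k in categories]
    pe = sum(p * (1 - p) for p in pi) / (q - 1)
    return (po - pe) / (1 - pe)


def within_one_category(a, b):
    """Proportion of paired ratings that are identical or one category apart."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


# Hypothetical ratings: 0..3 stand for four ordered quality categories.
categories = [0, 1, 2, 3]
rater_a = [0, 1, 1, 2, 3, 2, 1, 0]
rater_b = [0, 1, 2, 2, 3, 1, 1, 3]

print(f"Cohen's kappa: {cohen_kappa(rater_a, rater_b, categories):.3f}")
print(f"Gwet's AC1:    {gwet_ac1(rater_a, rater_b, categories):.3f}")
print(f"Within one:    {within_one_category(rater_a, rater_b):.3f}")
```

The two coefficients share the same observed agreement and differ only in how chance agreement is estimated, which is why they can diverge when ratings are concentrated in a few categories.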
Across the domains, interrater agreement averaged 0.59 (range 0.21–0.90), inter-pair agreement 0.61 (range 0.24–0.91) and intrapair agreement 0.75 (range 0.45–0.95). For the overall rating, agreement was 0.46 (95% CI 0.30 to 0.62) for interrater agreement, 0.46 (95% CI 0.17 to 0.74) for inter-pair agreement and 0.68 (range of averages 0.22–1.00) for intrapair agreement. The majority (60.7%) of intrapair ratings were identical, with 94.6% of final ratings either identical or only one category different for the overall appraisal. The time taken to appraise a study with AMSTAR-PF improved with use and averaged around 34 min after the first two appraisals.
Despite some variation in agreement across domains and between appraisers, the testing results suggest that AMSTAR-PF has clear utility for appraising the quality of systematic reviews of prognostic factor studies.