The authors declare that they have no competing interests.
KJ prepared the initial draft of the manuscripts and conducted ratings. GN prepared the initial draft of the manuscripts and conducted ratings and statistical analysis. LD served on the writing group and conducted ratings. GC served on the writing group and conducted ratings. JS served on the writing group and conducted ratings. DC served on the writing group and conducted ratings. SR conducted ratings. DH conducted ratings. BL performed statistical analysis. ST conducted ratings. CS conducted ratings. MF conducted ratings. RG provided the training. All authors contributed to and approved the final manuscript.
The National Institutes of Health (NIH) Health Care Systems Research Collaboratory (NIH Collaboratory) seeks to produce generalizable knowledge about the conduct of pragmatic research in health systems. This analysis applied the PRECIS-2 pragmatic trial criteria to five NIH Collaboratory pragmatic trials to better understand 1) the pragmatic aspects of the design and implementation of treatments delivered in real world settings and 2) the usability of the PRECIS-2 criteria for assessing pragmatic features across studies and across time.
Using the PRECIS-2 criteria, five pragmatic trials were each rated by eight raters. For each trial, we reviewed the original grant application and a required progress report written at the end of a 1-year planning period that included changes to the protocol or implementation approach. We calculated median scores and interrater reliability for each PRECIS domain and for the overall trial at both time points, as well as the differences in scores between the two time points. We also reviewed the rater comments associated with the scores.
All five trials were rated to be more pragmatic than explanatory, with comments indicating that raters generally perceived them to closely mirror routine clinical care across multiple domains. The PRECIS-2 domains for which the trials were, on average, rated as most pragmatic on the 1 to 5 scale at the conclusion of the planning period included primary analysis (mean = 4.7 (range = 4.5 to 4.9)), recruitment (4.3 (3.6 to 4.8)), eligibility (4.1 (3.4 to 4.8)), setting (4.1 (4.0 to 4.4)), follow-up (4.1 (3.4 to 4.9)), and primary outcome (4.1 (3.5 to 4.9)). On average, the less pragmatic domains were organization (3.3 (2.6 to 4.4)), flexibility of intervention delivery (3.5 (2.1-4.5)), and flexibility of intervention adherence (3.8 (2.8-4.5)). Interrater agreement was modest but statistically significant for four trials (Gwet’s AC1 statistic range 0.23 to 0.40) and the intraclass correlation coefficient ranged from 0.05 to 0.31. Rating challenges included assigning a single score for domains that may relate to both patients and care settings (that is, eligibility or recruitment) and determining to what extent aspects of complex research interventions differ from usual care.
These five trials in diverse healthcare settings were rated as highly pragmatic using the PRECIS-2 criteria. Applying the tool generated insightful discussion about real-world design decisions but also highlighted challenges using the tool. PRECIS-2 raters would benefit from additional guidance about how to rate the interwoven patient and practice-level considerations that arise in pragmatic trials.