Background
The peer-review process is a cornerstone of biomedical research [1]. It is considered the best method for helping scientific editors decide on the acceptability of a manuscript for publication and improving the quality of published reports [2]. Nevertheless, the effectiveness of this system has been questioned [3–8]. For example, peer reviewers do not consistently perform some essential tasks when evaluating the report of a randomized controlled trial (RCT), such as checking adherence to the CONSORT reporting guideline or checking trial registries to identify outcomes that are switched from the registered protocol [5].
Perhaps peer reviewers are expected to perform too many tasks [9], and simple but neglected tasks such as checking the reporting could be transferred to early career peer reviewers (ECRs) (i.e., junior researchers with no or little experience in the peer review of RCTs) [10]. The CONSORT guidelines and the associated COBPeer tool were developed with the intent that, after some basic training, ECRs can screen a manuscript for key items, thereby letting the already overburdened senior reviewers focus on the areas where their subject and technical expertise will be of most value.
The objective of this study was to compare the accuracy of ECRs using the COBPeer tool with that of the usual journal peer-review process in identifying inadequate reporting (i.e., incomplete reporting and a switch in primary outcome) in reports of two-arm parallel-group RCTs.
Methods
Study design
We performed a cross-sectional diagnostic study to identify inadequate reporting of RCTs. The study was developed, and the results are reported, according to the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines [11]. The checklist is available in Additional file 1. The protocol was published previously [12]. There were no major changes to the protocol (Additional file 2).
Inadequate reporting was defined as incomplete reporting or a switch in primary outcome. It covered nine domains: the eight most important CONSORT domains (each rated as incompletely reported, yes/no) and a switch in primary outcome(s) (rated as switch/no switch) [5]. The eight most important CONSORT domains (which include 10 items) concern:
Outcomes (item 6a),
Randomization/sequence generation (item 8a),
Allocation concealment mechanism (item 9),
Blinding (items 11a, 11b),
Participant flow (items 13a, 13b),
Outcomes and estimation (item 17a),
Harms (item 19), and
Trial registration (item 23).
We defined a switch in primary outcome as the addition, deletion, or change of a primary outcome between the primary outcome(s) published in the protocol or register and the primary outcome(s) reported in the report. Moreover, if the reference standard identified a discrepancy in the definition of the primary outcome(s) (i.e., variable of interest, time frame, or metric) between the primary outcome(s) registered and those reported in the report, we also considered it a switch in primary outcome.
These domains were considered most important because they are frequently incompletely reported and are necessary for conducting a systematic review to evaluate the risk of bias and record the outcome data [5].
To assess accuracy in identifying inadequate reporting, we used two tests: (1) ECRs assessing a manuscript by using the COBPeer tool (after completing an online training module) and (2) the usual journal peer-review process (i.e., any peer reviewer assessing a manuscript as per the first round of the journal’s peer-review process). The reference standard was the assessment of the manuscript by two experienced systematic reviewers, who reached consensus in case of discrepancies. Thus, we directly compared the accuracy in identifying inadequate reporting of the ECRs using the COBPeer tool with that of the actual peer reviewers involved in the first round of the process.
The assessment of the reports by the usual journal peer-review process took place before this study was conducted. However, the extraction of data from the peer-review reports produced during that process, the assessment of the manuscripts by ECRs, and the reference standard assessment were all performed prospectively.
Manuscript identification
We identified a sample of 120 manuscripts reporting the results of two-arm parallel-group RCTs for which we could access the first manuscript submitted and all peer-review reports. We searched PubMed using the “article types” filter “randomized controlled trial” to identify all articles reporting results of RCTs published between January 1, 2015, and December 13, 2016 (search date: December 14, 2016) in:
BMC series medical journals that publish at least five RCT reports per year,
BMJ,
BMJ Open, and
Annals of Emergency Medicine.
These journals were chosen either because the peer-review reports are available online (i.e., BMC series medical journals, BMJ, and BMJ Open) or because editors gave us access to peer-review reports (i.e., Annals of Emergency Medicine).
The search strategy is given in Additional file 3. One researcher (AC) screened all titles and abstracts and included all reports of two-arm parallel-group RCTs assessing any intervention in human participants. We excluded cluster RCTs, cross-over trials, equivalence and non-inferiority trials, feasibility studies, cost-effectiveness studies, phase I trials, study protocols, non-RCTs, secondary publications or analyses, pilot studies, systematic reviews, methodology studies, and early-phase studies. All journals included endorsed the CONSORT statement.
For each article identified, we retrieved the manuscript submitted to the first round of the peer-review process as well as all the peer-review reports submitted by peer reviewers during this first round. If the manuscript or any peer-review reports were not available, the RCT was excluded. Overall, of the 1600 citations retrieved, 222 were eligible, and 17 were excluded because the manuscript submitted or peer-review reports were not available. Of the remaining reports, we randomly selected 120 to be evaluated. These reports were published in 24 different journals (i.e., BMJ, BMJ Open, Annals of Emergency Medicine, and 22 journals of the BMC series). The reference list is in Additional file 4.
Each manuscript had to be evaluated by a single ECR using COBPeer. A single ECR was considered sufficient because the ECRs had a single task to perform and they were supported by COBPeer. Furthermore, this design should facilitate future implementation in practice.
Test methods
ECRs using COBPeer after completing an online training module
Usual peer-review process
The peer reviewers involved in the usual peer-review process were those invited by editors who agreed to review the manuscript and submitted their peer-review report during the usual process. They were invited and accepted before the study was planned; consequently, they were not aware of the aim of this study and were blinded to the assessments by the reference standard and ECRs. For each manuscript identified, we retrieved and merged all the peer-review reports available after the first round of peer review.
Two senior clinical epidemiologists independently extracted data from the peer-review reports by using a standardized data extraction form available in Additional file 8. These researchers were not involved in other data extraction. They were blinded to the reference standard and ECR assessments. Disagreements were discussed to achieve consensus.
The researchers recorded whether the peer reviewers raised some concerns about the completeness of reporting of the eight CONSORT domains considered and whether they identified a switch in primary outcome between the manuscript and the register. The assessments in all peer-review reports for each manuscript were combined, and the domain was rated as “incompletely reported” or “presence of switch in primary outcome” if at least one peer reviewer rated it as such. In case of disagreement between peer reviewers (one raising concerns about the completeness of reporting and one highlighting the adequacy of reporting), the domain was rated as “incompletely reported” or “presence of switch in primary outcome.” Any discrepancies were discussed to achieve consensus.
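The combination rule described above (a domain is flagged when at least one peer reviewer flags it) can be sketched as follows. This is only an illustrative sketch: the domain keys and the function name `combine_reviewer_reports` are hypothetical and are not taken from the study's data extraction form.

```python
from typing import Dict, List

# Hypothetical short names for the nine domains (eight CONSORT domains + switch).
DOMAINS = [
    "outcomes", "sequence_generation", "allocation_concealment", "blinding",
    "participant_flow", "outcomes_and_estimation", "harms",
    "trial_registration", "switch_in_primary_outcome",
]

def combine_reviewer_reports(reports: List[Dict[str, bool]]) -> Dict[str, bool]:
    """Flag a domain if at least one reviewer flagged it.

    Each report maps a domain name to True when the reviewer raised a concern
    (or identified a switch). A missing key is treated as "no concern raised",
    which is an assumption of this sketch.
    """
    return {d: any(r.get(d, False) for r in reports) for d in DOMAINS}

# Two first-round reports for one manuscript; they disagree on "harms".
reviewer_a = {"harms": True}
reviewer_b = {"harms": False, "blinding": True}
combined = combine_reviewer_reports([reviewer_a, reviewer_b])
```

With this rule, a disagreement between reviewers always resolves to the flagged rating, as stated in the text.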
Reference standard
The reference standard was the assessment of the manuscript by four pairs of systematic reviewers with expertise in the conduct of systematic reviews. They independently evaluated the completeness of reporting of the eight CONSORT domains by using the CONSORT checklist (i.e., domains with CONSORT items provided without bullet points). They rated each domain as “completely reported,” “partially reported,” or “not reported.” A domain was considered incompletely reported if it was rated “partially reported” or “not reported.” The systematic reviewers were also asked to systematically compare the primary outcome reported in the manuscript and the primary outcome reported in the registry and indicate “no switch in primary outcome,” “presence of a switch in primary outcome,” “not available” (i.e., not registered), or “unable to assess” (i.e., insufficiently described in register). We defined a switch in primary outcome as the addition, deletion, or change of a primary outcome between the primary outcome(s) published in the protocol or register and the primary outcome(s) reported in the report. Moreover, if the reference standard identified a discrepancy in the definition of the primary outcome(s) (i.e., variable of interest, time frame, or metric) between the primary outcome(s) registered and those reported in the report, we also considered it a switch in primary outcome.
This assessment corresponded to the assessment of the risk of bias (random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, selective reporting of outcomes) and the extraction of efficacy and harms outcome data in a systematic review.
The systematic reviewers were asked to rate domains as incompletely reported only if the reporting was a real barrier to the conduct of a systematic review. This approach allows for a reference standard evaluation from systematic reviewers’ perspectives and avoids focusing on the reporting of useless details (Table 1).
Table 1 Assessment of domains by usual peer reviewer, reference standard, and early career peer reviewer
Usual peer reviewer | ➢ For CONSORT domains: For each manuscript included, determine whether the peer reviewers and/or editors raised some concern on the completeness of reporting of the following CONSORT items. The assessment of all peer-review reports and editors’ comments for each manuscript need to be combined. - Yes, some concern was raised - No, some concern was not raised ➢ For switched outcomes: For each manuscript included, check whether peer reviewers and/or editors identified inconsistency between data registered and reported for the primary outcome(s). - Yes, inconsistency was detected - No, inconsistency was not detected - Not available, because the study was not registered or the protocol was not available Comments: For blinding domains researchers could quote “unblinded study.” Moreover, if the domain was considered partially reported or not reported, it was quoted as not reported. |
Reference standard | ➢ For CONSORT domains: Now you will have to evaluate in each RCT if authors correctly reported all key elements of selected CONSORT items. Please evaluate whether authors correctly reported all key elements of the domain considered. Rate items as inadequately reported only if the reporting is a real barrier to the conduct of a systematic review. - Completely reported - Partially reported - Not reported ➢ For switched outcomes: Did authors register their protocol after the beginning of the study? - Yes - No - Not available Inconsistency between data registered and reported for the primary outcome(s) (i.e., at least one primary outcome added, deleted, or changed)? - Yes - No - Not available, because the study was not registered or the protocol is not available - Unable to assess (i.e., outcomes insufficiently described in the register) Comments: For blinding domains researchers could quote “not available because blinding was impossible.” Moreover, domains partially reported or not reported were quoted as not reported. |
Early career peer reviewer | |
The systematic reviewers were blinded to the peer-reviewer assessments, ECR assessments, and the content of COBPeer. They were instructed to base their assessment only on the content of the manuscript. Any differences between systematic reviewers were resolved by discussion, with the involvement of an arbitrator if necessary.
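The reference standard rating rules described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: the `Outcome` structure, the function names, and the example outcomes are hypothetical. The “unable to assess” category relies on reviewer judgment and is not modeled here.

```python
from typing import NamedTuple, Optional, Set

class Outcome(NamedTuple):
    """Hypothetical structured form of a primary outcome definition."""
    variable: str
    time_frame: str
    metric: str

def is_incompletely_reported(rating: str) -> bool:
    """Dichotomize the three-level rating used by the systematic reviewers."""
    allowed = {"completely reported", "partially reported", "not reported"}
    if rating not in allowed:
        raise ValueError(f"unknown rating: {rating!r}")
    return rating != "completely reported"

def switch_status(registered: Optional[Set[Outcome]],
                  reported: Set[Outcome]) -> str:
    """Compare registered and reported primary outcomes.

    Any outcome added, deleted, or changed in its definition (variable,
    time frame, or metric) makes the two sets differ and counts as a switch.
    `registered=None` stands for a trial that was not registered.
    """
    if registered is None:
        return "not available"
    if registered == reported:
        return "no switch in primary outcome"
    return "presence of a switch in primary outcome"

# Illustrative data: same variable and metric, but a different time frame.
registered = {Outcome("pain score", "12 weeks", "mean change")}
reported = {Outcome("pain score", "6 weeks", "mean change")}
status = switch_status(registered, reported)
```

Modeling each outcome as a (variable, time frame, metric) triple means a change in any one component is treated the same way as an added or deleted outcome, matching the definition given in the text.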
Outcomes
The primary outcome was the mean number of domains accurately classified per manuscript initially submitted to the journal, on a scale from 0 to 9. Each of the eight CONSORT domains was rated “incompletely reported” (yes/no); when blinding was not possible, the related domain (items 11a and 11b) was rated “no incomplete reporting.” The domain “switch in primary outcome” was rated “yes/no” or “unable to assess or unavailable.” Table 1 describes the modalities of assessment and cutoff for each test and the reference standard.
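As an illustration of the primary outcome, the per-manuscript score can be computed by counting the domains on which a test agrees with the reference standard. The sketch below assumes boolean ratings (True = incompletely reported or switch present) and hypothetical domain names. How a switch domain rated “unable to assess or unavailable” was scored is not specified here and is not modeled.

```python
from statistics import mean
from typing import Dict

# Hypothetical short names for the nine domains (eight CONSORT domains + switch).
DOMAINS = [
    "outcomes", "sequence_generation", "allocation_concealment", "blinding",
    "participant_flow", "outcomes_and_estimation", "harms",
    "trial_registration", "switch_in_primary_outcome",
]

def domains_accurately_classified(test: Dict[str, bool],
                                  reference: Dict[str, bool]) -> int:
    """Number of domains (0 to 9) on which the test matches the reference."""
    return sum(test[d] == reference[d] for d in DOMAINS)

# Illustrative manuscript: the reference standard flags two domains.
reference = {d: False for d in DOMAINS}
reference["harms"] = True
reference["blinding"] = True

# Illustrative ECR assessment: misses "blinding", wrongly flags "outcomes".
ecr = dict(reference)
ecr["blinding"] = False
ecr["outcomes"] = True

score = domains_accurately_classified(ecr, reference)  # agrees on 7 of 9 domains

# The primary outcome is the mean of such scores across manuscripts;
# here a second, perfectly classified manuscript (score 9) is assumed.
mean_score = mean([score, 9])
```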
Secondary outcomes were the mean number of CONSORT domains accurately classified per manuscript and the sensitivity, specificity, and likelihood ratio to detect the domains as incompletely reported and to identify a switch in primary outcome.
We also performed a sensitivity analysis to check that our results were not related to the reference standard. For this purpose, for all false-positive domains (i.e., the domain was considered not adequately reported by ECRs using COBPeer or by the usual peer-review process, but the reference standard considered the domain adequately reported), we asked the two systematic reviewers to check their assessment. They could change their assessment if necessary. They were blinded to whether the false positive was identified by ECRs or the usual peer-review process.
Sample size calculation
The study was powered to detect an effect size of 0.3 for the mean number of domains accurately classified per manuscript, with a power of 90% and a two-sided alpha level of 5%. To that end, 120 reports needed to be evaluated [18]. Each included ECR assessed one manuscript, so we randomly selected 120 reports of two-arm parallel-group RCTs to be assessed in the study. We analyzed only the first evaluation of each report.
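The order of magnitude of this sample size can be checked with a standard normal-approximation formula for a paired comparison, n = ((z_{1-alpha/2} + z_{power}) / d)^2. This is only a rough sketch: it assumes that the effect size of 0.3 is a standardized mean difference of the paired differences, and it is not necessarily the method of the cited reference [18].

```python
from math import ceil
from statistics import NormalDist

def paired_sample_size(effect_size: float,
                       power: float = 0.90,
                       alpha: float = 0.05) -> int:
    """Normal-approximation sample size for a two-sided paired comparison.

    `effect_size` is assumed to be the mean paired difference divided by the
    standard deviation of the paired differences.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 1.28 for power = 0.90
    return ceil(((z_alpha + z_power) / effect_size) ** 2)

n_approx = paired_sample_size(0.3)  # 117 under the normal approximation
```

The approximation gives 117 pairs, slightly below the 120 reports used in the study. A t-distribution correction or rounding up would account for a difference of this size, although the source does not state which adjustment, if any, was applied.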
Statistical analysis
Quantitative data are described with means (SD) and/or medians [Q1–Q3] and categorical data with numbers (%). We compared the mean number of domains accurately classified per manuscript by ECRs versus the usual peer-review process by using a paired t test. The sensitivity and specificity for the ECRs and the usual peer-review process were compared by using an exact McNemar chi-square test. Exact binomial 95% confidence intervals (CIs) were calculated for sensitivity and specificity [19]. Positive and negative likelihood ratios were computed, and 95% CIs were based on formulae provided by Simel et al. [20].
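The diagnostic accuracy measures named above can be illustrated with a short sketch. It is not the authors' analysis code, and the counts are invented for illustration. The likelihood ratio interval uses the log-transformation method commonly attributed to Simel et al., and the exact McNemar test is implemented as a two-sided binomial test on the discordant pairs.

```python
from math import comb, exp, sqrt
from typing import Tuple

def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Sensitivity and specificity from a 2x2 table against the reference."""
    return tp / (tp + fn), tn / (tn + fp)

def positive_lr_with_ci(tp: int, fp: int, fn: int, tn: int,
                        z: float = 1.959964) -> Tuple[float, float, float]:
    """Positive likelihood ratio with a 95% CI computed on the log scale."""
    sens, spec = sensitivity_specificity(tp, fp, fn, tn)
    lr_pos = sens / (1 - spec)
    # Standard error of ln(LR+); requires non-zero tp and fp.
    se_log = sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr_pos, lr_pos * exp(-z * se_log), lr_pos * exp(z * se_log)

def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: pairs correct with test 1 only; c: pairs correct with test 2 only.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    lower_tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)

# Invented counts, for illustration only.
lr_pos, lr_low, lr_high = positive_lr_with_ci(tp=80, fp=30, fn=20, tn=70)
p_value = exact_mcnemar_p(b=1, c=9)
```

For the comparison of sensitivities, the discordant pairs would be taken among domains that the reference standard rated as inadequately reported; for specificities, among those rated as adequately reported.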
SAS 9.4 (SAS Institute Inc., Cary, NC) was used for descriptive statistics and tests, and the epiR package in R v3.5.1 was used to estimate diagnostic performance parameters [21].
Changes to the protocol
There were no changes in the protocol, but the wording used was slightly modified and clarified.
First, we clarified that we focused on the eight most important CONSORT domains (which include 10 CONSORT items). Second, we classified each item as “incompletely reported” (yes/no) instead of “completely reported” (yes/no) as stated in the protocol and registry. Third, we clarified the assessment of domains by usual peer reviewers, reference standard, and early career peer reviewers in Table 1, which was not completely detailed in the protocol and registry.
Discussion
In this study, we explored a new approach for the peer-review process. We proposed to transfer some tasks that are essential but often neglected by peer reviewers to early career researchers who had not previously been involved in peer review. For this purpose, we focused on a small number of relatively simple tasks (evaluating adherence to eight CONSORT domains and identifying a switch in primary outcome). To standardize this task and the feedback provided to authors, we developed the COBPeer tool, a specific online tool and a training module. ECRs using COBPeer were more accurate than the journal’s usual peer-review process in detecting inadequate reporting. They showed high sensitivity but lower specificity in detecting incomplete reporting and a switch in primary outcome.
The interpretation of these results should consider differences between the two processes. First, ECRs were prompted and supported by COBPeer to check the completeness of reporting of the eight CONSORT domains and a switch in outcome; it was the only task they had to do, and they were specifically trained to do it. In contrast, usual peer reviewers were not specifically prompted to check the adequacy of reporting, they did not have access to COBPeer, and they were requested to perform several tasks other than checking the reporting. Nevertheless, because we selected journals endorsing CONSORT, their peer reviewers may be more likely to be aware of issues of transparency.
Furthermore, ECRs were aware that they were participating in a research project, whereas usual peer reviewers did not know that their assessment would be used in a study on the completeness of reporting. However, because the peer-review process was an open process, peer reviewers could be encouraged to perform a more complete review knowing that it was made publicly available.
Finally, the results were based on the assessment by only one ECR using COBPeer per manuscript, whereas for the usual peer-review process, we considered the assessment of all peer reviewers involved in the first round (i.e., 2.5, on average).
This study has important strengths. We identified a large sample of manuscripts submitted to 24 journals; the peer-review process was conducted as usual according to each journal’s strategy, and the information provided to peer reviewers was not modified because we retrospectively evaluated each manuscript’s peer-review reports. We had a broad recruitment strategy for ECRs, who came from various countries. Our approach to the reference standard was pragmatic: it avoided rating domains as incompletely reported when the information provided was sufficiently detailed to be integrated in a systematic review.
Our study has some limitations. First, we focused only on two-arm parallel-group RCTs and cannot extrapolate our findings to other, more complex study designs. However, this design is the most common and the most frequently reported. Second, almost all of the ECRs were physicians, and this may not reflect the broad spectrum of peer reviewers. Third, we included only articles that were peer reviewed and published. This inclusion may imply that the quality of the methods and reporting of these manuscripts was probably relatively good from the outset and may not reflect the quality of all submitted manuscripts. Fourth, we considered only the first round of the peer-review process and cannot exclude that some peer reviewers identified inadequate reporting at a later stage of the process. However, improving transparency should be a task performed at an early stage in the process because lack of transparency is a major barrier to an accurate evaluation of methodological quality. Fifth, the online tool considered only eight CONSORT domains and a switch in primary outcome. However, we focused on the most important and often poorly reported domains. Lastly, we did not select ECRs according to their results after the training module. Setting a performance threshold on the training module as a condition for acting as an ECR reviewer might improve the results.
The results of this study have important implications. First, we could implement a new two-stage peer-review process. However, we need to explore the feasibility of this process. The involvement of junior reviewers could have an impact on the whole peer-review system. This new system could also increase the duration of the peer-review process and increase the burden on authors. We need to evaluate this new system in an RCT. Finally, we believe that COBPeer could probably be used by stakeholders other than early career researchers. Depending on the in-house support available to editors, COBPeer could probably be used by editorial managers or other staff.