Criteria of quality
We chose internal validity as a measure of quality according to the definition given by Gehlbach [7], namely that a RCT is internally valid when "within the confines of the study, results appear to be accurate and interpretation of the investigators is supported". We selected criteria of internal validity according to the recommendations of Moher et al. [8]. The relevant points are addressed below.
I. Definition of the quality construct
We intended to measure the presence or absence of various criteria of RCT quality as described in the published manuscript. No attempt was made to contact the authors of a manuscript either to clarify the information provided in the manuscript or to gain additional information about a RCT. We acknowledge that relying on the published manuscript in order to assess the quality of a RCT may be biased (1) against well-designed RCTs that were reported in poorly written manuscripts and (2) in favor of poorly-designed RCTs that were reported in well-written manuscripts [9]. Thus, our scoring process ultimately measured the quality of the report of the RCT manuscript, rather than the true methodological quality of the trial as it was conducted. However, attempting to obtain an understanding of the true methodological quality of a RCT retrospectively by contacting the authors of the manuscripts would inevitably yield more information about recent RCTs than about older ones, because the authors of recent RCTs are more accessible (i.e., less likely to have relocated, retired, or died). Moreover, attempting to contact the authors of manuscripts is rarely successful [10] and, when it is successful, accurate information about the design and conduct of the RCT is not always forthcoming [11, 12].
II. Definition of the scope of internal validity and identification of quality criteria
Although random allocation and the use of a concurrent control group are the sine qua non of the RCT, additional criteria have been included so frequently in their design and execution that they are now commonly considered part of a quality RCT. Several sources (themselves located by PubMed MEDLINE and bibliography searches) were used to identify such criteria [2, 9, 13-18]. After forming a composite list of internal validity criteria from these sources, we searched the literature (again by means of PubMed MEDLINE and bibliographies) for instances where the presence or absence of each criterion in a RCT affected the results obtained from the RCT. Thus, we identified criteria that were supported by empirical evidence as measures of RCT quality. We identified six criteria that had predominantly supporting evidence in their favor. Subsequently, allocation concealment was included as a seventh, separate quality criterion. The quality criteria, with brief descriptions, are listed in Table 1.
Table 1
The quality scale. This table lists the criteria of quality that were used to score the RCT manuscripts. Abbreviated definitions for the presence (1 point) or absence (0 points) of each criterion are provided.
1) assessment of the distribution of patient characteristics and prognostic factors between groups
present | distribution of patient characteristics and prognostic factors assessed without asymmetry between groups
absent | not mentioned; distribution of patient characteristics and prognostic factors assessed with asymmetry noted between groups
2) prevention of the movement of patients between groups after allocation, and the use of intention-to-treat analysis
present | use of intention-to-treat analysis; no movement of patients between groups confirmed
absent | not mentioned; patients known to change groups before analysis
3) the blinding of the patients to the treatment they received
present | statements of double-blinding present; use of a placebo; statements that the treatments were indistinguishable; patients not aware of the study due to clinical condition
absent | not mentioned; lack of placebo use in control group; readily-distinguishable treatments; blinding breakdown confirmed
4) the blinding of the health care providers to the treatments received by the patients
present | third-party dispensation of treatments; statements of health care provider blinding present; health care provider identical to the outcome observer, and outcome observer blinded
absent | not mentioned; health care team aware of patient allocation; lack of placebo in control condition; readily-distinguishable treatments; blinding breakdown confirmed
5) the blinding of the outcome observer to the treatment received by the patient
present | statements of double-blinding present; objective outcome; use of standardized tests or questionnaires that do not require an outcome observer; subjective principal outcome but outcome observer blinded to treatment; blinded health care providers performing outcome assessment
absent | not mentioned; subjective outcome without blinding of the outcome observer; blinding breakdown confirmed
6) completeness of follow-up
present | no patients lost to follow-up; acute experimental design does not permit loss of patients; analysis of lost patients provided according to randomization groups, with reason for loss
absent | not mentioned; no analysis of lost patients provided; effect of patient loss to follow-up confirmed
7) allocation concealment
present | use of consecutive opaque envelopes or pre-ordered treatments; third-party assignment of allocation
absent | not mentioned; repeatable pattern of allocation; use of obvious identifiers for allocation (e.g., birth date, record number); assignment of treatment by treating physician
We limited our quality scale to criteria that have been demonstrated empirically to be associated with the quality of RCTs. This necessarily excluded many items associated with RCT design and execution that are widely thought to affect quality or that are included in commonly-used quality scales, but it provided us with a defensible "bare minimum" definition of quality. It should be noted that we did not intend our list of criteria to encompass all aspects of quality; the criteria were intended to serve only as a tool for the comparative analysis of the two sets of RCT manuscripts for the purpose of this study.
III. Scoring system
Each of the seven criteria was scored as present (1 point) or absent (0 points) in the RCT manuscript. Definitions of each criterion are shown in Table 1. If a RCT manuscript did not mention the presence of a criterion, it was considered absent. Conversely, all written statements in the manuscripts were assumed to be accurate both factually and semantically.
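This present/absent scoring rule, with unmentioned criteria defaulting to absent, can be sketched as follows. The criterion names and the `score_manuscript` helper are illustrative inventions, not taken from the study's actual materials:

```python
# Hypothetical short names for the seven quality criteria of Table 1.
CRITERIA = [
    "baseline_comparison",      # 1) distribution of patient characteristics
    "intention_to_treat",       # 2) no movement between groups / ITT analysis
    "patient_blinding",         # 3) patients blinded to treatment
    "provider_blinding",        # 4) health care providers blinded
    "observer_blinding",        # 5) outcome observer blinded
    "followup_completeness",    # 6) completeness of follow-up
    "allocation_concealment",   # 7) allocation concealment
]

def score_manuscript(findings):
    """Score one manuscript: findings maps criterion name -> True if the
    manuscript reports it. Any criterion not mentioned defaults to absent (0)."""
    return {c: int(findings.get(c, False)) for c in CRITERIA}
```

Each manuscript thus yields seven separate 0/1 scores rather than a single summary score, matching the per-criterion analysis described later.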
IV. Criteria scoring verification
The intra-rater reliability of the scoring of the quality criteria was determined by comparing the individual criteria scores given to n = 16 RCT manuscripts by one of the authors of this communication (MKB) on two occasions separated by 3 weeks. The kappa statistic of agreement measured in this manner was 0.94.
Inter-rater reliability was determined by comparing the quality criteria scores given to n = 10 RCT manuscripts by two different examiners. One copy of each manuscript was scored by one of the authors of this communication (MKB) while the other copy was scored by an independent examiner (Dr. Babak Jahromi, Department of Neurosurgery, the University of Toronto) who was provided with a thorough description of the criteria. The kappa statistic for inter-rater reliability was determined to be 0.74.
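The kappa statistics quoted above can be computed from two lists of binary scores as follows. This is a minimal sketch of Cohen's kappa for a 2x2 agreement table, not the study's actual analysis code:

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters assigning binary (0/1) scores.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    n = len(ratings_a)
    # Observed agreement: fraction of items the raters scored identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Each rater's marginal frequency of scoring "present".
    p1_a = sum(ratings_a) / n
    p1_b = sum(ratings_b) / n
    # Chance agreement: both say present, or both say absent.
    p_e = p1_a * p1_b + (1 - p1_a) * (1 - p1_b)
    return (p_o - p_e) / (1 - p_e)
```

Perfect agreement yields kappa = 1, while agreement no better than chance yields kappa near 0; values of 0.94 and 0.74 therefore indicate excellent and substantial agreement, respectively.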
Manuscript selection and the screening process
We chose to evaluate the field of brain injury because two search techniques for sampling the population of these RCT manuscripts were readily available. The first search technique was our own PubMed MEDLINE search. The second was performed by the Injuries Group of the Cochrane Collaboration (CIG), and forms the CIG trials registry. Copies of the RCT manuscripts identified by these two search techniques were retrieved through the library holdings and interlibrary loan services of five universities.
Next, the manuscripts were read by one of us (CY) to screen out inappropriately identified manuscripts. Table 2 provides a detailed list of these exclusions. Inherent in the phrase 'randomized controlled trial' is (1) the random allocation of patients into multiple groups for prospective analysis, and (2) the concurrent comparison of at least one group that receives the experimental treatment against another group that does not; manuscripts that did not include random allocation and a concurrent control group were excluded. Furthermore, in order for a manuscript to be considered pertinent to the study of brain injury, one of the following conditions had to be met: (1) brain injury directly defined the patient population; (2) brain injury was the cause of a second condition (e.g., seizures) that defined the patient population; or (3) brain injury was the outcome measure for the patient population. If none of these conditions was met, the manuscript was discarded from further examination. Duplicate publications, protocol descriptions, abstracts, letters-to-the-editor, and incomplete or preliminary reports were also removed during the screening process.
Table 2
Exclusion of manuscripts from the PubMed MEDLINE and CIG trials registry groups of manuscripts. Manuscripts inappropriately identified by the PubMed MEDLINE search and the CIG trials registry were removed from review during a screening process performed by one of the authors of the current communication (CY).
| PubMed MEDLINE | CIG trials registry
INITIALLY IDENTIFIED | 139 | 312
libraries unable to locate | 0 | 15
unrelated to brain injury | 2 | 3
duplicate publications | 2 | 8
inaccurately claimed to be a controlled trial | 22 | 47
inaccurately claimed to use randomization | 11 | 30
abstracts / letters-to-the-editor | 1 | 31
protocol descriptions | 3 | 0
incomplete / preliminary reports | 0 | 4
non-human subjects | 0 | 1
TOTAL NUMBER DISCARDED | 41 | 139
REMAINING | 98 (70% of initially identified) | 173 (55% of initially identified)
The design and yield of the two search techniques were as follows:
1) the PubMed MEDLINE search: The first search technique we used to identify RCT manuscripts pertaining to brain injury employed the PubMed search engine of the MEDLINE database. It was designed to represent a typical literature search performed by a North American researcher who is fluent only in English. The search term "brain injuries" (C10.228.140.199) was used with the limits of (1) randomized controlled trial, (2) human subjects, and (3) publication in English. The PubMed MEDLINE search included manuscripts indexed from January 1966 up to February 2001 (the time at which the search was performed).
The PubMed MEDLINE search identified n = 139 manuscripts. During the screening process, n = 41 manuscripts from the original 139 (30%) were discarded, leaving n = 98 manuscripts (see Table 2 for a detailed list of the exclusions).
2) the CIG trials registry: The Injuries Group of the Cochrane Collaboration was kind enough to share their list of RCT manuscripts with us for the purpose of conducting this study. The list of manuscripts they provided was compiled by means of the following three steps:
step 1) The CIG trials master list was searched using the keywords "head" or "brain" in conjunction with "injur*" or "trauma*". The CIG trials master list is a local database maintained at the London School of Hygiene and Tropical Medicine. It is compiled using a detailed search strategy to identify RCTs from multiple computerized databases (a copy of this search strategy is available from Ms. Fiona Renton of the London School of Hygiene and Tropical Medicine, Fiona.Renton@lshtm.ac.uk), as well as from hand searches of journals performed during the writing of systematic reviews; it is updated quarterly.
step 2) The MEDLINE, EMBASE, and CENTRAL databases were searched using the exploded keyword "head injuries:ME" or "head injuries:TI". EMBASE includes references from 1974 onward and, while it is a separate database, its indexing hierarchy incorporates that used by MEDLINE. Here, MEDLINE was searched with the SilverPlatter search engine rather than the PubMed search engine; manuscripts indexed in MEDLINE as early as 1966 were accessible to SilverPlatter. The CENTRAL database is a general list of clinical trials that is maintained by the collaborative efforts of multiple Cochrane specialty groups.
step 3) Manuscripts identified by hand searches of relevant journals and from references provided by direct contact with experts in the field of brain injury were also included.
The original CIG trials registry was completed in 1998 and was last fully updated in May, 2001; it is that version which was used in our study.
The CIG trials registry included n = 312 manuscripts. During the screening process, n = 139 manuscripts from the original 312 (45%) were discarded, leaving n = 173 RCT manuscripts (see Table 2 for a detailed list of the exclusions).
3) overlap between the PubMed MEDLINE search and the CIG trials registry: Of the total unscreened samples of manuscripts identified through each search technique, n = 80 manuscripts were present in both samples; this corresponded to 58% of the sample of manuscripts identified by the PubMed MEDLINE search and 26% of the sample of manuscripts from the CIG trials registry. After the removal of inappropriate manuscripts during the screening process, only n = 56 manuscripts were identified by both the PubMed MEDLINE search and the CIG trials registry. This corresponded to 57% and 32% of the PubMed MEDLINE search and CIG trials registry samples, respectively.
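The overlap percentages above follow from simple arithmetic on the counts reported in the text; a sketch, using only those reported counts:

```python
def overlap_pct(shared, sample_size):
    """Percentage of a sample that is shared with the other sample, rounded
    to the nearest whole percent."""
    return round(100 * shared / sample_size)

# Before screening: 80 manuscripts were common to both samples.
unscreened = (overlap_pct(80, 139), overlap_pct(80, 312))  # PubMed, CIG
# After screening: 56 manuscripts were common to both samples.
screened = (overlap_pct(56, 98), overlap_pct(56, 173))     # PubMed, CIG
```

Running this reproduces the quoted figures of 58% and 26% before screening, and 57% and 32% after screening.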
The scoring process
Each of the RCT manuscripts was read by both authors of the current communication (CY and MKB) who, for clarity's sake, will be referred to as "examiners". One examiner ("non-judging examiner": CY) performed the screening process described previously, then recorded the year-of-publication of each manuscript that survived the screening process in a computerized spreadsheet (Microsoft Excel) and marked the manuscripts with identification numbers. Then, the non-judging examiner hid the names of the authors of the manuscript, the authors' degrees and departmental affiliations, the journal in which the RCT manuscript was published, and the year-of-publication of the manuscript with black marker. This information was covered wherever it was found in the manuscript so that when the manuscript was scored by the second examiner ("judging examiner": MKB) there would be no potential for bias [8, 19]. The data collected by the judging examiner was entered into a computerized spreadsheet that was different from the one linking the year-of-publication of the manuscripts with their identification numbers. The two spreadsheets were combined only when all the manuscripts had been read.
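This two-spreadsheet blinding scheme amounts to a join on manuscript identification number, deferred until all scoring is complete. A minimal sketch (the dictionaries and field names here are hypothetical, standing in for the study's Excel spreadsheets):

```python
def combine_spreadsheets(year_by_id, scores_by_id):
    """Join the judging examiner's blinded scores with the non-judging
    examiner's year-of-publication record, keyed on manuscript ID.
    In the study this merge happened only after all manuscripts were read,
    so the judging examiner never saw scores and years together."""
    return {
        mid: {"year": year_by_id[mid], **scores}
        for mid, scores in scores_by_id.items()
    }
```

Keeping the year-of-publication in a separate file until the end is what prevents the judging examiner's scores from being influenced by a manuscript's age.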
As mentioned above, allocation concealment was added to the list of quality criteria after the first evaluation of the manuscripts. Accordingly, the judging examiner re-read all the manuscripts specifically to determine whether allocation concealment was reported. The manuscripts remained blinded as described above, and the data was entered into a third spreadsheet that was subsequently analyzed independently of the preexisting data.
Manuscripts in French and Spanish were read without written translation by the judging examiner, whereas written translations were provided to the judging examiner for manuscripts in Japanese (by CY), German and Italian (by Mrs. Margaret K. Borsody), and Chinese (by Language Line, Inc., document translation service).
Statistical analysis
After completion of the scoring process, statistical analyses were conducted by the judging examiner. The data was binary in nature and thus analysis techniques for discrete variables were used [20]. Furthermore, since this study was constructed as a longitudinal analysis of the change in quality scores over time, it was necessary to use some form of regression analysis to examine the data. Considering these requirements, a binary logistic regression analysis was performed for each individual quality criterion. All statistical analyses were done with SPSS (version 11.5, SPSS Inc.). Scores for the individual quality criteria were examined as dependent variables against the independent variable of year-of-publication. Significance was defined as P < 0.05.
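The analysis described above, a binary logistic regression of one criterion's 0/1 score on year-of-publication, can be sketched as follows. The study used SPSS, so this gradient-ascent fit is purely illustrative of what such a regression estimates (a slope on the log-odds scale, with a positive slope meaning the criterion became more common over time):

```python
import math

def fit_logistic(years, scores, lr=0.0005, iters=10000):
    """Fit P(score=1) = 1 / (1 + exp(-(b0 + b1 * x))) by maximizing the
    Bernoulli log-likelihood with gradient ascent, where x is the
    mean-centered year-of-publication. Returns (intercept, slope)."""
    mean_year = sum(years) / len(years)
    x = [y - mean_year for y in years]  # center year for numerical stability
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for xi, yi in zip(x, scores):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p          # gradient w.r.t. the intercept
            g1 += (yi - p) * xi   # gradient w.r.t. the year slope
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1
```

On synthetic data in which a criterion is reported in 2 of 10 trials from 1970 but 8 of 10 trials from 2000, the fitted slope is positive, which is the pattern a significant year effect in the study's per-criterion regressions would reflect.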
Since the samples of manuscripts from the PubMed MEDLINE search and the CIG trials registry are known to be derived from the same parent population of RCTs (i.e., RCTs in the field of brain injury), it is inappropriate to compare them directly against each other with statistical tests. Rather, our goal was to analyze the two samples of RCT manuscripts separately, and to draw conclusions about the parent population from each sample as if no other sample of manuscripts was available for comparison. Then, knowing that the two samples of RCT manuscripts represent the same parent population, we intended to compare the conclusions derived from the separate analyses in order to determine the impact of the search technique on those conclusions.