Paper 1
This single-centre, prospective, controlled trial compared elastic IMN with non-operative treatment of displaced MTCF in adults aged 16–85. The paper lacked a clearly focussed, PICO-adherent research question [23]. The study population was defined using detailed eligibility criteria, and the comparative treatment was well described, both of which demonstrate good study design. The description of the intervention was incomplete: although a standardised surgical method was well described, no detail was provided regarding the operating surgeons, predisposing the study to inter-surgeon variability and proficiency bias [31]. However, the thorough description of the surgical method makes the study reproducible, supporting its generalisability. The study controversially selects two primary outcomes: time-to-union, for which the assessment process was explained in detail; and clavicular shortening, for which detail of the assessment method was lacking. The secondary outcomes to be assessed are stated, but no description of data collection was provided, weakening the study.
The study compared the efficacy of an interventional treatment with a comparison treatment and therefore an RCT is the preferred study design [32]. Despite being a prospective, controlled study using a well-recognised randomisation technique, it is stated not to be an RCT, with little justification. This statement is unclear, especially given the low level of existing evidence, which means a gold-standard RCT would be highly appropriate for this comparative clinical question [32].
The allocation-concealment process was briefly described as a single-block random assignment. This is a recognised, standardised method of true randomisation, which is a positive. However, no further information was provided regarding who performed the randomisation, use of blinding, sequence generation or treatment allocation. There was no mention of computer-assisted randomisation, and no audit trail to ensure reliability of the process. This lack of detail impacts negatively on the study, especially given that the authors later state it was not an RCT, raising suspicion regarding the validity of the randomisation process. A variable-block method is less predictable and would have strengthened the allocation process [33]. A true randomisation process aims to prevent baseline confounding factors between study groups, ensuring they are well balanced and strengthening the study [34]. Despite the process ambiguity, there were no significant differences (P > 0.05) between the group demographics, increasing trial robustness.
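To illustrate why a variable-block scheme is harder to predict than a fixed-block one, the following minimal Python sketch generates a balanced allocation sequence from randomly sized blocks. The block sizes (2, 4, 6), the arm labels, the function name and the seed are illustrative assumptions and are not taken from any of the appraised trials.

```python
import random

def permuted_block_sequence(n_participants, block_sizes=(2, 4, 6),
                            arms=("IMN", "non-operative"), seed=None):
    """Return an allocation list built from randomly sized, balanced blocks.

    Assumption: every block size is a multiple of the number of arms.
    """
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        # The block size changes from block to block, so the end of a
        # block (and hence the next allocation) is harder to guess.
        size = rng.choice(block_sizes)
        # Equal numbers of each arm within every complete block.
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    # The final block may be truncated, which can leave a small imbalance.
    return sequence[:n_participants]

allocation = permuted_block_sequence(60, seed=1)
```

In practice the sequence would be generated and held by an independent party, with the seed and output logged, so that an audit trail exists.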
A major flaw was the lack of blinding. Given the study’s nature, participant and radiographic-assessor blinding were not possible. However, blinding could have been employed for data collection at clinical assessment, shoulder function score (SFS) recording and at data and statistical analyses. This would have reduced the impact of observer or detection biases [35].
Descriptions of the data collection methods were variable; however, a thorough description of the assessment technique for the primary outcome, time-to-union, is provided. This is a difficult end-point to assess, but a clear definition is given, with a standardised, reproducible technique described. The study uses 4-weekly radiographs and, although this is pragmatic and reproducible (enhancing external validity), it only allows time-to-union to be determined to the nearest 4-week interval, introducing imprecision into the primary outcome.
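The imprecision introduced by a fixed radiograph schedule can be sketched as follows. This is a minimal illustration that assumes union is recorded at the first scheduled visit on or after the true union time; the function name and the example weeks are hypothetical and do not come from the trial.

```python
import math

def recorded_union_week(true_union_week, interval_weeks=4):
    """Week at which union would first be observed on a fixed visit schedule."""
    return math.ceil(true_union_week / interval_weeks) * interval_weeks

# Hypothetical patients whose fractures truly unite at weeks 9 and 12
# would both be recorded as uniting at the 12-week visit.
recorded = [recorded_union_week(week) for week in (9, 12)]
```

Under this assumption, true differences in healing time of up to almost 4 weeks can be invisible in the recorded data.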
Other outcomes are measured more reliably, using contralateral comparison on standardised radiographs for shortening and computed tomography (CT) measurements for non-/mal-union. This is commendable as CT is the gold-standard assessment for union discrepancies, and the shortening measurement method is a standardised technique shown to have high agreement with CT measurements and high repeatability [8, 36]. This makes the study reproducible, improving its external validity. Similarly, standardised, well-recognised SFSs are utilised [37, 38]. However, the description of the data collection method is brief, which is a negative point. The SFS results were collected via patient questionnaires at 2 years, leading to non-responder and recall bias as well as placing heavy reliance on self-reporting, which often results in a high loss to follow-up [34, 39, 40]. However, no loss to follow-up was reported, with participants apparently accounted for throughout which, if true, is commendable in reducing the effect of attrition bias [34]. Complete follow-up is difficult to achieve, however, so the failure to mention it explicitly raises suspicion.
Details of patients lost to follow-up, excluded from or declining to participate in the trial are not provided. Inclusion of a CONSORT-type flow diagram [41] defining enrolment, allocation and follow-up numbers would resolve this and significantly strengthen the study. The authors disclose cross-over between treatment groups resulting in contamination bias [42]. However, only per-protocol analysis is conducted with no intention-to-treat analysis, which would have reduced the impact on the randomisation process and avoided selection bias [35]. This significantly weakens the study as intention-to-treat analysis would have provided the most conservative estimate of relative effect size, thus demonstrating the most reliable significant difference if found, despite the cross-over. Comparison of both analyses should have been performed as per-protocol analysis alone may distort the evidence [43].
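The distinction between the two analyses can be made concrete with a minimal Python sketch. The records below are entirely hypothetical and are used only to show the mechanics: intention-to-treat groups participants by the arm they were allocated to, whereas per-protocol drops anyone who crossed over.

```python
from statistics import mean

# Hypothetical records: (allocated arm, arm actually received, outcome score).
records = [
    ("IMN", "IMN", 90), ("IMN", "IMN", 85), ("IMN", "non-operative", 70),
    ("non-operative", "non-operative", 75),
    ("non-operative", "non-operative", 80),
    ("non-operative", "IMN", 88),
]

def mean_difference(rows, arm_a="IMN", arm_b="non-operative"):
    """Mean outcome of arm_a minus arm_b, grouping by ALLOCATED arm."""
    a = [score for allocated, _, score in rows if allocated == arm_a]
    b = [score for allocated, _, score in rows if allocated == arm_b]
    return mean(a) - mean(b)

# Intention-to-treat: everyone is analysed as randomised.
itt_difference = mean_difference(records)

# Per-protocol: only participants who received their allocated treatment.
per_protocol_rows = [r for r in records if r[0] == r[1]]
pp_difference = mean_difference(per_protocol_rows)
```

In this toy data set the per-protocol difference is larger than the intention-to-treat difference, which is the kind of discrepancy that reporting both analyses would expose.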
Another study weakness was the discrepancy in the groups’ follow-up, with the non-operative group unable to begin mobilising until 3 weeks post-injury compared with immediate post-operative mobilisation in the IMN group. Although difficult to assess, this may have introduced performance bias [34], affecting shoulder stiffness or healing rates. However, the outcome assessment methods for the groups were the same.
The lack of a sample-size calculation is a significant weakness, as achieving a statistically calculated sample size increases study strength through increased power and a higher probability that a true difference will be detected [44]. Instead, the overall sample size is small and the study is underpowered, making it more prone to Type II error and hence less likely to find a significant difference [45]. Conducting the trial over multiple centres would have improved this, as well as increasing external validity.
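As an indication of what an a priori sample-size calculation involves, the sketch below applies the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². The inputs (a 10-point difference in shoulder score, a standard deviation of 15, 5% two-sided significance and 80% power) are hypothetical values chosen for illustration, not figures from the appraised papers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants needed per arm to detect a mean difference `delta`.

    Normal-approximation formula for two equal-sized groups with a
    common standard deviation `sd` and a two-sided significance test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)

# Hypothetical inputs: detect a 10-point difference, SD 15.
n_per_group = sample_size_per_group(delta=10, sd=15)
```

A real calculation would also inflate this figure to allow for anticipated loss to follow-up.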
The study concludes that in patients with MTCF, when compared with non-operative management, IMN leads to significantly (P < 0.05) better shoulder function at 2 years follow-up, as well as faster time-to-union, lower non- and delayed union rates and less clavicular shortening. However, it found no significant difference in the total number of complications between groups.
Paper 2
This single-centre RCT compared IMN with non-operative treatment in adult patients aged 17–40 with isolated MTCF. The study aimed to compare the efficacy of an intervention with a comparison treatment, hence an RCT is the preferred study design [32], with the topic remaining relevant.
The paper fails to identify a clear research question at the outset, making trial specifics difficult to ascertain. When assessed against PICO methodology [23], the population is clearly defined using detailed eligibility criteria, demonstrating strong study design. However, justification of the criteria is lacking, e.g. ages 17–40. This is especially relevant given that MTCFs have a bimodal age distribution, occurring in young adults and in those aged 55–75 [46, 47], with the latter group therefore excluded. This introduces sample bias and substantially reduces the study’s generalisability and external validity, as extrapolation to the older subgroup cannot be reliably performed. The authors’ affiliated institution is an Army Medical Centre. The age criterion therefore probably reflects a military subgroup population, a point not discussed but further evidenced by the demographic male majority and patient motivation to return to “duty”. If true, this should have been openly stated as this subgroup does not reflect the general population, further reducing external validity and generalisability.
The intervention and comparison techniques are well described, strengthening the study and enhancing reproducibility. Limited details regarding the operating surgeons are provided, however, which could mask inter-surgeon variability and proficiency bias [31]. The time-to-theatre varied (0–2 weeks), so the amount of bone healing underway at the time of surgery is variable, which could affect results. Importantly, this leads to a degree of cross-over and contamination bias [42], as a third of participants in the operative group underwent up to 2 weeks of non-operative management before surgery. This could be long enough for significant fracture callus formation [48, 49], potentially predisposing to union discrepancies. However, no consideration is given to this merging of the intervention and comparison techniques, which reduces the likelihood of significant differences being found. Thus, results must be interpreted conservatively.
The outcomes are not clearly stated. Only on reaching the discussion section is “the goal” of the study detailed, implying the outcomes are SFS and non-union rate. Standardised, validated SFSs are used, which is a positive due to their reliability, availability and validity [50, 51], as well as ensuring the study’s reproducibility and generalisability. However, because they are patient-reported questionnaires they do carry the aforementioned negatives of non-responder and recall bias.
The description of the randomisation process is inadequate. There is no detail regarding how the initial randomisation was generated or who conducted the randomisation and allocation processes. Hence it remains questionable whether this was a method of true randomisation, and the lack of an independent audit trail leaves it open to potential tampering [33]. Analysing group demographic data for significant differences can assess whether the randomisation process has overcome confounding factors; this paper did not perform such an analysis, another negative point.
The same blinding issues apply here as in paper 1, weakening the study by exposing it to detection bias [35], and the aforementioned improvements to study design are equally relevant.
The methods of data collection are relatively well described, with SFS questionnaires completed at initiation and at regular intervals up to 1 year post-injury, allowing progress monitoring. However, secondary outcome assessment methods for union and shortening were less reliable. Positives were standardised X-rays for each participant, reducing inter-participant variability regarding discrepancies in rotation or magnification on X-rays, and separate examiners performing radiographic measurements and averaging their individual findings for an overall result with increased accuracy. However, standard rulers and goniometers were used, both of which are open to instrument and assessor bias [52]. Also, the definition given for “healing” (the final outcome point for union) was ambiguous, defined as “callus across the fracture site”, with no criteria provided. This lack of precision will lead to inter-assessor variability, contributing to decreased accuracy, as well as making the overall study less reproducible, reducing its generalisability.
The study lacks a CONSORT-type flow diagram [41] and provides little information regarding the numbers of participants involved. It is stated that 57 enrolled, but no details are given concerning the overall number approached, participants changing treatment group from their random allocation, or any being lost to follow-up. If no participants changed group or were lost to follow-up, this would strengthen the study considerably, but it should not be assumed.
A positive point was the use of identical follow-up for both groups. This reduces treatment method confounding factors, and allows assessment of their pure effect more accurately. However, few details regarding post-treatment rehabilitation are provided, decreasing reproducibility and external validity. If rehabilitation involved intense, regular physiotherapy sessions, this may not be generalisable to most healthcare systems where multiple factors make this unfeasible.
There is no power calculation and the sample size is small, the negatives of which have been discussed previously. In the text, limited result information is provided and not easily extrapolated, e.g. SFS showed a significant difference (P < 0.04) at 3 weeks, but analysis to 1-year follow-up was not provided. Thus, the study temporally limits itself, and does not denote whether this difference is maintained long-term: information that is essential when considering the techniques for use in the general population, which is a major weakness.
The study concludes that in young adults with MTCF, when compared with non-operative management, IMN gives superior SFS at 3 weeks, no significant difference in union rates, but a higher overall complication rate.
Paper 3
This single-centre RCT compared elastic IMN with non-operative treatment of displaced MTCF in patients aged 18–65. As previously discussed, an RCT remains the preferred study design [32]. Although the paper confirmed its aim, there was no fully PICO-adherent research question [23]. Thorough eligibility criteria are provided, accurately defining the study population, though justification of the criteria was lacking. Age was again limited, with the aforementioned disadvantages remaining relevant. Treatment methods were described in detail, following standardised techniques that allow reproducibility and enhance external validity. However, there was again a lack of detail regarding the operating surgeons. Outcome measurements were clearly identified and detailed, strengthening the study and enhancing readability and reproducibility. However, none were designated a priori, a limitation identified by the authors, suggesting they deduced the outcomes retrospectively.
The randomisation process was well described, using an accepted standardised balanced 4-block randomisation method. The paper excels where the others failed in providing specifics regarding the allocation process, detailing how the randomisation sequence was generated, allocated and by whom, enhancing study strength. However, the staff generating and allocating the randomisation sequence were the surgeons involved in the study, introducing bias and demonstrating a lack of blinding, a theme continued throughout the paper. This lack of an independent, external party and a defined audit trail reduces the process validity, leaving it exposed to tampering, resulting in a less robust trial design. The fixed-block randomisation method was somewhat predictable, especially as the block-size was known to the surgeons, and a variable-block randomisation would have been superior [33]. The assigned treatment options of patients lost to follow-up were re-used in an attempt to maintain the original randomisation, but via a questionable method. Generating larger numbers of randomisation options initially with the allowance for drop-outs would have been more valid [33]. Despite these limitations, no significant differences were found between the group demographics, a positive point in removing confounding factors and allowing a fairer comparison [34].
The SFS outcomes used were the DASH and Constant scores, which are validated, well-recognised, responsive, readily available, reproducible scores [37, 38]. The DASH questionnaires were assessed weekly for the first 6 months, allowing close observation of participant progression. However, due to expense and practicality, patients were seen monthly thereafter, at which point four questionnaires were collected. This pragmatic approach introduces the risk that patients may complete all forms together retrospectively, an identified compliance limitation. The Constant score is used for the SFS at 6 and 24 months, though no justification is given for its replacement of the DASH score at this stage. This is especially relevant as the DASH scores demonstrated significant differences up to, but not after, 18 weeks, whereas the Constant score showed significant differences at 6 and 24 months. Use of both scores throughout the follow-up would have increased the reliability and validity of the result, but would be less pragmatic, and may lead to increased loss to follow-up along with the aforementioned bias issues associated with questionnaire use [34, 39, 40].
The radiological evaluation methods were well described, strengthening the study. Regular, standardised X-rays were used to reduce inter-patient variability and increase the chance of pinpointing the moment of union. However, no assessor details were provided and definitions of end-points were vague and non-reproducible, reducing external validity. Use of CT was employed if there was no obvious union after 24 weeks, which is a previously mentioned positive. However, surgery was then offered to those with a confirmed non-union. Given that follow-up lasted 2 years, this may have led to contamination bias and cross-over. Ten patients developed non-union, but no details regarding further surgery performed are provided. If significant cross-over did occur, this will bias results and appropriate intention-to-treat and per-protocol analyses for outcomes after that period should be performed and compared as discussed previously.
This paper provides good detail regarding patient numbers, including patients excluded, those declining to participate and those lost to follow-up. However, no sample-size calculation was performed and the study is underpowered, increasing the likelihood of false-negative results as previously discussed [44, 45].
The paper concludes that in adults with MTCF, when compared with non-operative management, IMN demonstrated significantly (P < 0.05) better SFS, less shortening, fewer complications and shorter time-to-union.