Introduction
The efficacy of a new surgical procedure should be tested in an RCT if possible, or at least against a control group. Depending on the research question, the follow-up period may vary from months to decades. In effectiveness studies, data from a high-quality registry offer a unique opportunity to study the outcome of interventions whose efficacy has previously been documented, on a large scale and in clinical practice. To draw conclusions about the outcome of surgery for degenerative spine conditions, most scientific journals and authors consider a follow-up period of at least two years to be necessary. Recent studies indicate, however, that a shorter evaluation time may be sufficient, since the improvement curve for patient-reported outcome measures (PROMs) has been reported to level out, with patients reaching a threshold of change somewhere between 3 and 12 months after surgery [1, 2].
The Swedish Spine Register, Swespine, monitors the quality of lumbar spine surgery by sending follow-up questionnaires at 1, 2, 5 and 10 years post-operatively. It offers a unique possibility to evaluate any differences in outcome between one- and two-year follow-up in a large real-life database [3]. The response rates are approximately 75% at the one-year follow-up (FU1) and 65% at the two-year follow-up (FU2).
Our aim was to study the outcome of lumbar surgery, measured with PROMs, at one and two years after the procedure, with the specific question: are there clinically important differences in outcome between one- and two-year follow-up?
Discussion
This study confirms that potential improvements of clinical importance occur during the first year after lumbar surgery, irrespective of diagnosis and which PROM is being used. The mean differences between FU1 and FU2 in ODI, VAS BACK/LEG and EQ-5D, as well as the proportions reaching MIC, indicated a minor deterioration in outcome between the first and second year of follow-up in all three diagnosis groups. However, a deterioration of this size could be expected in populations with degenerative conditions. Our data confirm the results of Adogwa et al. [2], who concluded that ODI and VAS BACK/LEG obtained at 12 months adequately predict the outcomes at 24 months in patients undergoing lumbar nerve root decompression and fusion. Glassman et al. [13] found no differences between FU1 and FU2 in the ODI and the NRS BACK/LEG in patients with adult spinal deformity. In a recent study, Kim et al. concluded that 1-year outcomes can reliably predict 2-year outcomes for discectomy, but that this was not clear for laminectomy or fusion procedures [14]. The discordant results may be caused by the use of different methods compared to our study. In the report by Kim et al., the conclusion was based on a model where a change in ODI score of just one point could turn a meaningful outcome into a non-meaningful outcome, which was also pointed out by the authors. Small differences between two timepoints in a change score are likely to be seen and are possibly caused by normal fluctuations.
The proportion shifting from ‘success’ on GALEG and GABACK on FU1 to ‘not success’ on FU2 was 8.5% and 8% in the LDH group, 11% and 10% in the LSS group and 10% and 8% in the DDD group. Some of these individuals may represent well-known causes for reoperation such as recurring disc herniation, incomplete decompression, adjacent level stenosis and pseudarthrosis. The movements from ‘success’ to ‘not success’ and vice versa between the two follow-ups may also be manifestations of normal symptom fluctuations, recall bias or response shift and other measurement errors in PROMs.
It is also worth noting that the ‘not success’ category does not consist entirely of patients experiencing a deterioration. This category also includes individuals responding ‘somewhat better’ and ‘unchanged’. Indeed, when investigating the number of patients shifting from ‘pain free’ or ‘much better’ to ‘worse’, it turns out that no more than 2.5% of the whole study population shift from ‘success’ at FU1 to ‘worse’ at FU2 on GALEG, the corresponding percentage for GABACK being 1.8%. Although there was a statistically significant deterioration in outcome between FU1 and FU2, its relevance in clinical practice can certainly be questioned.
These findings suggest that a follow-up at both one and two years post-surgery in effectiveness studies is unnecessary.
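The shift proportions discussed above can be tabulated from paired global assessment responses. The following is a minimal sketch, not the study's actual analysis: the function name, the toy data and the choice of denominator (all patients who answered at both follow-ups) are assumptions made for illustration. Only the response categories and the definition of ‘success’ (‘pain free’ or ‘much better’) are taken from the text.

```python
# Illustrative sketch only: tabulates shifts between 'success' and
# 'not success' on a global assessment item between FU1 and FU2.
# Assumption: the denominator is every patient with a response at both
# follow-ups; the source does not state the denominator for all figures.

SUCCESS = {"pain free", "much better"}  # 'success' as defined in the text


def shift_proportions(pairs):
    """Return shift proportions for a list of (fu1, fu2) responses."""
    n = len(pairs)
    if n == 0:
        raise ValueError("no paired responses")
    success_to_not = sum(1 for a, b in pairs if a in SUCCESS and b not in SUCCESS)
    success_to_worse = sum(1 for a, b in pairs if a in SUCCESS and b == "worse")
    not_to_success = sum(1 for a, b in pairs if a not in SUCCESS and b in SUCCESS)
    return {
        "success_to_not_success": success_to_not / n,
        "success_to_worse": success_to_worse / n,
        "not_success_to_success": not_to_success / n,
    }


# Hypothetical toy data (ten patients), not taken from Swespine.
toy_pairs = (
    [("much better", "much better")] * 6
    + [("pain free", "somewhat better")]
    + [("much better", "worse")]
    + [("unchanged", "much better")]
    + [("worse", "worse")]
)

result = shift_proportions(toy_pairs)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

In this toy example, only one of the two patients leaving ‘success’ actually reports being ‘worse’, which mirrors the point that ‘not success’ is a broader category than deterioration.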
The difficulties in reaching a consensus regarding the definition of a minimal important change relevant to patients (MIC) and the most appropriate method to compute it [15, 16] have led to the search for new strategies to define a clinically relevant outcome. One such alternative is the definition of cut points of treatment success based on absolute scores as opposed to change in scores [17]. Such a threshold is the value above (or below, depending on the scale of the PROM) which a patient considers the magnitude of his or her encumbrance to be acceptable following the surgery. A final score may be less affected by response shift and recall bias than a change score, although that possible advantage is yet to be proven. Thresholds of success based on absolute post-operative scores at FU1 and FU2, respectively, were estimated as previously described by Tubach et al. [18] and recently by van Hooff et al. in a lumbar pain population [17]. No relevant differences between the FU1 and FU2 thresholds could be found.
The number of patients reaching the thresholds of success was rather low in the LSS group, approximately 47–60% depending on the PROM. The cause might be the stringent definition of a successful outcome, which classifies those who responded ‘somewhat better’ and ‘unchanged’ as unsuccessful. Other contributing factors may be that (i) the LSS population is relatively older, with a higher prevalence of comorbidity and probably of other painful degenerative conditions such as hip arthritis, which may confound the outcome of the surgery if measured by PROMs, (ii) the indication for surgery is too wide and (iii) the degeneration of the spine is often a multisegmental process that leaves room for future pain and disability. In this study, we chose not to make a distinction between LSS patients undergoing decompression surgery only and those treated with decompression and fusion surgery. The decision was based on the studies by Försth et al., who concluded that the two groups had similar clinical outcomes at 2 years [19, 20].
There appears to be no need for a 2-year follow-up of PROMs in this population, and resources could be better spent on increasing the response rates at one year for registries, thereby improving the quality of the data and reducing the risk of assessment bias. Efforts could also be made to define certain groups of patients or procedures where a longer follow-up period, such as 2 years, or a shorter one may be needed. The value of PROM assessment based on Swespine data collected at 5 and 10 years post-surgery is unclear and is yet to be studied. The more time that has passed since the operation, the higher the risk of other health-related events complicating the interpretation of the PROMs. More ‘objective’ endpoints, such as reoperation rates, cause of reoperation, time from index surgery to a new surgical event or implant survival, might be of greater importance than PROMs in very long-term assessments.
Limitations
Although the strength of this study is the large real-life database, a selection bias may be present because of the proportion of non-respondents. In this study, the response rates were 75% at FU1 and 65% at FU2, and adjustments for risk factors associated with responding were not made. Solberg and colleagues concluded, however, that there were no differences in outcome between respondents and non-respondents in a population with degenerative lumbar disorders retrieved from the Norwegian spine registry and that the non-respondents could be treated as missing at random [21]. The same conclusion was later drawn in a similar study based on data from the Danish spine registry [22]. The high similarity between these countries should make the results applicable to the current study. The EQ-5D index is a measure designed for cost-effectiveness analyses and not for a similarity study such as this one. Therefore, the results involving EQ-5D should be interpreted with caution. As in all studies where means are calculated and compared, whether longitudinally or cross-sectionally, there is a risk that patients worsening and patients improving neutralize each other to some extent. As illustrated in Fig. 3, this risk appears to be low. This study investigated whether there are any clinically important differences in PROMs between 1 and 2 years of follow-up. However, possible differences in outcome between 1 and 5 or 10 years are yet to be shown.
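The neutralization risk mentioned above can be made concrete with a small numerical sketch. The scores, the function name and the threshold of 10 points are hypothetical and are not taken from the study; the example only assumes a PROM where a higher score means a worse state, as with the ODI.

```python
# Illustrative sketch only: a mean change of zero can hide individual
# patients who worsen or improve by a clinically relevant amount.
# Assumptions: higher score = worse state; the threshold `mic` is a
# hypothetical value, not the MIC used in the study.
from statistics import mean


def summarize_change(fu1, fu2, mic):
    """Compare mean change with counts of individual shifts >= mic."""
    if len(fu1) != len(fu2):
        raise ValueError("fu1 and fu2 must have the same length")
    diffs = [b - a for a, b in zip(fu1, fu2)]
    return {
        "mean_change": mean(diffs),
        "n_worse": sum(1 for d in diffs if d >= mic),
        "n_better": sum(1 for d in diffs if d <= -mic),
    }


# Hypothetical scores for four patients at FU1 and FU2.
fu1_scores = [20, 20, 20, 20]
fu2_scores = [35, 5, 20, 20]

summary = summarize_change(fu1_scores, fu2_scores, mic=10)
print(summary)
```

Here the mean change is zero even though one patient worsened and one improved by 15 points each, which is why individual-level shifts are worth inspecting alongside group means.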