Background
Total knee or hip arthroplasty (TKA/THA) is an effective treatment for most individuals who suffer from pain and loss of function due to end-stage symptomatic osteoarthritis (OA) of the hip or knee. In 2010, 109 and 153 patients per 100,000 persons received a TKA or THA, respectively, in Europe [1]. The development and progression of OA are strongly influenced by age and obesity, and both occur more frequently in women [1]. Parallel to the rising prevalence of knee and hip OA, due to an ageing society and obesity, surgery rates are rising as well [2–4].
TKA and THA should not be performed too early, since revision rates are higher in younger patients and the lifespan of a prosthesis is limited [5]. On the other hand, performing surgery earlier yields more quality-adjusted life years (QALYs). However, outcomes after revision surgery are generally worse than after primary surgery [6]. Current practice shows that preoperative disease severity varies widely among centers and countries [7, 8], suggesting differences in timing. In addition, about 10–20% of patients are not satisfied after primary TKA/THA [9–12], possibly because of unmet expectations or suboptimal timing of surgery.
Previous research has identified preoperative determinants that influence outcomes, but these differed between studies and sometimes pointed in opposite directions [13]. This may be because some studies lacked the power to detect an effect, while others did not adjust for confounders. In addition, most registries collect a minimal data set [14], e.g. only a visual analogue scale (VAS) for pain. Pooling the data from available cohort studies may therefore provide more reliable evidence on which determinants influence the outcome after TKA/THA: the sample size is larger than in separate studies, and the set of questionnaires is more comprehensive and measures each outcome more reliably than registry studies do.
Objective
The present study aims to examine the independent effect of several preoperative determinants for outcomes after TKA or THA by pooling individual patient data from available prospective cohorts in the Netherlands.
Methods
Study design and setting
The ARGON-OPTIMA (Outcome Predictors for TIMing of ArthropLasty) study is part of the ARGON program (Arthritis Research Group Orthopaedics in The Netherlands). Within this study, we pooled individual patient data from all available prospective TKA/THA cohorts in the Netherlands. All orthopaedic clinics in The Netherlands were invited to participate and submit data. We included prospective cohorts among patients with primary OA who underwent TKA or THA, with at least one preoperative and one postoperative measurement on functional or clinical outcomes and a follow-up of at least one year. Cohorts regarding metal-on-metal (MoM) prostheses were excluded, since these are not recommended in current guidelines in The Netherlands.
Participants
Twenty hospitals submitted data and 20 cohorts from 11 hospitals were included. Nine hospitals were excluded because they did not meet the inclusion criteria. Of the included cohorts, 8 included 1783 knee OA patients undergoing primary TKA and 12 included 2400 hip OA patients undergoing primary THA. Table 1 shows the characteristics of patients per cohort.
Table 1
Description of included TKA and THA databases
Procedure | Cohort | N | Women, n (%) | Age in years, mean (SD) | BMI in kg/m², mean (SD) | Follow-up measurements |
TKA | 1 | 340 | 228 (67) | 68.9 (9.3) | 29.3 (7.6) | 2 weeks, 3 months, 2–7 years |
TKA | 2 | 382 | 271 (71) | 67.0 (9.7) | 29.5 (4.7) | 1 year |
TKA | 3 | 45 | 20 (44) | 67.8 (6.5) | 29.3 (5.1) | 3, 6, 12 months |
TKA | 4 | 101 | 66 (65) | 68.9 (9.1) | 30.9 (5.1) | 6 weeks, 6, 12 months, 5 years |
TKA | 5 | 496 | 274 (55) | 65.9 (7.9) | 27.6 (3.5) | 6, 12, 24 months |
TKA | 6 | 169 | 120 (71) | 69.8 (9.9) | 29.2 (4.7) | 6 weeks, 3 months, 1 year |
TKA | 7 | 41 | 22 (54) | 62.2 (9.5) | 32.0 (5.4) | 3, 6 months, 4 years |
TKA | 8 | 209 | 127 (61) | 66.4 (10.2) | 29.7 (6.4) | 6 weeks, 3, 6, 12 months |
THA | 1 | 498 | 319 (64) | 65.7 (10.8) | 26.9 (4.0) | 2 weeks, 3 months, 2–7 years |
THA | 2 | 149 | 106 (71) | 60.4 (6.9) | 26.8 (4.2) | 6 weeks, 3, 6, 12, 24 months |
THA | 3 | 398 | 247 (62) | 66.6 (10.2) | 27.2 (4.5) | 1 year |
THA | 4 | 55 | 32 (58) | 67.7 (9.7) | 27.3 (3.6) | 3, 6, 12 months |
THA | 5 | 73 | 46 (63) | 65.2 (6.7) | 28.0 (4.6) | 6 weeks, 3, 6, 12, 24, 60 months |
THA | 6 | 26 | 18 (69) | 62.9 (5.0) | 24.5 (2.9) | 6 weeks, 3, 6, 12 months |
THA | 7 | 354 | 228 (64) | 65.9 (7.9) | 26.4 (3.4) | 3, 12 months |
THA | 8 | 100 | 58 (58) | 68.7 (10.0) | 28.2 (4.0) | 6 weeks, 3, 12 months |
THA | 9 | 287 | 188 (66) | 67.5 (10.6) | 26.6 (4.1) | 6 weeks, 3, 12 months |
THA | 10 | 73 | 46 (63) | 66.7 (12.0) | 26.5 (4.2) | 3, 6, 12 months |
THA | 11 | 33 | 22 (67) | 63.0 (11.9) | 26.6 (4.3) | 3, 6, 48 months |
THA | 12 | 354 | 257 (73) | 69.0 (10.9) | 28.2 (4.5) | 6, 12, 24 months |
Preoperative determinants
The preoperative determinants assessed were age, gender and body mass index (BMI). Furthermore, we examined the influence of preoperative health-related quality of life (HRQoL), functioning and pain.
Postoperative outcomes
We studied the effect of these determinants on the absolute level of each postoperative outcome (HRQoL, functioning and pain) and on the extent of improvement, to assess which patients would benefit most.
Standardization
Since different cohorts used different questionnaires, scores were standardized so that the same domains could be compared across questionnaires. Furthermore, multiple questionnaires were sometimes used to measure the same domain within a cohort. As each patient should be included only once for each domain, we ordered the questionnaires by their ability to measure each outcome reliably. This was done during an ARGON consortium meeting, where a group of experts discussed the ordering of questionnaires until consensus was reached. The following main points were taken into account: whether the questionnaire is general or disease-specific, how many items are used to calculate the composite score, and whether it is commonly used in the Netherlands.
Only the highest rated questionnaire in each dataset was included. The following ordering was used:
- Health related quality of life:
  1. Physical component summary scale of the 36-item short form health survey (SF-36/RAND-36) (36 items)
  2. Physical component summary scale of the 12-item short form health survey (SF-12) (12 items)
  3. EuroQoL 5 Dimensions (EQ-5D) (5 items)
- Functioning:
  1. Hip/knee disability and Osteoarthritis Outcome Score (HOOS/KOOS) subscale Activities of Daily Living (ADL) (17 items)
  2. Western Ontario & McMaster Universities Osteoarthritis Index (WOMAC) subscale Physical Function (PF) (17 items)
  3. HOOS-Short form (PS)/KOOS-Short form (PS) (5 items)
  4. Oxford Hip Score (OHS) subscale function (6 items)/Oxford Knee Score (OKS) subscale function (5 items) according to Harris et al. [15, 16]
- Pain:
  1. HOOS/KOOS subscale Pain (10 items)
  2. WOMAC subscale Pain (5 items)
  3. OHS subscale Pain (6 items)/OKS subscale Pain (7 items) according to Harris et al. [15, 16]
  4. Visual Analogue Scale (VAS) pain scale
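The selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used by the consortium: the dictionary `PRIORITY`, the short instrument labels and the function `select_questionnaire` are hypothetical names introduced here; only the ordering itself is taken from the list above.

```python
# Consensus ordering per domain (most preferred first), taken from the list above.
# The short labels are illustrative identifiers, not official instrument codes.
PRIORITY = {
    "hrqol": ["SF-36 PCS", "SF-12 PCS", "EQ-5D"],
    "functioning": ["HOOS/KOOS ADL", "WOMAC PF", "HOOS-PS/KOOS-PS", "OHS/OKS function"],
    "pain": ["HOOS/KOOS Pain", "WOMAC Pain", "OHS/OKS Pain", "VAS pain"],
}


def select_questionnaire(domain, available):
    """Return the highest-ranked questionnaire a cohort collected for a domain.

    Returns None when the cohort collected none of the ranked instruments.
    """
    for name in PRIORITY[domain]:
        if name in available:
            return name
    return None


# Hypothetical cohort that collected three instruments.
cohort_instruments = {"WOMAC PF", "OHS/OKS function", "VAS pain"}
chosen = {d: select_questionnaire(d, cohort_instruments) for d in PRIORITY}
print(chosen)
```

In this hypothetical cohort, WOMAC PF is used for functioning because it ranks above the Oxford function subscale, and the cohort contributes no HRQoL data.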
For each patient we calculated the standardized score at each time point using the following formula (functioning as example):
$$ \text{Standardized functioning score for patient } X \text{ (at time point } t\text{)} = \frac{\text{functioning score for patient } X \text{ in cohort } Y \text{ (at time point } t\text{)} - \text{preoperative mean of functioning among patients in cohort } Y}{\text{preoperative SD of functioning}} $$
Some questionnaires differed in the direction of the scale: on the VAS pain scale, for example, lower scores mean less pain, whereas on the HOOS/KOOS pain subscale lower scores mean more pain. All scales were recoded so that higher scores indicate better status.
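The recoding and standardization steps can be sketched as follows. This is an illustration only: the function names, the 0–100 VAS range and the example scores are assumptions made here, and the use of the sample SD (`statistics.stdev`) is also an assumption, since the text does not state which SD estimator was used.

```python
from statistics import mean, stdev


def recode_direction(score, scale_min, scale_max, higher_is_better):
    """Flip a score within its range so that higher always means better."""
    if higher_is_better:
        return score
    return scale_max + scale_min - score


def standardize(score, preop_scores):
    """Express a score in units of the cohort's preoperative SD,
    centred on the cohort's preoperative mean (sample SD assumed)."""
    return (score - mean(preop_scores)) / stdev(preop_scores)


# Hypothetical cohort measured with a 0-100 VAS pain scale (lower = less pain).
vas_preop = [70, 60, 80, 50, 90]
preop = [recode_direction(s, 0, 100, higher_is_better=False) for s in vas_preop]
# preop is now [30, 40, 20, 50, 10]: higher means less pain

# One patient's follow-up VAS score of 20, recoded (to 80) and then standardized.
postop = recode_direction(20, 0, 100, higher_is_better=False)
z_postop = standardize(postop, preop)
print(round(z_postop, 2))
```

Because every time point is scaled by the preoperative mean and SD of the same cohort, a standardized postoperative score expresses how far a patient ends up from the cohort's typical preoperative level.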
Statistical analysis
Data of TKA and THA were analyzed separately. As a first step, linear mixed models (LMM) were used to estimate the influence of each preoperative variable on each major outcome for each cohort separately, adjusted for the other variables. The fixed part of the LMM included the following determinants: the standardized preoperative score (HRQoL, functioning and pain), age, sex, BMI and follow-up time. Interaction terms were fitted between the determinants and follow-up time. In the LMM, patients were specified as the subjects, with an unstructured covariance matrix. This was done for each standardized postoperative outcome. In the second step, the regression coefficients from all cohorts were pooled using a random effects model to obtain one pooled estimate for each preoperative variable and outcome. Given the pooled estimates of the impact of preoperative status on postoperative status, we can also determine the total improvement (postoperative minus preoperative status). If all patients improved by the same amount, a 1-point higher preoperative status would result in a 1-point higher postoperative status. So if the increase in postoperative status is < 1 (e.g. 0.4), the improvement is 0.6 points smaller for every point increase in preoperative status.
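The second step can be illustrated with a short sketch. The text only states that a random effects model was used; the DerSimonian-Laird estimator below is an assumption chosen because it is a common default, and the coefficients and standard errors are made-up numbers, not study results.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool per-cohort coefficients with the DerSimonian-Laird estimator.

    Requires at least two cohorts. Returns (pooled, pooled_se, tau2).
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-cohort variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2


# Hypothetical per-cohort coefficients of preoperative on postoperative functioning.
betas = [0.35, 0.45, 0.40, 0.50]
ses = [0.05, 0.08, 0.06, 0.10]
pooled, pooled_se, tau2 = pool_random_effects(betas, ses)

# Slope of improvement (postoperative minus preoperative) on preoperative status:
# a pooled coefficient below 1 gives a negative slope, i.e. less improvement
# for patients who start higher.
improvement_slope = pooled - 1.0
print(round(pooled, 3), round(pooled_se, 3), round(improvement_slope, 3))
```

The last line mirrors the reasoning in the text: a pooled coefficient of 0.4 corresponds to an improvement slope of -0.6.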
Given that scores were standardized using the preoperative mean and SD, the pooled regression coefficient should be interpreted as the number of standard deviations that an outcome changes per point increase in the preoperative variable. For example, for the effect of age on postoperative functioning, a standardized regression coefficient of 0.2 and a preoperative SD of functioning of 7 mean that a one-year increase in age is estimated to increase postoperative functioning by 0.2 × 7 = 1.4 points. To facilitate interpretation of the pooled standardized regression coefficients of age, BMI and gender, we transformed the standardized regression coefficients back to a 0–100 scale (e.g. HOOS, SF-36), using the preoperative standard deviation (SD) of the study with the highest weight in the random effects model. In addition, we illustrate the potential size of the effects by describing scenarios.
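The back-transformation amounts to one multiplication, sketched below. The cohort names, weights and SDs are hypothetical; only the coefficient of 0.2 and the SD of 7 come from the worked example in the text.

```python
def to_original_scale(standardized_coef, cohorts):
    """Rescale a pooled standardized coefficient to questionnaire points,
    using the preoperative SD of the cohort with the largest weight."""
    reference = max(cohorts, key=lambda c: c["weight"])
    return standardized_coef * reference["preop_sd"]


# Hypothetical cohorts; cohort "B" carries the highest random-effects weight.
cohorts = [
    {"name": "A", "weight": 0.20, "preop_sd": 9.0},
    {"name": "B", "weight": 0.45, "preop_sd": 7.0},
    {"name": "C", "weight": 0.35, "preop_sd": 8.0},
]

# Coefficient 0.2 per year of age with a reference SD of 7, as in the text.
per_year = to_original_scale(0.2, cohorts)
print(round(per_year, 1))  # 1.4 points per year of age
```

Using a single reference SD keeps the back-transformed effects on one common scale, at the cost of depending on which cohort happens to carry the most weight.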
SPSS 20 was used to fit the LMM and Stata 11.1 to fit the random effects model. A p-value below 0.05 was considered significant in all analyses.
Assessment of heterogeneity
The I² statistic was used to quantify heterogeneity between cohorts. It can be interpreted as the percentage of total variability in a set of effect sizes that is due to between-study variability. We considered results heterogeneous when I² was 50% or greater [17].
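As an illustration of this criterion, I² can be computed from Cochran's Q as max(0, (Q − df)/Q) × 100. The sketch below assumes inverse-variance weights and uses made-up coefficients; the function names are introduced here for illustration only.

```python
def i_squared(estimates, std_errors):
    """Return I^2 in percent, computed from Cochran's Q with inverse-variance weights."""
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def is_heterogeneous(i2, threshold=50.0):
    """Apply the 50% cut-off described in the text."""
    return i2 >= threshold


# Two hypothetical cohorts with clearly different coefficients.
i2_diverging = i_squared([0.1, 0.9], [0.1, 0.1])
# Three hypothetical cohorts with identical coefficients.
i2_identical = i_squared([0.4, 0.4, 0.4], [0.1, 0.1, 0.1])
print(round(i2_diverging, 1), i2_identical)
```

With only a few cohorts, I² is itself estimated imprecisely, so the 50% threshold is best read as a rough guide rather than a sharp boundary.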
Ethics approval and consent to participate
The Medical Ethical Committee of the Leiden University Medical Center (CME P15.043/SH/sh) confirmed on February 13 2015 that ethical approval for this type of study is not required under the Dutch Medical Research (Human Subjects) Act. The hospitals that supplied anonymous data obtained written informed consent from the study participants.
Discussion
The present pooled analysis of 1783 knee and 2400 hip OA patients shows that patients with a higher preoperative quality of life or functioning and less pain also have better postoperative outcomes, but that they improve less. Furthermore, women and patients with a higher BMI had more postoperative pain and less improvement after both TKA and THA. Higher age and higher BMI were associated with lower postoperative HRQoL and functioning and more pain after THA. However, preoperative quality of life, functioning and pain seem to be most consistently associated with outcomes after both TKA and THA.
Our results regarding the effect of preoperative status on outcomes are consistent with other studies, which also found that patients with worse preoperative functioning had greater improvements [18–21] but did not achieve the postoperative level of those with higher preoperative functioning [22–28]. In contrast, other studies showed opposite results regarding the direction and size of the effect of age, gender and BMI. Santaguida et al. [29] performed a systematic review of patient characteristics affecting the prognosis after TKA/THA and concluded that older age is related to worse functioning, but that age and sex do not influence postoperative pain level. We found that women had more pain after TKA (4 points on a 100-point scale) and THA (2 points on a 100-point scale), although this may not be a clinically relevant difference [30]. For TKA, no association of age or gender with functioning was found. In addition, a previous review of prognostic determinants in THA reported that preoperative functioning was most consistently associated with better outcomes [13]. Another systematic review of preoperative predictors of outcomes in THA [31] concluded that only patients’ poor preoperative functioning affects the outcome after THA. This was also found for patients with a TKA [32, 33]. Consistent with our finding, Lingard et al. [33] found that patients with severe pain had worse outcomes after TKA. Other studies also identified other determinants, such as radiological scores, severity of inflammation or comorbidities. A disadvantage of using multiple cohorts with different protocols for data acquisition was that we could not include these determinants. The linear mixed model had to be identical for each cohort, so that the regression coefficients in each cohort have the same meaning. Thus the prognostic determinants found in the present study are not exhaustive; there may be other determinants that have an additional effect on the outcome.
The effects of the individual preoperative determinants on postoperative outcomes after TKA and THA may seem small in themselves, but taken together they may add up to a clinically relevant effect. However, the scenarios should be interpreted with care, because they are hypothetical examples based on observational data and cannot be interpreted causally. The overall effects of the virtual scenarios, which were calculated as examples, vary between 1.2 and 6.5 points better postoperative outcomes and between 1.6 and 9 points worse postoperative outcomes. These scenarios provide more insight into how small differences may add up or cancel each other out. This probably explains why most effects do not reach a clinically significant difference. Usually a 10% difference (i.e. 10 points on a 0–100 scale [30]) is considered clinically relevant, but is a 10% difference the right criterion? Postoperative TKA/THA scores increase on average by 20–40 points on a 0–100 scale (results not shown) compared to preoperative scores, regardless of the preoperative status. Is it therefore realistic to use a difference of 10 points to define whether it is clinically relevant to operate now or wait, based on differences in preoperative determinants?
It is important to realize that the effects found in our study reflect not only the effect of the surgery, but also regression to the mean (RTM). RTM occurs because values are observed with random error, such as random fluctuations within a subject [34]. This means that patients with low preoperative scores are more likely to have higher scores at the next measurement and that patients with high preoperative scores are more likely to have lower scores at the next measurement, even without surgery. On average, this results in a larger “improvement” for patients with lower preoperative scores than for patients with higher baseline scores. Although different methods have been proposed to estimate the size of the RTM effect, no solution is available to distinguish the real change due to surgery from the change due to RTM. Furthermore, we had to standardize different questionnaires measuring the same domain. Ideally, a minimal dataset should be defined, so that data are more easily comparable without the need for standardization, since standardized regression coefficients are more difficult to interpret [35]. A strength of our study is that we pooled existing cohort studies. Most of these studies collected a comprehensive set of questionnaires. Although national arthroplasty registries have been established, these registries differ from clinical studies. Most registries focus on long-term data collection and therefore on minimal data sets; they collect patient and operative information, but not all registries collect patient-reported outcomes [36]. If registries collect patient-reported outcomes such as HRQoL, function or pain, short questionnaires are most often used, with only 1 or 2 questions covering the domain, e.g. a VAS for pain or the EQ-5D to measure HRQoL. Most of the cohort studies included in our study used more comprehensive questionnaires that can measure each outcome more reliably. On the other hand, using questionnaires with composite scores has some weaknesses. Different patients may have very different domain scores that still result in the same composite score. In our study we therefore used domain scores of different questionnaires (functioning and pain) besides the overall HRQoL composite score, which may reduce this problem. Another potential problem is that there may be cultural differences between countries in how questionnaires are answered, but this would only influence our results if these cultural differences affected e.g. elderly patients differently than younger patients, thereby resulting in a different relationship of age with the outcomes.
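The regression-to-the-mean effect discussed in this section can be made concrete with a small simulation. This is a hypothetical sketch, not an analysis of the study data: the score distribution, the size of the measurement error and the quartile cut-offs are all assumptions chosen for illustration. Each simulated patient has a stable underlying status that is measured twice with independent random error and without any treatment in between.

```python
import random
from statistics import mean

random.seed(42)

N = 20000
# Stable underlying status per simulated patient (assumed mean 50, SD 10).
true_status = [random.gauss(50, 10) for _ in range(N)]
# Two measurements of the same status, each with independent random error.
first = [t + random.gauss(0, 10) for t in true_status]
second = [t + random.gauss(0, 10) for t in true_status]

ranked = sorted(first)
cutoff_low = ranked[N // 4]        # upper bound of the lowest quartile
cutoff_high = ranked[3 * N // 4]   # lower bound of the highest quartile

# Mean change between measurements for patients who started low or high.
change_low = mean(s - f for f, s in zip(first, second) if f <= cutoff_low)
change_high = mean(s - f for f, s in zip(first, second) if f >= cutoff_high)
print(round(change_low, 1), round(change_high, 1))
```

Patients selected for a low first score "improve" on average and those selected for a high first score "deteriorate", although nothing changed between the two measurements. The same mechanism contributes to the larger improvement seen in patients with lower preoperative scores.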
Conclusion
The information regarding the combined effects of preoperative determinants on postoperative outcomes will help orthopaedic surgeons to estimate differences in outcome after a joint replacement for specific patient groups, i.e. poorer outcomes for patients with a worse preoperative status, but with greater postoperative improvement compared to patients with higher preoperative scores. In addition, preoperative status may decline during a long surgical delay and thereby lead to worse postoperative outcomes if no non-surgical treatments are started. On the other hand, it may sometimes be better to first optimize the patient’s preoperative condition, for example by reducing their BMI. The present study may support orthopaedic surgeons in their decision making by giving an estimate of the magnitude of the effect for different scenarios. Future studies should combine the results of our study with observational cohort studies among OA patients who have not yet had surgery, specific survival data from the medical literature and the effects on survival of the artificial joint to assess optimal timing of surgery. This is needed to assess the long-term impact for the patient of the decision to perform surgery at a certain preoperative state in specific patient groups.
Acknowledgements
We would like to thank the participating hospitals (in alphabetical order): Alrijne Hospital; HagaZiekenhuis; Erasmus MC, University Medical Centre Rotterdam; Kliniek ViaSana; Leiden University Medical Center; Medical Center Haaglanden; JointResearch, Onze Lieve Vrouwe Gasthuis; St. Antonius Ziekenhuis; Spaarne Gasthuis; University Medical Center Groningen; and VieCuri Medical Centre. These hospitals all supplied anonymous data. We are thankful to the members of the ARGON consortium for their help and advice (see www.artroseresearch.nl). We thank dr. Leti van Bodegom-Vos for critically reading and modifying the manuscript.
Collaborating authors
Sita M.A. Bierma-Zeinstra, Martijn van Dijk, Sjoerd Kaarsemaker, Paulien M. van Kampen, Peter A. Nolte, Rudolf W. Poolman, Yvette Pronk, Max Reijman, Martin Stevens, Bregje J.W. Thomassen, Suzan H.M. Verdegaal and Thea P.M. Vliet Vlieland.