Data were collected in the context of routine clinical practice at three sites (ATI Physical Therapy, in partnership with Greenville Health Systems) in South Carolina. As part of ongoing quality improvement efforts for patients undergoing knee arthroplasty, outcomes data were recorded on a semi-weekly basis throughout postoperative rehabilitation. Therapists were trained in the standardized application of outcomes assessments. The postoperative rehabilitation regimen was also standardized across clinic locations and therapists. Outcomes data were compiled in a quality improvement (QI) database, housed at the University of Colorado Denver, using Research Electronic Data Capture (REDCap), a secure web-based application for database development. All analyses complied with a non-human subject research designation and were approved by the Colorado Multiple Institutional Review Board (COMIRB #: 15–1797).
Patients
For the purposes of this analysis, the QI database was queried for all available (de-identified) patient records. At the time of data extraction, a total of 897 patient records were available, with surgical dates between January 2013 and May 2016. However, 321 records could not be used because they lacked postoperative flexion AROM data. This was not unexpected, as patients commonly travel to the surgical center for preoperative consultation and surgery but subsequently undergo postoperative rehabilitation at a clinic closer to home. An additional 59 records were excluded because patients underwent a procedure other than primary unilateral TKA (17 patients underwent revision arthroplasty and 42 patients underwent unicompartmental arthroplasty). For the remaining records, postoperative flexion AROM data were available between 2 and 857 days (median 36 days) following surgery. We utilized the first 120 postoperative days for this project. We reasoned that rehabilitation typically occurs during this time window, and recovery plateaus between months 1 and 3 following surgery [16]. Thus, we felt a reference chart describing recovery over the first 120 days would adequately capture the relevant time frame. A total of 498 patient records (1550 observations of flexion AROM) were used for this analysis.
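The record selection described above can be sketched as a simple filter. This is an illustrative sketch only: the field names (`procedure`, `flexion`), the procedure labels, and the toy records are hypothetical and are not taken from the QI database. It also assumes that day 120 is included in the window and that a record is kept only if it has at least one flexion AROM observation inside that window.

```python
# Hypothetical record layout: a procedure label plus a list of
# (postoperative_day, flexion_degrees) observations. Names are illustrative.
WINDOW_DAYS = 120  # assumption: day 120 itself is inside the window


def select_records(records, window_days=WINDOW_DAYS):
    """Keep primary unilateral TKA records with flexion AROM data in the window."""
    selected = []
    for rec in records:
        if rec["procedure"] != "primary_unilateral_tka":
            continue  # e.g. revision or unicompartmental arthroplasty
        in_window = [(day, deg) for day, deg in rec["flexion"]
                     if 0 <= day <= window_days]
        if not in_window:
            continue  # no postoperative flexion AROM data within the window
        selected.append({**rec, "flexion": in_window})
    return selected


toy_records = [
    {"id": 1, "procedure": "primary_unilateral_tka",
     "flexion": [(14, 85.0), (150, 120.0)]},
    {"id": 2, "procedure": "revision", "flexion": [(10, 70.0)]},
    {"id": 3, "procedure": "primary_unilateral_tka", "flexion": []},
    {"id": 4, "procedure": "primary_unilateral_tka", "flexion": [(200, 118.0)]},
]
kept = select_records(toy_records)

# Counts reported in the text: 897 available, 321 lacking flexion AROM data,
# and 59 (17 revision + 42 unicompartmental) with another procedure.
remaining_after_exclusions = 897 - 321 - (17 + 42)
```

With the reported counts, the two itemized exclusions leave 517 records; the text reports 498 records (1550 observations) in the final analysis once the 120-day window is applied.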
Knee flexion active range of motion (AROM)
Knee flexion AROM (in degrees) was measured in a supine position via long-arm goniometry (see supplementary material). Briefly, patients were allowed to practice bending their knee 5 times, with therapist assistance as needed, prior to the therapist making the final assessment. For the final assessment the knee was placed in extension, and the patient was instructed to flex the knee as far as possible using only muscle power, leaving the heel on the surface. The fulcrum of the goniometer was placed at the medial joint line, with the lateral malleolus of the fibula and greater trochanter of the femur as distal landmarks [17]. Physical therapists were trained on a quarterly basis in this protocol, to standardize the collection of outcomes measures. Flexion AROM was measured on a semi-weekly basis throughout postoperative rehabilitation.
Reference chart development
Using data from the development set, a series of statistical models was examined describing the variation of flexion AROM over the first 120 days following surgery. Generalized Additive Models for Location Scale and Shape (GAMLSS, version 4.4.0) [20] were used to obtain estimates of the median and other fitted centiles as smooth functions of the measurements in days. In GAMLSS, a variety of distributions can be used to fit the mean/median, variance, skewness, and kurtosis of the outcome. We selected 6 candidate distributions, of increasing complexity, with which to model knee flexion AROM. The Normal (NO) and Gamma (GA) distributions modeled 2 parameters (the median and variance) of the outcome. The t-family (TF) and Box-Cox Cole and Green (BCCG) distributions modeled the median, variance, and skewness of the outcome. The Box-Cox t (BCT) and Box-Cox Power Exponential (BCPE) distributions modeled the median, variance, skewness, and kurtosis of the outcome [21, 22]. Model fit was adjudicated numerically by the Schwarz Bayesian Criterion (SBC) [23]. To protect against over-fitting, we also calculated the Mean Square Error (MSE) via 5-fold cross validation of each model (i.e., by developing the model in 80% of the development set and testing the model in the left-out 20%). Based on these metrics, we pursued model selection by the following approach:
First, we examined whether fitting cubic splines for each of the different parameters (i.e., median, variance, skewness, kurtosis) improved model fit. Next, we optimized: 1) the number of knots specified in splines for each parameter, and 2) the power-transformation of time, using the “find.hyper” function in GAMLSS. We then constructed reference charts and calculated the percentage of observed values captured below each of the specified centiles (5th, 10th, 25th, 50th, 75th, 90th, and 95th centile). Of the candidate models, the best solution would minimize the SBC, demonstrate low MSE by within-sample cross validation, and accurately describe percentiles in the dataset, both within the development set and when applied to the test dataset (e.g., 5% of the observed data would be captured below the 5th percentile, 10% below the 10th percentile, etc.).
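The two numerical criteria used for model selection, the SBC and the 5-fold cross-validated MSE, can be illustrated with a minimal sketch. The analysis itself was carried out with GAMLSS; here a straight-line least-squares fit with normal errors is swapped in as a stand-in model so the example stays self-contained. All function names and the synthetic data are assumptions made for illustration. The SBC is computed as -2 log L + ln(n) × (number of parameters).

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for a GAMLSS fit)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def normal_log_likelihood(model, xs, ys):
    """Gaussian log-likelihood evaluated at the ML variance estimate (RSS/n)."""
    n = len(xs)
    sigma2 = mse(model, xs, ys)
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian Criterion: -2*logL + ln(n) * number of parameters."""
    return -2.0 * log_likelihood + math.log(n_obs) * n_params


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds (train on k-1 folds, test on the rest)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model,
                          [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return sum(errors) / k


# Synthetic data for illustration only: flexion rising with postoperative day.
rng = random.Random(1)
days = [float(d) for d in range(1, 121)]
flexion = [80.0 + 0.3 * d + rng.gauss(0.0, 8.0) for d in days]

model = fit_line(days, flexion)
# Three parameters in this stand-in: intercept, slope, and error variance.
model_sbc = sbc(normal_log_likelihood(model, days, flexion), 3, len(days))
cv_mse = k_fold_mse(days, flexion, k=5)
```

With k=5, each fold trains on roughly 80% of the observations and tests on the remaining 20%, matching the split described above. Comparing `model_sbc` and `cv_mse` across candidate models would follow the same pattern, with lower values preferred on both.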
Preliminary validation
Reference chart performance was examined by applying the reference curves to a test set of patients with later surgical dates. This approach was designed to mimic the process for development and subsequent use of the reference chart in practice. The accuracy with which the reference curves fit the new data was examined by z-test for proportions, and the average bias (difference between predicted and observed values) was calculated. Ideal performance would be reflected by accurate representation of the test set data (e.g., 5% of the observed data captured below the 5th percentile, 10% below the 10th percentile, etc.), and zero bias.
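The validation checks described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes a one-sample z-test comparing the observed proportion below a centile curve with its nominal value, and it takes bias as predicted minus observed. The function names and example numbers are hypothetical.

```python
import math


def coverage_below(observed, centile_values):
    """Fraction of observations falling below their predicted centile value."""
    below = sum(1 for obs, cut in zip(observed, centile_values) if obs < cut)
    return below / len(observed)


def one_sample_proportion_z(p_hat, p0, n):
    """z statistic and two-sided p-value for H0: true proportion equals p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def average_bias(predicted, observed):
    """Mean of (predicted - observed); zero means no systematic offset."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical test-set values: observed flexion and the chart's 10th centile
# at each observation's postoperative day.
observed = [78.0, 95.0, 102.0, 88.0, 110.0, 60.0, 99.0, 105.0, 92.0, 115.0]
tenth_centile = [70.0, 80.0, 85.0, 75.0, 90.0, 65.0, 82.0, 88.0, 78.0, 95.0]

p_hat = coverage_below(observed, tenth_centile)
z, p_value = one_sample_proportion_z(p_hat, 0.10, len(observed))
```

Here a non-significant p-value would indicate that the share of test-set observations below the 10th centile is consistent with the nominal 10%; the same check would be repeated for each centile of interest.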