Capturing disease activity in multiple sclerosis (MS) trials is challenging, and all traditional outcome measures have clear limitations. |
Newer measures are being developed and increasingly used in trials. |
Multidimensional outcome measures are promising because they have the potential to capture the full extent of disease activity by assessing various functional domains relevant for MS. |
1 Background
Age at onset of MS (years) | Optic neuritis | Diplopia or vertigo | Acute motor symptoms | Insidious motor symptoms | Balance or limb ataxia | Sensory symptoms |
---|---|---|---|---|---|---|
<20 | 23 | 18 | 6 | 4 | 14 | 46 |
20–29 | 23 | 12 | 7 | 6 | 11 | 52 |
30–39 | 13 | 11 | 7 | 14 | 15 | 44 |
40–49 | 9 | 17 | 3 | 31 | 13 | 33 |
≥50 | 6 | 13 | 4 | 47 | 11 | 32 |
Primary outcome measures | |
---|---|
Clinical | Expanded Disability Status Scale (EDSS): 3 or 6 months confirmed disability worsening or improvement |
 | Relapses: annualized relapse rate, time to second relapse (conversion to clinically definite MS) |
Secondary outcome measures | |
---|---|
Clinical | MS Functional Composite (MSFC): timed 25-foot walk test, nine-hole peg test, paced auditory serial addition task or symbol digit modalities test |
Paraclinical | T2-hyperintense lesions |
 | Gadolinium-enhancing T1 lesions |
 | Whole brain atrophy |
Exploratory outcome measures | |
---|---|
Clinical | As candidate component of MSFC: low-contrast letter acuity test |
 | Patient-reported outcome measures: e.g. quality of life, depression and anxiety, fatigue, specific functional domains |
Paraclinical—imaging | Volumetric measures of specific structures (e.g. thalamus, upper cervical cord area) |
 | Persisting black holes |
 | Functional MRI for analysis of functional connectivity |
 | Diffusion tensor imaging to examine brain tissue integrity |
 | Magnetization transfer ratio MRI as a marker for brain myelin content |
 | Optical coherence tomography |
Paraclinical—biomarkers | Biomarkers in body fluids: in CSF or blood |
Composite | No evidence of disease activity (NEDA): typically covering (confirmed) EDSS progression, relapse rate and formation of MRI lesions; whole brain volume increasingly included (i.e. ‘NEDA-4’) |
Electronic devices | Assess MS system, Glove analyzer, accelerometers, etc. |
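The annualized relapse rate listed among the primary outcome measures above is, in its crude form, the total number of relapses divided by the total person-years of follow-up. The sketch below illustrates only that crude calculation. The function name and the 365.25-day year are assumptions made for illustration, and trials often estimate the rate with regression models rather than this simple ratio.

```python
def annualized_relapse_rate(relapse_counts, follow_up_days):
    """Crude annualized relapse rate for a group of patients.

    relapse_counts: number of relapses per patient.
    follow_up_days: days of follow-up per patient, in the same order.
    Assumption for this sketch: one year is 365.25 days.
    """
    if len(relapse_counts) != len(follow_up_days):
        raise ValueError("one follow-up duration is needed per patient")
    person_years = sum(follow_up_days) / 365.25
    if person_years <= 0:
        raise ValueError("total follow-up must be positive")
    return sum(relapse_counts) / person_years


# Three hypothetical patients: 3 relapses over 4 person-years in total.
arr = annualized_relapse_rate([1, 0, 2], [365.25, 365.25, 730.5])
```

Pooling person-years rather than averaging per-patient rates keeps patients with very short follow-up from dominating the estimate.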
2 Clinical Outcome Measures
2.1 The Expanded Disability Status Scale
2.1.1 Limitations and Caveats
Limitations and caveats | Improvements |
---|---|
Expanded Disability Status Scale (EDSS) | |
High intra- and inter-observer variability; non-linearity (bimodal distribution); limited responsiveness; necessity to use non-parametric statistics (ordinal scale); uneven distribution of relapsing–remitting and progressive patients; several functional domains not assessed | Accounting for baseline score when determining change (e.g. change ≥1.0 with baseline score 0–5.5, and ≥0.5 for higher baseline scores); determining disability worsening with confirmation of the EDSS progression after at least 6 months; using standardized scripts for questioning patients (improving reliability and decreasing risk of unblinding); simplification of scoring rules (decreasing variability); streamlining by stripping components of the functional systems that are less informative; modification to improve linearity and facilitate statistical analysis |
Relapses | |
Strong subjectivity; recovery of signs or symptoms before confirmation of relapse; recall bias of patient and observer bias of examiner; newly reported symptoms not always clearly depicted in change of the EDSS; identification largely depends on patient reporting it; higher relapse rate prior to inclusion: over-reporting to fulfil inclusion criteria, high relapse rate inclusion criterion leading to decrease of relapse rate because of regression to the mean, placebo effect, decrease of relapse due to natural course of MS | Confirming a relapse by another examiner; increasing number of visits to identify more relapses |
Multiple Sclerosis Functional Composite (MSFC) | |
Moderate reliability, sensitivity and responsiveness of the PASAT; the PASAT is often disliked by patients, requires mathematical ability and has a ceiling effect; several important functional domains are not assessed; lack of a clear dimension of the overall score (resulting in difficult interpretability); Z scores are influenced by results of the reference population and obscure the meaning of crude scores | Replacing the PASAT with the symbol digit modalities test; adding the low-contrast letter acuity test (covering visual domain); adding other functional domains; determining minimal clinically relevant changes of the Z scores and confirming change after 6 months; determining clinical relevance; keeping elements separated instead of combining them into a single score |
Patient-reported outcome measures (PROM) | |
Unblinded nature; potential expectation bias; assessment of quality of life may be influenced by multiple factors; possible response shift over time | Weighting individual questions appropriately; using (computer) adaptive testing to reduce test length and improve tolerability |
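Two of the EDSS improvements listed in the table above, a baseline-dependent threshold (≥1.0 for baseline scores 0–5.5, ≥0.5 for higher baselines) and confirmation after at least 6 months, can be combined into a single rule. The following is a minimal sketch under stated assumptions: the function names, the visit format and the 180-day window are hypothetical, and the requirement that every intermediate visit also meets the threshold is one possible reading of "confirmed", not a rule given in the text.

```python
def worsening_threshold(baseline):
    """Minimum EDSS increase that counts as worsening for a given baseline."""
    return 1.0 if baseline <= 5.5 else 0.5


def confirmed_worsening_day(baseline, visits, confirm_days=180):
    """Return the onset day of the first confirmed worsening, or None.

    visits: iterable of (study_day, edss_score) pairs.
    Assumption: worsening counts as confirmed when every later visit, up to
    and including the first one at least confirm_days after onset, still
    meets the threshold.
    """
    threshold = worsening_threshold(baseline)
    ordered = sorted(visits)
    for i, (onset_day, score) in enumerate(ordered):
        if score - baseline < threshold:
            continue
        for later_day, later_score in ordered[i + 1:]:
            if later_score - baseline < threshold:
                break  # worsening was not sustained
            if later_day - onset_day >= confirm_days:
                return onset_day
    return None


# Hypothetical patient: the rise at day 90 is not sustained, the one at day 270 is.
visits = [(90, 3.0), (180, 2.5), (270, 3.0), (360, 3.5), (450, 3.0)]
onset = confirmed_worsening_day(2.0, visits)
```

In this example the transient increase at day 90 is discarded because the score at day 180 falls back below the threshold, which is the kind of fluctuation that confirmation is meant to filter out.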
2.1.2 Suggested Improvements
2.2 Relapses
2.2.1 Limitations and Caveats
2.3 The Multiple Sclerosis Functional Composite
2.3.1 The Original Components
Original components | |
---|---|
Timed 25-foot walk test (T25W) | The patient is directed to one end of a clearly marked 25-foot course and is instructed to walk 25 feet as quickly as possible, but safely. The task is immediately administered again by having the patient walk back the same distance. Patients may use assistive devices when doing this task. In clinical trials, it is recommended that the treating neurologist select the appropriate assistive device for each patient [42] |
Nine-hole peg test (9HPT) | The patient is asked to take nine small pegs one by one from a small shallow container, place them into nine holes and then remove them and place them back into the container. The result is the time in seconds needed to complete the task, recorded for both the dominant and the non-dominant hand, with two trials for each hand [42] |
Paced auditory serial addition task (PASAT) | The PASAT is presented on audiocassette tape or compact disc to control the rate of stimulus presentation. Single digits are presented every 3 s (or every 2 s for the optional 2-second PASAT) and the patient must add each new digit to the one immediately prior to it. The test score is the number of correct sums given (out of 60 possible) in each trial. To minimize familiarity with stimulus items in clinical trials and other serial studies, two alternate forms have been developed; the order of these should be counterbalanced across testing sessions. The PASAT is the last measure of the MSFC that is administered at each visit [42] |
Candidate components | |
---|---|
Symbol digit modalities test (SDMT) | Patients are presented with a key that includes nine numbers, each paired with a different symbol. Below this key is an array of these same symbols in pseudo-random order paired with empty spaces. Patients must then provide the correct numbers that accompany the symbols as indicated in the key [64] |
Low-contrast letter acuity test (LCLA) | Seven charts with different levels of contrast (0.6–100%) are presented to the patient. On each chart, multiple rows are depicted with gray letters with decreasing size on a white background. The letter scores indicate the number of letters identified correctly. Each chart is scored separately |
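The component results described above are combined into an overall MSFC score through Z scores against a reference population. The sketch below follows one commonly described scoring convention (walk times sign-flipped so that higher is better, peg times converted to reciprocals before standardizing, the three Z scores averaged). The text here does not spell out the formula, so treat that convention, the function names and all reference values as assumptions made for illustration.

```python
from statistics import mean


def z_score(value, ref_mean, ref_sd):
    """Standardize a raw result against a reference population."""
    return (value - ref_mean) / ref_sd


def msfc_scores(t25w_trials, hpt_dominant, hpt_nondominant, pasat_correct, ref):
    """Component Z scores and their average.

    ref maps "t25w", "hpt_inverse" and "pasat" to (mean, sd) tuples taken
    from whichever reference population the analyst chooses.
    """
    # Leg: mean walk time in seconds; longer is worse, so the sign is flipped.
    z_leg = -z_score(mean(t25w_trials), *ref["t25w"])
    # Arm: reciprocal of the mean time per hand, averaged over both hands.
    arm = mean([1 / mean(hpt_dominant), 1 / mean(hpt_nondominant)])
    z_arm = z_score(arm, *ref["hpt_inverse"])
    # Cognition: number of correct PASAT sums (out of 60).
    z_cog = z_score(pasat_correct, *ref["pasat"])
    return {"leg": z_leg, "arm": z_arm, "cognition": z_cog,
            "composite": mean([z_leg, z_arm, z_cog])}


# Hypothetical reference values; real analyses must take these from data.
ref_a = {"t25w": (5.0, 1.0), "hpt_inverse": (0.05, 0.01), "pasat": (45.0, 10.0)}
ref_b = {"t25w": (6.0, 1.0), "hpt_inverse": (0.05, 0.01), "pasat": (45.0, 10.0)}

patient = dict(t25w_trials=[6.0, 6.0], hpt_dominant=[20, 20],
               hpt_nondominant=[20, 20], pasat_correct=50)
scores_a = msfc_scores(ref=ref_a, **patient)
scores_b = msfc_scores(ref=ref_b, **patient)
```

The same raw results give a different composite under `ref_a` and `ref_b`, which illustrates the caveat that Z scores depend on the reference population and obscure the meaning of crude scores.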
2.3.2 Candidate Components
2.3.3 Limitations and Caveats
2.4 Patient-Reported Outcome Measures
Measure |
---|
Quality of life |
MS Quality of Life-54 [103] |
MS Quality of Life Inventory [86] |
European Quality of Life-5D [87] |
Health Utilities Index Mark 3 [87] |
World Health Organization Quality of Life Brief Form [100] |
Sickness Impact Profile [83] |
Life Satisfaction Questionnaire [96] |
Hamburg Quality of Life Questionnaire in MS [91] |
Quality of Life Index [85] |
Leeds MS Quality of Life Scale [90] |
Disability and Impact Profile [101] |
The MS International Quality of Life Questionnaire [102] |
Functional Assessment of MS [84] |
Depression and anxiety |
Beck Depression Inventory [82] |
Patient Health Questionnaire-9 [95] |
Hospital Anxiety and Depression Scale [94] |
Fatigue |
Modified Fatigue Impact Scale [89] |
Fatigue Impact Scale for Daily Use [88] |
Single functional domain |
MS Walking Scale-12 [93] |
Arm Function in MS Questionnaire [98] |
Visual Function Questionnaire-25 [99] |
Multiple domains |
Short Form-36 [104] |
MS Impact Scale-29 [92] |
Guy’s Neurological Disability Scale [97] |
MS Impact Profile [105] |
3 Paraclinical Outcome Measures
3.1 Magnetic Resonance Imaging
3.1.1 White Matter Pathology
3.1.2 Atrophy
3.1.3 Persisting Black Holes
3.2 Optical Coherence Tomography
3.3 Biomarkers in Body Fluids
4 No Evidence of Disease Activity
5 Future Perspectives
Interpretation may not be straightforward, particularly if the clinical relevance of (some) components is not immediately obvious |
An overall score lacks a clear dimension, which complicates the interpretability of the score |
Components should be normalized or weighted without obscuring the clinical meaning |
Components may shift in opposite directions (improvement vs harm) which might obscure interpretation of treatment efficacy |
Components should capture the expected (biological) effects of the intervention under investigation |
Increasing the number of components does not necessarily increase sensitivity |
Redundant components might cause a large change in the composite score in patients that have symptoms in that domain, while the change may be smaller or absent in patients with symptoms in other domains |
Increasing sensitivity to change does not necessarily lead to higher sensitivity for treatment effects |
Dichotomization of the results (e.g. ‘no evidence of disease activity’) will inherently cause loss of information |
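The last point can be made concrete with a small sketch of a 'no evidence of disease activity' classification. It assumes the three typical components (relapses, confirmed EDSS progression, new MRI lesions) and an optional fourth component for brain volume loss. The class, the field names and the cutoff parameter are hypothetical, and no particular cutoff value is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    relapses: int
    confirmed_edss_worsening: bool
    new_mri_lesions: int
    # Annual whole-brain volume loss in percent; None if not measured.
    brain_volume_loss_pct: Optional[float] = None


def has_neda(p: FollowUp, bvl_cutoff_pct: Optional[float] = None) -> bool:
    """True if no component shows disease activity.

    With bvl_cutoff_pct=None only the three typical components are checked.
    Passing a cutoff adds brain volume loss as a fourth component; the
    cutoff value is the caller's assumption.
    """
    neda3 = (p.relapses == 0
             and not p.confirmed_edss_worsening
             and p.new_mri_lesions == 0)
    if bvl_cutoff_pct is None:
        return neda3
    if p.brain_volume_loss_pct is None:
        raise ValueError("brain volume loss was not measured")
    return neda3 and p.brain_volume_loss_pct <= bvl_cutoff_pct


# Two hypothetical patients with very different activity get the same label.
mild = FollowUp(relapses=0, confirmed_edss_worsening=False, new_mri_lesions=1)
severe = FollowUp(relapses=3, confirmed_edss_worsening=True, new_mri_lesions=12)
stable = FollowUp(0, False, 0, brain_volume_loss_pct=0.2)
```

Both `mild` and `severe` are classified as not meeting the criterion, so the binary result no longer distinguishes a single new lesion from highly active disease.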