Background
What are patient-centered outcome measures and what is their value to healthcare stakeholders?
Disease context: Myelofibrosis is a rare disease of the bone marrow that disrupts the body’s normal production of blood cells. Sometimes the spleen or the liver takes over some of the blood production; these organs then enlarge, which causes abdominal discomfort and pain. Typical symptoms also include a feeling of fullness, night sweats and itching. Some patients with myelofibrosis develop leukaemia. |
Research context & question: Phase 3 study assessing efficacy of ruxolitinib for myelofibrosis. Does shrinking a patient’s spleen lead to a patient-meaningful benefit? |
Why add a PCOM? Spleen volume is a surrogate endpoint: it may predict treatment benefit, but it is not in itself a direct measure of treatment benefit. Thus, as ruxolitinib was being developed, its sponsor chose, after sustained interactions with the U.S. FDA, to supplement the Phase 3 primary endpoint (reduction in spleen size) with a newly developed disease-specific patient-reported outcome (PRO) questionnaire (MFSAF). |
Result: Using a direct measure of treatment benefit reported by treated patients proved to be an effective complement to the primary surrogate endpoint, allowing fast regulatory approval and avoiding the requirement for additional post-marketing confirmatory trials. Its impact extended to reimbursement and HTA outcomes, where the improvement in disease-related symptoms was considered very important (for example in Germany and Canada) because it aligned closely with patients’ experience and values. This subsequently allowed patients affected by myelofibrosis to gain access to ruxolitinib as a new treatment option. |
Can we really measure what matters and how?
Challenges to outcome measurement in rare diseases
Patients as partners to understand disease burden
Disease context: Idiopathic pulmonary fibrosis (IPF) is a chronic and ultimately fatal disease characterized by a progressive decline in lung function. The term pulmonary fibrosis means scarring of lung tissue and is the cause of worsening dyspnoea (shortness of breath). |
U.S. FDA’s commitment to gain the patients’ perspective: In September 2014, the U.S. FDA held a public meeting to hear the perspectives of people living with idiopathic pulmonary fibrosis on the disease, its impact on their daily life, and currently available therapies. FDA conducted the meeting as part of the agency’s Patient-Focused Drug Development initiative, an FDA commitment under PDUFA V to more systematically gather patients’ perspectives on their condition and the therapies available to treat it. At this meeting, patients clearly described the major issues associated with uncontrollable, prolonged episodes of coughing, such as shortness of breath, physical fatigue or overall malaise, and the overall impact on work and home life, including stigma. |
Discordances: While patients with IPF identified cough as a central symptom during an investigation into core outcome parameters, it did not emerge from the Delphi panel of 254 medical experts. |
Result: It was recognised that the traditional physiological measures used in clinical trials, such as forced vital capacity (i.e. the amount of air that can be forcibly exhaled from the lungs after taking the deepest breath possible), do not fully capture the potential benefits of a treatment that would matter to individuals affected by IPF. Although cough and fatigue are major concerns for IPF patients, traditional outcome measures have failed to capture them adequately. |
- In-depth pre-trial concept elicitation patient interviews to enable extensive exploration of the disease experience, such as the most significant symptoms and overall disease impact on daily life;
- Interviews in a clinical trial setting (e.g. study exit interviews or the so-called ‘subject experience interviews’ that are spread across the duration of an investigational trial) can bring insightful information on how patients define ‘improvement’ and ‘treatment benefit’;
- Focus groups provide a platform for patients to interact and to compare their experiences;
- Use of internet and social media;
- Direct observation allows researchers to ‘shadow’ patients while doing day-to-day activities to gain first-hand experience through observation of what it means to have a rare disease. This method can be conducted in conjunction with patient interviews to provide rich data; and
- Audio/written diaries provide rare disease patients with an immediate medium to record their experiences.
Disease context: Phenylketonuria (PKU) is a genetic disorder characterised by a deficiency of the hepatic enzyme, phenylalanine hydroxylase; left untreated, it can lead to intellectual impairment, deficit in cognitive functions, seizures, behavioural problems and psychiatric symptoms. |
Study & methods: To develop a new PRO instrument based on a conceptual model of PKU through exploratory interviews with clinical experts, adults and adolescents with PKU, and parents of young children with PKU. |
Result: The conceptual model of PKU impact included health status (cognitive function, symptoms, monitoring); psychological function; social function; and diet. Potential mediators of disease impact included adherence, coping, social support, and other sociodemographic characteristics. For illustration purposes, the conceptual model is available in Additional file 1. |
Disease context: Duchenne muscular dystrophy (DMD) usually presents in boys as muscle weakness at around the age of four, which then worsens rapidly. Typically, muscle loss occurs first in the upper legs and pelvis, followed by the upper arms. Many are unable to walk by the age of 12 years. |
Drug trial focus: Until recently, drug trials in DMD have focused on the ambulant stage of the disease. Motor function has been the main outcome assessed, using the 6-min walk test and ClinROs such as the North Star Ambulatory Assessment. |
A shift in focus: Since the average age at loss of ambulation is ca. 10.5 years and median survival is 30 years, most individuals affected by DMD are non-ambulant. Under the leadership of the Netherlands-based advocacy group Duchenne Parent Project, a multidisciplinary and multi-stakeholder group identified the need for novel outcome measures for use across the whole spectrum of DMD patients. They developed the Performance of the Upper Limb module (PUL), a ClinRO designed specifically for DMD. |
Adding the patient voice: In addition to the PUL, this group recognised the need to develop in parallel a patient-reported outcome measure to capture information on daily living that cannot otherwise be observed in a clinical or research setting, focusing on outcomes that are meaningful to patients. When boys and young men with DMD were interviewed in that context, they confirmed that what mattered to them included: ‘to be able to put their arms on the table’, ‘to retain the ability to use a computer keyboard’, ‘to brush their teeth’, ‘to pour a drink’ etc. In other words, their hopes focused on retaining upper body function, not necessarily on seeing improvements in their ability to walk. An example of such a patient interview (for the Upper Limb PRO) is now available online [86]. |
When should we select, adapt or develop new PCOMs?
From clinical concepts to measurement: The power of mixed methods psychometric research in rare diseases
The routes to PCOMs in rare diseases
Disease context: Amyotrophic lateral sclerosis (ALS) is a neurological disease that attacks the motor neurons, the cells that the brain uses to keep muscles moving. Over the course of three to five years, people with ALS progressively lose the ability to move their fingers and toes, their arms and legs. Then they lose the ability to speak, to turn their head, and to swallow food. When the diaphragm and chest muscles give out, they can no longer breathe, and they die. |
A legacy instrument: The Amyotrophic Lateral Sclerosis Functional Rating Scale – Revised (ALSFRS-R) is an established rating scale for measuring the global function of patients with ALS. |
Gaps identified and troubleshooting: When Cathy (a research psychologist affected by advanced ALS) came to complete the ALSFRS-R, she was frustrated that despite her ability to participate in family life and write poetry (with the aid of assistive technology, such as an eye tracking machine and a computer to communicate), the scale rated her as ‘a zero’. When answering the questionnaire: ‘Compared to the time before you had symptoms of ALS [...] have you noticed any changes in your speech?’ She could no longer speak: zero points. ‘Have there been any changes in your ability to swallow?’ She hadn’t swallowed in years: zero points. ‘Has your ability to walk changed?’ She could not walk or move her legs: zero points. And so on. |
Resolution: Though a valuable rating instrument, the ALSFRS-R was deemed not fit-for-purpose in advanced stages of disease. In response, Cathy reached out to the online community PatientsLikeMe to develop new items with input from over 300 ALS patients. Three new items were selected, relating to: the ability to show emotional expression in the face, the ability to use fingers to manipulate devices, and the ability to get around inside the home. Subsequent research using Rasch analysis confirmed that a refinement of the ALSFRS-R was required. The ALSFRS Extension is now used in ALS research. |
Objective: To develop a new measure that covers all aspects of upper limb function, a concept valued across the whole spectrum of DMD patients (i.e. from younger ambulant patients to older, weaker adults who may only have limited finger movements). |
An iterative and multi-stakeholder process: Development of the PUL involved several steps: [1] A systematic review was performed to identify existing measures assessing upper extremity functional aspects used in DMD. Only four ClinROs were found to have been previously used in DMD; [2] An exploratory study was performed to assess the suitability of the existing scales across 61 DMD patients aged 11–30 years. The study identified shortcomings related to posture, pattern of weakness and contractures requiring compensatory strategies; [3] A conceptual model reflecting the progression of weakness and natural history of functional decline in DMD was hypothesized during a multi-stakeholder workshop. Functional tasks were subdivided into three main levels reflecting disease progression from proximal to distal and different stages of the disease: shoulder dimension, elbow dimension, and wrist and finger dimension; [4] An initial set of items was determined based on expert opinion and input from patients and families. Items were refined, added, or eliminated based on feedback; [5] An iterative consultative process with patients, families and experts ensured that items in the PUL were clinically meaningful and relevant to DMD. Patients and families identified gaps in the proposed assessment; [6] A preliminary pro forma was developed and piloted in 86 patients across seven international sites in Europe and the USA; [7] Rasch analysis was used to create a scale and to review item fit to the underlying construct. A revised version of the PUL including 22 items and a manual were developed and agreed upon by all participants. The PUL continues to be reviewed. |
Impact: A multi-stakeholder collaboration, where patients with DMD and their families had a prominent role, was key to the successful development of the PUL. Modern psychometric methods were used to create a scale with robust internal reliability and validity. |
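The Rasch analysis mentioned in step [7] can be illustrated with a minimal sketch. It assumes the simplest (dichotomous) Rasch model, in which the probability of succeeding on an item depends only on the difference between a person's ability and the item's difficulty, and it computes one common item-fit statistic, the outfit mean square. The function names and the example values are illustrative assumptions; the PUL development itself relied on dedicated Rasch software, which estimates abilities and difficulties from the data rather than taking them as given.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success on a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def outfit_mean_square(responses, abilities, difficulty):
    """Unweighted mean of squared standardised residuals for one item.

    responses  -- observed scores (0 or 1), one per person
    abilities  -- person abilities in logits (assumed already estimated)
    difficulty -- item difficulty in logits (assumed already estimated)
    """
    if not responses or len(responses) != len(abilities):
        raise ValueError("responses and abilities must be non-empty and equal in length")
    total = 0.0
    for score, ability in zip(responses, abilities):
        expected = rasch_probability(ability, difficulty)
        variance = expected * (1.0 - expected)
        total += (score - expected) ** 2 / variance
    return total / len(responses)


if __name__ == "__main__":
    # Hypothetical data: five people answering one item of difficulty 0 logits.
    abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
    responses = [0, 0, 1, 1, 1]
    print(outfit_mean_square(responses, abilities, 0.0))
```

Values close to 1 suggest that responses to the item behave as the model expects; values well above 1 flag an item whose responses are more erratic than the underlying construct would predict, which is the kind of signal used when reviewing item fit.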
Moving beyond the standard: Many PCOMs, such as ClinROs or PROs, typically include a standard set of items (or tasks) each rated on a standard set of response options, regardless of the relevance of specific items to each individual patient. When rare disease patients are in very different stages of their disease or when a rare disease is ultra-rare and affects a handful of individuals worldwide, these types of instrument may not have sufficient discriminatory capacity to detect change in clinically meaningful dimensions that are important to patients. In other words, a health outcome or an improvement that is relevant to or resonates with one patient may not resonate with another. Two alternatives currently stand out: Goal Attainment Scaling and Computer Adaptive Testing. |
Goal Attainment Scaling (GAS): GAS allows patients and their treating professionals to work together to identify individual treatment goals that have the greatest relevance. A key feature of GAS is the ‘a priori’ establishment of criteria for ‘successful’ outcomes, which are agreed with the patient and family before a health intervention starts so that everyone has a realistic expectation of what is likely to be achieved and agrees that this would be worth striving for. An example of GAS for use in haemophilia (named GOAL-Hem) covers four broad categories: managing haemophilia (e.g. being able to administer factor), haemophilia complications (e.g. bleeds, pain, joint problems), impact on activities, and impact on emotions and relationships. The applicability of each goal area is determined for different age groups (i.e. adults, adolescents, children). For instance, a common goal for paediatric patients (aged <15) is to become competent and responsible for self-infusion of factor concentrate. This goal area can be selected, current baseline ability assessed and quantifiable degrees of improvement described (a priori) to define potential outcomes. |
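One widely used way to summarise attainment across several individual goals, which the text above does not describe, is the Kiresuk-Sherman T-score. The sketch below assumes each goal is rated on the usual five-point scale from -2 to +2 (0 meaning the outcome agreed a priori was achieved), that goals may carry optional importance weights, and that the correlation between goals is fixed at the conventional value of 0.3. The function name and the example goals are hypothetical and are not taken from GOAL-Hem.

```python
import math


def gas_t_score(attainments, weights=None, rho=0.3):
    """Kiresuk-Sherman T-score for a set of individually agreed goals.

    attainments -- one rating per goal, each in {-2, -1, 0, 1, 2}
    weights     -- optional importance weight per goal (defaults to 1 each)
    rho         -- assumed correlation between goal scores (0.3 by convention)
    """
    if not attainments:
        raise ValueError("at least one goal is required")
    if weights is None:
        weights = [1.0] * len(attainments)
    if len(weights) != len(attainments):
        raise ValueError("one weight per goal is required")
    if any(level not in (-2, -1, 0, 1, 2) for level in attainments):
        raise ValueError("attainment levels must be between -2 and +2")

    weighted_sum = sum(w * x for w, x in zip(weights, attainments))
    sum_of_squares = sum(w * w for w in weights)
    square_of_sum = sum(weights) ** 2
    denominator = math.sqrt((1.0 - rho) * sum_of_squares + rho * square_of_sum)
    return 50.0 + 10.0 * weighted_sum / denominator


if __name__ == "__main__":
    # Hypothetical patient with three goals agreed before treatment started.
    goals = {"self-infusion": 1, "joint pain": 0, "sports participation": 0}
    print(round(gas_t_score(list(goals.values())), 1))
```

A score of 50 means that, on balance, the patient achieved exactly what was agreed in advance; scores above 50 indicate better-than-expected attainment. Because each patient's goals differ, two identical T-scores do not imply identical health states.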
Computer Adaptive Testing (CAT): Whilst CAT has been used most notably in educational testing, the approach has more recently been applied to health outcomes, such as the Patient-Reported Outcomes Measurement Information System (PROMIS®) measures and the European Organisation for Research and Treatment of Cancer CAT (EORTC CAT). In CAT, the computer administering the ‘test’ selects questions or ‘items’ from an item bank based on a patient’s responses to previous questions. Although patients receive different questions based on their individual responses, scores are standardized and can be compared using a common scale. The goal of CAT is to maximise measurement precision for each individual in the specific domain of interest (e.g. physical functioning, depression) using the fewest items possible. |
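The item-selection loop described above can be sketched as follows. This is a simplified illustration, not the PROMIS or EORTC algorithm: it assumes yes/no items described by a two-parameter logistic model (a discrimination and a difficulty per item), picks the unanswered item with the highest Fisher information at the current score estimate, and re-estimates the score as a posterior mean over a grid with a standard normal prior. The item bank values and all function names are hypothetical.

```python
import math


def prob_endorse(theta, discrimination, difficulty):
    """Probability of a 'yes' response under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def item_information(theta, discrimination, difficulty):
    """Fisher information an item provides at a given trait level."""
    p = prob_endorse(theta, discrimination, difficulty)
    return discrimination ** 2 * p * (1.0 - p)


def select_next_item(theta, bank, administered):
    """Index of the not-yet-administered item that is most informative at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


def estimate_theta(responses, bank):
    """Posterior mean of theta on a grid from -4 to 4, standard normal prior."""
    grid = [step / 10.0 for step in range(-40, 41)]
    posterior = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalised prior density
        for index, answer in responses.items():
            p = prob_endorse(theta, *bank[index])
            weight *= p if answer else (1.0 - p)
        posterior.append(weight)
    return sum(t * w for t, w in zip(grid, posterior)) / sum(posterior)


def run_cat(bank, answer_fn, max_items=3):
    """Administer up to max_items adaptively; return (score, order of items)."""
    responses = {}
    theta = 0.0  # start at the prior mean
    for _ in range(min(max_items, len(bank))):
        index = select_next_item(theta, bank, responses)
        responses[index] = answer_fn(index)
        theta = estimate_theta(responses, bank)
    return theta, list(responses)


if __name__ == "__main__":
    # Hypothetical bank: (discrimination, difficulty) for easy, medium, hard items.
    bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]
    score, order = run_cat(bank, answer_fn=lambda index: True)
    print(order, round(score, 2))
```

Two patients may see different items in a different order, yet both scores are expressed on the same underlying scale, which is what makes them comparable. Production systems add stopping rules based on the standard error of the estimate, which this sketch omits.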
Prospects: GAS and CAT are promising methodologies, but their application to PCOMs is still in its infancy. Whilst GAS has the potential for greater relevance and sensitivity than standard measures, the appropriateness of comparing scores between patients has not yet been established and remains a real challenge. CAT, by contrast, provides a common frame of reference for direct comparability of measurements between patients, but items are selected by computer algorithms with no recourse to patient preferences. One promising initiative developed for visually impaired patients may provide a bridge between GAS and CAT: the Activity Inventory (AI) is an adaptive visual function CAT that consists of 459 tasks grouped into 50 goals. Visually impaired patients rate the importance of each goal, allowing the CAT to deliver a set of items tailored to each individual. |