Background
Psoriatic arthritis (PsA) is a chronic, systemic, immune-mediated inflammatory disease with multiple disease manifestations, including peripheral arthritis, enthesitis, dactylitis, spondylitis, and psoriatic skin and nail disease [1–3]. Owing to the multiple diverse disease manifestations involved in PsA, the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) bases its treatment recommendations on the domains affecting an individual [1]. Consequently, composite endpoints, which allow the assessment of multiple clinical outcomes in a single instrument, have been suggested to be particularly useful to assess changes in the multiple disease domains of PsA over time [3, 4]. Composite endpoints also have the potential to simplify statistical testing in clinical trials: because a summary or total score is usually generated, only a single hypothesis test is required, which avoids issues with multiplicity and allows for appropriate statistical power with relatively small numbers of patients [5].
A number of composite endpoints have been developed for PsA in order to assess multiple aspects of disease activity and identify patients who have achieved treatment targets of remission or minimal disease activity (MDA). Available instruments incorporate different types of assessments, including clinical (for example, tender and swollen joint counts [TJC and SJC]), laboratory (for example, C-reactive protein [CRP]), and patient-reported outcome (PRO) (for example, Health Assessment Questionnaire-Disability Index [HAQ-DI]) endpoints. Although there is no clear agreement on a standardized composite assessment approach that provides the optimal combination of individual variables [6], agreement has now been reached on a core domain set of variables that should be included [7].
Tofacitinib is an oral inhibitor of the Janus kinase (JAK) family for the treatment of PsA. Tofacitinib preferentially inhibits signaling via JAK3 or JAK1 (or both) with functional selectivity over JAK2 [8]. The efficacy and safety of tofacitinib 5 and 10 mg twice daily (BID) have been demonstrated in patients with PsA with an inadequate response to conventional synthetic disease-modifying anti-rheumatic drugs (csDMARDs) in Oral Psoriatic Arthritis triaL (OPAL) Broaden [9] and in patients with PsA who were tumor necrosis factor inhibitor (TNFi)-inadequate responders (IRs) in OPAL Beyond [10]. In both studies, tofacitinib had greater efficacy than placebo on the basis of the primary endpoints: a higher proportion of patients receiving tofacitinib than placebo achieved greater than or equal to 20% improvement according to the criteria of the American College of Rheumatology (ACR20 response) at month 3, and the mean change from baseline to month 3 in HAQ-DI score was greater in patients receiving tofacitinib than in those receiving placebo. In addition, between 21% and 26% of patients receiving tofacitinib and between 7% and 15% of patients receiving placebo had MDA responses at month 3 in OPAL Broaden and OPAL Beyond [9, 10]. This analysis evaluated the effect of tofacitinib on three disease-specific composite endpoints in patients with PsA by using data from the two placebo-controlled, double-blind, multicenter, global phase 3 studies of tofacitinib detailed above: OPAL Broaden and OPAL Beyond [9, 10].
Methods
Patients
Details of patient populations and study designs for both OPAL Broaden (A3921091; ClinicalTrials.gov Identifier: NCT01877668) and OPAL Beyond (A3921125; ClinicalTrials.gov Identifier: NCT01882439) have been published in detail [9, 10]. In brief, for inclusion in either OPAL Broaden or OPAL Beyond, patients were required to have active PsA with a duration of at least 6 months, to fulfill ClASsification criteria for Psoriatic ARthritis (CASPAR) at screening, and to have evidence of active arthritis with both a TJC and SJC of three or higher. Patients in OPAL Broaden had an inadequate response to at least one csDMARD and were TNFi-naïve, whereas patients in OPAL Beyond had an inadequate response to at least one TNFi. The primary endpoints in both studies were ACR20 response rate and change from baseline in HAQ-DI score at month 3.
Study design
OPAL Broaden was a 12-month study in which patients were randomly assigned 2:2:2:1:1 to receive tofacitinib 5 mg BID, tofacitinib 10 mg BID, adalimumab 40 mg subcutaneous (SC) injection once every 2 weeks (Q2W), placebo advancing to tofacitinib 5 mg BID at month 3, or placebo advancing to tofacitinib 10 mg BID at month 3. OPAL Beyond was a 6-month study in which patients were randomly assigned 2:2:1:1 to receive tofacitinib 5 mg BID, tofacitinib 10 mg BID, placebo advancing to tofacitinib 5 mg BID at month 3, or placebo advancing to tofacitinib 10 mg BID at month 3. In both studies, patients also received one concomitant treatment with a stable dose of either methotrexate or another csDMARD (for example, sulfasalazine or leflunomide).
Assessments
Three disease-specific composite endpoints are discussed in this analysis. The Psoriatic Arthritis Disease Activity Score (PASDAS) (score range of 0–10) includes the following components: patient’s global joint and skin assessment (visual analog scale; VAS [in millimeters]); physician’s global assessment of PsA (VAS [in millimeters]); SJC (66 joints) and TJC (68 joints); Leeds Enthesitis Index (LEI) score; tender dactylitic digit score; physical component summary (PCS) score of the 36-item short-form survey version 2 (SF-36v2 acute, norm-based scores); and CRP (in milligrams per liter) (Table 1) [11]. The Disease Activity Index for Reactive Arthritis/Psoriatic Arthritis (DAREA/DAPSA) (score range not defined; referred to as DAPSA herein) includes the components SJC (66 joints) and TJC (68 joints); patient’s global assessment of arthritis and patient’s pain assessment (both measured by VAS [in millimeters]); and CRP (in milligrams per liter) (Table 1) [6]. The Composite Psoriatic Disease Activity Index (CPDAI) (score range of 0–15) includes the components peripheral arthritis (SJC, TJC, and HAQ-DI); skin disease (Psoriasis Area and Severity Index [PASI] and Dermatology Life Quality Index [DLQI]); enthesitis (LEI score and HAQ-DI); dactylitis (number of digits and HAQ-DI); and spinal disease (Bath Ankylosing Spondylitis Disease Activity Index and Ankylosing Spondylitis Quality of Life [ASQoL]) (Table 1) [12]. For each of these composite endpoints, a higher score indicates higher disease activity. For comparison, a non-disease-specific composite outcome measure was also assessed: the three-component Disease Activity Score using 28 joints with CRP (DAS28–3 [CRP]; score range of 0–9.4, with a higher score corresponding to worse symptoms), which includes the components SJC (28 joints), TJC (28 joints), and CRP (in milligrams per liter) (Table 1) [13].
Table 1
Components of the composite endpoints PASDAS, DAPSA, CPDAI, and DAS28–3(CRP)
Statistical analysis
The full analysis set (FAS) comprised all patients who were randomly assigned to the study and received at least one dose of study medication. Change-from-baseline analyses were based on a repeated measures model, without imputation for missing values in the FAS, with the fixed effects of treatment, visit, treatment-by-visit interaction, geographic location, and baseline value; an unstructured covariance matrix was used. For results up to month 3, patients randomly assigned to the two placebo sequences were combined into a single placebo group. The repeated measures model included data from all visits up to month 3 for the treatment groups of tofacitinib 5 mg BID, tofacitinib 10 mg BID, adalimumab 40 mg SC Q2W (OPAL Broaden only), and placebo. For results beyond month 3 to the end of study, the two placebo sequences were analyzed separately. Effect sizes and standardized response means were calculated for the treatment groups of tofacitinib 5 mg BID, tofacitinib 10 mg BID, and adalimumab (OPAL Broaden only) at months 3, 6, and 12 (month 12 in OPAL Broaden only). To permit comparison based on the same set of patients, these calculations were restricted to patients in the FAS with greater than or equal to 3% baseline psoriasis body surface area (BSA) and with no missing values for any of the three disease-specific composite endpoints at baseline or months 3, 6, and 12 (month 12 in OPAL Broaden only).
The effect size for a given composite endpoint at a time point was defined as (mean at baseline – mean at time point)/(standard deviation [SD] at baseline). The standardized response mean for a given composite endpoint at a time point was defined as (mean at baseline – mean at time point)/(SD of change from baseline at time point). Effect size and standardized response mean are unitless measures and are adjusted for the endpoints’ variability, which allows comparisons to be made between endpoints. For both effect sizes and standardized response means, levels of responsiveness have been proposed as small (≥0.20 to <0.50), moderate (≥0.50 to <0.80), and large (≥0.80) [3, 14].
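The two definitions above can be illustrated with a minimal Python sketch. This is an illustration only, not the analysis code used in the studies: the function names are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, as the text does not state which SD estimator was applied. Each patient is assumed to have a value at both baseline and the time point, in line with the complete-case requirement described above.

```python
from statistics import mean, stdev


def effect_size(baseline, followup):
    """(mean at baseline - mean at time point) / SD at baseline.

    Assumption: sample SD (n - 1 denominator).
    """
    return (mean(baseline) - mean(followup)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """(mean at baseline - mean at time point) / SD of change from baseline."""
    # Change from baseline per patient (time point minus baseline).
    changes = [f - b for b, f in zip(baseline, followup)]
    return (mean(baseline) - mean(followup)) / stdev(changes)


def responsiveness(value):
    """Map an effect size or standardized response mean to the proposed levels."""
    if value >= 0.80:
        return "large"
    if value >= 0.50:
        return "moderate"
    if value >= 0.20:
        return "small"
    return "below small"


# Hypothetical scores for four patients (not study data).
baseline_scores = [6.0, 5.0, 7.0, 6.0]
month3_scores = [4.0, 4.0, 4.0, 4.0]

es = effect_size(baseline_scores, month3_scores)
srm = standardized_response_mean(baseline_scores, month3_scores)
print(responsiveness(es), responsiveness(srm))
```

Because both measures share the same numerator, they differ only in the denominator: the effect size scales by between-patient variability at baseline, whereas the standardized response mean scales by the variability of the within-patient change.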
In order to investigate the relative strength of the composite endpoints in predicting MDA response at a given time point, multiple logistic regression was used to model MDA response as a dependent variable and the mean changes from baseline of the three disease-specific composite endpoints at the same time point as predictors. The estimated slope coefficient from this regression model is the change in log-odds of MDA response resulting from a 1-unit increase in change from baseline of the composite endpoint. It represents the strength of association between the composite endpoint and MDA response and is standardized (STB, range unbounded) to adjust for the variability of the composite endpoint to permit comparison of their associations with MDA response. In order to compare the correlations of the three disease-specific composite endpoints with MDA response, another standardized measure related to STB above, called logistic pseudo partial correlation (denoted as R, range of −1 to 1), was also calculated [15]. A value of R closer to 1 or −1 indicates strong correlation, whereas a value of 0 indicates lack of correlation. This regression analysis was performed separately for months 3, 6, and 12 (OPAL Broaden only for month 12) and separately for tofacitinib 5 mg BID, 10 mg BID, and adalimumab 40 mg SC Q2W. These analyses included the same set of patients with baseline psoriasis BSA of greater than or equal to 3% in the FAS with no missing values for any of the three disease-specific composite endpoints and MDA at months 3, 6, and 12 (OPAL Broaden only at month 12). MDA was defined as any five of the following seven criteria being met: TJC ≤1, SJC ≤1, PASI score ≤1 or psoriasis BSA ≤3%, patient arthritis pain (VAS) ≤15 mm, patient’s global assessment of arthritis (VAS) ≤20 mm, HAQ-DI ≤0.5, tender entheseal points (using LEI) ≤1 [16].
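As an illustration of the MDA definition above, the seven criteria can be expressed as a simple check. This is a minimal sketch under stated assumptions, not the derivation used in the studies: the class and field names are hypothetical, and "any five of the following seven" is read here as "at least five of seven".

```python
from dataclasses import dataclass


@dataclass
class MdaInputs:
    """Hypothetical container for the measurements used by the MDA criteria."""
    tjc: int                  # tender joint count
    sjc: int                  # swollen joint count
    pasi: float               # PASI score
    bsa_percent: float        # psoriasis body surface area (%)
    pain_vas_mm: float        # patient arthritis pain, VAS (mm)
    global_vas_mm: float      # patient's global assessment of arthritis, VAS (mm)
    haq_di: float             # HAQ-DI score
    lei_tender_points: int    # tender entheseal points (LEI)


def mda_criteria(p):
    """Return the seven criteria as booleans, in the order listed in the text."""
    return [
        p.tjc <= 1,
        p.sjc <= 1,
        p.pasi <= 1 or p.bsa_percent <= 3,
        p.pain_vas_mm <= 15,
        p.global_vas_mm <= 20,
        p.haq_di <= 0.5,
        p.lei_tender_points <= 1,
    ]


def is_mda(p):
    """Assumption: MDA is met when at least five of the seven criteria hold."""
    return sum(mda_criteria(p)) >= 5


# Hypothetical patient (not study data): five of seven criteria are met.
patient = MdaInputs(tjc=1, sjc=0, pasi=2.5, bsa_percent=2.0,
                    pain_vas_mm=10, global_vas_mm=25,
                    haq_di=0.625, lei_tender_points=0)
print(is_mda(patient))
```

Returning the individual criteria as a list, rather than only the final flag, makes it straightforward to inspect which domains prevent a patient from reaching MDA.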
The PASDAS response rate was calculated at months 3, 6, and 12 (OPAL Broaden only for month 12) as the percentage of patients who had a good response (defined as a PASDAS score of less than or equal to 3.2 and a decrease from baseline in PASDAS score of greater than or equal to 1.6 at the relevant time point for patients with a baseline PASDAS score of greater than 3.2 in the FAS) [17]. Non-responder imputation was applied, and a missing response was treated as non-response.
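The responder rule and non-responder imputation described above can be sketched as follows. This is an illustrative sketch only: the function names and the representation of a missing assessment as None are assumptions made for the example, not details taken from the study protocols.

```python
from typing import List, Optional, Tuple


def pasdas_good_response(baseline: float, followup: Optional[float]) -> bool:
    """Good response: score <= 3.2 and a decrease from baseline >= 1.6.

    A missing follow-up score (None) is treated as non-response,
    mirroring non-responder imputation.
    """
    if followup is None:
        return False
    return followup <= 3.2 and (baseline - followup) >= 1.6


def pasdas_response_rate(records: List[Tuple[float, Optional[float]]]) -> float:
    """Percentage of good responders among patients with baseline PASDAS > 3.2."""
    eligible = [(b, f) for b, f in records if b > 3.2]
    if not eligible:
        raise ValueError("no patients with baseline PASDAS > 3.2")
    responders = sum(pasdas_good_response(b, f) for b, f in eligible)
    return 100.0 * responders / len(eligible)


# Hypothetical (baseline, follow-up) pairs, not study data.
records = [
    (6.0, 3.0),   # responder: score <= 3.2 and decrease of 3.0
    (5.5, 4.5),   # non-responder: score above 3.2
    (4.0, 3.1),   # non-responder: decrease of only 0.9
    (6.2, None),  # missing follow-up, imputed as non-response
    (3.0, 1.0),   # excluded: baseline not greater than 3.2
]
print(pasdas_response_rate(records))
```

Keeping patients with missing follow-up in the denominator is what makes this approach conservative: dropouts lower the response rate rather than being ignored.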
The derivation of the composite endpoints was pre-specified in the original study protocols and statistical analysis plans; except for analysis using a repeated measures model, all analyses were performed post hoc. P values are reported, without adjustment for multiplicity, for comparisons with placebo in repeated measures model analyses and for testing slope coefficients in multiple logistic regression analyses. Statistical significance was defined as a two-sided P value of less than or equal to 0.05.
Discussion
In the phase 3 studies OPAL Broaden and OPAL Beyond, patients with active PsA receiving tofacitinib 5 and 10 mg BID showed improvements versus placebo throughout the 3-month placebo-controlled period for the composite endpoints assessed. These improvements were subsequently maintained to month 6 in OPAL Beyond and month 12 in OPAL Broaden. Adalimumab had comparable efficacy to tofacitinib across the composite endpoints in OPAL Broaden.
OPAL Broaden and OPAL Beyond involved two distinct populations of patients with PsA: csDMARD-IR/TNFi-naïve patients in OPAL Broaden and TNFi-IR patients in OPAL Beyond. Despite the difference in patient populations, baseline values for the composite endpoints were broadly similar across studies and treatments. Generally, least squares (LS) mean changes from baseline were greater, and the effect size and standardized response mean were higher, in OPAL Broaden than in OPAL Beyond. This suggests that the TNFi-naïve patients in OPAL Broaden showed more marked treatment responses than the TNFi-IR patients in OPAL Beyond, similar to previous reports for PsA treatment [18–20].
PASDAS baseline scores in OPAL Broaden were comparable with values reported in an equivalent study population [3]; however, along with the PASDAS baseline scores in OPAL Beyond, they were somewhat higher than those reported in a study of standard care [21] and patients in clinical practice [22]. In the GRACE (GRAPPA Composite Exercise) study, designed to develop composite disease activity and responder measures for PsA, a mean score of 5.30 for PASDAS was reported for patients changing treatment and this was taken as a surrogate for high disease activity [11]. The mean baseline PASDAS levels reported in this study were therefore suggestive of high disease activity in both OPAL Broaden and OPAL Beyond, and following 3 months of treatment, PASDAS levels dropped below this threshold. In addition, the GRACE study defined a good response as a PASDAS score of less than or equal to 3.2, following a decrease in score of greater than or equal to 1.6 from baseline [17]; in this study, this was achieved at month 12 in OPAL Broaden by 44.2% and 47.5% of patients receiving tofacitinib 5 and 10 mg BID, respectively, and at month 6 in OPAL Beyond by 28.5% and 28.9% of patients receiving tofacitinib 5 and 10 mg BID, respectively. Of note, a PASDAS score of less than or equal to 3.2 has been defined as low disease activity [17] and less than or equal to 1.9 as very low disease activity [23].
OPAL Broaden DAPSA baseline scores were slightly lower than baseline scores in an equivalent study population [3] but higher than reported in clinical practice [24]. In the GRACE study, patients changing treatment (considered to have high disease activity) had a mean DAPSA score of 41.91 [11], suggesting that patients in OPAL Broaden and OPAL Beyond had high levels of disease activity. Indeed, in a recent study analyzing data from 30 patients with PsA in an observational database, the cutoff for a DAPSA score indicating high disease activity was greater than 28 [25]. In this study, mean DAPSA scores were below the high disease activity score reported in the GRACE study after 3 months of active treatment in all groups [11].
In contrast to the findings with the other composite measures, the baseline CPDAI scores reported for OPAL Broaden and OPAL Beyond were somewhat lower than the mean CPDAI score of 11.65 reported for patients changing treatment (surrogate for high disease activity) in the GRACE study [11]; thus, CPDAI scores did not appear to indicate patients with high baseline disease activity in these patient populations. However, another study has suggested a high disease activity threshold of greater than 7 for CPDAI [26]; mean CPDAI scores were below this threshold after 3 months of active treatment across all groups and both studies.
The DAS28–3(CRP) was included for comparative purposes only. Baseline DAS28–3(CRP) scores were somewhat higher than the mean DAS28–3(CRP) score of 3.96 observed for patients changing treatment (a surrogate for high disease activity) in the GRACE study [11]; however, DAS28–3(CRP) scores in this study were reduced below this level following 3 months of treatment. It should be noted, however, that this measure was developed and validated for rheumatoid arthritis and there are several reasons why it is inappropriate as a composite measure for assessing PsA, particularly as it measures only articular outcomes and excludes joints of the foot and ankle, potentially missing important inflammatory disease [27].
All reported effect size and standardized response mean values were greater than 0.80, the value generally taken to indicate a large treatment effect or response [3]. Across all time points and treatments, the largest effect size was observed for PASDAS; this is consistent with findings reported for golimumab [3]. Effect size and standardized response mean generally increased with time on treatment, indicating that the composite endpoints demonstrated time-dependent improvement, as might be expected. Analysis of the percentage of PASDAS responders over time also demonstrated the ability of the PASDAS instrument to detect treatment-related changes in PsA disease activity.
The definition of MDA using the criteria applied in this analysis and in previous tofacitinib publications [9, 10] has utility for identifying treatment response and as such may be used as a target to guide treatment decisions [16]. When the standardized slope coefficients of the composite endpoints (STBs) from a multiple logistic regression model were compared, the change in PASDAS had the largest magnitude of association with MDA response among all the composite endpoints examined, suggesting that it had the strongest predictive ability compared with DAPSA and CPDAI; CPDAI had the lowest predictive ability of the endpoints.
The differing findings with respect to tofacitinib treatment for the three disease-specific composite endpoints considered in this analysis could have resulted from the different composition of the endpoints evaluated. The PASDAS and CPDAI both include assessment of the skin manifestations of PsA (the PASDAS by inclusion of the patient’s global “arthritis and psoriasis” VAS) and the severity of enthesitis and dactylitis as well as TJC and SJC. DAPSA, however, is focused on TJC and SJC, with no consideration of skin disease, enthesitis, or dactylitis and an arthritis-focused global score. The PASDAS and CPDAI also both incorporate PROs; the PASDAS incorporates the PCS score of the SF-36v2 acute, and the CPDAI the DLQI and ASQoL. In this analysis, the PASDAS appeared to be the most sensitive to improvements in the signs and symptoms of PsA related to treatment with tofacitinib and adalimumab; the effect size observed with the PASDAS was higher than for any other endpoint at all time points in both studies. The ability of the PASDAS to detect change in these two studies might reflect the components of the measure; skin manifestations, enthesitis, dactylitis, and PROs all appeared to be sensitive to treatment-related changes in OPAL Broaden and OPAL Beyond, although the adoption of a hierarchical testing scheme for key secondary endpoints precluded demonstration of significance for all measures and time points [9, 10]. The CPDAI also incorporates skin, enthesitis, dactylitis, and PROs but appeared less sensitive to treatment differences than PASDAS though with generally higher effect size and standardized response mean than DAPSA. Inclusion of the axial disease domain in CPDAI (which does not feature in the other composite endpoints assessed) could offer an explanation as to why tofacitinib had the least impact on this composite; it may be that axial disease responds to a lesser extent than the other domains to treatment with tofacitinib and this may have impacted the final composite score. The CPDAI may also be less responsive because of the way it is constructed: the CPDAI is essentially a categorical measure re-expressed as a continuous scale and the hierarchical thresholds may blunt responsiveness. As previously discussed, the utility of DAS28–3(CRP) is limited because of the small number of components included in the composite and the lack of inclusion of measures of skin disease, enthesitis, dactylitis, or PROs.
It is clear from these analyses that PASDAS has superior performance in this context and it has already been reported that the consensus view is that PASDAS should be the outcome measure of choice in PsA clinical trials [28]. The DAPSA is easier to evaluate but there are arguments against this measure; PsA is a complex multifaceted disease which requires appropriate evaluation across domains, and measures such as the DAPSA, though easy to perform in practice, do not fulfill this function. In terms of clinical practice, the PASDAS does provide a challenge in both acquiring the data and processing the result: the first challenge represents the general case of clinical assessment in PsA; the second challenge is easily overcome by the use of predefined spreadsheets and web-based resources.
This analysis had a number of limitations. The OPAL Broaden and OPAL Beyond studies were not designed for evaluation of the composite endpoints’ longitudinal validity and sensitivity to change. In addition, for the calculation of effect size and standardized response mean, only patients with greater than or equal to 3% psoriasis BSA affected at baseline were included, with no missing values of the composite endpoints across multiple visits. Consequently, patient numbers were relatively low in some cases; CPDAI data were available for only 63% and 52% of patients receiving tofacitinib 10 mg BID in OPAL Broaden at month 12 and OPAL Beyond at month 6, respectively, and effect size and standardized response mean were calculated in only 47–64% of patients. Also, there was no adjustment for multiplicity; therefore, the P values reported for comparison with placebo should be considered nominal.
Acknowledgments
The authors thank the patients who participated in the OPAL Broaden, OPAL Beyond, and OPAL Balance clinical studies. Medical writing support under the guidance of the authors was provided by Richard Knight, of CMC Connect, a division of Complete Medical Communications Ltd (Macclesfield, UK), and Carole Evans, on behalf of CMC Connect, and was funded by Pfizer Inc (New York, NY, USA) in accordance with Good Publication Practice (GPP3) guidelines (Ann Intern Med. 2015;163:461−4).
Competing interests
PH has received research grants from AbbVie, Janssen, and Pfizer Inc and has received personal fees from AbbVie, Amgen, Janssen, Pfizer Inc, and UCB. LCC has received research grants from AbbVie, Celgene, Janssen, Novartis, and Pfizer Inc and has received personal fees from AbbVie, Amgen, BMS, Celgene, Galapagos, Janssen, Lilly, MSD, Novartis, Pfizer Inc, Prothena, and UCB. OF has received grants from AbbVie, BMS, and Pfizer Inc and has received personal fees from BMS, Celgene, Janssen, Novartis, Pfizer Inc, and UCB. PN has received grants or fees for participating as a speaker or in advisory boards from AbbVie, BMS, Janssen, Lilly, MSD, Novartis, Pfizer Inc, Roche, Sanofi, and UCB. ERS has received grants or personal fees for participating as speaker or in advisory boards from AbbVie, BMS, GlaxoSmithKline, Janssen, Lilly, Novartis, Pfizer Inc, Roche, Sandoz, and UCB. MEH has received consultant or personal fees for participating in advisory boards from AbbVie, Genzyme/Sanofi, Janssen, Lilly, Novartis, and UCB. MAH, KSK, TH, JW, and EK are employees of Pfizer Inc and hold stock/stock options in Pfizer Inc.