As mentioned above, two identical trials were designed to evaluate the efficacy and safety of enobosarm for the prevention and treatment of muscle wasting in patients with NSCLC undergoing first-line platinum-based chemotherapy with either a taxane (POWER 1) or a non-taxane (POWER 2). Allowing a range of standard chemotherapy regimens lets the treatment of the study population reflect standard of care at the community level, including the most common chemotherapy in this setting. Additionally, similar outcomes were expected across these chemotherapy regimens in terms of toxicity and clinical response within each trial, reducing potential heterogeneity associated with treatment within the taxane trial and within the non-taxane trial. First-line treatment with tyrosine kinase inhibitors is prohibited in these studies to maintain a homogeneous patient population in terms of first-line chemotherapy and to avoid any potential concerns related to the ability of tyrosine kinase inhibitors to exacerbate muscle wasting [23]. Importantly, this exclusion criterion still allows study participation by the majority of patients with NSCLC undergoing first-line treatment and allows subjects to receive tyrosine kinase inhibitors if clinically warranted after potential tumor progression during the trials (failed first-line chemotherapy).
As muscle wasting has multifactorial etiologies that differ depending on the type of malignancy with which it is associated (e.g., pancreatic cancer vs head and neck cancer vs esophageal cancer), the success of a phase 3 clinical trial depends on further limiting heterogeneity by studying one specific tumor type at a time. NSCLC was chosen as a representative cancer for these phase 3 studies primarily because lung cancer is the leading cause of cancer death in the western world, including the USA [24], and up to 85–90 % of lung cancer cases are NSCLC [25]. There were approximately 1.8 million new cases of lung cancer reported worldwide in 2012 [26]. Approximately 50 % of patients with NSCLC and greater than 60 % of men with NSCLC have already developed severe muscle wasting by the time their malignancy is diagnosed [4]. Moreover, in the preceding phase 2b trial of enobosarm in patients with NSCLC, significant losses in LBM occurred over the course of the study (4 months) in the placebo arm, while enobosarm improved physical function and LBM.
Importantly, the FDA agrees that NSCLC is an appropriate cancer type to target in the phase 3 trials, as these patients would likely have a median survival of sufficient duration to measure the effect of the therapy. Because other cancer types (e.g., pancreatic cancer) are associated with more aggressive muscle wasting, their shorter overall survival represents a potential challenge for a 5-month intervention.
Rationale for Physical Function Tests
Physical function tests have been utilized in the approval of medications to treat diseases associated with functional limitations such as multiple sclerosis, pulmonary arterial hypertension, and HIV-associated wasting [29–31].
The stair climb test was chosen for these trials because it reflects everyday living and is associated with strength, balance, mobility, speed, and endurance [32]. Stair climb power has been a physical function method of choice in previously conducted clinical studies in populations with or at risk for muscle loss (including NSCLC patients). It is a simple and safe measure associated with measures of lower-limb muscle strength and power and functional performance in older adults [33]. Decreases in stair climb power in elderly patients have been associated with detrimental changes in balance and with falls, morbidity, and mortality, whereas increases have been associated with improvements in QoL [32].
Stair climb power is calculated as power (watts) = work/time = force × velocity. In addition to strength, it takes into consideration a constellation of muscle-related attributes including balance, mobility, and endurance [32]. Due to the level of physical intensity required to climb stairs, increases in stair climb power should equate to similar or greater improvements in other less physically intense daily activities that are either short in duration or utilize smaller muscle groups (e.g., walking a short distance, rising to a standing position from a chair, or lifting or carrying household items). Furthermore, the stair climb test is a direct, well-accepted, reproducible, portable, and objective measure of physical function [32].
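The power formula above can be illustrated with a minimal sketch. It assumes that force is body weight (mass × gravitational acceleration) and that velocity is the vertical height climbed divided by the time taken; the function names, the value of g, and the example inputs are illustrative assumptions and are not taken from the trial protocol.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def stair_climb_power(body_mass_kg, vertical_height_m, time_s):
    """Return stair climb power in watts as force x velocity.

    Assumes force = body mass x g and velocity = vertical height / time.
    """
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    force_newtons = body_mass_kg * G              # force exerted against gravity
    velocity_m_per_s = vertical_height_m / time_s  # vertical velocity
    return force_newtons * velocity_m_per_s


def percent_change(baseline, follow_up):
    """Return the percent change from baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (follow_up - baseline) / baseline


# Hypothetical subject: 70 kg, 2.04 m vertical climb, 5.0 s at baseline
# and 4.5 s at follow-up.
baseline_power = stair_climb_power(70.0, 2.04, 5.0)
follow_up_power = stair_climb_power(70.0, 2.04, 4.5)
print(round(baseline_power, 1), round(percent_change(baseline_power, follow_up_power), 1))
```

Because body mass and stair height are fixed for a given subject, a faster climb translates directly into proportionally higher power under these assumptions.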
Regardless of the physical function test used, thresholds of clinically meaningful change have been established. A minimally clinically meaningful change in physical function is a 5 % increase from baseline and a substantial clinically meaningful change is a 10 % increase from baseline. Published literature in healthy elderly and mobility-limited subjects has correlated measures of physical function with clinically meaningful changes as established in the Short Physical Performance Battery (SPPB). In a large randomized trial with adults aged 70–89 years (N = 424), Kwon et al. utilized 400-m walk and gait speed and demonstrated that a 4–4.5 % improvement in physical function translates into a “minimally meaningful” change while an improvement of 10 % represents a “substantial meaningful” change [27]. Perera et al. showed similar results with gait speed, 6-min walk distance, and self-reported mobility in older adults with mobility disabilities (N = 492), concluding that an improvement in performance of 6–8 % represents a “small meaningful change” and 11–17 % a “substantial meaningful change” [28]. These studies define thresholds for “minimally meaningful” and “substantial meaningful” clinical change that can be applied regardless of the physical function test used.
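As a small illustration of how the 5 % and 10 % thresholds stated above might be applied to an individual subject, the sketch below classifies a percent change from baseline. The category labels and the choice to treat the thresholds as inclusive are assumptions made for this example.

```python
MINIMAL_THRESHOLD_PCT = 5.0       # minimally clinically meaningful increase
SUBSTANTIAL_THRESHOLD_PCT = 10.0  # substantial clinically meaningful increase


def classify_change(baseline, follow_up):
    """Classify the percent change from baseline in a physical function measure.

    Thresholds are treated as inclusive (an assumption for this sketch).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    pct = 100.0 * (follow_up - baseline) / baseline
    if pct >= SUBSTANTIAL_THRESHOLD_PCT:
        return "substantial"
    if pct >= MINIMAL_THRESHOLD_PCT:
        return "minimal"
    return "below threshold"


# Hypothetical stair climb power values in watts.
for follow_up in (204.0, 212.0, 220.0):
    print(follow_up, classify_change(200.0, follow_up))
```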
Additional Assessments
CT scans with contrast of the chest and abdomen, including the liver and adrenal glands, will also be performed at the day-84 and day-147 visits to assess tumor status. If subjects have brain metastases at baseline, CT or MRI will also be performed at these time points.
Quality of life will be assessed using five different tools: the Functional Assessment of Anorexia and Cachexia Therapies (FAACT-12®), the Functional Assessment of Chronic Illness Therapy–Fatigue Scale (FACIT Fatigue Scale®), Patient-Reported Outcomes Measurement Information System (PROMIS®) Physical Functioning Short Form 10a, PROMIS® Emotional Distress-Depression Short Form 8b, and the EQ-5D-5L™ (Table 2). These QoL instruments are intended to provide insights regarding how the patients’ health, physical functioning, and ability to care for themselves have been affected by their disease, as well as their level of emotional distress and fatigue and perceptions about the importance of various disease characteristics.
Statistical Methods
The two trials differ only in the choices of chemotherapy. Both trials will evaluate the same endpoints of lean body mass and stair climb power. After discussion with US and European regulatory authorities, it was decided that different methods of analysis of the SCP and LBM end points would be used in the two regions.
For the US authorities, a responder analysis will be performed for LBM and SCP as coprimary endpoints.
For European authorities, SCP will be the primary endpoint and LBM secondary. Both will be analyzed by longitudinal analysis of percent change from baseline through days 84 and 147. The two analysis approaches have different strengths and weaknesses but are complementary for understanding the treatment outcomes for enobosarm in the population; they are described further below.
The design proposed to US regulatory authorities is a responder analysis consisting of co-primary endpoints, one for physical function and one for LBM. Physical function response is defined as ≥10 % improvement from baseline to day 84, and LBM response is defined as no loss of LBM from baseline to day 84. Non-response is a failure to meet the response definition or not having the day-84 assessment for any reason. Missing data are accounted for by this definition of non-response. The design assumes a proportion of response among treated subjects of 0.20 above the control response proportion for each of the physical function and LBM endpoints. Retrospective application of the response definitions to the subset of NSCLC subjects in the predecessor phase 2b trial showed that the maximum control response was 25 % for the LBM endpoint (19 % for SCP), which was inflated to 30 % for planning, and that the difference in the proportions responding was approximately 0.20 for both endpoints. With α = 0.05 and power = 90 %, the required sample size was 124 subjects per arm. Built into this computation was the assumption that 30 % of subjects would be considered non-responders due to missing the day-84 primary endpoint assessment. FDA requested that the sample size be increased to 150 subjects per arm for the purposes of the safety database. The 0.20 difference in proportion of response between the two arms covers a wide range of possible control response proportions, so that control response proportions from 0.20 to 0.75 all have power >90 % to detect a 0.20 difference at α = 0.05 with 150 subjects per arm. Specifically, at the aforementioned control response proportion of 0.30, power is 93.3 %. Overall study success for US purposes is defined as rejecting the null hypothesis for both primary endpoints using a two-sided type I error probability of 0.05 for each.
Considering the need for both co-primary endpoints to be statistically significant, the power for each trial is at least 86.5 % under the assumption of no correlation between endpoints.
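The order of magnitude of these power figures can be checked with a simple normal-approximation sketch for comparing two proportions. This is not the exact stratified method planned for the trials, so it is not expected to reproduce the quoted 93.3 % and 86.5 % exactly (it tends to give somewhat higher values). The function names are illustrative, and the joint power simply multiplies the two marginal powers, which corresponds to the stated assumption of no correlation between endpoints.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided pooled z-test for two proportions.

    Normal approximation with equal arm sizes; the negligible probability of
    rejecting in the wrong direction is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treated) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    )
    delta = abs(p_treated - p_control)
    return NormalDist().cdf((delta - z_crit * se_null) / se_alt)


def joint_power_independent(power_a, power_b):
    """Power to reject both null hypotheses if the endpoints are uncorrelated."""
    return power_a * power_b


# Control response proportions from 0.20 to 0.75, each with a 0.20 difference.
for pct in range(20, 80, 5):
    p0 = pct / 100
    marginal = power_two_proportions(p0, p0 + 0.20, n_per_arm=150)
    print(p0, round(marginal, 3), round(joint_power_independent(marginal, marginal), 3))
```

Under this approximation, power stays above 90 % across the whole range of control response proportions, which is consistent with the statement above.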
Each endpoint will be tested separately for significance using a Monte Carlo approximation to the exact Cochran-Mantel-Haenszel test stratified by chemotherapy regimen (platinum plus paclitaxel or platinum plus docetaxel for POWER 1 and platinum plus gemcitabine or platinum plus pemetrexed or platinum plus vinorelbine for POWER 2), gender, and cancer stage (III or IV). Importantly, a patient may be a responder for one endpoint but not the other. Because all subjects randomized and treated will have a response classification, the primary analysis is intent-to-treat.
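The idea behind a Monte Carlo approximation to a stratified exact test can be sketched as follows. Treatment labels are randomly reassigned within each stratum, which keeps the stratum margins fixed, and the number of responders in the treated arm summed over strata is compared with its observed value. The two-sided p-value definition (absolute deviation from the null expectation), the function name, and the input layout are assumptions for this sketch; the software and exact p-value definition used in the trials are not stated in the text.

```python
import random
from collections import defaultdict


def monte_carlo_stratified_test(records, n_perm=5000, seed=12345):
    """Monte Carlo p-value for association between treatment and response.

    records: iterable of (stratum, treated, responder) with boolean flags.
    Labels are permuted within strata, so stratum margins stay fixed.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for stratum, treated, responder in records:
        by_stratum[stratum].append((int(treated), int(responder)))

    observed = 0     # responders in the treated arm, summed over strata
    expected = 0.0   # null expectation of that sum given the margins
    layout = []      # per stratum: (response indicators, number treated)
    for rows in by_stratum.values():
        responses = [r for _, r in rows]
        n_treated = sum(t for t, _ in rows)
        observed += sum(t * r for t, r in rows)
        expected += n_treated * sum(responses) / len(rows)
        layout.append((responses, n_treated))

    observed_deviation = abs(observed - expected)
    extreme = 0
    for _ in range(n_perm):
        stat = 0
        for responses, n_treated in layout:
            # Draw a random set of "treated" subjects within the stratum.
            stat += sum(rng.sample(responses, n_treated))
        if abs(stat - expected) >= observed_deviation - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)
```

In the trial setting, each stratum would be one combination of chemotherapy regimen, gender, and cancer stage, and a subject missing the day-84 assessment would enter as a non-responder.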
Analyses associated with the continuous form of the data, rather than the dichotomous form used in the responder analyses, will be used to analyze secondary endpoints. As noted, the percentage change in stair climb power will be the sole primary endpoint for European regulatory authorities. Random coefficients models (RCM), also termed mixed model repeated measures (MMRM) analyses, will be used in order to include all available data, including the day-42 assessment, for the physical function endpoint, including replicates, study day of assessment, treatment arm, and the interaction between treatment arm and study day of assessment. A significant interaction (p < 0.05) would indicate significantly different slopes (rates of change per day) between the enobosarm-treated arm and the placebo arm. The same methodology will be applied to the continuous form of the LBM data; however, unlike the physical function test, there are no replicates of DXA results at each time point. For each of the physical function and LBM endpoints, an MMRM analysis that compares the mean of post-baseline measures between the two arms will be undertaken as well. Hochberg’s methodology for controlling alpha will be applied, so that, e.g., if the slope analysis is not significant for, say, physical function, testing will move to the post-baseline mean analysis, but the alpha level required for a significant result will be α = 0.025; if that level is met, the physical function endpoint will be considered significant. Hierarchical testing will be used to control alpha across all of the secondary endpoints so that if an endpoint is deemed non-significant, i.e., both the slope and post-baseline mean analyses are non-significant, formal statistical testing will halt, and all further secondary endpoints below that endpoint in the hierarchy will be considered non-significant.
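The combination of Hochberg's procedure within an endpoint and hierarchical testing across endpoints can be sketched as below. The sketch uses the standard Hochberg step-up rule applied to the two p-values (slope and post-baseline mean) of each endpoint, and treats an endpoint as significant if either analysis is rejected. The function names, endpoint names, and p-values are hypothetical, and the trials' statistical analysis plan may differ in detail.

```python
def hochberg(p_values, alpha=0.05):
    """Standard Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices ordered from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The largest p-value is compared with alpha, the next with alpha/2, ...
        if p_values[idx] <= alpha / (rank + 1):
            # Reject this hypothesis and all with smaller p-values.
            for j in order[rank:]:
                rejected[j] = True
            break
    return rejected


def hierarchical_testing(endpoints, alpha=0.05):
    """Test endpoints in a fixed order and stop at the first non-significant one.

    endpoints: ordered list of (name, p_slope, p_mean).
    """
    results = {}
    halted = False
    for name, p_slope, p_mean in endpoints:
        if halted:
            results[name] = False  # not formally tested after a failure
            continue
        significant = any(hochberg([p_slope, p_mean], alpha))
        results[name] = significant
        halted = not significant
    return results


# Hypothetical p-values: (endpoint, slope analysis, post-baseline mean analysis).
example = [("endpoint_1", 0.20, 0.02), ("endpoint_2", 0.30, 0.04), ("endpoint_3", 0.001, 0.001)]
print(hierarchical_testing(example))
```

In the example, the first endpoint passes because its post-baseline mean p-value meets the 0.025 level, the second fails both analyses, and the third is therefore not declared significant despite its small p-values.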
Responder analyses will be undertaken for QoL instruments that have individual changes that are considered a response, as already described for the coprimary endpoints. For quality-of-life instruments without a defined response, appropriate parametric or nonparametric tests will be used to compare differences in distributions. Additional sensitivity and subgroup analyses are planned.
Survival analysis will be conducted as a predefined safety endpoint of the clinical program to ensure that there is no detrimental effect of enobosarm on the underlying cancer. Additionally, the survival outcome data will be pooled to formally assess superior survival in the enobosarm group compared with placebo. The total number of patients expected for the survival assessment is 600; the survival assessment is event-driven and will require at least 450 deaths. Survival will be estimated by the Kaplan–Meier method; differences in survival distributions will be compared with a stratified log-rank test, stratified by chemotherapy (taxane, non-taxane; effectively stratifying by trial), sex, and stage. It is projected that the 450 deaths will be realized at approximately 2.84 years after accrual of the first patient. Although the trials are not prospectively powered to detect a survival difference, if median survival in the combined placebo arm is assumed to be 1 year and uniform accrual of all 600 patients occurs in 1 year (both trials starting at approximately the same time), then the test would have 86.6 % power to detect a hazard ratio of ≤0.75. An observed hazard ratio estimate of ≤0.831 would be associated with a significant (p < 0.05) result and lead to a conclusion of survival superiority.
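The relationship between the number of deaths, the detectable hazard ratio, and the critical hazard ratio can be approximated with Schoenfeld's formula for the log-rank test. The sketch below assumes 1:1 allocation and a two-sided α of 0.05; it is a standard approximation and not necessarily the method used for the figures above, so the computed power is close to, but not identical to, the quoted 86.6 %. The function names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def logrank_power(hazard_ratio, n_events, alpha=0.05):
    """Approximate log-rank power (Schoenfeld), assuming 1:1 allocation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # With equal allocation, the log hazard ratio has standard error 2/sqrt(d).
    z_effect = sqrt(n_events) / 2 * abs(log(hazard_ratio))
    return NormalDist().cdf(z_effect - z_crit)


def critical_hazard_ratio(n_events, alpha=0.05):
    """Largest observed hazard ratio (< 1) that is still significant."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return exp(-2 * z_crit / sqrt(n_events))


print(round(logrank_power(0.75, 450), 3))
print(round(critical_hazard_ratio(450), 3))
```

With 450 deaths, the critical hazard ratio from this approximation agrees with the 0.831 stated above to three decimal places.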