Assessments of pretreatment measurements
The potential for item reduction was assessed by applying the item-inclusion criteria from the original validation study [6]. Criteria were applied to the IBS-QOL instrument to see if differences exist between the current large IBS-d sample and the smaller, original, non-subtyped IBS validation sample. The criteria assessed included:
- 50% of patients responded “not at all” and therefore could not improve on the item
- 5% or more missing data
- an item-to-total correlation of <0.4 indicating that the item may be measuring a different latent construct
- pairwise correlations between individual items that exceeded 0.7 indicating redundancies in measurement.
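The four screening criteria above can be sketched in a few lines of code. This is an illustrative sketch only, not the code used for the analyses: the function names, the coding of “not at all” as 1, the use of complete cases for the correlations, and the uncorrected item-to-total correlation (the item is included in the total) are all assumptions made here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either sequence is constant


def screen_items(responses, floor_value=1):
    """Flag items against the four item-reduction criteria.

    responses: dict of item name -> list of scores (None = missing),
    one entry per patient, in the same patient order for every item.
    floor_value: score coded as "not at all" (assumed to be 1 here).
    """
    names = list(responses)
    n = len(responses[names[0]])
    # Correlation-based criteria use patients with no missing items.
    complete = [i for i in range(n)
                if all(responses[k][i] is not None for k in names)]
    totals = [sum(responses[k][i] for k in names) for i in complete]
    flags = {k: [] for k in names}
    for k in names:
        observed = [v for v in responses[k] if v is not None]
        if sum(v == floor_value for v in observed) / len(observed) >= 0.50:
            flags[k].append("floor")
        if (n - len(observed)) / n >= 0.05:
            flags[k].append("missing")
        if pearson([responses[k][i] for i in complete], totals) < 0.4:
            flags[k].append("low item-total")
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            r = pearson([responses[a][i] for i in complete],
                        [responses[b][i] for i in complete])
            if r > 0.7:
                flags[a].append(f"redundant with {b}")
                flags[b].append(f"redundant with {a}")
    return flags
```

A corrected item-to-total correlation (total computed without the item) would be a reasonable alternative; the text does not state which variant was used.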
The original factor structure of the IBS-QOL and possible alternative subscale structures were assessed. Several Confirmatory Factor Analysis (CFA) models were fit via maximum likelihood. Diagrams outlining the different conceptual models are included in Additional file 3: Figure S1. The first model corresponds to the original PCA and assumes orthogonal factors adequately measure independent subdomains of IBS-related QOL. The likelihood ratio χ², Akaike Information Criterion (AIC), and Schwarz’s Bayesian Information Criterion (BIC) were used as indicators of model fit; smaller values are generally considered better. A second, hierarchical CFA was fit which imposed the original structure but also assumed that the subscales themselves form a generalized factor [12], presumably the latent construct of HRQOL in IBS. Hierarchical factor analyses employ a two-step approach; the items are grouped into factors and then the factors are submitted to factor analysis. The third and fourth models fit were confirmatory bi-factor models [13]. The bi-factor approach was employed to refine the strictly hierarchical HRQOL conceptualizations as this method allows for a structure whereby subscales may explain variance not necessarily associated with the general QOL factor. Items were organized into 8 subscales and a general factor with each item having hypothesized relationships to one subscale and the general factor [14]. For example, Item 12 is hypothesized to load onto the Sexual subscale and also the general construct of HRQOL in IBS. The two bi-factor models included: one with orthogonal factors, i.e., in which the model does not allow factors to correlate with one another; and an oblique one in which correlations between factors are allowed. A single factor model in which all items load onto a single factor was also fit for reference to the other CFA models. Such a model imposes a structure in which all items load onto one general factor representing HRQOL in IBS.
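To make the "smaller is better" comparison concrete, the information criteria can be sketched from a model's maximized log-likelihood. The functions below use the textbook definitions; the function and argument names are hypothetical, and SEM software often reports variants built from the model χ², which shift every model fit to the same data by a constant and so usually leave the ranking unchanged.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 logL + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Schwarz's Bayesian Information Criterion: -2 logL + k ln(n)."""
    return -2.0 * log_likelihood + n_params * log(n_obs)


def rank_models(fits, n_obs):
    """Sort candidate models by BIC, smallest (preferred) first.

    fits: dict of model label -> (log_likelihood, n_params).
    """
    return sorted(fits, key=lambda m: bic(fits[m][0], fits[m][1], n_obs))
```

Because BIC's penalty grows with the sample size, it tends to favor more parsimonious models than AIC in large samples such as the one analyzed here.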
Because CFA models involve fitting complex multivariate data, model fit is evaluated by inspecting several fit indices [15]. Numerous fit indices have been suggested, but minimally, a CFA model should be evaluated for fit based on a combination of different indices [16] as each assesses different aspects of the model. Indices fit included:
To investigate possible misspecification of any CFA models, an Exploratory Factor Analysis (EFA) of the items was conducted to evaluate what structure would be suggested by the current data sample. EFA imposes no a priori structure on the data [12] and is similar to the approach taken by Patrick, et al. [6].
So-called internal consistency reliability was assessed by computing Coefficient α for the 34-item IBS-QOL total score as well as the Coefficient α-value for all (n-1) combinations, i.e., the so-called α-if-item-deleted, to gauge influence of single items. Values of α above 0.7 indicate a good level of consistency with values above 0.9 being considered excellent. Extremely high values can call into question whether scale items could be eliminated because of redundancy.
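As an illustration of the calculation described above, Coefficient α and the α-if-item-deleted values can be sketched as follows. This is a minimal sketch using the standard formula on complete cases; the function names and the list-of-columns data layout are assumptions made here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a list of item columns (complete cases).

    items: list of k lists, each holding one item's scores for the
    same patients in the same order.
    """
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # per-patient sum score
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed k times, leaving out one item each time."""
    return [cronbach_alpha(items[:i] + items[i + 1:])
            for i in range(len(items))]
```

An item whose deletion raises α noticeably is contributing less to the common variance than the others; an item whose deletion leaves a very high α essentially unchanged may be redundant.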
Assessments including postbaseline measurements
Consistency of the IBS-QOL total score over time is usually evaluated by correlating responses over repeated measurements. All administrations of the IBS-QOL post Baseline were in the presence of treatment; thus, a traditional ICC would be biased by treatment in the current case. To account for treatment effects and time trajectories on the IBS-QOL total score, reliability was assessed by first estimating variances via a linear model and utilizing the resultant conditional variances to establish reliability. Such an approach has been developed and described in two papers by Laenen, et al. [18, 19]. Their reliability measures, RΛ and RT, utilize estimated variances from a linear model to calculate reliability over the set of repeated measurements, conditional on the covariates. Thus, in lieu of calculating an ICC, reproducibility was assessed via fitting longitudinal models to the repeated administrations of the IBS-QOL accounting for treatment effect over the treatment period. Details of the approaches are given in Additional file 4.
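For orientation, the two measures can be written in terms of the covariance matrices implied by a linear mixed model. The block below is a sketch based on our reading of the cited Laenen et al. papers [18, 19], not a restatement of Additional file 4; the symbols Z_i, D, Σ_i and V_i are notation assumed here.

```latex
% Linear mixed model for the repeated total scores of patient i:
%   Y_i = X_i \beta + Z_i b_i + \varepsilon_i,
%   b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \Sigma_i)
\begin{align}
V_i &= Z_i D Z_i^{\top} + \Sigma_i
      && \text{(total covariance, conditional on covariates)} \\
R_T &= 1 - \frac{\operatorname{tr}(\Sigma_i)}{\operatorname{tr}(V_i)}
      && \text{(error variance relative to total variance)} \\
R_\Lambda &= 1 - \frac{\lvert \Sigma_i \rvert}{\lvert V_i \rvert}
      && \text{(determinant-based, multivariate analogue)}
\end{align}
```

With a single measurement and a random intercept only, both expressions reduce to the familiar ratio of between-patient variance to total variance, which is why they are comparable to an ICC.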
Construct validity of the IBS-QOL total score was assessed by evaluating it in relation to other clinical outcomes. For the IBS-SSS and EQ-5D, Pearson correlations at Baseline and Week 12 were calculated. Since the scale of the IBS-SSS is opposite to that of the IBS-QOL, a negative correlation between it and the IBS-QOL total score indicates convergence. A positive correlation with the EQ-5D indicates IBS-QOL converging with general HRQOL.
Further, change from Baseline to Week 12 in IBS-QOL total scores was correlated with similar changes from Baseline for IBS-SSS, EQ-5D, and average worst abdominal pain (WAP) [7]. The change score for the WAP variable was calculated both as the average of WAP ratings for Weeks 11 and 12 compared to the average for the two weeks prior to dosing and as the average of WAP for Week 12 compared to the average in the week prior to dosing.
Additionally, correlations of the IBS-QOL total score with the IBS-AR and FDA Clinical Responder status were calculated. The IBS-AR is a historically-used global measure of change used for assessing relief in IBS. A single item, “Over the past week have you had adequate relief of your IBS symptoms?” is administered to the patient and they respond either “Yes” or “No.” Despite its established value as an endpoint measure for clinical trials, dissatisfaction by regulatory agencies with the IBS-AR has led to the desire to develop quantifiable symptom-based patient-reported outcome (PRO) measures for IBS [20]. Pending the development of a final IBS PRO, the FDA issued a guidance document in 2012 for drug development in IBS in which they formulated responder analysis definitions based on diary collection of pain and stool consistency ratings. One of the FDA Clinical Responder definitions from the Guidance, utilized by Dove, et al., is also used in the current paper as an additional criterion for assessing the validity of the IBS-QOL total score [5]. The definition is based on the percentage of days on which a patient has a simultaneous improvement in both pain and stool consistency, the so-called daily responder definition [21]. Since these outcomes are measured on a dichotomous scale, two different biserial correlation approaches were calculated to account for non-continuous variables [22, 23]. See Additional file 5 for a full description of the approaches.
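To illustrate how a continuous score can be correlated with a yes/no outcome, the sketch below computes the point-biserial and the classical biserial coefficients. These two are shown as common textbook choices and are an assumption made here; the approaches actually used are those described in Additional file 5. Function names and the 0/1 coding of responder status are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def _split(scores, responder):
    """Scores of responders, scores of non-responders, responder share."""
    y1 = [s for s, r in zip(scores, responder) if r]
    y0 = [s for s, r in zip(scores, responder) if not r]
    return y1, y0, len(y1) / len(scores)


def point_biserial(scores, responder):
    """Pearson correlation between a score and a true 0/1 variable."""
    y1, y0, p = _split(scores, responder)
    return (mean(y1) - mean(y0)) / pstdev(scores) * sqrt(p * (1 - p))


def biserial(scores, responder):
    """Biserial correlation: treats the 0/1 variable as a dichotomized
    normal variable; its magnitude can exceed 1 in small samples."""
    y1, y0, p = _split(scores, responder)
    # Height of the standard normal density at the cut point for p.
    ordinate = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(y1) - mean(y0)) / pstdev(scores) * p * (1 - p) / ordinate
```

The biserial coefficient is never smaller in magnitude than the point-biserial one, because it estimates the correlation with the underlying continuous variable rather than with the observed dichotomy.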
The IBS-QOL was previously assessed for responsiveness [7] using Cohen’s d statistic [24]. In that analysis, effect sizes for the change in scores from pre-treatment to post-treatment were computed similarly to standardized mean differences by putting changes in scores into standard deviation units. The d statistic originally employed a standard deviation value that was either based on the Baseline pooled data or on the control group only. Both of these methods inherently assume homogeneity of variance, either across time points or across treatment groups. To account for potential heterogeneity of variance across treatment groups and also handle data dependencies due to repeated patient measurements, an additional assessment was calculated by estimating a longitudinal model for the change in IBS-QOL total score between Baseline and Week 12 administrations. To visually assess the IBS-QOL total score, the cumulative proportion of patients meeting a certain change from Baseline to Week 12 was also plotted by treatment group and the proportion of patients meeting certain thresholds of improvement for Placebo and Eluxadoline 100 mg treatment groups were compared.
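The effect-size and cumulative-proportion calculations described above can be sketched as follows. This is an illustrative sketch with hypothetical function names, not the analysis code; in particular, which standard deviation is supplied as the reference (Baseline pooled data, control group only, or a model-based estimate) is left to the caller, since that choice is exactly what distinguishes the approaches.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(group_a, group_b):
    """Pooled within-group SD; assumes similar variances in both groups."""
    na, nb = len(group_a), len(group_b)
    ss = (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    return sqrt(ss / (na + nb - 2))


def cohens_d(change_a, change_b, reference_sd):
    """Difference in mean change scores, in units of reference_sd."""
    return (mean(change_a) - mean(change_b)) / reference_sd


def proportion_meeting(changes, threshold):
    """Share of patients whose change from Baseline is >= threshold."""
    return sum(c >= threshold for c in changes) / len(changes)


def cumulative_curve(changes, thresholds):
    """Points for a cumulative-proportion plot for one treatment group."""
    return [(t, proportion_meeting(changes, t)) for t in thresholds]
```

Plotting `cumulative_curve` for each treatment group on shared axes gives the kind of display described above, where a persistent vertical gap between two curves indicates a treatment difference that holds across thresholds.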
Statistical analyses were performed with R version 3.0 [25], R-package ltm [26], and SAS® software version 9.3 [27].
Trial registration: ClinicalTrials.gov identifier NCT01130272
Discussion
The goal of the current paper was to replicate and expand on the original psychometric assessment of the IBS-QOL when applied to an IBS-d-specific patient set. Our results indicate that male and female IBS-d patients who are highly compliant with daily diary entry and who have a minimal requirement for pain as well as explicit criteria for diarrhea as defined by the BSS share commonalities with a general population of non-subtyped IBS patients, but that the originally-proposed subscale structure doesn’t apply as well as one might anticipate to our patient set. The deviations observed from the original assessment could be attributed to the fact that we evaluated IBS-d patients or due to the much larger sample size employed here. Without such large-scale data on other IBS subtypes, it is difficult to discern the cause of the departures from the original analyses, but in the case that one or both differences are influencing the current results, it is still clear that the IBS-QOL performs well for IBS-d patients.
Applying the item reduction criteria to the 34-item version of the IBS-QOL showed that many items had high bivariate correlations, defined as r ≥ 0.7. One possible explanation for the high correlations between items is priming or order effects, i.e., responses on subsequent items being influenced by earlier-answered items. As the IBS-QOL is a static instrument with only one item order presented to patients, however, testing whether priming influences patients’ responses to single items is not possible.
Alternatively, high correlations between items could suggest that the items are measuring a single latent trait. Items 6 (“I feel like I’m losing control of my life because of my bowel problems”), 7 (“I feel my life is less enjoyable because of my bowel problems”), 9 (“I feel depressed about my bowel problems”), and 10 (“I feel isolated from others because of my bowel problems”) all showed a fairly high level of correlation with one another. The α-value for the overall sum scale of the IBS-QOL is also very high, suggesting redundancies across these items.
Similarly, Items 12 (“Because of my bowel problems, sexual activity is difficult for me”) and 20 (“My bowel problems reduce my sexual desire”) exhibited a high correlation with one another (r = 0.741), as expected. Both items make up the Sexual subscale, and while the language of the two items implies physical and psychological aspects of sexual activity, respectively, patient responses tended to suggest that one does not occur without the other.
There were several other pairs of items that exhibited high inter-item correlation values (cf. Table 1). Our results suggest that a possible future research path for the IBS-QOL is to explore whether a shortened version of the IBS-QOL targeted toward IBS-d could be constructed from the current items while maintaining its measurement properties and still being relevant to IBS-d patients. If items have redundancy, then one could conceive of an item pool that supplies items to each of several slightly different versions of the IBS-QOL. Alternatively, specific cognitive debriefing may also help isolate whether any of these items are truly redundant or if items all closely measure HRQOL in IBS-d and simply represent very closely related aspects of IBS-d-related QOL.
Conversely, in IBS-d patients, a departure from the original validation analyses was not surprising either. For example, 72.8% of IBS-d patients answered “Not at all” to Item 32 (“I fear I won’t be able to have a bowel movement”) at Baseline. This result fits, conceptually, with how patients should answer items that are not geared toward their IBS subset. This item, therefore, could be taken out of a targeted IBS-d instrument or, perhaps, could simply be included with a binary, “yes” versus “no”, response instead of the 5-point graded response set.
While some of the results suggested that certain items in the IBS-QOL may be candidates to remove if a reduced-item version were to be sought for IBS-d patients, other results support that the full set of items is relevant and psychometrically sound, consistent with conclusions of previous validation studies of the IBS-QOL. This result is not surprising given the extremely high value of Cronbach’s Coefficient α (α = 0.963). This is consistent with the interpretation of the bi-factor and EFA models because the common interpretation of Coefficient α analyses is that the items are internally consistent and therefore represent a unidimensional latent construct.
This conclusion is reinforced by a high observed average item-to-total correlation of 0.642. However, one of the limitations here is that modern applications of α analysis stretch interpretation of the statistic beyond its original intent [28]. Coefficient α was intended to substitute for alternate-forms reliability, in which two equivalent forms of the instrument are administered and the results correlated with one another. As most instrument developers do not have the resources to develop two instruments together, Coefficient α was devised as a means of assessing agreement between an instrument and a theoretical one of the same length, comprised of items randomly drawn from all possible content-valid items. The coefficient, therefore, is laden with assumptions and also is, ostensibly, a lower bound for the theoretical true internal consistency of a measure. Many have criticized the use of α for this and other reasons [29-31]. Further, while an α assessment assumes a sum of item responses, the IBS-QOL standardizes responses to a 0-100 scale, so without further study, it is not clear how the scoring algorithm relates back to a simple sum score. Structural equation modeling techniques, e.g., extensions of the CFA models, actually offer the best alternatives to α and other individual indices as they are better equipped to handle multivariate item data [32, 33]. However, any positive or negative bias around the α-value of 0.96 would likely still yield acceptable levels of consistency.
In terms of how the items structurally relate to one another at the instrument level, the fact that the oblique bi-factor model fits the data the best, and better than the orthogonal bi-factor model, suggests that the original factor structure is redundant to the total sum score because factors that are allowed to be correlated fit better with the data than hypothetically independent subscales. We do note, however, that more complex CFA models tended to fit better by both standard fit indexes and usual assessment of residuals and that, generally, increasing model complexity provides better fit in most statistical models. While the oblique bi-factor model accounted for a marginal amount of variance (GFI = 0.8616), an acceptable improvement in variance above a null model (CFI = 0.9073) was observed. The RMSEA index imposes a penalty for higher complexity models, thereby allowing us to infer whether the bi-factor models fit better according to other indices based on their complexity. The observed RMSEA value of 0.069, although moderate in size, comparatively supports the oblique bi-factor conceptualization of IBS-d, i.e., that the best of all CFA models fit is one with an overall latent factor supported by the original substructure whilst allowing the substructure factors to correlate with one another. The model fit may have room for improvement as 10.5% of residuals are outside of the preferred limits, potentially indicating that some items may not fit well within the proposed structure.
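To show how RMSEA penalizes complexity and how CFI compares a model with the null model, the sketch below uses the commonly cited textbook formulas. The function names are hypothetical, software packages differ in details (for example, N versus N - 1 in the RMSEA denominator), and the values in the usage note are made up for illustration rather than taken from the fitted models.

```python
from math import sqrt


def rmsea(chi_square, df, n_obs):
    """Root mean square error of approximation.

    Misfit beyond the degrees of freedom, per degree of freedom and per
    observation, so extra parameters must earn their keep.
    """
    return sqrt(max(chi_square - df, 0.0) / (df * (n_obs - 1)))


def cfi(chi_model, df_model, chi_null, df_null):
    """Comparative fit index relative to the null (independence) model."""
    model_misfit = max(chi_model - df_model, 0.0)
    worst_misfit = max(chi_model - df_model, chi_null - df_null, 0.0)
    if worst_misfit == 0.0:
        return 1.0  # neither model shows misfit beyond its df
    return 1.0 - model_misfit / worst_misfit
```

For example, with made-up values `rmsea(300, 100, 201)` returns 0.1: the χ² exceeds its degrees of freedom by 200, spread over 100 degrees of freedom and 200 observations.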
The EFA model supports that there may be pairs or subsets of items of the IBS-QOL that group together more so than with others—an observation that is not surprising given the observed inter-correlations between items. Interestingly, though, the EFA fit did not produce a factor structure in line with the original substructure, suggesting that HRQOL may be qualitatively different for IBS-d as compared to non-subtyped IBS patients as a whole.
Despite the extraction of multiple factors from the analysis, the EFA fit actually further strengthens the interpretation that the IBS-QOL is unidimensional in IBS-d patients. This is because the EFA model has a large first eigenvalue (37.8) as compared to the second (3.8). Eigenvalues of extracted factors measure the amount of variance observed in the items making up that factor. Here the first extracted factor accounts for 79.5% of the total variance in the items. Further, a test of whether the items suggest a structure in which there is at least one common factor to all items was also significant [χ²(561) = 16,080.2, p < 0.0001] implying that any structure extracted after that first factor is residual information that could enhance interpretation of the first factor, but one factor would be adequate to interpret the construct under study. This observation indicates that imposing the original factor structure [6] is helping model fit, implying that the original subscale structure of the IBS-QOL seems to be beneficial in accounting for information above and beyond the total sum score. Furthermore, combined with the fit of the orthogonal bi-factor model results, one could conclude for IBS-d patients that the IBS-QOL may be measuring a unidimensional construct, both because of the need to allow factors to correlate and that the original substructure seems only approximately correct.
The CFA and EFA modeling, taken together, suggest that perhaps the best means of assessing the psychometric properties of the IBS-QOL would be to employ Item Response Theory (IRT) methods [34]. IRT approaches estimate a latent construct via a joint model of the individual items. IRT models can also help determine if individual items are performing as intended within the IBS-QOL because relationships between items and the latent trait under study are estimated directly.
In terms of test-retest reliability, the current analyses demonstrated good levels for the IBS-QOL total score in this regard. Both RΛ and RT exceed the traditionally-accepted reliability threshold of around 0.7 and were comparable to the ICC calculated by the original validation study. Both reliability measures employed here are similar to ICCs with slightly different interpretations. RΛ is the multivariate reliability of the sequence of scores while RT is the average reliability for the total score over any arbitrary number of administrations. Both will tend to increase for a consistent instrument with more administrations because additional information is being taken into account with each added administration. Therefore, with 3 post-Baseline administrations of the IBS-QOL, we have substantial evidence for good reliability of the total score. Contrastingly, even with less information, e.g., two administrations of the IBS-QOL, we would expect that a reliability level would still be approximately 0.75 by our estimates.
The analyses of IBS-QOL total scores with regard to responsiveness were consistent across effect size definitions for different paired comparisons, with moderate increases in effect sizes seen for higher doses of eluxadoline versus placebo. Interestingly, the pattern of effect size estimates suggests that the 100 mg dose of eluxadoline had the largest impact, the same conclusion as was reached by the analysis of clinical measures [5] as defined in FDA’s 2012 IBS Guidance [21]. This conclusion is bolstered by evaluating the cumulative proportions of change from Baseline to Week 12 scores for the IBS-QOL total score with better improvements seen at higher dose levels, specifically 100 and 200 mg. We especially note that within a wide range of improvement levels, the proportion of patients in the eluxadoline 100 mg group meeting given improvements was dramatically higher than that of patients receiving placebo. This indicates that the observed treatment effect in the IBS-QOL total score is consistent. Visually, this result is apparent by the wide gap between the placebo and 100 mg eluxadoline lines on Figure 1.
Of note, all treatment groups showed large increases in IBS-QOL total scores at Week 12 as compared to Baseline. Even the Placebo group showed an approximately 17-point increase in total score, higher than the 14-point clinically-significant difference found by Drossman, et al. [10]. While further longitudinal study is warranted, we believe that the improvement may be due to natural cycling of disease or due to potential Hawthorne effects, i.e., improvements by patients as a result of simply being observed. We do, however, also note that the treatment group differences approximate a dose response that peaks at 100 mg and plateaus with 200 mg. This pattern mimics that of the other outcome measures reported elsewhere [5].
Our analyses suggest that a reduced-form IBS-QOL specific for IBS-d sufferers may improve measurement of IBS-related QOL for these patients. However, further research is necessary to determine which of the items may be ideally suited for a reduced form. We suggest that a better characterization of item-level properties of the IBS-QOL via IRT methodology would be helpful in determining an optimal item configuration.
Competing interests
The current project was funded by Furiex Pharmaceuticals. DAA and PSC are employees of Furiex and both own stock in the company. DLP and DAD are paid consultants to Furiex.
Authors’ contributions
DAA contributed to the trial design, conceived the analysis strategy, analyzed the data, and drafted the manuscript. DLP and DAD aided in interpretation of the data and provided critical input to the manuscript development. PSC helped design the clinical trial from which the data were generated, edited and gave significant commentary to the content, and aided in drafting the manuscript. All authors read and approved the final manuscript.