Background
Fatigue is a common and disabling symptom in many chronic diseases. It often co-varies with depressive symptoms [1‐4] and sleep impairment [5, 6] and is associated with poorer self-reported health status [7, 8] and reduced quality of life [9‐12]. Because fatigue is a perceived phenomenon, researchers and clinicians rely on subjective measures to indicate need for intervention or effectiveness of treatment. When 18 fatigue measures used in chronic illness research were reviewed, the Fatigue Severity Scale (FSS) [13] was rated highest on robust psychometric properties [14]. An advantage of the FSS is that it is a short 9-item measure with items formulated as statements about the fatigue experience itself (item #3), what causes fatigue (item #2), and how fatigue interferes with daily life (7 items).
In our prior work [15‐17] we used Rasch analysis to evaluate the psychometric properties of the original 9-item version of the FSS within several chronic illness groups, including multiple sclerosis (MS), stroke and HIV/AIDS. These three illnesses were selected because fatigue is a well-documented and prevalent symptom in each of these patient groups [18‐25]. The prior studies [15‐17] each concluded that a 7-item version of the FSS (FSS-7) has better psychometric properties and is a reliable and valid measure of fatigue interference rather than of fatigue severity, which is what the title of the measure indicates. These studies, as well as a study by Mills et al. [26], provided consistent evidence regarding the relationship between the included items and the underlying latent trait in diagnosis-specific samples. However, we do not know whether the FSS items function similarly across the samples and are thus appropriate for use in comparative studies.
Rasch analysis is a useful tool for evaluating these types of psychometric properties. It is part of the group of modern psychometric approaches in item response theory used to evaluate the relative endorsement, or hierarchy, of items within a measure. Rasch models support the process of validation analysis by providing a transformation of an ordinal score into a linear, interval-level variable. The Rasch model shows what item responses would be expected if interval scale measurement is to be achieved. Actual response patterns, identified in a questionnaire with a set of items intended to be summed together, are tested against what would be expected by the model. The hierarchical ordering of items within the scale can also be affirmed [27]. That is, the items can be arranged in order of relative difficulty based on their likelihood of being endorsed given an underlying level of the measured construct.
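The idea of an item hierarchy can be illustrated with a minimal sketch. It assumes the simplest (dichotomous) form of the Rasch model, whereas the FSS uses graded response categories and would require a polytomous extension. The item labels and difficulty values are hypothetical and are not estimates from this study.

```python
import math


def endorsement_probability(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(endorse) = exp(theta - b) / (1 + exp(theta - b)).

    theta is the person's level of the construct and b the item difficulty,
    both expressed in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_hierarchy(difficulties: dict) -> list:
    """Return item labels ordered from easiest to hardest to endorse."""
    return sorted(difficulties, key=difficulties.get)


# Hypothetical item difficulties in logits (illustration only).
item_difficulties = {"item_a": -1.0, "item_b": 0.0, "item_c": 1.5}

# For one person (theta = 0.5 logits), harder items are less likely to be endorsed.
theta = 0.5
probs = {name: endorsement_probability(theta, b)
         for name, b in item_difficulties.items()}

for name in item_hierarchy(item_difficulties):
    print(f"{name}: difficulty={item_difficulties[name]:+.1f}, P(endorse)={probs[name]:.2f}")
```

In this simplified form, the ordering of the items by difficulty is the same for every person, which is what makes a single item hierarchy meaningful.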
Although the original FSS has been used to measure fatigue in a variety of different populations, we found no studies that explored the stability of the item hierarchy across different disease populations. In this context, “stability” does not mean reliability as defined in classical test theory, but rather reflects the similarity, consistency or invariability of the item hierarchies across groups. If item hierarchies vary across groups (some items are easier or harder to agree with for one group compared to another group), the resulting scores will be biased. Rasch analysis addresses and evaluates the impact of such issues using differential item functioning (DIF) [27]. Studies encompassing several disease groups are critical for integrating evidence to transfer knowledge about potential interventions from one clinical specialty to another. Measures such as the FSS must therefore be evaluated across diagnostic groups to ensure that the scale functions comparably across groups and is not overly influenced by diagnostic variations.
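One common way to screen for DIF is to compare an item's difficulty estimate between two groups relative to the estimates' standard errors. The sketch below illustrates that general approach only. The function names, the thresholds (`min_contrast` of 0.5 logits and `z_crit` of 1.96) and the input values are assumptions for illustration, not the criterion or the data used in this study.

```python
import math


def dif_contrast(b_group1: float, se_group1: float,
                 b_group2: float, se_group2: float):
    """Return the difficulty contrast (logits) and its standardized value."""
    contrast = b_group1 - b_group2
    joint_se = math.sqrt(se_group1 ** 2 + se_group2 ** 2)
    return contrast, contrast / joint_se


def flag_dif(b1: float, se1: float, b2: float, se2: float,
             min_contrast: float = 0.5, z_crit: float = 1.96) -> bool:
    """Flag an item when the contrast is both large and statistically notable.

    Both thresholds are assumed conventions, not values taken from this study.
    """
    contrast, z = dif_contrast(b1, se1, b2, se2)
    return abs(contrast) >= min_contrast and abs(z) >= z_crit


# Hypothetical item calibrations for two diagnostic groups.
print(flag_dif(1.0, 0.1, 0.2, 0.1))  # large, precise difference
print(flag_dif(0.3, 0.1, 0.2, 0.1))  # small difference
```

Requiring both a minimum contrast and a minimum standardized difference avoids flagging differences that are statistically detectable in large samples but too small to matter in practice.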
Thus, the purpose of this study was to evaluate whether the FSS-7 demonstrates a similar item hierarchy across three chronic illness groups (MS, stroke and HIV/AIDS) to ensure valid comparisons between groups and provide further evidence of internal scale validity. We conducted a secondary analysis of FSS data from three different studies of samples with potential diagnostic differences in item hierarchy. Each of these studies had previously concluded that the FSS-7 has better psychometric properties for measuring fatigue interference than the original FSS-9 [15‐17], and this study aims to determine whether the FSS-7 can be validly and reliably compared across illness groups.
Discussion
In the present study, fatigue interference, as measured with the FSS-7, was compared in three chronic illness samples (MS, stroke, and HIV/AIDS). Overall, the MS sample demonstrated more fatigue interference than the stroke and HIV samples. However, four of the seven items functioned differently for two of the three samples, thereby failing to meet the set criterion for stability across diagnostic groups and raising questions about item validity across groups. Nonetheless, when disease-specific scores were compared to the disease-generic scores, person measures were generally placed in the same position when considering the level of precision evident in individual standard errors, thus indicating that individual fatigue interference in these three chronic illnesses might still be reliably comparable when using the FSS-7.
The MS group had more fatigue interference than both the older group of people with stroke and the HIV group of similar age. These differences may be due to the fact that almost half of the MS sample was employed and more were partnered compared to the other two groups. These findings are also consistent with prior reports of higher prevalence rates of fatigue among people with MS (55% to 83%) [18‐20] than in other groups living with chronic illness, including stroke (24% to 77%) [21, 22] and HIV (37% to 65%) [23‐25].
With respect to the specific FSS-7 items, people with HIV/AIDS were less easily fatigued (item #3) than people with MS or stroke. Given the higher rates of employment among those with MS and stroke, these groups may be more likely than people with HIV/AIDS to encounter situations that demand energy, and thus contribute to fatigue. In addition, living with a partner, which was more common among the people with MS or stroke in this study, has been associated with increased fatigue [2]. Living with a partner might interfere with adjustments to one’s level of activity in an attempt to manage fatigue and may help explain the group differences on item #3. In fact, such contextual issues may potentially be influencing the hierarchies more than the specific chronic illness.
People with MS were more likely than people with stroke to agree that fatigue causes frequent problems for them (item #5). However, differences in the specific position of each item may not be as important as overall hierarchical order, and while item #5 was the hardest item to agree with for people with stroke and for people with HIV/AIDS, it was still the second hardest item to agree with for people with MS. The fact that item #5 was relatively hard for all three diagnostic groups to agree with may also reflect how people with fatigue learn to cope, and thereby reduce its impact on their lives. People who experience low energy often reduce their activity according to their perceived capacity for mental and physical work, and thus their fatigue may not necessarily result in frequent problems [37]. In addition, the finding that fatigue interference more easily causes problems for people with MS could reflect ongoing disease activity or the progression of impairment in people with MS, which might make it more difficult for them to compensate for fatigue-related problems.
The samples with MS and HIV/AIDS differed significantly regarding the effect of fatigue on sustained physical functioning (item #6), a difference which might be explained by the fact that people with MS are more likely to be bothered by heat sensitivity [3], which is often regarded as a barrier to being physically active. In addition, people with HIV/AIDS were more likely than people with MS or stroke to report that fatigue interferes with their work, family, or social life (item #9). This finding may be associated with the experience of stigma, isolation, and medical disability documented in people with HIV/AIDS. The extent to which fatigue interference adds to the challenges faced by people with HIV/AIDS needs further research to better understand how it affects one’s relationships and work life.
The three samples in this study not only differed in diagnosis, but also differed with respect to nationality, culture, and language. Thus, the potential influence of the different social systems in Sweden, Norway, and the USA on whether and how people perceive fatigue as problematic needs to be considered, particularly since various social and health care systems place different demands on individuals living with chronic illness. These additional differences across samples might be considered a limitation. However, the cultural and language differences would be expected to increase differences across diagnostic groups, and yet in our prior psychometric studies of these three groups [15‐17], the results have been quite congruent, providing further evidence that the FSS-7 functions well in different cultural and diagnostic populations.
The DIF findings pose an interesting interpretative challenge from a test validity perspective. In general, in order to conclude that a test is not biased and provide evidence of cross-diagnostic scale validity, we would expect there to be no DIF related to diagnosis. Our findings did not support the validity of FSS-7 from this perspective. However, the broader issue here relates to whether the findings of systematic bias are the result of true clinical differences between the diagnostic groups. The above discussion about the specific DIF findings suggests that these differences may have logical and empirical support, and therefore not be a major threat to internal scale validity. Still, it is important to evaluate whether such item calibration differences will have an impact when comparing measures between diagnostic groups. The disease-specific Rasch person measures of the FSS-7 did not differ from the disease-generic Rasch person measures, and thus people are placed in relatively the same place along a continuum of fatigue interference, irrespective of whether the generated Rasch measures are disease-specific or disease-generic. Based on the findings of this study, we can therefore conclude that a generic tool of fatigue may generate unique diagnostic profiles for different target groups, and at the same time still generate individual measures of fatigue that are not overly biased by diagnosis.
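The comparison of disease-specific and disease-generic person measures can be sketched as a check of whether two measures for the same person differ by more than their combined measurement error. This is only an illustration of the general logic. The function names, the 1.96 criterion and the example values are assumptions and do not reproduce the analysis or data of this study.

```python
import math


def measures_agree(m_specific: float, se_specific: float,
                   m_generic: float, se_generic: float,
                   z_crit: float = 1.96) -> bool:
    """True if two person measures (logits) agree within their joint standard error."""
    diff = m_specific - m_generic
    joint_se = math.sqrt(se_specific ** 2 + se_generic ** 2)
    return abs(diff) <= z_crit * joint_se


def proportion_agreeing(pairs, z_crit: float = 1.96) -> float:
    """Share of persons whose two measures agree within error.

    Each element of pairs is (m_specific, se_specific, m_generic, se_generic).
    """
    agree = sum(measures_agree(*p, z_crit=z_crit) for p in pairs)
    return agree / len(pairs)


# Hypothetical person measures with standard errors (illustration only).
pairs = [
    (0.5, 0.4, 0.6, 0.4),  # small shift relative to the error
    (1.0, 0.3, 2.5, 0.3),  # large shift relative to the error
]
print(proportion_agreeing(pairs))
```

A high proportion of persons whose measures agree within error would suggest that item-level differences largely cancel out at the level of the whole scale.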
People with MS, stroke, or HIV/AIDS do experience fatigue interference differently, as evidenced by the varying relative item hierarchies and the diagnostic differences in four of the FSS-7 items. Therefore, comparisons of FSS-7 scores in these populations should be performed with these issues in mind. The empirical findings from this study demonstrate that there is a parallel need to also evaluate differential test functioning (DTF) when exploring DIF. There is a balancing act between having sensitive clinical tests that identify unique clinical profiles for diagnostic groups and, at the same time, allowing for valid comparisons of the generated measures across different samples.
The findings of this study also raise important issues about the use of generic or specific outcome measures in health care research. Many concepts used in research are generic in nature (e.g., quality of life, fatigue, ADL ability) and not specifically constructed with a particular diagnosis in mind. By using generic measures and allowing comparisons between diagnostic groups, we can generate new knowledge that can be used across diagnoses to generate a deeper understanding of a target phenomenon such as “fatigue”, as well as evaluate potential therapeutics. However, today the trend is toward using diagnosis-specific outcome measures for generic phenomena. Although we might assume that this development allows the test developers to generate more specific items that may match the unique profile of that specific population, there are marked similarities across these diagnosis-specific measures.
The aim of this study was to evaluate whether the FSS-7 demonstrated similar item hierarchies across people with MS, stroke, and HIV/AIDS. Given that four items did not perform in a similar hierarchy for the clinical groups, caution is warranted when comparing fatigue across diagnostic groups using the FSS-7. However, a replication of this study using different diagnostic groups is needed to validate our results on a group level. This could be performed with larger data sets and/or with item split techniques [38]. Such studies on item hierarchies in different groups may also provide a better understanding of diagnostic profiles – if and how different groups experience a specific phenomenon. Such understanding can also be of particular importance for the evaluation of the effectiveness of targeted interventions for different diagnostic groups.
Results from the present study are also important to consider for clinical trials. The effects of a fatigue intervention should not only provide results on sum scores but should also report changes in the specific item hierarchies, particularly since an item-specific change might be the reason an intervention had the reported effect in a specific diagnostic group or failed to demonstrate the desired effect. In addition, according to the findings of this study, both an item reduction and Rasch analysis of the raw scores of the original FSS would be required to facilitate transfer of such knowledge across diagnostic groups. Earlier studies showed non-linear relationships between the item raw sum scores and Rasch-generated measures of the original FSS, which indicates the risk of either over- or under-estimating fatigue interference in people with chronic illness by continuing to use the item raw sum score of the original FSS [15]. An additional benefit of using Rasch-generated measures is the provision of an individual precision estimate (i.e., standard error) for each client, which minimizes the risk of overestimating changes or differences on an individual level, as demonstrated in this study when exploring DTF.
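The non-linear relationship between raw sum scores and Rasch measures can be illustrated with a small sketch. It again assumes the dichotomous Rasch model and hypothetical item difficulties, so it shows the general shape of the relationship rather than the actual score-to-measure conversion for the FSS.

```python
import math


def expected_raw_score(theta: float, difficulties) -> float:
    """Expected sum score at a given measure: the sum of endorsement probabilities."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


# Hypothetical item difficulties in logits (illustration only).
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]

# The same one-logit change in the measure corresponds to different raw-score
# changes in the middle of the scale and near its upper extreme.
step_middle = (expected_raw_score(0.5, difficulties)
               - expected_raw_score(-0.5, difficulties))
step_extreme = (expected_raw_score(4.0, difficulties)
                - expected_raw_score(3.0, difficulties))

print(f"raw-score change, middle of scale: {step_middle:.2f}")
print(f"raw-score change, near the extreme: {step_extreme:.2f}")
```

Because raw scores are compressed toward the extremes, a one-point raw-score difference there represents a larger difference in the underlying measure than the same difference in the middle of the scale.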
Limitations
In addition to the cultural and language differences mentioned above, the main limitation of this study is that the three diagnostic groups were not matched on any potentially confounding variables which might also explain the observed group differences. There were limited clinical and socio-demographic data available for these samples, and additional detail regarding potentially confounding variables may have facilitated interpretation of the results, particularly regarding DIF. One possible strategy could have been to match the groups regarding age and gender. However, as these socio-demographic differences probably reflect true differences between these populations, this strategy might result in findings that have higher internal validity, but limited external validity. In addition, possible variations in depressive symptoms in the different groups were not taken into account as part of this study. The severity of depressive symptoms can influence how fatigue is experienced [39] and thus might have influenced the results. Another possible confounder which was not taken into account in the analyses is the fatigue-associated side effects of various medications. Fatigue is a common side effect of immunomodulatory medications commonly used to treat both MS [40] and HIV [41], and future studies should consider the potential influence of these and other medications. Although the variation in sample size between the three groups may have some impact upon their representativeness and generalizability, a sample size of at least 100 will generate relatively stable item estimates. Thus, the variation in sample sizes between the groups is likely to have minimal impact on the results. Finally, the current findings can only be generalized to the chronic illnesses included in this study and warrant exploration in other populations.
Acknowledgements
The authors wish to acknowledge the support and assistance provided by Research Fellow Linda N. Bakken and Research Assistant Gunn Pedersen and various staff members of Buskerud Hospital Trust in Drammen and Oslo University Hospital – Aker in Oslo, Norway in carrying out this research project. In addition, the authors wish to acknowledge the contributions to the study from Bradley E. Aouizerat, Traci Coggins, Skip Davis, Ryan Kelly, Yeonsu Song, Kristen Nelson, and Matthew Shullick.
The MS study was funded by grants from the Swedish Research Council (Grant: K2005-27X-14634-03A), the Swedish Association for Persons with Neurological Disabilities, and the Health Care Sciences Postgraduate School, Karolinska Institutet. The stroke study was funded by the Research Council of Norway (RCN) (Grant: 176503) and Buskerud University College, Drammen, where Dr. Hesook Suzie Kim is the project director and Drs. Grethe Eilertsen, Anners Lerdal, and Heidi Ormstad are the principal researchers. Anners Lerdal has received funding from the RCN (Grant: 19256) and the U.S. Norway Fulbright Foundation. The HIV/AIDS study was supported by a grant from the National Institute of Mental Health (NIMH, 5 R01 MH074358). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Mental Health or the National Institutes of Health. Data collection was supported by the General Clinical Research Center in the UCSF CTSA (1 UL RR024131).
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
SJ collected the data in the MS study, participated in the analyses of data and drafted the manuscript. AK analyzed and interpreted the data and contributed to the writing of the manuscript. KAL and CLG participated in the collection of data in the HIV/AIDS study, participated in the analyses of data and contributed to the writing of the manuscript. AL participated in the collection of data in the stroke study, participated in the analyses of data and contributed to the writing of the manuscript. All authors read and approved the final manuscript.