The literature search identified 2540 articles, which were narrowed to 26 [9–14, 16–35] (Figure 1). After review of the full texts, 14 were further excluded: 6 studies used qualitative methodologies [9–14]; 2 studies measured only a single dimension of HRQL [16, 17]; 1 study [18] used the Short-Form 36 (SF-36) but modified its response options to 3 levels (i.e., the same as before, better, and worse) without providing validation data; 1 study [19] used a single question from a structured instrument; 1 study was a duplicate, and the earlier version was excluded [20, 21]; 1 study [22] used a generic instrument, the General Quality of Life Interview (GQOLI-74), but provided no references from which the origin and psychometric properties of the instrument could be traced; 2 articles [23, 24] were published from the same study and were therefore counted as one study; and another 2 articles, Marra et al. [25] and Guo et al. [26], reported longitudinal and cross-sectional results, respectively, from the same study and were likewise counted as one study. A total of 12 original studies were therefore included in this review [21, 23, 25, 27–35]; an overview is presented in Additional file 1.
Of the 12 included studies, one was published in 1998 [27] and the remaining 11 were published after 2001 [21, 23, 25, 28–35]. Nine studies were published in English and 3 in Chinese [27, 29, 33]. The included studies were carried out in different countries: 3 in China [27–29]; 1 in both China and southern Thailand [33]; 2 in India [21, 35]; 2 in Turkey [30, 31]; 2 in Canada [23–26]; and 2 in the USA [32, 34]. Seven of the included studies were cross-sectional [27, 29–31, 33–35] and 4 were prospective cohort studies [21, 23, 25, 28]. The remaining study was a randomized controlled trial (RCT) [32], but only baseline HRQL assessment data were reported in the published article. Among the 12 studies, three included a comparison group drawn either from the general population [28] or from a "healthy" non-TB sample [27, 29]; one used normative data from the Canadian population as the reference group [23, 24]; two included people with LTBI as controls [25, 34]; one compared TB patients with a group of chronic obstructive pulmonary disease (COPD) patients [31]; and the remaining 5 did not include proper comparison groups. Sample size (i.e., the number of subjects included in the statistical analysis) varied among the 12 studies from 46 to 436. Only one study [23] reported how the sample size was estimated statistically. A wide range of TB patients was included in this review: pulmonary and extra-pulmonary TB, active TB disease and LTBI, and current and previously treated TB.
Psychometric properties of HRQL instruments in tuberculosis
The SF-36 was used in 6 studies, and overall it showed acceptable validity and reliability. Chamla [28] validated the Chinese version of the SF-36 among active pulmonary TB patients and the general population in China. Reliability was tested by Cronbach's α, which ranged from 0.88 to 0.97 for the eight SF-36 subscales. All 36 questions of the SF-36 had internal item consistency coefficients between 0.56 and 0.86. In Dion et al. [23, 24], the reliability of the SF-36 was evaluated among a mixture of TB patients, including 25 with LTBI, 17 with active TB on treatment, and 8 with previously treated TB. The internal consistency of the SF-36 responses was strong, with coefficients of 0.86–0.92 for the two summary scores and 0.73–0.94 for the subscale scores. Test-retest reliability (2-week interval) was assessed with intraclass correlation coefficients (ICC), which were 0.66–0.79 for the two SF-36 summary scores. He et al. [33] also reported good reliability of the Chinese version of the SF-36 (Cronbach's α > 0.7) in the two groups of TB patients from China and Thailand.
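For readers less familiar with the internal consistency statistic reported above, the following is a minimal sketch of how Cronbach's α can be computed from item-level responses. It assumes complete data with one row per respondent and one column per item; the function name `cronbach_alpha` and the response values are hypothetical illustrations, not data or code from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's total (summed) score
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses (5 respondents x 4 items); not data from any reviewed study
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values of α closer to 1 indicate that the items of a subscale vary together, which is the sense in which the coefficients above are described as good or strong.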
Validity of the SF-36 was evaluated by examining the correlations between SF-36 outcomes and external variables, including clinical criteria, responses from other HRQL measures, and physicians' evaluations. SF-36 scores were reported to discriminate between TB patients with different severity levels [21, 26] and between patients at different stages of treatment (i.e., the start, middle, and end of treatment) [21, 25, 28]. In Guo et al. [26], the correlations between the SF-36 summary scores (PCS and MCS) and four utility instruments (SF-6D, HUI-2, HUI-3, and VAS) were tested with Spearman's coefficients. SF-6D scores were strongly correlated with both PCS and MCS (0.79 and 0.80, respectively), whereas HUI-2, HUI-3, and VAS scores were more strongly correlated with PCS (0.59, 0.66, and 0.67) than with MCS (0.37, 0.48, and 0.59). Similarly, in the study by Dion et al. [23, 24], SF-36 scores were moderately correlated with EQ-5D and VAS scores but poorly correlated with SG scores (Pearson coefficients < 0.2). Wang et al. [27] reported that patient-reported SF-36 scores correlated well with the physician proxy-reported Quality of Life Index (QLI) and Karnofsky Performance Status (KPS) scores, with correlation coefficients of 0.78 and 0.89, respectively. However, the type of correlation coefficient calculated was not reported.
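Because the choice between Pearson's and Spearman's coefficients matters for skewed HRQL scores, a short sketch of the rank-based calculation may help. It computes Spearman's coefficient as Pearson's correlation applied to average ranks; the helper names and the paired scores are hypothetical and are not taken from the reviewed studies.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores for six patients (summary score vs. utility score)
pcs = [35.2, 41.0, 48.5, 52.3, 55.1, 44.7]
utility = [0.55, 0.62, 0.70, 0.81, 0.78, 0.66]
rho = spearman_rho(pcs, utility)
print(f"Spearman's rho: {rho:.2f}")
```

Spearman's coefficient depends only on the ordering of scores, so it is less sensitive than Pearson's to the skewed distributions that ceiling effects produce.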
The structural validity of the SF-36 was tested in two studies, but the results were not consistent. In Chamla [28], factor analysis was applied to evaluate the 2-dimensional model of the SF-36. Two factors (physical health and mental health) were extracted and subjected to orthogonal rotation using the Varimax method. The observed pattern of correlations between the 8 subscales and the 2 factors supported the authors' prior hypothesis. For example, the 4 physical subscales (PF, RP, BP, and GH) correlated strongly with the physical health factor but only poorly with the mental health factor, whereas the 4 mental subscales (MH, RE, SF, and VT) correlated strongly with the mental health factor but not with the physical health factor. He et al. [33] used principal component analysis to test the structural validity of the SF-36. However, the results showed that the 8 subscales were not sufficiently independent and that items overlapped between subscales. For example, the RE and RP subscales were strongly correlated with each other in both groups of patients (correlation coefficients 0.82 and 0.77). Based on these findings, the authors concluded that the SF-36 did not show satisfactory construct validity in the TB patients studied.
The application of the SF-36 among TB patients also revealed some problems. In the study by Dion et al. [23, 24], the SF-36 subscales showed a marked ceiling effect: over 50% of participants with current or previous TB reported the highest possible score on 5 of the SF-36 subscales (PF, RP, RE, BP, and SF).
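A ceiling effect of this kind is usually quantified as the share of respondents who obtain the best possible score, with the floor effect defined analogously for the worst possible score. The sketch below shows that calculation under the assumption of a 0–100 subscale; the function name and the scores are hypothetical.

```python
def ceiling_floor_share(scores, min_score, max_score):
    """Return (share at floor, share at ceiling) as fractions of respondents."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    at_floor = sum(1 for s in scores if s == min_score)
    at_ceiling = sum(1 for s in scores if s == max_score)
    return at_floor / n, at_ceiling / n


# Hypothetical subscale scores on a 0-100 scale; not data from any reviewed study
scores = [100, 100, 85, 100, 60, 100, 100, 0, 100, 75]
floor_share, ceiling_share = ceiling_floor_share(scores, 0, 100)
print(f"floor: {floor_share:.0%}, ceiling: {ceiling_share:.0%}")
```

When a large share of respondents sits at the ceiling, the instrument cannot register further improvement in those respondents, which limits its responsiveness.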
Ceiling and floor effects are a common problem in the application of health utility instruments in TB. In Dion et al. [23, 24], 42–53% of participants reported the best possible EQ-5D health state. Guo et al. also observed ceiling and/or floor effects with three commonly used health utility instruments. HUI-2 and HUI-3 showed a serious ceiling effect, both in the global score and at the level of single dimensions. For example, 25% of active TB patients scored 1.0 (perfect health) on the HUI-2, and 98% of them reported the best level of hearing on the HUI-3. The SF-6D, on the other hand, was limited primarily by its narrow range of available utility values, from 0.30 to 1.0, so that health states at the lower end may not be adequately represented. Despite these problems, some positive properties of these utility instruments were also observed. For example, they showed moderate to strong correlations with SF-36 responses, as stated before [23, 24, 26]. Guo et al. [26] also reported moderate to strong agreement among the SF-6D, HUI-2, HUI-3, and VAS, as measured by ICC: the overall ICC among these 4 instruments was 0.65, and pairwise ICCs ranged from 0.53 to 0.67. In addition, all four utility instruments were able to discriminate between TB patients with different severity levels.
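The agreement statistic used here can take several forms, and the reviewed articles are not quoted as specifying which one was used. As an illustration only, the sketch below computes one common variant, the two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1). The choice of this variant, the function name, and the scores are all assumptions made for the example.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    ratings: list of n subjects, each a list of k measurements
    (e.g., k instruments or k test occasions).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), measurements (columns), and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical utility scores for five patients on three instruments
utilities = [
    [0.62, 0.70, 0.65],
    [0.85, 0.90, 0.80],
    [0.40, 0.55, 0.50],
    [0.95, 1.00, 0.90],
    [0.70, 0.72, 0.75],
]
icc = icc_2_1(utilities)
print(f"ICC(2,1): {icc:.2f}")
```

Unlike a correlation coefficient, this form of the ICC is lowered by systematic differences between instruments, so two instruments can correlate strongly and still show only moderate agreement.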
Pasipanodya et al. [34] administered the lung disease-specific SGRQ to people with treated pulmonary TB disease or LTBI. Test-retest reliability of the SGRQ was examined with ICCs, which were 0.93 for the total score and 0.83–0.91 for the subscale scores. Internal consistency, tested by Cronbach's α, was 0.93. To evaluate validity, SGRQ responses were correlated with a previously validated MOS core questionnaire and a couple of clinical pulmonary function tests, such as forced vital capacity (FVC). Overall, SGRQ and MOS scores agreed on similar health constructs and diverged on dissimilar constructs. Low but significant correlations were observed between SGRQ scores and pulmonary function test results (-0.12 to -0.29, p < 0.05). A ceiling effect was, however, observed for the SGRQ: in both treated pulmonary TB patients and people with LTBI, the distribution of SGRQ scores was skewed toward higher HRQL. In addition, because respondents varied in their ability to read and understand English, different language versions of the SGRQ were used, and the potential impact of combining results from these versions on HRQL outcomes is not known.
Dhingra and Rajpal [21] applied a new TB-specific instrument, the DR-12, among TB patients under directly observed therapy (DOT). At the beginning of treatment, DR-12 scores differed significantly between pulmonary and extra-pulmonary TB patients, and between sputum-positive and sputum-negative patients. Over the treatment period, greater gains in DR-12 scores were observed among patients who responded to treatment than among those who did not. On this evidence, the authors concluded that the DR-12 had strong construct validity in the population studied. However, the clinical criteria or indicators were not well defined in the published work, all comparisons were performed with paired or unpaired t-tests, and potential confounders such as socio-demographic and clinical variables were not controlled for in the final data analysis.
Impact of tuberculosis on HRQL
Overall, active TB disease had significant and wide-ranging impacts on patients' HRQL. Using the SF-36, Chamla [28] found that, compared to the general population, people with active TB disease scored significantly lower on PF, RP, GH, BP, and VT (p < 0.05), but no significant differences were observed on the RE, SF, and MH subscales (p > 0.05). In general, physical health subscales were more affected than mental ones. Dion et al. [23, 24] also found that active TB patients scored significantly lower on the SF-36 PCS, but not on the MCS, than people with LTBI and those with previously treated TB disease. In terms of health utility outcomes, Dion et al. found that active TB patients scored significantly lower on the VAS (median 92.5 vs. 97.5, p = 0.02) and SG (median 80.0 vs. 90.0, p = 0.002) than the others at the baseline assessment. However, no significant difference in EQ-5D scores was observed between active TB patients and the others. The small sample size and the heterogeneous composition of subjects may have prevented the authors from detecting small but important differences. Wang et al. [27] found that active TB patients reported lower scores (p < 0.01) on all SF-36 subscales than healthy non-TB people, with RP and RE most affected. Marra et al. and Guo et al. [25, 26] found that, compared to those with LTBI, people with active TB scored significantly lower on all SF-36 subscales and on the SF-6D, HUI-2, HUI-3, and VAS. In contrast, SF-36 scores among people with LTBI before preventive therapy were very similar to the U.S. norm references.
In the study by Marra et al. [25], Beck-DI scores showed substantial impairment of mental well-being in active TB patients compared to people with LTBI. However, many items of the Beck-DI (such as fatigue) can also be symptoms of TB and are not necessarily indicative of mental health impairment. Aydin and Ulusahin [31] compared TB patients with COPD patients and found that TB patients had a lower prevalence of depression and anxiety and a lower level of disability, as indicated by GHQ-12 and BDQ scores. The authors postulated that the chronic course of COPD and the older age of the COPD patients may have resulted in a higher prevalence of psychological impairment. Among TB patients, those with multi-drug resistant TB reported the worst disability level according to BDQ outcomes. Yang et al. [29] found that pulmonary TB patients reported more of the psychological symptoms listed in the SCL-90 and a lower degree of social support on the SSRS than healthy controls. However, SCL-90 scores were not significantly correlated with SSRS scores, which, as the authors discussed, is inconsistent with the established relationship between social support and health [46].
The impaired HRQL experienced by TB patients may reflect socio-demographic status (e.g., age, gender, and socio-economic status) and underlying co-morbid conditions, in addition to TB and its treatment. A few of the included studies explored the relationship of socio-demographic and clinical factors with HRQL in TB patients. The findings were generally consistent, with some discrepancies. Yang et al. [29] and Nyamathi et al. [32] observed that females were more likely than males to report poorer health, especially mental health problems such as depression and anxiety. Chamla [28] and Guo et al. [26] found that older people tended to have poorer HRQL than younger ones. Duyan et al. [30], however, did not find significant associations of gender or age with HRQL in TB patients. They [30] did find that better HRQL was correlated with higher income, higher education, better housing conditions, better social security, and closer relationships with family members and friends. Clinical factors observed to correlate with poorer HRQL in TB patients included the size of pulmonary TB infection, duration of TB disease, reactivation of previous TB infection, number of symptoms before treatment, development of hemoptysis, hospitalization, underlying chronic conditions, anemia, and white blood cell count before treatment [27, 28].
Effect of anti-tuberculosis treatment on HRQL
Chamla [28], Dhingra and Rajpal [21], and Marra et al. [25] prospectively measured active TB patients' HRQL at the start, middle, and end of treatment. In the study by Chamla [28], significant improvement after anti-TB treatment was observed in all physical health subscales of the SF-36 (PF, RP, BP, and GH; p < 0.05); of the mental health subscales, RE and SF improved significantly (p < 0.05), but VT and MH did not (p > 0.05). During treatment, RP, VT, and MH scores decreased after the initial 2 months but showed overall improvement by the end of treatment, while all other subscale scores increased gradually over the course of treatment [28]. Dhingra and Rajpal [21] observed a gradual improvement in DR-12 scores in active TB patients over the course of treatment. Overall, the improvement was more evident in symptom scores than in socio-psychological and exercise adaptation scores. Consistent with these findings, Marra et al. [25] found significant HRQL improvement in active TB patients over the 6 months of treatment, using the SF-36 and Beck-DI.
Although anti-TB treatment improved HRQL overall, active TB patients still had poorer HRQL at the end of treatment than the general population or people with LTBI, especially in psychological well-being and social functioning. Chamla [28] observed that, at the end of treatment, active TB patients still scored significantly lower on the RP, VT, and MH subscales than the general population comparison group. Marra et al. [25] found that, after 6 months of treatment, active TB patients scored significantly lower on the SF-36 PCS and MCS summary scores than people with LTBI. An interesting finding by Marra et al. [25] is that, after preventive treatment, MCS scores among people with LTBI decreased significantly, while PCS scores remained unchanged. Pasipanodya et al. [34] used the SGRQ to measure HRQL among pulmonary TB patients who had completed at least 20 weeks of treatment. Compared with those with LTBI, treated TB patients had lower SGRQ scores. Those with better lung function and/or born in the U.S. (as opposed to foreign-born) tended to have better HRQL outcomes. No gender difference was observed in SGRQ scores.
Muniyandi et al. [35] assessed HRQL in a sample of former TB patients one year after successful completion of treatment. Forty percent of them reported persistent symptoms, such as breathlessness, cough, chest pain, and occasional fever. The authors calculated three SF-36 component scores: physical well-being, mental well-being, and social well-being. According to their results, there was no gender difference in the physical well-being score, but females scored much lower on the mental and social well-being scores. Compared with younger people, older people had significantly lower physical and mental well-being scores, but not social well-being scores. The authors also presented U.S. general population norms for the three component scores and concluded that TB patients' HRQL had returned to a normal level one year after completion of treatment. However, this way of calculating three SF-36 component scores is not commonly seen in the literature, and the reference provided for the U.S. general population norms could not be located.