Original Article
The SF-36 summary scales were valid, reliable, and equivalent in a Chinese population

https://doi.org/10.1016/j.jclinepi.2004.12.008

Abstract

Objectives

To find out whether the SF-36 physical and mental health summary (PCS and MCS) scales are valid and equivalent in the Chinese population in Hong Kong (HK).

Study Design and Setting

The SF-36 data of a cross-sectional study on 2,410 Chinese adults randomly selected from the general population in HK were analyzed.

Results

The hypothesized two-factor structure of the physical and mental health summary scales (PCS and MCS) was replicated and the expected differences in scores between known morbidity groups were shown. The internal reliability coefficients of the PCS and MCS scales ranged from 0.85 to 0.87. The effect size differences between the U.S. standard and HK-specific PCS and MCS scores were mostly <0.5. The effect size differences in the standard PCS and MCS scores of specific groups between the U.S. and H.K. populations were all <0.5.

Conclusion

The PCS and MCS scales were applicable to the Chinese population in HK. The high level of measurement equivalence of the scales between the U.S. and H.K. populations suggests that data pooling between the two populations could be possible. To our knowledge, this is the first study to show that the SF-36 summary scales are valid and equivalent in an Asian population.

Introduction

Health-related quality of life (HRQoL), defined by Bullinger et al. [1] as “the impact of perceived health on an individual's ability to live a fulfilling life,” is becoming an important outcome measure in health services and clinical trials. The MOS 36-item Short-Form Health Survey (SF-36) is a popular HRQoL measure that has been translated and validated for Chinese adults in Hong Kong (HK) [2], [3], [4], [5]. The SF-36 has eight scales measuring eight domains of HRQoL: physical functioning (PF); role–physical (RP), or limitation in daily role functioning due to physical problems; role–emotional (RE), or limitation in daily role functioning due to emotional problems; bodily pain (BP); general health perception (GH); vitality (VT); social functioning (SF); and mental health perception (MH). Each scale consists of 2 to 10 items, and each item is rated on a two- to six-point Likert scale. The scale score is calculated by summation of all the scores of items belonging to the same scale. A profile of eight scale scores, although informative, can be difficult to interpret as an outcome measure in clinical trials [6]. Ware et al. [6], [7], [8] hypothesized that there are two principal factors, namely the physical and the mental components, underlying the eight SF-36 scales. This two-factor structure was demonstrated in the general population in the United States (U.S. standard): the physical health summary (PCS) and mental health summary (MCS) components explained 60% of the total variance of the SF-36 scale scores [6], [7], [8]. The physical component correlated strongly (r ≥ .7) with the physical functioning (PF), role–physical (RP), and bodily pain (BP) scales but weakly (r ≤ .3) with the mental health (MH), role–emotional (RE), and social functioning (SF) scales. The mental component correlated strongly with the MH, RE, and SF scales but weakly with the PF, RP, and BP scales. 
The general health (GH) and vitality (VT) scales were bipolar, loading moderately (.3 < r < .7) on both the physical and the mental components [6], [7], [8].

The PCS and MCS scales summarize the eight SF-36 scale scores into two summary scores that give an overall assessment of quality of life related to physical and mental health, respectively. The PCS and MCS scores are easier to interpret and simpler to analyze statistically in clinical trials and longitudinal studies [6], [7]. Because different SF-36 scales correlate with each of the two factors differently, they are weighted by the appropriate physical or mental factor coefficients before aggregation to form the two summary scores. Norm-based scoring with z-score transformation, calculated as (observed score – population mean)/population standard deviation, and standardization of the population mean and standard deviation (SD) to 50 and 10, respectively, are recommended for easier interpretation [6]. The SF-36 PCS and MCS scoring algorithm is summarized below:

$$\text{SF-36 PCS} = \left[\sum \left(\text{z-score of each scale} \times \text{respective physical factor coefficient}\right)\right] \times 10 + 50$$

$$\text{SF-36 MCS} = \left[\sum \left(\text{z-score of each scale} \times \text{respective mental factor coefficient}\right)\right] \times 10 + 50$$
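The scoring steps described in this paragraph (z-score transformation, weighting by factor coefficients, aggregation, and rescaling to a mean of 50 and SD of 10) can be sketched in Python. This is only an illustrative sketch: the population means, SDs, and factor coefficients below are hypothetical placeholders chosen for readability, not the published U.S. or HK values, and the function names are our own.

```python
# Sketch of norm-based PCS/MCS scoring.
# ALL numeric values below are HYPOTHETICAL placeholders, not published norms.
SCALES = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]

# Hypothetical population means and SDs of the eight scale scores.
POP_MEAN = {"PF": 85.0, "RP": 80.0, "BP": 75.0, "GH": 70.0,
            "VT": 60.0, "SF": 85.0, "RE": 80.0, "MH": 75.0}
POP_SD = {"PF": 20.0, "RP": 35.0, "BP": 25.0, "GH": 20.0,
          "VT": 20.0, "SF": 20.0, "RE": 35.0, "MH": 18.0}

# Hypothetical physical and mental factor score coefficients.
PHYS_COEF = {"PF": 0.40, "RP": 0.35, "BP": 0.30, "GH": 0.25,
             "VT": 0.05, "SF": 0.00, "RE": -0.20, "MH": -0.20}
MENT_COEF = {"PF": -0.20, "RP": -0.10, "BP": -0.10, "GH": 0.00,
             "VT": 0.25, "SF": 0.30, "RE": 0.40, "MH": 0.45}


def z_scores(scores):
    """(observed score - population mean) / population SD, per scale."""
    return {s: (scores[s] - POP_MEAN[s]) / POP_SD[s] for s in SCALES}


def summary_score(scores, coef):
    """Weight each z-score by its coefficient, sum, then rescale to 50/10."""
    z = z_scores(scores)
    aggregate = sum(z[s] * coef[s] for s in SCALES)
    return aggregate * 10 + 50


def pcs(scores):
    return summary_score(scores, PHYS_COEF)


def mcs(scores):
    return summary_score(scores, MENT_COEF)
```

A respondent whose eight scale scores all equal the population means has z-scores of zero on every scale, so both summary scores come out at exactly 50, which is the intended reference point of norm-based scoring.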

The standard scoring algorithm for the SF-36 PCS and MCS scales uses the population means, SDs, and factor coefficients derived from the U.S. general population [6]. A multinational study showed similar factor structures and equivalent population mean PCS and MCS scores between the United States and nine European countries [8], [9]. Ware et al. [8] recommended that the U.S. standard SF-36 PCS and MCS scales and scoring algorithm be used in these countries instead of country-specific approaches. Data from the Japanese general population and from several Chinese populations, however, showed a two-factor structure and loadings of the SF-36 scales that differed from those found in the U.S. population [10], [11], [12], [13]. These studies found that the role–emotional scale loaded more strongly on the physical component (r = .62–.82) than on the mental component (r = .19–.49), which was the reverse of the pattern in the U.S. data (physical: r = .17, mental: r = .78). The vitality scale loaded strongly (r = .79–.88) on the mental component but only weakly (r = .21–.37) on the physical component in these populations, instead of the moderate correlations with both components found in the U.S. data (physical: r = .47, mental: r = .64). This raised the concern of whether the standard PCS and MCS scales are applicable to Asian populations, whose cultures may differ from that of the United States more than European cultures do.
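To make the comparison concrete, the short sketch below applies the loading thresholds given earlier in the Introduction (strong: r ≥ .7; weak: r ≤ .3; moderate in between) to the U.S. loadings quoted in this paragraph. The function name and the use of the absolute value of r are our own assumptions for illustration.

```python
def loading_strength(r):
    """Classify a factor loading using the Introduction's thresholds.

    Assumption: the sign is ignored, so the absolute value of r is classified.
    """
    magnitude = abs(r)
    if magnitude >= 0.7:
        return "strong"
    if magnitude <= 0.3:
        return "weak"
    return "moderate"


# U.S. loadings for two scales, as quoted in the paragraph above.
US_LOADINGS = {
    "RE": {"physical": 0.17, "mental": 0.78},
    "VT": {"physical": 0.47, "mental": 0.64},
}

for scale, loadings in US_LOADINGS.items():
    for component, r in loadings.items():
        print(scale, component, r, loading_strength(r))
```

Under these thresholds the U.S. role–emotional scale is weak on the physical component and strong on the mental component, while vitality is moderate on both, which matches the description of vitality as loading moderately on both components.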

Our objective was to find out whether the SF-36 PCS and MCS scales are valid, reliable, and equivalent for the H.K. Chinese adult population. We also wanted to find out whether an HK-specific scoring algorithm using factor coefficients derived from the H.K. general population would give results equivalent to those of the standard algorithm. Evidence of validity and reliability would support the use of the SF-36 PCS and MCS scales in HK. Equivalence in results between the U.S. and H.K. Chinese populations would imply that the standard SF-36 PCS and MCS scales can be used as a cross-cultural HRQoL measure in international studies and global drug trials [14].

Methods

We used data on 2,410 Chinese adults randomly selected from the general population in HK, collected in 1998 in a cross-sectional norming study of the Chinese (Hong Kong) SF-36 Health Survey. The detailed sampling and data collection methods have been described elsewhere [3], [5]. The sociodemographic characteristics of the subjects are compared with those of the H.K. general adult population in Table 1.

The data were tested against the following hypotheses.

  1. Two principal component

The Hong Kong–specific SF-36 PCS and MCS scales

Two principal component factors were extracted from the eight SF-36 scale scores and the eigenvalues were 3.4968 and 1.1118 for the first two components, respectively. The two principal factor structure and factor loadings, after varimax rotation, of the SF-36 scale scores of the H.K. Chinese adult population are given in Table 2. The physical (first) component correlated more strongly with the physical functioning (PF), role–physical (RP), bodily pain (BP), and general health (GH) than with

Construct validity and reliability of the SF-36 physical and mental health summary scales

The hypothesized two principal factor structure of the SF-36 scales was replicated in the general Chinese population in HK, and the factor loadings were similar to those found in the U.S. population [6], [7]. The physical factor loading in the general health (GH) scale was relatively stronger than hypothesized, but similar to that found in the U.S. population. This confirms the construct validity of the internal factor structure of the SF-36 PCS and MCS scales for the H.K. Chinese population.
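The kind of analysis referred to here, extracting two principal components from the eight scale scores and applying a varimax rotation, can be sketched with NumPy. This is a minimal illustration on synthetic data, not the authors' code: the data-generating weights, sample size, and function names are hypothetical, and we assume the components are taken from the correlation matrix of the scale scores.

```python
import numpy as np


def principal_component_loadings(data, n_factors=2):
    """Eigenvalues and unrotated loadings from the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (standard SVD-based iteration)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Synthetic stand-in for eight scale scores driven by two latent factors.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 2))               # hypothetical physical, mental
weights = np.array([[0.8, 0.1], [0.8, 0.1], [0.7, 0.2], [0.6, 0.4],
                    [0.4, 0.6], [0.2, 0.7], [0.1, 0.8], [0.1, 0.8]])
data = latent @ weights.T + 0.5 * rng.normal(size=(n, 8))

eigvals, loadings = principal_component_loadings(data)
rotated = varimax(loadings)
explained = eigvals[:2].sum() / eigvals.sum()  # share of total variance
print(np.round(rotated, 2), round(float(explained), 3))
```

Note that the column order and sign of rotated components are arbitrary, so deciding which column is "physical" and which is "mental" still requires inspecting which scales load on it.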

Conclusions

The hypothesized two-factor structure of the SF-36 scales was replicated from the SF-36 data of the H.K. Chinese general population, and the two factors explained 57.6% of the total variance of the SF-36 scale scores and 63%–88% of the reliable variance of each scale. The SF-36 PCS and MCS scores showed the expected difference between known chronic disease groups, further supporting their construct validity.
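As a consistency check, the 57.6% figure can be recovered from the eigenvalues reported for the first two components (3.4968 and 1.1118). The calculation below assumes the components were extracted from the correlation matrix of the eight scales, so that the total variance equals 8; that assumption is ours and is not stated in the text.

```latex
\frac{3.4968 + 1.1118}{8} = \frac{4.6086}{8} \approx 0.576 = 57.6\%
```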

The mean standard PCS and MCS scores of the H.K. Chinese general population differed

Acknowledgments

The general population norming survey of the Chinese (Hong Kong) SF-36 was funded by the Health Services Research Grant, the Government of Hong Kong SAR (HSRC no. 711026). Thanks go to Alex Chan, Willis Ho, Joanna Shing, Ka-Lai Chan, Wai-Hung Yu, June Chan, Chi-Kwan Wong, Wing-Yee Lai, Yick-Lok Chan, and Hing-Wai Tsang for their help in data collection and analysis. Parts of this work have been submitted to the University of Hong Kong toward the award of the Doctor of Medicine degree

References (27)

  • J.L. Fuh et al. Psychometric evaluation of a Chinese (Taiwanese) version of the SF-36 Health Survey amongst middle-aged women from a rural community. Qual Life Res (2000)
  • J. Thumboo et al. A community-based study of scaling assumptions and construct validity of the English (UK) and Chinese (HK) SF-36 in Singapore. Qual Life Res (2001)
  • X.S. Ren et al. Psychometric and clinical evaluation of a Chinese version of the SF-36 Health Survey among cancer patients in China. Qual Life Newsl (2003)