Introduction
Recent years have witnessed considerable interest in quality of life (QoL) research which spans multiple disciplines [
1‐
3]. This international interest has been driven by people living longer, the increase in chronic conditions, and rising costs of healthcare delivery [
4‐
7]. Health professionals and researchers also agree that health services, policy making, and the efficacy of treatment interventions should be evaluated by their impact on QoL [
8,
9]. These developments have resulted in a proliferation of assessment instruments [
6,
10‐
12]. The World Health Organization’s Quality of Life Questionnaire (WHOQOL-Bref) is one of the best-known generic questionnaires for the assessment of QoL in both healthy and ill populations [
3,
10,
13,
14]. Over 20 years, a WHOQOL-Bref manual has facilitated around 100 culturally adapted translations of this instrument globally, and the instrument has been completed by over 60,000 adults from both healthy and diseased populations [
15]. Validation of measurement instruments, the WHOQOL-Bref included, is an ongoing process—and accumulated evidence of validity is needed if any inferences and interpretations of instrument scores are to be supported [
16]. Although inquiries into the psychometric properties of the WHOQOL-Bref report that the validity and reliability of the scale are generally satisfactory [
10,
13,
14], some inquiries fail to support the theoretical four-factor dimensionality of the WHOQOL-Bref without modifications to the instrument, and poor reliability of the social and environmental domains is sometimes evident [
14,
17‐
20].
Furthermore, support of construct validity in terms of measurement invariance is reported by some studies [
21,
22], but not others [
23], and one study reported measurement invariance across gender, but not across age [
2].
Finally, normative cross-cultural data are also relatively scarce given the worldwide use of the instrument [
24‐
26]. The WHOQOL-Bref was translated for use in Norway according to WHO international guidelines [
27], but no population norms from Norway have been provided in over 15 years [
28].
Thus, in the current inquiry, we evaluate the psychometric properties of the Norwegian WHOQOL-Bref, taking advantage of a Norwegian general population study. We aim to replicate previous investigations of psychometric properties, but also extend existing research by testing for measurement invariance across age, gender, and education level, and lastly provide updated normative data for the Norwegian population.
Psychometric qualities of the WHOQOL-Bref
Construct validity refers to the ongoing process of examining the theoretical relationship between items and the hypothesized scale [
29]. Despite the frequent use of the WHOQOL-Bref and evidence for its psychometric soundness, questions remain about whether data are well represented by the theorized four-factor structure, and whether the WHOQOL-Bref is measuring the same structure in different populations.
The results of several inquiries support an appropriate fit of a four-factor structure of QoL in general populations [
3,
25,
26,
30] and in disease populations [
18‐
20]. However, rescoring or omitting items to conform to acceptable fit indices of the four-factor model is reported [
31,
32]—and although items are found to correlate most strongly with their theoretically intended domain, the items may correlate highly across other domains as well [
14,
33]. Indeed, reports on modified versions of the four-factor structure of the WHOQOL-Bref are quite common [
32], which was also found in the earlier Norwegian population study by Hanestad and colleagues [
28]. High correlations between items across domains have also led to questions about whether the WHOQOL-Bref is best represented by one domain of overall QoL [
33]. Good fit of data to a one-factor structure is also supported by others [
30,
34].
Testing for factorial invariance of a measurement instrument is an important step in the evaluation of the scale's construct validity [
35]. When an instrument operates equally, and the underlying constructs have the same theoretical structure across different groups, evidence of factorial invariance is strengthened [
35]. In a Taiwanese national survey, evidence of measurement invariance of the WHOQOL-Bref was supported, after controlling for age and gender among healthy and disease populations, between disease and matched healthy groups and across disease groups [
21]. One study, among 1972 undergraduates from nine Spanish-speaking countries, found evidence of factorial invariance of the WHOQOL-Bref across countries, even though the initial testing yielded a poor fit to the original four-factor theoretical model [
26]. The final model showed a structure which was a different and more complex configuration from that of the original. The social domain, originally tapped by items 20 and 21, was in this new factor structure tapped by items 10, 11, 12, 19, 20, 23, 24 and 25. Other findings are not as supportive of invariance across nations; Theuns and colleagues [
23] explored whether the scale measured the same construct across Belgium and Iran and found that eleven out of 24 items had invariant factor loadings and thresholds, mainly in the physical and psychological domains.
Perera, Izadikhah, O'Connor and McIlveen [
2] explored competing latent structures in a general Australian population and investigated the retained model across gender and age. Their findings supported a two-factor solution with measurement and structural invariance across gender. A curvilinear relationship between age and the QoL domains was evident; thus, the QoL dimensions might not be comparable across younger and older individuals.
Based on the above, measurement invariance of the WHOQOL-Bref is supported across some countries, healthy versus disease populations, and across gender. Invariance across age is, however, less certain.
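The stepwise logic behind such invariance tests can be illustrated with a minimal sketch. It assumes the conventional sequence of nested models (configural, then metric, then scalar) and a commonly cited rule of thumb that a drop in CFI of more than 0.01 between adjacent models argues against the more constrained one. Neither the cutoff nor the fit values below are taken from the studies cited here; the function name and the numbers are hypothetical.

```python
# Nested invariance models, from least to most constrained.
LEVELS = ("configural", "metric", "scalar")


def highest_invariance_level(cfi, max_drop=0.01):
    """Return the most constrained model whose CFI drop is tolerable.

    cfi      : dict mapping each level name to the CFI of that model
    max_drop : largest acceptable decrease in CFI between adjacent
               models (0.01 is an assumed rule of thumb, not a value
               taken from the studies discussed in the text)

    The configural model is treated as the baseline; judging whether
    its own fit is acceptable is left to the caller.
    """
    supported = LEVELS[0]
    for previous, current in zip(LEVELS, LEVELS[1:]):
        if cfi[previous] - cfi[current] > max_drop:
            break  # the added constraints worsen fit too much
        supported = current
    return supported


# Hypothetical fit values for a comparison across age groups.
example = {"configural": 0.950, "metric": 0.947, "scalar": 0.930}
print(highest_invariance_level(example))  # metric
```

In this invented example the loadings can be constrained equal across groups, but constraining the intercepts or thresholds as well degrades fit, so only metric invariance would be retained.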
Reliability and scaling qualities
Although most studies support the psychometric fitness of the physical and psychological health domains, several studies have reported low internal consistency of the social domain [
17,
18,
36], as was found in the older Norwegian study [
28]. The item focused on safety is also shown to have low internal consistency with the environmental domain [
33].
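Internal consistency in this literature is usually summarized with Cronbach's alpha. As a minimal sketch of how that coefficient is obtained from raw item scores, the function below applies the standard formula; the function name and the response data are invented for illustration and do not come from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    items : list of lists, one inner list of scores per item, with
            respondents in the same order in every inner list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("every item needs one score per respondent")

    # Sum of the individual item variances (sample variance, n - 1).
    item_variance_sum = sum(variance(scores) for scores in items)
    # Variance of each respondent's total score across the items.
    totals = [sum(row) for row in zip(*items)]
    total_variance = variance(totals)

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Invented responses of five people to a three-item domain.
domain = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [3, 5, 4, 4, 1],
]
print(round(cronbach_alpha(domain), 3))  # 0.902
```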
Ceiling effects are a well-known problem in QoL research and indicate that items/scales have poor discrimination and thus impaired sensitivity and responsiveness [
37]. In a comprehensive study with WHOQOL-Bref data from 23 countries, results indicated that five items (cognitive ability, body image, information, personal relationships and access to health services) had marginally skewed distributions with few responses (< 10%) at the lower ends of the scale [
10].
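Floor and ceiling effects of this kind are typically quantified as the share of respondents choosing the lowest or the highest response category. The sketch below shows one way to compute those shares for a single item scored from 1 to 5. The 15% flagging threshold is an assumed convention that is not stated in the sources cited here, and the function name and responses are invented.

```python
from collections import Counter


def floor_ceiling(responses, low=1, high=5):
    """Return (floor_share, ceiling_share) for one item.

    responses : list of integer item scores
    low, high : lowest and highest response categories of the item
    """
    n = len(responses)
    if n == 0:
        raise ValueError("no responses given")
    counts = Counter(responses)
    return counts[low] / n, counts[high] / n


# Invented responses of ten people to one five-point item.
item = [5, 5, 4, 5, 3, 5, 4, 5, 2, 5]
floor_share, ceiling_share = floor_ceiling(item)
print(floor_share, ceiling_share)  # 0.0 0.6

# Assumed rule of thumb: flag an item when more than 15% of the
# respondents sit at one extreme of the scale.
THRESHOLD = 0.15
print(ceiling_share > THRESHOLD)  # True
```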
Study aims
Based on data from a random sample of the Norwegian population, the primary aim was to examine construct validity and reliability of the Norwegian WHOQOL-Bref, addressed by the following research questions:
1.
Does the original four-factor model of the WHOQOL-Bref have a better fit than the one-factor model to the general Norwegian population data?
2.
Does the four-factor model of the WHOQOL-Bref reveal satisfactory construct validity in terms of dimensionality, convergent and discriminant validity, and reliability (internal consistency, floor and ceiling effects) in the general Norwegian population?
3.
Are the underlying dimensions of the WHOQOL-Bref stable (invariant) across gender, age and education?
A secondary aim was to generate up-to-date Norwegian normative data for the WHOQOL-Bref.
Discussion
The present study was primarily concerned with examining the construct validity of the Norwegian WHOQOL-Bref, and secondarily with generating new normative data for this frequently used instrument. Using data from a random sample of the Norwegian population, we tested the complete factorial invariance of item responses across gender, education and age. The results of the study demonstrate acceptable validity and internal consistency (reliability) of the scale; however, the social domain demonstrated marginal reliability. Evidence was obtained that the WHOQOL-Bref was invariant across gender and education. However, scalar invariance could not be established for age. The model fit was slightly poorer for the older age group (60–75 years) compared to the younger groups.
The current study found that the hypothesized four-factor model did not yield an adequate model fit. Subsequent CFAs were therefore carried out to explore the sources of misfit. The current investigation is in line with several inquiries that report a poor fit of the original four-factor model [
31‐
33,
44]. The same items are reported as problematic (i.e. low factor loadings, high error correlation, cross-loadings). Xia and colleagues [
44] reported that a correlation between the items “enjoy life” and “meaningful life” would improve the fit of their model, similar to our findings. Furthermore, several studies report on ceiling effects for some items (24 “access to health services”, 25 “satisfaction with transport”, 4 “medical treatment”, 20 “personal relationships”) [
14,
38]. In our study some of these same items were allowed to covary with each other or with some other item (3 “physical pain” and 4 “medical treatment”; 5 “enjoy life” and 6 “meaningful life”; and 24 “access to health services” and 25 “satisfaction with transport”). Shared error variance and ceiling effects may both be the result of some common factor, other than the hypothesized latent domain, explaining variation in the data, thus representing a serious threat to the validity of the instrument. Items with high loadings on more than one domain are found to be more complex; for example, item 8 (“safety in daily life”) is shown to have strong loadings on both the environmental domain and the psychological domain [
33]. Likewise, item 8 and item 10 (“energy”) are both more strongly associated with the psychological domain than their intended domains [
10]. When items display high loadings across several domains, this may indicate that QoL is better represented by one dimension. In diseased populations (patients with coronary artery disease, and other populations with physical disorders and mental problems), only the one-factor solution had acceptable fit to the data [
33,
34]. We might suppose that these groups of patients have a more holistic perception of QoL. That is, from a conceptual standpoint, it is conceivable that people possess a holistic sense of their functioning in addition to more differentiated subjective evaluations of domain-specific health and wellness. Consequently, some people may be informed by their cross-domain experiences in addition to a more differentiated subjective evaluation of specific domains, which may be more context dependent [
2]. However, the one-dimensional factor structure was not supported in our general population sample.
Although we found a slightly unsatisfactory fit of the four-factor solution for the original WHOQOL-Bref, a few modifications (i.e. adding correlations between error variances of some items) resulted in a good fit of the four-factor model.
Although the present findings supported an acceptable fit of a modified four-factor model of QoL, the social domain displayed marginal reliability, similar to what others have found [
13,
14,
17,
18,
28,
37,
45‐
49]. A reason for the low reliability may be the low number of items [
3] since the internal consistency tends to improve with increasing number of indicators [
50], and thus the true reliability may be underestimated when the items are few [
51]. Despite poor reliability of the social domain, each item had medium to strong factor loadings and explained a substantial amount of variance in the latent domain. These modifications should be considered when evaluating the overall construct validity and consistency of the instrument.
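The dependence of internal consistency on the number of items can be made concrete with the Spearman-Brown prophecy formula, which predicts the reliability of a scale lengthened by a given factor under the assumption that the added items are parallel to the existing ones. The sketch below is illustrative only: the starting reliability of 0.60 is a hypothetical value, not the estimate obtained for the social domain in this study.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing the number of items.

    reliability   : reliability of the current scale (0 to 1)
    length_factor : new number of items divided by the current
                    number of items (2 means doubling the scale)

    Assumes that any added items are parallel to the existing ones.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )


# Hypothetical three-item domain with a reliability of 0.60.
current = 0.60
# Doubling to six items gives a predicted value of about 0.75.
print(round(spearman_brown(current, 2), 2))
# Extending to eight items gives a predicted value of about 0.80.
print(round(spearman_brown(current, 8 / 3), 2))
```

Under these assumptions a short domain can show modest internal consistency even when its items are individually sound, which is consistent with the medium to strong loadings noted above.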
The response distributions showed that data were skewed to higher scale scores on all items and domains. Both single items and the four domains showed non-normal distributions. Such ceiling effects are well documented in QoL research [
10,
14,
37], and may indicate that the range of response options is inadequate and causes poor sensitivity and responsiveness of specific items/scales [
29]. However, the environmental domain is reported to discriminate sufficiently between people living in residential areas and those living in slum areas [
52], and thus the discriminatory power of the environmental domain may be better among people experiencing distinct differences in environmental resources, or among populations suffering permanent changes in their environmental well-being (e.g. in polluted areas or in physical disasters).
A recent meta-analysis (24 studies, n = 2084) found evidence of small changes for the social and environmental domains and recommended investigating selected settings where, a priori, the social and environmental domains could be expected to respond significantly (positively or negatively) to types of events [
15]. Importantly, one of the strengths of the WHOQOL-Bref is the inclusion of an environmental domain which often is lacking in other QoL instruments. Further work should therefore consider developing more sensitive response options for the most affected items.
In the current investigation, measurement invariance was supported for both gender and education, which is in line with the findings of Lin, Li [
22], who reported the same results for an older Thai population.
In general, measurement invariance was supported across gender, age and education. Separate models showed a good fit for ages 18–39, but an increasingly poorer fit for age groups 40–59 and 60–75 years of age. One explanation for our findings may be that different groups may have varying linguistic interpretations of test items and category labels [
30]. A differentiated subjective evaluation among older individuals has been reported in a sample of older adults with post-polio syndrome [
32]. Likewise, Liang and colleagues [
53] found three items showing Differential Item Functioning (DIF), indicating a potential bias when using the scale in different age groups. Finally, others have noted a linear effect on the environmental domain, that is, with increasing age environmental QoL increased [
2,
54]. Conceptually, it is therefore conceivable that older people may possess a more holistic sense of their functioning, in addition to a more differentiated subjective evaluation of specific health and QoL domains, which differs from other age groups [
2]. In addition, older people are affected to a larger degree, and in different ways, by their cultural and environmental contexts [
55,
56]. Notably, over a decade ago, the WHOQOL assessment group questioned whether other factors that were not included in the WHOQOL-Bref may be specifically important to older adults’ QoL. Consequently, an add-on module, known as the WHOQOL-Old Module, was developed and tested among 5566 older adults worldwide. Domains in this module included items related to sensory abilities, autonomy, past-present-future activities, social participation, death and dying, and intimacy, which have been found to be particularly important to older adults [
57‐
60]. The results of our study may lend theoretical justification for the use of this WHOQOL-Old module together with the WHOQOL-Bref in future studies focused on older adults.
Convergent validity of the scale was shown, as scale domains were found to be significantly positively correlated with overall quality of life and satisfaction with health. Furthermore, the four domains of the WHOQOL-Bref were all positively correlated with each other and with work engagement. Convergent and discriminant validity of the WHOQOL-Bref have been supported in several international studies [
10,
14,
15,
38].
Our normative data presented in Table 8 are similar to the findings of Hanestad et al. [
28]. In addition, we extend previous research by providing normative data by gender, age group and education level.
In summary, the present study has yielded updated validation data for the Norwegian WHOQOL-Bref and provided population norms. Normative data are especially useful for defining a baseline against which to compare QoL in different populations. Population norms are also important for interpreting QoL scores in clinical settings and for further developing and providing adequate treatments and policies. On an empirical level, it seems logical to conclude that there are scale differences in generic QoL across cultures and that QoL is affected in a complex way by a broad array of factors [
61]. Therefore, issues of invariance should not be underestimated in the performance of the scale items and domains [
62]. Future studies should continue to examine measurement equivalence among various groups, especially among older persons across different demographics. We recommend studies of individuals older than 75 years, which was the oldest age in the present study.
The results presented here cannot be directly generalized to other cross-national samples. Our response rate was only 22%. The use of postal survey data makes it difficult to assess bias and reasons for non-response [
63]. Furthermore, the vast majority of participants appraised themselves as rather healthy, which may explain the poor fit of the “medical treatment” and “health services” items on the domain of physical quality of life.