Background
Several large cohort studies have shown that long-term survivors of childhood cancer are at high risk of developing serious health problems [
1,
2] and this risk increases with time [
1]. Interestingly, self-reported health-related quality of life (HRQoL) or quality of life (QoL) among long-term survivors has been shown to be almost equal to, or higher than, that of controls [
3‐
5]. Survival rates have improved remarkably over recent decades, and survival probability at ten years among those diagnosed with cancer in childhood is approximately 75% [
6]. This means that society has a growing population of long-term childhood cancer survivors, and a significant proportion of them have chronic health conditions. It is of great importance to follow HRQoL among survivors, particularly since there seems to be a discrepancy between clinical health outcomes and the self-reported HRQoL.
In a European collaboration project, researchers have developed the KIDSCREEN instruments, which are designed for the assessment of HRQoL in both chronically ill and healthy children and adolescents, aged 8–18 years [
7]. HRQoL is described as a multidimensional concept, elucidating respondents’ own views regarding their health state, and should include aspects of physical, mental and social health [
8]. The developmental process, which included literature reviews, expert consultation, and focus groups with children and adolescents as well as their families in the 13 participating European countries, resulted in three versions of the instrument [
7]. The three versions differ in length and in the dimensions they include. KIDSCREEN-52 provides detailed information within ten HRQoL dimensions, whereas KIDSCREEN-27 is a shorter version of KIDSCREEN-52 in which the ten dimensions are summarised into five. Finally, KIDSCREEN-10 was developed from the 27-version and provides one global HRQoL score [
7]. Determination of the degree of accordance between corresponding dimensions in the 27-version and the longer 52-version has shown correlation coefficients ranging from r = 0.63 to r = 0.96 [
9].
Internal consistency, as measured by Cronbach’s Alpha, has shown acceptable results for the KIDSCREEN-27 [
10,
11] and KIDSCREEN-52 [
12,
13], as has test-retest reliability, for both versions [
9,
13]. Regarding construct validity, investigations of convergent validity, measured by correlations between the KIDSCREEN dimensions and other HRQoL measures assessing similar aspects, have shown moderate to high correlation coefficients for both the −27 and the −52 versions [
9,
12,
13]. Furthermore, confirmatory factor analysis has shown that most dimensions fit data well for both the −27 [
11] and the −52 versions [
12‐
14]. Analyses of outcomes in relation to socioeconomic status and health problems have shown socioeconomic status to have a positive association with most of the dimensions for the −27 version [
9] and for all in the −52 version [
13]. Additionally, statistically significant differences have been found within all dimensions, in both versions, between children with and without physical and mental health problems, whereby those with health problems showed lower mean values compared to those without health problems [
9,
13].
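As an illustration of the internal consistency measure referred to above, Cronbach's Alpha can be computed from an item-by-respondent score matrix. The following is a minimal sketch only: the function name `cronbach_alpha` and the response data are hypothetical and are not taken from any KIDSCREEN study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's Alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items in the dimension
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (1-5 scale): four respondents, three items
example = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
alpha = cronbach_alpha(example)  # about 0.93 for these made-up data
```

Values closer to 1 indicate that the items within a dimension vary together; which threshold counts as acceptable depends on the convention adopted in the cited studies.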
Aspects of the Rasch model have been used in a few studies [
11,
13‐
15]. The results have generally been promising regarding the KIDSCREEN instruments, both from a developmental point of view as well as regarding usage among children and adolescents, both healthy and with cerebral palsy [
14]. However, as evidence of the validity of an instrument is sample dependent, it is of great importance to perform more in-depth validity studies with different target groups, such as the childhood cancer survivors in this study, since specific psychometric issues in certain groups may not be detected in large population studies. To our knowledge, some studies have been published regarding the clinical usage of KIDSCREEN in children with cancer or tumour experience [
16‐
19], but so far no results have provided evidence of the validity of the KIDSCREEN measures in relation to children and adolescents with cancer experience.
There is a growing population of children and adolescents who have survived their cancer diagnosis. Therefore, it is of great importance to perform follow-up studies with relevant, valid, and sensitive measures in order to make comparisons among children and adolescents by subgroups (sex, age, diagnoses). It is of interest both to follow changes over time and to compare results from childhood cancer survivors with those from persons who have not experienced cancer, to fully understand the impact and complexity of childhood cancer in regard to different aspects of quality of life. Furthermore, it is of value to find a reliable instrument that can be used as a screening tool for identifying those survivors in need of extra support. Even though the KIDSCREEN instruments have been psychometrically tested, using classical test theory and to some extent also Rasch analysis, their robustness among survivors of childhood cancer has not been investigated, which could be of importance due to the growing number of survivors in society. Do the actual data patterns support the assumption of an underlying construct from an item as well as a person perspective? Taking the above factors into account, the aim of this study was to evaluate the psychometric properties of the five dimensions in KIDSCREEN-27 for use in survivors of childhood cancer. The specific research questions were:
1. What are the psychometric properties of the different rating scales used in KIDSCREEN-27?
2. Is there satisfactory evidence of internal scale validity and person response validity in the generated KIDSCREEN-27 measures?
3. Is there evidence supporting unidimensional underlying constructs within the different dimensions?
4. Do the items in KIDSCREEN-27 function in the same way, indicated by no presence of differential item functioning (DIF), among childhood cancer survivors compared to a comparison group?
Discussion
The aim of this study was to evaluate the psychometric properties of KIDSCREEN-27 with a Rasch analysis in a national cohort of childhood cancer survivors. Overall, the results were satisfactory, with acceptable item goodness-of-fit in 23 of 27 items, acceptable unidimensionality for four of the five dimensions, and acceptable person goodness-of-fit in four of the five dimensions. No uniform DIF was detected between the childhood cancer survivors and the comparison group. Given the growing number of survivors in society, it is important to find an instrument that can be used as a screening tool at follow-up visits, and this Rasch analysis could be the first step towards choosing an appropriate instrument. However, given the relatively small sample size (N = 63), the results presented in this paper must be applied with some caution, even if it has been suggested that the Rasch model can be used to perform exploratory work with small samples. Based on the results from the Rasch analysis of KIDSCREEN-27, we recommend that the instrument be used among populations of childhood cancer survivors with similar age ranges. Thirty items administered to 30 individuals should have the ability to deliver statistically stable measures, given reasonable targeting and fit [
29].
The disordering of response categories and thresholds that was found was based on a small number of responses/scores, and the number of observations in each rating scale category therefore did not always meet the criterion suggested by Linacre [
24]. Taking action (e.g., by collapsing response categories) based upon very few unexpected responses in a small sample may also be inappropriate. If the response categories had been collapsed within this study, it would probably have contributed to an even lower number of misfits, and thus to an improvement of internal scale validity and person response validity. As a small sample may limit the inferences that can be drawn from the fit statistics, the findings presented here may actually underestimate the psychometric performance of KIDSCREEN-27 in a sample of childhood cancer survivors.
Item goodness of fit revealed that 23 of 27 items fitted the model. Three of the items: “Have you been able to run well?” (1.60); “Have your parent(s) treated you fairly?” (1.62); “Have you been able to rely on your friends?” (1.51) displayed underfit to the model, i.e. too much variation in the data, compared to expectations from the Rasch model [
21]. These items were all outside the critical range for rating scales (0.6-1.4) but when comparing them to the range for clinical observations (0.5-1.7) [
21] all items fitted within the range. It should also be noted that a high proportion (16%) of respondents did not answer one of the items (“Have you been able to run well?”). Most of these participants had, for various reasons, not run during the previous week. As the content of these items is relevant for cancer survivors [
30,
31] we chose not to omit them from the scale, an approach that previously has been used in scale evaluation [
32]. It has been stated in the literature that the guidelines regarding fit statistics are intended to help in detecting problems with items, not just in deciding which items should be excluded from a test [
21]. However, as our criterion was set that no item would display unacceptable goodness-of-fit, the findings in relation to scale validity were mixed. Considering that the sample is fairly small, and previous studies have shown reasonable item fit for both KIDSCREEN-27 [
11] and KIDSCREEN-52 [
13,
14], we need to verify whether these findings are stable in larger samples of cancer survivors or whether they are due to individual variation in this limited dataset. As none of the items displayed DIF relative to the comparison group, the item misfit is not seen as a major threat to validity, but rather as a concern to monitor in further studies, since the findings do indicate that some individuals score these items differently than expected from the overall pattern found in the sample. The item “Have you felt fit and well?” showed overfit (less variation than expected), which can indicate redundancy or similar ratings across all participants. As low MnSq values may not be a major threat to validity, this item may be of less concern when KIDSCREEN-27 is validated within this sample.
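The way the two critical ranges discussed above lead to different conclusions for the three underfitting items can be sketched as follows. This is an illustration only: the function name `classify_mnsq` is hypothetical, and the sketch assumes that the values reported in parentheses are mean-square (MnSq) fit statistics.

```python
def classify_mnsq(mnsq, lower=0.6, upper=1.4):
    """Label a mean-square fit value against a critical range (inclusive bounds)."""
    if mnsq > upper:
        return "underfit"  # more variation than the Rasch model expects
    if mnsq < lower:
        return "overfit"   # less variation than the Rasch model expects
    return "acceptable"


# Values reported in the text for the three underfitting items
items = {
    "Have you been able to run well?": 1.60,
    "Have your parent(s) treated you fairly?": 1.62,
    "Have you been able to rely on your friends?": 1.51,
}

# Range for rating scales (0.6-1.4): all three items are flagged as underfit
rating_scale = {q: classify_mnsq(v) for q, v in items.items()}

# Range for clinical observations (0.5-1.7): all three items fall within range
clinical = {q: classify_mnsq(v, lower=0.5, upper=1.7) for q, v in items.items()}
```

The sketch only restates the comparison made in the text; it does not reproduce how the fit statistics themselves were estimated.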
Regarding unidimensionality, the results revealed that the underlying constructs were measured to an acceptable extent, except for the Autonomy & Parent Relation dimension, which showed indications of multidimensionality. Further testing of this dimension among childhood cancer survivors is therefore recommended. This possible weakness may arise because the dimension merges three separate dimensions of the 52-version: Autonomy, Parent Relations & Home Life, and Financial Resources [
11]. In contrast, Robitail et al. [
11] showed that all five dimensions in the 27-version, for the whole sample (n = 22827), were unidimensional, with regard to infit statistics. They also performed a confirmatory factor analysis that showed acceptable fit to the model. However, an exploratory factor analysis showed that a few items loaded similarly to more than one dimension. Additional analysis, such as PCA of residuals, to measure unidimensionality, was not performed in that study [
11]. In the present study the variance explained by the secondary dimension (1st contrast) also showed higher values than the recommended 5% in all dimensions, which can be explained by the fact that there are relatively few items within each dimension in KIDSCREEN-27. The concept of HRQoL has many different aspects [
33]; the dimensions should measure distinct parts of the concept but still be considered interrelated. Qualitative interviews were conducted with the same sample [
34], prior to the collection of the questionnaire-based data, and the results supported the content validity of KIDSCREEN-27 among childhood cancer survivors.
Person goodness of fit revealed that one dimension (Psychological Well-being) displayed a value above 5% (Table
2). As the number of participants who did not demonstrate acceptable goodness of fit was small, there was no possibility of carrying out more in-depth analyses at subgroup level in this study. On an individual level, no clear pattern was found among the participants who did not demonstrate acceptable goodness of fit: they comprised three females and two males, aged 13 to 22 years, with different diagnoses represented. Future studies with larger sample sizes would allow for more in-depth explorations, and also for monitoring associations between item and participant misfit. A limited number of responses due to a small sample will also affect the precision of the item calibration measures. Larger samples will therefore allow for more precise analyses providing evidence of scale validity (e.g., collapsing response categories and exploring residual correlations).
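To show how sensitive the 5% person-fit criterion is in a sample of this size, the arithmetic can be sketched as below. The function names are hypothetical, and the sketch assumes that all 63 survivors contribute to a given dimension, which may not hold if some responses are missing.

```python
import math


def max_misfitting_persons(n_respondents, criterion=0.05):
    """Largest number of misfitting persons that does not exceed the criterion."""
    return math.floor(criterion * n_respondents)


def exceeds_criterion(n_misfitting, n_respondents, criterion=0.05):
    """True if the share of misfitting persons is above the criterion."""
    return n_misfitting / n_respondents > criterion


n = 63  # sample size reported in the Discussion
limit = max_misfitting_persons(n)      # 3 persons (3 / 63 is about 4.8%)
over = exceeds_criterion(limit + 1, n)  # 4 / 63 is about 6.3%, above 5%
```

Under this assumption, a single additional misfitting respondent is enough to move a dimension from acceptable to unacceptable, which is one reason the person-fit results of a small sample should be interpreted cautiously.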
According to the person-item map, the most challenging dimension was Physical Well-being. The most challenging item was “Was physically active?” and the least challenging item was “Able to talk to parent(s) when wanted to?” It is not surprising that Physical Well-being was the most challenging dimension, since this is the aspect of HRQoL in which impairments and difficulties are expected among survivors, owing to complications of the diagnosis and treatment.
According to the results of the DIF analyses, the items do not appear to work differently for survivors of childhood cancer compared to young people of the same age without a cancer experience. To our knowledge, one previous study has provided results of DIF for KIDSCREEN-27, across different European countries [
11], but no study has reported DIF results by sex, age group or health status. Regarding KIDSCREEN-52, previous results have shown that none of the items displayed any sizeable DIF by age group (8–11 vs. 12–18 years), sex or health status [
13]. In a study comparing children with and without cerebral palsy (CP), however, some items showed statistically significant DIF, although this was more frequently seen in the proxy version of the instrument [
14]. Based on our findings further validation studies are suggested to explore unique diagnostic profiles in HRQoL, even though this study did not indicate such profiles in relation to survivors after childhood cancer.
An important strength of the present study is that a unique and representative (for five years of survival) national cohort of childhood cancer survivors in Sweden has been followed from 2004 onwards, with several data collection occasions. However, there are some limitations to the present study that should be mentioned. Firstly, the small sample of survivors of childhood cancer limits the possibility of drawing firm conclusions regarding the robustness of the instrument. Because of the relatively small groups, more sophisticated analyses regarding DIF [
22], e.g. for different specific diagnoses, could not be performed. Secondly, as time since diagnosis was relatively short, conclusions regarding the instrument’s performance cannot be drawn for the entire follow-up period after diagnosis. Continued evaluation of the instrument’s psychometric performance in a long-term perspective is recommended, especially as health problems are known to increase over time [
1]. Larger cohort studies in a European context would be of value in order to achieve higher power and also to monitor item and person response validity in more detail. Some participants exceeded the recommended upper age limit of 18 years for the instrument, but no uncertainties were expressed by those older than 18 years when responding to the items.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
AJ collected and analysed the data and drafted the manuscript. AK participated in interpreting the data, and helped to draft and revised the manuscript. LW conceived and designed the study, interpreted the data, helped to draft and revised the manuscript. All authors read and approved the final manuscript.