Background
Cerebral palsy (CP) is a permanent and nonprogressive developmental disability. Despite medical treatment and rehabilitation, various motor limitations associated with CP may reduce functionality and affect skills required for the performance of activities of daily living (ADLs) [
1]. Premature mortality in children with CP is rapidly decreasing, and most of them survive until adulthood [
2,
3]. The acquisition of high-level information about the functional status of children with CP has progressively become imperative [
4,
5]. However, information on health-related quality of life (HRQL) in children with CP, which could provide critical perspectives in preparation for the future of these children, remains lacking [
6].
Unlike in the past, the evaluation of children with CP focuses on activity level measurement to examine the effect of health-care interventions on their physical functioning in the home, school, and community settings [
7,
8]. The International Classification of Functioning (ICF) has provided a framework for the collection of data on aspects of activity limitation and impairment, urging the exploration of the correlation between activity limitation and impairment. The ICF defines activity as the execution of any specific task by an individual [
9]. When assessing disability, all relevant circumstances must be taken into account, and the extent to which individuals with disabilities can perform essential functions or major life activities should be measured.
Recently, evaluation has expanded to include childhood roles and how children with disabilities feel as they work through the obstacles that they face [
10‐
12]. HRQL, a concept pertaining to aspects of life quality that are directly associated with health status, has been assessed [
13,
14]. HRQL assessment inventories have been developed over the past 10 years, and some general scales of HRQL have already been applied to children with disabilities and used for the evaluation of physical and psychological damage [
15‐
18]. Various assessment inventories have been adapted in Korea, including the Korean version of EuroQol-5 Dimensions, which is designed to assess health status with respect to five areas of HRQL, namely mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The scores for each area are entered into the assessment index equation to calculate the subjective HRQL index [
19,
20]. The Korean version of the 12-item Short Form Health Survey developed for the Medical Outcomes Study is also a tool used to measure HRQL and comprises physical and mental component summaries (12 items in total), with a higher score indicating a higher HRQL level. Moreover, the entire inventory was reported to be reliable (Cronbach’s α = 0.810) [
21]. The Korean version of the World Health Organization (WHO) Quality of Life Scale, an abbreviated modified version of the WHO Quality of Life Assessment Instrument-100 translated into Korean [
22], is a 5-point scale inventory in which 24 facets related to quality of life are grouped into four domains (physical health, psychological, social, and environmental). Responses of “never” and “always” correspond to scores of 1 point (lowest score) and 5 points (highest score), respectively.
These assessment tools have been used to evaluate HRQL in children with different types of disabilities. Based on the results, various plans have been proposed, and assistance has been recommended and provided. However, general scales for HRQL do not directly address functionality or ADL-related concepts, and only a few of these scales are suitable for children with CP [
23‐
25]. In particular, there remains a lack of information about HRQL in children with CP in Korea, possibly owing to the absence of feasible tools for HRQL measurement.
Recently, the Childhood Health Assessment Questionnaire (CHAQ), a tool specially developed for the assessment of functional capacity and independence in everyday life, has been utilized in children with CP. The CHAQ is a validated questionnaire comprising specific items used to evaluate juvenile idiopathic arthritis in children and adolescents [
26] and has been considerably applied to patients with current mobility restrictions due to other chronic diseases such as pediatric spondyloarthropathies, spina bifida, joint hypermobility syndrome, and systemic lupus erythematosus [
27‐
31]. The CHAQ has already been translated into several languages and used in many countries [
32]. In Korea, the CHAQ was adapted by Park [
33]; since then, its usefulness for the health-related assessment of children with CP has been reported.
The rapid development of comprehensive question items to measure functional health status and quality of life in children has raised the question of whether a general measure or a condition-specific measure should be chosen when selecting a tool [
34‐
36]. The tool should provide appropriate information, and its psychometric properties should be measurable so that its validity can be checked. Further, a tool should be selected only when it is practical, reliable, and appropriate and is able to measure change or is sensitive to change [
37].
The classical method of scale verification is to confirm construct validity using factor analysis. However, determining the construct validity of a scale by factor analysis is limited because it does not verify the scale at the level of individual items [
38]. A scale verified by factor analysis is occasionally adapted for other cultural regions in the course of its utilization and is applied to groups whose characteristics differ from those of the respondents who participated in scale development. As scales have come to be used more diversely, it has been argued in the scale development literature that factor analysis alone cannot accurately establish validity [
39]. Therefore, in order to accurately estimate the fitness and difficulty of items derived from factor analysis, attempts to verify them using various statistical methods are required.
Among these attempts, the Rasch model is one of the item response theory models increasingly used as an appropriate research method for the assessment of the appropriateness of item fitness and difficulty [
40]. When the ability of a subject is measured using traditional methods, the same subject will attain a higher score on an easier test and a lower score on a more difficult test. In other words, in traditional methods, the characteristics of children with CP could affect ability measurement, possibly influencing the validity analysis of measurement tools. As the psychometric properties of an instrument can vary among different population groups and can be particularly affected by the cultural context, a systematic assessment of psychometric properties is imperative before an instrument can become extensively used within a specific patient population [
41].
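The contrast with traditional scoring can be made concrete with the dichotomous form of the Rasch model. The symbols below are notation assumed here for illustration and do not come from the source: \(\theta_n\) is the ability of subject \(n\) and \(b_i\) is the difficulty of item \(i\), both expressed in logits. The CHAQ uses polytomous items, so this is a simplified sketch rather than the model fitted in this study.

```latex
\[
P(X_{ni}=1 \mid \theta_n, b_i)
  = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
\ln\frac{P(X_{ni}=1)}{P(X_{ni}=0)} = \theta_n - b_i .
\]
\[
\ln\frac{P(X_{ni}=1)}{P(X_{ni}=0)}
  - \ln\frac{P(X_{nj}=1)}{P(X_{nj}=0)}
  = (\theta_n - b_i) - (\theta_n - b_j)
  = b_j - b_i .
\]
```

The second line shows that the difference in log-odds between two items \(i\) and \(j\) does not involve \(\theta_n\). In this sense, the comparison of item difficulties under the model does not depend on the ability of the particular subject who answered them.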
Therefore, in order to evaluate the psychometric properties of the CHAQ for assessing HRQL in children with CP in Korea, it is necessary to use data obtained from Korean children with CP. An item response theory-based analysis could be utilized to scrutinize question items using the item characteristic curve unique for each item, examine the difficulty and discrimination power of each item, and estimate the real ability of the subject based on analysis results. In addition, the use of the Rasch model has an advantage in that item characteristic curve estimation is not affected by the characteristics of subject groups [
42]. Although the suitability of the Korean version of the CHAQ as a tool based on the classical test theory has already been confirmed in a validity testing study [
33], attempting to verify the nature of question items by applying the Rasch model based on the item response theory remains essential to accurately evaluate item fitness and difficulty.
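As a minimal sketch of what an item characteristic curve looks like, the following Python snippet evaluates the dichotomous Rasch model over a grid of ability values. The function names, the ability grid, and the two item difficulties are hypothetical and chosen only for illustration; the CHAQ items are polytomous, so an actual analysis would use the rating scale model in dedicated software.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of success on a dichotomous Rasch item.

    Both theta (subject ability) and difficulty are in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_characteristic_curve(difficulty: float, thetas: list) -> list:
    """Evaluate the item characteristic curve at each ability value."""
    return [rasch_probability(theta, difficulty) for theta in thetas]


# Hypothetical ability grid and item difficulties (not from the study).
thetas = [-3.0, -1.5, 0.0, 1.5, 3.0]
easy_item = item_characteristic_curve(-1.0, thetas)
hard_item = item_characteristic_curve(1.0, thetas)

for theta, p_easy, p_hard in zip(thetas, easy_item, hard_item):
    print(f"theta={theta:+.1f}  easy={p_easy:.3f}  hard={p_hard:.3f}")
```

Each curve rises monotonically with ability and crosses a probability of 0.5 where the ability equals the item difficulty, which is how item difficulty is located on the logit scale.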
Therefore, this study aimed to identify the psychometric properties of the Korean version of the CHAQ in children with CP by applying the Rasch model. In view of the objective of this study, the following specific research questions were raised: First, is the item fitness of the Korean version of the CHAQ appropriate for children with CP? Second, is the item difficulty of the Korean version of the CHAQ appropriate for children with CP? Third, are the response categories of the Korean version of the CHAQ appropriate for children with CP? Fourth, is the Korean version of the CHAQ reliable when used in children with CP?
Discussion
This study aimed to identify the fitness and difficulty of the CHAQ items adapted to the Korean population and verify the appropriateness and reliability of the rating scale in children with CP. To address such objectives, Rasch analysis of 65 subjects was performed using the CHAQ adapted to the Korean population.
When the fitness of the CHAQ items was determined, 2 of 30 items were shown to be misfit items. Item fitness is used to confirm the unidimensional nature of test items and is estimated from the mean square (MNSQ) value under the rating scale model, which reveals how adequately each item is configured to reflect a single dimension. A high MNSQ value indicates that the item is not homogeneous with other items within the scale. In contrast, a low value means that the item is redundant with other items [
45]. The MNSQ presents two values: the infit index and the outfit index. The infit and outfit indices are standardized, with the standardized value presented as Z value. In the Rasch model, an MNSQ value of 1 represents an ideal value. In this study, each item with an infit index < 0.5 or > 1.7 was regarded as a misfit item in order to determine item fitness [
42]. Item 4, which pertained to nail-cutting, had an infit index > 1.7, whereas item 23, which involved opening a bottle cap that was already opened, had an infit index < 0.5.
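The infit and outfit mean squares described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses the dichotomous Rasch model rather than the rating scale model applied in this study, the function names are hypothetical, and the abilities and responses are invented data. Only the misfit bounds of 0.5 and 1.7 are taken from the text.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Expected score on a dichotomous Rasch item (inputs in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_fit(responses: list, thetas: list, difficulty: float) -> tuple:
    """Return (infit MNSQ, outfit MNSQ) for one dichotomous item."""
    squared_residual_sum = 0.0
    variance_sum = 0.0
    standardized_sum = 0.0
    for observed, theta in zip(responses, thetas):
        expected = rasch_probability(theta, difficulty)
        variance = expected * (1.0 - expected)
        squared_residual = (observed - expected) ** 2
        squared_residual_sum += squared_residual
        variance_sum += variance
        standardized_sum += squared_residual / variance
    # Infit: information-weighted mean square.
    infit = squared_residual_sum / variance_sum
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = standardized_sum / len(responses)
    return infit, outfit


def is_misfit(infit: float, lower: float = 0.5, upper: float = 1.7) -> bool:
    """Flag an item using the infit bounds stated in the text."""
    return infit < lower or infit > upper


# Hypothetical data: abilities in logits and one item of difficulty 0.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
unexpected = [1, 1, 0, 0, 0]  # low-ability subjects succeed, high-ability fail
infit, outfit = item_fit(unexpected, thetas, difficulty=0.0)
print(f"infit={infit:.2f} outfit={outfit:.2f} misfit={is_misfit(infit)}")
```

A response pattern that contradicts the model, as in the invented example, pushes both mean squares well above 1, whereas a pattern that is more predictable than the model expects pulls them below 1.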
Difficulties associated with the use of the CHAQ adapted to the Korean population were analyzed by comparing individual attribute scores and item difficulty. When the distribution ranges of the individual attribute scores and item difficulty were consistent (i.e., similar distribution ranges for item difficulty such that item difficulty measurement could estimate all ranges of individual attribute scores), the distribution was considered sufficient [
45]. The analysis results indicated that the difficulty for item 1 (tying shoe laces and buttoning) was the lowest among the 28 items remaining after exclusion of the two misfit items, whereas the difficulty for item 20 (turning the head to see behind the shoulder) was the highest. With respect to item difficulty for the CHAQ adapted to the Korean population, 23.5% and 13.7% of children showed a capacity lower and higher than the range covered by item difficulty, respectively. A high percentage of floor effect for the measurement tool indicates that the item difficulty is higher than the subjects’ proficiency estimates. Conversely, a high percentage of ceiling effect for the measurement tool indicates that the item difficulty is lower than the proficiency estimate in subjects; hence, it is impossible to assess subjects exhibiting higher proficiency estimates as the item difficulty is too low. In the study of Park [
33], the percentage of floor effect was reported to be 4.3–38.6%, with the rate for standing, walking, hand stretching, and catching exceeding 20%. The percentage of ceiling effect was reported to be 1.4–25.7%, with the rate for walking and hygiene exceeding 20%. In the study of Morales et al. [
32], the percentage of floor and ceiling effects was reported to be 2.1–26.0% and 30.2–68.8%, respectively. The high percentage of ceiling effect in the study of Morales et al. [
32] should be interpreted with the high proportion of GMFCS level 1 subjects (37.5%) in mind, and the results of Park’s study [
33] may likewise have been affected by the proportion of level 1 children (22.2%). Unlike previous studies that identified item difficulty based on the classical test theory, this study converted the subjects’ proficiency estimates and item difficulty into logit interval scales before analysis, making it possible to overcome the limitations imposed by the characteristics of the subjects. Nevertheless, the results of this study showed that there was a need to add items characterized by both high and low difficulty to the CHAQ adapted to the Korean population.
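The comparison between the distribution of subjects and the range of item difficulty can be sketched as follows. This is a hypothetical illustration, not the procedure used in this study: the function name is invented, the person measures and item difficulties are made-up logit values, and the snippet assumes that subjects outside the item range are counted by simple comparison with the lowest and highest item difficulty.

```python
def targeting_summary(person_measures: list, item_difficulties: list) -> dict:
    """Percentage of subjects whose measure lies outside the item range.

    All values are assumed to be on the same logit scale.
    """
    lowest = min(item_difficulties)
    highest = max(item_difficulties)
    total = len(person_measures)
    below = sum(1 for measure in person_measures if measure < lowest)
    above = sum(1 for measure in person_measures if measure > highest)
    return {
        "below_range_pct": 100.0 * below / total,
        "above_range_pct": 100.0 * above / total,
    }


# Hypothetical logit values (not from the study).
persons = [-3.0, -2.5, -1.0, 0.0, 0.5, 1.0, 2.5, 3.5]
items = [-2.0, -1.0, 0.0, 1.0, 2.0]
summary = targeting_summary(persons, items)
print(summary)
```

Large percentages at either end suggest that items with lower or higher difficulty would be needed to measure those subjects precisely, which is the reasoning behind the recommendation to add items at both extremes.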
A rating scale used in a test should have clearly distinguishable response levels that correspond to the latent variable being measured. Furthermore, fit indices for each scale score provide information on whether the rating scale is properly functioning. A fit index of ≥1.5 for an individual scale score, relative to the expected value of 1.0, implies that the scale is not properly functioning and provides information on whether the scale scores could be merged later [
46,
47]. In the analysis, the 4-point rating scale of the CHAQ adapted to the Korean population was determined to be appropriate. The scale threshold estimate showed a tendency similar to that of the average proficiency estimate in which the scale threshold increased with an increase in scale scores. The scale threshold estimate differs from the average proficiency estimate in subjects in that the former is an estimate calculated by observation frequency based on the sample, whereas the latter is an estimate calculated using the Rasch model [
42]. The analysis results showed that the scale threshold estimates of the Korean version of the CHAQ increased in step with the scale score, indicating that the response range was appropriate.
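The idea of ordered scale thresholds can be illustrated with a minimal sketch of the rating scale model for a 4-point scale, which has three thresholds. The function names and all numeric values are hypothetical and chosen for illustration; they are not estimates from this study.

```python
import math


def category_probabilities(theta: float, difficulty: float, thresholds: list) -> list:
    """Category probabilities under the rating scale model.

    thresholds holds one value per step between adjacent categories,
    so three thresholds describe a 4-point scale (categories 0 to 3).
    """
    numerators = [1.0]  # category 0 corresponds to an empty sum
    cumulative = 0.0
    for tau in thresholds:
        cumulative += theta - difficulty - tau
        numerators.append(math.exp(cumulative))
    total = sum(numerators)
    return [value / total for value in numerators]


def thresholds_are_ordered(thresholds: list) -> bool:
    """True when each threshold is higher than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for a 4-point scale (not from the study).
thresholds = [-1.5, 0.0, 1.5]
probabilities = category_probabilities(0.0, 0.0, thresholds)
print([round(p, 3) for p in probabilities])
print(thresholds_are_ordered(thresholds))
```

When the thresholds increase with the scale score, each category becomes the most probable response over some interval of ability, which is the pattern expected of a properly functioning rating scale.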
The Rasch analysis estimates two types of separation reliability: subject separation reliability and item separation reliability. The Rasch model is able to estimate the concurrent validity through the subject separation reliability and estimate the construct validity through the item separation reliability [
48]. The subject separation reliability is the same concept as the conventional reliability, Cronbach’s α [
46]. When the CHAQ separation reliability was estimated after excluding the misfit subjects and misfit items, the subject separation reliability was 0.97, and the separation index was 5.92, whereas the item separation reliability was 0.95, and the separation index was 4.51. These results showed that the CHAQ adapted to the Korean population had a high level of reliability.
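The reported separation indices and separation reliabilities are linked by the standard Rasch relation, reliability = G² / (1 + G²), where G is the separation index. The short sketch below applies this relation to the values reported above as a consistency check; the function names are hypothetical.

```python
import math


def separation_to_reliability(separation_index: float) -> float:
    """Separation reliability implied by a separation index G."""
    squared = separation_index ** 2
    return squared / (1.0 + squared)


def reliability_to_separation(reliability: float) -> float:
    """Separation index implied by a separation reliability below 1."""
    return math.sqrt(reliability / (1.0 - reliability))


# Separation indices reported in the text.
subject_reliability = separation_to_reliability(5.92)
item_reliability = separation_to_reliability(4.51)
print(round(subject_reliability, 2))  # 0.97
print(round(item_reliability, 2))     # 0.95
```

Both rounded values match the reliabilities reported in the text, so the separation indices and reliabilities are mutually consistent.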
This study has some limitations. First, the sample of 65 children with CP was relatively small. Further study with a larger sample size is required to increase the power and generalizability of the results. Second, this study only included children with CP aged 75–190 months, which could limit the generalizability of the results to the entire pediatric population with CP. Future studies on infants (0–36 months) and/or young children (36–72 months) with CP should be performed. Lastly, this study did not examine differential item functioning. There may be items that function differently depending on the type of CP; hence, further analysis based on the types of CP is required.