Background
Objectives
Methods/design
Protocol and registration
Literature search
Criterion | Inclusion criteria | Exclusion criteria |
---|---|---|
Population | Eczema (synonyms: atopic eczema, atopic dermatitis, neurodermatitis); populations younger than 16 years of age | Populations with skin diseases other than eczema, populations of adults with eczema, carers of infants/children with eczema |
Study design | Development study, validation study | Linguistic validation studies |
Outcome | Quality of life, health-related quality of life | Signs, disease severity measure, disease control measure, biomarker, physiology of the skin |
Type of measurement instrument | Self- or proxy-reported measurement instrument | All others |
Publication type | Articles with available full text | Abstracts |
Eligible studies
Study selection
Data extraction
Content comparison
Assessment of the methodological quality of included studies
Domain | Measurement property | Aspect of a measurement property | Definition |
---|---|---|---|
Reliability | | | The degree to which the measurement is free from measurement error. |
Reliability (extended definition) | | | The extent to which scores for patients who have not changed are the same for repeated measurement under several conditions: for example, using different sets of items from the same HR-PRO (internal consistency), over time (test-retest), by different persons on the same occasion (inter-rater), or by the same persons (i.e., raters or responders) on different occasions (intra-rater). |
| | Internal consistency | | The degree of interrelatedness among the items. |
| | Reliability | | The proportion of total variance in the measurements that is due to “true”^a differences among patients. |
| | Measurement error | | The systematic and random error of a patient’s score that is not attributed to true change of the construct to be measured. |
Validity | | | The degree to which an HR-PRO instrument measures the construct(s) it purports to measure. |
| | Content validity | | The degree to which the content of an HR-PRO instrument is an adequate reflection of the construct to be measured. |
| | | Face validity | The degree to which (the items of) an HR-PRO instrument indeed look as though they are an adequate reflection of the construct to be measured. |
| | Construct validity | | The degree to which the scores of an HR-PRO instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured. |
| | | Structural validity | The degree to which the scores of an HR-PRO instrument are an adequate reflection of the dimensionality of the construct to be measured. |
| | | Hypothesis testing | Idem construct validity. |
| | | Cross-cultural validity | The degree to which the performance of the items on a translated or culturally adapted HR-PRO instrument is an adequate reflection of the performance of the items of the original version of the HR-PRO instrument. |
Responsiveness | | | The ability of an HR-PRO instrument to detect change over time in the construct to be measured. |
| | Responsiveness | | Idem responsiveness. |
Interpretability^b | | | The degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to an instrument’s quantitative scores or changes in scores. |
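Internal consistency, defined above as the interrelatedness among the items, is usually summarised under classical test theory with Cronbach’s alpha, the statistic the adequacy criteria in this review rely on. A minimal sketch of the standard formula, α = k/(k − 1) · (1 − Σ item variances / total-score variance), is shown below; the function name and the four-respondent data set are hypothetical and only illustrate the arithmetic.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[r][i] is respondent r's score on item i; every row needs all items.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed (total) score across respondents.
    total_variance = variance([sum(row) for row in scores])
    # alpha = k / (k - 1) * (1 - sum of item variances / total-score variance)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of four children to a three-item scale.
responses = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4]]
print(round(cronbach_alpha(responses), 3))  # 0.971
```

Using the same (sample) variance estimator for items and total keeps the ratio unaffected by the n versus n − 1 choice.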
Assessment of measurement properties and further characteristics of QoL instruments
Assessment of the adequacy of the measurement instruments
Property | Rating | Adequacy criteria |
---|---|---|
Reliability | | |
Internal consistency (CTT methods applied) | + | Cronbach’s alpha(s) ≥0.70 |
| | ? | Cronbach’s alpha not determined |
| | − | Cronbach’s alpha(s) <0.70 |
Internal consistency (IRT methods applied) | + | Person Separation Index ≥0.70 |
| | ? | Person Separation Index not determined |
| | − | Person Separation Index <0.70 |
Measurement error | + | MIC > SDC OR MIC outside the LoA |
| | ? | MIC not defined |
| | − | MIC ≤ SDC OR MIC equals or inside LoA |
Reliability | + | ICC/weighted Kappa ≥0.70, OR Pearson’s r ≥ 0.80 |
| | ? | Neither ICC/weighted Kappa, nor Pearson’s r determined |
| | − | ICC/weighted Kappa <0.70 OR Pearson’s r < 0.80 |
Validity | | |
Content validity | + | All items are considered to be relevant for the construct to be measured, for the target population, and for the purpose of the measurement AND the questionnaire is considered to be comprehensive |
| | ? | Not enough information available |
| | − | Not all items are considered to be relevant for the construct to be measured, for the target population, and for the purpose of the measurement OR the questionnaire is considered not to be comprehensive |
Construct validity | | |
Structural validity (CTT methods applied) | + | Factors should explain at least 50 % of the variance |
| | ? | Explained variance not mentioned |
| | − | Factors explain <50 % of the variance |
Structural validity (IRT methods applied) | + | Residual correlations among the items after controlling for the dominant factor <0.20 OR Q3’s <0.37, item scalability >0.30, IRT model fit: G² >0.01, no DIF for important subject characteristics (such as age, gender, education): McFadden’s R² <0.02, OR no non-uniform DIF |
| | ? | Important statistics not reported |
| | − | Residual correlations among the items after controlling for the dominant factor ≥0.20 OR Q3’s ≥0.37, item scalability ≤0.30, IRT model fit: G² ≤0.01, important DIF for important subject characteristics (such as age, gender, education): McFadden’s R² ≥0.02, OR non-uniform DIF |
Hypothesis testing (convergent/divergent validity) | + | Correlations with instruments measuring the same construct ≥0.50 OR at least 75 % of the results are in accordance with the hypotheses AND correlation with related constructs is higher than with unrelated constructs |
| | ? | Solely correlations determined with unrelated constructs |
| | − | Correlations with instruments measuring the same construct <0.50 OR <75 % of the results are in accordance with the hypotheses OR correlation with related constructs is lower than with unrelated constructs |
Hypothesis testing (discriminative validity) | + | Differences in scores on the measurement instrument for all evaluated patient subgroups are statistically significant OR ≥75 % of results in accordance with hypotheses |
| | ? | Some differences statistically significant, others not |
| | − | Differences in scores on the measurement instrument for all evaluated patient subgroups are not statistically significant OR <75 % of results in accordance with hypotheses |
Cross-cultural validity | + | No differences in factor structure OR no important DIF between language versions |
| | ? | Multiple group factor analysis not applied AND DIF not assessed |
| | − | Differences in factor structure OR important DIF between language versions |
Responsiveness | | |
Responsiveness | + | Correlation with changes on instruments measuring the same construct ≥0.50 OR at least 75 % of the results are in accordance with the hypotheses OR AUC ≥0.70 AND correlations with changes in related constructs are higher than with unrelated constructs |
| | ? | Solely correlations determined with unrelated constructs |
| | − | Correlations with changes on instruments measuring the same construct <0.50 OR <75 % of the results are in accordance with the hypotheses OR AUC <0.70 OR correlations with changes in related constructs are lower than with unrelated constructs |
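The numeric thresholds for internal consistency (CTT), reliability, and measurement error in the adequacy table translate into simple rating rules. The sketch below is a hypothetical illustration, not the review’s own tooling: the function names are invented, the ASCII "-" stands for the table’s “−” rating, and only the MIC-versus-SDC variant of the measurement-error criterion is covered (not the LoA variant). The SDC is computed with the common individual-level formula 1.96 · √2 · SEM; the review itself does not state how SDC values were obtained. When both an ICC/kappa and a Pearson’s r are reported, the sketch assumes the ICC/kappa decides.

```python
import math
from typing import Optional


def rate_internal_consistency(alpha: Optional[float]) -> str:
    """CTT internal consistency: '+' if alpha >= 0.70, '-' if lower, '?' if not determined."""
    if alpha is None:
        return "?"
    return "+" if alpha >= 0.70 else "-"


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """Individual-level SDC from the standard error of measurement: z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


def rate_measurement_error(mic: Optional[float], sdc: float) -> str:
    """'+' if MIC > SDC, '-' if MIC <= SDC, '?' if no MIC was defined."""
    if mic is None:
        return "?"
    return "+" if mic > sdc else "-"


def rate_reliability(icc_or_kappa: Optional[float] = None,
                     pearson_r: Optional[float] = None) -> str:
    """ICC/weighted kappa >= 0.70, or Pearson's r >= 0.80 if no ICC/kappa, gives '+'.

    Assumption: if both statistics are reported, the ICC/kappa decides.
    """
    if icc_or_kappa is not None:
        return "+" if icc_or_kappa >= 0.70 else "-"
    if pearson_r is not None:
        return "+" if pearson_r >= 0.80 else "-"
    return "?"


# Hypothetical example: SEM of 1.5 points and an anchor-based MIC of 5 points.
sdc = smallest_detectable_change(1.5)  # about 4.16
print(rate_measurement_error(5.0, sdc))  # +
```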
Best evidence synthesis
Level | Rating | Criteria |
---|---|---|
Strong | +++, ? (strong) or −−− | Consistent findings in multiple studies of good methodological quality OR in one study of excellent methodological quality |
Moderate | ++, ? (moderate) or −− | Consistent findings in multiple studies of fair methodological quality OR in one study of good methodological quality |
Limited | +, ? (limited) or − | One study of fair methodological quality |
Conflicting | +/− | Conflicting findings |
Unknown | ? | Only studies of poor methodological quality |
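The best evidence synthesis table can be read as a decision rule over the per-study ratings and their COSMIN methodological quality. The sketch below is one possible reading under stated assumptions, not the review’s procedure: the function name is hypothetical; poor-quality studies are set aside whenever better ones exist; any disagreement among the remaining ratings counts as conflicting; and when studies of different quality agree, the highest level their counts reach applies. The ASCII "-" again stands for “−”.

```python
from collections import Counter
from typing import Sequence, Tuple

# Symbol for each (level of evidence, direction) pair, following the table.
LEVEL_SYMBOLS = {
    ("strong", "+"): "+++", ("moderate", "+"): "++", ("limited", "+"): "+",
    ("strong", "-"): "---", ("moderate", "-"): "--", ("limited", "-"): "-",
}


def evidence_level(studies: Sequence[Tuple[str, str]]) -> str:
    """studies: (quality, rating) pairs.

    quality is one of "excellent", "good", "fair", "poor";
    rating is one of "+", "?", "-".
    """
    usable = [(q, r) for q, r in studies if q != "poor"]
    if not usable:
        return "?"  # unknown: only studies of poor methodological quality
    ratings = {r for _, r in usable}
    if len(ratings) > 1:
        return "+/-"  # conflicting findings (assumption: any disagreement)
    rating = ratings.pop()
    counts = Counter(q for q, _ in usable)
    if counts["excellent"] >= 1 or counts["good"] >= 2:
        level = "strong"
    elif counts["good"] == 1 or counts["fair"] >= 2:
        level = "moderate"
    else:
        level = "limited"  # a single fair-quality study
    if rating == "?":
        return f"? ({level})"
    return LEVEL_SYMBOLS[(level, rating)]


print(evidence_level([("good", "+"), ("good", "+"), ("poor", "-")]))  # +++
```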
Generating recommendations for the use of QoL measurement instruments for eczema
Adequacy item (name) | Inclusion in OMERACT filter | Required rating for recommendation |
---|---|---|
Content validity | Truth | + |
Structural validity | Truth | + |
Hypothesis testing | Truth | + |
Cross-cultural validity | Truth | + |
Internal consistency | Discrimination | + |
Reliability | Discrimination | + |
Measurement error | Discrimination | + |
Responsiveness | Discrimination | + |
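The recommendation table pairs each adequacy item with its OMERACT filter component and requires a “+” rating. A minimal sketch, assuming an instrument is recommended only when all eight items reach that rating and that an unrated item counts as unmet; the names `OMERACT_FILTER`, `unmet_items`, and `is_recommended` are hypothetical, and the example instrument is invented.

```python
from typing import Dict, List

# Adequacy items and their OMERACT filter component, as listed in the table.
OMERACT_FILTER = {
    "Content validity": "Truth",
    "Structural validity": "Truth",
    "Hypothesis testing": "Truth",
    "Cross-cultural validity": "Truth",
    "Internal consistency": "Discrimination",
    "Reliability": "Discrimination",
    "Measurement error": "Discrimination",
    "Responsiveness": "Discrimination",
}
REQUIRED_RATING = "+"


def unmet_items(ratings: Dict[str, str]) -> List[str]:
    """Adequacy items whose rating is not '+'; items without a rating count as unmet."""
    return [item for item in OMERACT_FILTER if ratings.get(item) != REQUIRED_RATING]


def is_recommended(ratings: Dict[str, str]) -> bool:
    """Assumption: recommend only if every adequacy item reaches the required rating."""
    return not unmet_items(ratings)


# Hypothetical instrument: all items '+' except an indeterminate responsiveness rating.
example = {item: "+" for item in OMERACT_FILTER}
example["Responsiveness"] = "?"
print(is_recommended(example), unmet_items(example))  # False ['Responsiveness']
```

Listing the unmet items, rather than returning only a yes/no answer, shows which part of the OMERACT filter (Truth or Discrimination) an instrument falls short on.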
Differences between this review and previously suggested methodology
- For internal consistency, the indeterminate rating (“?”) was changed from “Dimensionality not known OR Cronbach’s alpha not determined” to “Cronbach’s alpha not determined” to avoid overlap between the adequacy criteria and the COSMIN criteria for methodological quality. Adequacy criteria for studies using IRT methods were added.
- The IRT criteria for structural validity were extended with criteria on differential item functioning (DIF) [25]. A study showing no non-uniform DIF can now also receive a positive rating, whereas non-uniform DIF is rated negatively under the new criteria.
- Hypothesis testing was split into its two aspects, convergent/divergent validity and discriminative validity, with separate criteria for each aspect; these were then combined into an overall rating for hypothesis testing.
- The criteria developed by Terwee et al. for hypothesis testing were applied only to convergent and divergent validity. For discriminative validity, the other aspect of hypothesis testing, self-developed criteria were added. As the COSMIN initiative does not consider interpretability to be a formal measurement property, the adequacy criteria for interpretability were omitted [18].
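The self-developed criteria for discriminative validity (the discriminative-validity rows of the adequacy table) can be expressed as a small decision rule. This is a hypothetical sketch: the function name and inputs are invented, and because the table’s positive and negative criteria can overlap, the order of checks is an assumption (all comparisons significant first, then the 75 % rule when a share of results is reported, then the all-non-significant check; anything else is indeterminate). It does not attempt the combination into an overall hypothesis-testing rating, which is not spelled out here.

```python
from typing import Optional, Sequence


def rate_discriminative_validity(
    significant: Sequence[bool],
    share_in_accordance: Optional[float] = None,
) -> str:
    """Return '+', '?' or '-' ('-' stands for the table's minus rating).

    significant: one flag per evaluated subgroup comparison, True if the
        difference in instrument scores was statistically significant.
    share_in_accordance: optional fraction (0 to 1) of results in accordance
        with the a priori hypotheses.
    """
    if significant and all(significant):
        return "+"  # all evaluated subgroup differences significant
    if share_in_accordance is not None:
        # 75 % rule: >= 0.75 positive, otherwise negative
        return "+" if share_in_accordance >= 0.75 else "-"
    if significant and not any(significant):
        return "-"  # no evaluated subgroup difference significant
    return "?"  # some differences significant, others not (or nothing reported)


print(rate_discriminative_validity([True, False]))       # ?
print(rate_discriminative_validity([True, False], 0.8))  # +
```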