Validity-driven instrument development
Our approach to construct definition and instrument development is based on the tenet that construct validity needs to be the primary concern of all instrument development activities and of all proposed applications of instruments. This is consistent with the descriptions provided by Pedhazur and colleagues [28], and the Standards for Educational and Psychological Testing developed jointly by the American Educational Research Association, the American Psychological Association and the (American) National Council on Measurement in Education [29, 30]. The Standards describe validation as an ongoing process that commences with the conceptualization and continues each time someone proposes an additional interpretation or application of the tool [29].
While it is common practice in health research to refer to a tool as either validated or unvalidated, it is not tools but only their interpretations and applications that are validated. To maximize the likelihood of producing valid data in relation to a range of possible interpretations and applications of a tool, there are development processes that seek to protect the instrument against two categories of error: measuring less than the proposed construct (construct underrepresentation) or measuring more (construct-irrelevant variance) [29]. Protection against the first type of error requires rigor in the processes of conceptualization and definition and the identification of a range of indicators. Protection against the second type of error requires rigor in psychometric analysis. We believe that three disciplines help achieve this necessary rigor: the use of grounded approaches for construct definition; the development of a priori structural hypotheses (that define relevant versus irrelevant variance); and the development of a priori relational hypotheses as a basis for future construct validation.
The Standards contain 24 standards related to validity of a measure, but the first four of these specifically relate to the linkage between validity and possible interpretations (Table 3). It is clear that the authors of the Standards place a significant onus of responsibility on the developers of instruments to clarify the interpretations that are supported by available evidence at any point in time.
Table 3
Standards relating validity to interpretations
Standard 1.1 | A rationale should be presented for each recommended interpretation and use of test scores, together with a comprehensive summary of the evidence and theory bearing on the intended use or interpretation |
Standard 1.2 | The test developer should set forth clearly how test scores are intended to be interpreted and used. The population(s) for which a test is appropriate should be clearly delimited and the construct that the test is intended to assess should be clearly described |
Standard 1.3 | If validity for some common or likely interpretation has not been investigated, or if the interpretation is not consistent with the available evidence, that fact should be made clear and potential users should be cautioned against making unsupported interpretations |
Standard 1.4 | If a test is to be used in a way that has not been validated, it is incumbent on the user to justify the new use, collecting new evidence if necessary |
An important initial step in scale development, and the final step in development of the hypothesized model, involves writing (hypothesized) descriptors about characteristics of people with a high score and people with a low score on scales related to each hypothesized domain. This exercise helps to clarify whether the domain can be represented as a scale or whether it is simply a checklist of possible characteristics, the desired range of item difficulty, and possible relationships between scale scores and other variables (other scales, demographic and clinical variables, outcomes of interventions). This final point is an important and often neglected step in preparing for construct validation by developing a broad range of a priori hypotheses about the behavior of the scales in relation to other variables (the so-called nomothetic web) [28, 31].
In considering the ongoing process of validation once the instrument has been developed, it is necessary to specify the interpretations and applications that we are seeking to achieve validly and reliably. These interpretations and applications are presented in Table 4, together with some of the evidence, or the processes to obtain the evidence, that is required to support validity of each type of interpretation/application. Table 4 shows that the expansion of valid interpretations and applications occurs in a number of stages that build upon each other. The first two stages are integral to the process of psychometric development through the application of draft tools to a construction and validation sample (see below). Evidence in relation to the next two proposed applications accrues through use of the tool, while the step from interpretation of data at the group level to interpretation at the individual level usually requires additional technical analysis as well as a body of evidence about the meaning and behavior of each scale acquired through widespread use with groups. There are also steps that can be taken during the psychometric development phase to increase the likelihood that the tool will be usable with individuals. These steps relate to ensuring that the scales have certain properties in relation to the range of difficulty that the items cover and the extent to which they can give scores spread evenly across this range. While the meaning of difficulty is clear in academic tests, in this situation a difficult item would be one where few people give the most positive possible response. There are also different reliability requirements related to each level of use, with individual applications having the most stringent requirements.
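The notion of item difficulty described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each item is scored on a 0 to 4 response scale where 4 is the most positive response; the item names and response data are hypothetical and are not drawn from the study. Under this reading, an item is more difficult when a smaller proportion of respondents give the most positive response.

```python
def item_difficulty(responses, max_category):
    """Return the proportion of respondents giving the most positive response.

    A LOWER proportion indicates a MORE difficult item in the sense used
    in the text (few people give the most positive possible response).
    """
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(1 for r in responses if r == max_category) / len(responses)


# Hypothetical responses to two items on an assumed 0-4 scale (4 = most positive).
items = {
    "item_a": [4, 4, 3, 4, 2, 4, 4, 3],
    "item_b": [1, 0, 2, 4, 1, 0, 3, 1],
}

# Proportion endorsing the top category for each item.
difficulty = {name: item_difficulty(r, 4) for name, r in items.items()}

# The hardest item is the one with the smallest top-category proportion.
hardest = min(difficulty, key=difficulty.get)
```

A scale intended for use with individuals would ideally contain items whose proportions are spread across the range rather than clustered at one end; this simple proportion is only one of several ways such spread could be examined.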
Table 4
Proposed interpretations/applications and evidence required to support the measure's validity for low back pain burden
Interpretations/applications applied to groups - supported through initial development processes |
Describe the burden of low back pain on a set of scales that reflects the full range of the experience of people with low back pain | Thorough, grounded identification of the range of issues that contribute to low back pain burden |
| Iterative process of organizing these into domains and potential scales |
| Comparison with interview data at a number of stages of development |
Quantify variations in the effects of low back pain across a broad range of sufferers on a range of scales | Cluster analysis to identify score profiles and qualitative confirmation of these |
| Tests of structural invariance across groups |
Interpretations/applications applied to groups - supported through subsequent applications of the tool^a |
Describe the relative importance of different domains of low back pain burden in comparing one population with another (for example, needs identification) | Accumulated evidence about what is a high average score and what is a low average score for each scale^b |
| Establishment of whole of population norms and subgroup norms |
| Tests of structural invariance |
Validly assess changes in low back pain burden in a group over time or as a result of interventions | Application for a range of evaluation purposes including comparison with other subjective and objective indicators of change |
| Development of estimates of meaningful change |
Interpretations/applications applied to individuals |
Assess the relative needs of an individual with low back pain across a range of domains | Attention to item scaling properties during psychometric development |
| Comparison with other subjective and objective indicators of status |
Measure changes in individuals over time or in response to interventions | Comparison with other subjective and objective indicators of change |
| Development of estimates of meaningful change |
Implications for the measurement of the burden of low back pain
One of the primary reasons for conducting this research was the observation that existing instruments inadequately capture the range of impacts of low back pain that are commonly reported by people with low back pain and the clinicians who work with them. This project has produced a conceptual framework that includes many concepts not included in the tools most commonly used to assess needs and/or outcomes for people with low back pain.
At one end of the spectrum, because low back pain has until recently been thought mainly a work-related problem, outcome measures have often been limited to occupational aspects of burden: most of all, measures of absence from work, and the consequent financial costs. Such measures only capture part of the burden of low back pain.
At the other end of the spectrum, Deyo and colleagues proposed a core set of six indicators for routine clinical use that included pain symptoms, function, well-being, disability, social role and satisfaction with care [32]. Bombardier proposed another core set of measures for evaluating the effectiveness of treatment in clinical trials and routine care [33]. Recognizing the importance of the patient's perspective, she proposed the following five domains: back-specific function, generic health status, pain, work disability, and patient satisfaction [33]. Similar to these proposals, the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials group recommended that a core set of six outcome domains be considered in chronic pain clinical trials: pain, physical functioning, emotional functioning, participant ratings of global improvement and satisfaction with treatment, symptoms and adverse events, and participant disposition [34].
More recently, Kopec and colleagues proposed a web-based computerized adaptive test (CAT-5D-QOL) to measure five domains of health-related quality of life (Daily Activities, Walking, Handling Objects, Pain or Discomfort, and Feelings) for patients with back pain based upon item banks developed for these domains relevant to arthritis [35]. Many measures have been developed to specifically quantify the limitations that low back pain places upon functional status. For example, in a 2004 systematic review Grotle and colleagues identified a total of 36 back-specific questionnaires [36]. The authors classified the content of the questionnaires based upon the World Health Organization's International Classification of Functioning, Disability and Health (ICF); they found that while most of the questionnaires had a focus on activity limitations, there was a wide variation in their underlying constructs and content. Many questionnaires also included constructs of pain and symptoms, sleep disturbances, psychological dysfunction, physical impairment and social functions.
The brief and comprehensive ICF core sets for low back pain, based upon the ICF framework, are further attempts to develop a standardized set of indicators to encompass the key functional problems of patients with low back pain envisaged to be used for a variety of purposes including clinical studies and multidisciplinary assessment in clinical care [37]. These were formed by consensus among a group of international clinical experts comprising physicians, occupational and physical therapists, who integrated evidence from a Delphi exercise to identify the most relevant ICF categories in patients with chronic health conditions including back pain [38], a systematic review to identify the concepts contained in outcome measures in clinical trials of musculoskeletal disorders and chronic widespread pain [39], and a study in a convenience sample of people undergoing rehabilitation for one of several chronic conditions including low back pain who were administered the ICF checklist [40]. The comprehensive and brief ICF core sets include 78 and 35 categories, respectively, which cover not only aspects related to pain but also a wide spectrum of activities, social and environmental factors that affect functioning. In keeping with our conceptual model, these core sets recognize the importance of support and relationships, attitudes of significant others and health professionals as predictors of disability in people with low back pain.
A Norwegian study in a convenience sample of 118 patients with low back pain, however, has identified gaps in the comprehensive ICF core set with respect to capturing problems of importance to patients [41]. This study compared the relationship between health problems rated by health professionals using the comprehensive ICF core set and patients' self-reported health problems identified by the Oswestry Disability Index and the World Health Organization Disability Assessment Schedule II. Relevant domains not covered by the ICF included the subjective domain related to the impact of back pain and the feeling of being a burden to their family, while problems with sexual functions and relationships were poorly reflected in the health professionals' assessments.
Our model for the measurement of the burden of low back pain aims to comprehensively capture all of the various impacts of this condition on the individual. The model includes several domains that have not until now been considered important to measure in patients with low back pain, although they may contribute significantly to the individual's burden; for instance, loss of independence, worry about the future, negative or discriminatory actions by others, and secondary health effects, among others.
The new tool will have a wide range of potential applications for researchers, clinicians, policy-makers and insurance agencies; and for a range of purposes, including needs identification, service planning, evaluation, research and, eventually, for individual clinical assessment and monitoring. In suggesting such a range of applications, we are aware of our responsibility to consider the evidence for validity in relation to each interpretation and application [29, 30].
To strengthen potential generalizability, we have used both a local approach and an international approach, drawing on nominal group approaches and concept mapping, to scope and define low back pain burden. The questionnaire is being developed with input from an international team of experts in the field. To facilitate comparison of the burden of back pain between countries and between studies, steps are being taken to ensure its wide applicability and cross-cultural generalizability.
In assessing health priorities, allocating resources, and evaluating the potential costs and benefits of public health interventions, governments often consider the burden of a disease and its contribution to the overall health of the population. Information obtained from a single comprehensive measure of back pain burden will greatly enhance research efforts to identify major determinants of back pain burden and population groups that are most affected and to ensure efficient allocation of resources. This information may also inform the development and evaluation of novel interventions that could improve patient-relevant outcomes.
While the measurement model (Figure 3) does test for a single underlying latent variable, which we have called the burden of low back pain, we expect the questionnaire will be used as a multidimensional tool providing a profile of scores across the various scales. We will not be attempting to provide a scoring mechanism to gain a single overall score. In our experience it is more useful to be able to use profiles of scores to describe the needs of different patient groups and to distinguish the benefits of different types of interventions than to generate a global indicator that is at such a high level of abstraction that no one will be clear what it means. A profile of scores will also serve to highlight the critical psychosocial aspects of the burden of low back pain that have not been adequately addressed in existing tools. It is hoped that this profile of scores will support a greater clinical emphasis and increased research focus on these aspects of the burden experienced by people with back pain.
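To make the idea of a score profile concrete, the following is a minimal sketch of how scale scores might be summarized for a group without collapsing them into one overall number. It is not the scoring method of the questionnaire, which the text does not specify. The item identifiers, the item-to-scale assignment, the response values, and the use of a simple mean as the scale score are all assumptions made for illustration; the two scale names are borrowed from domains mentioned earlier in this section.

```python
from statistics import mean


def group_profile(respondents, scales):
    """Return the group mean score on each scale, keeping the scales separate.

    respondents: list of dicts mapping item id -> numeric response
    scales: dict mapping scale name -> list of item ids (assumed assignment)
    """
    profile = {}
    for scale, item_ids in scales.items():
        # Assumed scoring rule: a person's scale score is the mean of their items.
        per_person = [mean(person[i] for i in item_ids) for person in respondents]
        profile[scale] = mean(per_person)
    return profile


# Hypothetical item-to-scale mapping and responses.
scales = {
    "loss_of_independence": ["q1", "q2"],
    "worry_about_future": ["q3", "q4"],
}
respondents = [
    {"q1": 1, "q2": 3, "q3": 4, "q4": 2},
    {"q1": 3, "q2": 3, "q3": 0, "q4": 2},
]

profile = group_profile(respondents, scales)
```

Keeping one value per scale, rather than averaging across scales, preserves the information needed to compare patient groups or interventions domain by domain.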