A comparison of two time intervals for test-retest reliability of health status instruments
Introduction
Reliability is a critical measurement property for health-related quality of life (QOL) instruments. Reliability refers to the consistency of scores obtained by the same persons when re-examined with the same test on different occasions or with different sets of equivalent items [1]. There are many techniques available to assess reliability, including internal consistency and test-retest reliability.
An instrument that has adequate test-retest reliability gives the same result if an individual is re-tested while remaining in a clinical steady state. The problem with testing reliability by the test-retest method is that there is a potential for learning, carry-over, or recall effects (i.e., the first testing may influence the second) [2]. The length of time between the two test administrations also affects the test-retest reliability. A very short time interval makes carry-over effects due to memory, practice, or mood more likely, whereas a longer interval increases the chance that a change in status could occur [2]. Measuring reliability by the internal consistency method involves dividing the instrument into two equal parts and comparing the scores on the two halves (i.e., split-half reliability). The Kuder-Richardson formula 20 is the average of all of the split-half reliabilities, and Cronbach's α is an extension of this formula for ordinal data [3].
Test-retest reliability is more relevant in the setting of clinical medicine because the constructs we attempt to measure are heterogeneous. A homogeneous instrument consists of items that relate to a single domain, whereas many instruments used by physicians combine apparently diverse domains, such as symptoms (e.g., pain, numbness) and disability (e.g., limited mobility, difficulty with activities of daily living), into a single score. Thus, one may expect clinically heterogeneous scales to have poor internal consistency. However, there is evidence that variables such as health-related QOL fulfill the criteria for internal consistency despite their apparent heterogeneity [4].
Studies of test-retest reliability for health-related QOL instruments have used varying intervals between test administrations. The interval has ranged from 10 minutes to 1 month [5], [6], [7], [8], [9], [10], [11], [12], [13]. Most investigators have chosen an interval ranging from 2 days to 2 weeks. This time frame is generally believed to be a reasonable compromise between recollection bias and unwanted (on the part of the investigator) clinical change.
There is no evidence available to aid in the selection of the time interval between questionnaire administrations for a study of test-retest reliability for health status instruments. The goal of this study was to prospectively compare the 2-day and 2-week test-retest reliability of four knee-rating scales and the eight subscales of the SF-36 in a cohort of athletic patients with disorders of the knee.
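This excerpt does not state which statistic was used to quantify test-retest reliability. One common choice for paired test and retest scores is the intraclass correlation coefficient (ICC). The sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1); the function name `icc_2_1` and the scores are invented for illustration.

```python
def icc_2_1(rows):
    """ICC(2,1) from a two-way ANOVA decomposition (hypothetical helper).

    rows: one row per patient, each row holding k scores
          (for test-retest data, k = 2: baseline and retest).
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    # Sums of squares for patients, occasions, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Invented scores: identical on both occasions, then shifted by 2 points.
identical = [[10, 10], [20, 20], [30, 30]]
shifted = [[10, 12], [20, 22], [30, 32]]

print(icc_2_1(identical))
print(round(icc_2_1(shifted), 4))
```

Because this form measures absolute agreement, a systematic shift between the two administrations (for example, a practice effect) lowers the coefficient even when patients keep the same rank order.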
Methods
Patients were recruited either in the waiting rooms of orthopedic surgeons specializing in disorders of the knee or by telephone. The latter group was identified by the research nurse (LH) in a single orthopedic practice (RFW). Patients who were believed to be in a clinically stable state were asked to participate in the study. These individuals were randomly assigned to be re-tested at 2 days or 2 weeks [3], [14]. Randomization was performed using blocks of four from a random number generator.
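Randomization in blocks of four can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually used in the study: the function name `block_randomize`, the equal allocation within each block, and the seed are all hypothetical.

```python
import random


def block_randomize(n_patients, block_size=4, arms=("2-day", "2-week"), seed=None):
    """Assign patients to arms in shuffled blocks (hypothetical helper).

    Each block contains an equal number of slots per arm, so the two
    groups can never differ by more than block_size / 2 patients.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seed is an assumption, used for reproducibility
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_patients:
        block = list(arms) * per_arm  # e.g. two "2-day" and two "2-week" slots
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]


schedule = block_randomize(12, seed=1)
print(schedule)
print(schedule.count("2-day"), schedule.count("2-week"))
```

The design choice behind blocking is balance: with simple coin-flip allocation the two retest groups could drift apart in size, whereas every completed block of four contributes exactly two patients to each interval.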
A wide
Results
Of the 108 patients who completed the baseline questionnaire, 82 completed the second. Of these, 70 replied that the status of their knee was unchanged on the transitional index (38 in the 2-day group and 32 in the 2-week group). Five patients in the 2-day group reported change (four stated they were improved, and one stated they were worse). Seven patients in the 2-week group reported change (six stated they were improved, and one stated they were worse).
The mean age of the patients in the
Discussion
Four knee-specific instruments and one general health-related QOL questionnaire with eight domains were administered to patients to test the reliability of the instruments. Patients were randomly assigned to complete the questionnaires a second time at 2 days or 2 weeks. These time intervals were selected because both were believed to be too short for clinical change to occur in patients thought to be in a stable state. With a minimum of 2 days between questionnaire completions,
Acknowledgements
Dr. Marx was supported by an American Academy of Orthopaedic Surgeons Health Services Research Fellowship and a Royal College of Physicians and Surgeons of Canada Detweiler Travelling Fellowship.
References (33)
- et al. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol (1999)
- et al. Validity and reproducibility of the Spanish Version of the Sickness Impact Profile. J Clin Epidemiol (1996)
- et al. The Toronto Breast Self-Examination Instrument (TBSEI): its development and reliability and validity data. J Clin Epidemiol (1991)
- et al. An index of symptoms for infection with human immunodeficiency virus: reliability and validity. J Clin Epidemiol (1994)
- et al. Test-retest reliability of the Minnesota Leisure Time Physical Activity Questionnaire. J Chronic Dis (1986)
- et al. Children's Type A interview: interrater, test-retest reliability, and interviewer effect. J Chronic Dis (1985)
- et al. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med (1990)
- et al. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet (1995)
- Psychological testing (1988)
- et al. Introduction to measurement theory (1979)
- Health measurement scales: a practical guide to their development and use
- Comparison of the Musculoskeletal Function Assessment questionnaire with the Short Form-36, the Western Ontario and McMaster Universities Osteoarthritis Index, and the Sickness Impact Profile health-status measures. J Bone Joint Surg Am
- The Sickness Impact Profile: reliability of a health status measure. Med Care
- Test-retest performance of a mailed version of the Medical Outcomes Study 36-Item Short-Form Health Survey among older adults. Med Care
- A new instrument to measure patient satisfaction with mammography: validity, reliability, and discriminatory power. Med Care
- Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Control Clin Trials