A comparison of two time intervals for test-retest reliability of health status instruments

https://doi.org/10.1016/S0895-4356(03)00084-2

Abstract

Studies of test-retest reliability for health-related quality of life instruments have used varying intervals between test administrations. There is no evidence available to aid in the selection of the time interval between questionnaire administrations for a study of test-retest reliability for health status instruments. We compared the test-retest reliability at 2 days and 2 weeks for four knee-rating scales and the eight domains of the SF-36. Seventy patients with disorders of the knee who were in a stable state were randomly allocated to repeat the questionnaires at either 2 days or 2 weeks. There were no statistically significant differences in the test-retest reliability (intraclass correlation coefficient and limits of agreement statistics) for the two time intervals.

Introduction

Reliability is a critical measurement property for health-related quality of life (QOL) instruments. Reliability refers to the consistency of scores obtained by the same persons when re-examined with the same test on different occasions or with different sets of equivalent items [1]. There are many techniques available to measure reliability, including internal consistency and test-retest reliability.

An instrument that has adequate test-retest reliability gives the same result if an individual is re-tested while remaining in a clinical steady state. The problem with testing reliability by the test-retest method is that there is a potential for learning, carryover, or recall effects (i.e., the first testing may influence the second) [2]. The length of time between the two test administrations also affects the test-retest reliability. A very short time interval makes carryover effects due to memory, practice, or mood more likely, whereas a longer interval increases the chances that a change in status could occur [2]. Measuring reliability by the internal consistency method involves dividing the instrument into two equal parts and comparing the scores on the two halves (i.e., split-half reliability). The Kuder-Richardson formula 20 is the average of all of the split-half reliabilities, and Cronbach's α is an extension of this formula for ordinal data [3].
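As a rough illustration of the internal consistency measures described above, the following Python sketch computes a split-half reliability and Cronbach's α from a respondents-by-items score matrix. The function names, the odd/even split, and the Spearman-Brown step-up applied to the half-test correlation are assumptions made for this example; they are not taken from the study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a 2-D array (rows = respondents, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items
    item_variances = items.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

def split_half_reliability(items):
    """Correlate odd- and even-numbered item halves (hypothetical split),
    then apply the Spearman-Brown correction for full test length."""
    items = np.asarray(items, dtype=float)
    half_a = items[:, 0::2].sum(axis=1)
    half_b = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r / (1 + r)

# Made-up scores for three respondents on two items (illustration only).
example = [[2, 3], [4, 5], [6, 4]]
alpha = cronbach_alpha(example)
```

A different way of splitting the items would generally give a different split-half value, which is one reason a summary over all possible splits is usually preferred.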

Test-retest reliability is more relevant in the setting of clinical medicine because the constructs we attempt to measure are heterogeneous. For example, many instruments used by physicians combine apparently diverse domains, such as symptoms (e.g., pain, numbness) and disability (e.g., limited mobility, difficulty with activities of daily living), into a single score. A homogeneous instrument, by contrast, consists of items that relate to a single domain. Thus, one may expect clinically heterogeneous scales to have poor internal consistency. However, there is evidence that variables such as health-related QOL fulfill the criteria for internal consistency despite their apparent heterogeneity [4].

Studies of test-retest reliability for health-related QOL instruments have used varying intervals between test administrations. The interval has ranged from 10 minutes to 1 month [5], [6], [7], [8], [9], [10], [11], [12], [13]. Most investigators have chosen an interval ranging from 2 days to 2 weeks. This time frame is generally believed to be a reasonable compromise between recollection bias and unwanted (on the part of the investigator) clinical change.

There is no evidence available to aid in the selection of the time interval between questionnaire administrations for a study of test-retest reliability for health status instruments. The goal of this study was to prospectively compare the 2-day and 2-week test-retest reliability of four knee-rating scales and the eight subscales of the SF-36 in a cohort of athletic patients with disorders of the knee.
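The abstract names the intraclass correlation coefficient (ICC) and limits of agreement as the test-retest statistics. The sketch below shows one way such statistics could be computed from paired test and retest scores. The specific ICC form is an assumption: the text available here does not say which form was used, so the example uses the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)), and the conventional 1.96 multiplier for 95% limits of agreement. All function names and data are hypothetical.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1) for an n-subjects x k-occasions array (assumed ICC form)."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    # Sums of squares from a two-way ANOVA without replication.
    ss_subjects = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_occasions = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_error = ((x - grand_mean) ** 2).sum() - ss_subjects - ss_occasions
    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_occasions - ms_error) / n
    )

def limits_of_agreement(test, retest):
    """95% limits of agreement: mean difference +/- 1.96 SD of the differences."""
    diff = np.asarray(retest, dtype=float) - np.asarray(test, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias - 1.96 * sd, bias + 1.96 * sd

# Made-up scores for three stable patients (illustration only).
test_scores = [10, 20, 30]
retest_scores = [12, 22, 32]
icc = icc_2_1(np.column_stack([test_scores, retest_scores]))
lower, upper = limits_of_agreement(test_scores, retest_scores)
```

Because this ICC form measures absolute agreement, a uniform shift between administrations lowers it even when patients keep exactly the same rank order.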

Methods

Patients were recruited in the waiting room of orthopedic surgeons specializing in disorders of the knee or by telephone. The latter group was identified by the research nurse (LH) in a single orthopedic practice (RFW). Patients who were believed to be in a clinically stable state were asked to participate in the study. These individuals were randomly assigned to be re-tested at 2 days or 2 weeks [3], [14]. Randomization was performed using blocks of four from a random number generator.
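The allocation step described above can be sketched as a permuted-block scheme. The example assumes that each block of four holds two patients per arm in random order; the text states only that blocks of four from a random number generator were used, so the within-block balance, the function name, the arm labels, and the seed are all assumptions for illustration.

```python
import random

def block_randomize(n_patients, block_size=4, arms=("2-day", "2-week"), seed=None):
    """Return a list of arm assignments built from shuffled, balanced blocks."""
    rng = random.Random(seed)            # seeded generator for reproducibility
    per_arm = block_size // len(arms)    # assignments per arm within one block
    allocations = []
    while len(allocations) < n_patients:
        block = list(arms) * per_arm     # e.g. two "2-day" and two "2-week"
        rng.shuffle(block)               # random order within the block
        allocations.extend(block)
    return allocations[:n_patients]      # a final partial block may be unbalanced

# Hypothetical allocation list for the first 12 enrolled patients.
schedule = block_randomize(12, seed=2003)
```

Balanced blocks keep the two retest groups close in size throughout enrollment, although the groups that are finally analyzed can still differ once non-responders and patients reporting change are excluded.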

A wide

Results

Of the 108 patients who completed the baseline questionnaire, 82 completed the second. Of these, 70 replied that the status of their knee was unchanged on the transitional index (38 in the 2-day group and 32 in the 2-week group). Five patients in the 2-day group reported change (four stated they were improved, and one stated they were worse). Seven patients in the 2-week group reported change (six stated they were improved, and one stated they were worse).

The mean age of the patients in the

Discussion

Four knee-specific instruments and one general health-related QOL questionnaire with eight domains were administered to patients to test reliability of the instruments. Patients were randomly assigned to complete the questionnaires at 2 days or 2 weeks. These time intervals were selected because in both cases it was believed that the intervals were too short for clinical change in patients who were believed to be in a stable state. With a minimum of 2 days between questionnaire completions,

Acknowledgements

Dr. Marx was supported by an American Academy of Orthopaedic Surgeons Health Services Research Fellowship and a Royal College of Physicians and Surgeons of Canada Detweiler Travelling Fellowship.

References (33)

  • D.L. Streiner et al. Health measurement scales: a practical guide to their development and use (1989)
  • D.P. Martin et al. Comparison of the Musculoskeletal Function Assessment questionnaire with the Short Form-36, the Western Ontario and McMaster Universities Osteoarthritis Index, and the Sickness Impact Profile health-status measures. J Bone Joint Surg Am (1997)
  • W.E. Pollard et al. The Sickness Impact Profile: reliability of a health status measure. Med Care (1976)
  • E.M. Andresen et al. Test-retest performance of a mailed version of the Medical Outcomes Study 36-Item Short-Form Health Survey among older adults. Med Care (1996)
  • K. Loeken et al. A new instrument to measure patient satisfaction with mammography: validity, reliability, and discriminatory power. Med Care (1997)
  • R.A. Deyo et al. Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Control Clin Trials (1991)