A comparison of two time intervals for test-retest reliability of health status instruments

doi:10.1016/S0895-4356(03)00084-2

Journal of Clinical Epidemiology

Volume 56, Issue 8, August 2003, Pages 730-735

https://doi.org/10.1016/S0895-4356(03)00084-2

Abstract

Studies of test-retest reliability for health-related quality of life instruments have used varying intervals between test administrations. There is no evidence available to aid in the selection of the time interval between questionnaire administrations for a study of test-retest reliability for health status instruments. We compared the test-retest reliability at 2 days and 2 weeks for four knee-rating scales and the eight domains of the SF-36. Seventy patients with disorders of the knee who were in a stable state were randomly allocated to repeat the questionnaires at either 2 days or 2 weeks. There were no statistically significant differences in the test-retest reliability (intraclass correlation coefficient and limits of agreement statistics) for the two time intervals.

Introduction

Reliability is a critical measurement property for health-related quality of life (QOL) instruments. Reliability refers to the consistency of scores obtained by the same persons when re-examined with the same test on different occasions or with different sets of equivalent items [1]. There are many techniques available to measure reliability, including internal consistency and test-retest reliability.

An instrument that has adequate test-retest reliability gives the same result if an individual is re-tested while remaining in a clinical steady state. The problem with testing reliability by the test-retest method is that there is a potential for learning, carryover, or recall effects (i.e., the first testing may influence the second) [2]. The length of time between the two test administrations also affects the test-retest reliability. A very short time interval makes carryover effects due to memory, practice, or mood more likely, whereas a longer interval increases the chances that a change in status could occur [2]. Measuring reliability by the internal consistency method involves dividing the instrument into two equal parts and comparing the scores on the two halves (i.e., split-half reliability). The Kuder-Richardson formula 20 is the average of all of the split-half reliabilities, and Cronbach's α is an extension of this formula for ordinal data [3].
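To make the internal consistency statistics concrete, the following is a minimal Python sketch of Cronbach's α, computed as k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the toy data are illustrative assumptions and are not taken from the study. With dichotomous (0/1) items and population variances, the same expression coincides with Kuder-Richardson formula 20.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents.

    `scores` is a list of rows, one per respondent, each holding k item scores.
    Population variances are used for both items and totals, so that for 0/1
    items the result equals Kuder-Richardson formula 20.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_columns = list(zip(*scores))              # one tuple per item
    item_variances = [pvariance(col) for col in item_columns]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 4 respondents answering 3 yes/no items.
binary_items = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(cronbach_alpha(binary_items))
```

A scale whose items all move together yields a value near 1, while a scale that mixes unrelated domains tends to yield a lower value.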

Test-retest reliability is more relevant in the setting of clinical medicine because the constructs we attempt to measure are heterogeneous. For example, many instruments used by physicians combine apparently diverse (i.e., heterogeneous) domains such as symptoms (e.g., pain, numbness) and disability (e.g., limited mobility, difficulty with activities of daily living) into a single score. A homogeneous instrument, by contrast, consists of items that relate to a single domain. Thus, one may expect clinically heterogeneous scales to have poor internal consistency. However, there is evidence that variables such as health-related QOL fulfill the criteria for internal consistency despite their apparent heterogeneity [4].

Studies of test-retest reliability for health-related QOL instruments have used varying intervals between test administrations. The interval has ranged from 10 minutes to 1 month [5], [6], [7], [8], [9], [10], [11], [12], [13]. Most investigators have chosen an interval ranging from 2 days to 2 weeks. This time frame is generally believed to be a reasonable compromise between recollection bias and unwanted (on the part of the investigator) clinical change.

There is no evidence available to aid in the selection of the time interval between questionnaire administration for a study of test-retest reliability for health status instruments. The goal of this study was to prospectively compare the 2-day and 2-week test-retest reliability of four knee-rating scales and the eight subscales of the SF-36 in a cohort of athletic patients with disorders of the knee.
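The two statistics named in the abstract, the intraclass correlation coefficient (ICC) and limits of agreement, can be sketched in Python as follows. The text available here does not state which form of the ICC was used, so the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)) is an assumption, as are the function names and the toy scores. Limits of agreement are computed in the usual Bland-Altman way, as the mean difference ± 1.96 standard deviations of the differences.

```python
from statistics import mean, stdev


def icc_2_1(test, retest):
    """ICC(2,1) for two administrations of one instrument (assumed form)."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    row_means = [mean(r) for r in rows]
    col_means = [mean(test), mean(retest)]
    # Two-way ANOVA sums of squares: subjects, occasions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def limits_of_agreement(test, retest):
    """Bland-Altman 95% limits of agreement for retest minus test."""
    diffs = [b - a for a, b in zip(test, retest)]
    d, s = mean(diffs), stdev(diffs)
    return d - 1.96 * s, d + 1.96 * s


# Hypothetical scores for four patients at baseline and at retest.
baseline = [10, 20, 30, 40]
shifted = [15, 25, 35, 45]   # every patient scores 5 points higher

print(icc_2_1(baseline, baseline))
print(icc_2_1(baseline, shifted))
print(limits_of_agreement(baseline, shifted))
```

In this toy case a constant 5-point shift between administrations pulls the absolute-agreement ICC below 1 and appears in the limits of agreement as a nonzero mean difference, which is why the two statistics are usefully reported together.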

Section snippets

Methods

Patients were recruited in the waiting room of orthopedic surgeons specializing in disorders of the knee or by telephone. The latter group was identified by the research nurse (LH) in a single orthopedic practice (RFW). Patients who were believed to be in a clinically stable state were asked to participate in the study. These individuals were randomly assigned to be re-tested at 2 days or 2 weeks [3], [14]. Randomization was performed using blocks of four from a random number generator.
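Randomization in blocks of four can be illustrated with a short Python sketch. The text says only that blocks of four from a random number generator were used; the function name, the arm labels, the seed parameter, and the equal split of arms within each block (the usual permuted-block design) are assumptions made for illustration.

```python
import random


def block_randomize(n_patients, arms=("2-day", "2-week"), block_size=4, seed=None):
    """Allocate patients to arms using permuted blocks (illustrative sketch)."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)          # seed is an assumption, for repeatability
    per_arm = block_size // len(arms)
    allocation = []
    while len(allocation) < n_patients:
        block = list(arms) * per_arm   # e.g. two of each arm in a block of four
        rng.shuffle(block)             # random order within the block
        allocation.extend(block)
    return allocation[:n_patients]


# Example: allocate 12 hypothetical patients.
print(block_randomize(12, seed=2003))
```

Every complete block contains the same number of patients in each arm, so the two retest groups stay close in size at enrollment even if recruitment stops early.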

A wide

Results

Of the 108 patients who completed the baseline questionnaire, 82 completed the second. Of these, 70 replied that the status of their knee was unchanged on the transitional index (38 in the 2-day group and 32 in the 2-week group). Five patients in the 2-day group reported change (four stated they were improved, and one stated they were worse). Seven patients in the 2-week group reported change (six stated they were improved, and one stated they were worse).

The mean age of the patients in the

Discussion

Four knee-specific instruments and one general health-related QOL questionnaire with eight domains were administered to patients to test the reliability of the instruments. Patients were randomly assigned to complete the questionnaires at 2 days or 2 weeks. These intervals were selected because both were considered too short for clinical change to occur in patients believed to be in a stable state. With a minimum of 2 days between questionnaire completions,

Acknowledgements

Dr. Marx was supported by an American Academy of Orthopaedic Surgeons Health Services Research Fellowship and a Royal College of Physicians and Surgeons of Canada Detweiler Travelling Fellowship.

References (33)

R.G. Marx et al. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol (1999)

X. Badia et al. Validity and reproducibility of the Spanish Version of the Sickness Impact Profile. J Clin Epidemiol (1996)

L.E. Ferris et al. The Toronto Breast Self-Examination Instrument (TBSEI): its development and reliability and validity data. J Clin Epidemiol (1991)

C.C. Whalen et al. An index of symptoms for infection with human immunodeficiency virus: reliability and validity. J Clin Epidemiol (1994)

A.R. Folsom et al. Test-retest reliability of the Minnesota Leisure Time Physical Activity Questionnaire. J Chronic Dis (1986)

T.A. Gerace et al. Children's Type A interview: interrater, test-retest reliability, and interviewer effect. J Chronic Dis (1985)

J.M. Bland et al. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med (1990)

J.M. Bland et al. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet (1995)

A. Anastasi. Psychological testing (1988)

M.J. Allen et al. Introduction to measurement theory (1979)

D.L. Streiner et al. Health measurement scales: a practical guide to their development and use (1989)

D.P. Martin et al. Comparison of the Musculoskeletal Function Assessment questionnaire with the Short Form-36, the Western Ontario and McMaster Universities Osteoarthritis Index, and the Sickness Impact Profile health-status measures. J Bone Joint Surg Am (1997)

W.E. Pollard et al. The Sickness Impact Profile: reliability of a health status measure. Med Care (1976)

E.M. Andresen et al. Test-retest performance of a mailed version of the Medical Outcomes Study 36-Item Short-Form Health Survey among older adults. Med Care (1996)

K. Loeken et al. A new instrument to measure patient satisfaction with mammography: validity, reliability, and discriminatory power. Med Care (1997)

R.A. Deyo et al. Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Control Clin Trials (1991)

