Background
Methods
Eligibility criteria
Quality assessment
Term | Level | Definition |
---|---|---|
Reliability | Domain | The degree to which the measurement is free from measurement error |
Reliability (extended definition) | Domain | The extent to which scores for patients who have not changed are the same for repeated measurement under several conditions: e.g. using different sets of items from the same health-related patient-reported outcomes (HR-PRO) (internal consistency); over time (test-retest); by different persons on the same occasion (inter-rater); or by the same persons (i.e. raters or responders) on different occasions (intra-rater) |
Internal consistency | Measurement property | The degree of the interrelatedness among the items |
Reliability | Measurement property | The proportion of the total variance in the measurements which is due to ‘true’ differences between patients |
Measurement error | Measurement property | The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured |
Validity | Domain | The degree to which an HR-PRO instrument measures the construct(s) it purports to measure |
Content validity | Measurement property | The degree to which the content of an HR-PRO instrument is an adequate reflection of the construct to be measured |
Face validity | Aspect of a measurement property | The degree to which (the items of) an HR-PRO instrument indeed looks as though they are an adequate reflection of the construct to be measured |
Construct validity | Measurement property | The degree to which the scores of an HR-PRO instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured |
Structural validity | Aspect of a measurement property | The degree to which the scores of an HR-PRO instrument are an adequate reflection of the dimensionality of the construct to be measured |
Hypotheses testing | Aspect of a measurement property | Idem construct validity |
Cross-cultural validity | Aspect of a measurement property | The degree to which the performance of the items on a translated or culturally adapted HR-PRO instrument are an adequate reflection of the performance of the items of the original version of the HR-PRO instrument |
Criterion validity | Measurement property | The degree to which the scores of an HR-PRO instrument are an adequate reflection of a ‘gold standard’ |
Responsiveness | Domain | The ability of an HR-PRO instrument to detect change over time in the construct to be measured |
Responsiveness | Measurement property | Idem responsiveness |
Interpretability | Domain | The degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to an instrument’s quantitative scores or change in scores |
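
The definition of reliability as a measurement property (the proportion of total variance due to ‘true’ differences between patients) can be read as a simple variance ratio. The sketch below is illustrative only: the function name and the variance components are hypothetical and are not taken from any of the included studies.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Return the share of total variance due to true between-patient differences.

    Both arguments are variance components (hypothetical inputs for illustration).
    """
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variance components must be non-negative")
    total_variance = true_variance + error_variance
    if total_variance == 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical example: most of the score variance reflects true differences.
example = reliability_coefficient(true_variance=8.0, error_variance=2.0)
```

A value close to 1 means that little of the observed variation is measurement error; a value close to 0 means the scores mostly reflect error.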
Results
Author (country) | Study Population | Methods used | Studies and measures | Psychometric properties reported by studies | COSMIN quality ratings |
---|---|---|---|---|---|
Bonevski et al. (2010) Australia | Group 1 was 30% male and 70% female, Group 2 37% male and 63% female, Group 3 44% male and 56% female and Group 4 41% male and 59% female. Group 1 mean age 25 years. Group 2 mean age 27 years. Group 3 mean age 25 years. Group 4 mean age 25 years. | Participants were asked to recall alcohol intake using either a computer or paper administered measure. 4–7 days later both modes of measures were administered again. | Weekly quantity-frequency measure. | Test-retest reliability-kappa coefficient range (0.90–0.96). Test-retest reliability was good. | Test-retest reliability (poor) |
Chaikelson et al. (1994) Canada | Random sampling was used. The sample was 100% male with mean age 69 years. Wives were also asked same questions via written questionnaire to assess concordance. | Results compared to an alcohol test, the MAST (Michigan Alcoholism Screening Test [55]), for reliability and validity. | Short-term recall measure (drinking occasions in the previous month recall). | Test-retest reliability-kappa coefficients (0.76) total lifetime drinking, (0.84) last reported month and (0.77) monthly alcohol consumption indicating good test-retest reliability. Concurrent validity-correlations between self-reports (0.87) husband alcohol intake and (0.85) wife alcohol intake indicating good criterion validity. Construct validity-correlations with the MAST self-report test in 1987 (0.60) with total lifetime drinking (0.05) with current drinking. Correlations with 1990 data (0.53) with total lifetime drinking (−0.14) with current drinking. Construct validity shows moderate reported correlation. | Test-retest reliability (fair) Criterion validity (poor) Construct validity (poor) |
Crum et al. (2002) USA | Random sampling was used. The sample was 58% female and 42% male with mean age 76.2 years. Data was obtained from the 1993–1994 follow-up of the Washington County cohort of men and women 65 years and older. | Participants completed a measure of their usual alcohol consumption in two ways: (1) a quantity-frequency measure; (2) same questions asked in an interview about drinking habits. | Weekly quantity-frequency measure. Short-term recall measure (past week recall). | Hypothesis validity-past week recall of alcohol intake 15–20% lower than the quantity-frequency measure. Hypothesis validity was good. Inter-rater reliability-kappa statistic value 0.76 indicating good inter-rater reliability. | Hypothesis validity (good) Inter-rater reliability (poor) |
Cutler et al. (1988) UK | Random sampling was used. 63.4% of the sample were male and 36.6% female. No median or mean age was reported but participants were aged 18 and older. | CAGE responses and the quantity-frequency questions taken from Health Survey Questionnaire were compared. | Weekly quantity-frequency measure. | Criterion validity-sensitivity (42.9) specificity (97.1) positive predictive value (65.8) negative predictive value (92.8) for males and sensitivity (46.6) specificity (98.6) positive predictive value (50.3) negative predictive value (98.4) for females indicating good criterion validity. | Criterion validity (excellent) |
Dollinger et al. (2009) USA | The sample was composed of volunteers and was 61% female and 39% male with a mean age 22 years. | Responses to quantity-frequency measures at both time points compared. Nightly log of alcohol consumption compared to hours spent studying, socialising and religious behaviours. | Daily graduated-frequency measure. Short-term recall measure (daily alcohol intake recall). | Test-retest reliability-alcohol quantity coefficient of 0.85 and an alcohol frequency coefficient of 0.84 indicating good test-retest reliability. Divergent validity-religion-by-alcohol correlations were negative with values from −0.14 to −0.37. Convergent validity-positive correlations with alcohol with values of 0.40 and 0.41 respectively. Good divergent and convergent validity were reported. | Test-retest reliability (fair) Divergent validity (fair) Convergent validity (fair) |
Greenfield et al. (2014) USA | Random sampling was used. Respondents were 48.1% male and 53.2% female and aged over 18 years. | Participants completed questionnaires and a follow-up survey by phone or mail. | Short-term recall measure (occasions of ≥5 drinks during specific life decades). | Test-retest reliability-kappa values for gender (0.64–0.80), age groups (0.59–0.83), ethnicity (0.70–0.73), interview mode (0.72–0.73) and childhood victimisation (0.75) (0.73) indicating moderate to good test-retest reliability. Predictive validity-disclosure of prior heavy drinking increased risk for alcohol dependence by 18%, increased risk of consequences by 21% (by 15% when age of onset was controlled), increased risk for alcohol-use disorder by 18% indicating good predictive validity. | Test-retest reliability (fair) Predictive validity (fair) |
Gruenewald et al. (1995) USA | Random sampling was used. Respondents were 43.5% male and 56.5% female and aged 18 years or older. | Responses to graduated-frequency measures at two time points compared. | Monthly graduated-frequency measure. | Test-retest reliability-coefficients for average drinking quantity r = 0.76 and for variance in drinking quantities r = 0.78, indicating good test-retest reliability. | Test-retest reliability (fair) |
Hansell et al. (2008) Australia | Random sampling was used. Respondents were 40% male and 60% female and aged between 19 and 90 years old. | The measures examined were a dependence score, based on DSM-IIIR (Diagnostic and Statistical Manual of Mental Disorders [56]) and DSM-IV criteria for substance dependence, and a quantity × frequency of alcohol consumed taken from the quantity-frequency measure. | Annual quantity-frequency measure | Test-retest reliability-continuous data quantity × frequency of alcohol (0.61) between phase 1 and phase 3, and (0.55) between phase 2 and phase 3. Categorical data quantity × frequency of alcohol (0.64) between phase 1 and phase 3, and (0.59) between phase 2 and phase 3, indicating moderate test-retest reliability. | Test-retest reliability (poor) |
Hilton (1989) USA | Volunteer sample. Respondents were 50% male and 50% female and had a mean age of 30 years. The volunteer participants were recruited from the San Francisco Bay Area newspaper. | Participants completed 2 retrospective recall measures (graduated-frequency and beverage-specific quantity-frequency measures) post diary completion. Responses compared. | Short-term recall measure (10 week recall). Graduated-frequency measure (30 day recall). Beverage-specific quantity-frequency measure (2 week recall). | Convergent validity-correlations 0.88 for volume of drinks consumed, 0.85 for days of beer consumed, 0.89 for days of beer usually consumed, 0.80 for days of wine consumed, 0.66 for days of wine usually consumed, 0.81 for days of liquor consumed and 0.65 for days of liquor usually consumed, indicating moderate to good convergent validity. | Convergent validity (fair) |
Koppes et al. (2002) Netherlands | Random sampling was used. Respondents were 46% male and 54% female with mean age 36 years. Data was collected from 1 time point, the 2000 follow-up measurement of 171 male and 197 female participants from the Amsterdam Growth and Health Longitudinal Study. | Subjects visited study premises for 1 day. The quantity-frequency measure and dietary history interview were based on alcohol consumption over the previous month and were completed in no particular order. | Quantity-frequency measure (ranging from never drinking to daily alcohol intake). Short-term recall measure (dietary history interview). | Concurrent validity-correlations of 0.77 for men and 0.87 for women, which indicates good concurrent validity. | Criterion validity (poor) |
LaBrie et al. (2004) USA | The sample was composed of volunteers and was 100% male with a mean age of 20.6 years. 211 male college students participated. | Drinking variables assessed were drinking days, average drinks, and total drinks during a 30-day period. | Short-term recall measure (monthly Timeline Followback method). | Convergent validity-correlation coefficients between 0.52–0.69 showing moderate convergent validity. | Convergent validity (fair) |
Lennox et al. (1996) USA | Analysis was conducted of a sample of a household survey aged 18–64 years. Gender proportions were not reported. Responses were analysed from 1 time point (the 1991 follow-up) from 8755 participants in the 1988 National Household Survey of Drug Abuse. | Used a latent variable approach. In this model covariation among multiple indicators was used as an estimate of the latent construct. | Quantity-frequency measure of alcohol consumption over past 30 days. | Structural validity-correlations at 0.36, alcohol abuse and consequences between constructs correlates at 0.28 showing poor structural validity. | Structural validity (fair) |
McGinley et al. (2014) USA | A sample of 18–20 year olds were selected from respondents to the National Survey on Drug Use and Health. Gender proportions were not reported. | Quantity and frequency of alcohol consumption estimates derived from graduated-frequency measure. Estimates compared to the quantity-frequency measure. | Graduated-frequency measure of alcohol consumption over past 30 days. | Construct validity-mid values for quantity of alcohol consumed were (3.5) and (14.5) for frequency indicating poor construct validity. | Construct validity (fair) |
Northcote and Livingston (2011) Australia | Respondents were 47.3% male and 53.3% female and aged 18–25 years. | Participants reported number of alcoholic drinks consumed 1–2 days after drinking occasion which was compared to reported alcohol intake observed by peer-based researchers on the occasion. | Short-term recall measure (last occasion self-report of drinks consumed). | Criterion validity-associations with p values of 0.6, 0.31, 0.04 and < 0.01 for: up to 4 drinks, 5–8 drinks, 9–12 drinks and more than 12 drinks respectively, indicating good criterion validity for respondents consuming ≥9 drinks. Convergent validity-significant at 0.74, with gender specific correlations for men as 0.79 and women 0.60. Moderate to good convergent validity was reported. | Criterion validity (poor) |
O’Hare et al. (1991) USA | Respondents were 41.6% female and 58.4% male, with a mean age of 20.6 years. | Participants were asked to complete a mailed questionnaire with both measures of alcohol consumption included. | Weekly graduated-frequency measure. Short-term recall measure (retrospective recall of past 7 day alcohol intake). | Convergent validity-correlations were significant at 0.74, with gender specific correlations for men as 0.79 and women 0.60, indicating moderate to good convergent validity. | Convergent validity (good) |
O’Hare et al. (1997) USA | Random sample of an undergraduate university population. Gender proportions were reported as ‘representative of sex’. Respondents had a mean age of 18.7 years. | All students completed quantity-frequency questions, MmMAST and 7 day recall. The MmMAST was used as a criterion variable. | Weekly graduated-frequency measure. Short-term recall measure (retrospective recall of past 7 day alcohol intake). | Criterion validity-association was significant at p < 0.01 indicating good criterion validity. Predictive validity-sensitivity and specificity values were 76 and 59.8 for the recall measure. Using MAST cut off score ≥ 2 sensitivity and specificity values were 59.7 and 70.9 indicating moderate to good predictive validity. | Criterion validity (fair) Predictive validity (fair) |
Parker et al. (1996) USA | Random sampling was used. Respondents were 39% male and 61% female and aged 18–64. Data was taken from surveys 1987–1989, 1989–1990 and 1992–1993 of the Pawtucket Health Program conducted among home dwelling adults. | Alcohol intake assessed with food frequency question as a component of the general health survey was compared against alcohol intake assessed with a graduated-frequency measure as part of a survey. | Short-term recall measure (beverage specific past 24 h recall). Annual graduated-frequency measure | Concurrent validity-kappa statistics reported between measures were 0.08 (p < 0.001), 0.38 (p < 0.001) and 0.81 (p < 0.001), indicating good concurrent validity for high consumers of alcohol only. Inter-rater reliability-kappa values for both measures were 0.28–0.47. Inter-rater reliability was poor (below 0.70). | Criterion validity (poor) Inter-rater Reliability (fair) |
Poikolainen et al. (2002) Finland | Volunteer sample recruited from their workplace. Respondents were 83% female and 17% male with a mean age of 42 years. | Quantity-frequency and graduated-frequency obtained before and after 1-month daily recall on alcohol intake. Blood sample obtained at outset. | Annual quantity-frequency questionnaire. Daily graduated-frequency measure. Short-term recall measure (past month recall of intake). | Convergent validity-coefficients were 0.95 between the short-term recall measure and quantity-frequency 1, 0.95 between the short-term recall measure and quantity-frequency 2, 0.90 between the short-term recall measure and graduated-frequency 1 and 0.93 between the short-term recall measure and graduated-frequency 2. Convergent validity was reported as good. | Convergent validity (good) |
Read et al. (2006) USA | College students who reported drinking different amounts of alcohol were selected for the sample to be representative of variation in drinking levels. Respondents were 52% female and 48% male with a mean age 19 years. | College students completed self-report questionnaire on demographic characteristics, drinking behaviours and drinking consequences. Drinking consequences assessed with composite measure based on Drinker Inventory of Consequences and Young Adult Alcohol Problem Screening Test developed by researchers. | Short-term recall measure (past 90 day intake). | Concurrent validity-correlation values of 0.36, p < 0.001 and with quantities of alcohol consumed with an r value of 0.31, p < 0.001, indicating poor concurrent validity. | Criterion validity (excellent) |
Rehm et al. (1999) Canada | The sample was chosen to be representative of the wider drinking population. Respondents were 48% male and 52% female, and chosen to be representative of age ≥ 18 years. | Population samples from 4 surveys conducted for Alcohol Research Group. Surveys used computer-assisted telephone interviews with random digit dialling sampling techniques. | Quantity-frequency measure for drinking occasion. Annual graduated-frequency measure. Short-term recall measure (past week recall). | Convergent validity-correlations moderate at both approximately 0.40. Predictive validity-estimates by graduated-frequency measure 22% higher than short-term recall estimate. Quantity-frequency estimate of alcohol-related mortality 13% than short-term recall estimate, indicating poor predictive validity. | Convergent validity (fair) Predictive validity (excellent) |
Reid et al. (2003) USA | Random sampling was used. The veteran primary care sample was 3% female and 97% male, and the community dwelling sample was 60% female and 40% male. Mean ages were 73.1 for the veteran primary care sample and 75.9 for the community dwelling sample. | | Weekly quantity-frequency measure. | Inter-rater reliability-kappa values were 0.44 and 0.33. For population sample 2 kappa values were 0.21 and 0.46 indicating moderate to poor inter-rater reliability. | Inter-rater Reliability (fair) |
Russell et al. (1991) USA | Random sampling was used. Respondents were 50.5% male and 49.5% female and aged over 18 years. Data was taken from 1 time point of the survey. | Quantity-frequency questions were asked about the amount and frequency of particular alcoholic beverages consumed via telephone interview using a random-digit-dial technique and supplemented by samples of homeless people, college students and those without telephones. | Typical annual beverage-specific Quantity-frequency measure | Criterion validity-correlations between 0.73 and 0.77 for subtypes of alcohol reported showing good criterion validity. | Criterion validity (poor) |
Sander et al. (1997) USA | 175 patients with traumatic brain injury were recruited from a medical rehabilitation centre along with their relatives. Respondents were 65% male and 35% female. Mean age 39.2 years for patients and 45.9 years for relatives. | Alcohol use examined 1 year after injury through quantity-frequency measure and brief MAST test. Patients and their relatives both completed measures and concordance between reports were examined. | Annual quantity-frequency measure | Concurrent validity-concordance showed 95.4% agreement indicating good criterion validity. | Criterion validity (fair) |
Searles et al. (1995) USA | The sample was chosen to be representative of male drinking population in Vermont enrolled in the Alcohol Research Centre. Respondents had a median age of 28 years (ranging from 21 to 56 years) and were 100% male. | Subjects self-reported daily alcohol intake via telephone. At 90 days subjects completed an interview using DSM criteria to assess alcohol abuse or dependence. | Short-term recall measure (Daily self-report of alcohol intake). Short-term recall measure (annual retrospective recall). | Predictive validity-correlations 0.86 and with alcohol related problems level as 0.69. Predictive validity is moderate between daily self-report and retrospective recall and alcohol related problems, and good between daily self-report and retrospective recall and alcohol intoxication level. | Predictive validity (poor) |
Searles et al. (2000) USA | Volunteer sample of those enrolled in the Vermont Alcohol Research Centre. Respondents were 100% male and had a mean age of 36.2 years for those without alcohol problems tested at outset and 30.4 years for those with alcohol problems. | Participants recorded alcohol intake on interactive voice response system using telephones. In person interviews were conducted every 13 weeks during which they completed timeline follow back. Results were compared. | Short-term recall measure (Timeline Follow back over 366 days). Short-term recall measure (Daily self-report of alcohol intake). | Convergent validity-correlations 0.60 at 180 days of administration, 0.57 at 270 days of administration and 0.57 at 366 days of administration, indicating moderate convergent validity. | Convergent validity (fair) |
Tuunanen et al. (2013) Finland | The sample included 45 year olds resident in Finnish city of Tampere. The sample was 100% male. | Participants completed a mailed health questionnaire which invited previous week recall of alcohol intake, a quantity-frequency measure and structured quantity-frequency questions based on the AUDIT. | Quantity-frequency measure (typical drinks consumed per occasion). Short-term recall measure (past week recall). | Hypothesis validity-the past week recall measure reported mean alcohol consumption lower than the quantity-frequency measure indicating good hypothesis validity. | Hypothesis validity (fair) |
Weingardt et al. (1998) USA | Random sampling was used. Respondents were 58% female and 42% male and aged 18–20 years. Data was taken from 1990 and 1994 cohorts of college undergraduate students. | Peak consumption, typical weekend quantity and typical daily quantity measures used to derive binge drinking data to analyse validity. Binge drinking defined as 5–6 drinks per occasion for men and 3–4 drinks per occasion for women. | Graduated-frequency measure (peak monthly alcohol consumption). Graduated-frequency measure (typical weekend quantity). Short-term recall measure (typical daily quantity). | Concurrent validity-r value 0.57 and Alcohol Dependence Scale with r value 0.54. Predictive validity-daily quantity measure classified 6.2% of drinkers as chronic and 7.4% indicating poor predictive validity. | Criterion validity (good) Predictive validity (good) |
Whitfield et al. (2004) Australia | Voluntary sample. Respondents were 36% male and 64% female with a mean age of 33.7 years. Data was taken from 3 waves (1980, 1989 and 1993) using adult male and female participants of the Australian Twin Registry. | Test-retest reliability was calculated as correlations between occasions and between measures. Relationships between alcohol use and lifetime DSM-IIIR alcohol dependence examined. | Annual quantity-frequency measure. Short-term recall measure (past week recall of alcohol intake). | Test-retest reliability-correlations between (0.54–0.70) indicating moderate to good test-retest reliability. | Test-retest reliability (fair) |
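
Several of the statistics reported in the table (kappa coefficients, sensitivity, specificity and predictive values) follow standard definitions. The sketch below shows how such statistics are conventionally calculated; it is not the analysis code of any included study, and the function names and example data are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two sets of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from the marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from a 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all criterion positives
        "specificity": tn / (tn + fp),  # true negatives among all criterion negatives
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical test and retest classifications for eight respondents.
test = ["low", "low", "high", "high", "low", "high", "low", "low"]
retest = ["low", "low", "high", "low", "low", "high", "low", "high"]
kappa = cohen_kappa(test, retest)

# Hypothetical counts for a measure compared against a criterion.
metrics = screening_metrics(tp=30, fp=10, fn=20, tn=140)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be much lower than the percentage agreement between two administrations of the same measure.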
Methodological quality assessment
Test-retest reliability
Criterion validity
Construct validity
Hypothesis validity
Predictive validity
Convergent validity
Discussion
Psychometric property ratings for measure types
Discrepancies between COSMIN ratings and psychometric properties
Issues with self-reporting alcohol consumption
Comparison with previous reviews
Recommendations for improved reliability and validity
Measure type | Advantages | Disadvantages |
---|---|---|
Quantity-frequency measures | • Easily administered. • Simple structure; respondents are more likely to understand the measure. • Well-established (respondents are more likely to be familiar with the measure). • Captures ‘usual’ drinking behaviour, unaffected by occasions or seasons where more alcohol consumption may occur. • Can increase reliability by including beverage-specific questions. | • May not record heavy episodic drinking occasions. |
Graduated-frequency measures | • Categories act as prompts for respondents. • Answers are easily standardised to identify those drinking above the guidelines. • Can increase reliability by including beverage-specific questions. | • May not record heavy episodic drinking occasions. |
Short-term recall measures | • Can focus questions on specific drinking events. • Requires respondents to consider their responses to a greater extent (as answers are not structured). • Respondents can report their alcohol consumption (in standard drinks sizes, units etc.) in a way they are familiar with. • Can increase reliability by including beverage-specific questions. | • Hard to standardise answers to the same measure recorded in different formats. • Respondents may be confused by lack of response options. |