Background
Measurement error in nutritional epidemiology
Within-person random error: This is the variation observed in an exposure when it is measured repeatedly in the same individual using a specific instrument. A nutritional example is the day-to-day variation in dietary intake reported by an individual across multiple 24-h recalls (assuming that it is possible to capture a single day’s dietary intake perfectly). If the day-to-day variation is random, the resulting estimate of usual intake is unbiased: a person’s true usual intake is estimated accurately on average across several repeat measurements, although with some error.
Between-person random error: When error is random between individuals, it results in an unbiased estimate of the mean usual exposure for the population of interest. Even with random measurement error within a person, an unbiased estimate for the population can be calculated, because overestimation for some individuals is balanced by underestimation for others. With between-person random error the mean is estimated without bias, but the variance is inflated. In nutritional research this can result from using a single or only a few repeat measurements of dietary intake per individual in the presence of within-person random error.
Within-person systematic error: Systematic errors are biases in the measurement of an exposure that consistently depart from the “true exposure” value in the same direction. Within-person systematic errors are specific to an individual and are manifested as a positive or negative difference between that individual’s reported and true exposure. For example, some individuals may occasionally use dietary supplements, which may lead to “systematic additive error,” indicating that a constant error is added to each person’s reported dietary intake. This could lead to over- or underestimation for all participants by the same amount. This directional difference (or intake-related bias) is usually constant within an individual and would remain regardless of how many repeat measurements are taken. Within-person systematic error may be related to individual characteristics, such as social/cultural desirability, that affect how a particular individual reports dietary intake.
Between-person systematic error: Systematic errors in exposures can be additive or multiplicative. Additive between-person systematic error can occur when the dietary instrument of interest causes every measurement to be too large or too small by a constant amount relative to the truth. For example, if the additive systematic error were negative, each participant’s reported intake would be lower than their true intake using the dietary instrument of interest. Multiplicative between-person systematic error can occur when, instead of reporting their true intake, all participants report a fixed multiple of their true intake. This can be thought of as an intake-related bias, where there is a systematic deviation from the truth due to a correlation between errors in the dietary instrument of interest and true intake. The attenuation (or flattened slope phenomenon) happens when both additive and multiplicative (intake-related) biases are present, which is typical in nutritional epidemiology. Person-specific bias is another type of between-person systematic error: for example, the average intake of a person who takes a dietary supplement every day will differ from that predicted by the group-level flattened slope.
The classical measurement error model assumes additive error that is unrelated to the targeted consumption, unrelated to other study subject characteristics, and independent of the corresponding measurement error in the dietary instrument of interest [103]. It is important that nutritional epidemiologists are aware of the impact measurement error can have on diet-disease associations derived from even generally well-conducted large-scale epidemiological studies. If there is a linear relationship between a single dietary exposure and the disease of interest, as in a logistic, Cox or linear regression model, then the effect of classical measurement error is to attenuate the diet-disease association [37, 47]. This means that estimates of diet-disease association such as log odds ratios, log hazard ratios or linear regression coefficients will be biased towards the null; a further consequence of classical measurement error in linear models is a loss of power to detect diet-disease associations. In a multivariable exposure situation, classical measurement error can bias the diet-disease associations in any direction, even in a linear regression model [47]. Other types of error that depend on the ‘true’ exposure (i.e. systematic error) or on the outcome (i.e. differential error) may result in bias either away from or towards the null in an unpredictable manner [42, 47].
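The attenuation described above can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not an analysis from any of the reviewed studies: a single normally distributed exposure, a linear outcome model, and classical additive error. The function name, parameter names, and default variances are hypothetical. Under these assumptions the naive slope is expected to be close to the true slope multiplied by the attenuation factor var(true) / (var(true) + var(error)).

```python
import numpy as np

def simulate_attenuation(n=50_000, beta=1.0, var_true=1.0, var_error=1.0, seed=0):
    """Return (naive_slope, attenuation_factor) for one simulated cohort.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    true_intake = rng.normal(0.0, np.sqrt(var_true), n)
    # Classical error: additive, mean zero, independent of true intake and outcome.
    reported = true_intake + rng.normal(0.0, np.sqrt(var_error), n)
    outcome = beta * true_intake + rng.normal(0.0, 1.0, n)
    # Least-squares slope of the outcome on the error-prone exposure.
    naive_slope = np.cov(reported, outcome)[0, 1] / np.var(reported, ddof=1)
    attenuation = var_true / (var_true + var_error)
    return naive_slope, attenuation

naive, lam = simulate_attenuation()
print(f"naive slope = {naive:.3f}; true slope x attenuation factor = {1.0 * lam:.3f}")
```

With equal true and error variances, the attenuation factor is 0.5, so the naive slope should be roughly half the true slope of 1.0.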
Dietary assessment methods
Gold standards and alloyed gold standards
Reference instruments and calibration studies
In general terms, a study of ‘relative validity’ is one that compares the performance of two or more imperfect instruments, for example a food frequency questionnaire (FFQ) relative to other self-reported instruments such as 24-h dietary recalls and food records [104]. The evaluation of a dietary instrument can therefore involve both the assessment of its measurement error structure and its correlation with the truth (i.e. its ‘relative validity’).
Often researchers will aim to assess the ‘relative validity’ of a new dietary instrument, such as a FFQ, by comparing its results with those obtained with a more accurate measure of food or nutrient intake. This can be in the context of the development of a new instrument, to test whether it provides improvement over currently used instruments, or for the use of an existing instrument in a different population from the one in which it was developed. The development of any given FFQ is based on the dietary intake of a defined population during a specific period in time, and when these instruments are to be used in other populations, it is important to evaluate whether the instrument gives the same results when repeated on several occasions (the ‘reproducibility’) as well as its ‘relative validity’ in the new target population [11].
For the purpose of this report, we use the term “calibration studies” collectively for studies that either (i) aim to assess systematic error by comparing a dietary assessment instrument with “true exposure” (a “gold standard” reference instrument) or with a known superior dietary instrument that may itself be prone to measurement error (an “alloyed gold standard”); or (ii) aim to assess random error by taking repeat measurements using the same dietary instrument. Calibration studies can be “internal” if they are performed on a subsample of the main study, or “external” otherwise [46]. Calibration studies that use repeat measurements are common because, under the classical measurement error model, the error-prone measurements of dietary intake are treated as unbiased measures of ‘true’ exposure. Because the errors in repeat measurements are uncorrelated under this model, the average over a large number of repeated measurements would provide a good estimate of the ‘true’ exposure [7].
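The claim that averaging many repeat measurements approaches the ‘true’ exposure under the classical model can be checked numerically. The sketch below is an illustration with assumed values (the within-person variance, the intake distribution, and the function name are all hypothetical). It simulates independent, mean-zero errors and reports the variance of the difference between each person's mean report and their true intake, which should shrink roughly in proportion to 1/k for k repeats.

```python
import numpy as np

def error_variance_of_mean(n_repeats, var_within=4.0, n_people=20_000, seed=1):
    """Empirical variance of (mean of n_repeats reports - true intake)."""
    rng = np.random.default_rng(seed)
    true_intake = rng.normal(10.0, 2.0, n_people)
    # Each repeat = truth + independent, mean-zero within-person error.
    errors = rng.normal(0.0, np.sqrt(var_within), (n_people, n_repeats))
    reports = true_intake[:, None] + errors
    return np.var(reports.mean(axis=1) - true_intake, ddof=1)

for k in (1, 2, 4, 8):
    # Expected to be roughly var_within / k under the classical model.
    print(k, round(error_variance_of_mean(k), 3))
```

If the errors in repeat measurements were correlated (for example through a person-specific bias), the variance would stop shrinking at the level of the shared component, which is why the uncorrelated-errors assumption matters here.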
Methods
Study identification and data extraction
Results
Approaches to quantify the relationship between different dietary assessment instruments and “true intake”
Reference outlining the method | Classical measurement error model assumed | Requirements of calibration study | Relationship between reference instrument and dietary instrument of interest. | Aim of the approach |
---|---|---|---|---|
Method of Triads [105] | Yes | Three methods of assessment of dietary intake to be available (e.g. FFQ, 24-h dietary recalls and a biomarker) | The minimal statistical requirements are that the measurements from the three instruments are linearly related to the true intake levels and their random errors are statistically independent (i.e. uncorrelated). | To assess the ‘relative validity’ of dietary intake when the quantitative information was available for three methods (usually FFQ, 24-h dietary recalls and a biomarker). |
Method of Triads Extension 1 (MOTEX1) [2004] [27] | No | Superior or gold standard reference instrument available | The correlation between errors in the dietary instrument of interest and reference instrument can be non-zero (i.e. the errors are not statistically independent). | To estimate the magnitude of correlations between errors in the reference instrument and the dietary instrument of interest (e.g. a FFQ). |
Method of Triads Extension 2 (MOTEX2) [2005] [36] | No | Multiple dietary assessment methods required (e.g. self-reported instruments and biomarkers). | Three surrogate variables: questionnaire (Q), M and P, where M and P are both instrumental (often biological) variables. No conventional reference instrument is required. M and P can be concentration biomarkers rather than recovery biomarkers. | To estimate the correlation between a dietary instrument of interest (Q) and true intake (T). |
Method of Triads Extension 3 (MOTEX3) [2007] [35] | No | Multiple dietary assessment methods required (e.g. self-reported instruments and biomarkers). | No conventional reference instrument is required. Requires that error correlations between dietary estimates and biomarkers, or between biomarkers, be close to zero. M and P are biomarkers, with M being a direct measure of dietary intake; M and P are chosen so that one has a long half-life and the other a short half-life. | To produce corrected estimates of the effects on an outcome variable of changing the true exposure variables by one standard deviation (a standardized regression calibration). |
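
For the basic Method of Triads in the first row of the table, the validity coefficient of each instrument (its correlation with true intake) is usually estimated from the three pairwise correlations. The sketch below uses the standard triangulation formula under the assumptions stated in the table: linear relation to true intake and mutually independent errors. The labels Q (questionnaire), R (reference instrument) and M (biomarker), the function name, and the input correlations are illustrative, not values from any reviewed study.

```python
import math

def triad_validity_coefficients(r_qr, r_qm, r_rm):
    """Validity coefficients for Q, R and M from their pairwise correlations.

    Assumes all three correlations are positive and that the errors of the
    three instruments are mutually independent.
    """
    rho_q = math.sqrt(r_qr * r_qm / r_rm)
    rho_r = math.sqrt(r_qr * r_rm / r_qm)
    rho_m = math.sqrt(r_qm * r_rm / r_qr)
    return rho_q, rho_r, rho_m

# Hypothetical pairwise correlations, for illustration only.
rho_q, rho_r, rho_m = triad_validity_coefficients(r_qr=0.40, r_qm=0.30, r_rm=0.35)
print(f"Q: {rho_q:.2f}  R: {rho_r:.2f}  M: {rho_m:.2f}")
```

In small samples these estimates can exceed 1, and a negative product of correlations makes the square root undefined, so the function is only a starting point. The extensions listed in the table relax the independent-errors assumption that this formula relies on.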
Correlation analysis
When two measures are correlated, measurement error can lower the correlation coefficient below the level it would have reached if the measures had been free from measurement error. A de-attenuated correlation coefficient can be computed to correct for attenuation due to within-person variation if repeat measurements are available on the reference method. If, for example, the dietary instrument was a FFQ and the reference instrument was multiple food diaries, the de-attenuated correlation (ρ), under the assumption of a classical measurement error model, could be obtained by the formula: ρ = r √[1 + (wpv/bpv)/n]
where r is the observed correlation; wpv is the within-person variance of the reference method; bpv is the between-person variance of the reference method; and n is the number of repeat measurements of the reference method [18]. Often variation due to daily energy intake is removed by adjusting for total energy using the residual method [106] prior to accounting for within-person variation, in order to produce energy-adjusted de-attenuated correlations.
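As a worked illustration of the de-attenuation step, the sketch below applies the correction in its usual form, in which the ratio of within-person to between-person variance is divided by the number of repeats, so that the correction shrinks as more repeat measurements are averaged. The function name and the numerical inputs are hypothetical.

```python
import math

def deattenuated_correlation(r_observed, within_var, between_var, n_repeats):
    """De-attenuate an observed correlation for within-person variation
    in the reference method, assuming classical measurement error."""
    ratio = within_var / between_var  # wpv / bpv
    return r_observed * math.sqrt(1.0 + ratio / n_repeats)

# Hypothetical inputs: observed r = 0.40, variance ratio 2.0, four diary days.
rho = deattenuated_correlation(r_observed=0.40, within_var=2.0,
                               between_var=1.0, n_repeats=4)
print(f"de-attenuated correlation = {rho:.3f}")
```

With these inputs the multiplier is the square root of 1.5. The same observed correlation based on more repeat measurements would receive a smaller correction.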
Method of triads
Extensions to approaches based on method of triads
Approaches to adjust estimates in diet-disease associations for measurement error
Reference outlining the method | Requirements of the calibration study | Relationship between reference instrument and dietary instrument of interest. | Aim of the approach |
---|---|---|---|
Intra-class correlation [107] | Repeat measurements on the error-prone dietary instrument are available for the same individuals. | No reference instrument is required, just repeats of the dietary instrument of interest. However, the measurement errors in the repeated measures should be uncorrelated. | To be able to correct relative risk estimates and other regression slopes for bias. This approach can also be used to assess the reproducibility of a dietary instrument, where a higher value indicates lower within-person variation. |
Regression calibration | External sample with gold standard reference instrument or repeat measures of the error-prone dietary instrument of interest. | No correlation between the measurement errors in reference instrument and dietary instrument of interest. | To be able to correct relative risk estimates and other regression slopes for bias. |
Multivariable regression calibration (MVRC) [42] | External sample with gold standard reference instrument or repeat measures of the error-prone dietary instrument of interest. | No correlation between the measurement errors in reference instrument and dietary instrument of interest. | To be able to correct relative risk estimates and other regression slopes for bias. |
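
The intra-class correlation approach in the table can be sketched as follows. The sketch assumes that the main study uses a single measurement per person, that the errors in the repeats are uncorrelated, and that a one-way random-effects decomposition is appropriate. The ICC estimated from the repeats is then used as the attenuation factor by which an observed slope is divided. The function names, the simulated data, and the observed slope of 0.25 are hypothetical.

```python
import numpy as np

def intraclass_correlation(replicates):
    """One-way random-effects ICC from an (n_people, k_repeats) array."""
    replicates = np.asarray(replicates, dtype=float)
    k = replicates.shape[1]
    ms_between = k * np.var(replicates.mean(axis=1), ddof=1)
    ms_within = np.mean(np.var(replicates, axis=1, ddof=1))
    var_between = (ms_between - ms_within) / k
    return var_between / (var_between + ms_within)

def correct_slope(observed_slope, icc):
    """Divide an observed regression slope by the attenuation factor."""
    return observed_slope / icc

# Simulated calibration data: two repeats per person, equal true and error variance.
rng = np.random.default_rng(2)
true_intake = rng.normal(0.0, 1.0, 5_000)
replicates = true_intake[:, None] + rng.normal(0.0, 1.0, (5_000, 2))

icc = intraclass_correlation(replicates)
print(f"ICC = {icc:.2f}; corrected slope = {correct_slope(0.25, icc):.2f}")
```

The correction rescales the point estimate only. A corrected confidence interval would also need to reflect the uncertainty in the estimated ICC, which this sketch does not attempt.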
Intra-class correlation
Regression calibration
Extensions to linear regression calibration to address departures from the main assumptions
Reference outlining the method | Requirements of the calibration study | Relationship between reference instrument and dietary instrument of interest. | Aim of the approach |
---|---|---|---|
Person-specific bias adjusted regression calibration (PSBRC) [52] | Superior or gold standard reference instrument available. | An estimate of the person-specific bias in the reference measure and its correlation with systematic error in the FFQ is required. | To be used as a sensitivity analysis in order to assess the impact of varying pre-specified ratios of the variance of the person-specific biases in a reference instrument and FFQ and the correlation between these biases. |
Flawed reference instrument adjusted regression calibration (FRIRC) [50] | Internal or external sample with superior or gold standard reference instrument available. | Extension of PSBRC in which the model assumes that, for both the FFQ and the dietary-report reference instrument, group-specific biases related to true intake and correlated person-specific biases can be estimated. | To be used as a sensitivity analysis in order to assess the impact of additional complexity of both group and person-specific biases. |
Biomarker and alloyed gold standard regression calibration (BAGSRC) [51] | Internal or external sample with superior or gold standard reference instrument available. | Model assumes that there is a correlation between the “alloyed gold standard” and the level of exposure using dietary instrument of interest. Requires that a third method of exposure assessment (a biomarker) is available and that its errors can reasonably be assumed to be uncorrelated with the errors in the other two exposure assessment methods. | To estimate the bias in the standard regression calibration due to the correlation between alloyed gold standard and the level of exposure using dietary instrument of interest, and to derive estimates of the correlation between the errors in alloyed gold standard and exposure assessment using biomarker data. |
| | Internal or external sample with superior or gold standard reference instrument available (if a biomarker, then replicates). | The models account for correlated errors in the FFQ and the 24-h diet recall and random within-person variation in the biomarkers. | To be used as a sensitivity analysis in order to assess the impact of correlated subject-specific error on correction factor. |
Episodically consumed foods regression calibration (ECFRC) [59] | External sample with superior or gold standard reference instrument available. | Model assumes that a food is reported on the 24HR as consumed on a certain day if and only if it was consumed on that day, and that the 24HR is unbiased for true usual intake on consumption days. | To predict an individual’s usual intake of an episodically consumed food and relate it to a health outcome. |
Never and episodic consumers (NEC) model [88] | A subset of the population has repeat measurements of dietary instrument of interest. | Assumes that food record measurements are subject only to random within-person variability, that the observed food record measurements are unbiased estimates of “true intake”, and that nonzero food record measurements are normally distributed on a transformed scale. | To predict an individual’s usual intake of an episodically consumed food whilst incorporating never consumers, and relate it to a health outcome. |
Departures from assumptions of classical measurement error in the reference instrument
Use of biomarkers in addition to dietary instruments in regression calibration
Incorporating episodically consumed foods in regression calibration
Other approaches
Reference outlining the method | Classical measurement error model assumed | Requirements of calibration study | Relationship between reference instrument and dietary instrument of interest measure. | Aim of the approach |
---|---|---|---|---|
Simulation Extrapolation (SIMEX) [63] | Yes | An external sample with two concentration biomarkers and an internal sample with repeat measurements of the FFQ were used. | Assumes random within-person error for FFQ and that concentration markers are uncorrelated. | To assess the impact of measurement error in nutrient intake as assessed by a FFQ when concentration biomarkers are also available. |
Structural equation modelling [64] | Approach can be used with and without assuming a classical measurement error model. | Superior or gold standard reference instrument available with repeat measurements. | Varied the assumptions of the relationship of the reference instrument with the dietary measure. | To assess the different types of error (random or systematic, within or between individuals) that may occur in dietary intake measurements. In addition, to demonstrate that the inclusion of biomarker data can allow the estimation of the average magnitude of these errors even if random errors of repeat measures of the reference instrument are correlated. |
Moment Reconstruction (MR) [65] | No | Internal sample with gold standard reference instrument available. | Assumes that disease (D), true exposure (X), exposure based on dietary instrument of interest (Z) and biomarker (M) are multivariate normally distributed. | As a sensitivity analysis to show that other “substitution methods” have advantages over standard regression calibration when the measurement error is differential (i.e. error is related to disease outcome D). |
Imputation (IM) [65] | No | Internal sample with gold standard reference instrument available. | Assumes that disease (D), true exposure (X), exposure based on dietary instrument of interest (Z) and biomarker (M) are multivariate normally distributed. | As a sensitivity analysis to show that other “substitution methods” have advantages over standard regression calibration when the measurement error is differential (i.e. error is related to disease outcome D). |
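
To make the SIMEX row more concrete, the sketch below shows the generic simulation-extrapolation idea for a single error-prone exposure in a linear model. It is not the biomarker-based implementation used in [63]. It assumes classical error with a known error variance, adds extra simulated error at increasing multiples (zeta) of that variance, and extrapolates the resulting slopes back to zeta = -1 with a quadratic fit. All names, the grid of zeta values, and the simulated data are illustrative assumptions.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def simex_slope(w, y, error_var, zetas=(0.0, 0.5, 1.0, 1.5, 2.0), n_sim=50, seed=3):
    """Return (simex_estimate, naive_estimate) for the slope of y on w."""
    rng = np.random.default_rng(seed)
    mean_slopes = []
    for zeta in zetas:
        # Simulation step: inflate the error variance by a factor (1 + zeta).
        slopes = [
            ols_slope(w + rng.normal(0.0, np.sqrt(zeta * error_var), w.size), y)
            for _ in range(n_sim)
        ]
        mean_slopes.append(np.mean(slopes))
    # Extrapolation step: quadratic in zeta, evaluated at zeta = -1 (no error).
    coefs = np.polyfit(zetas, mean_slopes, deg=2)
    return np.polyval(coefs, -1.0), mean_slopes[0]

# Simulated data: true slope 1.0, error variance 0.25 (assumed known).
rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 20_000)
w = x + rng.normal(0.0, 0.5, 20_000)
y = 1.0 * x + rng.normal(0.0, 1.0, 20_000)

simex, naive = simex_slope(w, y, error_var=0.25)
print(f"naive = {naive:.3f}, SIMEX = {simex:.3f}")
```

The quadratic extrapolant is an approximation, so the SIMEX estimate typically removes most, but not all, of the attenuation. The choice of extrapolation function is itself an assumption worth varying in a sensitivity analysis.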
Impact of departures from classical measurement error model on statistical power
Discussion
General design of a calibration study
Calibration studies to assess and correct for measurement error
Calibration studies to assess the ‘relative validity’ of a new dietary instrument
Number of replicate measures and sample size of the calibration study under the assumption of random within-person measurement error
Generalizability of the calibration study information under the assumption of random within-person measurement error
The most robust approach to correct point and interval estimates for measurement error
A. When should measurement error bias/sensitivity analyses be conducted?
1. When the observed diet-disease associations were estimated using a crude instrument of dietary intake such as a FFQ.
2. They are essential when the study report aims to translate its findings into policy decision-making actions for a variety of stakeholders.
B. How does one select a method to conduct a measurement error bias/sensitivity analysis?
1. Aim to balance realistic modelling with the practicality of conducting the modelling (e.g. availability of software).
2. Report the measurement error bias/sensitivity analyses as transparently as possible, giving clear details of what was done and the assumptions made.
3. Make the statistical analysis code used to conduct these measurement error bias/sensitivity analyses available either as supplementary web material or by publishing it as an appendix to the main report.
C. How does one assign values to the parameters of the model?
1. Assign values based on the latest information from available data such as internal calibration sub-studies or external calibration studies with a similar design.
2. Choose a range of plausible values in order to assess the impact on the overall findings of a range of scenarios.
3. Evaluate the impact of departures from the assumptions of the classical measurement error model (such as correlated errors between the dietary instruments used or differential measurement error).
D. How does one present and interpret the measurement error bias/sensitivity analysis?
1. Present the results in the form of a table or figure where it is possible for the reader to see the complete set of analyses performed.
2. Quantify the direction of the bias in the overall study findings that results from departures from the classical measurement error model (e.g. are the observed diet-disease associations likely to be over-estimated or under-estimated?).
3. Describe the implications in light of the measurement error bias/sensitivity analysis (are the policy decisions changed or toned down in light of these findings?).
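Points C.2 and D.1 above (choosing a range of plausible parameter values and presenting the complete set of analyses) can be sketched as a simple table of corrected estimates. The example assumes a single exposure under the classical model, so that the corrected log hazard ratio is the observed one divided by an attenuation factor. The observed hazard ratio, its standard error, and the grid of attenuation factors are hypothetical.

```python
import math

def sensitivity_table(log_hr, se_log_hr, attenuation_factors):
    """Corrected hazard ratios and 95% CIs over a grid of attenuation factors."""
    rows = []
    for lam in attenuation_factors:
        corrected = log_hr / lam
        # Simplification: treats lam as known, ignoring its own uncertainty.
        corrected_se = se_log_hr / lam
        lower = math.exp(corrected - 1.96 * corrected_se)
        upper = math.exp(corrected + 1.96 * corrected_se)
        rows.append((lam, math.exp(corrected), lower, upper))
    return rows

# Hypothetical observed HR of 1.20 with standard error 0.05 on the log scale.
for lam, hr, lo, hi in sensitivity_table(math.log(1.20), 0.05, [1.0, 0.7, 0.5, 0.3]):
    print(f"lambda={lam:.1f}  HR={hr:.2f}  95% CI=({lo:.2f}, {hi:.2f})")
```

Reporting the whole grid, rather than a single corrected value, lets readers see how sensitive the conclusion is to the assumed degree of measurement error.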