Definition and previous estimators for PAF (binary exposures)
We first define PAF and possible estimators assuming a binary disease indicator, Y, and a binary risk factor (or, synonymously, a binary disease exposure), A. We also state some approximations that will be used in the suggested plots, leaving their justification to Additional file 1. While many authors have defined PAF using conditional probabilities for Y given A, attributable fractions are causal concepts and deserve a causal definition. With this in mind, we adopt a counterfactual notation [8], where the pair \( \left({Y}^{a=0},{Y}^{a=1}\right) \) denotes the potential (or counterfactual) binary disease outcomes for an individual under the two scenarios that they were exposed to the risk factor A (a = 1) and that they were not exposed to the risk factor A (a = 0). One interpretation of the pair \( \left({Y}^{a=0},{Y}^{a=1}\right) \) is that they are the disease outcomes that would be observed for that individual in two almost identical universes, which differ only according to whether that individual was exposed to the risk factor, and in the possible consequences of this exposure. In the situation that \( \left({Y}^{a=0},{Y}^{a=1}\right)=\left(0,1\right) \), the risk factor, A, is regarded as having a causal effect on disease for that individual. In reality, we observe either \( {Y}^{a=0} \) or \( {Y}^{a=1} \), but not both, as every individual (at least at a point in time) is either exposed or unexposed to A.
Given these preliminaries, the population attributable fraction can be defined [8] as:
$$ PAF=\frac{P\left(Y=1\right)-P\left({Y}^{a=0}=1\right)}{P\left(Y=1\right)}, $$
(E1)
where \( P\left({Y}^{a=0}=1\right) \) can be interpreted as the disease prevalence in a population where nobody was exposed, and P(Y = 1) is the disease prevalence in the current population. While in general PAF can be negative, here we assume that the risk factor has been coded so that \( P\left(Y=1\right)>P\left({Y}^{a=0}=1\right) \), which is usually implied if a = 0 indicates absence of the risk factor. As explained above, \( {Y}^{a=0} \) is only observed in the group of individuals who are unexposed to the risk factor, and as a result \( P\left({Y}^{a=0}=1\right) \) and, by extension, E1 are not directly estimable. To proceed, three technical assumptions, usually referred to as consistency, positivity and conditional exchangeability, are needed (see Table 1 and [8] for further discussion). In this manuscript, we also assume no multiplicative interactions involving the exposure, or more precisely that the relative risk within a stratum c of the confounders, RR = P(Y = 1| A = 1, C = c)/P(Y = 1| A = 0, C = c), does not depend on c [10]. Under these conditions one can rewrite E1 as follows:
$$ PAF=\frac{P\left(A=1|Y=1\right)\left( RR-1\right)}{RR}. $$
(E2)
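The equality of E1 and E2 can be checked numerically in a small worked example. The sketch below assumes a single binary confounder and purely hypothetical values for the confounder prevalence, exposure probabilities, baseline risks and relative risk (none of these numbers come from the paper). It computes PAF once from the counterfactual definition E1 and once from E2, using consistency, conditional exchangeability and a relative risk that is constant across strata.

```python
# Hypothetical population with one binary confounder C (all numbers are illustrative).
p_c = [0.6, 0.4]               # P(C = c) for c = 0, 1
p_a_given_c = [0.2, 0.6]       # P(A = 1 | C = c)
risk0_given_c = [0.01, 0.05]   # P(Y^{a=0} = 1 | C = c): risk if unexposed
rr = 2.0                       # relative risk, assumed not to depend on c

strata = list(zip(p_c, p_a_given_c, risk0_given_c))

# Counterfactual prevalence if nobody were exposed: P(Y^{a=0} = 1).
p_y0 = sum(pc * r0 for pc, _, r0 in strata)

# Observed prevalence P(Y = 1): within each stratum the unexposed have
# risk r0 and the exposed have risk rr * r0.
p_y = sum(pc * ((1 - pa) * r0 + pa * rr * r0) for pc, pa, r0 in strata)

# Proportion of cases that are exposed: P(A = 1 | Y = 1).
p_a_given_y1 = sum(pc * pa * rr * r0 for pc, pa, r0 in strata) / p_y

paf_e1 = (p_y - p_y0) / p_y             # definition E1
paf_e2 = p_a_given_y1 * (rr - 1) / rr   # re-expression E2

print(f"PAF via E1: {paf_e1:.6f}")
print(f"PAF via E2: {paf_e2:.6f}")
```

In this example the two values agree up to floating-point error, even though exposure and baseline risk both vary with the confounder.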
Table 1
Definitions, assumptions and approximations for PAF when the exposure is binary, multi-category and continuous
| Binary exposure | Multi-category exposure | Continuous exposure |
Counterfactual definition of PAF | \( \frac{P\left(Y=1\right)-P\left({Y}^{a=0}=1\right)}{P\left(Y=1\right)} \) | \( \frac{P\left(Y=1\right)-P\left({Y}^{a=0}=1\right)}{P\left(Y=1\right)} \) | \( \frac{P\left(Y=1\right)-P\left({Y}^{a={j}_0}=1\right)}{P\left(Y=1\right)} \) |
Assumptions: | 1. Standard causal inference assumptions • Conditional exchangeability (the counterfactual outcome \( {Y}^{a=j} \) and the assigned risk factor A are independent random variables within strata of the observed confounders c) • Consistency of counterfactuals: \( {Y}^{a=j}=Y \) when A = j, for all levels j of the risk factor A • Positivity: \( 0<P\left({Y}^{a=j}=1|C=c\right)<1 \) for all j and strata c 2. No interactions (\( P\left({Y}^{a=j}=1|C=c\right)/P\left({Y}^{a=k}=1|C=c\right) \) does not depend on c, for any possible values of exposure j and k) 3. Rare disease assumption (P(Y = 1) small) |
Re-expression of PAF (given assumptions 1. and 2.) | \( P\left(A=1|Y=1\right)\left( RR-1\right)/ RR \) | \( \sum \limits_{j=1}^KP\left(A=j|Y=1\right)\left(R{R}_j-1\right)/R{R}_j \)** | \( {\int}_{-\infty}^{\infty }f\left(j|1\right)\frac{RR(j)-1}{RR(j)} dj \)** |
\( {}^a \)Corresponding logistic model (given assumption 3.) | \( \mathrm{logit}\left(P\left(Y=1|A=j,C=c\right)\right)=\mu +{\beta}_j+\gamma (c) \) | \( \mathrm{logit}\left(P\left(Y=1|A=j,C=c\right)\right)=\mu +{\beta}_j+\gamma (c) \) | \( \mathrm{logit}\left(P\left(Y=1|A=j,C=c\right)\right)=\mu +\beta (j)+\gamma (c) \) |
Logistic approximation for PAF (given assumptions 1., 2. and 3.) | \( \frac{\hat{P}\left(A=1|Y=1\right)\left({e}^{\hat{\beta_1}}-1\right)}{e^{\hat{\beta_1}}} \) | \( \sum \limits_{j=1}^K\hat{P}\left(A=j|Y=1\right)\left({e}^{\hat{\beta_j}}-1\right)/{e}^{\hat{\beta_j}} \) | \( {\int}_{-\infty}^{\infty}\hat{f}\left(j|1\right)\left({e}^{\hat{\beta (j)}}-1\right)/{e}^{\hat{\beta (j)}} dj \)*** |
Graphical approximation | \( \hat{P}\left(A=1|Y=0\right)\times {\hat{\beta}}^{ave} \) | \( \hat{P}\left(A>0|Y=0\right)\times {\hat{\beta}}^{ave} \) | \( 1\times {\hat{\beta}}^{ave} \)**** |
“Average” estimated log-odds ratio: \( {\hat{\beta}}^{ave} \) | \( \hat{\beta_1} \) | \( \frac{\sum \limits_{j=1}^K\hat{P}\left(A=j|Y=0\right)\hat{\beta_j}}{1-\hat{P}\left(A=0|Y=0\right)} \) | \( {\int}_{-\infty}^{\infty}\hat{f}\left(j|0\right)\hat{\beta (j)} dj \) |
Note that under the same conditions other estimable expressions for E1 do exist (see (3)), but E2, an expression that was first derived in [9], has the added attraction of estimability in case-control studies. A short proof of the equality of E1 and E2 under these assumptions is provided for convenience in Additional file 1, but similar results have already been proven elsewhere [17, 18].
Under the additional assumption that the disease risk is small within each stratum c of the confounders, the conditional odds ratio, OR = Odds(Y = 1| A = 1, C = c)/Odds(Y = 1| A = 0, C = c), where Odds(Y = 1| A = a, C = c) = P(Y = 1| A = a, C = c)/(1 − P(Y = 1| A = a, C = c)), is a close approximation for RR. This implies that, under this ‘rare disease’ assumption, PAF can be estimated by substituting an estimated odds ratio, \( \hat{OR} \), that is adjusted for c, and the sample proportion of cases with A = 1, \( \hat{P}\left(A=1|Y=1\right) \), into E2. Typically, \( \hat{OR} \) is calculated by exponentiating the coefficient for the risk factor, \( \hat{\beta_1} \), in a logistic regression model (see Table 1) that regresses Y against A and C, leading to the estimator:
$$ \hat{PAF}=\frac{\hat{P}\left(A=1|Y=1\right)\ \left({e}^{\hat{\beta_1}}-1\right)}{e^{\hat{\beta_1}}}. $$
(E2b)
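The rare disease assumption behind this estimator can be illustrated with a short calculation. The sketch below uses hypothetical baseline risks and a hypothetical relative risk of 2 within a single stratum, and compares the implied odds ratio when the disease is rare and when it is common.

```python
def odds(p):
    """Odds corresponding to a probability p, with 0 <= p < 1."""
    return p / (1 - p)


def odds_ratio(risk_unexposed, rr):
    """Odds ratio implied by a baseline risk and a relative risk in one stratum."""
    risk_exposed = rr * risk_unexposed
    return odds(risk_exposed) / odds(risk_unexposed)


rr = 2.0                              # hypothetical relative risk
or_rare = odds_ratio(0.01, rr)        # baseline risk of 1%
or_common = odds_ratio(0.30, rr)      # baseline risk of 30%

print(f"RR = {rr}, OR with rare disease:   {or_rare:.3f}")
print(f"RR = {rr}, OR with common disease: {or_common:.3f}")
```

With a baseline risk of 1% the odds ratio stays close to the relative risk, whereas with a baseline risk of 30% it is considerably larger, so substituting the odds ratio for RR in E2 would then overstate PAF.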
This approach has formed the backbone of many previous attributable fraction estimators [11, 12]. In Additional file 1, we derive the following approximation for E2b:
$$ \hat{PAF}\sim \hat{P}\left(A=1|Y=0\right)\ \hat{\beta_1} $$
(E2c)
implying that the estimated PAF is approximately the estimated log-odds ratio between the risk factor and disease multiplied by the estimated prevalence of the risk factor in controls.
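A quick numerical comparison of E2b and E2c may help to show how close the approximation is for a moderate effect size. The sketch below is illustrative only: the exposure prevalence in controls and the log-odds ratio are hypothetical, and the exposure prevalence in cases is derived under the simplifying assumption (made here for illustration, not taken from the paper) that there are no confounders, so that the exposure odds in cases equal the odds ratio times the exposure odds in controls.

```python
import math


def paf_logistic(p_exposed_cases, beta):
    """Estimator E2b: P(A=1|Y=1) * (exp(beta) - 1) / exp(beta)."""
    return p_exposed_cases * (math.exp(beta) - 1) / math.exp(beta)


def paf_graphical(p_exposed_controls, beta):
    """Approximation E2c: P(A=1|Y=0) * beta."""
    return p_exposed_controls * beta


def exposure_in_cases(p_exposed_controls, beta):
    """Illustrative assumption (no confounders): exposure odds in cases
    equal exp(beta) times exposure odds in controls."""
    weight = p_exposed_controls * math.exp(beta)
    return weight / (weight + 1 - p_exposed_controls)


p_controls = 0.2   # hypothetical P(A=1|Y=0)
beta = 0.3         # hypothetical log-odds ratio

p_cases = exposure_in_cases(p_controls, beta)
paf_2b = paf_logistic(p_cases, beta)
paf_2c = paf_graphical(p_controls, beta)

print(f"E2b: {paf_2b:.4f}  E2c: {paf_2c:.4f}")
```

For these inputs the two values differ by less than one percentage point; the gap would be expected to widen as the log-odds ratio grows.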
Definition of PAF for multicategory and continuous exposures
These definitions and results extend easily to multicategory and continuous exposures. For instance, suppose that the exposure A can take K + 1 values, a ∈ {0, 1, …, K}, with a = 0 a reference level such that:
$$ P\left({Y}^{a=j}=1\right)\ge P\left({Y}^{a=0}=1\right) $$
(E3)
for all j = 1, …, K. In this case, the formula for PAF is still given by E1, which now has the interpretation of the proportion of disease cases removed in a hypothetical population where everyone had A = 0. In the case that A is continuous, we set \( {j}_0 \) to be a minimum risk level of the exposure variable, that is:
$$ P\left({Y}^{a=j}=1\right)\ge P\left({Y}^{a={j}_0}=1\right) $$
(E4)
for all possible exposure values j. Here, a suitable definition of PAF is the following:
$$ PAF=\frac{P\left(Y=1\right)-P\left({Y}^{a={j}_0}=1\right)}{P\left(Y=1\right)}, $$
(E5)
and has the interpretation of the proportion of disease cases removed in a hypothetical population where everyone had \( A={j}_0 \). Note that in order to estimate E5, \( {j}_0 \) needs to be a realizable value of the exposure variable, with sufficient data in its vicinity to estimate relative risks. For instance, \( {j}_0=0 \) would not be an acceptable value of systolic blood pressure, even if the relationship between blood pressure and disease risk was strictly increasing.
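To make the continuous case more concrete, the continuous-exposure expressions in Table 1 can be approximated by replacing the densities f(j|1) and f(j|0) with empirical distributions, so that the integrals become sample averages over cases and controls. The sketch below is only an illustration: the reference level of 110 mmHg, the log-odds curve (linear above the reference) and the exposure values are all hypothetical and are not taken from the paper.

```python
import math

J0 = 110.0     # hypothetical minimum-risk systolic blood pressure (mmHg)
SLOPE = 0.02   # hypothetical increase in log-odds per mmHg above J0


def beta(j):
    """Hypothetical log-odds ratio comparing exposure j with J0
    (zero at or below the reference level)."""
    return SLOPE * max(j - J0, 0.0)


def paf_logistic_continuous(case_exposures):
    """Sample-average version of the logistic approximation in Table 1:
    mean over cases of (exp(beta) - 1) / exp(beta) = 1 - exp(-beta)."""
    terms = [1 - math.exp(-beta(j)) for j in case_exposures]
    return sum(terms) / len(terms)


def beta_ave(control_exposures):
    """Sample-average version of the 'average' log-odds ratio in Table 1:
    mean over controls of beta(j)."""
    return sum(beta(j) for j in control_exposures) / len(control_exposures)


# Made-up exposure measurements, for illustration only.
cases = [120.0, 140.0, 160.0]
controls = [110.0, 120.0, 130.0]

paf_hat = paf_logistic_continuous(cases)
b_ave = beta_ave(controls)
print(f"Logistic approximation: {paf_hat:.4f}")
print(f"Average log-odds ratio: {b_ave:.4f}")
```

Because the reference level lies inside the range of the observed exposures, beta(j) is evaluated only where data exist, which is the practical reason a reference such as 0 mmHg would not be usable.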
For both multicategory and continuous exposures, the appropriate estimators for PAF, the underlying assumptions and the possible approximations are similar to those described in the binary case above; they are detailed in Table 1 and proven in Additional file 1. In particular, we still have a result with a similar flavour to E2c:
$$ \hat{PAF}\sim \hat{\mathrm{P}}{\hat{\beta}}^{ave}, $$
(E6)
with \( \hat{P} \) now the estimated probability of an individual having a non-reference level of the exposure in controls and \( {\hat{\beta}}^{ave} \) an average of the estimated log-odds ratios for the various exposure levels of A compared to the reference, weighted according to the distribution of exposure in controls. Provided \( {\hat{\beta}}^{ave} \) is not too large and the disease is rare, another interpretation of \( {\hat{\beta}}^{ave} \) is that it is approximately the average percentage elevation in risk when comparing the actual exposure levels observed in the population to the reference exposure. Note that E6 reduces to E2c in the case that the risk factor is binary.
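The multicategory versions of these quantities, as listed in Table 1, can be sketched in a few lines. The example below is a hedged illustration: the control exposure distribution and the log-odds ratios are hypothetical, and the case exposure distribution is derived under the simplifying assumption (made here for illustration only) that there are no confounders, so that P(A = j| Y = 1) is proportional to P(A = j| Y = 0) multiplied by the odds ratio for level j.

```python
import math


def beta_average(p_controls, betas):
    """Weighted average of log-odds ratios over non-reference levels.
    p_controls[j] = P(A=j|Y=0) for j = 0..K; betas[j] is the log-odds
    ratio of level j versus level 0 (so betas[0] is 0)."""
    non_reference = 1 - p_controls[0]
    weighted = sum(p * b for p, b in zip(p_controls[1:], betas[1:]))
    return weighted / non_reference


def paf_graphical(p_controls, betas):
    """Approximation E6: P(A>0|Y=0) times the average log-odds ratio."""
    return (1 - p_controls[0]) * beta_average(p_controls, betas)


def paf_logistic(p_cases, betas):
    """Logistic approximation from Table 1, summed over levels 1..K."""
    return sum(p * (math.exp(b) - 1) / math.exp(b)
               for p, b in zip(p_cases[1:], betas[1:]))


def exposure_in_cases(p_controls, betas):
    """Illustrative assumption (no confounders): the case distribution is
    proportional to the control distribution times exp(beta_j)."""
    weights = [p * math.exp(b) for p, b in zip(p_controls, betas)]
    total = sum(weights)
    return [w / total for w in weights]


p_controls = [0.5, 0.3, 0.2]   # hypothetical P(A=j|Y=0), j = 0, 1, 2
betas = [0.0, 0.2, 0.5]        # hypothetical log-odds ratios versus level 0

p_cases = exposure_in_cases(p_controls, betas)
b_ave = beta_average(p_controls, betas)
approx = paf_graphical(p_controls, betas)
logistic = paf_logistic(p_cases, betas)
binary = paf_graphical([0.8, 0.2], [0.0, 0.3])   # K = 1 recovers E2c

print(f"beta_ave: {b_ave:.3f}  E6: {approx:.4f}  logistic: {logistic:.4f}")
```

With a single non-reference level the same function returns the exposure prevalence in controls multiplied by the log-odds ratio, matching E2c.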