Background
An important factor in the prediction of disease risk is the family history of disease, as the presence of such a history indicates that the patient has some underlying susceptibility. If few risk factors for a disease are known, but a familial association is observed, assessing family history of disease is one of the simplest and most cost-effective tools for risk prediction. However, the concept of familial risk is not as simple as it appears. Indeed, relatives may have an increased risk of disease due to genetic, epigenetic, common environmental/behavioral factors, or a combination of these. Very often these underlying determinants are unknown. Despite improved clinical health registries and the rapid development of genetic research methodology, observed factors explain only a minor proportion of the variation in disease risk within a given population [1]. For example, having a BRCA mutation increases the risk of breast cancer dramatically, but can explain only a minor proportion of all breast cancers [2]. Similarly, an underlying, unobserved heterogeneity in risk is likely to be important for many diseases [1].
This failure to explain the causes of complex diseases is currently a matter of debate. For example, the impact of chance per se in cancer development has been heavily debated since Tomasetti and Vogelstein implied that a large proportion of cancers are simply due to ‘bad luck’ [3, 4]. When the causes of a disease are unknown, studying its familial aggregation may help us understand how the risk is distributed in the population due to both observed and unobserved factors. This is not only important from a scientific point of view; it can also reveal the consequences of having an affected relative. In practice, this information may be important for genetic counseling and for follow-up of individuals with a family history of disease [5].
Familial associations are often quantified in terms of familial relative risks (FRRs). Generally, the FRR denotes the risk of disease when a family member is affected compared to the risk level in the general population. Specific types of familial relationships, like first-degree relatives, parent-child, or siblings, might also be of interest. A familial association has been demonstrated in a wide range of diseases. In recent decades, FRRs for virtually all cancers have become readily available in the literature [6]. For breast, colon, and prostate cancer, the risk has been reported to double when a family member (first-degree relative) has the disease [6‐9], and the risk increases further if several family members are affected. Furthermore, several autoimmune and neurodegenerative diseases also have a substantial familial association [10, 11]. Therefore, understanding the information that is carried by an FRR is becoming increasingly important.
Fundamentally, we can assume two different points of view when studying familial associations. One view consists of focusing on observed familial associations in incidence, e.g., measured as the FRR, which can be calculated immediately from the data. The other focuses on how the risk varies between families, i.e., rather than considering summary measures like the FRR in isolation, we can investigate how the disease risk is distributed across families in the population. Indeed, it is logical that a familial association of disease risk implies a variation between individuals in a population. However, the connection between these two views is neither immediate nor intuitive. It is generally under-appreciated that a risk factor that is correlated within a family has to be very strong to produce even a moderate familial association in disease risk [12‐14]. In the present study, we will use different models to illustrate various possible, potentially surprising, relationships between these two views.
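The claim that a familially shared risk factor must be strong to produce a moderate FRR can be illustrated with a minimal sketch. It assumes a hypothetical two-group population in which all members of a family share the same risk group and are affected independently of each other given that group. Under these assumptions the FRR for one affected relative equals E[r^2]/E[r]^2, where r is the family's risk, so the baseline risk cancels. The function name and parameters are illustrative and not taken from the paper.

```python
def frr_two_group(p_high, rr_high):
    """FRR for one affected relative in a hypothetical two-group model.

    p_high  : fraction of families in the high-risk group (assumed parameter)
    rr_high : risk in the high-risk group relative to the low-risk group

    Assumes all relatives share the group and are conditionally
    independent given the group, so FRR = E[r^2] / E[r]^2.
    """
    if not 0.0 < p_high < 1.0:
        raise ValueError("p_high must lie strictly between 0 and 1")
    if rr_high <= 0.0:
        raise ValueError("rr_high must be positive")
    # Risks are expressed relative to the low-risk group (low risk = 1).
    mean_risk = (1.0 - p_high) + p_high * rr_high
    mean_sq_risk = (1.0 - p_high) + p_high * rr_high ** 2
    return mean_sq_risk / mean_risk ** 2


if __name__ == "__main__":
    # Tabulate the FRR for a few illustrative parameter combinations.
    for p_high in (0.01, 0.1, 0.3):
        for rr_high in (2.0, 6.0, 20.0):
            frr = frr_two_group(p_high, rr_high)
            print(f"p_high={p_high:<5} rr_high={rr_high:<5} FRR={frr:.2f}")
```

Under these assumptions, a two-fold risk difference affecting 10% of families gives an FRR of only about 1.07, whereas an FRR of 2 requires a six-fold difference in that same 10% of families.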
First, we will study a simple dichotomous risk model, by dividing the population into two distinct risk groups. All members of a family (e.g., a group of siblings) belong to the same risk group. This model provides simple, yet informative illustrations of the relationship between observed familial risk and actual differences in disease risk. If FRRs are available for both one and two affected family members, then the actual relative risk between the two risk strata, as well as the size of both of these strata, can be calculated. Next, we will study a slightly more detailed model, using a continuous distribution for the risk of developing disease. This facilitates the computation of Lorenz curves and Gini coefficients, which are well-established tools for measuring inequality in wealth in economics. Finally, we discuss the issues highlighted in the paper, and their implications.
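One way the two-strata calculation could be carried out is sketched below. This is not necessarily the derivation used in the paper. It assumes that relatives are affected independently given their shared risk group, so that the FRR for one affected relative is E[r^3]-free and equals E[r^2]/E[r]^2, while the FRR for two affected relatives equals E[r^3]/(E[r^2] E[r]). Scaling the risk to have mean 1, the two FRRs fix the second and third moments of a two-point distribution, whose support points are the roots of a quadratic. The function name is hypothetical.

```python
import math


def two_group_from_frrs(frr1, frr2):
    """Recover (p_high, rr_high) of a two-group model from two FRRs.

    frr1 : FRR given one affected relative
    frr2 : FRR given two affected relatives
    Assumes conditional independence within families given the group.
    """
    if frr1 <= 1.0 or frr2 <= frr1:
        raise ValueError("need 1 < frr1 < frr2 for a two-group solution")
    m2 = frr1          # E[X^2] when risk X is scaled so that E[X] = 1
    m3 = frr1 * frr2   # E[X^3] on the same scale
    # Support points a < b satisfy x^2 = c1 * x + c0,
    # where c1 = a + b and c0 = -a * b.
    c1 = (m3 - m2) / (m2 - 1.0)
    c0 = m2 - c1
    root = math.sqrt(c1 * c1 + 4.0 * c0)
    low = (c1 - root) / 2.0
    high = (c1 + root) / 2.0
    p_high = (1.0 - low) / (high - low)  # weight that keeps the mean at 1
    return p_high, high / low


if __name__ == "__main__":
    p_high, rr_high = two_group_from_frrs(2.0, 3.0)
    print(f"high-risk fraction: {p_high:.3f}, relative risk: {rr_high:.2f}")
```

A round-trip check is a useful sanity test: computing both FRRs from known group sizes and risks, and then passing them to the function, should return the original values.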
Discussion
Even when a familial disease association is modest, the variation in individual risk could be substantial. Using simple statistical models, we have shown relationships between the FRR and the IRR that are intuitively surprising. Even familial disease risks that are apparently modest, like the doubling of risk in relatives of patients with breast cancer or colon cancer, imply large variations in risk between families [31, 32]. It is important that clinicians and epidemiologists are able to recognize the consequences of these relationships. In particular, even if an FRR is modest, some families would have a markedly larger risk of developing the disease than others. In fact, the risk distribution may be more skewed than the income distributions of many countries. This points to a fundamental, and to a large extent unexplained, variation in risk. This variation arises from genetic predispositions, but also from environmental and various other sources, including more unspecific, random variation [1]. Importantly, this unexplained variation in risk may lead to considerable selection bias in observational studies [33, 34]. Estimates of FRRs may actually be used to adjust for this bias, e.g., in Cox proportional hazards models [35].
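The comparison with income inequality can be made concrete with a small sketch. It assumes, purely for illustration, that the risk shared within a family follows a gamma distribution and that relatives are affected independently given that risk (a rare-disease approximation, since a gamma variable is unbounded). Then FRR = E[Z^2]/E[Z]^2 = 1 + 1/shape, and the Gini coefficient of a gamma distribution has the closed form Γ(shape + 1/2) / (√π Γ(shape + 1)). The paper's own continuous model may differ. All function names are hypothetical.

```python
import math


def gamma_shape_from_frr(frr):
    """Shape of an assumed gamma risk distribution with FRR = 1 + 1/shape."""
    if frr <= 1.0:
        raise ValueError("frr must exceed 1")
    return 1.0 / (frr - 1.0)


def gini_gamma(shape):
    """Closed-form Gini coefficient of a gamma distribution."""
    log_ratio = math.lgamma(shape + 0.5) - math.lgamma(shape + 1.0)
    return math.exp(log_ratio) / math.sqrt(math.pi)


def gini_from_sample(values):
    """Empirical Gini coefficient (twice the area between the Lorenz
    curve and the diagonal) for a list of non-negative values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    for frr in (1.5, 2.0, 3.0):
        shape = gamma_shape_from_frr(frr)
        print(f"FRR={frr}: shape={shape:.2f}, Gini={gini_gamma(shape):.3f}")
```

Under these assumptions, an FRR of 2 corresponds to a shape of 1 (an exponential distribution) and a Gini coefficient of 0.5, and an FRR of 1.5 corresponds to a Gini coefficient of 0.375.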
Using FRRs for counseling is tempting when other tools for predicting individual risks, such as genetic tests or biomarkers, are lacking. It could also be tempting to use FRRs to aid in targeted screening [6, 36]. For example, many countries use family history of colorectal cancer in screening recommendations today [37]. However, the quantifications of familial associations are often crude, sometimes limited to one number for a given familial relationship. We emphasize that this FRR could be deceiving, even for relatively rare diseases. A correct interpretation of the FRR is therefore crucial. Individuals could suffer from unnecessary worry and testing if the FRR is misinterpreted. On the other hand, we could also fail to identify high-risk individuals by placing too much confidence in moderate FRRs, which are often population averages. In reality, FRRs vary continuously along a number of parameters. The number of affected family members is of obvious importance, but the number of healthy members may also provide important information. Furthermore, the aspect of time is essential. The ages at which family members acquire a disease (or remain disease free) are crucial to determining the level of risk. Taking these aspects into account is possible, e.g., by using methods from survival analysis that provide the opportunity to make risk predictions based on the family history of specific individuals [16, 20, 38]. These detailed FRRs could be more useful for individualized risk predictions, and could thus contribute to targeted screening as well as help focus preventive efforts.
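How affected relatives, healthy relatives, and time can enter a risk prediction together is sketched below with a shared gamma frailty model. This is one standard survival-analysis construction and is not necessarily the method of the cited references. It assumes a family-level risk multiplier Z with mean 1 and gamma distribution (shape equal to rate), and relatives whose hazards are Z times a common baseline. Given the family history, Z is again gamma distributed, with mean (shape + number affected) / (shape + summed cumulative baseline hazard). The function name and the input values are hypothetical.

```python
def expected_family_risk_multiplier(shape, relatives):
    """Posterior mean of a shared gamma frailty given a family history.

    shape     : shape (= rate) of the assumed gamma frailty, mean 1
    relatives : list of (cumulative_baseline_hazard, affected) tuples,
                where the cumulative hazard is evaluated at the age of
                onset or at the current age if still disease free
    """
    if shape <= 0.0:
        raise ValueError("shape must be positive")
    n_affected = sum(1 for _, affected in relatives if affected)
    exposure = sum(hazard for hazard, _ in relatives)
    return (shape + n_affected) / (shape + exposure)


if __name__ == "__main__":
    shape = 1.0  # illustrative value only
    one_affected = [(0.02, True)]
    with_healthy = [(0.02, True), (0.05, False), (0.05, False)]
    print(expected_family_risk_multiplier(shape, one_affected))
    print(expected_family_risk_multiplier(shape, with_healthy))
```

In this sketch, each affected relative raises the numerator by one, while every relative adds observed exposure to the denominator. An early-onset case (small cumulative hazard) therefore raises the predicted risk more than a late-onset case, and long disease-free follow-up in other relatives lowers it.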
Estimated FRRs from epidemiological studies are often taken at face value in other biomedical disciplines. For example, these estimates are frequently used as reference values in genome-wide association studies (GWAS) [39‐42], in which the proportion of the FRR that can be attributed to specific, or all known, risk alleles is frequently reported. However, methods for estimating FRRs may vary in their complexity. Thus, the proportion of the FRR explained in GWAS is dependent on the methods used in the epidemiological studies they base their reference on. A review of the genetics of type 1 diabetes stated that 34 susceptibility loci explained 60% of the variation in risk, based on an estimated FRR reference value of 15 [43]. The review was criticized for using this reference value as it was estimated in a period when risk was lower than it is presently, and it was suggested that 12 was a better estimate [44]. Applying this reference value changed the explained variance to 75%. The criticism was sound, but alternative methods for estimating the FRR might have produced different (or more detailed) estimates [16, 20, 38]. Rather than focusing on a single estimate of the FRR, investigating the sensitivity to different methods and measures of association seems like a good practice. Furthermore, the FRR encompasses much more than merely genetic inheritance. If all susceptibility alleles could be identified, they would not explain 100% of the familial risk. Also, a gene with a strong impact on disease susceptibility might only explain a minor fraction of the FRR, depending on the magnitude of the FRR and the population frequency of the allele [45]. Hence, it is not clear what information is provided by this measure (proportion of FRR explained), and it seems difficult to compare it across different diseases.
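The sensitivity of the "proportion of FRR explained" to the chosen reference value can be explored with a short sketch. Two conventions are implemented, a log-ratio scale and an excess-risk scale. Both are assumptions made here for illustration, and the text above does not state which convention the cited review used. The function name and the input values are hypothetical.

```python
import math


def proportion_explained(frr_known, frr_reference, scale="log"):
    """Share of a reference FRR attributed to known risk alleles.

    frr_known     : FRR implied by the known alleles alone
    frr_reference : total FRR estimated in epidemiological studies
    scale         : "log"    -> log(frr_known) / log(frr_reference)
                    "excess" -> (frr_known - 1) / (frr_reference - 1)
    """
    if frr_known < 1.0 or frr_reference <= 1.0:
        raise ValueError("need frr_known >= 1 and frr_reference > 1")
    if scale == "log":
        return math.log(frr_known) / math.log(frr_reference)
    if scale == "excess":
        return (frr_known - 1.0) / (frr_reference - 1.0)
    raise ValueError("scale must be 'log' or 'excess'")


if __name__ == "__main__":
    # Choose the allele-specific FRR so that 60% is explained at a
    # reference of 15, then recompute against a reference of 12.
    known_by_scale = {"log": 15.0 ** 0.6, "excess": 1.0 + 0.6 * 14.0}
    for scale, known in known_by_scale.items():
        at_15 = proportion_explained(known, 15.0, scale)
        at_12 = proportion_explained(known, 12.0, scale)
        print(f"{scale}: {at_15:.2f} at reference 15, {at_12:.2f} at 12")
```

With these inputs, lowering the reference value from 15 to 12 raises the explained share from 60% to roughly 65% on the log scale and roughly 76% on the excess-risk scale, so both the reference value and the scale materially affect the reported figure.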
Interpreting familial risks can be challenging, especially when relating them to heredity. Even if a mutation gives a high lifetime risk of, say, 50%, half of the high-risk individuals would not be expected to develop the disease. Furthermore, the mutation would only be passed on to half of the individuals in the next generation, on average. Consequently, many cancers diagnosed in genetically predisposed individuals would appear to be sporadic (i.e., occurring in individuals without a family history). Such an argument has been used to suggest that the majority of hereditary, early-onset breast cancers appear sporadically in the population [46]. In a recent study, Cremers et al. found that the same single-nucleotide polymorphisms (SNPs) were associated with an increased risk of both hereditary (defined as three cancers in first-degree relatives) and sporadic prostate cancer, and concluded that “hereditary prostate cancer most probably is merely an accumulation of sporadic prostate cancers” [47]. Although that might be the case, it is possible to draw the opposite conclusion: that sporadic cancers are, in fact, also inherited. This underscores the fact that relating the FRR to heredity is not straightforward. If assessing the degree of heritability is the aim, one should look to other types of studies, e.g., to twin studies [48, 49]. However, the large variation in risk implied by even a moderate familial risk is unlikely to be explained by environmental risk factors correlated in families [12, 13, 50].
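The penetrance argument in this paragraph can be turned into a small calculation. The sketch assumes that an affected carrier inherited the mutation from exactly one heterozygous parent, that each sibling inherits it with probability 0.5, that penetrance is the same for all carriers, and that relatives are affected independently given their genotypes. The baseline risk for non-carriers and the number of siblings are hypothetical inputs, and children are not counted.

```python
def prob_appears_sporadic(penetrance, baseline_risk, n_siblings):
    """Probability that an affected carrier has no affected parent or sibling.

    penetrance    : lifetime risk for mutation carriers
    baseline_risk : lifetime risk for non-carriers (assumed value)
    n_siblings    : number of siblings with complete follow-up
    """
    for value in (penetrance, baseline_risk):
        if not 0.0 <= value <= 1.0:
            raise ValueError("risks must lie between 0 and 1")
    if n_siblings < 0:
        raise ValueError("n_siblings must be non-negative")
    carrier_parent_unaffected = 1.0 - penetrance
    other_parent_unaffected = 1.0 - baseline_risk
    # Each sibling is a carrier with probability 0.5.
    sibling_risk = 0.5 * penetrance + 0.5 * baseline_risk
    siblings_unaffected = (1.0 - sibling_risk) ** n_siblings
    return (carrier_parent_unaffected
            * other_parent_unaffected
            * siblings_unaffected)


if __name__ == "__main__":
    for n_siblings in (0, 1, 2):
        p = prob_appears_sporadic(0.5, 0.0, n_siblings)
        print(f"{n_siblings} siblings: {p:.3f}")
```

With a penetrance of 50%, no baseline risk, and one sibling, 37.5% of affected carriers would have no affected first-degree relative under these assumptions, even though every one of these cases is hereditary.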