Introduction

Familial hypercholesterolemia (FH) is an autosomal-dominant disease that leads to markedly elevated low-density lipoprotein cholesterol (LDLC) levels and increased risk for coronary artery disease (CAD) and myocardial infarction (MI). The prevalence of FH is estimated as high as one in 200–500,1 with even higher frequencies in populations with founder effects.2 FH is mainly caused by variants in genes coding for proteins affecting hepatic LDLC uptake including the LDL receptor (LDLR), in which most disease-causing variants are found, as well as apolipoprotein B-100 (APOB) and proprotein convertase subtilisin/kexin type 9 (PCSK9).1, 3, 4, 5, 6 More recently, STAP1 has been proposed as a fourth gene causing FH.7 Cumulatively, variants in these genes explain around 40% of FH cases.8

The phenotype may vary among variant carriers or may even be mimicked by clustering of common LDL-modifying variants, each affecting LDLC levels only to a small extent.9 The large number of patients with apparently monogenic FH but without currently known variants also suggests that other genes, which have not been identified so far, may cause FH.

Multiple studies document that intensive medical LDL-lowering at a young age prevents cardiovascular events.10, 11, 12 Therefore, it has been suggested that incidental detection of variants leading to FH should be communicated to the affected individual and the family.13 In fact, owing to the high frequency of FH, several guidelines recommend programs to systematically identify variants and to facilitate medical treatment at a young age.14, 15, 16 Despite the knowledge of causal genes and the obvious advantages of early therapy, only 1–15% of FH cases are diagnosed in most European countries.17 Notable exceptions are Norway (43%) and the Netherlands (71%), in which national screening programs have been initiated.1, 18, 19 There are several reasons why FH is vastly under-diagnosed. First, LDLC levels and other clinical presentations of FH are variable.20 Second, a small family size may obscure the inherited nature of FH.21, 22, 23 Third, with FH being only one of multiple genetic and exogenous conditions affecting CAD risk, it might be overlooked in the large number of CAD/MI patients.

It is assumed that FH explains about 20% of premature CAD cases with familial clustering.24, 25 However, no systematic analysis is available to quantitate FH variants by molecular-genetic screening in individuals with premature CAD. In this work, we evaluate the frequency of FH due to variants in the LDLR, APOB, PCSK9 and STAP1 genes in 255 unselected patients with premature MI (clinical manifestation before the age of 60) and a positive family history.

Materials and methods

Patients

The ascertainment strategy of MI families is described elsewhere.26, 27 In brief, index patients had suffered from MI before the age of 60 years. If at least one additional living sibling was affected with MI or severe CAD (defined by percutaneous coronary intervention or coronary artery bypass grafting) before the age of 70 years, the entire family (index patient, available parents and siblings) was contacted and invited to participate in the study. For the present study, we chose one affected individual each from 255 such families for whole-exome sequencing. Clinical characteristics are detailed in Table 1.

Table 1 Clinical characteristics of the MI patients analyzed in this work

All subjects analyzed in this study gave written informed consent. The local ethical committee (University of Regensburg, Germany) approved the study.

Exome sequencing

Exome sequencing was performed as 54-bp (base pair) paired end runs on a GenomeAnalyzer IIx system (Illumina, San Diego, CA, USA) after in-solution enrichment of exonic sequences (SureSelect Human All Exon 38 Mb kit, Agilent, Santa Clara, CA, USA), yielding on average 6.2 giga bases (Gb) of sequence per individual. Read alignment was performed with BWA (v. 0.5.8) using the default parameters. We used the human genome assembly hg19 (GRCh37) as reference. A small percentage of duplicate reads (4–5%) were removed. Single nucleotide variants and small insertions and deletions (indels) were detected using SAMtools (v 0.1.7). For the variant filter part of SAMtools, we used the default parameters with the exception of setting a maximum read depth to 9999. Furthermore, we required putative single nucleotide variants to fulfil the following criteria: (i) median base quality of the variant bases of at least 15; (ii) a minimum of 15% of reads showing the variant base; and (iii) the variant base is indicated by at least 5% of reads coming from different strands.
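The three additional criteria for putative single nucleotide variants can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this work: the `VariantEvidence` record and its field names are hypothetical, and criterion (iii) is interpreted here as requiring that at least 5% of the variant-supporting reads come from the less-represented strand, which is an assumption about the original wording.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class VariantEvidence:
    """Hypothetical per-site summary; field names are illustrative only."""
    variant_base_qualities: List[int]  # Phred qualities of bases showing the variant
    total_reads: int                   # all reads covering the site
    variant_reads_forward: int         # variant-supporting reads on the forward strand
    variant_reads_reverse: int         # variant-supporting reads on the reverse strand


def passes_snv_filter(ev: VariantEvidence,
                      min_median_qual: int = 15,
                      min_variant_frac: float = 0.15,
                      min_strand_frac: float = 0.05) -> bool:
    """Apply criteria (i)-(iii) from the text to one candidate variant."""
    n_variant = ev.variant_reads_forward + ev.variant_reads_reverse
    if n_variant == 0 or ev.total_reads == 0 or not ev.variant_base_qualities:
        return False
    # (i) median base quality of the variant bases of at least 15
    if median(ev.variant_base_qualities) < min_median_qual:
        return False
    # (ii) at least 15% of all reads show the variant base
    if n_variant / ev.total_reads < min_variant_frac:
        return False
    # (iii) assumed reading: the minor strand carries at least 5% of variant reads
    minor_strand = min(ev.variant_reads_forward, ev.variant_reads_reverse)
    return minor_strand / n_variant >= min_strand_frac


good = VariantEvidence([30, 28, 20, 32], total_reads=40,
                       variant_reads_forward=6, variant_reads_reverse=4)
one_strand = VariantEvidence([30, 28, 20, 32], total_reads=40,
                             variant_reads_forward=10, variant_reads_reverse=0)
print(passes_snv_filter(good), passes_snv_filter(one_strand))
```

The thresholds are exposed as keyword arguments so that the same function could be reused with different cut-offs.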

Variant validation

We used annovar28 to annotate the single nucleotide variants. Annotation was based on several databases provided by annovar such as UCSC known gene,29 dbSNP (http://www.ncbi.nlm.nih.gov/SNP/), Exome Sequencing Project (http://evs.gs.washington.edu/EVS/) and 1000 Genomes.30 In addition, we also annotated several functional prediction scores such as SIFT,31 CADD,32 PolyPhen233 and Mutation Taster.34 Variant validation was performed using PCR and Sanger sequencing. For co-segregation analysis, validated variants were screened in affected and unaffected family members.

Primers that were used for validation are listed in Supplementary Table 1. PCR was carried out in 20 μl volume containing 50 ng genomic DNA, 1 μl of each primer and either 8 μl of Mastermix (5PRIME, Hamburg, Germany) or 0.1 μl Taq-Polymerase and 4 μl Taq-buffer mix (Bioline Pharmaceutical AG, Baar, Switzerland). Samples were processed in a Sensoquest labcycler with a standard touchdown PCR program (annealing temperature from 59 to 65 °C).

The variants identified in this work were submitted to the publicly funded database LOVD (http://databases.lovd.nl/shared/variants/) with the LOVD individual IDs 00033731, 00033768, 00033771–00033807.

Results and Discussion

We studied 255 unselected MI/CAD patients from families with strong familial clustering of MI/CAD. The average age at disease manifestation was 42.5 years. The families of the index patients had an average size of 5.5 individuals with an average number of 2.3 affected family members.

Whole-exome sequencing yielded on average 6.2 Gb of sequence per individual. The average read depth was 78 with between 84.5 and 85.6% of the target regions covered at least 20 ×. In total, we identified 259 single nucleotide variants in the LDLR, APOB, PCSK9 and STAP1 genes.

We considered individuals with LDLC levels of >160 mg/dl after adjusting for statin intake for further analysis.35 To identify potential disease-causing variants, we filtered these variants based on three assumptions. First, we expect a strong functional impact and therefore, we removed synonymous and intronic variants outside splice-site regions. Second, we expect the variant to be rare in the general population, as FH affects 1 in 200 individuals at the most. Hence, we filtered based on a frequency of 1% in public databases. Third, we filtered variants in segmental duplications due to the high false-positive rate in these regions.
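The three filtering assumptions above can be expressed as a small predicate. The sketch below is illustrative only: the `AnnotatedVariant` record, its field names and the consequence labels are hypothetical stand-ins for annotation output, and treating a frequency of exactly 1% as "not rare" is an assumption about the cut-off.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedVariant:
    """Hypothetical annotation record; field names and labels are illustrative."""
    gene: str
    consequence: str                      # e.g. "missense", "synonymous", "intronic", "splice_site"
    max_population_freq: Optional[float]  # highest frequency in public databases, None if absent
    in_segmental_duplication: bool


# Consequence classes removed by the first assumption (strong functional impact expected).
EXCLUDED_CONSEQUENCES = {"synonymous", "intronic"}


def is_candidate(v: AnnotatedVariant, max_freq: float = 0.01) -> bool:
    """Return True if the variant survives all three filters described in the text."""
    # 1) drop synonymous variants and intronic variants outside splice-site regions
    if v.consequence in EXCLUDED_CONSEQUENCES:
        return False
    # 2) drop variants that are not rare (assumed cut-off: frequency below 1%)
    if v.max_population_freq is not None and v.max_population_freq >= max_freq:
        return False
    # 3) drop variants located in segmental duplications
    if v.in_segmental_duplication:
        return False
    return True


def filter_candidates(variants: List[AnnotatedVariant]) -> List[AnnotatedVariant]:
    return [v for v in variants if is_candidate(v)]


example = [
    AnnotatedVariant("LDLR", "missense", None, False),      # kept: novel and functional
    AnnotatedVariant("APOB", "synonymous", 0.001, False),   # removed by filter 1
    AnnotatedVariant("PCSK9", "missense", 0.05, False),     # removed by filter 2
    AnnotatedVariant("LDLR", "splice_site", 0.0002, True),  # removed by filter 3
]
kept = filter_candidates(example)
print([v.gene for v in kept])
```

The example records are invented for illustration and do not correspond to variants reported in this study.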

Additionally, a decision tree (Supplementary Figure 1) was developed to further filter the remaining 54 variants. This decision tree includes several considerations, such as the predicted deleterious/damaging effect of a variant on the protein function or whether the variant is known as potentially disease-causing in HGMD.36 Figure 1 shows the pedigrees of families where a potential disease-causing variant was identified.

Figure 1

Pedigrees of families where a potential disease-causing variant was identified. Squares are males and circles are females. Individuals affected with MI/CAD are shown as dark symbols. The first row of numbers below the symbols shows the individual IDs, the second row shows the genotype for the identified variant and the third row the corrected LDLC level. (LDLC levels marked with * are not corrected for statin treatment.) Detailed information is shown in Table 2. The potential disease-causing variants per family are: Family ID 4349: LDLR. c.131G>A (p.(W44*)); Family ID 7520: LDLR. c.1285G>A (p.(V429M)); Family ID 8450: LDLR. c.1775G>A (p.(G592E)); Family ID 9242: LDLR. c.1444G>A (p.(D482N)); Family ID 7500: PCSK9. c.610G>A (p.(D204N)); Family ID 8797: STAP1. c.139A>G (p.(T47A)); Family ID 6548: LDLR. c.2231G>A (p.(R744Q)); Family ID 7421: LDLR. c.811G>A (p.(V271I)); Family IDs 8400, 8615, 9192: APOB. c.10580G>A (p.(R3527Q)) /(APOB. c.7696G>A (p.(E2566K))); Family ID 6502: LDLR. c.1359-1G>A; Family ID 6565: LDLR. c.757C>T (p.(R253W)).

Variant spectrum in LDLR, APOB, PCSK9 and STAP1 genes

In total, we identified 13 rare variants with potentially functional effect in the LDLR, 9 variants in APOB, 8 variants in PCSK9 and 1 variant in STAP1. Twenty-three variants are missense variants, three are deletions, two are nonsense variants, two are splice-site variants and one is an insertion. Of these 31 variants, 24 were confirmed using Sanger sequencing (see Table 2). Fourteen have been previously reported to cause FH and are listed in HGMD (access date March 2014).

Table 2 Validated variants identified by exome sequencing of 255 MI-affected individuals

LDLR gene variants

Co-segregating new variant: c.811G>A (p.(V271I))

We identified only one variant in the LDLR gene that has not been previously reported (c.811G>A (p.(V271I))). Both CAD/MI-affected members of family 7421 available for the genetic study carry this variant. Both individuals show elevated LDLC levels of 286 and 261 mg/dl after adjustment for statin treatment. The variant is also found in a family member unaffected by CAD but with markedly elevated LDLC levels (220 mg/dl).

Variant c.811G>A (p.(V271I)) lies in the domain that interacts with APOB37 and is in close proximity with several disease-causing amino acid changes listed in HGMD, for instance p.N272T, p.C270S, p.C270R, p.C276R, p.G269D or p.C270Y.38, 39, 40, 41, 42, 43 On the basis of the position of the identified variant with respect to previously reported variants, it is very likely that c.811G>A (p.(V271I)) is responsible for the increase of LDLC in the variant carriers.

Co-segregating known variants: c.1285G>A (p.(V429M)), c.1444G>A (p.(D482N)), c.1775G>A (p.(G592E)), c.2231G>A (p.(R744Q)), c.757C>T (p.(R253W)), c.131G>A (p.(W44*)), c.798T>A (p.(D266E)) and c.828C>A (p.(C276*))

The missense variants c.1285G>A (p.(V429M)) (first found in the African population44), c.1444G>A (p.(D482N)), c.1775G>A (p.(G592E)), c.2231G>A (p.(R744Q)) and c.757C>T (p.(R253W)) and the nonsense variant c.131G>A (p.(W44*)) show co-segregation with elevated LDLC levels (mean 344 mg/dl after adjustment for statin treatment) and MI in the families under study. Hence, on the basis of our own results and evidence in previously reported studies,38, 44, 45, 46 these variants probably cause the disease in these patients/families.

The splice site variant c.1359-1G>A has also been reported to cause FH.47 In addition, we have previously also identified this variant as probably disease-causing in one of our extended MI/FH families.48 Co-segregation analysis showed that all five family members carry the variant. Four family members are affected with MI/CAD and all have elevated LDLC levels even after statin therapy (mean LDLC of 366 mg/dl). Hence, our data confirm previous findings.

The missense variant (c.798T>A (p.(D266E))) and the nonsense variant (c.828C>A (p.(C276*))) were also found in individuals with markedly elevated LDLC levels and MI. Both variants have been reported as disease-causing in HGMD.38, 49 Here, we identified the two variants only in the index patients of families 7080 and 8985 and not in the two other affected family members. Both index patients have markedly elevated LDLC levels of 310 and 204 mg/dl, respectively. Interestingly, the two non-carriers within these families also have elevated LDLC levels of 187 and 205 mg/dl.

Whereas c.828C>A (p.(C276*)) is not found in our internal non-CAD/FH patients, c.798T>A (p.(D266E)) is found twice. However, as FH is a common disease, we also expect to find a small number of FH cases in population-based controls.

Non-co-segregating known variant: rs45508991

One LDL receptor variant, c.2177C>T (p.(T726I), rs45508991), is found in 6 of the 255 patients. This variant is reported to cause FH and is labelled as likely disease-causing in HGMD, but the variant may be pathogenic only in combination with another variant in the LDLR gene.50, 51 In our data, we checked for co-segregation with LDLC and observed a positive co-segregation only in one family (family 7421) that also carries an additional co-segregating LDLR variant (c.811G>A (p.(V271I))). The other five families show poor or no evidence that the variant causes FH. In fact, in carriers of this variant, LDLC levels range from 147 to 268 mg/dl, and in non-carriers from 86 to 257 mg/dl.

Despite incomplete co-segregation with elevated LDLC levels, c.2177C>T (p.(T726I)) is found in only one MI/CAD-unaffected family member (Supplementary Figure 2). In our data set, we find no variant carriers who are free of both MI/CAD and FH. To further check whether c.2177C>T (p.(T726I)) is associated with an increased risk of CAD or FH, we compared the number of variant carriers in our sample with that in the general population. As c.2177C>T (p.(T726I)) is mainly found in individuals of European ancestry,50 we compared the frequency only in European samples. With a frequency of 0.007 in the 1000 Genomes (1kG) data set (European samples, phase 3 version 5) versus 0.023 (6/255) in our sample set, the variant at first appears to increase the risk of CAD/FH. However, we also found c.2177C>T (p.(T726I)) in 25 of 1462 (0.017) internal European non-CAD patients (P-value=0.477). Hence, as the variant is not significantly more common in affected than in unaffected individuals, we expect the accumulation of c.2177C>T (p.(T726I)) to reflect a founder effect rather than an increased risk of CAD/FH.
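The carrier-frequency comparison between patients and internal non-CAD individuals can be illustrated with a 2×2 contingency test. The text does not name the statistical test, so the sketch below assumes a Pearson chi-square test without continuity correction; the function name `chi_square_2x2` is illustrative. The carrier counts (6 of 255 and 25 of 1462) are taken from the text.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for the contingency table [[a, b], [c, d]]. Returns (statistic, p-value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper tail is P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the text: carriers among MI/CAD patients and among non-CAD individuals.
carriers_cases, n_cases = 6, 255
carriers_controls, n_controls = 25, 1462

chi2, p_value = chi_square_2x2(
    carriers_cases, n_cases - carriers_cases,
    carriers_controls, n_controls - carriers_controls,
)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under this assumed test the p-value comes out close to the reported 0.477; an exact test would give a somewhat different value, but the conclusion of no significant enrichment would be unchanged.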

Non-co-segregating known variant: c.313+2T>C

LDLR variant c.313+2T>C co-segregates neither with elevated LDLC levels nor with MI/CAD. This splice-site variant is only found in the index patient and not in the two relatives with elevated LDLC levels (mean LDLC 217 mg/dl), of whom one is also MI/CAD-affected. Variant c.313+2T>C is reported to cause FH52 and is labelled as disease-causing in HGMD. However, on the basis of the results of co-segregation analysis in family 4318, we do not expect the variant to be disease-causing, or at least not to be the primary cause of MI/CAD in this family.

Double variant: LDLR. c.131G>A (p.(W44*)) and PCSK9.c.137G>T (p.(R46L))

For all validated LDLR-variant carriers, we checked for a second variant in the PCSK9, APOB and STAP1 genes. We found one double heterozygous patient with a variant in the LDLR gene (c.131G>A (p.(W44*))) and in the PCSK9 gene (c.137G>T (p.(R46L))) (family 4349, patient 501). The PCSK9 variant, c.137G>T (p.(R46L)), is reported to be associated with a significant reduction of LDLC.53 Co-segregation analysis revealed that one of the three c.131G>A (p.(W44*)) carriers also carries c.137G>T (p.(R46L)) (patient 501), in addition to one family member without the LDLR variant (patient 504).

Reduced PCSK9 activity leads to increased density of LDLR54, 55 at the cell membrane. As we have heterozygous variant carriers expressing one healthy LDLR allele, we would not expect to see a strong effect, but expect to see reduced levels of LDLC for the double variant carriers. However, the LDLC level is markedly elevated for all three c.131G>A (p.(W44*)) variant carriers (mean LDLC 322 mg/dl). We do not identify a lower level of LDLC in the affected relative with the protective PCSK9 variant (LDLC level of 432 mg/dl). On the contrary, this patient has the highest LDLC level in the family. This could, however, imply that the protective effect of the PCSK9 variant is negligible compared with the LDLC level increase caused by the LDLR variant. The unaffected relative carrying only the reported protective variant has LDLC levels similar to those of the second unaffected relative lacking both variants. In summary, we do not see a reduction in LDLC levels in c.137G>T (p.(R46L)) carriers and would not expect the variant to markedly reduce LDLC levels.

APOB gene variants

Of the nine APOB variants, three were found in HGMD: c.7696G>A (p.(E2566K)) and c.5066G>A (p.(R1689H)) are reported to be associated with high triglyceride and c.10580G>A (p.(R3527Q)) with high LDLC levels. Of the nine variants, only one co-segregates with FH and MI/LDLC levels, and is previously reported. In the following, we will only discuss the three HGMD variants.

c.10580G>A (p.(R3527Q)) and c.7696G>A (p.(E2566K))

Interestingly, c.10580G>A (p.(R3527Q)) is always found in combination with the c.7696G>A (p.(E2566K)) variant. Vice versa, c.7696G>A (p.(E2566K)) is also found without the c.10580G>A (p.(R3527Q)) variant. As c.10580G>A (p.(R3527Q)) appears to be on the same allele as c.7696G>A (p.(E2566K)), c.10580G>A (p.(R3527Q)) seems to be the more recent variant.

The c.10580G>A (p.(R3527Q)) variant was found in three families (8615, 9192, 8400). Variant c.10580G>A (p.(R3527Q)) has been reported to cause defective binding of ApoB to the LDLR.56 All five c.10580G>A (p.(R3527Q)) carriers show elevated LDLC levels (mean LDLC 300 mg/dl). However, two siblings with elevated LDLC (mean 206 mg/dl) do not carry the variant. These non-carriers could, however, have another cause of disease. Hence, there is evidence that c.10580G>A (p.(R3527Q)) may cause elevated LDLC levels. Therefore, our results support previous findings.

Variant c.7696G>A (p.(E2566K)) was identified in seven families. This variant co-segregates neither with elevated LDLC levels nor with MI. In the seven families, we find three affected individuals (mean 214 mg/dl) without the variant as well as four unaffected variant carriers (mean 133 mg/dl).

Non-co-segregating known variant: c.5066G>A (p.(R1689H))

The c.5066G>A (p.(R1689H)) variant was found in three of the 255 exome-sequenced MI patients. Two non-carriers have LDLC levels above 190 mg/dl, while four variant carriers do not show elevated cholesterol levels (mean 156 mg/dl). Hence, we do not expect this variant to be the cause of disease in these families, nor to increase the risk of FH/MI in general.

PCSK9 gene variant

We identified two variants in the PCSK9 gene. Neither variant has been reported earlier. On the basis of co-segregation analysis with LDLC levels, the c.610G>A (p.(D204N)) variant might cause FH. The affected variant carrier has LDLC levels of 314 mg/dl and the MI-affected brother has low LDLC levels (104 mg/dl). Hence, the cause of MI may differ in this individual. The variant c.449_450del (p.(150_150del)) does not co-segregate with FH in the family.

STAP1 gene variant

Variant c.139A>G (p.(T47A)) is the only variant we found in the STAP1 gene and has not been reported previously. Both family members available for genetic studies are MI-affected and carry the amino acid substitution. In addition, both show elevated LDLC levels (mean 248 mg/dl). Hence, we might have identified the causal variant in this family but functional studies are necessary to further evaluate its functional implication.

Potential polygenic cause of FH

Of the analyzed CAD patients, 48% have LDLC levels above 190 mg/dl after correction for statin intake35 (LDLC levels were available for 212 CAD patients). We find rare potential disease-causing variants in 12.7% of these CAD patients with high LDLC levels. It has recently been shown that the clinical phenotype of FH can also be caused by the accumulation of common variants with small effects.9 Indeed, it has been reported that a score of six SNPs can discriminate FH patients from healthy controls.57 We calculated the score as described by Futema et al for the patients with high LDLC levels but without a rare disease-causing variant and compared the score with the controls of the German MI Family Study II58 (n=1298). Not all SNPs were covered by exome sequencing, so we calculated the score based on GeneChip Human Mapping 500 K Array Set (Affymetrix) data available for 234 of the 255 CAD patients. The mean score in controls (0.63) is significantly lower than the mean score (0.69) in patients with high LDLC levels (P-value=0.025). Our values are in the range of the scores reported by Futema et al (0.63 in the control cohort and 0.71 in the mutation-negative FH patients). Hence, the mutation-negative patients with elevated LDLC levels might have a polygenic cause of disease.

Conclusion

Here, we screened 255 patients with premature MI/CAD for variants in genes known to cause FH. If we only account for variants in the LDLR gene, 3.1% of the patients carry potential FH-causing variants. If we also account for variants in APOB, PCSK9 and STAP1, we have a cumulative frequency of FH-causing variants of 5.1%. Indeed, the frequency of potential disease-causing variants in our sample is probably underestimated. Large rearrangements are reported to account for 11% of the LDLR variants.59 Unfortunately, the coverage profile of our data was not uniform enough to allow a screening for such variants and hence we might have missed some of these.
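The two frequencies quoted above follow from simple counting. The sketch below reproduces the arithmetic; the per-gene family counts are not stated in this paragraph but are read off the families listed in the legend of Figure 1 (8 LDLR, 3 APOB, 1 PCSK9, 1 STAP1), which is an assumption about how the percentages were derived.

```python
# Families with a potential FH-causing variant, counted from the Figure 1 legend
# (assumed to be the basis of the reported percentages).
families_by_gene = {"LDLR": 8, "APOB": 3, "PCSK9": 1, "STAP1": 1}
n_patients = 255  # one index patient per family


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


ldlr_only = percent(families_by_gene["LDLR"], n_patients)
cumulative = percent(sum(families_by_gene.values()), n_patients)
print(f"LDLR only: {ldlr_only}%  all four genes: {cumulative}%")
```

With these counts the calculation gives 3.1% for LDLR alone and 5.1% for all four genes, in line with the figures reported in the text.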

In summary, the high frequency of potential FH-causing variants in these unselected MI/CAD patients supports the hypothesis that FH is overlooked in a substantial number of patients with MI/CAD and demonstrates that genetic screening of MI/CAD patients can improve diagnosis of FH.

Additionally, we also screened family members of the index patients for the identified variants. This revealed that 17 family members also carry the potential FH-causing variant. Hence, our findings underline the need for a systematic molecular-genetic screening to enable an early diagnosis of FH and to allow timely preventive treatment.

A further interesting finding is that the functional effect of several variants reported as disease-causing, for instance in HGMD, is questionable. We observed that five reported causal variants show no or only minor functional impact in our analyzed families. Consequently, given the far-reaching implications of the diagnosis of FH, each variant has to be carefully evaluated. In fact, a co-segregation analysis is advisable to determine whether a variant truly is disease-causing.

Taken together, our work demonstrates that exome sequencing can be used for FH-variant screening. In addition, the quality of exome sequencing has improved in recent years, allowing identification not only of small nucleotide variants but also of large rearrangements.60 Also, as sequencing costs have decreased dramatically, exome sequencing might become the method of choice for molecular genetic screening of, for instance, FH.