Background
Genome-wide association studies (GWAS) provide a powerful tool for identifying common genetic variation associated with complex traits. Single nucleotide polymorphisms (SNPs) are the most common form of variation in the human genome [1]. Most genotyping platforms used in GWAS assay approximately 1 million SNPs to capture this genomic diversity [2]. Because the SNPs sampled in GWAS account for less than 10% of all SNPs in the genome, the causative SNPs are unlikely to be genotyped themselves and are more likely to be in linkage disequilibrium (LD) with the GWAS risk SNPs [3]. As protein-coding regions make up only about 1% of the ∼3.3 billion nucleotides in the human genome [4], it is not surprising that most GWAS risk SNPs map to non-coding sequences [5–8]. Together with recent advances in post-GWAS interpretation methods, increasing evidence points to the enrichment of GWAS risk variants, and of the SNPs in LD with them, in non-coding regulatory elements such as epigenetic marks, transcription factor binding sites and DNase I hypersensitive sites, and to their functional effects on RNA splicing and gene expression [7, 9–12]. These findings highlight the importance of understanding the functional polymorphisms within large expanses of LD. This is often challenging, both because large genomic regions are difficult to work with in models of disease and because functional polymorphisms may have only subtle effects.
The microtubule-associated protein tau (MAPT) locus is among the most important gene loci in neurodegeneration and is implicated in the genetic risk for, or pathology of, a number of neurodegenerative disorders. There are two principal haplotypes at the locus, named H1 and H2. The H1 haplotype shows strong genetic association with a number of neurodegenerative diseases, including progressive supranuclear palsy (PSP) (odds ratio [OR] 5.5) [13], corticobasal degeneration (CBD) (OR 3.7) [14] and Parkinson’s disease (PD) (OR 1.3) [15, 16]. LD across the region is very high and extends over ~1.8 Mb because of the presence of a 900 kb chromosomal inversion on the H2 haplotype [17], which makes it particularly challenging to identify functionally important polymorphisms. Before MAPT was identified in genetic association studies as a risk locus for PD, PSP and CBD, tau was already of interest in a number of neurodegenerative disorders because abnormally phosphorylated tau is found in pathological aggregates in the form of neurofibrillary tangles. The multiple roles of tau in neurodegeneration make it all the more important to understand tau biology at both the genetic and the protein level.
Our laboratory is interested in the hypothesis that polymorphisms within the MAPT sequence have functional consequences for MAPT splicing and therefore for protein function [18–20]. Mutations at the exon 10 splice site in FTDP-17 patients demonstrate that differences in splicing alone are sufficient to cause disease [21, 22]. We have previously shown that the H1 haplotype expresses up to 40% more exon 10-containing transcripts than H2, in the absence of differences in overall transcript expression [18]. We have also shown that the H2 haplotype expresses 2-fold more transcripts containing the alternatively spliced exon 3 in cells, in post-mortem brain tissue [19] and in induced pluripotent stem cell-derived models of neurodegeneration [23]. Recently, 2N tau isoforms have been shown to interact with proteins important in neurodegenerative disease pathways (Parkinson’s, Alzheimer’s and Huntington’s disease) [24]. There is also evidence that 2N isoforms reduce tau aggregation [25]. Together, these findings may indicate that 2N tau proteins offer some protection from pathology.
Here, we present an approach to determine the functional effects of specific SNPs located within the large region of LD at MAPT, using the strong haplotype-specific expression of MAPT exon 3 as a readout of SNP function. Our laboratory uses high-capacity bacterial and P1-derived artificial chromosome vectors (BACs and PACs) to express whole genomic loci in culture, and applies homologous recombination technology to manipulate the large inserts with base-pair accuracy. We have previously used these vector systems to express the human MAPT locus in neuronal cell culture models, demonstrating that MAPT locus expression is under developmental and cell-type-specific regulation [26], and in transgenic mouse models, which express all six adult tau isoforms [27]. Here, we applied an analogous strategy to generate genomic DNA pMAPT-H1 and pMAPT-H2 expression vectors with identical upstream and downstream sequences, differing only at sites of haplotype variation. We generated haplotype-hybrid vectors by homologous recombination in E. coli to assay specifically the impact of polymorphisms on the splice phenotype observed at MAPT exon 3. To understand the mechanisms underlying haplotype-specific splicing regulation, we used biochemical techniques to study the impact of H1/H2 SNP sequences on RNA-protein interactions and to identify the trans-acting splicing regulators that bind them. We developed an allele-specific qPCR assay to measure the ratio of H1 to H2 MAPT transcripts in our RNA interference experiments, and identified hnRNP F and hnRNP Q as critical regulators of haplotype-specific inclusion of exon 3.
Methods
Generation of pMAPT hybrid vectors
The pMAPT vectors were engineered using homologous recombination technology for BAC modification from GeneBridges, with a selection/counter-selection method [28]. Briefly, PCR products were generated using primers that amplify the selection/counter-selection cassette (rpsL/chl, conferring streptomycin sensitivity and chloramphenicol resistance) with long homology arms flanking the pMAPT sequence to be modified. The PCR product containing the rpsL/chl cassette was then used to replace the target pMAPT sequence by homologous recombination, and bacterial colonies were selected based on streptomycin sensitivity and chloramphenicol resistance. The rpsL/chl cassette was then excised and replaced, again by homologous recombination, with PCR products containing the desired sequence. Colonies were selected for streptomycin resistance, and colony PCR was performed to screen for deletion of the rpsL/chl cassette and insertion of the engineered sequence. Colonies carrying the correctly modified sequence were identified by DNA sequencing. Primer sequences are given in Additional file 1: Table S1.
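The primer logic behind both recombination steps can be sketched in Python. This is a minimal illustration, not the GeneBridges protocol: the 50-nt arm length, the function names and the cassette priming sequences are hypothetical placeholders. Each primer carries a homology arm at its 5′ end, matching the sequence immediately flanking the interval to be replaced, and a cassette-priming sequence at its 3′ end; the downstream arm is reverse-complemented because the reverse primer anneals to the bottom strand.

```python
# Sketch: primers that add homology arms to a cassette so that the PCR
# product replaces template[start:end] by homologous recombination.
# Assumptions: 50-nt arms; cassette priming sequences are placeholders.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def recombineering_primers(
    template: str,
    start: int,
    end: int,
    cassette_fwd: str,
    cassette_rev: str,
    arm_len: int = 50,
) -> tuple[str, str]:
    """Forward and reverse primers for replacing template[start:end].

    cassette_fwd and cassette_rev are the 5'->3' sequences that prime on the
    top and bottom strands of the cassette, respectively.
    """
    if not (arm_len <= start <= end <= len(template) - arm_len):
        raise ValueError("target interval too close to the template ends")
    left_arm = template[start - arm_len:start]   # top strand, upstream of target
    right_arm = template[end:end + arm_len]      # top strand, downstream of target
    forward = left_arm + cassette_fwd            # 5' homology arm, 3' priming part
    reverse = revcomp(right_arm) + cassette_rev  # arm written on the bottom strand
    return forward, reverse
```

The same function applies to the second step: passing the desired replacement sequence's priming ends in place of the cassette ones yields primers whose product swaps the cassette for the engineered sequence.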
Transfection of SK-N-F1 Neuroblastoma cells
SK-N-F1 neuroblastoma cells were cultured in culture medium [Dulbecco’s modified Eagle’s medium (Sigma-Aldrich) with 10% fetal bovine serum, 4 mM L-glutamine, 50 U/mL penicillin, 50 μg/mL streptomycin and 1X minimum essential medium non-essential amino acids (Life Technologies)]. Cells were seeded at 7–10 × 10⁵ cells per well of a 6-well plate, and transfection was carried out when cells reached ~75% confluence. For each MAPT construct, 6.25 μg DNA was incubated for 10 min at room temperature with 6.25 μL PLUS™ reagent in Opti-MEM® reduced serum medium (Life Technologies). Lipofectamine® LTX reagent (15.6 μL; Life Technologies) was diluted in Opti-MEM® medium and then incubated with the DNA mixture for a further 30 min to form the transfection mix. Cells were washed with Opti-MEM® medium and incubated with the transfection mix for 4 h at 37 °C, 5% CO2. Cells were then washed with Opti-MEM® medium and incubated in culture medium for a further 48 h before RNA extraction.
RNA extraction and cDNA synthesis
SK-N-F1 cells were harvested in TRIzol reagent (Life Technologies). Organic and aqueous phases were separated by the addition of chloroform, and total RNA was precipitated by the addition of isopropanol. RNA was then purified using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol. RNA concentration was determined using a NanoDrop ND-1000 spectrophotometer. cDNA was generated from 1–5 μg RNA using either SuperScript® III Reverse Transcriptase (Life Technologies) or SuperScript® VILO Master Mix (Life Technologies) according to the manufacturer’s protocol.
qPCR
Quantitative PCR was performed using either SYBR® Green PCR Master Mix or Fast SYBR® Green PCR Master Mix on a StepOnePlus™ System (Life Technologies) according to the manufacturer’s protocol and cycling parameters. To measure transgenic total MAPT and exon 3 expression, a relative standard curve method was used to compare the amount of exon 3-containing transcripts with that of total MAPT; the resulting values are arbitrary units for internal comparison, not a percentage of total tau expression. To measure splice factor expression, the 2^(−ΔΔCt) method [29] was used, with gene expression normalised to the geometric mean of three endogenous control genes, GAPDH, HPRT1 and ACTB [30]. All qPCR reactions were performed in triplicate. Primer sequences (Eurofins Scientific and Integrated DNA Technologies) are listed in Additional file 1: Table S2.
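The two quantification schemes can be sketched as follows, assuming roughly 100% amplification efficiency for the ΔΔCt calculation. The function names and example Ct values are hypothetical. Because a relative quantity is 2^(−Ct), the geometric mean of the three reference-gene quantities equals 2 raised to minus the mean reference Ct, so averaging reference Cts implements the geometric-mean normalisation.

```python
from math import log10
from statistics import mean


def normalised_ct(target_cts, reference_cts):
    """Delta Ct of a target against several reference genes.

    target_cts: technical-replicate Cts for the gene of interest.
    reference_cts: one list of replicate Cts per reference gene
    (e.g. GAPDH, HPRT1, ACTB). Averaging the reference Cts equals taking the
    geometric mean of their 2**-Ct quantities (assumes ~100% efficiency).
    """
    ref_ct = mean(mean(reps) for reps in reference_cts)
    return mean(target_cts) - ref_ct


def fold_change(sample, control):
    """2**-ddCt for (target_cts, reference_cts) tuples: sample vs control."""
    ddct = normalised_ct(*sample) - normalised_ct(*control)
    return 2.0 ** (-ddct)


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [log10(q) for q in quantities]
    x_bar, y_bar = mean(xs), mean(cts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def quantity_from_ct(ct, slope, intercept):
    """Interpolate a relative quantity (arbitrary units) from a Ct value."""
    return 10 ** ((ct - intercept) / slope)
```

For exon 3 relative to total MAPT, each amplicon would be interpolated on its own standard curve and the two quantities divided, which gives arbitrary units for internal comparison rather than a percentage.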
Allele-specific qPCR assay
H1-specific (FAM-labelled) and H2-specific (VIC-labelled) TaqMan probes spanning rs17650901 in exon 1 were designed using Primer Express 3.0 (Applied Biosystems) and Primer3 [31, 32]. A standard curve was generated using genomic primers, in which the delta Ct (ΔCt) values between the FAM (H1) and VIC (H2) signals were plotted against the log₂ H1:H2 ratios obtained by mixing 50 pg of H1 and H2 pMAPT vectors in the ratios 8:1, 4:1, 2:1, 1:1, 1:2, 1:4 and 1:8. The regression equation obtained from the standard curve, log₂(H1/H2) = −1.194 × ΔCt, was used to calculate the ratios of H1 to H2 transcripts. Primers (Integrated DNA Technologies) and probes (Life Technologies) are listed in Additional file 1: Table S3.
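The conversion from ΔCt to an H1:H2 ratio follows directly from the regression equation. The sketch below assumes ΔCt is defined as Ct(FAM, H1) − Ct(VIC, H2), which is consistent with the negative slope: more H1 template lowers the FAM Ct and should therefore raise the ratio. The function names are illustrative, not part of the original analysis.

```python
# Slope of the H1:H2 standard curve reported in the text.
SLOPE = -1.194


def h1_h2_ratio(ct_fam: float, ct_vic: float, slope: float = SLOPE) -> float:
    """Convert FAM (H1) and VIC (H2) Ct values to an H1:H2 transcript ratio.

    Assumes delta Ct = Ct(FAM) - Ct(VIC); log2(H1/H2) = slope * delta Ct.
    """
    delta_ct = ct_fam - ct_vic
    return 2.0 ** (slope * delta_ct)


def ratio_shift(knockdown, control):
    """Fold change in the H1:H2 ratio after knockdown relative to control.

    knockdown, control: (ct_fam, ct_vic) pairs.
    """
    return h1_h2_ratio(*knockdown) / h1_h2_ratio(*control)
```

A `ratio_shift` above 1 corresponds to the kind of increase in the H1:H2 exon 3 transcript ratio that the knockdown experiments measure.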
RNA Electrophoretic mobility shift assay (EMSA)
SK-N-F1 nuclear lysate was extracted as previously described [33], with minor modifications. Briefly, SK-N-F1 cells were grown in 15 cm dishes for 48 h and harvested by gentle scraping. The cytoplasmic fraction was first extracted using cold lysis buffer (10 mM HEPES, 10 mM KCl, 0.1 mM EDTA, 0.1 mM EGTA, Halt protease and phosphatase inhibitor cocktail and 0.67% Igepal) (Ambion, Thermo Scientific and Sigma-Aldrich). The nuclear pellet was washed and lysed using cold nuclear lysis buffer (20 mM HEPES, 400 mM KCl, 1 mM EDTA, 1 mM EGTA and Halt protease and phosphatase inhibitor cocktail) (Ambion, Thermo Scientific and Sigma-Aldrich). RNA EMSA was carried out with 3′-biotinylated RNA oligonucleotides and SK-N-F1 nuclear lysates using the LightShift Chemiluminescent RNA EMSA Kit (Pierce) according to the manufacturer’s protocol. The rs17651213 H1 (5’TGA GGG AGC TTT GCG TGT TTA TCC TCC TGT3’) and H2 (5’TGA GGG AGC TTT GCA TGT TTA TCC TCC TGT3’) probes were biotinylated at the 3′ end via a 15-atom triethylene glycol linker (Integrated DNA Technologies). The rs1800547 H1 (5’CCA CAG GUG AGG GUA AGC CCC AGA GAC CCC3’) and H2 (5’CCA CAG GUG AGG GUG AGC CCC AGA GAC CCC3’) probes (Integrated DNA Technologies) were biotinylated at the 3′ end via a cytidine (bis)phosphate residue using the RNA 3′ End Biotinylation Kit (Pierce) according to the manufacturer’s protocol.
RNA pull-down
RNA pull-down of RNA-binding proteins was performed using the magnetic RNA-protein pull-down kit (Pierce) according to the manufacturer’s instructions. Briefly, streptavidin-coated magnetic beads were washed in wash buffer. RNA oligonucleotide probes (50 pmol) labelled with 3′ biotin-tetraethylene glycol and spanning either rs17651213, H1-G (5’TGA GGG AGC TTT GCG TGT TTA TCC TCC TGT3’) or H2-A (5’TGA GGG AGC TTT GCA TGT TTA TCC TCC TGT3’), or rs1800547, H1 (5’CCA CAG GUG AGG GUA AGC CCC AGA GAC CCC3’) or H2 (5’CCA CAG GUG AGG GUG AGC CCC AGA GAC CCC3’) (Integrated DNA Technologies), were bound to the streptavidin-coated magnetic beads and incubated with 40 μg SK-N-F1 nuclear-enriched lysate. Unbound proteins were washed off, and the magnetic beads together with the RNA bait and bound proteins were snap frozen and stored at −20 °C until further analysis.
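As a quick consistency check on the probe sequences listed above, the sketch below compares each H1/H2 pair base by base. The sequences are copied from the text; the helper name is illustrative. Run as written, it reports that each 30-nt pair differs at a single position, nucleotide 15 (G/A for rs17651213, A/G for rs1800547), so the SNP sits near the centre of every probe.

```python
def mismatches(probe_h1: str, probe_h2: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, H1 base, H2 base) for each differing base."""
    a = probe_h1.replace(" ", "")
    b = probe_h2.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("probes must be the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# H1 and H2 probe sequences (5'->3') as given in the Methods.
PROBES = {
    "rs17651213": ("TGA GGG AGC TTT GCG TGT TTA TCC TCC TGT",
                   "TGA GGG AGC TTT GCA TGT TTA TCC TCC TGT"),
    "rs1800547": ("CCA CAG GUG AGG GUA AGC CCC AGA GAC CCC",
                  "CCA CAG GUG AGG GUG AGC CCC AGA GAC CCC"),
}

for snp, (h1, h2) in PROBES.items():
    # prints: rs17651213 30 [(15, 'G', 'A')]  then  rs1800547 30 [(15, 'A', 'G')]
    print(snp, len(h1.replace(" ", "")), mismatches(h1, h2))
```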
Mass spectrometry
Tryptic digests of RNA-bound proteins were analysed on an Ultimate 3000 RSLCnano HPLC system (Dionex, Camberley, UK) run in direct injection mode and coupled to a QExactive Orbitrap mass spectrometer (Thermo Electron, Hemel Hempstead, UK). Samples were resolved on a 25 cm × 75 μm inner diameter PicoTip analytical column (New Objective, Woburn, MA, USA) packed in-house with ProntoSIL 120–3 C18 Ace-EPS phase, 1.9 μm beads (Bischoff Chromatography, Germany). The system was operated at a flow rate of 250 nL min⁻¹, and a 120 min or 60 min gradient was used to separate the peptides. The mass spectrometer was operated in “Top 20” data-dependent acquisition mode. Precursor scans were performed in the Orbitrap at a resolving power of 70,000, and the 20 most intense precursor ions were selected by the quadrupole and fragmented by HCD at a normalised collision energy of 30%. The quadrupole isolation window was set at 1.6 m/z. Ions with charge state +1 or undetermined charge state were excluded from fragmentation. Dynamic exclusion was enabled for 27 s. Data were converted from .RAW to .MGF format using ProteoWizard [34].
Mass spectrometry data analysis
Mass spectrometry .RAW data files for SNP rs17651213 were imported and processed with Progenesis QI for Proteomics software (Nonlinear Dynamics). The H1 probe sample data were selected as the alignment reference for all replicates in the alignment of ion density maps. Automatic alignment was performed first, followed by manual review of the alignments. Peptide ions with a charge between 2 and 4 were included. Peptide abundance normalisation was performed automatically by the software. A between-subject design was set for the analysed runs. Protein identification was performed by MS/MS ion search on the Computational Biology Research Group (Oxford University) Mascot Server (Matrix Science). Peptide ions were searched against the UPR_HomoSapiens database with carbamidomethylation of cysteine as a fixed modification and oxidation of methionine as a variable modification. Completed protein identifications were imported back into Progenesis QI and matched to their corresponding detected peptide ions. Proteins were quantified in Progenesis QI by relative quantitation using non-conflicting peptides.
Mass spectrometry .RAW data files for SNP rs1800547 were imported and processed with MaxQuant [35]. A minimum of two unique peptides was required for protein quantification, and protein abundances were normalised using the iBAQ algorithm. Peptide ions were searched against the UniProt Homo sapiens database with carbamidomethylation of cysteine as a fixed modification to obtain protein identifications.
To identify splicing factors interacting with the RNA probes, proteins were matched to a list of 71 experimentally validated splicing factors obtained from the SpliceAid-F database (http://srv00.recas.ba.infn.it/SpliceAidF/) [36]. Candidate splice factors were manually curated (Additional file 1: Table S4) by excluding factors whose H1/H2 protein abundance ratios were not consistently above or below 1 across the three experimental replicates. At least two replicates were used in the final results. Splice factors with abundance ratios between 0.8 and 1.2 were further excluded as candidates.
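The curation rules can be expressed as a simple filter. This is a sketch under stated assumptions: replicate ratios are summarised with a geometric mean (the text does not say how replicates were combined), and “at least two replicates” is read as requiring quantification in at least two of the three replicates. The function name and the example protein names and values used to exercise it are placeholders, not data from this study.

```python
from statistics import geometric_mean


def select_candidates(ratios, splice_factors, min_reps=2, low=0.8, high=1.2):
    """Filter H1/H2 abundance ratios down to candidate splice factors.

    ratios: protein -> per-replicate H1/H2 ratios (None if not quantified).
    splice_factors: reference set of validated splicing factors.
    Returns protein -> geometric-mean ratio for retained candidates.
    """
    candidates = {}
    for protein, reps in ratios.items():
        if protein not in splice_factors:
            continue  # not in the validated splicing-factor list
        values = [r for r in reps if r is not None]
        if len(values) < min_reps:
            continue  # assumed reading of "at least two replicates"
        if not (all(r > 1 for r in values) or all(r < 1 for r in values)):
            continue  # H1/H2 direction inconsistent across replicates
        summary = geometric_mean(values)
        if low <= summary <= high:
            continue  # difference too small to call
        candidates[protein] = summary
    return candidates
```

A geometric mean treats a 2-fold enrichment and a 2-fold depletion symmetrically, which is why it is used here for ratio data; an arithmetic mean would be a simpler alternative.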
Western blot
RNA pull-down fractions were denatured in Laemmli buffer (6X: 12% SDS, 30% β-mercaptoethanol, 60% glycerol, 0.012% bromophenol blue, 375 mM Tris pH 6.8) at 95 °C for 10 min. Following denaturation, proteins were separated by polyacrylamide gel electrophoresis at 200 V for ~45 min and transferred onto polyvinylidene fluoride membrane using the Trans-Blot Turbo Transfer System with Trans-Blot Turbo PVDF Transfer Packs (BioRad) according to the manufacturer’s protocol. Specific proteins were detected using anti-hnRNP F (IQ208, Immuquest) and anti-hnRNP Q (ab189405, Abcam) antibodies according to the manufacturer’s protocol.
Frontal cortex (BA 46) brain tissue was obtained from 5 H1/H2 PSP cases and 5 pathology-free controls from the brain banks of the Oxford Project to Investigate Memory and Ageing (OPTIMA) and the Thomas Willis Oxford Brain Collection. Brain tissue samples were collected with the full consent of the patient and with the approval of the local Ethics Committee (COREC approval number 1656). Expression analysis was approved by local Ethics Committee review (ref 06/Q1605/8). Total protein was extracted by initial homogenization in 10 mL cold RIPA buffer per gram of tissue (RIPA: 50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 1% (v/v) Triton X-100, 1% (w/v) sodium deoxycholate, 0.1% (w/v) SDS) with cOmplete Mini protease inhibitors (Roche). The tissue was homogenized and sonicated and then left on ice for 1 h, after which the soluble fraction was isolated by microcentrifugation (14,000 rpm, 10 min, 4 °C) and Western blotting was performed as described above. Primary antibodies used for human brain samples were hnRNP Q (ab184946), 1:10,000; hnRNP F (ab6095), 1:500; and HRP-conjugated β-actin (ab49900), 1:20,000.
GTEx data analysis
The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund (https://commonfund.nih.gov/GTEx) of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 08/17/2017: GTEx Analysis V6 (dbGaP Accession phs000424.v6.p1). Data analysis and visualisation were performed using R version 3.3.1 (https://www.r-project.org/). rs17650907 genotypes were used as a proxy for the H1 and H2 haplotypes: individuals with genotype AA were classed as H1 and those with genotype AG or GG as H2. GTEx brain regions were grouped into cerebral cortex, cerebellum, amygdala, hippocampus, midbrain and spinal cord.
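The genotype proxy and regional grouping can be illustrated in Python (the study itself used R). This sketch assumes each GTEx sample has already been assigned to one of the six grouped brain regions and uses the median as the summary statistic; both choices, and the function names, are illustrative rather than a description of the original scripts.

```python
from collections import defaultdict
from statistics import median

# Proxy rule from the text: rs17650907 AA -> H1; AG or GG -> H2.
_GENOTYPE_TO_GROUP = {"AA": "H1", "AG": "H2", "GA": "H2", "GG": "H2"}


def haplotype_group(genotype: str) -> str:
    """Map an rs17650907 genotype call to the H1/H2 proxy group."""
    try:
        return _GENOTYPE_TO_GROUP[genotype.upper()]
    except KeyError:
        raise ValueError(f"unexpected rs17650907 genotype: {genotype!r}") from None


def median_by_group(samples):
    """Median expression per (brain region, haplotype group).

    samples: iterable of (genotype, grouped_region, expression) tuples, where
    grouped_region is assumed to be pre-assigned to one of: cerebral cortex,
    cerebellum, amygdala, hippocampus, midbrain, spinal cord.
    """
    grouped = defaultdict(list)
    for genotype, region, value in samples:
        grouped[(region, haplotype_group(genotype))].append(value)
    return {key: median(values) for key, values in grouped.items()}
```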
Discussion
In this study, we have combined whole genomic locus expression vectors with RNA-protein identification and validation to identify functional variants that alter the expression of MAPT exon 3 through interaction with protein splice factors. These combined techniques have enabled us to assay the effect of risk-associated variants within a large region of LD. Using haplotype-hybrid MAPT genomic locus vectors in a cell culture model, we identified a functional variant, rs17651213, that imparts a two-fold increase in H2-MAPT exon 3 expression compared with H1, a haplotype-specific expression pattern that has previously been observed in both cell culture and post-mortem brain tissue [19, 48]. In addition, we identified rs1800547, which also alters the regulation of H1/H2 haplotype-specific alternative splicing of exon 3, although it does not confer the splice phenotype of its haplotype background. Furthermore, we used mass spectrometry to identify splice factors that bind differentially to these alleles, and confirmed that hnRNP Q and hnRNP F, two factors that showed differential binding to the rs17651213 alleles, alter the expression of MAPT exon 3 from the two haplotypes. Importantly, haplotype-specific exon 3 inclusion by the rs17651213 H1/H2 variants is highly dependent on whether the H1 or H2 variant of its upstream functional SNP rs1800547 is present, demonstrating complex interactions between the functional SNP and its surrounding haplotype sequence context.
Experimental evidence supports roles for numerous factors in splicing, including RNA-protein interactions, epigenetic regulation, co-transcriptional splicing, RNA secondary structure and RNA quality control systems (reviewed in [49]). Furthermore, recognition motifs are short and degenerate and can be recognized by multiple different proteins, which in turn form complexes that can alter the binding affinities and specificities of their partners [50]. The combination of these factors contributes to the complexity of splicing regulation. The data we present here identify only some of the cis-elements and trans-acting factors that contribute to the splice phenotype of MAPT exon 3. The context-specific role of cis-elements in splicing is demonstrated by the interaction between rs1800547 and rs17651213: each allele alone can enhance the haplotype-specific splice phenotype seen at exon 3, but when combined, each counteracts the enhancing or silencing activity of the other. Here we have limited our investigations to identifying polymorphisms and trans-acting factors that contribute to the allele-specific expression of exon 3. We focused predominantly on rs17651213 as an example of a cis-acting element that could convey the haplotype-specific expression profile of exon 3 when swapped between genetic backgrounds; however, there remains great scope for additional investigation.
Multiple GWAS and subsequent meta-analyses consistently report the H1 and H2 MAPT haplotypes to be over- and under-represented, respectively, in PD, PSP and CBD [13–15], demonstrating the genetic risk and protection conferred by the H1 and H2 polymorphisms. Dissecting the mechanistic effects of the H1/H2 polymorphisms that lead to splicing changes therefore requires methods that can encompass large genomic LD structures.
We present here a novel application of whole genomic locus vectors to study the functional effects of genetic variation on alternative splicing. Previously, mini-gene splicing constructs have been used to identify functional sequences and to study mutations affecting alternative splicing [51]. However, to understand the functional significance of SNPs, there is great benefit in using whole genomic locus vectors, in which the full complement of haplotype-specific polymorphisms in non-coding regions, including all introns and the upstream and downstream sequences of a gene, can be captured and manipulated. Our wild-type pMAPT-H1 and pMAPT-H2 genomic DNA vectors, carrying the 143 kb MAPT locus, recapitulate the expression of endogenous exon 3, which is expressed at a two-fold higher level from the H2 allele than from H1, and thus provide the correct physiological context in which genetic variants can be modified and studied. We achieved single-base-pair accuracy in manipulating the genomic DNA vectors, which allowed us to identify the precise haplotype-specific functions of rs1800547 and rs17651213 in exon 3 inclusion. Recent genome-wide analyses of genetic variation showed that SNPs are often associated with differences in gene expression and splicing [52, 53]. More importantly, SNPs in strong LD with lead risk SNPs identified in GWAS are often enriched in regulatory elements [7, 11], illustrating the importance of understanding the functions of non-coding sequence variation. Genomic DNA vectors that capture this sequence diversity therefore provide a novel and powerful tool to study differential regulation of gene expression and alternative splicing by SNPs, in both normal physiology and disease.
In silico analyses provide informative data suggesting potential mechanisms of differential exon 3 inclusion by the rs1800547 and rs17651213 SNP sequences, as the H1 and H2 alleles were predicted to bind different splice factors. We postulate that H1/H2 SNPs may regulate exon 3 inclusion by generating new splice factor binding sites and/or by altering the strength of existing binding sites, two mechanisms that are not mutually exclusive. Previous studies have shown poor correlation between sequence motif predictions and observed RNA- or DNA-protein interactions [9, 54]. In vitro validation of in silico RNA-protein interaction predictions is therefore important in interrogating the mechanisms of splicing regulation. Our RNA-EMSA and RNA-protein pull-down experiments showed that the variant sequences confer allele-specific RNA-protein interactions and differences in splice factor binding strength, further supporting the notion that the H1/H2 SNPs modulate haplotype-specific exon 3 splicing by altering RNA-protein interactions. DNA/RNA-affinity approaches provide an unbiased means of studying nucleic acid-protein interactions [55], while label-free quantification of peptides provides a flexible method for comparing protein abundance between samples [56]. Here, RNA-protein pull-down experiments revealed that the trans-acting splicing regulator hnRNP Q interacts with the rs17651213 sequence, an interaction not predicted by computational sequence analysis based on consensus motifs. DNA/RNA-affinity approaches thus provide an informative way to screen for and identify new RNA-protein interaction partners for further functional studies. hnRNP F was also identified as interacting with the rs17651213 sequence in our RNA-protein pull-down. Our data highlight the importance of complementing computational predictions with biological data to identify true RNA/DNA-protein interactions.
We developed an allele-specific expression assay that allowed us to study changes in the H1:H2 transcript ratio following knockdown of splicing factors. We found that silencing of both hnRNP Q and hnRNP F led to an increase in the H1:H2 ratio of MAPT exon 3 transcripts, indicating that under normal conditions they promote the exclusion and/or inclusion of exon 3 from the H1 and H2 alleles, respectively. Previous studies have demonstrated that the regulation of exon inclusion by splicing factors is highly context-specific. The same cis-regulatory element can have either enhancer or silencer effects depending on the surrounding sequence. For example, deletion experiments on hnRNP F binding motifs (G-run elements) in a pseudoexon of the fibrinogen gamma-chain gene showed that deleting a silencer G-run element could have an enhancing effect on the pseudoexon if neighbouring G-run elements were not present [54]. Likewise, the same splice factor can promote both exon inclusion and skipping, depending on sequence context. For example, recent genome-wide analyses of alternative splicing events showed that depletion of hnRNP F led to both activation and repression of alternative exons, strongly indicating that hnRNP F normally both enhances and silences alternative exons [57, 58]. The interaction between rs1800547 and rs17651213, and their individual effects on exon 3 inclusion, are likely to be complex and highly dependent on the surrounding sequence, since exon 3 has an intrinsically suboptimal branch point at its 3′ splice site [59]. Nevertheless, data from our pMAPT haplotype-hybrid vector study highlight the haplotype-specific differences between H1 and H2 SNP alleles and their combinatorial effects on the regulation of exon 3 inclusion.
The strong association of rs1800547 and rs17651213 with the PD GWAS risk SNP rs17649553 (Additional file 1: Figure S1), together with the functional effects of these two haplotype-specific SNPs on exon 3 inclusion, suggests that they could contribute to the risk and protection conferred by the H1 and H2 haplotypes, respectively. Exon 3 encodes part of the N-terminal acidic projection domain, which mediates the interaction of tau with various cellular components and processes, such as the plasma membrane, dynactin, the actin cytoskeleton, phospholipase C-γ and tyrosine kinase Fyn signalling pathways, and axonal transport [60–69], many of which are implicated in PD pathogenesis [70–72]. The 2N tau isoform interacts preferentially with proteins that map to neurodegenerative disease pathways such as Alzheimer’s disease, PD and Huntington’s disease [24]. In some neuropathology studies, 2N tau is not detected in tau inclusions in CBD post-mortem brain tissue [73], and blots of sarkosyl-insoluble tau from both PSP and CBD lack 2N tau isoforms [74], although studies using different conditions have also detected 2N tau in CBD and PSP. There is evidence that 2N isoforms reduce tau aggregation [25], which may indicate a route by which 2N tau offers some protection from disease. In this study, we have investigated the genetic mechanisms that regulate haplotype-specific inclusion of exon 3. Understanding how different levels of exon 3-containing tau isoforms mediate processes implicated in neurodegeneration will provide further insight into the mechanisms by which H1/H2 polymorphisms confer risk or protection in neurodegeneration.