Background
Human hair traits show wide interindividual variability, which has been suggested to be largely determined by genetic factors [1, 2].
Systematic gene identification efforts for a growing number of quantitative and complex human traits have shown that the majority of associated genetic factors are located in non-coding genomic regions [3]. These variants most probably exert their functional effects through the tissue-specific modulation of the expression of trait-relevant genes. Expression quantitative trait loci (eQTL) analyses, which correlate sequence variation with gene expression data, have proven a valuable tool for delineating the tissue-specific architecture(s) of gene regulation and predicting the impact that trait-associated variants exert on it [4]. This approach therefore bridges the gap between a genetic association finding and the underlying biological mechanism, and may provide crucial insights into disease development.
Increased knowledge of the genetic factors that contribute to variability of gene regulation in the human scalp hair follicle (HF) will aid the interpretation of genetic findings for hair-related traits and hair loss disorders, such as androgenetic alopecia (AGA). Moreover, comparing the regulatory architecture of HFs from different scalp areas may help to explain the variable susceptibility of hair follicle subpopulations to hormonal hair loss. The aims of the present study were to: 1) systematically map eQTLs in the human hair follicle, and 2) evaluate the potential of these eQTLs for the functional annotation of genetic loci that contribute to the development of hair-related traits and common diseases.
Methods
Sample collection
About 50 HFs were plucked from the occipital scalp of 100 unrelated male volunteers (discovery sample), and from the frontal and occipital scalp of 25 unrelated male volunteers (replication sample, previously described in [5]). Peripheral venous blood samples were collected from all study participants. All volunteers were German residents of European descent, with a collective mean age of 27.9 years.
DNA was extracted from whole blood samples using the Chemagic Magnetic Separation Module I (Perkin Elmer Chemagen Technology Inc., Baesweiler, Germany). Total RNA was extracted from HFs using the RNeasy Micro Kit (Qiagen, Hilden, Germany), and the quality and quantity of the RNA were assessed using a BioAnalyzer 2100 (Agilent Technologies, Waldbronn, Germany), and a NanoDrop ND-1000 spectrophotometer (Peqlab Biotechnologie, Erlangen, Germany), respectively. Only total RNA samples with an RNA integrity number (RIN) ≥ 8 were further analyzed in the study.
Array hybridization
DNA extracts were hybridized onto the Human OmniExpress-12v1.0 bead array (Illumina, San Diego, CA, USA) (N = 100) or the Illumina PsychArray v1.0 (N = 25) for genome-wide genotyping, while RNA extracts from HFs were amplified and biotinylated using the TotalPrep™-96 RNA Amplification Kit (Illumina, San Diego, CA, USA) prior to whole transcriptome profiling, performed on the Illumina HT-12v4 bead array.
Preparation of genotype data
SNP array raw data was initially analyzed using the Genotyping module within the GenomeStudio software (Illumina). Genotype calls were exported for basic quality control in PLINK v1.9 ([6]; www.cog-genomics.org/plink/1.9/) to eliminate low-quality data (e.g. SNPs and individuals with a high degree of missing data, very rare SNPs) prior to genotype imputation. Imputation was performed on the Michigan Imputation Server [7] using the 1000 Genomes Project Phase 3 v5 reference panel and Eagle v2.3 phasing [8]. Post-imputation data processing was performed using VCFtools [9] and quality control was carried out in PLINK 1.9. Briefly, only biallelic single nucleotide variants with an imputation quality score (Rsq) > 0.7, a minor allele frequency (MAF) ≥ 1% and no strong deviation from Hardy-Weinberg equilibrium (HWE p > 1 × 10⁻⁸) were further analyzed. A principal component analysis (PCA) was performed with PLINK to identify potential outlier samples and to generate principal components (PCs) for use as covariates in the eQTL analysis. The two final autosomal genotype datasets consisted of 5,887,234 SNPs and 98 individuals for the discovery sample, and 1,044,566 SNPs and 24 individuals for the replication sample.
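The post-imputation filters described above can be illustrated with a minimal Python sketch. It is not the PLINK/VCFtools workflow used in the study: the `Variant` record and its field names are hypothetical, and only the three thresholds (Rsq > 0.7, MAF ≥ 1%, HWE p > 1 × 10⁻⁸) and the restriction to biallelic single nucleotide variants are taken from the text.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; everything else here is illustrative.
MIN_RSQ = 0.7      # imputation quality must be strictly greater than this
MIN_MAF = 0.01     # minor allele frequency must be at least this
MIN_HWE_P = 1e-8   # HWE test p-value must be strictly greater than this


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-variant summary record."""
    variant_id: str
    ref: str
    alts: tuple    # alternate alleles
    rsq: float     # imputation quality score
    maf: float     # minor allele frequency
    hwe_p: float   # Hardy-Weinberg equilibrium test p-value


def is_biallelic_snv(v):
    """True for exactly one alternate allele, with single-base REF and ALT."""
    return len(v.alts) == 1 and len(v.ref) == 1 and len(v.alts[0]) == 1


def passes_qc(v):
    """Apply the variant type, Rsq, MAF and HWE filters together."""
    return (
        is_biallelic_snv(v)
        and v.rsq > MIN_RSQ
        and v.maf >= MIN_MAF
        and v.hwe_p > MIN_HWE_P
    )


def filter_variants(variants):
    """Return the variants that pass all filters, preserving input order."""
    return [v for v in variants if passes_qc(v)]
```

Note that the comparison operators mirror the text: the Rsq and HWE cut-offs are strict, whereas a variant with a MAF of exactly 1% is retained.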
Preparation of gene expression data
Raw data from the expression microarrays was initially analyzed using the Gene Expression module within the GenomeStudio software to generate calls and detection p-values. The probe-level gene expression data was exported for pre-processing by background correction, quantile normalization, log2 transformation, probe quality filtering and identification of potential outlier samples by PCA using R. Probes were considered expressed when showing a detection p-value < 0.01 in at least 5% of the samples. Probe quality filtering retained only “good” and “perfect” quality probes mapping to a single gene with a valid identifier, according to annotations retrieved from the illuminaHumanv4.db package [10]. The three final gene expression datasets consisted of 13,217 probes and 98 individuals for the discovery sample, and of 13,091 probes (frontal scalp) and 12,814 probes (occipital scalp) and 24 individuals for the replication sample.
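The expression criterion (detection p-value < 0.01 in at least 5% of the samples) can be sketched as follows. This is an illustrative Python version only, since the study performed this step in R. The function names and the input layout (a mapping from probe ID to one detection p-value per sample) are assumptions.

```python
def is_expressed(detection_pvalues, p_cutoff=0.01, min_fraction=0.05):
    """True if the probe is detected (p < p_cutoff) in at least
    min_fraction of the samples."""
    n_samples = len(detection_pvalues)
    if n_samples == 0:
        return False
    n_detected = sum(1 for p in detection_pvalues if p < p_cutoff)
    return n_detected >= min_fraction * n_samples


def expressed_probes(detection_by_probe):
    """Keep the IDs of probes that satisfy the expression criterion.

    detection_by_probe: hypothetical dict of probe ID -> list of detection
    p-values, one per sample.
    """
    return [
        probe_id
        for probe_id, pvalues in detection_by_probe.items()
        if is_expressed(pvalues)
    ]
```

With 98 samples, 5% corresponds to 4.9 samples, so under this reading a probe needs to be detected in at least five individuals to be retained.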
eQTL mapping
Genome-wide associations in cis (1 Mb window) between the expression levels in scalp HFs and SNP genotypes were tested using QTLtools [11]. Three covariate files prepared for the analyses included the first 10 (discovery sample) or 5 (replication sample) PCs for the genotype datasets and the first 10 or 5 PCs for the phenotype datasets. Initially, the full spectrum of eQTLs for each of the three gene expression datasets was identified through nominal pass analysis. After exploration of the nominally significant results (p < 0.05), only those eQTLs with a false discovery rate (FDR) < 1 × 10⁻⁴ in the discovery sample were considered true eQTLs, while all eQTLs with p < 0.01 in the replication sample were retained for further analyses. To identify independent signals within our set of true eQTLs, a permutation pass analysis (1000 permutations), followed by a conditional pass analysis on grouped phenotypes (i.e. a gene-level output from the probe-level analysis), was applied.
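Two ingredients of this step, the cis window and the FDR threshold, can be sketched in Python. The text does not state how the FDR was computed or which reference position defines the window, so this sketch assumes a Benjamini-Hochberg adjustment and a window of ±1 Mb around a per-phenotype reference position (for example, a transcription start site). Both are labeled assumptions, and the function names are hypothetical rather than part of QTLtools.

```python
CIS_WINDOW_BP = 1_000_000   # 1 Mb window, as stated in the text
TRUE_EQTL_FDR = 1e-4        # FDR threshold for "true" eQTLs in the discovery sample


def in_cis_window(snp_chrom, snp_pos, gene_chrom, gene_pos, window=CIS_WINDOW_BP):
    """True if the SNP lies within `window` bp of the reference position.

    Assumption: the window is symmetric around one reference coordinate.
    """
    return snp_chrom == gene_chrom and abs(snp_pos - gene_pos) <= window


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as one common FDR procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def true_eqtl_indices(pvalues, fdr_cutoff=TRUE_EQTL_FDR):
    """Indices of tests whose adjusted p-value falls below the FDR cut-off."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < fdr_cutoff]
```

In practice the nominal p-values would come from the QTLtools nominal pass output; the sketch only makes explicit how a p-value list is turned into an FDR-filtered set.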
Variant annotation
We explored reported eQTL effects for our independent HF eQTLs using the Variant Annotation tool from SNiPA (Single Nucleotide Polymorphisms Annotator) ([12]; http://snipa.org), noting whether or not the HF eQTL has been previously reported to have cis-eQTL effects in at least one tissue, and whether or not eQTL effects have been observed on the same gene as in our study. Additionally, we searched for trait associations of our true HF eQTLs that have been reported in the GWAS Catalog ([13]; https://www.ebi.ac.uk/gwas/).
Replication of HF eQTLs
The true eQTLs identified from the nominal pass analysis in the discovery sample were considered replicated when the same eQTL SNP (eSNP) was associated with the same eGene, through either the same or a different probe, with p < 0.01 in the frontal and/or occipital scalp area of the replication sample.
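The replication rule can be expressed compactly in Python. In this sketch, eQTLs are keyed by (eSNP, eGene) pairs so that probe identity is ignored, as the text allows. The data layout (one dict per scalp area, mapping each pair to its smallest p-value across probes) and the function name are assumptions made for illustration.

```python
REPLICATION_P = 0.01  # replication threshold stated in the text


def replicated_pairs(discovery_pairs, frontal, occipital, p_cutoff=REPLICATION_P):
    """Return the discovery (eSNP, eGene) pairs that replicate.

    discovery_pairs: iterable of (snp_id, gene) tuples for the true eQTLs.
    frontal, occipital: hypothetical dicts mapping (snp_id, gene) to the
        smallest nominal p-value observed across probes of that gene.
    A pair replicates if it reaches p < p_cutoff in at least one scalp area.
    """
    def is_hit(table, pair):
        p = table.get(pair)
        return p is not None and p < p_cutoff

    return {
        pair
        for pair in discovery_pairs
        if is_hit(frontal, pair) or is_hit(occipital, pair)
    }
```

Collapsing probes to genes before the lookup is what allows a discovery eQTL to replicate "through either the same or a different probe".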
Differential eQTLs between frontal and occipital scalp
To investigate differential regulatory effects between frontal and occipital scalp areas, non-overlapping eQTLs with p < 5 × 10⁻⁵ were investigated in the replication sample. These included eSNPs that showed different effects between the two datasets (i.e. association with a different eGene or opposing direction of the effect for the same SNP-gene pair; different-effect eQTLs), and those eQTLs that were unique to the frontal or occipital dataset (i.e. no overlaps in eSNPs or eGenes; region-specific eQTLs).
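One possible reading of these two categories is sketched below in Python. The inputs are assumed to be already filtered at p < 5 × 10⁻⁵ and to map each (eSNP, eGene) pair to its effect slope. The labels "shared" and "unclassified" are not from the text; they are added here only so that every eQTL receives a label.

```python
def classify_eqtls(own, other):
    """Label each eQTL of one scalp area relative to the other area.

    own, other: hypothetical dicts mapping (snp_id, gene) -> effect slope,
        both pre-filtered at the p-value threshold.
    Returns a dict mapping (snp_id, gene) -> label.
    """
    other_snps = {snp for snp, _ in other}
    other_genes = {gene for _, gene in other}
    labels = {}
    for (snp, gene), slope in own.items():
        pair = (snp, gene)
        if pair in other:
            # Same SNP-gene pair in both areas: compare effect directions.
            same_direction = (slope > 0) == (other[pair] > 0)
            labels[pair] = "shared" if same_direction else "different-effect"
        elif snp in other_snps:
            # Same eSNP, but associated with a different eGene elsewhere.
            labels[pair] = "different-effect"
        elif gene not in other_genes:
            # Neither the eSNP nor the eGene appears in the other area.
            labels[pair] = "region-specific"
        else:
            # Same eGene via a different eSNP: not covered by either category.
            labels[pair] = "unclassified"
    return labels
```

Calling `classify_eqtls(frontal, occipital)` and then `classify_eqtls(occipital, frontal)` yields the frontal- and occipital-specific sets, respectively.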
Functional enrichment analysis
The lists of eGenes obtained from the discovery sample and the differential eQTL analysis were submitted for enrichment analysis using the GENE2FUNC function of the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA GWAS) platform ([14]; http://fuma.ctglab.nl/). Each analysis used the provided option to include all human genes as background, a significance cutoff of FDR < 0.05 and a minimum of three genes overlapping with each gene set. In addition, a comparative analysis of biological pathways for the differential eQTLs was performed using the lists of the identified region-specific eGenes and the Gene Enrichment Compare function within the FunRich (Functional Enrichment analysis tool) software [15]. From the resulting pathway analysis using the default FunRich database, only frontal- and occipital-specific terms with at least 2 genes from the dataset present in the pathway term, and p < 0.01 (from hypergeometric test) in one scalp area and p > 0.05 in the other, were retained for the purposes of the present study.
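The hypergeometric test and the retention rule for region-specific pathway terms can be sketched in Python as follows. This is a generic upper-tail hypergeometric probability, not a reproduction of FunRich internals. The background size and the assumption that the two-gene minimum applies to the scalp area in which the term is significant are illustrative choices.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement from a
    background of `population` genes, `successes` of which belong to the
    pathway term."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def region_specific_label(frontal, occipital, min_genes=2, p_sig=0.01, p_null=0.05):
    """Apply the retention rule to one pathway term.

    frontal, occipital: hypothetical (n_genes_in_term, p_value) tuples.
    Returns "frontal", "occipital" or None if the term is not retained.
    """
    f_genes, f_p = frontal
    o_genes, o_p = occipital
    if f_genes >= min_genes and f_p < p_sig and o_p > p_null:
        return "frontal"
    if o_genes >= min_genes and o_p < p_sig and f_p > p_null:
        return "occipital"
    return None
```

Terms with intermediate p-values in the second scalp area (between 0.01 and 0.05) are deliberately dropped by this rule, which keeps only clearly one-sided enrichments.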
Overlaps with reported genetic findings for hair phenotypes
To test the informativeness of our HF eQTLs for the interpretation of genetic findings for hair phenotypes, we used three sets of published genome-wide significant variants (p < 5 × 10⁻⁸) associated with (i) hair shape (discovery meta-analysis, supplementary table 2 from [1]), (ii) hair color (meta-analysis, supplementary table 2 from [2]), and (iii) AGA (reported lead SNPs from [16-26]). These three variant sets were subjected to analysis by the SNP2GENE function of FUMA GWAS to assign GWAS findings to genomic regions. The analysis for each phenotype was set to include SNPs from the 1000 Genomes Project (Phase 3, European population) [27] that are in linkage disequilibrium (LD) with the reported GWAS variants. LD blocks were set to include variants of minor allele frequency ≥ 0.01, r² ≥ 0.6 and a distance < 500 kb for merging into a locus. Afterwards, we searched for overlaps between our true HF eQTLs and the resulting loci for hair shape, hair color and AGA.
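The merging of LD blocks into loci and the subsequent overlap search can be sketched in Python. This is a simplified illustration and not FUMA's implementation: blocks are represented as hypothetical (chromosome, start, end) tuples, the LD computation itself (r² ≥ 0.6, MAF ≥ 0.01) is assumed to have been done upstream, and blocks on the same chromosome are merged when the gap between them is below 500 kb.

```python
MAX_MERGE_GAP_BP = 500_000  # merging distance stated in the text


def merge_blocks(blocks, max_gap=MAX_MERGE_GAP_BP):
    """Merge LD blocks on the same chromosome that lie closer than max_gap.

    blocks: iterable of (chrom, start, end) tuples with start <= end.
    Returns a sorted list of merged (chrom, start, end) loci.
    """
    loci = []
    for chrom, start, end in sorted(blocks):
        if loci and loci[-1][0] == chrom and start - loci[-1][2] < max_gap:
            prev_chrom, prev_start, prev_end = loci[-1]
            loci[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            loci.append((chrom, start, end))
    return loci


def overlapping_loci(chrom, pos, loci):
    """Return the loci whose interval contains the given SNP position."""
    return [
        locus
        for locus in loci
        if locus[0] == chrom and locus[1] <= pos <= locus[2]
    ]
```

An eSNP from the true HF eQTL set would then be checked with `overlapping_loci` against the merged loci of each phenotype.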
Discussion
The strongest eQTL associations in occipital HFs were observed for IPO8 (rs7326), ATP5MD (rs2271751), and C17orf97/LIAT1 (rs11150881). The findings for ATP5MD were confirmed in our small replication study. Although we were not able to replicate the “true” SNP-gene associations for IPO8 in our replication sample, a set of different SNPs was indeed associated with IPO8 expression in HFs from frontal and/or occipital scalp areas, some of which were also associated with IPO8 expression in the discovery sample at the nominal level (data not shown), thereby confirming the genetic regulation of IPO8 in HFs. While little is known about the function of C17orf97, IPO8 mediates the nuclear import of proteins and mature microRNAs [28]. ATP5MD is crucial for the maintenance of ATP synthase in mitochondria, and might actively participate in cellular energy metabolism, a process with well-known relevance to hair biology and hair growth [29, 30]. Over-expression of ATP5MD causes a number of mitochondrial abnormalities and an increase in anaerobic metabolism associated with the induction of an epithelial to mesenchymal-like transition, as well as delayed cell growth [31, 32]. The genetically controlled regulation of mitochondrial function in the human HF is further supported by our pathway analysis, which found an enrichment of eGenes for true eQTLs in pathways related to mitochondrial function along with other pathways, such as the regulation of responses to steroid hormones, Wnt/β-catenin and interferon (IFN) signaling, adipogenesis, immune responses and the metabolism of glucose and lipids, all of which have well-known roles in HF biology [33].
We also investigated whether there might be differences in the genetic control of gene expression between HF subpopulations from different scalp areas. Despite the small size of this (replication) sample, our results point to interesting avenues for future research. For instance, an important difference between frontal and occipital scalp appears to be the metabolism of amino acids, including arginine degradation. The HF is known to depend on arginine, as hair growth depends on the vasculature, and L-arginine not only participates in cell proliferation but is also a precursor of the vascular mediator nitric oxide. L-arginine deficiency has been shown to impair hair elongation, while its supplementation increases the number of HFs in anagen (growth) and decreases that of HFs in telogen (resting) phase [33, 34]. This supports the notion that regional increases in arginine degradation might result in enhanced vulnerability to hair loss in the frontal scalp area. A similar scenario is conceivable for histone deacetylases (HDACs). HDACs are important transcriptional repressors that act in multiprotein complexes and are involved in the control of cell cycle progression [35, 36]. While class I HDACs are ubiquitously expressed, class II HDACs show tissue specificity [36]. In our study, we found eQTLs for HDACs 2, 5 and 7 only in frontal scalp. Interestingly, these particular HDACs modulate HF development and homeostasis [37, 38], as well as angiogenesis and vascular integrity [39-41]. Moreover, androgen actions have been shown to regulate HDAC7 subcellular compartmentalization, and HDAC7 has been proposed to be a co-repressor of the androgen receptor [42].
We also found evidence for differential regulation of vitamin C (L-ascorbic acid) metabolism in frontal scalp. It has been shown that a derivative of L-ascorbic acid (L-ascorbic acid 2-phosphate) promotes HF growth that is mediated by the induced expression of insulin-like growth factor-1 (IGF-1) in dermal papilla cells [43]. This opens the possibility that decreased expression of genes involved in the metabolism of L-ascorbic acid in frontal scalp might render this region more susceptible to hair loss. Taken together, our pathway analysis results show that frontal-specific pathways present several factors with negative effects on hair growth (e.g. androgen and estrogen responses, BMP2/4 signaling, IFN-γ response, endocannabinoid signaling), while occipital-specific pathways are more consistent with factors exerting positive effects on hair growth (e.g. vascular endothelial growth factor signaling, transforming growth factor beta receptor), considering what Bernard has referred to as “the Yin Yang of the human hair follicle” [33]. Nevertheless, the implications of genetic regulation in frontal scalp for hair loss disorders should be investigated through the generation of a confident eQTL dataset with increased power in future studies.
With our study, we also show that tissue-specific eQTL data are a valuable resource to identify regulatory effects at disease- and trait-associated loci. In particular, our results suggest an important role of PADI3 in hair traits. Indeed, PADI3 is located in the inner root sheath and medulla in anagen HFs and has been reported to play roles in HF differentiation [44] and hair shaft formation [45]. Our results also implicate ANXA3 and TSPAN10 as novel candidate genes for hair shape and color, respectively. Although we found no overlap between the true HF eQTLs and AGA genetic risk loci, which might be expected given that the occipital scalp is not susceptible to balding, the overlapping eGene ATP2B4 provides a potential novel candidate gene for AGA, as the gene reported for the region so far is SOX13 [25, 26].
The most obvious limitation of our study is the sample size. However, we tried to overcome this limitation by applying high quality standards to the data and stringent selection criteria to our HF eQTL results. Another potential limitation of our study resides in the use of plucked HFs instead of intact HFs. However, a skin biopsy is required in order to obtain intact HFs, whereas hair plucking is a less invasive technique. Moreover, it has been demonstrated that plucked hairs retain most epithelial structures, maintain the integrity of the outer root sheath and also contain stem cells [46]. Finally, due to the small size of our samples, particularly that of the replication sample, we considered it advisable to exclude indels and sex chromosomes from the present study, and to limit ourselves to reporting, in very general terms, differential findings between HFs from frontal and occipital scalp areas.