Background
Lacunar stroke has been recognized as a stroke subtype for over 50 years, although its etiology and whether it differs from cortical ischemic stroke are still debated [1]. Approximately 30% of patients with lacunar stroke are left dependent, and up to 25% are predicted to have another stroke within 5 years [2]. Over the last decade, the growth of large-scale genome-wide association studies (GWAS) has greatly aided the discovery of genetic variants linked to lacunar stroke [3]. However, deciphering the biological processes underlying the great majority of these genetic effects remains difficult, which has hampered the translation of these genetic results into novel drugs targeting candidate genes for lacunar stroke [4].
Proteins are the most efficient biomarkers and therapeutic targets [5, 6] as they represent the major functional components of cellular and biological processes and the end products of gene expression [7]. It is therefore critical to investigate risk proteins in brain disorders [8, 9]. Previous research on lacunar stroke examined genetic, epigenetic, and transcriptomic variables [10, 11], but few studies have explored brain proteins directly [12]. For example, previous studies identified an association between loci on chromosome 16q24.2 and small vessel stroke in 4203 cases and 50,728 controls [13]. In addition, a transcriptome-wide association study (TWAS) identified associations between the expression of six genes (SLC25A44, ULK4, CARF, FAM117B, ICA1L, and NBEAL1) and lacunar stroke [14]. The recent breakthrough in high-throughput proteome sequencing of complex tissues [15, 16] represents a significant step forward in the large-scale quantification of the human brain proteome. Wingo et al. developed a novel framework called the proteome-wide association study (PWAS), which combines gene and protein expression data with GWAS results, and applied it to depression pathogenesis [12]. Ou and colleagues also revealed that particular genetic variants affect disorders by altering the abundance of brain proteins, and uncovered potentially pathogenic brain proteins in Alzheimer’s disease [17]. Thus, the causal inference of this integrated analytical approach has been empirically verified and shown to be reliable [17, 18].
Accordingly, we sought to discover novel drug targets for lacunar stroke by combining high-throughput brain proteomics with genetic data to determine protein levels associated with genomic architecture. To identify potential protein biomarkers, we systematically linked brain proteins to lacunar stroke using a four-step approach. First, we used two protein quantitative trait locus (pQTL) datasets obtained from brain tissue and findings from a lacunar stroke GWAS to perform a PWAS analysis. Second, we used independent Mendelian randomization (MR) analysis to verify PWAS-significant genes. Third, we used COLOC, a Bayesian colocalization method, to integrate GWAS data and brain pQTLs and to explore whether the two association signals are consistent with shared causal variant(s). Fourth, we explored the significant genes driving GWAS signals at the transcriptional level by leveraging gene expression data.
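The MR verification step can be illustrated with a minimal sketch of a single-instrument Wald ratio, one common MR estimator. This is an assumption for illustration only: the text above does not state which MR estimator was used, and the function name and the numbers in the usage note are hypothetical. The sketch divides the variant's effect on the outcome (GWAS beta) by its effect on the exposure (pQTL beta) and approximates the standard error with the delta method.

```python
import math


def wald_ratio(beta_outcome, se_outcome, beta_exposure, se_exposure):
    """Single-instrument Wald ratio (illustrative; not the study's pipeline).

    beta_outcome / se_outcome:   variant effect on disease risk (GWAS).
    beta_exposure / se_exposure: variant effect on protein level (pQTL).
    Returns (estimate, standard error, two-sided p-value).
    """
    if beta_exposure == 0:
        raise ValueError("the instrument must have a non-zero effect on the exposure")
    estimate = beta_outcome / beta_exposure
    # Second-order delta-method variance of a ratio of two independent estimates.
    variance = (se_outcome ** 2) / beta_exposure ** 2 \
        + (beta_outcome ** 2) * (se_exposure ** 2) / beta_exposure ** 4
    se = math.sqrt(variance)
    z = estimate / se
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value
```

With hypothetical inputs such as `wald_ratio(0.10, 0.02, 0.50, 0.05)`, the estimate is the per-unit effect of protein abundance on the log-odds of disease. A real analysis would also need to check instrument strength and pleiotropy, which this sketch does not cover.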
Discussion
In the present study, we employed a pipeline of analytical techniques to investigate the functional associations between protein biomarkers in the brain and lacunar stroke risk. We identified 7 potential risk genes for lacunar stroke (ICA1L, CAND2, ALDH2, MADD, MRVI1, CSPG4, and PTPN11) with altered protein abundances in the brain. Four of these 7 genes (ICA1L, CAND2, ALDH2, and MADD) were replicated in the independent PWAS and MR validation analyses of lacunar stroke, providing a higher confidence level. Furthermore, ICA1L, CAND2, and ALDH2 were supported by comprehensive analyses, including non-lacunar stroke brain PWAS and colocalization, and ICA1L was also supported at the brain transcriptional level. These genes may serve as promising targets for further mechanistic and therapeutic studies.
Identifying therapeutic targets for diseases is a crucial goal of human genetics research and is particularly vital for neurovascular diseases, including lacunar stroke. Our analysis implicated genes previously investigated in lacunar stroke, such as ICA1L and MADD, as well as new candidates, including CAND2, ALDH2, MRVI1, CSPG4, and PTPN11. Two genes (ICA1L and MADD) reported in lacunar stroke play roles at the synapse. ICA1L encodes a protein triggered by type IV collagen and plays a crucial role in myelination [39]. According to our lacunar stroke PWAS data, ICA1L has a lower abundance in the brains of lacunar stroke patients. Furthermore, we discovered that ICA1L was enriched in cortical glutamatergic neurons. Glutamatergic neurons are crucial components in neural development and neuropathology through their role in cell proliferation, differentiation, survival, and neural network formation. Our findings imply that decreased ICA1L may impair excitatory synaptic signaling and contribute to the pathogenesis of lacunar stroke. ICA1L has also been linked to the etiology of lacunar stroke in previous transcriptome investigations [14]. Our findings show that MADD is more abundant in glutamatergic neurons. We speculate that MADD is primarily involved in the transmission of apoptotic signals in neuronal signaling pathways [40], consistent with previous research suggesting that ischemia causes excitatory glutamate toxicity [41-44].
Other notable molecular roles for the 5 novel genes in lacunar stroke include cerebral cavernous malformations, vascular inflammation, platelet adhesion, and cell apoptosis. CAND2, which encodes cullin-associated and neddylation-dissociated 2, plays a role in cerebral cavernous malformations [45]. Cavernous malformation is a key inducing factor in lacunar stroke and cerebral microbleeds [46]. According to our findings, CAND2 is decreased, predominantly in GABAergic neurons, in the brains of lacunar stroke patients, indicating its role in the etiology of lacunar stroke. Both ALDH2 and MRVI1 are involved in platelet adhesion [47] and vascular inflammation [48, 49]. Previous research has linked increased blood-brain barrier permeability to an inflammatory process involving activated monocytes/macrophages in individuals with cerebral small vessel disease [50, 51]. In our study, MRVI1 was more abundant in astrocytes, which supports its role in vascular inflammation. CSPG4, also known as neuron-glial antigen 2 (NG2) [52], is a protein that helps to stabilize cell-substrate connections [53-55]. Finally, we identified PTPN11, a membrane protein that suppresses cell growth and induces apoptosis [56-59], as a new candidate. These 7 genes are implicated in the molecular processes and neuropathological changes in lacunar stroke.
Most trait-associated variants in neuropsychiatric disease are found in non-protein-coding regions of the human genome, where they have previously been linked to transcriptional levels [60-62]. As such, we applied eQTLs to understand GWAS-related transcriptional regulatory mechanisms in lacunar stroke. However, among the identified proteins, only ICA1L exhibited changes in gene expression. There could be several reasons for this lack of agreement. First, while the exact link between eQTLs and pQTLs has yet to be established, the mRNA expression and protein levels of many genes are uncorrelated, owing in part to various posttranscriptional factors such as sequence characteristics implicated in protein translation and degradation [63]. Second, assay technical artifacts and differences in data analysis may affect the results significantly. Compared with pQTL analysis [64], eQTL studies use stricter criteria to detect remote regulatory changes, resulting in a lower false-positive rate. In addition to raising thresholds, one way to improve performance is to use robust tools such as FUSION [26], MR [29], and COLOC [26, 30] to check findings in independent samples. To address this difficulty, however, it is essential to expand the depth and variety of multiomics sequencing at the individual level.
Clinical trials have been conducted using drug compounds targeting one of the three causal genes, ALDH2 (ranked as high confidence in our findings), for alcohol dependency and parasite infection (two drugs, phase 4) [65]. Secondary analysis of those and future drugs in clinical trials would likely help to test the idea that these proteins are involved in the development of lacunar stroke.
Our study has several advantages. First, the PWAS of lacunar stroke was conducted using the largest and most comprehensive human proteome and summary statistics from the most recent lacunar stroke GWAS. Second, we performed a replication PWAS using an independent human brain proteome and verified the risk proteins with independent MR validation analysis. Third, using Bayesian colocalization, which estimates the probability that two association signals at a given locus share a common causal variant, we confirmed the pathogenic proteins (ICA1L, CAND2, and ALDH2) of lacunar stroke. Fourth, this study analyzed both mRNA and protein levels associated with lacunar stroke using both PWAS and TWAS. Finally, the dorsolateral prefrontal cortex was chosen in the current study because it includes the cell type most linked to lacunar stroke [14]. Furthermore, the prefrontal cortex has been proposed as a top-down control system that connects other brain areas to facilitate sophisticated cognitive functions. Screening for risk proteins in the prefrontal cortex may help identify critical targets for enhancing cognitive function as well as individuals at high risk of stroke recurrence [2, 66].
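The colocalization probability mentioned among the advantages above can be sketched as follows. This is a minimal illustration of the standard approximate-Bayes-factor formulation behind colocalization tools such as COLOC, assuming at most one causal variant per trait in the region. The function names are hypothetical, and the prior values (p1, p2, p12) are commonly used defaults assumed here, not settings reported in this study. Inputs are per-variant log approximate Bayes factors for the pQTL signal and the GWAS signal over the same ordered set of variants.

```python
import math


def _logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of log-scale values.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _logdiffexp(a, b):
    # log(exp(a) - exp(b)); requires a > b.
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf_trait1, labf_trait2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region.

    H3 = both traits associated, different causal variants.
    H4 = both traits associated, one shared causal variant.
    Priors are assumed defaults, not values taken from the study.
    """
    if len(labf_trait1) != len(labf_trait2) or len(labf_trait1) < 2:
        raise ValueError("need the same variants (at least two) for both traits")
    s1 = _logsumexp(labf_trait1)
    s2 = _logsumexp(labf_trait2)
    s12 = _logsumexp([a + b for a, b in zip(labf_trait1, labf_trait2)])
    log_h = [
        0.0,                                                      # H0: no association
        math.log(p1) + s1,                                        # H1: trait 1 only
        math.log(p2) + s2,                                        # H2: trait 2 only
        math.log(p1) + math.log(p2) + _logdiffexp(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                      # H4: shared variant
    ]
    total = _logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]
```

For example, with hypothetical inputs where both traits peak at the same variant, such as `coloc_posteriors([0, 10, 0], [0, 12, 0])`, nearly all posterior mass falls on H4. When the peaks sit on different variants, H3 outweighs H4.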
The current study has several limitations. First, pQTL and eQTL mapping cannot explain all GWAS signals. At a single level, such as the protein level, the function of genes in the biological development of lacunar stroke is difficult to explain. Further investigations, including epigenetic analyses based on mQTLs as well as single-cell sequencing and whole-genome sequencing, are needed to design tailored therapy regimens and offer a complete understanding of the molecular mechanisms implicated in lacunar stroke [67, 68]. Second, the Slow Off-rate Modified Aptamer-based detection method was limited to a subset of the proteome and did not cover the whole proteome. Third, because current proteome samples vary by ethnicity, further expansion of the scale and diversity of brain proteome data can enable more precise estimates and broader applications.