Discussion
Based on a comprehensive analysis of genetic influences on 340 human blood lipids assayed in 5662 individuals from Pakistan (PROMIS), we identified 253 significant associations between 181 lipids and 24 genetic loci. Additionally, in our analysis of 399 lipids in 13,814 British blood donors (INTERVAL), we identified significant associations between 244 lipids and 38 independent loci. The majority of genetic regions associated with lipids in PROMIS were also found in INTERVAL; for those that did not replicate, the discrepancy may be due to the increased sample size in INTERVAL, which gave a substantial boost in power. These findings suggest that genetically determined aspects of lipid metabolism are broadly similar in individuals of South Asian and European ancestry, and that DIHRMS can reliably capture differences in lipid levels across diverse populations.
There were six genetic loci specific to lipid levels in PROMIS: ANGPTL3, UGT8, PCTP, C19orf80, XBP1, and GAL3ST1. Angiopoietin-like 3 (ANGPTL3) is involved in the regulation of lipid and glucose metabolism. SNPs in the ANGPTL3 region have previously been shown to be associated with major lipids, including LDL-C and total cholesterol [34, 35]. In PROMIS, rs6657050, an intronic variant in the ANGPTL3 locus, was significantly associated with [PI(36:2)-H]⁻ (m/z 861.5498) (Supplementary Figure 7a).
UDP glycosyltransferase 8 (UGT8) catalyses the transfer of galactose to ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant sphingolipids of the myelin membrane of the central and peripheral nervous system. In PROMIS, rs28870381, an intergenic variant in the UGT8 locus, was associated with [PG(32:1)+OAc]⁻ (m/z 779.5078) (Supplementary Figure 7w).
Phosphatidylcholine transfer protein (PCTP) catalyses the transfer of phosphatidylcholines between membranes and is involved in lipid binding. Through regulation of plasma lipid concentrations, it may also modulate the development of atherosclerosis [36]. In PROMIS, rs11079173, an intronic variant in the PCTP locus, was associated with [PA(40:5)+OAc]⁻ (m/z 809.5337) (Supplementary Figure 7p).
C19orf80, also known as angiopoietin-like 8 (ANGPTL8), is involved in the regulation of serum triglyceride levels and is associated with major lipids including HDL-C and triglycerides [35]. In PROMIS, rs8101801, an intronic variant in the C19orf80 locus, was significantly associated with [PI(38:4)-H]⁻ (m/z 885.5498) (Supplementary Figure 7d).
Galactose-3-O-sulfotransferase 1 (GAL3ST1) catalyses the sulfation of membrane glycolipids and the synthesis of galactosylceramide sulfate, a major lipid component of the myelin sheath. In PROMIS, rs2267161, a missense variant in the GAL3ST1 locus, was associated with [PG(32:1)+OAc]⁻ (m/z 779.5078) (Supplementary Figure 7i).
X-box binding protein 1 (XBP1) functions as a transcription factor during endoplasmic reticulum stress by regulating the unfolded protein response. It is also a major regulator of the unfolded protein response in obesity-induced insulin resistance and type 2 diabetes (T2D), which makes it relevant to the management of obesity and the prevention of diabetes. Recent studies have shown that compounds targeting the XBP1 pathway are a potential approach for the treatment of metabolic diseases [37]. In addition, XBP1 protein expression, which is induced in the liver by a high carbohydrate diet, is directly involved in fatty acid synthesis through de novo lipogenesis. Therefore, compounds that inhibit XBP1 activation may also be useful for the treatment of non-alcoholic fatty liver disease (NAFLD) [38]. In PROMIS, rs71661463, an intronic variant for which XBP1 is the candidate causal gene, was associated with [SM(37:1)+H]⁺ (m/z 745.6216) (Supplementary Figure 7x). Recent research across many species has shown that XBP1 is a transcription factor regulating hepatic lipogenesis. In mice, hepatic XBP1 expression is regulated by proopiomelanocortin (POMC) during sensory food perception and coincides with changes in the lipid composition of the liver with increases in PCs and PEs [39]. Although previous studies have shown direct links between XBP1 and overall lipid metabolism, this is the first time a genetic association has been reported between XBP1 and lipid metabolites in humans, affecting sphingomyelins, PCs, and PEs (Supplementary Figure 7x).
Our findings for the PNPLA3 and MBOAT7 loci were also notable. PNPLA3 encodes a multifunctional enzyme with triacylglycerol lipase activity, which mediates triacylglycerol hydrolysis in adipocytes, and with acylglycerol O-acyltransferase activity. The relationship between rs738409, a nonsynonymous variant (p.Ile148Met) in the PNPLA3 gene, and non-alcoholic fatty liver disease (NAFLD) has been well established [40]. This variant has been shown to impair triglyceride hydrolysis in the liver and secretion of triglyceride-rich very low-density lipoproteins, leading to an altered fatty acid composition of liver triglycerides, and is also associated with reduced risk of CHD [41] and increased risk of type 2 diabetes (T2D) [42]. This suggests that targeting hepatic pathways to reduce cardiovascular risk may be complex, despite the clustering of cardiovascular and hepatic diseases in people with metabolic syndrome. Our analysis adds granularity to the previously identified total triglyceride associations with PNPLA3 by identifying two specific triglyceride species that may have a role in PNPLA3 function.
MBOAT7, which contributes to the regulation of free arachidonic acid in the cell through the remodelling of phospholipids, was reported as being associated with the metabolite 1-arachidonoylglycerophosphoinositol (known as [PI(36:4)-H]⁻ in our study) in a previous mGWAS [19], but we found that the lead SNP in this locus, rs8736 (chr19:54677189), was also associated with a wide range of phosphatidic acids, phosphatidylcholines, phosphatidylethanolamines, and phosphoinositols (Supplementary Figure 7m). Several studies have shown that MBOAT7 (also known as lysophosphatidylinositol-acyltransferase 1 [LPIAT1]) is responsible for the transfer of arachidonoyl-CoA to lysophosphoinositides [43]. MBOAT7-deficient macrophages show decreased levels of [PI(38:4)-H]⁻ and increased levels of [PI(34:1)-H]⁻ and [PI(40:5)-H]⁻ [44]. The T allele of rs8736, a 3’ UTR SNP, shows a similar shift in phosphatidylinositol metabolism. Our work shows that this SNP is also strongly associated with [PI(38:3)+OAc]⁻ (m/z 947.5866), which is likely to be the dihomo-gamma-linolenic acid (20:3n6)-containing phosphoinositol. None of the papers testing the substrate specificity of MBOAT7 have included dihomo-gamma-linolenic acid or [PI(38:3)+OAc]⁻ in their analysis. Thus, we provide novel evidence in humans that there is an association between MBOAT7 activity and circulating phosphatidylinositols, a finding that requires further replication.
Our network diagram helped identify sphingomyelins that were associated exclusively with four loci, none of which was associated with any other lipid subclass: GCKR, SGPP1, MLXIPL, and XBP1. Sphingomyelins have previously been shown to be associated with SGPP1 [45], but the associations of sphingomyelins with the other three loci are reported here for the first time. GCKR has been shown to be associated with total cholesterol and triglycerides (see Fig. 2) and has also been associated with the plasma phospholipid fraction fatty acids 16:0 and 16:1 [46, 47]; most lipids that we found to be associated with GCKR (Supplementary Figure 7j) are likely to contain these particular fatty acids. It has been suggested that the glucokinase regulatory protein, encoded by GCKR, affects the production of malonyl-CoA, an important substrate for de novo lipogenesis [46]. Similarly, there is a known relation between MLXIPL and carbohydrate and lipid metabolism. MLXIPL encodes the transcription factor carbohydrate response element-binding protein (ChREBP) and therefore also plays a role in lipogenesis. Although both these genes have previously been linked to lipogenesis, we found that genetic variation at these lipogenesis-regulating genes is also associated with sphingomyelin concentrations.
The network diagram also helped recapitulate known biological relationships between lipids. As we established in our previous analysis [7], the number of significant partial correlations between lipids of different subclasses was significantly higher than would be expected due to chance alone. This analysis further showed that genes that were significantly associated with lipids of a particular subclass regulated all of the lipids within the subclass in a similar manner. Therefore, the total concentrations of a given lipid class associated with a genetic locus are less affected by the proportion of fatty acids present in those lipid species.
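To make the notion of a partial correlation between two lipids concrete, the following is a minimal sketch of the first-order case, in which the correlation between two variables is computed after removing the linear influence of a third. It is an illustration only and is not the procedure used to build the network: the variable names (`lipid_a`, `lipid_b`, `shared_driver`) and the toy values are hypothetical, and a lipid network would typically condition on many variables at once.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: both "lipids" follow a shared driver plus small,
# mutually unrelated deviations.
shared_driver = [1, 2, 3, 4, 5, 6, 7, 8]
dev_a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
dev_b = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
lipid_a = [z + d for z, d in zip(shared_driver, dev_a)]
lipid_b = [z + d for z, d in zip(shared_driver, dev_b)]

raw_r = pearson(lipid_a, lipid_b)
partial_r = partial_correlation(lipid_a, lipid_b, shared_driver)
print(f"raw correlation: {raw_r:.3f}, partial correlation: {partial_r:.3f}")
```

In this toy example the raw correlation is high because both variables track the shared driver, whereas the partial correlation is close to zero, which is why partial correlations are better suited to highlighting direct relationships between lipids.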
In summary, our analyses resulted in the following new insights in an understudied South Asian population: (1) we established that decreased levels of sphingomyelins are associated with genetically lower LPL activity; (2) we revealed a wide range of glycerophospholipids that are associated with variants in the MBOAT7 locus; (3) we identified several new associations of phosphatidic acids, phosphocholines, and phosphoethanolamines with variants in the LIPC region; (4) we found several novel associations of sphingomyelins and phosphocholines with variants in the APOE-C1-C2-C4 cluster; (5) we discovered four new associations of sphingomyelins with variants in the SGPP1 locus; and (6) we found several previously unreported associations of phosphocholines, sphingomyelins, and ceramides with variants in the SPTLC3 locus. These findings can help further the identification of novel therapeutic targets for prevention and treatment.
Our investigation into the genetic influences of lipids has several strengths. First, the research involved participants from a population cohort in Pakistan, thereby enhancing the scientific understanding of lipid associations in this understudied population, and we compared the findings with a typical Western population of British blood donors using the same lipid-profiling platform. Second, the analysis was based on a relatively large dataset of 5662 participants from Pakistan and an even larger cohort of 13,814 individuals from the UK, thereby increasing statistical power to detect associations. Third, our mGWAS was performed in individuals free from established MI at baseline in PROMIS and healthy blood donors in INTERVAL, which reduces spurious associations due to the disease state or potential treatments. Finally, our newly developed open-profiling lipidomics platform was utilised to provide detailed lipid profiles, with a wider coverage of lipids than most other high-throughput profiling methods [7], which improved our ability to detect novel associations and our understanding of the detailed effects of known lipid loci at the level of individual lipid species.
Nevertheless, our study has several technical limitations. To enable the rapid and robust lipid profiling of such a large number of samples, we employed DIHRMS. Despite the advantages of this platform, it is unable to distinguish isobaric lipids. This means that different lipid species can contribute to the same signal; for instance, [PC(32:1)+H]⁺ and [PE(35:1)+H]⁺ both have the same molecular formula (C₄₀H₇₇NO₈P) and will both contribute to the signal at m/z 732.5541. Furthermore, even [PC(32:1)+H]⁺ consists of both PC(16:0/16:1) and PC(14:0/18:1). These limitations are discussed in detail in our previous methodological paper on this platform [7], while the relevance of using these aggregate signals in metabolic studies has been shown by other studies [45]. Further work, with improved analytical resolution, will enable further pinpointing of the relevant lipids to the identified loci.
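The isobaric limitation can be illustrated with a minimal sketch that groups lipid annotations whose m/z values are indistinguishable within a mass tolerance. The m/z values are those quoted in this Discussion; the 5 ppm tolerance and the function names are assumptions made purely for illustration and are not parameters of the DIHRMS platform.

```python
# m/z values quoted in the text. Ion polarity is ignored here for brevity;
# in practice positive and negative ions are acquired separately.
ANNOTATIONS = [
    ("[PC(32:1)+H]+", 732.5541),
    ("[PE(35:1)+H]+", 732.5541),
    ("[SM(37:1)+H]+", 745.6216),
    ("[PG(32:1)+OAc]-", 779.5078),
]


def ppm_difference(mz_a: float, mz_b: float) -> float:
    """Absolute difference between two m/z values in parts per million."""
    return abs(mz_a - mz_b) / mz_b * 1e6


def group_isobaric(annotations, tolerance_ppm=5.0):
    """Group annotations lying within tolerance_ppm of a group's first member.

    tolerance_ppm is an assumed, illustrative value.
    """
    groups = []  # each group is a list of (name, mz) tuples
    for name, mz in sorted(annotations, key=lambda item: item[1]):
        if groups and ppm_difference(mz, groups[-1][0][1]) <= tolerance_ppm:
            groups[-1].append((name, mz))
        else:
            groups.append([(name, mz)])
    return groups


signals = group_isobaric(ANNOTATIONS)
for members in signals:
    names = ", ".join(name for name, _ in members)
    print(f"m/z {members[0][1]:.4f}: {names}")
```

Under these assumptions the two species at m/z 732.5541 collapse into a single signal, so any genetic association detected for that signal cannot be attributed to one of them without additional analytical resolution.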
The cohorts included in our analysis also have several potential limitations. First, possible selection biases arise from the case-control design of PROMIS, although these were minimised by the recruitment of controls from patients, visitors of patients attending out-patient clinics, and unrelated visitors of cardiac patients. Second, serum samples in PROMIS were stored in freezers at −80°C for between 2 and 8 years before aliquots were taken for the lipidomics measurements, which we accounted for by adjusting the analyses for the number of years that the samples had been stored. Although residual confounding and deterioration of lipid profiles may still exist, such deterioration is unlikely to have been related to genotype. Third, a majority (76%) of PROMIS participants had not fasted prior to blood draw, and a small proportion of participants (7%) had reportedly fasted for an unknown duration. Recent food consumption may have had significant effects on lipid levels and influenced the results. Our analyses adjusted for fasting status, although we lacked statistical power to stratify by fasting status. Fourth, PROMIS participants were recruited from multiple centres in urban Pakistan [7], but it is unclear whether the findings from this study would be generalizable to individuals living in rural villages and other parts of Pakistan, or in other countries in South Asia. However, the confirmatory analysis in INTERVAL, in which we identified significant associations with lipids for the majority of the genetic loci found in PROMIS, helps strengthen the argument that these findings are generalizable. Additionally, many of the lipids were associated with known genetic regions such as APOA5-C3 and FADS1-2-3, which have already been shown to be associated with multiple lipids in other Western populations, further strengthening the validity of the findings from this analysis. Finally, although two-sample Mendelian randomization approaches to make causal inferences about the association of lipids with CHD risk factors and disease outcomes hold great promise in the lipidomics arena [48], extensive pleiotropy made it too difficult to disentangle the findings and we chose not to pursue this avenue. Therefore, although especially stringent procedures were followed, highly conservative cut-offs were used to determine statistical significance, and rigorous pre-analysis and post-analysis quality control steps were performed, there is still a possibility that some of the findings were false positives that arose due to artefacts rather than being true signals. Additional analyses in other populations using the DIHRMS lipidomics platform would be helpful to further replicate our findings. Moreover, the identified pathways and proposed molecular mechanisms require validation through functional analyses in model organisms and humans.
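The covariate adjustment described above (storage time and fasting status) can be sketched as a single-variant linear regression. This is an illustration under stated assumptions rather than the association model actually used: the data are simulated, the effect sizes, fasting proportion and sample size are arbitrary, and the helper names (`solve_linear_system`, `ols`) are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rng = random.Random(0)
TRUE_BETA = 0.5  # assumed per-allele effect, used only to simulate data
rows, lipid = [], []
for _ in range(2000):
    dosage = rng.choice([0, 1, 1, 2])   # simulated count of effect alleles
    storage_years = rng.uniform(2, 8)   # storage range quoted in the text
    fasted = rng.random() < 0.24        # assumed proportion, illustrative only
    level = (1.0 + TRUE_BETA * dosage - 0.1 * storage_years
             + 0.3 * fasted + rng.gauss(0, 1))
    rows.append([1.0, dosage, storage_years, float(fasted)])
    lipid.append(level)

# Columns: intercept, genotype dosage, storage years, fasting indicator.
intercept, beta_snp, beta_storage, beta_fasting = ols(rows, lipid)
print(f"per-allele effect adjusted for storage and fasting: {beta_snp:.3f}")
```

Because storage time and fasting status enter the model as covariates, the genotype coefficient is estimated conditional on them, which is the sense in which the analyses were adjusted rather than stratified.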
Further research will be able to leverage these lipidomics results in combination with whole-genome and whole-exome sequencing performed in PROMIS and INTERVAL to help understand the consequences of loss-of-function mutations identified in these participants [49].