Background
Materials and methods
Subject population
SNP selection strategy
Molecular genotyping
Statistical analysis
Logistic regression model
Classification and regression trees
Multifactor dimensionality reduction
Modeling and interpreting SNP-SNP interactions
Strategy to identify critical SNP-SNP interactions associated with breast cancer
Step 1
Step 2
Results
Individual SNP effects
Evaluation of two-way SNP-SNP interactions
Rank | LRM interaction | LRM P-value | CART interaction | CART P-value | MDR interaction | MDR P-value |
---|---|---|---|---|---|---|
1 | XPD-[Lys751Gln] × IL10-[G(-1082)A] | 0.013 | XPD-[Lys751Gln] × IL10-[G(-1082)A] | 0.006* | XPD-[Lys751Gln] × IL10-[G(-1082)A] | 0.009* |
2 | COMT-[Met108/158Val] × CCND1-[Pro241Pro] | 0.020 | CYP17-[C(518)T] × BARD1-[Pro24Ser] | 0.013 | COMT-[Met108/158Val] × CCND1-[Pro241Pro] | 0.019 |
3 | TNFA-[G(-308)A] × p27-[Val109Gly] | 0.046 | | | BARD1-[Pro24Ser] × XPD-[Lys751Gln] | 0.037 |
4 | | | | | IL13-[Arg130Gln] × CYP17-[C(518)T] | 0.037 |
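The section does not state how the LRM interaction P-values were obtained. One common approach, shown here as a minimal sketch, is a 1-df likelihood-ratio test comparing a logistic model with two SNP main effects against one that adds their product term. The additive 0/1/2 genotype coding, the helper names (`fit_logistic`, `interaction_lrt`, `expand`) and the cell counts in the usage example are illustrative assumptions, not the study's code or data.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=15):
    """Newton-Raphson logistic regression; returns the maximized log-likelihood.
    Each row of X already starts with 1.0 for the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        g = [0.0] * p                      # score vector X^T (y - mu)
        H = [[0.0] * p for _ in range(p)]  # information matrix X^T W X
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for j in range(p):
                g[j] += (yi - mu) * row[j]
                for k in range(p):
                    H[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, solve(H, g))]
    ll = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        ll += yi * eta - math.log1p(math.exp(eta))  # Bernoulli log-likelihood
    return ll

def interaction_lrt(g1, g2, y):
    """Hypothetical helper: 1-df LRT of a SNP x SNP product term (assumed 0/1/2 coding)."""
    reduced = [[1.0, a, b] for a, b in zip(g1, g2)]
    full = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]
    stat = max(0.0, 2.0 * (fit_logistic(full, y) - fit_logistic(reduced, y)))
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail

def expand(cells):
    """Hypothetical helper: cells maps (g1, g2) -> (cases, controls); returns per-subject lists."""
    g1, g2, y = [], [], []
    for (a, b), (n_case, n_ctrl) in cells.items():
        for label, n in ((1, n_case), (0, n_ctrl)):
            g1 += [a] * n
            g2 += [b] * n
            y += [label] * n
    return g1, g2, y

# Toy counts (hypothetical): risk is raised only when both SNPs carry a variant allele.
cells = {(a, b): ((80, 40) if a >= 1 and b >= 1 else (40, 80))
         for a in range(3) for b in range(3)}
stat, p = interaction_lrt(*expand(cells))
```

On these toy counts the product term is clearly significant, whereas identical counts in every genotype cell give a statistic of zero and a P-value of 1.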
Evaluation of higher-order interactions
Rank | Interaction | P-value |
---|---|---|
1 | XPD-[Lys751Gln] × IL10-[G(-1082)A] | 0.006 |
2 | CYP17-[C(518)T] × BARD1-[Pro24Ser] | 0.013 |
3 | XPD-[Lys751Gln] × IL10-[G(-1082)A] × IL1A-[Ala114Ser] × ESR1-[Ser10Ser] | 0.016 |
4 | CYP17-[C(518)T] × IL13-[Arg130Gln] × IL1A-[Ala114Ser] × MTHFR-[Ala222Val] × PTEN-[(IVS4+109)ins/del5] | 0.019 |
5 | XPD-[Lys751Gln] × BARD1-[Pro24Ser] × MTHFR-[Ala222Val] | 0.020 |
6 | CYP17-[C(518)T] × BARD1-[Pro24Ser] × MTHFR-[Ala222Val] | 0.043 |
7 | XPD-[Lys751Gln] × GADD45-[C(IVS3+168)T] × GSTP1-[Ile105Val] | 0.050 |
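The Discussion characterizes CART as detecting conditional recessive or dominant models built from binary splits. For a SNP coded 0/1/2 (an assumed coding), the two ordered cut points give a dominant split ({0} vs {1,2}) and a recessive split ({0,1} vs {2}); splitting on a second SNP inside each child node yields a conditional model. The minimal sketch below assumes the Gini impurity, CART's usual classification criterion; the function names are hypothetical and this is not the study's implementation.

```python
def gini(n_case, n_ctrl):
    """Gini impurity 2p(1 - p) of a node with n_case cases and n_ctrl controls."""
    n = n_case + n_ctrl
    if n == 0:
        return 0.0
    p = n_case / n
    return 2.0 * p * (1.0 - p)

def node_counts(y, mask):
    """(cases, controls) among subjects selected by the boolean mask."""
    cases = sum(1 for m, yi in zip(mask, y) if m and yi == 1)
    ctrls = sum(1 for m, yi in zip(mask, y) if m and yi == 0)
    return cases, ctrls

def best_split(genotypes, y):
    """Best binary split of one SNP (coded 0/1/2): returns (model, impurity decrease)."""
    n = len(y)
    if n == 0:
        return ("none", 0.0)
    parent = gini(sum(y), n - sum(y))
    best = ("none", 0.0)
    for model, cut in (("dominant", 1), ("recessive", 2)):
        carriers = [g >= cut for g in genotypes]
        left = node_counts(y, carriers)
        right = node_counts(y, [not c for c in carriers])
        if min(sum(left), sum(right)) == 0:
            continue  # one child would be empty, so this is not a real split
        child = (sum(left) * gini(*left) + sum(right) * gini(*right)) / n
        if parent - child > best[1]:
            best = (model, parent - child)
    return best

def conditional_split(snp_a, snp_b, y):
    """Split on SNP A, then look for the best split on SNP B within each child node."""
    model_a, _ = best_split(snp_a, y)
    if model_a == "none":
        return {"A": "none"}
    cut = 1 if model_a == "dominant" else 2
    result = {"A": model_a}
    for name, keep in (("B | A carrier", True), ("B | A non-carrier", False)):
        idx = [i for i, g in enumerate(snp_a) if (g >= cut) == keep]
        result[name] = best_split([snp_b[i] for i in idx], [y[i] for i in idx])
    return result

# Toy data (hypothetical): risk is raised only in homozygous variant carriers.
g = [0] * 100 + [1] * 100 + [2] * 100
y = [1] * 50 + [0] * 50 + [1] * 50 + [0] * 50 + [1] * 90 + [0] * 10
print(best_split(g, y))  # the recessive split has the larger impurity decrease
```

Because each split is binary and chosen greedily, a SNP with a strong main effect tends to be selected first, which is the "influence of main effects" limitation listed for CART in the Discussion.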
Rank | Interaction | Testing accuracy* | Permutation P-value |
---|---|---|---|
1 | IL1A-[Ala114Ser] × XPD-[Lys751Gln] × IL10-[G(-1082)A] | 58.2% | <0.001 |
2 | ESR1-[Ser10Ser] × CYP17-[C(518)T] × IL10-[G(-1082)A] × COMT-[Met108/158Val] × PTEN-[(IVS4+109)ins/del5] × CCND1-[Pro241Pro] | 60.2% | <0.001 |
3 | CYP17-[C(518)T] × BARD1-[Pro24Ser] × COMT-[Met108/158Val] × MMP1-[1G(-1607)2G] × CCND1-[Pro241Pro] | 58.4% | 0.001 |
4 | IL13-[Arg130Gln] × XPD-[Lys751Gln] × IL10-[G(-1082)A] | 57.5% | 0.002 |
5 | CYP17-[C(518)T] × XPD-[Lys751Gln] × GSTP1-[Ile105Val] × GADD45-[C(IVS3+168)T] | 56.7% | 0.006 |
6 | XPD-[Lys751Gln] × IL10-[G(-1082)A] × GSTP1-[Ile105Val] × COMT-[Met108/158Val] × MMP1-[1G(-1607)2G] × CCND1-[Pro241Pro] | 57.5% | 0.006 |
7 | ESR1-[Ser10Ser] × XPD-[Lys751Gln] × IL10-[G(-1082)A] × MMP1-[1G(-1607)2G] × PTEN-[(IVS4+109)ins/del5] | 57.2% | 0.007 |
8 | CYP17-[C(518)T] × XPD-[Lys751Gln] × IL10-[G(-1082)A] × MTHFR-[Ala222Val] × COMT-[Met108/158Val] × CCND1-[Pro241Pro] | 58.2% | 0.007 |
9 | XPD-[Lys751Gln] × IL10-[G(-1082)A] | 55.9% | 0.009 |
10 | CYP17-[C(518)T] × XPD-[Lys751Gln] × IL10-[G(-1082)A] × MTHFR-[Ala222Val] × COMT-[Met108/158Val] | 56.3% | 0.016 |
11 | BARD1-[Pro24Ser] × XPD-[Lys751Gln] × IL10-[G(-1082)A] | 55.6% | 0.017 |
12 | ESR1-[Ser10Ser] × XPD-[Lys751Gln] × MMP1-[1G(-1607)2G] × PTEN-[(IVS4+109)ins/del5] | 55.9% | 0.017 |
13 | COMT-[Met108/158Val] × CCND1-[Pro241Pro] | 55.2% | 0.019 |
14 | CYP17-[C(518)T] × IL10-[G(-1082)A] × COMT-[Met108/158Val] × PTEN-[(IVS4+109)ins/del5] × CCND1-[Pro241Pro] | 55.7% | 0.020 |
15 | IL10-[G(-1082)A] × MTHFR-[Ala222Val] × GSTP1-[Ile105Val] × COMT-[Met108/158Val] | 55.7% | 0.023 |
16 | IL13-[Arg130Gln] × CYP17-[C(518)T] | 54.5% | 0.037 |
17 | BARD1-[Pro24Ser] × XPD-[Lys751Gln] | 54.7% | 0.037 |
18 | CYP17-[C(518)T] × XPD-[Lys751Gln] × IL10-[G(-1082)A] × COMT-[Met108/158Val] × PTEN-[(IVS4+109)ins/del5] × CCND1-[Pro241Pro] | 55.9% | 0.046 |
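The MDR results are reported as a testing accuracy and a permutation P-value. A minimal sketch of the underlying idea, under stated simplifications: each multilocus genotype cell is labeled high risk when its case:control ratio meets or exceeds the overall ratio, the pooled classifier is scored by balanced accuracy (an assumption; the accuracy measure is not specified here), and the P-value is the share of label permutations that score at least as well. Unlike the reported analysis, the sketch scores training accuracy without cross-validation; the function names, the 200 permutations and the toy counts are hypothetical.

```python
import random
from collections import Counter

def mdr_balanced_accuracy(snps, y):
    """snps: one genotype list per SNP (coded 0/1/2); y: 1 = case, 0 = control."""
    cells = list(zip(*snps))                    # multilocus genotype of each subject
    n_case = sum(y)
    n_ctrl = len(y) - n_case
    cases = Counter(c for c, yi in zip(cells, y) if yi == 1)
    ctrls = Counter(c for c, yi in zip(cells, y) if yi == 0)
    threshold = n_case / n_ctrl                 # overall case:control ratio
    # Pool cells: "high risk" if the cell's case:control ratio meets the threshold.
    high = {c for c in set(cells) if cases[c] >= threshold * ctrls[c]}
    tp = sum(cases[c] for c in high)            # cases predicted as cases
    tn = sum(k for c, k in ctrls.items() if c not in high)  # controls predicted as controls
    return 0.5 * (tp / n_case + tn / n_ctrl)

def permutation_p(snps, y, n_perm=200, seed=1):
    """Observed accuracy and the share of shuffled-label runs that match or beat it."""
    observed = mdr_balanced_accuracy(snps, y)
    rng = random.Random(seed)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                     # break any genotype-phenotype link
        if mdr_balanced_accuracy(snps, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): cases are enriched in cells where g1 + g2 is odd.
g1, g2, y = [], [], []
for a in range(3):
    for b in range(3):
        n_case, n_ctrl = (30, 10) if (a + b) % 2 else (10, 30)
        for label, n in ((1, n_case), (0, n_ctrl)):
            g1 += [a] * n
            g2 += [b] * n
            y += [label] * n
acc, p = permutation_p([g1, g2], y)
```

With these toy counts the observed balanced accuracy is about 0.748. Because every cell is labeled from its own data, shuffled labels still score above 0.5, which is why MDR judges accuracy against a permutation distribution rather than against 50%.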
Discussion
Specificity of each method to identify SNP-SNP interactions
Approach | Type of two-locus model detected | Pattern of complex interactions | Potential advantages | Potential limitations | Possible improvements |
---|---|---|---|---|---|
LRM | Logical AND models – multiplicative models | Cannot be investigated | Easy to fit | Curse of dimensionality | Logic regression; MARS* |
CART | Conditional recessive or dominant models | Driven by SNP main effects and binary splits | Deals with sparse data; useful for risk characterization and prediction | Influence of main effects; redundancy | Random forests; boosting |
MDR | All types | Diverse | Deals with sparse data; useful for risk characterization and prediction | Overfitting; difficult to find the best models; inefficient with a large number of SNPs | Restrict to plausible genetic models; use a test statistic |
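The curse of dimensionality listed for LRM, and MDR's inefficiency with many SNPs, follow from simple counting: with n biallelic SNPs there are C(n, k) candidate k-SNP combinations, and each combination splits the subjects into up to 3^k genotype cells. The sketch below tabulates this for a hypothetical panel of 20 SNPs and 1,000 subjects; neither number is taken from the study, and `search_space` is an illustrative helper.

```python
from math import comb

def search_space(n_snps, k):
    """Candidate k-SNP combinations and genotype cells per combination (3 genotypes per SNP)."""
    return comb(n_snps, k), 3 ** k

N_SNPS, N_SUBJECTS = 20, 1000  # hypothetical values, not the study's
for k in range(2, 7):
    combos, cells = search_space(N_SNPS, k)
    print(f"k={k}: {combos} combinations, {cells} cells, "
          f"{N_SUBJECTS / cells:.1f} subjects per cell on average")
```

For k = 6 this gives 38,760 combinations of 729 cells each, about 1.4 subjects per cell on average, so most cells are nearly empty; this is the sparse-data setting that CART and MDR are described as handling better than a saturated logistic model.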