Background
Several genome-wide studies [
1‐
3] have provided evidence for significant genetic linkage between a chromosomal region on 9q22 and an increased risk of colorectal cancer (CRC). A further study confirmed this linkage signal and fine-mapped the association to a region centred around 98.15 Mb [
4]. Biologically, this chromosomal region houses several interesting candidate CRC susceptibility genes including
PTCH1,
XPA, GALNT12 and
TGFBR1 [
5]. Follow-up efforts have particularly focused on
TGFBR1 (hg19 coordinates, chr9:101.87-101.92 Mb), but with largely inconclusive results [
3,
5‐
7].
The transforming growth factor β receptor type 1 (
TGFBR1) gene is an attractive candidate because TGF-β signalling plays an important role in the control of a range of biological functions associated with colon carcinogenesis, including tissue homeostasis, angiogenesis, inflammation, proliferation and cellular differentiation, and has also been implicated in both the suppression and promotion of CRC (see [
1] for a recent review). On binding of the TGF-β ligand to TGFBR1, this serine/threonine protein kinase-containing receptor forms a heteromeric complex with type II TGF-β receptors thereby transducing the TGF-β signal from the cell surface to the cytoplasm. A common variant of
TGFBR1, rs11466445 (heterozygote frequency 0.211; dbSNP135), contains a deletion of three GCG triplets from the sequence of exon 1, resulting in the expression of a mutant receptor protein with six consecutive alanine (
TGFBR1*6A) rather than nine consecutive alanine (
TGFBR1*9A) residues. This is a hypomorphic mutation encoding a TGFBR1 variant protein with reduced TGF-β growth inhibition-signalling activity. The
TGFBR1*6A allele has been proposed to act as a low-penetrance susceptibility allele for a number of malignancies [
8], perhaps acting by decreasing
TGFBR1 allelic expression. Allele specific expression (ASE) of TGFBR1 in peripheral blood lymphocytes has been observed, with decreased expression associated with the *6A allele and two other SNPs in linkage disequilibrium [
9]. Another study examined SNPs in the 3′ untranslated region of TGFBR1 and found that 29 of 138 patients with MSI-negative CRC showed ASE, with 14 of the 29 (48%) having a *6A/*9A genotype and clear enrichment of ASE in familial cases [
10].
Although some studies have suggested that the
TGFBR1*6A allele confers an elevated risk of colorectal cancer [
5,
8,
11], most studies have not found such an association [
12‐
17]. A recent large meta-analysis of rs11466445 and colorectal cancer risk assessed nine association studies totalling 6,765 CRC patients and 8,496 unrelated controls and found that heterozygous *6A/*9A carriers showed a significantly increased risk of CRC with a pooled odds ratio (OR) of 1.12 (95% CI = 1.02–1.23; p = 0.013) compared to homozygous *9A/*9A carriers [
18]. A further recent meta-analysis, which included 15 subgroups (7,154 cases and 8,851 controls), did not find a significant overall association with CRC (OR = 1.085, 95% CI = 0.963–1.222; additive model), but instead found a significant association with breast and ovarian cancer. The difference from the previous meta-analysis was the exclusion of one study and the inclusion of two further studies [
19]. One of the included studies genotyped rs11466445 in a Spanish cohort somewhat enriched for familial cancer, with ~15% of cases having an affected first-degree relative, and found a borderline significant association with diagnosis of CRC (p = 0.0491; 515 cases, 515 controls) [
5]. In the context of familial CRC in particular, two studies have examined families with genetic predisposition [
15,
20]. In both studies, a case–control design was used, drawing on only one affected member from each family and comparing this group with unrelated controls. In each instance,
TGFBR1*6A was not found to be associated with an increased familial colorectal cancer risk. Interestingly, a further study found evidence that the
TGFBR1*6A allelic frequency is higher amongst familial CRC patients with mismatch-repair (MMR) negative disease [
21].
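As a rough check on the two pooled estimates quoted above, a two-sided p-value can be approximated from an odds ratio and its 95% confidence interval by assuming the log odds ratio is normally distributed. The sketch below is illustrative only: the function name is hypothetical, and because the published intervals are rounded, the result will only approximate the reported p-values.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate two-sided p-value from an OR and its 95% CI (Wald, log scale)."""
    # Standard error of log(OR), recovered from the width of the 95% CI
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))

# Pooled estimates as quoted in the text
p_first = p_from_or_ci(1.12, 1.02, 1.23)      # first meta-analysis [18]
p_second = p_from_or_ci(1.085, 0.963, 1.222)  # second meta-analysis [19]
print(f"{p_first:.3f} {p_second:.3f}")
```

Under this approximation the first estimate falls below 0.05 and the second does not, consistent with the conclusions reported by the two meta-analyses.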
There have been no reports to date that have explored the likelihood of an association of
TGFBR1*6A with hereditary CRC using any family-based association test (FBAT) [
22‐
24], or family-based case–control test designed for related individuals [
25]. FBATs examine associations within family groups and so are robust to population stratification, a known confounder of case–control studies [
24]. It has been suggested this robustness comes at some cost. Simulations show that classical FBATs are less powerful than case–control tests [
24,
26], as the latter examine between-family associations instead of exclusively within-family associations. Counter to this argument, the groups of affected relatives sampled from multiplex families should have more power to detect an association due to the higher than expected frequency of susceptibility alleles, compared with affected individuals having sporadic disease [
25]. It is also possible to use quasi-likelihood score (QLS) tests, an alternative class of tests to FBATs with different theoretical underpinnings. As opposed to within-family tests, these are between-family case–control tests that can account for the correlation between individuals in families [
25].
We recently completed a new genome-wide linkage study [
27] using non-syndromic CRC families from three distinct regions in Australia and Spain. One of the linkage regions of interest identified in that study was located on chromosome 9q, proximal to the previously reported 9q22 linkage region, which contains the
TGFBR1 locus. We genotyped an expanded set of families for rs11466445 and used FBATs and the “More Powerful” Quasi-Likelihood Score Test (MQLS) to test for association with diagnosis of colorectal neoplasia (i.e. either colorectal adenocarcinoma or advanced adenoma). After applying several family-based association tests, we found a nominally significant result only with the PBAT rapid algorithm (p = 0.028); the other three FBAT algorithms were non-significant, although each yielded a p-value < 0.10. There was no evidence of an association using the MQLS case–control model (p = 0.41).
Discussion
Testing with the rapid PBAT algorithm gave a nominally significant result under an additive genetic model (p = 0.028). Under the GDT, GDT-PO and FBAT approaches we did not find a significant association, but the p-values were consistently borderline (all < 0.10). Unlike the FBAT approaches, we found no evidence of an association using MQLS, a case–control method that corrects for relatedness amongst subjects (p = 0.41).
The differences in p-values between the methods, under the same hypotheses and genetic models, are due to the formulation of each test and its treatment of family structures, which lead to differences in the groupings that are informative for the test statistic.
The GDT, a robust generalisation of the intuitively simple transmission-disequilibrium test (TDT), examines transmission disequilibrium between pairs of discordant relatives. The variant GDT-PO test considers only parent–child discordant pairs. As relatively few parent–child pairs were genotyped in this study, this test has much reduced power. However, given the age of the parents, the result should be more robust to misspecification of phenotype. The FBAT and PBAT algorithms are closely related and examine transmission disequilibrium from parents to affected offspring.
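For orientation, the classic TDT on which the GDT builds can be sketched as a McNemar-type statistic on transmissions from heterozygous parents to affected offspring. This is the textbook TDT, not the GDT or GDT-PO used in this study, and the counts below are hypothetical rather than data from this cohort.

```python
import math

def tdt(transmitted, untransmitted):
    """Classic TDT: chi-square (1 df) on allele transmissions from heterozygous parents."""
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: allele transmitted 15 times, not transmitted 7 times
chi2, p_value = tdt(15, 7)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

Only heterozygous parents contribute to the counts, which is why tests of this family draw on a subset of "informative" relatives rather than the whole cohort.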
For the FBAT statistic, informative families are those with at least one parent heterozygous for the two
TGFBR1 alleles and having affected offspring. The use of only affected offspring in the association statistic makes it robust to phenotype misspecification of affected people as unaffected. In the case of a missing parent, or parents, the test conditions on the sufficient statistic for the genotype distribution in each family, where a parent genotype is expressed as a set of likelihoods conditioned upon known offspring genotype(s). The design of the FBAT necessitates that extended pedigrees are split into nuclear families, which can introduce bias due to correlation. The FBAT also requires specification of the genetic model. The PBAT rapid algorithm differs from FBAT in that extended pedigrees are broken up into clusters of trios who share the same parents. The rapid algorithm in PBAT tests only the minor alleles and offers the ability to generate p-values using a robust Monte Carlo permutation-based method instead of the asymptotic Normal distribution [
23]. It also offers time-to-onset analyses with the same empirical p-values. Finally, the MQLS test is very different from the others: it is a regression model rather than a family-based association test. It considers both within- and between-family associations using a linear regression of genotypes on affection status, with correlations due to relatedness modelled as a kinship coefficient random effect.
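The informativeness criterion described above for the FBAT statistic can be illustrated with a small check. This is a simplified sketch under stated assumptions: the data layout and function name are hypothetical, both parents' genotypes are taken as observed, and the handling of missing parents via sufficient statistics is not implemented.

```python
from dataclasses import dataclass

@dataclass
class NuclearFamily:
    # Each parental genotype is a pair of alleles, e.g. ("6A", "9A")
    parent_genotypes: list
    # One flag per offspring: True if affected
    offspring_affected: list

def is_informative(family):
    """True if at least one parent is heterozygous and at least one offspring is affected."""
    has_het_parent = any(a != b for a, b in family.parent_genotypes)
    has_affected_child = any(family.offspring_affected)
    return has_het_parent and has_affected_child

# Hypothetical families, not drawn from the study cohort
fam1 = NuclearFamily([("6A", "9A"), ("9A", "9A")], [True, False])
fam2 = NuclearFamily([("9A", "9A"), ("9A", "9A")], [True])
print(is_informative(fam1), is_informative(fam2))
```

A family with two homozygous parents, such as the second one here, carries no information about preferential transmission and so drops out of the statistic.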
The closeness of the p-values across all the family-based methods indicates the finding is not particularly sensitive to the different assumptions underpinning these algorithms. More broadly, given that the informative pairs/families and the nature of the algorithm differ in each FBAT method, the general agreement across methods suggests this marginal evidence of an association is not a spurious result.
However, the nominally significant PBAT result should perhaps be treated with some caution. While the p-value was generated using a robust empirical Monte-Carlo based method, it is possible the partitioning of people into clusters of trios may inflate type I error. As the 22 informative nuclear families are broken into 26 informative clusters of trios, there is a degree of correlation between some clusters that is unaccounted for by the approach. Conversely, there is reason to think that such correlation may not greatly influence the p-value. The similarity in p-values between the GDT and the FBAT, which also splits extended pedigrees, suggests the difference in handling of extended pedigree structure between the methods did not overly affect the association test result in this instance.
In essence, a method (PBAT) that examines transmission to affected relatives but breaks pedigrees up for computational reasons gives a significant result, while a method (GDT) that examines discordant relative pairs and does not alter pedigree structure gives a non-significant one. It is unclear how the different pairings or pedigree structures used by these methods contribute to the difference in p-value. In simulations, the GDT was found to have more power than the FBAT in the majority of nuclear family and extended pedigree structures tested [
24]. However, nuclear families with two missing parents (which include most of this present cohort) were not simulated, so it is possible the FBAT implemented in the PBAT software is more powerful in this scenario and may help explain the lower p-value.
Only one SNP was tested for association with CRC; however, the genotype data were analysed with several tests and genetic models that treat the genotype data and family structures differently. Correction for these multiple tests can be applied, but such correction assumes independence of the tests. As these tests are highly dependent, such a correction is highly conservative. Nevertheless, given the number of tests made of this single hypothesis, the PBAT association becomes non-significant after correction for multiple tests.
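The effect of a Bonferroni-style correction on the PBAT result can be sketched as follows. The choice of Bonferroni and the count of five tests (PBAT, GDT, GDT-PO, FBAT and MQLS) are assumptions for illustration; the text does not state which correction or how many tests the authors had in mind.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

pbat_p = 0.028   # nominal PBAT p-value reported in the text
n_tests = 5      # assumed number of tests; not stated in the source

adjusted = bonferroni(pbat_p, n_tests)
print(f"adjusted p = {adjusted:.3f}, significant at 0.05: {adjusted < 0.05}")
```

Even with only two tests the adjusted value (0.056) would exceed 0.05, so the conclusion does not depend on the exact count assumed here.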
All reported association studies of rs11466445 have been of a case–control design [
18]. Even studies that have gathered affected cases from families with an inherited predisposition have used a case–control design, with one case selected from each family and compared to unrelated controls [
15,
20]. To our knowledge, this present study is the first to examine the association between rs11466445 and colorectal cancer using family-based association statistics or case–control methods for related individuals. Our study is designed to examine the association of rs11466445 with colorectal neoplasia diagnosis within families predisposed to CRC. The within-family approach frees the analysis from concerns about population stratification.
Collectively, current evidence suggests the *6A allele is a relatively minor contributor to CRC prevalence. A modest 12% and 8.5% increase in CRC risk, respectively, was found by two meta-analyses across large populations [
18,
19]. The first meta-analysis found the
TGFBR1*6A allele to be significantly associated with CRC, while the second did not. The authors of a recent review considering the *6A allele association and ASE studies together concluded that the effect of the allele on CRC predisposition is, at best, very subtle [
33].
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
LJL performed all the genotyping and experimental lab work, prepared data for analysis and helped with the manuscript; GSB performed experimental lab work; JPR, BT and IWS analysed the data; GPY, FM, IB and GC coordinated the original collection of the samples. JPR, BT and GNH wrote the manuscript; TJL contributed programme support and insightful critique; GNH conceived and designed the current study. All authors participated in data interpretation and critical revision of the manuscript. All authors read and approved the final manuscript.