Introduction

Oesophageal atresia (OA) with or without tracheoesophageal fistula (TOF) are anatomical congenital malformations believed to be caused by multiple genetic and environmental factors.1 With a prevalence of two to three in 10 000 live births, OA/TOF is a rare foregut-related anomaly.2 Around 50% of affected individuals present with additional congenital anatomical malformations.3 Often – but not exclusively – these belong to the VATER/VACTERL association spectrum of vertebral defects (V), anorectal malformations (A), cardiac defects (C), TOF with or without OA (TE), renal anomalies (R) and radial limb defects (L).4, 5

A confirmed genetic syndrome or a chromosomal anomaly – including aneuploidies such as trisomy 13, 18 and 21 – can be identified in 6–10% of patients,6 and there is a strong suspicion that genetic factors are involved in the remainder. A genetic background is further suggested by reports of families with multiple affected individuals, higher concordance rates in monozygotic twins compared with dizygotic twins,7 higher recurrence risk for siblings and children of affected individuals and OA/TOF as a component feature in numerous known chromosomal aberrations and monogenic syndromes.8 Reports describing disease-causing copy number variations (CNVs) in patients with OA/TOF are rare.9, 10 In addition to their well-established role in the development of congenital anatomical malformations in general,11 CNVs contribute to disease aetiology in several genetic syndromes. These include those having OA/TOF as part of their phenotypic spectrum such as Feingold syndrome,12 22q11 deletion syndrome,13 CHARGE syndrome14 and mandibulofacial dysostosis.15 Furthermore, de novo disease-causing CNVs have been described in patients with non-syndromic OA/TOF and the VACTERL association.16

To determine the contribution of CNVs to OA/TOF aetiology, we profiled 375 Dutch, German and American OA/TOF patients using a comprehensive multiplatform array approach. We hypothesized that genomic de novo and rare overlapping CNVs contribute to isolated and non-isolated OA/TOF, and that these CNVs harbour one or more disease-related genes or phenotype-modifying factors. We describe the variation detected in our large cohort. This study enabled us to identify several rare overlapping CNVs and nonoverlapping de novo CNVs, which potentially provide new insights into the biological pathways and disease mechanisms involved in the development of OA/TOF.

Methods

Study design

We assessed the CNVs according to the consensus statement for chromosomal microarray analysis described by Miller et al.17 Our study design was based on the assumptions that CNVs are most likely to contribute to the abnormal phenotype in congenital anomalies if (I) a CNV is absent in large cohorts of unaffected individuals, (II) is absent in the unaffected parents of the affected individual and/or (III) is absent or has a population frequency below or comparable to the disease frequency, and (IV) if it targets relevant genes or noncoding RNAs. Recurrence of loci affected by de novo CNVs in single cases could indicate loci harbouring genes mutated or otherwise affected in larger disease cohorts. A detailed description of the study design is given in the Supplementary Methods.
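The four assumptions above can be read as independent checks on each CNV. The following is a minimal sketch of that reading, not the pipeline used in this study: the record type `CnvEvidence`, its field names and the strict `<=` comparison against an assumed disease frequency of 3 in 10 000 (the upper prevalence bound from the Introduction) are all illustrative assumptions.

```python
from dataclasses import dataclass

# Assumed for illustration: upper bound of the OA/TOF prevalence quoted in the
# Introduction (two to three in 10 000 live births).
DISEASE_FREQUENCY = 3 / 10_000


@dataclass
class CnvEvidence:
    """Hypothetical record summarising the evidence available for one CNV."""
    carriers_in_controls: int            # carriers among unaffected individuals
    control_cohort_size: int
    present_in_unaffected_parent: bool
    hits_gene_or_noncoding_rna: bool


def criteria_met(cnv: CnvEvidence) -> dict:
    """Return one boolean flag per study-design criterion (I)-(IV)."""
    population_frequency = cnv.carriers_in_controls / cnv.control_cohort_size
    return {
        "I_absent_in_controls": cnv.carriers_in_controls == 0,
        "II_absent_in_parents": not cnv.present_in_unaffected_parent,
        # "Comparable to" is simplified here to a strict threshold.
        "III_frequency_at_or_below_disease": population_frequency <= DISEASE_FREQUENCY,
        "IV_targets_relevant_content": cnv.hits_gene_or_noncoding_rna,
    }


# Example: a CNV absent from controls and parents that affects a gene.
example = CnvEvidence(0, 3235, False, True)
print(criteria_met(example))
```

Keeping the criteria as separate flags, rather than collapsing them into one score, mirrors the "and/or" wording of the study design, where not every criterion can be evaluated for every patient.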

Patient cohort

This study was approved by the institutional ethics committee of each participating centre, and was conducted in accordance with the principles of the Declaration of Helsinki. Patients with OA/TOF (isolated or non-isolated) were identified from the medical records. All patient records were reviewed by the treating physicians or geneticists of each participating centre. After retrieval of parental informed consent, blood was drawn from a total of 375 patients and their parents, comprising 239 patients from the Erasmus MC – Sophia, 28 from the Baylor College of Medicine and 108 from a German multicentre study regarding the genetic and environmental cause of OA/TOF (‘The genetic risk for OA consortium (GREAT consortium)’).

Microarray analysis

High-resolution analyses were performed using single-nucleotide polymorphism (SNP) microarrays (Illumina Inc., San Diego, CA, USA, and Affymetrix Inc., Santa Clara, CA, USA) and CGH oligonucleotide-based arrays (Agilent Inc., San Diego, CA, USA) using standard protocols. SNP data (log-R ratio, B-allele frequency) were visualized to identify potential CNVs via Biodiscovery Nexus CN7.5 (Biodiscovery Inc., Hawthorne, CA, USA) and the GenomeStudio genotyping module (v1.9.4, www.illumnia.com). A detailed description of chip types, normalized output generation and analysis settings is provided in the Supplementary Methods. Prior to validation studies, CNVs were filtered and prioritized based on size, probe content, quality, frequency in reference cohorts, gene content and frequency in our OA cohorts. All CNVs passing the filter criteria were evaluated manually in a modified version (ie, excluding BAC arrays and small InDels) of the Database of Genomic Variants (http://dgv.tcag.ca/dgv/app/home), ISCA (http://dbsearch.clinicalgenome.org/search/), ClinGen (https://www.clinicalgenome.org/data-sharing/clinvar/) and DECIPHER (http://decipher.sanger.ac.uk). We classified CNVs as rare if they were absent or present once in our in-house cohort of unaffected individuals (n=3235 individuals). We searched for overlap in large CNV cohorts of control individuals published by Cooper et al.,18 Coe et al.19 and Kaminsky et al.20 We also evaluated the CNVs whose frequency differed significantly between patients and controls in these studies. To confirm the putative de novo and putative deleterious CNVs, patient and parental DNAs were tested with an additional SNP array, real-time quantitative PCR, fluorescence in situ hybridization (FISH) and/or multiplex amplicon quantification (MAQ; Multiplicon N.V., Gent, Belgium). A detailed description of these methods is given in the Supplementary Methods.

All rare CNVs are listed in Supplementary Table 2 and are deposited in the ClinVar database (http://www.ncbi.nlm.nih.gov/clinvar/) using the submission name ‘CNV study in EA/TEF’ and using the exact identifiers as described in this manuscript.
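The rarity rule described above (absent or present once in the in-house control cohort) can be sketched as an interval comparison. This is an illustration under stated assumptions, not the software used in the study: the `Cnv` type, the 50% reciprocal-overlap threshold and the requirement that loss matches loss and gain matches gain are assumptions, and each control CNV is assumed to come from a different individual.

```python
from typing import NamedTuple


class Cnv(NamedTuple):
    chrom: str
    start: int   # inclusive, base pairs
    end: int     # inclusive, base pairs
    kind: str    # "loss" or "gain"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Smallest fraction of either CNV that is covered by their shared interval."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def is_rare(patient_cnv, control_cnvs, min_overlap=0.5, max_carriers=1):
    """Rare = matched by at most `max_carriers` control CNVs of the same type."""
    matches = sum(
        1
        for c in control_cnvs
        if c.kind == patient_cnv.kind
        and reciprocal_overlap(patient_cnv, c) >= min_overlap
    )
    return matches <= max_carriers


# Coordinates of the LINC00114 duplication reported in this manuscript;
# the control list is hypothetical.
patient = Cnv("chr21", 40100880, 40154748, "gain")
controls = [Cnv("chr21", 40090000, 40160000, "gain")]
print(is_rare(patient, controls))
```

A reciprocal threshold avoids counting a very large control CNV as a match for a small patient CNV that merely falls inside it; a looser any-overlap rule would classify fewer CNVs as rare.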

Results

Patient cohort

In this study a total of 375 OA/TOF patients were screened for their respective CNV profile. Of these patients, 129 presented with OA/TOF as an isolated defect (34.4%). Of the non-isolated patients with OA/TOF, 142 met the aforementioned criteria for the clinical diagnosis of VACTERL (37.8%).

Microarray analysis

Screening the respective cohorts (see Figure 1) with high-resolution oligonucleotide and SNP microarrays led to the identification of 169 CNVs: 167 gene-rich CNVs (that is, containing genes) and 2 gene-poor CNVs. These will be addressed as rare CNVs in the remainder of the manuscript. Their size distributions are depicted in Figure 2; genomic locations, evaluation of presence in control databases and classifications are given in Supplementary Table 2. Almost all of the rare CNVs were widely distributed over the genome. However, our analysis yielded a total of 12 loci that were affected by a rare CNV more than once and were present in more than one patient (see Supplementary Table 1 for the regions and phenotypes of patients with rare CNVs and overlapping loci). Inheritance was determined using a secondary technology such as MAQ assay or qPCR in 17 out of 74 CNVs that were either suspected to be de novo after microarray trio analysis or suspected to be deleterious in single-patient microarray analysis (see Supplementary Figure 1).

Figure 1

Filtering and prioritizing CNVs. After quality control and manual evaluation of CNVs, 374 CNVs larger than 30 kb, either absent or rare in the modified Database of Genomic Variants incorporated in the Nexus software, remained. Out of 374, 123 did not contain genes. In all, 257 were absent and 5 were present once in our in-house control database. These 262 CNVs were either gene-rich – containing genes – (n=167) or gene-poor (n=95). Two gene-poor CNVs were suspected of being de novo in microarray trio analysis. Eight out of 74 evaluated CNVs were de novo. Almost all of the rare CNVs (140) were widely distributed over the genome. However, our analysis yielded a total of 12 loci – containing 29 CNVs – which were affected by a rare CNV more than once and were present in more than one patient.

Figure 2

Size and type distribution of rare CNVs. Total number of rare CNVs in the Erasmus MC – Sophia, Baylor College of Medicine and University of Bonn OA/TOF cohort (n=375). Homozygous loss is counted as loss. Bins represent size ranges, for example, the 50–100 kb bin contains all CNVs within the size range of 50–100 kb.

Eight out of these 74 rare CNVs selected for further investigation (10.8%) – in six patients (1.6%) – were confirmed to be de novo (see Table 1, and Figures 3 and 4 for examples). In addition, one locus harboured a de novo 15q11 deletion (hg19 chr15:g.(?_19339852)_(20216728_?)del) that is common in the Database of Genomic Variants (see Supplementary Figure 1). All but one of the de novo CNVs were non-recurrent and nonoverlapping in our cohort. For four patients, DNA of only one parent was available, which prevented determination of inheritance of the rare CNV from the missing parent. Haplotype analysis of the locus confirmed that the haplotype present in the patient was not the haplotype of the available parent in three out of four CNVs. Table 1 shows the phenotypes of patients with confirmed de novo CNVs detected in this study, and Table 2 lists the de novo CNVs described in the literature. Most de novo CNVs described here and in the literature are non-recurrent, that is, there are no overlapping loci. The only recurrent affected de novo locus is 7q35q36 (see Figure 3). One de novo CNV (16p13.3 duplication, see Table 1) overlapped two inherited 16p13.3 duplications (see Table 3). We classified the rare CNVs as benign (45), uncertain – likely benign (106) and uncertain (7). Interestingly, we could classify nine CNVs as uncertain – likely pathogenic and two as pathogenic. These putative deleterious CNVs seen in 10 patients (2.6%) are depicted in Table 3. Two of these were confirmed to be de novo, four were inherited from parents without OA and for four CNVs the inheritance pattern is not known.
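The counts reported in this section and in the Figure 1 caption can be cross-checked with simple arithmetic. The sketch below only restates numbers quoted in the text; the variable names are illustrative, and the reading that the 169 rare CNVs consist of the 167 gene-rich CNVs plus the two gene-poor CNVs suspected of being de novo is an interpretation of the caption.

```python
# Classification of the rare CNVs as quoted in the Results.
classification = {
    "benign": 45,
    "uncertain, likely benign": 106,
    "uncertain": 7,
    "uncertain, likely pathogenic": 9,
    "pathogenic": 2,
}
total_rare = sum(classification.values())

# Counts quoted in the Figure 1 caption.
in_house_absent, in_house_once = 257, 5
gene_rich, gene_poor = 167, 95
gene_poor_retained = 2                     # gene-poor CNVs suspected de novo
dispersed, in_overlapping_loci = 140, 29

assert in_house_absent + in_house_once == gene_rich + gene_poor == 262
assert gene_rich + gene_poor_retained == total_rare == 169
assert dispersed + in_overlapping_loci == total_rare

de_novo_share = round(100 * 8 / 74, 1)     # de novo among CNVs followed up
patient_share = round(100 * 6 / 375, 1)    # patients carrying a de novo CNV
print(total_rare, de_novo_share, patient_share)
```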

Table 1 De novo CNVs in this cohort
Figure 3

De novo deletion ranging from chromosomal band 7q35 to 7q36.3. Note the loss (red) in the patient's logR track and the loss of heterozygosity (yellow) in the patient's B-allele frequency (BAF) plot. qPCR/FISH/MAQ assay validation results are shown in Supplementary Figure 1.

Figure 4

De novo duplication on chromosome 8p22. Note the gain (blue dots/arrow) in the patient's logR track and allelic imbalance (purple dots/arrow) in the patient's BAF plot. qPCR/FISH/MAQ assay validation results are shown in Supplementary Figure 1.

Table 2 De novo CNVs in OA/TOF patients described in literature
Table 3 Putative deleterious rare CNVs in this cohort

Discussion

We hypothesized that both de novo and rare overlapping CNVs could predispose to – or modify the phenotype of – OA/TOF patients. These disease-associated CNVs should be below or in the same frequency range as OA/TOF disease prevalence. We identified 169 of these rare CNVs including eight de novo CNVs (nonoverlapping) and 12 loci with overlapping rare gene-rich CNVs. Six patients in our cohort had rare CNVs confirmed to be de novo. The distribution of these de novo CNVs is comparable between isolated and non-isolated OA/TOF patients: two patients with isolated OA/TOF had one de novo CNV each (0.53% of total patient cohort; 1.55% of patients with isolated OA/TOF). Two patients with non-isolated OA/TOF had one de novo CNV each and two had two de novo CNVs (1.06% in total cohort; 1.62% of non-isolated OA patients). All de novo CNVs were non-recurrent in our cohort. However, there is overlap with structural chromosomal anomalies previously described in OA/TOF.29 For instance, the chromosomal anomaly described by Jackson et al.30 (46,XX,-13,+der(18)t(13;18)(q12;p11.2)) overlaps with the 13q12 deletion detected in patient SKZ_1662. Genes in the deleted region may contribute to the OA/TOF aetiology. Unfortunately, little is known about the genes within the region of overlap.
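The percentages for isolated and non-isolated patients follow directly from the cohort counts given in the Results (375 patients, 129 with isolated OA/TOF). The sketch below recomputes them; the helper name `pct_truncated` is illustrative, and truncation to two decimals (rather than rounding) is an assumption chosen because it reproduces the figures quoted above.

```python
COHORT = 375
ISOLATED = 129
NON_ISOLATED = COHORT - ISOLATED          # 246 patients


def pct_truncated(numerator: int, denominator: int, decimals: int = 2) -> float:
    """Percentage truncated (not rounded) to `decimals`, using integer arithmetic."""
    scale = 10 ** decimals
    return (100 * scale * numerator) // denominator / scale


# Patients carrying at least one confirmed de novo CNV.
isolated_carriers = 2                     # one de novo CNV each
non_isolated_carriers = 2 + 2             # two with one CNV, two with two CNVs
de_novo_cnvs = 2 * 1 + 2 * 1 + 2 * 2      # eight de novo CNVs in total

print(pct_truncated(isolated_carriers, COHORT))            # share of total cohort
print(pct_truncated(isolated_carriers, ISOLATED))          # share of isolated patients
print(pct_truncated(non_isolated_carriers, COHORT))        # share of total cohort
print(pct_truncated(non_isolated_carriers, NON_ISOLATED))  # share of non-isolated patients
```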

On the basis of the assumption that a CNV has a high likelihood of being pathogenic if it is not present in cohorts of unaffected individuals as well as absent in both unaffected parents, we could classify two out of eight de novo CNVs as (likely-) pathogenic, that is, those at the loci 16p13.11 and 7q35q36. No tracheoesophageal mouse phenotypes are described for any of the genes affected by de novo CNVs except for two genes in the 7q35q36 locus – Shh and Slc4a2. Shh knockout mice have numerous malformations including TOF, a small stomach, reduction of oesophageal tissue fused with the trachea, anal atresia and duodenal stenosis.31 Slc4a2 knockout mice develop hyperkeratosis in oesophageal and stomach epithelium.32 As the 7q35q36 deletion is very large and contains many genes, other genes could also contribute to the abnormal phenotype seen in this patient. The remaining six de novo CNVs affected loci present in large CNV control cohorts (see supplementary table 2). The total number of patients with rare de novo CNVs is 6 out of 375 (1.6%). The de novo rate of 1.6% is slightly elevated compared with the de novo rate per genome/generation described by Itsara et al.33, 34 They estimate a de novo rate of large CNVs to be 1.2%. However, these include more prevalent CNVs and not a selected subset. In other diseases, de novo CNVs have a much higher impact, for example, congenital diaphragmatic hernia35 or intellectual disability.36 The de novo CNVs seen in this study are nonoverlapping and some of them do not affect genes with clear association to the abnormalities seen in patients. Therefore, the significance of some of these de novo CNVs to the disease remains uncertain.

For instance, one de novo 13q12.11 deletion involves a female patient (SKZ_1662) born out of a twin pregnancy. Her twin died in utero. Unfortunately, neither patient material nor information regarding the observed congenital anomalies or zygosity status of this fetus was available. The female index patient had OA/TOF, tracheal stenosis and a sacral abnormality, and her left kidney was abnormally positioned in the midline. Within the deleted 13q12.11 region one transcribed mRNA (AK054845) and one lncRNA (LINC00540) are located. No biological role or putative function has been described for these RNAs so far. However, one family – with congenital fibrosis of extraocular muscles – was reported having a translocation breakpoint (t(2;13)(q37.3;q12.11)) in this region. Fgf9 knockout mice showed a wide variety of abnormalities, including developmental problems of the skeletal, respiratory and gastrointestinal systems.37 The 13q12.11 de novo deletion observed in our patient lies ~500 kb from FGF9, suggesting that a regulatory region of FGF9 might be affected by the deletion.

Female patient SKZ_1307 has a de novo duplication affecting a long noncoding RNA, LINC00114, on chromosome 21 (chr21:40100880-40154748) confirmed with MAQ assay. LINC00114 is located between V-Ets Avian Erythroblastosis Virus E26 Oncogene Homolog (ERG) and V-Ets Avian Erythroblastosis Virus E26 Oncogene Homolog 2 (ETS2) within the Down's syndrome critical region.19 The girl has OA/TOF and anal stenosis as main additional features. She does not have distinct Down's syndrome facial features or mental retardation. The ERG and ETS2 transcription factors might be regulated by LINC00114. Unfortunately, no mouse orthologue for this region exists.38 ERG and ETS2 are implicated as secondary hits – after an initial truncating GATA1 mutation – in the development of neonatal transient myeloproliferative disease preceding myeloid leukaemia seen in Down's syndrome patients.39 Patients with Down's syndrome have a higher prevalence of several gastrointestinal defects, including OA/TOF.40 This is the first de novo duplication involving only one gene or long non-coding RNA in a patient with OA. Further investigation of the role of LINC00114 in OA/TOF and Down's syndrome patients with intestinal atresia is warranted.

The identified de novo duplication on chromosome 3p26.1 in female patient DE12OSOUKBD100206 with OA and tracheomalacia comprises LMCD1 encoding LIM and cysteine-rich domain protein 1, which acts as a transcriptional cofactor restricting the function of GATA6,41 a protein having an important role in endodermal differentiation.42 Moreover, GATA6 expression has previously been reported to be elevated during the development and progression of Barrett’s oesophagus in squamous epithelial cells.43 Hence, the present finding of a de novo duplication comprising LMCD1 in a patient with OA/TOF is suggestive of its pathogenic involvement in the development of OA/TOF. The importance and biological impact of the other de novo deletions/duplications are uncertain.

Of note, one de novo loss – a common polymorphism – was detected: hg19 chr15:g.(?_19339852)_(20216728_?)del. This CNV was detected during visual inspection of patient and parental SNP arrays for inheritance of other CNVs. This 15q11.2 polymorphism overlapped with a previously described genetic loss implicated in patients with congenital anatomical malformations, including OA/TOF.44 This region is deleted in three more OA patients in our cohort.8 However, its high frequency in unaffected individuals and repetitive nature (eg, many LINE, SINE and other repetitive elements) hampers interpretation and classification of this CNV.

Overlapping rare CNVs

Rare CNVs are proposed to arise after replication errors11 and have such a low population frequency that either they have arisen recently and have no biological meaning or are somehow detrimental and are virtually extinct from the population. Interpretation of these CNVs is difficult. For instance, they can be ancestry-specific.45 Inheritance of a single CNV from a healthy parent is generally a characteristic of a benign CNV. However, absence of distinct abnormalities in parents carrying the same rare CNV could, for instance, be explained by a subclinical phenotype in these parents, variable gene expressivity, incomplete penetrance, skewed X-inactivation and/or mutations elsewhere in the genome.11 Reduced penetrance or variable expressivity of CNVs has been described in patients with OA/TOF. For instance, Faguer et al.46 described differences in expression of a microduplication in patients with the same microduplication, a father with bilateral vesicoureteric reflux and renal hypodysplasia and his child with left multicystic dysplastic kidney with megaureter, vesicoureteric reflux, bladder diverticulae and OA/TOF. Both patients have the same duplication on chromosomal locus 17q12, which includes HNF1B, a gene mutated in one-fifth of patients with dysplastic kidneys.46

The best way to determine whether a CNV is associated with a disease is to perform a formal burden test.47 We were not able to perform this test because of the limited number of patients in a rare disease, and because of technical limitations (use of different array chips). More details are given in the Supplementary Discussion. However, we can look for overlap with CNVs described previously in CNV burden studies and inspect whether OA/TOF has been described in patients with such a CNV. Therefore, we used the CNV burden studies published by Cooper et al.,18 Coe et al.19 and Kaminsky et al.20 as a proxy (developmental delay vs controls) after filtering all common CNVs. These studies included sufficient numbers of patients and controls and found enrichment of a small number of loci in a heterogeneous patient population with developmental delay and/or congenital anomalies. Only the 16p13.3 duplication enriched in patients in these studies was recurrent in our cohort. The largest of the three duplications – seen in patient SKZ_2111 – was de novo. The two other paternally inherited 16p13.3 duplications were present in patients SKZ_1988 and SKZ_1150. Duplications of this region between the NOMO1 and XYLT1 genes have been described previously in patients with various phenotypical anomalies, including the OA/TOF-associated congenital anomalies and anal and cardiac malformations.48 None of the other overlapping rare CNVs found in our cohort (see Supplementary Table 1) was enriched in the developmental delay study.
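To illustrate why sample size limits such testing, the sketch below implements a one-sided Fisher's exact test for enrichment of a single locus in cases versus controls. It is a simplification and not the genome-wide burden test referred to above; the function name is illustrative, and the control count in the example is hypothetical, chosen only to show the calculation.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """P(at least `case_carriers` carriers among cases) if carrier status is
    unrelated to case/control status (hypergeometric upper tail)."""
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    upper = min(carriers, case_total)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(total, case_total)


# Three 16p13.3 duplications in 375 patients (from this cohort) versus a
# HYPOTHETICAL single carrier among 3235 controls.
p_value = fisher_one_sided(3, 375, 1, 3235)
print(f"one-sided p = {p_value:.4f}")
```

With only a handful of carriers per locus, a single additional control carrier changes the p-value considerably, and correcting for the number of loci tested would raise the bar further; this is the practical obstacle described in the text.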

Non-recurrent CNVs seen in our cohort did overlap enriched CNVs in these burden studies or with CNVs published in patient databases. For instance, the 15q13.3 deletion seen in male patient SKZ_0856 overlaps with a known deleterious CNV18 seen in patients with a highly variable phenotype, which include mild to moderate intellectual disability and variable dysmorphic features.49 Other CNVs with overlap in our study are the gain involving FAT1 on 4q35.2 in patient SKZ_1248, the 6p22 deletion in patient SKZ_1856,18 the 2q13 duplication seen in patient DE61OSOUKBD10019719 and 22q11 gain18 seen in female patient SKZ_1780. Interestingly, two additional published EA/TEF patients have a 22q11 duplication overlapping the one seen in patient SKZ_1780. The DECIPHER database contains an inherited gain (chr22:19095778-19928090) described in patient 3771, with TOF, upper respiratory tract abnormality, coloboma, hearing impairment, horseshoe kidney and a right aortic arch with mirror image branching. The second is a paternally inherited duplication in a patient with OA/TOF and ventricular septal defect.50

Rare CNVs could be determinants in secondary phenotypical anomalies and/or serve as a second ‘hit’ tilting the balance from normal to abnormal development. Duplications might be rescue mechanisms in which a normal copy is duplicated to balance out a copy affected by a mutation, resulting in increased gene expression, whereas deletions might worsen an otherwise less severe condition. OA/TOF is a variable feature in several single-gene disorders. Perhaps the prevalence of these disorders is higher than currently diagnosed. Recognizing the phenotypical spectra might be hampered by uncharacteristic phenotypical features in patients carrying both a modifying rare CNV and a gene mutation. It might be worthwhile to screen large OA/TOF patient cohorts retrospectively for mutations in known disease genes. Unfortunately, owing to the large number of genes and non-recurrence of de novo CNVs, it is not feasible to establish their contribution to OA/TOF disease aetiology. Moreover, the lack of availability of OA/TOF patient samples and heterogeneity of the rare CNVs hamper formal burden analysis to prove association. However, the de novo nature of CNVs in patients and the absence of overlapping CNVs in large control cohorts are interesting. Perhaps future CNV profiling or sequencing studies will detect deleterious variation in overlapping genes, paving the way for further single-gene-based functional studies.

Concluding remarks

We hypothesized that de novo and overlapping rare recurrent CNVs could contribute to the disturbed development of the oesophagus. Quantifying CNV prevalence and identity could aid in genetic diagnosis and clinical care selection. We found several de novo and rare overlapping CNVs. Our screening indicated that the prevalence of de novo CNVs in the OA/TOF patient population is 1.6%. On the basis of their function, overlap with loci in published case–control studies, known CNV syndromes and foregut phenotypes in animal models, we suggest SHH and SLC4A2 as contributing factors in a contiguous gene deletion to OA/TOF disease aetiology, and 15q13.3, 16p13.3 and 22q11.2 as candidate susceptibility loci. Together with aneuploidy and structural chromosomal anomalies (~4%) and single base pair mutations (~6%), CNVs (~1–2%) now bring the total genetic contribution to OA/TOF disease aetiology to ~11–12%. Mutation screening using candidate gene approaches, whole-exome or whole-genome sequencing as well as sequencing large patient–parent cohorts – both prospectively and retrospectively – will likely reveal known and new pathogenic DNA variations, increasing the contribution of genetics and our knowledge of OA/TOF disease aetiology.