Validation of CRISPR-based dCas9-DNMT3A-CD targeted MGMT hypermethylation and differential RNA expression via correlation of Illumina EPIC 850K methylation array and RNA-Seq analysis.
a Overview of the Illumina pipeline and generation of supervised hierarchical heatmaps. Raw data from the Illumina array (.idat files) were imported into R and matched to the Illumina annotation manifest by probe. Methylation values by probe were passed through a quality control check, CpG sites with single nucleotide polymorphisms (SNPs) were removed, and the data were normalized. These data were then clustered for methylation state by sample and probe ID, using M-values with high variance (>2.5 SDs), for unsupervised heatmap generation. Probes were further isolated by adjusted p-value < 0.05 (linear fit and eBayes analysis) for supervised heatmap generation. “On Target” and “Off Target” probes were identified by further filtering the supervised differential methylation data for M-values with high variance across samples (>2 SDs), a low control average M-value (<−1), and a high sgRNA average M-value (>−1). The genes associated with these probes were cross-referenced with bulk RNA-Seq differential expression data (evaluated via DESeq2) to determine functionally significant “On Target” and “Off Target” hits. Each step shows a donut plot of the approximate percentage of genes from the total array that emerged from the filter criteria for that step, with hypermethylation shown in red, hypomethylation shown in blue, and no change (under that criterion) shown in gray.
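The M-value thresholds used in panel a can be illustrated with a minimal Python sketch. It assumes the standard definition of the M-value as the log2 ratio of methylated to unmethylated signal, M = log2(beta / (1 − beta)), under which an M-value of −1 corresponds to a beta value of about 0.33. The function names and the clipping constant are hypothetical and are not taken from the authors' R pipeline.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a beta value (0..1) to an M-value, log2(beta / (1 - beta)).

    eps is an illustrative clipping constant that avoids division by zero.
    """
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))

def is_gain_candidate(control_m, sgrna_m, threshold=-1.0):
    """True if the control average M-value is below the threshold and the
    sgRNA average M-value is above it (the low-control / high-sgRNA rule)."""
    control_avg = sum(control_m) / len(control_m)
    sgrna_avg = sum(sgrna_m) / len(sgrna_m)
    return control_avg < threshold and sgrna_avg > threshold

# Illustrative values only, not data from the study.
print(beta_to_m(0.5))
print(is_gain_candidate([-3.0, -2.5], [0.5, 1.0]))
```

This covers only the average M-value rule; the variance and adjusted p-value filters described above would be applied alongside it.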
b Unsupervised hierarchical clustering of M-values by Illumina probe (rows) and LN18 cell treatment (columns). M-value variance (standard deviation) across cell type for each probe was calculated for the entire Illumina 850K array, and probes with the highest level of variance (2.5 SDs > average M-value SD; N = 21,278) were isolated and plotted as a heatmap.
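One way to read the variance criterion in panel b is sketched below in Python. It assumes that "2.5 SDs > average M-value SD" means a probe's across-sample SD exceeds the mean of all per-probe SDs by more than 2.5 standard deviations of that SD distribution, and it uses the sample standard deviation; both are assumptions, and the function name and probe IDs are hypothetical.

```python
import statistics

def high_variance_probes(m_values, k=2.5):
    """Return probe IDs whose across-sample SD exceeds mean(SD) + k * sd(SD).

    m_values maps a probe ID to its list of M-values, one per sample.
    """
    probe_sd = {p: statistics.stdev(v) for p, v in m_values.items()}
    sds = list(probe_sd.values())
    cutoff = statistics.mean(sds) + k * statistics.stdev(sds)
    return [p for p, sd in probe_sd.items() if sd > cutoff]

# Toy input: eleven flat probes and one probe that changes between samples.
toy = {f"probe_{i}": [-3.0, -3.0, -3.0, -3.0] for i in range(11)}
toy["probe_variable"] = [-3.0, -3.0, 2.0, 2.0]
print(high_variance_probes(toy))
```

The selected probes would then be passed to hierarchical clustering for the heatmap.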
Definition of nomenclature: (1), (2) = replicates; t0 = baseline harvest time point (approximately 2 weeks after the final lentiviral transduction, in this case GFP-sgRNA or GFP-scRNA transduction); t2 = harvested 2 months after t0; a = samples run on the first array batch; b = samples run on the second array batch, subsequent to the first. (We performed two separate arrays at different times, distinguished here by a and b.)
c Raw M-value distributions for all Illumina probes and cell samples. Control LN18 scRNA samples are shown by the blue traces (scRNA (1) t0 a, scRNA (1) t2 a, scRNA (1) t0 b), while the LN18 sgRNA samples are shown by the magenta traces (sgRNA (1) t0 a, sgRNA (1) t2 a, sgRNA (1) t0 b) and orange traces (sgRNA (2) t0 a, sgRNA (2) t2 a, sgRNA (2) t0 b). The approximate cutoff point for the first peak and “low methylation” threshold is indicated by the vertical gray line (−1).
d Average trace of all control LN18 NSC samples, with the average M-value across all probes indicated by the vertical gray line. Vertical blue lines represent M-value standard deviations of varying degrees above and below this average.
e Distribution of M-value variance for all Illumina probes across all samples, with summary statistics similar to panel d superimposed. These M-value distribution plots were used for establishing thresholds for determining large increases in methylation state between control LN18 scRNA and sgRNA samples.
f Following validation via unsupervised hierarchical clustering and generation of an initial supervised heatmap (refer to pipeline in a and Methods), we applied additional filtering for hierarchical clustering of all CpG island probes found to be differentially methylated according to the following criteria: (1) p-adj < 0.05, (2) location in a CpG island region, (3) an increase in methylation M-value from control LN18 scRNA to sgRNA cells greater than 2 standard deviations above the average M-value variance (SD) in all cells/probes, and (4) an average control LN18 scRNA methylation M-value of less than −1. Under these criteria, three probes within the MGMT gene were identified: cg12434587 (open star), cg01341123 (half-closed star), and cg12981137 (closed star), all of which were near sgRNA loci (see Fig. 1).
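The four criteria in panel f can be expressed as a single predicate, sketched here in Python. The record fields, the function name, and the numeric `delta_cutoff` are hypothetical; in particular, the cutoff for criterion (3) would have to be derived from the M-value variance distribution in panel e and is passed in as a parameter rather than computed.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    probe_id: str
    adj_p: float            # adjusted p-value from the differential methylation fit
    in_cpg_island: bool     # annotation flag for CpG island location
    control_mean_m: float   # average M-value in control scRNA samples
    sgrna_mean_m: float     # average M-value in sgRNA samples

def passes_panel_f_criteria(p, delta_cutoff, alpha=0.05, low_m=-1.0):
    """Apply criteria (1) to (4); delta_cutoff is an assumed, precomputed threshold."""
    return (
        p.adj_p < alpha                                        # (1) p-adj < 0.05
        and p.in_cpg_island                                    # (2) CpG island probe
        and (p.sgrna_mean_m - p.control_mean_m) > delta_cutoff # (3) large gain
        and p.control_mean_m < low_m                           # (4) low in control
    )

# Illustrative records with placeholder IDs and values, not study data.
examples = [
    ProbeResult("probe_A", 0.001, True, -3.0, 1.5),
    ProbeResult("probe_B", 0.001, False, -3.0, 1.5),
    ProbeResult("probe_C", 0.200, True, -3.0, 1.5),
    ProbeResult("probe_D", 0.001, True, -0.5, 3.0),
]
kept = [p.probe_id for p in examples if passes_panel_f_criteria(p, delta_cutoff=2.0)]
print(kept)
```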
g From the probes surveyed in a, 333 unique genes were identified and intersected with DESeq2 differential expression bulk RNA-Seq data (Wald test, p-adj < 0.05). Genes found in both data sets were deemed to be “On-Target and Off-Target” effects, which included MGMT and nine other genes (refer to Results main text for details).
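The intersection step in panel g amounts to a set operation between the methylation-derived gene list and the significantly differentially expressed genes. A minimal Python sketch follows, assuming the DESeq2 results have been exported as a mapping from gene symbol to adjusted p-value; missing adjusted p-values (which DESeq2 can report as NA) are represented here as `None`. The function name, gene symbols other than MGMT, and all p-values are placeholders.

```python
def intersect_hits(methylation_genes, deseq2_padj, alpha=0.05):
    """Return sorted genes present in both the methylation gene set and the
    set of genes with adjusted p-value below alpha in the expression data."""
    significant = {
        gene for gene, padj in deseq2_padj.items()
        if padj is not None and padj < alpha
    }
    return sorted(set(methylation_genes) & significant)

# Placeholder inputs, not results from the study.
methylation_genes = {"MGMT", "GENE_A", "GENE_B"}
deseq2_padj = {"MGMT": 0.001, "GENE_A": 0.2, "GENE_B": None, "GENE_C": 0.01}
print(intersect_hits(methylation_genes, deseq2_padj))
```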