Sequencing platforms differ in read length, depth and error profile (Additional file 1: Tables S1-2). 454 sequencing uses emulsion PCR and pyrosequencing and can produce reads of over 800 bp [17], and therefore has the capacity to sequence a full BCR amplicon in a single read. However, the 454 platform has high error rates in homopolymeric regions, caused by accumulated light intensity variance [16],[18]–[20]. The Illumina MiSeq has the highest throughput per run (1.6 Gb of sequence/run, 60 Mb/hour) [17] and a lower overall error rate, particularly in homopolymeric regions [21]. However, MiSeq has its own distinct error profile, with single-base errors associated with GGC motifs [22] and more frequent at the 3’ end of reads than at the 5’ end. MiSeq can currently generate paired-end reads of up to 300 bp, which allows paired-end joining and full coverage of multiplex PCR amplicons. We compared the sequencing technologies by taking two portions of RNA from 8 CLL and 6 healthy PB samples and performing PCR followed by either 454 or MiSeq (250 bp paired-end) sequencing (Figure
1A, sequencing comparison). The IgHV frequencies obtained with the two sequencing methods were highly correlated (R² = 0.9844, y = 0.998x, Figure
1F). As the correlation might be skewed by the very high clonality of the CLL samples, we also assessed the correlation for genes used at low frequency (gene frequencies <15%, representing typical observations from diverse B-cell samples; Figure 1G, R² = 0.5885). The greater variation at low frequencies suggests effects of both stochastic re-sampling and platform-specific differences. The individual BCR sequence frequencies were also highly correlated (Additional file
1: Figure S3A), suggesting that repertoire structure is retained when the same amplification method is used on different sequencing platforms. However, owing to its lower homopolymeric indel rate, only the MiSeq platform is currently appropriate for filtering read sets for open reading frames (and subsequent translation into protein sequence). MiSeq also has the advantage of a higher sequencing depth per lane, allowing higher levels of sample multiplexing and reducing the per-sample cost.
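The platform comparison described above can be illustrated with a minimal Python sketch. It is not the analysis code used for Figure 1F-G: the read counts are made up, the function names are hypothetical, and it assumes that R² is the squared Pearson correlation, that the fitted line is constrained through the origin (as the form y = 0.998x suggests), and that a gene counts as low frequency when it is below 15% on both platforms.

```python
def to_percent(counts):
    """Convert raw read counts per IgHV gene into percentages of the sample."""
    total = sum(counts.values())
    return {gene: 100.0 * n / total for gene, n in counts.items()}


def paired_frequencies(freq_a, freq_b):
    """Pair frequencies gene by gene; a gene missing on one platform counts as 0."""
    genes = sorted(set(freq_a) | set(freq_b))
    return [(freq_a.get(g, 0.0), freq_b.get(g, 0.0)) for g in genes]


def r_squared(pairs):
    """Squared Pearson correlation coefficient of the (x, y) pairs (assumed definition)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy ** 2 / (sxx * syy)


def slope_through_origin(pairs):
    """Least-squares slope m of y = m*x with the intercept fixed at zero."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


def low_frequency(pairs, threshold=15.0):
    """Keep only genes below the threshold (in %) on both platforms (assumed rule)."""
    return [(x, y) for x, y in pairs if x < threshold and y < threshold]


# Made-up counts for a single highly clonal sample, for illustration only.
counts_454 = {"IGHV1-69": 900, "IGHV3-23": 50, "IGHV4-34": 30, "IGHV3-7": 20}
counts_miseq = {"IGHV1-69": 9100, "IGHV3-23": 400, "IGHV4-34": 350, "IGHV3-7": 150}

pairs = paired_frequencies(to_percent(counts_454), to_percent(counts_miseq))
print("all genes: R^2 =", round(r_squared(pairs), 4),
      "slope =", round(slope_through_origin(pairs), 3))
print("genes <15%: R^2 =", round(r_squared(low_frequency(pairs)), 4))
```

In this toy sample the dominant clonal gene is a high-leverage point in the all-genes fit, which is why the comparison is repeated on the low-frequency subset.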
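To show why homopolymeric indels matter for open reading frame filtering, the following is a minimal Python sketch of such a filter. It is an illustration under stated assumptions rather than the pipeline used in this study: the function names are hypothetical, the toy sequences are invented, and each read is assumed to be trimmed so that it starts on the first base of a codon and spans a whole number of codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_reading_frame(seq, frame=0):
    """Return True if seq, read from the given frame offset, is a whole
    number of codons and contains no in-frame stop codon."""
    coding = seq.upper()[frame:]
    if len(coding) == 0 or len(coding) % 3 != 0:
        return False  # empty or frame-shifted, e.g. by a single-base indel
    codons = (coding[i:i + 3] for i in range(0, len(coding), 3))
    return all(codon not in STOP_CODONS for codon in codons)


def filter_open_reading_frames(reads, frame=0):
    """Keep only the reads that pass the open reading frame check."""
    return [read for read in reads if has_open_reading_frame(read, frame)]


# Invented toy reads, for illustration only.
intact = "ATGGCCAAAGGGTTT"          # five codons, no stop codon
extra_a = "ATGGCCAAAAGGGTTT"        # one extra A in the AAA homopolymer
with_stop = "ATGTAAGCC"             # in-frame TAA stop codon

kept = filter_open_reading_frames([intact, extra_a, with_stop])
print(kept)
```

Under these assumptions, a single inserted or deleted base in a homopolymer run is enough to shift the frame and discard an otherwise valid read, so a platform with a high homopolymeric indel rate would lose many genuine sequences at this step.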