Background
Tumors are composed of a diverse multicellular microenvironment that dictates cancer progression and response to therapy. While cells share an identical genome, their phenotype and behavior are driven by their transcriptome and proteome [1]. Cellular heterogeneity within the tumor ecosystem has precluded the ability to fully understand the cell biology and interactions that drive cancer progression [1]. Recently, single-cell RNA-seq (scRNA-seq) has emerged as an innovative technology to characterize individual cells from heterogeneous tissues in order to understand cell types, states, and lineages [2]. The rapid adoption of this technology has led to a flurry of research generating single-cell atlases for many organs, cancers, and developmental models, enriching our understanding of cell biology [3].
Despite the tremendous success of this technology when applied to different cancer types, sarcomas, which are cancers of mesenchymal origin, have not yet widely benefited from the adoption of scRNA-seq. Differences in tissue origin may require optimized dissociation to accurately capture in vivo gene expression and cellular composition. Further, the enzymatic and mechanical methods used to dissociate cells are known to bias cellular composition and reduce cellular quality. Many gold standard dissociation protocols require extended incubation at 37 °C, a temperature at which cellular transcription is still active and gene expression artifacts may be introduced [4]. Cold-active protease is a recent alternative to dissociation at 37 °C that may minimize transcriptional activity and environmental stresses on cells [4, 5].
Challenges in obtaining fresh clinical specimens and the logistical difficulty of processing specimens immediately have also hindered workflows [6]. While cancer models for sarcoma, including cell lines, xenografts, and patient-derived xenografts (PDXs), are readily accessible for scRNA-seq, the extent to which these models represent the original cancer specimen has not yet been adequately evaluated. Single-nucleus RNA-seq (snRNA-seq) of accessible frozen tissue has demonstrated concordance with scRNA-seq [6–10]. SnRNA-seq can remove the need for fresh tissue and immediate processing by enabling access to archival tissue, and it eases the coordination of tissue acquisition by allowing sequencing of snap-frozen tissue. Furthermore, difficulties with cell fragility or size in scRNA-seq can be circumvented using snRNA-seq.
The biases introduced by different methods have been studied by comparing single-cell with single-nucleus profiling, and dissociation using cold-active proteases with standard digestion at 37 °C [4]. However, these studies did not include sarcoma specimens, which differ significantly from epithelial tissues and carcinomas in their expression, not only by lineage but also in integrins and cell–cell adhesions [11, 12]. To fully realize the potential of scRNA-seq and snRNA-seq in three of the fifty or more unique sarcoma subtypes, we systematically assessed the effect of temperature on enzymatic dissociation of fresh tissue and, secondarily, studied whether snRNA-seq maintains key transcriptomic profiles determined using scRNA-seq. We focused our analysis on well-controlled PDX specimens of different and rare sarcomas to ensure sample accessibility, since fresh sarcoma specimens are difficult to acquire. This further enabled our group to explore multiple dissociation methods on the same sample.
Though more than fifty distinct sarcoma subtypes exist, our work takes an essential step in laying out the technical and analytical framework needed for scRNA-seq and snRNA-seq analysis of osteosarcoma (OS), Ewing sarcoma (ES), and desmoplastic small round cell tumor (DSRCT), three highly aggressive sarcoma subtypes that affect adolescents and young adults. Our work highlights notable method-dependent biases, as well as computational tools that can remove them when rare archival frozen samples are assessed by snRNA-seq.
Methods
Collection of fresh tissue for scRNA-seq
All experiments were conducted per protocols and conditions approved by the University of Texas MD Anderson Cancer Center (MDACC; Houston, TX) Institutional Animal Care and Use Committee (eACUF Protocols #00000712-RN02). Male NOD (SCID)-IL-2Rg null mice (The Jackson Laboratory; Farmington, CT) were subcutaneously injected with PDX explants (2 mm) to generate xenografts. All mice were maintained under barrier conditions and treated according to these approved protocols. SA98 (full id: MDA-SA98-TIS02), OS1, and OS31 are PDX lines maintained by the Pediatric Solid Tumors Comprehensive Data Resource Core [13]. DSRCT and ES PDX lines were generated from the Sarcoma Tissue Bank at MD Anderson Cancer Center and maintained by the Ludwig lab. Once tumors reached a volume of 150 mm³, they were explanted and a portion was flash-frozen for snRNA-seq, while the remainder underwent dissociation.
Dissociation workflow from fresh solid tumor samples
Samples were collected and immediately placed into MACS® Tissue Storage Solution (Miltenyi Biotec) and kept on ice during transport. On arrival at the laboratory, samples were minced using a scalpel into fragments < 4 mm under aseptic conditions. Next, samples were evenly split for either warm or cold enzymatic dissociation.
For warm dissociation of ES and DSRCT PDX specimens, the human Tumor Dissociation Kit (Miltenyi Biotec) was used. The dissociation was performed according to the manufacturer’s protocol using the gentleMACS™ Dissociator (Miltenyi Biotec), a benchtop instrument for the semi-automated dissociation of tissues into single-cell suspensions. The gentleMACS program sequence followed the suggestion for the ‘Soft’ tumor type. Briefly, tissue pieces were placed in gentleMACS™ C Tubes containing the enzyme mix. The C Tubes were then placed onto the gentleMACS™ Dissociator, and the program ‘h_tumor_01’ was run, followed by a 30-min incubation at 37 °C with rotation using the MACSmix™ Tube Rotator. Afterward, the C Tubes were returned to the gentleMACS™ Dissociator and the ‘h_tumor_02’ program was run, followed by a second 30-min incubation at 37 °C with rotation. Finally, the C Tubes were placed onto the gentleMACS™ Dissociator and the ‘h_tumor_03’ program was run. Following completion of the program, 2 × volume of media was added to the samples. This was followed by filtration through a MACS SmartStrainer (70 μm, Miltenyi Biotec) and centrifugation at 300 g for 5 min. Cells were resuspended in 90% FBS and 10% DMSO at a concentration of 1 million cells per mL and placed in a Thermo Scientific™ Mr. Frosty™ Freezing Container in a -80 °C freezer.
For warm dissociation of OS PDX specimens, tissue was minced into < 4 mm pieces with a sterile scalpel or scissors. The tissue was washed several times with Hank’s Balanced Salt Solution (HBSS). The HBSS was then aspirated, and dissociation buffer (HBSS, 1 mg/mL collagenase, 3 mM CaCl2, 1 μg/mL DNase) was added to submerge the tissue. The tissue was then incubated at 37 °C for up to 12 h. The cell suspension was filtered using a 40 μm cell strainer, and the filtrate was pelleted by centrifugation at 400 g for 5 min. Cells were resuspended in freezing medium and placed in a Thermo Scientific™ Mr. Frosty™ Freezing Container in a -80 °C freezer.
For cold dissociation, the protocol was adapted from Adam et al. [5]. Cold protease solution was prepared from 5 mM CaCl2, 10 mg/mL B. licheniformis protease, and 125 U/mL DNase I in 1 × PBS. Tissue was minced using a scalpel into fragments under 0.5 mm. Pieces were placed in a MACS C-tube, and 5 mL of ice-cold protease solution was added. The samples were incubated for 10 min at 4 °C with rocking. This was followed by placing the tubes in a gentleMACS™ Dissociator (Miltenyi Biotec) and running the m_brain_03 program twice. Afterward, the samples were centrifuged at 300 g for 5 min and resuspended in 3 mL of trypsin–EDTA for 1 min at room temperature. The trypsin–EDTA was then neutralized using ice-cold 10% FBS in 1 × PBS, and the suspension was triturated. This was followed by filtration through a MACS SmartStrainer (70 μm, Miltenyi Biotec) and centrifugation at 300 g for 5 min. Cells were resuspended in freezing medium at a concentration of 1 million cells per mL and placed in a Thermo Scientific™ Mr. Frosty™ Freezing Container in a -80 °C freezer. Cryovials were moved to LN2 for long-term storage.
Thawing cryopreserved cells
The cells were removed from LN2 storage, or from the -80 °C freezer if they were recently cryopreserved, and placed into a 37 °C water bath for 3 min. The contents were then transferred to a 15 mL centrifuge tube. The cryovial was washed with 1 mL of complete medium, which was added dropwise to the centrifuge tube. Next, 8 mL of complete medium was added dropwise to reduce osmotic shock. Cells were then centrifuged at 300 g for 5 min and resuspended in 1 × PBS supplemented with 0.04% BSA. This was followed by live cell enrichment using FACS, for which single-cell suspensions were stained with Calcein AM live cell stain and SYTOX™ Red dead cell stain.
Nuclei isolation workflow
The protocol was adapted from Habib et al. [
9]. We isolated nuclei from fresh-frozen tissue using the Nuclei EZ Prep Kit (Sigma-Aldrich). Fresh-frozen tissue specimens were cut into pieces < 5 mm over dry ice and then placed in 0.5 mL ice-cold EZ lysis buffer. This was followed by homogenizing using a Chemglass Life Sciences Supplier BioVortexer Mixer (Fisher Scientific) attached with a plastic microcentrifuge pestle on ice. Then 1 mL of ice-cold EZ lysis buffer was added, and samples were incubated on ice for 5 min. Debris was filtered out using a pluriStrainer Mini 70 μm into a new tube. This was followed by centrifugation at 500 g for 5 min. Samples were then incubated with 1 mL of ice-cold EZ lysis buffer on ice for 5 min, followed by centrifugation. Afterward, the supernatant was aspirated, and 0.5 mL of Nuclei Wash and Resuspension Buffer (NWRB, 1X PBS supplemented with 1.0% BSA and 0.2U/μl RNase Inhibitor) was carefully added without disrupting the pellet, which was followed by 5 min of incubation. Next, we added 0.5 mL of NWRB and centrifuged at 500 g for 5 min. We repeated the wash and incubation once more, followed by centrifugation. The supernatant was aspirated, and the nuclei were resuspended in NWRB. A portion was visualized with Trypan blue under the microscope to inspect for debris and nuclei integrity.
To sort nuclei, single-nucleus suspensions were stained with 7-AAD in NWRB for 5 min on ice. Then a BD cell sorter was used to sort up to 100,000 7-AAD positive events. Quality control of post-sort nuclei concentration was evaluated under a microscope to ensure adequate count. This was followed by loading nuclei onto a 10x chip.
Library preparation and sequencing
We followed the standard protocol set by 10x Genomics for single-cell/single-nucleus capture. Single cells or single nuclei were loaded onto each channel of a Chromium single-cell 3’ Chip with a targeted capture of 5000 per channel. The single cells and single nuclei were partitioned using the gel beads within the Chromium Controller. Afterward, we performed cDNA amplification and fragmentation. This was followed by index and adapter attachment. Samples were pooled and sequenced on a NovaSeq 6000 with a targeted sequencing depth of 100,000 reads per cell or nucleus.
sc/snRNA-seq data preprocessing
We used Cell Ranger mkfastq to generate demultiplexed FASTQ files. Reads were aligned to the human GRCh38 genome, and reads were then quantified as UMIs by Cell Ranger count. For snRNA-seq, reads were mapped with both introns and exons in Cell Ranger 5.0 using the include-introns option for counting intronic reads [10].
We performed QC and normalization separately for each sarcoma PDX. We followed the guidelines for QC from OSCA and others [14]. We inspected UMIs, gene counts, and the percentage of mitochondrial genes and identified outliers based on median absolute deviation (MAD). We used a strict threshold of 2 or more MADs from the median while also applying generic cut-offs. Cells that did not meet the criteria were removed from the analysis. Scrublet was used to predict doublets within the data [15]. Although doublets were flagged, they did not form a distinct cluster, which would have been evident as an artifact, so no cells were removed on this basis. The numbers of cells analyzed pre- and post-quality control are listed in Table S1.
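The MAD-based outlier rule described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function name `mad_outliers`, the example metrics, the log-transform of UMI counts, and the 1.4826 scaling constant (a common convention that makes the MAD comparable to a standard deviation) are all assumptions made for illustration.

```python
import numpy as np

def mad_outliers(values, n_mads=2.0, direction="both"):
    """Flag values lying more than n_mads scaled MADs from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # Scaled MAD; the 1.4826 factor is a convention assumed here.
    mad = 1.4826 * np.median(np.abs(values - median))
    lower = median - n_mads * mad
    upper = median + n_mads * mad
    if direction == "lower":
        return values < lower
    if direction == "higher":
        return values > upper
    return (values < lower) | (values > upper)

# Hypothetical per-cell QC metrics (eight cells).
umi_counts = np.array([5200, 4800, 5100, 300, 5000, 4900, 5300, 5050])
pct_mito = np.array([4.0, 5.0, 3.5, 4.5, 38.0, 4.2, 5.1, 3.9])

# Low library size and high mitochondrial fraction both suggest poor quality.
low_umi = mad_outliers(np.log10(umi_counts), direction="lower")
high_mito = mad_outliers(pct_mito, direction="higher")
keep = ~(low_umi | high_mito)

print(keep)
```

In this toy example the cell with 300 UMIs and the cell with 38% mitochondrial reads are the only ones flagged; a generic fixed cut-off could be combined with the same mask using a logical OR.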
Data normalization, dimensional reduction, and comparisons
Seurat v3 was used for sample normalization, dimensional reduction, scaling, and differential expression analysis [16]. We used the Wilcoxon test to compare gene expression between protocols. Enrichr was used for pathway enrichment. We set a log2 fold change threshold of log2(1.5) or greater, which retains genes whose expression is at least 50% greater than the baseline. The AddModuleScore function in Seurat v3 was used to calculate the average expression of each gene set. We used curated gene sets of a warm dissociation signature from O’Flanagan et al. [4], EWS-FLI1 gene targets [17], EWS-WT1 gene targets [18], and osteoblastic and chondroblastic signatures classically associated with the tissue origin of OS (Table S2). The osteoblastic and chondroblastic signatures were obtained from Harmonizome (https://maayanlab.cloud/Harmonizome/). The osteoblastic signature was taken from the GeneRIF Biological Term Annotations under ‘Osteoblastic’, and the chondroblastic signature from the TISSUES Text-mining Tissue Protein Expression Evidence Scores under ‘Chondroblasts’. To find conserved markers between dissociation methods, we used the function FindConservedMarkers in Seurat v3. We performed integration using the integration functions within Seurat v3. The datasets were integrated by dissociation protocol.
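To make the fold change threshold and the idea of a gene set score concrete, the following is a simplified sketch and not a reimplementation of Seurat's AddModuleScore. It assumes the general approach of subtracting the mean expression of expression-matched control genes from the mean expression of the gene set; the function names, bin count, control count, and toy matrix are hypothetical.

```python
import numpy as np

LOG2FC_THRESHOLD = np.log2(1.5)  # about 0.585, i.e. a 1.5-fold change

def passes_fold_change(mean_a, mean_b):
    """True when group A is at least 1.5-fold higher than group B."""
    return np.log2(mean_a / mean_b) >= LOG2FC_THRESHOLD

def module_score(expr, gene_names, gene_set, n_bins=2, n_ctrl=2, seed=0):
    """Simplified gene set score for a cells x genes expression matrix.

    Score = mean expression of the gene set minus mean expression of
    control genes drawn from the same average-expression bin.
    """
    rng = np.random.default_rng(seed)
    gene_names = np.asarray(gene_names)
    in_set = np.isin(gene_names, gene_set)
    # Rank genes by average expression and cut the ranks into equal bins.
    ranks = np.argsort(np.argsort(expr.mean(axis=0)))
    bins = ranks * n_bins // len(gene_names)
    ctrl_idx = []
    for g in np.flatnonzero(in_set):
        candidates = np.flatnonzero((bins == bins[g]) & ~in_set)
        if candidates.size:
            n_pick = min(n_ctrl, candidates.size)
            ctrl_idx.extend(rng.choice(candidates, size=n_pick, replace=False))
    ctrl_idx = np.unique(np.array(ctrl_idx, dtype=int))
    set_mean = expr[:, in_set].mean(axis=1)
    if ctrl_idx.size == 0:
        return set_mean  # no matched controls available
    return set_mean - expr[:, ctrl_idx].mean(axis=1)

# Toy data: 6 cells x 8 genes; cells 0-2 express the signature (g0, g1).
gene_names = [f"g{i}" for i in range(8)]
expr = np.ones((6, 8))
expr[:3, :2] = 3.0

scores = module_score(expr, gene_names, ["g0", "g1"])
print(scores)
print(passes_fold_change(3.2, 2.0), passes_fold_change(2.8, 2.0))
```

Cells carrying the signature receive a higher score than the remaining cells, which is the property used when comparing signatures such as the warm dissociation gene set between protocols.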
Predicting sample type by bias scores
To classify nuclei and cells using the length bias and warm dissociation scores, data sets were randomly split into a training and test set. To prevent data leakage, scaled data was not used. We then calculated the gene set scores separately on the training and test sets. A logistic regression model was fit to the training set on either the warm dissociation or length bias score to predict for cells and nuclei, respectively. We calculated the probabilities and the area under the curve using the pROC v1.18.0 package. This was compared to a random gene signature equal in the number of genes of either length bias or warm dissociation gene sets.
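The classification step can be sketched as follows, assuming a single per-cell bias score as the only predictor. The study used R and pROC; this NumPy version is only an illustration on simulated scores, and the helper names (`fit_logistic_1d`, `predict_proba`, `roc_auc`), the simulated effect size, and the 70/30 split are assumptions.

```python
import numpy as np

def fit_logistic_1d(score, label, n_iter=25):
    """Fit P(label = 1) = sigmoid(b0 + b1 * score) by Newton's method."""
    X = np.column_stack([np.ones(len(score)), score])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        gradient = X.T @ (label - p)
        # Tiny ridge term keeps the Hessian invertible.
        hessian = (X * (p * (1 - p))[:, None]).T @ X + 1e-8 * np.eye(2)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def predict_proba(beta, score):
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * score)))

def roc_auc(label, prob):
    """Area under the ROC curve via the rank-sum formula (ties ignored)."""
    order = np.argsort(prob)
    ranks = np.empty(len(prob))
    ranks[order] = np.arange(1, len(prob) + 1)
    n_pos = int(np.sum(label == 1))
    n_neg = len(label) - n_pos
    return (ranks[label == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Simulated data: 200 cells (label 0) and 200 nuclei (label 1).
rng = np.random.default_rng(0)
is_nucleus = np.repeat([0, 1], 200)
length_bias_score = rng.normal(loc=1.5 * is_nucleus, scale=1.0)  # informative
random_score = rng.normal(size=400)                              # uninformative

# Random split into training (70%) and test (30%) sets.
shuffled = rng.permutation(400)
train, test = shuffled[:280], shuffled[280:]

def held_out_auc(score):
    beta = fit_logistic_1d(score[train], is_nucleus[train])
    return roc_auc(is_nucleus[test], predict_proba(beta, score[test]))

auc_length = held_out_auc(length_bias_score)
auc_random = held_out_auc(random_score)
print(round(auc_length, 3), round(auc_random, 3))
```

The comparison against a score built from a random gene signature plays the role of a baseline: an informative bias score should yield a held-out AUC well above the roughly 0.5 expected from an uninformative one.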
Statistical analyses
Results reported as boxplots display the data distributions (centerline: median; box limits: first and third quartiles; whiskers: the highest and lowest values within 1.5 × the interquartile range of the box limits) as specified in the Figure Legends. Numerical values are reported as mean ± SEM. One-way ANOVA and the Wilcoxon rank-sum test were performed using the R packages ggpubr and stats. A p-value of less than 0.05 was considered statistically significant.
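The boxplot elements and the mean ± SEM summary described above can be written out explicitly. The sketch below is illustrative only: the function names and example values are hypothetical, and linear-interpolation quartiles (NumPy's default) are assumed.

```python
import numpy as np

def boxplot_stats(values):
    """Median, quartiles, and whisker ends using the 1.5 x IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers extend to the most extreme data points inside the fences.
    whisker_low = values[values >= q1 - 1.5 * iqr].min()
    whisker_high = values[values <= q3 + 1.5 * iqr].max()
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": whisker_low, "whisker_high": whisker_high}

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)

# Hypothetical measurements; 30 lies beyond the upper fence.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
mean, sem = mean_sem([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
print(mean, sem)
```

Points beyond the whiskers, such as the value 30 here, are the ones typically drawn individually as outliers.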
Discussion
The advent of single-cell transcriptomic profiling has revolutionized the ability to decipher gene expression in a way that would have been otherwise unimaginable just a decade ago. Of significant value for cancer research is the opportunity to measure the cellular composition of each tumor, as well as the individual states and phenotypes of individual cancer cells that would have otherwise been obscured with whole-tumor RNA-seq approaches. Accurate interpretation of the results, however, requires a keen appreciation for the technical and computational biases introduced by the chosen methods for tissue handling, cell dissociation, and cell or nuclei preservation.
In this work, we sought to elucidate the inherent biases of different dissociation protocols on the transcriptome of sarcomas, focusing initially on three subtypes that predominantly affect children and young adults. To avoid consuming scarce clinical research specimens, we limited our research scope to early-generation sarcoma PDXs, which maintain close fidelity to the OS, ES, or DSRCT patients from whom they were derived. The choice to use PDXs, rather than human tumors, also stemmed from our ability to tightly regulate how tumors were collected, stored, and processed. Further, the PDX tissues afforded an opportunity to receive fresh and snap frozen tissue simultaneously from the same tumor to avoid temporal biases. In contrast, the human tumors, as they exist in our institution, were collected months or years apart, often at different points in each patient’s treatment course, and typically snap frozen or formalin-fixed and paraffin-embedded without gathering a fresh tissue comparator.
Our work builds upon prior studies in normal tissues and carcinomas that have analyzed the protocol-dependent biases of scRNA-seq and snRNA-seq [4–6, 14, 16, 22]. Consistent with prior studies, enzymatic digestion at 37 °C invoked a marked stress response, manifested by upregulation of immediate early genes (IEGs), such as FOS, JUN, and MYC [4]. As expected, this stress response was minimized in the Cold protocol and almost absent in the Nuclei dissociation.
Interestingly, because many sarcoma subtypes are caused by chromosomal translocations that produce pathognomonic fusion proteins, we had the opportunity to determine whether protocol-specific technical biases interfered with the downstream target gene signatures induced by EWS-FLI1 or EWS-WT1 in ES and DSRCT, respectively. Although we hypothesized that a stress response could affect the expression of EWS-FLI1 target genes, we observed that snRNA-seq in fact had a significantly greater impact, possibly due to enrichment for genes with longer transcripts. This unexpected bias towards longer transcripts resulted in an EWS-FLI1 target gene set that was overexpressed in samples assessed by snRNA-seq relative to scRNA-seq.
As to why the EWS-FLI1 target genes contain an overabundance of long genes, we explored a few possibilities. The EWS-FLI1 transcription factor is known to bind to GGAA microsatellite repeats of 9 or more [23, 24]. This binding may be influenced by transcript length, similar to the increase in polyA regions with increasing length. However, many of the microsatellite repeats that enable EWS-FLI1 binding were found within the first intron or the promoter region, although they may also be located as far as 1 Mb upstream of the transcription start sites [24]. A more likely explanation may be found in the broader analysis of long genes. A review of the effect of gene length found positive correlations with intron number, protein size, and SNPs [25]. Remarkably, gene length is also associated with cancer, heart diseases, and neuronal development [25, 26]. Given that a portion of EWS-FLI1 targets are known to be neural genes, we can speculate that some of the long genes in the EWS-FLI1 gene set are neural related [27].
Overall, special care must be taken when comparing data between whole cells and nuclei. To remove the technical bias introduced by snRNA-seq, we generated a length bias signature using genes with long transcripts. Others have shown that technical biases or batch-to-batch effects can be regressed from snRNA-seq or scRNA-seq data [22]. Regression of the length bias from the snRNA-seq data can produce results comparable to scRNA-seq [21]. However, comparisons of whole-cell and nucleus transcriptomes between specimens of different tissue origins or diseases should be interpreted with caution. As noted already, gene length is associated with cancer, heart diseases, and neuronal development and correlated with SNPs [25, 26]. On the other hand, our data and others have demonstrated that snRNA-seq data is enriched with lncRNA as compared to scRNA-seq [19]. While this may seem like a confounding variable when trying to compare the two different modalities (i.e., scRNA- and snRNA-seq), it may be beneficial to utilize snRNA-seq if the intent is to enrich and study lncRNA that regulates cell biology.
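Regressing a technical score out of expression data, as discussed above for the length bias, can be illustrated with a per-gene linear model whose residuals replace the original values. This is a generic sketch on simulated data under that assumption, not the study's implementation; `regress_out` and the simulated slopes are hypothetical.

```python
import numpy as np

def regress_out(expr, covariate):
    """Return residuals of a per-gene linear fit: expr ~ 1 + covariate.

    expr is a cells x genes matrix and covariate is a per-cell score.
    The residuals are centered, since the intercept is removed as well.
    """
    X = np.column_stack([np.ones(len(covariate)), covariate])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return expr - X @ beta

# Simulated data: 100 nuclei, 3 genes with different length-bias dependence.
rng = np.random.default_rng(1)
length_bias_score = rng.normal(size=100)
slopes = np.array([2.0, 0.0, -1.0])
expr = 5.0 + np.outer(length_bias_score, slopes) + rng.normal(scale=0.5, size=(100, 3))

corrected = regress_out(expr, length_bias_score)

# After correction, no gene retains a linear association with the score.
print(np.round(corrected.T @ length_bias_score, 6))
```

A linear correction of this kind only removes the component of expression that tracks the score linearly, which is one reason comparisons across different tissues or diseases still warrant caution.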
Computational methods play an important role in normalizing data for known technical biases. After applying Seurat v3 integration, matched PDX specimens with similar cell states clustered together on the UMAP embedding. This is to be expected since Seurat v3 jointly reduces the dimensionality of datasets using a diagonalized CCA to identify shared biological markers and conserved gene expression signatures [16]. The algorithm then finds mutual nearest neighbors in this low-dimensional representation to recover matching cell states between datasets [28]. Since feature selection for integration is limited to variable features within each dissociation protocol, subtle differences between protocols (such as the warm dissociation signature) will play a smaller role.
Although not assessed in this study, an important consideration when using different dissociation protocols is their effect on cellular composition. While scRNA-seq and snRNA-seq adequately represent the original cell populations, others have noted some differences, especially for immune cells [6, 7]. An unavoidable limitation of our study was the placement of PDXs within immunocompromised murine models that lack a full immune cell repertoire. Thus, we did not have the opportunity to assess whether snRNA-seq underestimates the prevalence of T-cells, B-cells, and NK cells, as has been reported previously in carcinomas [29]. Others have shown that methanol fixation was superior to cryopreservation with respect to epithelial cell preservation. It remains to be explored whether one preservation method is superior to another in retaining the native cell distribution of sarcomas or normal mesenchymal tissue. As spatial image omics (SIO) gains traction, one could envision using this technology as a ‘gold standard’ to meticulously catalog cancer’s true cell composition without suffering the aforementioned technological artifacts [30].
While our research clearly cautions scientists about some of the dissociation-specific biases introduced, we recognize that a one-size-fits-all approach may not be optimal for all labs, scenarios, or cancer types. Since scRNA-seq remains a rapidly evolving technology, we anticipate that labs, at least for the foreseeable future, will continue using scRNA-seq for fresh tissues and snRNA-seq for archival tissue that already exists in labs throughout the world. Depending on the scenario, high-quality data can be generated from either methodological approach. We envision that our work, together with the findings of many others on dissociation-specific biases, will serve as a roadmap to guide scientists in recognizing how their experiments could introduce biases in the expression of genes and pathways observed in their data.