Differential analysis alone is often insufficient to capture the diversity or dimensionality of samples in rheumatology. This may be the case when comparing samples from multiple patients with different disease subtypes [22] or when collecting samples from different tissues or differentiation states [13, 36]. In these cases, supervised classification may be used when the sample labels (e.g., clinical subtype) are known. Otherwise, clustering offers an unsupervised alternative that does not require prior assumptions about how samples group. Both samples and genes may be clustered by their similarity, commonly measured with Pearson's correlation, although other distance metrics can also be used. Milano et al. [22] used hierarchical clustering to group SSc patients by intrinsic gene expression. In contrast,
k-means clustering was used to group enhancers by H3K4me1 intensity across hematopoiesis [36]. Clustering of genes can also be used to categorize the coregulation of genes or regulatory elements. The ImmGen consortium undertook a large-scale effort of this kind to identify modules of coregulated genes across numerous immune cell types and link them to candidate regulators [79]. In human disease, we used the correlation of expression across macrophage samples isolated from the joints of RA patients to identify gene modules that may be associated with disease subtype (Mandelin et al., accepted). Similarly, Olsson et al. [80] used an iterative filtering approach to identify the most coregulated genes for clustering of single-cell data. Alternatively, genes can be assigned to meta-genes (or regulatory profiles) that capture the major patterns of regulation across conditions, such as those identified by non-negative matrix factorization (NMF) of gene expression during microglia development [40]. As data sets grow larger, dimensionality reduction methods enable samples to be visualized as points in two-dimensional space. For instance, principal component analysis (PCA) comparing endogenous tissue-resident macrophage populations with macrophages derived from bone marrow transplants after irradiation showed that the chimera-derived cells lay close to their reference populations [13]. In addition, t-distributed stochastic neighbor embedding (t-SNE) is commonly used to visualize single-cell data, as in the dissection of DC and monocyte subpopulations in blood [37]. In other cases, it may be useful to compare the samples of interest with related data to better delineate the genes involved. By including additional samples, researchers identified an interferon signature shared between SSc and SLE blood samples [81] and distinguished between pulmonary fibrosis and pulmonary hypertension in SSc through comparison with samples from unrelated interstitial lung disease [23]. Alternatively, a meta-analysis may be used to compare the relevant genes across data sets indirectly; for example, to determine how well a mouse model of lupus reflects human disease [27] or to identify which patients in a published cohort share a signature associated with imatinib response [82]. In practice, a combination of these techniques is used to address different questions and provide a robust analysis of the data. However, to fully capture the underlying GRNs, one must integrate different data types and use more sophisticated approaches.
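Correlation-based hierarchical clustering of samples, as described above, can be illustrated with a minimal, dependency-free Python sketch. It assumes average linkage on a distance of 1 - Pearson's r. This is one common choice and not necessarily the linkage used by Milano et al. [22]. The helper names `pearson` and `average_linkage` and the toy expression profiles are hypothetical. In practice, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"` builds the full dendrogram.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, k):
    """Agglomerative clustering (average linkage) on 1 - Pearson r.

    profiles: dict of name -> expression vector (same gene order).
    Merges the closest pair of clusters until k clusters remain.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between the two clusters
            d = sum(dist[frozenset((a, b))]
                    for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical toy profiles: 4 patients x 5 genes
profiles = {
    "P1": [5.0, 4.8, 1.0, 0.9, 3.0],
    "P2": [5.2, 5.0, 1.1, 1.0, 3.1],
    "P3": [1.0, 1.2, 4.9, 5.1, 3.0],
    "P4": [0.8, 1.1, 5.2, 4.8, 2.9],
}
print(average_linkage(profiles, 2))  # [['P1', 'P2'], ['P3', 'P4']]
```

Passing gene vectors (the transposed matrix) instead of sample vectors clusters genes. This is the basis of correlation-based module discovery.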
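k-means clustering of the kind applied to enhancer signal profiles [36] can be sketched as Lloyd's algorithm. The sketch uses a deterministic farthest-first initialization as a simple stand-in for the randomized k-means++ seeding in libraries such as scikit-learn's `sklearn.cluster.KMeans`. The function names and the three-stage H3K4me1 values below are hypothetical and do not come from the cited study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with farthest-first initialization.

    points: list of equal-length vectors (e.g. signal per developmental stage).
    Returns (labels, centroids).
    """
    # Farthest-first init: start at the first point, then repeatedly add the
    # point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical enhancer signal at three stages (early, mid, late)
enhancers = [
    [9.0, 1.0, 0.5], [8.5, 1.5, 0.8],  # early-active
    [0.5, 1.0, 9.0], [0.7, 1.2, 8.8],  # late-active
    [4.0, 8.0, 4.0],                   # mid-active
]
labels, centroids = kmeans(enhancers, 3)
print(labels)  # [0, 0, 1, 1, 2]
```

Unlike hierarchical clustering, k-means requires choosing k in advance. In practice, several values are usually compared.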
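Projecting samples into two dimensions with PCA, as in the macrophage chimera comparison [13], can be sketched without external libraries. The sketch finds the top two eigenvectors of the sample covariance matrix by power iteration with deflation. It is an illustrative assumption, not how the cited study computed its PCA. Libraries such as scikit-learn's `sklearn.decomposition.PCA` use more robust SVD-based solvers. The function name `pca_2d` and the toy data are hypothetical, and the sign of each component is arbitrary.

```python
import math
import random


def pca_2d(samples, n_iter=500, seed=0):
    """Scores of samples on their first two principal components.

    samples: list of equal-length vectors (rows = samples, columns = genes).
    """
    n, p = len(samples), len(samples[0])
    means = [sum(col) / n for col in zip(*samples)]
    centered = [[v - m for v, m in zip(row, means)] for row in samples]
    # p x p sample covariance matrix
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # Random start avoids being exactly orthogonal to the leading eigenvector
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in any direction
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v gives the eigenvalue for unit vector v
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


# Hypothetical toy data: two reference-like samples and two distinct samples
samples = [[5.0, 5.0, 1.0], [5.5, 4.5, 1.2], [1.0, 1.0, 5.0], [1.2, 0.8, 5.3]]
for s in pca_2d(samples):
    print(s)
```

Samples that lie close together on PC1 and PC2 share most of their variance structure. This is the sense in which transplanted cells "lie close" to a reference population.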
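Meta-gene extraction by NMF, as used for microglia development [40], can be sketched with the classic Lee-Seung multiplicative updates. These factor a nonnegative genes × conditions matrix V into W (gene loadings on k meta-genes) and H (meta-gene activity per condition). This is a generic illustration on hypothetical toy data, not the pipeline of the cited study. The helper names are assumptions. scikit-learn's `sklearn.decomposition.NMF` is one established implementation.

```python
import random


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frob_err(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Lee-Seung multiplicative-update NMF minimizing ||V - WH||_F^2."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); entries stay nonnegative
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Hypothetical genes x 4 developmental stages: an "early" program,
# a "late" program, and one gene driven by both
V = [
    [4.0, 3.0, 1.0, 0.0],
    [8.0, 6.0, 2.0, 0.0],
    [0.0, 1.0, 3.0, 4.0],
    [0.0, 2.0, 6.0, 8.0],
    [4.0, 4.0, 4.0, 4.0],
]
W, H = nmf(V, 2)
print(round(frob_err(V, W, H), 4))
```

Each row of H is a meta-gene's activity profile across conditions, and each row of W indicates how strongly a gene follows each profile.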