Background
Generalized overview of a dimensionality reduction-based multi-modal data fusion strategy
Reference | Data | Method |
---|---|---|
Moutselos et al. [65] | Skin images | Combining features into a confusion matrix |
Gene expression | ||
Golugula et al. [6] | Histopathology | Correlating features via CCA, combining CCA-based confusion matrices |
Proteomics | ||
Dai et al. [20] | sMRI | Construct classifiers from features, weighted combination of classifier decisions |
fMRI | ||
Gode et al. [66] | mRNA | Compute LDR/classifier decisions, unweighted combination of LDR- or classifier-based confusion matrices |
miRNA | ||
Raza et al. [22] | Gene-expression | Compute classifier decisions, unweighted combination of classifier decisions |
FNAC | ||
Sui et al. [67] | DTI | Correlate features via CCA, unweighted combination of CCA-based confusion matrices |
fMRI | ||
Wolz et al. [7] | T1-w MRI | Compute LDR, weighted combination of LDR-based confusion matrices |
ApoE genotype, Aβ1−42 | ||
Wang et al. [62] | T1-w MRI, FDG-PET | Feature selection, weighted concatenation of selected features |
Gene-expression | ||
Lanckriet et al. [9] | Protein expression | Compute kernel representations, weighted combination of kernels |
Gene-expression | ||
Yu et al. [68] | Text ontologies | Compute kernel representations, fuse kernel-based confusion matrices |
Gene-expression | ||
Higgs et al. [54] | CT | Compute LDR, fuse LDR maintaining manifold structure |
Gene-expression | ||
Lee et al. [4] | Gene-expression | Compute LDR, unweighted concatenation of LDR |
Histopathology | ||
Viswanath et al. [5] | T2-w | Compute LDR, combine LDR-based confusion matrices using label information |
ADC, DCE | ||
Tiwari et al. [8] | T2-w MRI | Compute kernel representations, weighted LDR-based combination of kernels using label information |
MRS | ||
Methods
Description of methods utilized for multi-modal data fusion
Notation
Knowledge representation
Generation of multiple representations (resampling)
Knowledge fusion
Weighted and unweighted data fusion
Experimental design
Dataset | # Studies | Modalities | Clinical problem addressed |
---|---|---|---|
S1 | 77 | T1-w MRI, protein-expression | Differentiating Alzheimer’s patients from normal subjects |
S2 | 40 | Histology, protein expression profiles | Predicting biochemical recurrence in prostate cancer |
S3 | 36 (3000 voxels) | T2-w MRI, MR spectroscopy | Detecting prostate cancer on a per-voxel basis |
Multimodal data fusion strategies compared
Strategy | Resampling | Representation | Weighting | Fusion |
---|---|---|---|---|
DFS-DD | - | Decision | Unweighted | Direct fusion (AND operation) |
DFS-EC | Feature perturbation | PCA | Unweighted | Co-association matrix fusion |
DFS-KC | - | Kernels | Weighted, semi-supervised | Co-association matrix fusion |
DFS-ES | - | LLE | Unweighted | Structural fusion |
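As a rough illustration of the simplest strategy in the table above (DFS-DD, direct fusion via an AND operation), the following minimal sketch fuses two per-modality decision vectors. It assumes hard binary decisions (1 = positive class); the function name `and_fusion` and the example decision vectors are hypothetical and are not taken from the study.

```python
import numpy as np


def and_fusion(decisions_a, decisions_b):
    """Fuse two binary decision vectors with a logical AND.

    A sample is called positive only if both modalities call it positive.
    Assumption: decisions are hard 0/1 labels, one per sample.
    """
    a = np.asarray(decisions_a, dtype=bool)
    b = np.asarray(decisions_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("decision vectors must have the same shape")
    return np.logical_and(a, b).astype(int)


# Hypothetical decisions from an imaging and a non-imaging classifier.
imaging = [1, 1, 0, 0, 1]
non_imaging = [1, 0, 0, 1, 1]
fused = and_fusion(imaging, non_imaging)
print(fused)
```

If the classifiers output probabilities rather than hard labels, an element-wise product would be one analogous choice; that is an assumption here, not a detail stated in the table.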
Dataset S1: MRI, proteomics for Alzheimer’s disease identification
Dataset S2: Histology, proteomics for prostate cancer prognosis
Dataset S3: Multiparametric MRI for prostate cancer detection
Evaluation measures
Results
Strategy | Dataset S1 | Dataset S2 | Dataset S3 |
---|---|---|---|
Non-imaging | 0.774 ± 0.043 | 0.511 ± 0.078 | 0.771 ± 0.009 |
Imaging | 0.885 ± 0.034 | 0.503 ± 0.076 | 0.564 ± 0.036 |
DFS-DD | 0.905 ± 0.035 | 0.496 ± 0.079 | 0.752 ± 0.026 |
DFS-EC | 0.675 ± 0.065a | 0.465 ± 0.111 | 0.720 ± 0.020 |
DFS-KC | 0.888 ± 0.040 | 0.808 ± 0.067b | 0.857 ± 0.009b |
DFS-ES | 0.789 ± 0.035 | 0.531 ± 0.086 | 0.748 ± 0.013 |
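The results table above can be checked programmatically to see which fusion strategies fall below the weaker individual modality on each dataset. The sketch below copies only the mean values from the table and ignores the ± spread, so it says nothing about statistical significance; the helper name `below_weaker_modality` is hypothetical.

```python
# Mean scores copied from the results table (order: S1, S2, S3).
datasets = ("S1", "S2", "S3")
individual = {
    "Non-imaging": (0.774, 0.511, 0.771),
    "Imaging": (0.885, 0.503, 0.564),
}
fused = {
    "DFS-DD": (0.905, 0.496, 0.752),
    "DFS-EC": (0.675, 0.465, 0.720),
    "DFS-KC": (0.888, 0.808, 0.857),
    "DFS-ES": (0.789, 0.531, 0.748),
}


def below_weaker_modality(individual, fused):
    """Map each dataset to the strategies scoring below the weaker modality."""
    result = {}
    for i, name in enumerate(datasets):
        # The weaker modality is the lower of the two single-modality means.
        weaker = min(scores[i] for scores in individual.values())
        result[name] = [s for s, scores in fused.items() if scores[i] < weaker]
    return result


report = below_weaker_modality(individual, fused)
for name, strategies in report.items():
    print(name, strategies)
```

On these means, DFS-EC falls below the weaker modality in S1, DFS-DD and DFS-EC do so in S2, and no strategy does so in S3.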
Experiment 1: Integrating MRI and proteomics to identify patients with Alzheimer’s disease
Experiment 2: Integrating histopathology and proteomics to predict prostate cancer recurrence after surgery
Experiment 3: Integrating MRS and MRI to identify voxel-wise regions of prostate cancer in vivo
Discussion
- In terms of the knowledge representation module, a kernel-based method (DFS-KC) demonstrated consistently high classifier performance across all 3 applications (the best in datasets S2 and S3, and second only to DFS-DD in S1), implying that kernels may offer distinct advantages for multimodal data representation. This performance may have been further enhanced by the fact that DFS-KC weighted individual data modalities differentially based on their contributions, in addition to using semi-supervised learning. However, we must note that this method was also amongst the most computationally expensive in terms of memory usage.
- For the knowledge fusion module, co-association matrix fusion yielded consistently high classifier performance, albeit only when combined with kernels (as done by DFS-KC) and not when combined with embeddings (as reflected by the poor performance of DFS-EC). However, understanding how each representation strategy interplays with each fusion strategy requires further exploration, which was outside the scope of the current work.
- One of our multimodal data fusion methods (DFS-EC) demonstrated consistently poor classifier performance across all 3 applications. While this method has demonstrated significant success in previous work [5], its poor performance in the current work could be attributed to (a) an inability to handle sparse feature spaces (as seen in dataset S3), and (b) the use of a linear embedding method (PCA), which is likely unable to represent potentially non-linear biomedical data [30].
- Our experimental datasets demonstrated wide variability in the classifier performance associated with the individual data modalities, which had a significant bearing on the performance of the different multimodal data fusion methods. For example, in dataset S1, where both modalities showed a relatively high classifier AUC individually, a simple combination of decision representations (DFS-DD) offered the highest performance amongst the integrated representations. However, in dataset S2, where both modalities showed relatively poor discriminability individually, most of the data fusion methods failed to create accurate, discriminatory representations.
- Dataset S2 was an example of a Big-P-Small-N problem (number of features P >> number of samples N), where the large, noisy feature space meant that most representation strategies failed to yield an accurate classifier. In additional experiments involving feature selection (not shown) to mitigate this mismatch, we found that kernel-based approaches performed better in the absence of feature selection (i.e. when provided the entire feature space). By contrast, with feature selection applied, LDR-based approaches improved in performance, likely because they could better identify a discriminatory projection for the data.
- Dataset S3 was an example of a Small-P-Big-N problem (number of samples N >> number of features P), wherein the very sparse feature space caused embedding-based methods (DFS-EC, DFS-ES) to throw a number of errors during our experiments. The issue of having very few input dimensions was further exacerbated by the large number of samples, which caused these methods to become more computationally expensive than when P >> N.
- While one would expect multimodal data fusion strategies to always perform better than at least the weaker modality under consideration, our experimental results suggest otherwise. When suboptimal representation or fusion strategies are utilized (e.g. PCA for representation within DFS-EC, or simple structural fusion within DFS-ES), such data fusion methods tend to perform comparably to, or worse than, the individual modalities. Conversely, when a method leverages different modules in a complementary manner (e.g. kernels, weighting, and semi-supervised learning in DFS-KC), we can construct a truly robust, accurate multimodal data fusion predictor.
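To make the idea of a weighted combination of kernels more concrete, the following is a minimal sketch of how two modality-specific kernel matrices might be fused by a convex combination. It is not the DFS-KC implementation: the Gaussian kernel, the `gamma` values, the fixed weights, and the random stand-in data are all assumptions chosen for illustration, and the weight learning and semi-supervised steps mentioned above are not reproduced.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def combine_kernels(kernels, weights):
    """Convex combination of same-sized kernel matrices."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so the weights sum to 1
    return sum(wi * K for wi, K in zip(w, kernels))


# Hypothetical stand-in data: 6 samples, two modalities of different size.
rng = np.random.default_rng(0)
X_imaging = rng.normal(size=(6, 4))
X_non_imaging = rng.normal(size=(6, 10))

K_img = rbf_kernel(X_imaging, gamma=0.5)
K_non = rbf_kernel(X_non_imaging, gamma=0.1)

# Assumed fixed weights; a real method would estimate these from the data.
K_fused = combine_kernels([K_img, K_non], weights=[0.7, 0.3])
print(K_fused.shape)
```

Because each modality is reduced to a sample-by-sample matrix before fusion, modalities with very different numbers of features can be combined directly, and a non-negative weighted sum of valid kernels remains a valid (positive semi-definite) kernel.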
T1-w MRI | # | Description |
FreeSurfer ROIs extracted | 327 | Subcortical, cortical volumes, surface area, thickness average and standard deviation for Pallidum, Paracentral, Parahippocampal, Opercularis, Pars Orbitalis, Triangularis, Pericalcarine, Cingulate, Frontal, Parietal, Temporal, Caudate, Insula, Occipital etc. |
Proteomic data | # | Description |
Plasma proteomics | 146 | Microglobulin, Macroglobulin, Apolipoproteins, Epidermal growth factors, Immunoglobulins, Interleukins, Insulin, Monocyte Chemotactic Proteins, Macrophage Inflammatory Proteins, Matrix Metalloproteinases etc. |
Conclusions
Appendix
Morphological | # | Description |
Gland Morphology | 100 | Area Ratio, Distance Ratio, Standard Deviation of Distance, Variance of Distance, Distance Ratio, Perimeter Ratio, Smoothness, Invariant Moment 1–7, Fractal Dimension, Fourier Descriptor 1–10 (Mean, Std. Dev., Median, Min/Max of each) |
Architectural | # | Description |
Voronoi Diagram | 12 | Polygon area, perimeter, chord length: mean, std. dev., min/max ratio, disorder |
Delaunay Triangulation | 8 | Triangle side length, area: mean, std. dev., min/max ratio, disorder |
Minimum Spanning Tree | 4 | Edge length: mean, std. dev., min/max ratio, disorder |
Co-occurring Gland Tensors | 39 | Entropy, energy: mean, std. dev., range |
Gland Subgraphs | 26 | Eccentricity, Clustering coefficient C, D, and E, largest connected component: mean, std. dev. |
Proteomic | # | Description |
Proteins Identified | 650 | Protein disulfide-isomerase A6, T-complex protein subunit delta, ADP-ribosylation factor 1/3, Protein disulfide-isomerase, Ras GTPase-activating-like protein IQGAP2, T-complex protein subunit beta, Ras-related protein Rab-5C, ATP-dependent RNA helicase DDX3X/DDX3Y, 40S ribosomal protein S17, Serine/arginine-rich splicing factor 7, Tubulin alpha-1A chain/alpha-3C/D chain/alpha-3E chain, Laminin subunit alpha-4, Collagen alpha-1 (VIII) chain, Tubulin-tyrosine ligase-like protein 12 |
Texture features | # | Description |
Kirsch Filters | 4 | X-direction, Y-direction, XY-diagonal, YX-diagonal |
Sobel Filters | 4 | X-direction, Y-direction, XY-diagonal, YX-diagonal |
Directional Filters | 5 | x-Gradient, y-Gradient, Magnitude of Gradient, 2 Diagonal Gradients |
First order Gray Level | 8 | Mean, Median, Standard deviation, Range for window size = 3×3, 5×5 |
Haralick features | 13 | Contrast Energy, Contrast Inverse Moment, Contrast Average, Contrast Variance, Contrast Entropy, Intensity Average for window size = 3×3, Intensity Variance, Intensity Entropy, Entropy, Energy, Correlation, Info. Measure of Correlation 1, Info. Measure of Correlation 2 |
Gabor filters | 24 | Filterbank constructed for different combinations of scale and orientation |
Metabolic features | # | Description |
Metabolites Identified | 6 | Area under peaks for choline ($A_{ch}$), creatine ($A_{cr}$), citrate ($A_{cit}$), and ratios ($A_{ch}/A_{cr}$, $A_{ch}/A_{cit}$, $A_{ch+cr}/A_{cit}$) |