Background
Radiomics and biological ‘omics in the field of cancer research: state of the art
Radiomics framework
Biological multi-omics integration tools
Name | Description | Data type | Software type/Programming language | Key task | Operating system | Latest update |
---|---|---|---|---|---|---|
Caleydo StratomeX [24] | Tool for exploring relationships among multiple datasets | Multi-omic | Application/Java | Data visualization | Windows Unix/Linux Mac OS | 2018 |
CAS-viewer [25] | CAS-viewer, a viewer for Cancer Alternative Splicing (CAS), is a dynamic interface providing integrated knowledge of alternative mRNA splicing patterns along with multi-omic data from 33 TCGA cancer types | DNA methylation, miRNAs, and SNPs | Web Application/- | Data visualization and basic analysis | Windows Unix/Linux Mac OS | 2018 |
cBioPortal | The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics datasets from The Cancer Genome Atlas, as well as many carefully curated published datasets | Transcriptomic, DNA methylation, CNVs, SNPs and clinical data | Web Application/Python, Java, Perl, R, MATLAB | Data visualization and basic analysis | Windows Unix/Linux Mac OS | 2018 |
Genboree Workbench [28] | Genboree is a web-based platform for multi-omic research and data analysis using the latest bioinformatics tools. | Transcriptomic and epigenomic data | Web Application/- | Data visualization and basic analysis | Windows Unix/Linux Mac OS | 2014 |
MARIO [29] | MArkov Random fields to Integrate Omics variables (MARIO) is a hierarchical Bayesian model approach for the parallel, integrative analysis of data from several genomic types | Multi-omic and beyond | BUGS software/- | Data integration/analytics | Unix/Linux | 2017 |
mixOmics [30] | mixOmics supports exploration and integration of biological data, providing multivariate statistical methods to identify similarities between two heterogeneous datasets | Multi-omic and beyond | Bioconductor Package/R | Data integration/analytics | Windows Unix/Linux Mac OS | 2018 |
ModulOmics [31] | ModulOmics identifies cancer driver pathways, or modules, by integrating multiple data types on the basis of DNA and RNA cancer patient data, integrated with PPI networks and known regulatory connections | Multi-omic and beyond | Package/R or Python | Data integration/analytics | Unix/Linux Mac OS | 2018 |
Omics Integrator [32] | Omics Integrator integrates proteomic data, gene expression data and/or epigenetic data using a protein–protein interaction network. It comprises two modules, Garnet and Forest | Multi-omic and beyond | Package/Python | Data integration/analytics | Unix/Linux | 2018 |
XENA UCSC browser [33] | It offers interactive visualization and exploration of TCGA genomic, phenotypic, and clinical data, as produced by the Cancer Genome Atlas Research Network | Multi-omic | Web client | Data visualization | Windows Unix/Linux Mac OS | 2017 |
Challenges of radiomics in multi-omics framework
Role of radiogenomics in cancer phenotype definition
Radiomics in multi-omics framework: limits, challenges and limitations
Existing integrated databases
Name | Description | Data type | Data access | Data download | Latest update |
---|---|---|---|---|---|
Oncological disease | |||||
Data Portal | The Data Portal is the NCI’s largest public repository of comprehensive proteogenomic sequence datasets | MS proteomic and phosphoproteomic data and gene expression | Open/controlled: user account (open-use studies) or access request by data use application (controlled-use studies) | Web-based, web client and programmatic | 2018 |
TCGA | The Cancer Genome Atlas is a large cancer genomics data collection covering 43 projects with normal controls. Patient outcomes, treatment details, pathology and expert analyses are also provided when available. Many subjects have corresponding imaging data in The Cancer Imaging Archive (TCIA) | Gene expression, DNA methylation, germline and somatic mutations, clinical data | Open/controlled: user account (open-use studies) or access request by data use application (controlled-use studies) | Web-based, web client and programmatic | 2018 |
ICGC [50] https://dcc.icgc.org/ | The International Cancer Genome Consortium archives a large number of datasets with molecular data from more than 20,000 donors, including the Pan-Cancer Analysis of Whole Genomes (PCAWG) study | Germline and somatic mutations, gene expression, DNA methylation | Open/controlled: user account (open-use studies) or access request by data use application (controlled-use studies) | Web-based, web client and programmatic | 2018 |
TCIA | The Cancer Imaging Archive collects medical images of cancer available for public download. Data include 78 collections and different imaging modalities. Many subjects have corresponding genomics data in the Genomic Data Commons (GDC), which hosts the TCGA data | Medical images in DICOM format, clinical data | Open/controlled: user account (open-use studies) or access request by data use application (controlled-use studies) | Web-based, web client and programmatic | 2018 |
Neurological and neurodegenerative disorders | |||||
1000 Functional Connectomes Project/INDI International NeuroImaging Data-sharing Initiative [52] https://www.nitrc.org/projects/fcon_1000/ | It gives the broader imaging community complete access to large-scale functional imaging datasets, both prospective and retrospective | Imaging and clinical data | NITRC account for some public datasets and some controlled datasets | Amazon Web Services S3 (Cyberduck client and command line) | 2018 |
LONI Database (The Laboratory of Neuroimaging at University of Southern California) [53] https://loni.usc.edu/about_loni | Repository for sharing and long-term preservation of neuroimaging and biomedical research data, especially on neurological, neurodegenerative and psychiatric diseases. Ongoing studies include ADNI, ENIGMA, GAAIN and PPMI | Clinical, imaging (MRI, PET, MRA, DTI and other imaging modalities), genetic and behavioral data from multisite longitudinal studies | Account required for open-use data; controlled access through an Image and Data Archive (IDA) request, otherwise through a data use application | Web-based Image and Data Archive (IDA)* | 2018 |
LRRK2 Cohort Consortium (The Michael J. Fox Foundation (MJFF) for Parkinson’s Research) [54] https://www.michaeljfox.org/page.html?lrrk2-cohort-consortium | The LRRK2 Cohort Consortium (LCC) comprises three closed studies: the LRRK2 Cross-sectional Study, the LRRK2 Longitudinal Study and the 23andMe Blood Collection Study | Clinical data and biospecimens (blood, urine and cerebrospinal fluid) from PD and control volunteers | Account-controlled access | LONI (IDA) repository* | 2018 |
National Institute of Neurological Disorders and Stroke/The Michael J. Fox Foundation (MJFF) for Parkinson’s Research BioFIND [55] http://biofind.loni.usc.edu/ | BioFIND is a cross-sectional clinical study designed to discover new Parkinson’s disease biomarkers | Clinical data and biospecimens (blood, urine and cerebrospinal fluid) from PD and control volunteers | Account-controlled access | LONI (IDA) repository* | 2018 |
The National Institute of Mental Health (NIMH)/NIMH Repository and Genomic Resources (RGR) [56] https://www.nimhgenetics.org/ | The NIMH Repository is an infrastructure for sharing data collected by hundreds of research projects on the clinical and genetic analysis of mental health disorders (e.g. schizophrenia, bipolar disorder, depression, Alzheimer’s disease, autism, obsessive–compulsive disorder). For instance, the National Database for Autism Research (NDAR) website is the primary point of entry for autism research | Imaging, genetic and clinical data | NIMH account approval | Web-based and web client; Open Database License (ODbL) | 2018 |
The National Institute of Neurological Disorders and Stroke (NINDS) [57] https://www.ninds.nih.gov/, https://pdbp.ninds.nih.gov/ | NINDS research is organized into basic, clinical and translational projects that advance the study of neurological disorders and are open to both academic and industry investigators. One dataset is the Parkinson’s Disease Biomarkers Program Data Management Resource (PDBP DMR) | Gene expression, clinical data | NINDS account approval | Web-based and web client; Open Database License (ODbL) | 2018 |
The National Institute on Aging (NIA)/AMP-AD Knowledge Portal (Accelerating Medicines Partnership-Alzheimer’s Disease) [58] https://www.synapse.org/#!Synapse:syn2580853/wiki/409840 | The AMP-AD Knowledge Portal is the NIA-designated repository for distributing data from multiple NIA-supported programs on Alzheimer’s disease | Various types of molecular data from human, cell-based and animal model biosamples | Account-controlled access | Synapse web browser and web client | 2018 |
The National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) [59] https://www.niagads.org/ | NIAGADS provides access to publicly available summary statistics datasets for Alzheimer’s disease and related neuropathologies | Multi-omic: GWAS, whole-genome (WGS) and whole-exome (WES) sequencing, expression, RNA-seq and ChIP-seq analyses | Open to investigators, who return secondary analysis data to the database | Web-based (NIAGADS genome browser) and web client; Open Database License (ODbL) | 2018 |
Cardiovascular disease | |||||
Cardiac Atlas Project [60] http://www.cardiacatlas.org/ | Multi-center cardiac MRI datasets with manual contours defined by the consensus of 7 independent expert readers from 7 core labs. Datasets relate to 6 different studies | Imaging (MRI data) and clinical data | Controlled; CAP data access request | Web client | 2018 |
National Heart, Lung, and Blood Institute (BioLINCC) [61] https://biolincc.nhlbi.nih.gov/home/ | NHLBI is the NIH institute devoted to research, training and education on heart, lung, blood and sleep disorders. It provides teaching datasets and public-use datasets | Clinical data and sometimes corresponding biospecimens | Open and controlled data on request | Web-based user interface (BioLINCC) | 2018 |
The Cardiovascular Research Grid (CVRG) [62] http://cvrgrid.org/ | The CardioVascular Research Grid (CVRG) project is supported by the National Heart, Lung, and Blood Institute to create an infrastructure for sharing cardiovascular data and data analysis tools | Imaging (ex vivo DWI and in vivo heart CT) and clinical data | Open/controlled | Web-based | 2018 |
The Qatar Cardiovascular Biorepository (QCBio) [63] http://www.qcbio.org/ | Cases include patients needing percutaneous intervention for symptomatic coronary heart disease (CHD) or admitted with an acute coronary syndrome (myocardial infarction or unstable angina). Controls are individuals identified from the Hamad Medical Corp. blood bank who have no history of CHD. The goal of QCBio is to archive plasma and DNA of 1000 Qatari patients with coronary heart disease and 1000 controls, matched on age, sex and ethnicity | Biospecimens (plasma and DNA) and clinical data | Open to Qatari investigators and controlled access data for others | Web-based and web client | 2018 |
Vascular Diseases Biorepository [63] https://www.mayo.edu/research/labs/atherosclerosis-lipid-genomics/research-projects/vascular-diseases-biorepository | Biorepository for common vascular diseases, including peripheral artery disease (PAD), aortic aneurysm, carotid artery stenosis (CAD) and fibromuscular dysplasia. These samples are linked with demographic information, conventional cardiovascular risk factors and comorbidities ascertained from Mayo Clinic’s electronic health record (EHR) using EHR-based electronic phenotyping algorithms | Biospecimens (DNA, serum and plasma) and clinical data | Open/controlled | Web-based and web client | 2018 |
Multiple diseases | |||||
DAA [64] http://ageing-map.org/atlas/ | The Digital Ageing Atlas is a portal of age-related changes covering different biological levels. It integrates these data into an interactive portal that serves as the first centralised collection of human ageing changes and pathologies | Gene expression, proteomic, psychological and pathological age-related data | Publicly available after DAA account approval | DAA account approval for open access | 2017 |
dbGaP | The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans. Over 150 NCI studies are registered in dbGaP | Genome-wide studies and clinical data | Open/controlled; NCBI account approval | Web client and programmatic | 2018 |
EGA | The European Genome-phenome Archive (EGA) collects human biomedical data across Europe. It allows authorised users to search sequenced material, patient samples stored in biobanks, and patients’ illnesses, treatments and outcomes | Imaging, gene expression, genome-wide studies and clinical data | Controlled: data use application request, then EGA account approval | Web client and programmatic | 2018 |
GEO | The Gene Expression Omnibus provides multi-level datasets (4348 in total) related to cancer and other diseases | Gene expression, genome-wide studies and clinical data | Most data are publicly available; some are available on request | Web client and programmatic | 2018 |
HAGR | The Human Ageing Genomic Resources (HAGR) is a collection of databases and tools designed to help researchers study the genetics of human ageing using modern approaches such as functional genomics, network analyses, systems biology and evolutionary analyses | Gene expression and clinical data | Publicly available raw data; processed data on request | Web-based download (zip, csv files) | 2018 |
JGA | The Japanese Genotype-phenotype Archive (JGA) is a service for archiving and sharing all types of individual-level genetic and de-identified phenotypic data | Imaging, gene expression, genome-wide studies and clinical data | NBDC Human Database approval | Web client and programmatic | 2018 |
Statistical challenges
Dimension reduction
Data integration or data fusion
Causal inference
MAE framework design: a case study
- Use the assays of a SummarizedExperiment to store the matrix-like data of each time point. In this case, multiple time-point data are associated with a single experiment, for example BRCA_T1_weighted_DCE_MRI, with as many assays as time points (BRCA indicates breast cancer data) (Fig. 3).
- Use a different SummarizedExperiment for each time point. In this case, two experiments may be, for example, BRCA_T1_weighted_DCE_MRI_TP1 and BRCA_T1_weighted_DCE_MRI_TP2 (TP indicates time point) (Fig. 4).
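The two storage options above can be compared with a small Python sketch. It is not the Bioconductor API (SummarizedExperiment and MultiAssayExperiment are R classes); the `SESketch` class, the function names, the assay name `radiomics` and the feature names are hypothetical. The sketch mimics only one property of a real SummarizedExperiment: all of its assays share the same rows (features) and columns (samples).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Rows are radiomic features, columns are patients.
Matrix = List[List[float]]
# Time point label -> (patient IDs, feature-by-patient matrix).
TimePointData = Dict[str, Tuple[List[str], Matrix]]


@dataclass
class SESketch:
    """Hypothetical stand-in for a SummarizedExperiment: named assays that all
    share the same feature (row) and patient (column) names."""

    row_names: List[str]
    col_names: List[str]
    assays: Dict[str, Matrix] = field(default_factory=dict)

    def add_assay(self, name: str, matrix: Matrix) -> None:
        # Every assay must match the experiment's row and column dimensions.
        if len(matrix) != len(self.row_names) or any(
            len(row) != len(self.col_names) for row in matrix
        ):
            raise ValueError(f"assay {name!r} does not match the experiment dimensions")
        self.assays[name] = matrix


def one_experiment_many_assays(features: List[str], data: TimePointData) -> Dict[str, SESketch]:
    """Option 1: a single experiment holding one assay per time point."""
    patient_sets = {tuple(patients) for patients, _ in data.values()}
    if len(patient_sets) != 1:
        # Assays share their columns, so every time point needs the same patients.
        raise ValueError("all time points must cover the same patients")
    se = SESketch(list(features), list(patient_sets.pop()))
    for tp, (_, matrix) in data.items():
        se.add_assay(tp, matrix)
    return {"BRCA_T1_weighted_DCE_MRI": se}


def one_experiment_per_time_point(features: List[str], data: TimePointData) -> Dict[str, SESketch]:
    """Option 2: a separate single-assay experiment for each time point."""
    experiments: Dict[str, SESketch] = {}
    for tp, (patients, matrix) in data.items():
        se = SESketch(list(features), list(patients))
        se.add_assay("radiomics", matrix)
        experiments[f"BRCA_T1_weighted_DCE_MRI_{tp}"] = se
    return experiments


# Hypothetical example: patient P02 has no scan at the second time point.
features = ["glcm_contrast", "shape_volume"]
data = {
    "TP1": (["P01", "P02"], [[0.8, 1.1], [12.0, 15.5]]),
    "TP2": (["P01"], [[0.7], [10.4]]),
}
print(sorted(one_experiment_per_time_point(features, data)))
# ['BRCA_T1_weighted_DCE_MRI_TP1', 'BRCA_T1_weighted_DCE_MRI_TP2']
# one_experiment_many_assays(features, data) would raise ValueError here.
```

Under these assumptions, the first option keeps time points aligned patient by patient but requires every patient to be imaged at every time point, whereas the second tolerates missing follow-up scans at the cost of relating samples across experiments separately (in MultiAssayExperiment this mapping is the role of the `sampleMap`).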