Appl Clin Inform 2016; 07(02): 260-274
DOI: 10.4338/ACI-2015-09-RA-0125
Research Article
Schattauer GmbH

Integrating Heterogeneous Biomedical Data for Cancer Research: the CARPEM infrastructure

Bastien Rance (1, 2), Vincent Canuel (1), Hector Countouris (1, 2), Pierre Laurent-Puig (1, 3), Anita Burgun (1, 2)

1   University Hospital Georges Pompidou, Paris, France
2   INSERM UMR_S 1138, CRC, Paris, France
3   Université Paris Sorbonne Cité, Inserm UMR-S 1147, Paris, France

Publication History

received: 02 October 2015

accepted: 07 February 2016

Publication Date:
16 December 2017 (online)

Summary

Cancer research involves numerous disciplines. The multiplicity of data sources and their heterogeneous nature make the integration and exploration of the data increasingly complex. Translational research platforms are a promising way to assist scientists in these tasks. In this article, we identify a set of scientific and technical principles needed to build a translational research platform that meets ethical requirements and addresses data-protection and data-integration problems. We describe the solution adopted by the CARPEM cancer research program to design and deploy a platform able to integrate retrospective, prospective, and day-to-day care data. We designed a three-layer architecture composed of a data collection layer, a data integration layer, and a data access layer. We leverage a set of open-source resources, including i2b2 and tranSMART.
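To make the three-layer design more concrete, the Python sketch below shows one way data could flow from collection through integration to access. It is an illustration under stated assumptions, not the CARPEM implementation. The source names (`ehr`, `lab`, `ngs`), field names, concept codes, and the HMAC-based `pseudonymize` helper are all hypothetical. The `Fact` record only loosely imitates the entity-attribute-value layout of i2b2's `observation_fact` table. The access layer returns patient counts rather than row-level data, in the spirit of i2b2's cohort-count queries.

```python
"""Sketch of a three-layer pipeline: collection -> integration -> access.

All source names, field names, and concept codes are hypothetical; the Fact
record only loosely imitates i2b2's entity-attribute-value observation_fact table.
"""
import hashlib
import hmac
from dataclasses import dataclass
from datetime import date

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical pseudonymization key


@dataclass(frozen=True)
class Fact:
    """One observation: who, what, when, and an optional value (EAV-style)."""
    patient_id: str   # pseudonymized patient identifier
    concept_cd: str   # coded concept, e.g. "LAB:HB" (hypothetical coding scheme)
    start_date: date
    value: object = None


# --- Data collection layer: raw records as delivered by each hypothetical source ---
def collect():
    ehr = [{"ipp": "12345", "diagnosis": "C18.9", "date": date(2015, 3, 2)}]
    lab = [{"ipp": "12345", "test": "HB", "result": 11.2, "date": date(2015, 3, 3)}]
    ngs = [{"ipp": "67890", "gene": "KRAS", "variant": "G12D", "date": date(2015, 4, 1)}]
    return {"ehr": ehr, "lab": lab, "ngs": ngs}


def pseudonymize(local_id: str) -> str:
    """Keyed hash: the same local ID maps to the same research ID in every source."""
    return hmac.new(SECRET_KEY, local_id.encode(), hashlib.sha256).hexdigest()[:16]


# --- Data integration layer: map every source onto one common fact model ---
def integrate(raw):
    facts = []
    for r in raw["ehr"]:
        facts.append(Fact(pseudonymize(r["ipp"]), "ICD10:" + r["diagnosis"], r["date"]))
    for r in raw["lab"]:
        facts.append(Fact(pseudonymize(r["ipp"]), "LAB:" + r["test"], r["date"], r["result"]))
    for r in raw["ngs"]:
        code = f"GENE:{r['gene']}:{r['variant']}"
        facts.append(Fact(pseudonymize(r["ipp"]), code, r["date"]))
    return facts


# --- Data access layer: aggregate cohort queries that return counts, not rows ---
def count_patients(facts, concept_prefix: str) -> int:
    return len({f.patient_id for f in facts if f.concept_cd.startswith(concept_prefix)})


if __name__ == "__main__":
    facts = integrate(collect())
    print(count_patients(facts, "ICD10:C18"))  # prints 1 (colon cancer code)
    print(count_patients(facts, "GENE:KRAS"))  # prints 1 (KRAS variant)
```

Mapping every source onto a single fact model is what lets one query span clinical and genomic data. Keying the pseudonym with a secret, rather than using a plain hash, is one common way to keep identifiers linkable across sources without making them trivially reversible by dictionary lookup.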

Citation: Rance B, Canuel V, Countouris H, Laurent-Puig P, Burgun A. Integrating heterogeneous biomedical data for cancer research: the CARPEM infrastructure.
