A toolbox to explore NMR metabolomic data sets using the R environment

doi:10.1016/j.chemolab.2013.04.015

Chemometrics and Intelligent Laboratory Systems

Volume 126, 15 July 2013, Pages 50-59

https://doi.org/10.1016/j.chemolab.2013.04.015

Abstract

We describe herein the implementation of graphical and statistical tools developed in the R free software environment to explore metabolomic data sets. This toolbox, whose latest releases are available from the authors upon request, includes existing univariate, bivariate and multivariate approaches accompanied by various graphical displays and interactive facilities. Concretely, only very basic knowledge of R is required: from Excel data files as input to graphical and numerical outputs, the user is led through a set of questions that they only have to answer. We illustrate the potential of the toolbox on a data set from a ¹H NMR metabolomic study of cerebellums from a murine model of Alzheimer's disease. We show that the various graphical techniques complement one another and make the information easier to interpret. In particular, a simple correlation study can be highly meaningful, and competitive with a more sophisticated multivariate analysis, when using ad hoc graphical representations depending on the level of interest: global, multiple or single metabolite focus.

Introduction

Molecular biology is now strongly driven by high-throughput facilities resulting in large-scale molecular profiling. Many aspects of a biological system can be studied by exploring the “omics-land” (www.omics.org) with ad hoc technologies: genomics (sequencer), transcriptomics (microarray), proteomics (mass spectrometry, MS), metabolomics (mainly Nuclear Magnetic Resonance, NMR, and MS), etc. Applying these techniques generates a huge amount of data from a single biological sample. In the “omics” context, metabolomics plays a key role, as the metabolome can be viewed as the response of living systems to biological perturbations (genetic modification, physiological and/or pathophysiological stimuli, for instance).

NMR-based metabolomic analyses provide spectrum-shaped data that require specific mathematical pre-processing. De-noising, baseline adjustment, peak detection, multiple peaks alignment, binning, etc. are topics that have triggered many methodological developments [1], [2], [3], [4].
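As an illustration of one of the pre-processing steps listed above, binning can be sketched as summing intensities over fixed-width intervals of the chemical-shift axis. The sketch is written in plain Python for illustration and is not taken from the toolbox's R scripts; the function name `bin_spectrum` and the default bucket width of 0.04 ppm are assumptions made for the example.

```python
def bin_spectrum(ppm, intensity, width=0.04):
    """Sum intensities into fixed-width buckets along the chemical-shift axis.

    `width` (in ppm) is an assumed default, not a value from the paper.
    Returns a list of (bucket_centre, summed_intensity) pairs.
    """
    if len(ppm) != len(intensity):
        raise ValueError("ppm and intensity must have the same length")
    buckets = {}
    for shift, value in zip(ppm, intensity):
        # Index k of the bucket [k*width, (k+1)*width) containing this point.
        k = int(shift // width)
        buckets[k] = buckets.get(k, 0.0) + value
    # Report each bucket by its centre, ordered by increasing chemical shift.
    return [(round((k + 0.5) * width, 4), total)
            for k, total in sorted(buckets.items())]
```

Each bucket then plays the role of one variable in the later analyses. Points lying exactly on a bucket boundary are sensitive to floating-point rounding, so a real pipeline would need an explicit boundary convention.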

Once the pre-processing has been done, exploratory analyses must be performed to face the overwhelming amount of data. Unsupervised (Principal Component Analysis, PCA) and supervised (Projections to Latent Structures-Discriminant Analysis, PLS-DA) methods are commonly used to highlight relevant underlying information [5].
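To make the unsupervised case concrete, the first principal component of a samples-by-variables table can be approximated by power iteration on the sample covariance matrix. This is a minimal sketch in plain Python under stated assumptions: the function name is hypothetical, it is not how the toolbox computes PCA, and it does not cover PLS-DA.

```python
import math

def first_principal_component(rows, iterations=200):
    """Approximate the first PCA loading vector and the sample scores.

    `rows` is a list of samples, each a list of variable values.
    Assumes the starting vector is not orthogonal to the leading
    eigenvector; otherwise power iteration cannot converge to it.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Scores: projection of each centred sample onto the loading vector.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores
```

Plotting the scores of the first two components, coloured by group, gives the usual PCA overview; the loadings indicate which variables drive the separation.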

In addition, relevant features can be extracted from the analysis of the correlation matrix between peaks or buckets defined from a spectrum. In particular, analyzing correlated peaks can be very meaningful to identify signals from the same molecule as well as metabolites belonging to the same metabolic pathway [6]. The link between statistical correlation and relationship in the underlying metabolic network cannot be drawn directly [7] but this topic has stimulated many investigations [8], [9] and an in-depth analysis of the correlation matrix could reveal useful information [10], [11].
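The correlation matrix discussed here can be sketched as pairwise Pearson coefficients between buckets, each bucket being a column of intensities across samples. The code below is an illustrative plain-Python sketch, not the toolbox's implementation; the names `pearson` and `correlation_matrix` are chosen for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # A constant column has zero variance and would raise ZeroDivisionError.
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of a with b.

    `columns` maps a variable (bucket) name to its values across samples.
    """
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}
```

Two peaks of the same molecule are expected to give a coefficient close to 1, whereas, as noted above, a high coefficient between different metabolites does not by itself establish a link in the metabolic network.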

The purpose of the present work was to develop, in the R free software environment [12], statistical and graphical techniques to study NMR data. Concretely, only very basic knowledge of R is required. The user just has to source the R script file corresponding to the analysis they want to perform. Once the script is sourced, data are imported from xls files. The user can then customize the analysis by answering some very simple questions asked by the program, such as “Do you want different colors? yes/no” or “Insert the order to display boxplots”. Output files including numerical and graphical results are produced and stored in one main directory with sub-directories so that results are easy to locate. This toolbox includes univariate, bivariate and multivariate statistical analyses. Univariate approaches offer a set of graphical and numerical tools that provide information on each variable of the data set. Bivariate approaches lead to various representations of the correlation matrix to explore highly correlated metabolites. Multivariate methods then provide a global overview of the data set in either an unsupervised (PCA) or a supervised (PLS-DA) framework. Sparse versions of these methods enable the selection of the most relevant variables to focus on. All these tools are accompanied by interactive facilities that make the biological interpretation of the results easier.

Several other packages or toolboxes already exist with a quite similar purpose: see for instance metaP-Server [13], and the metabonomic [14] and MUMA [15] R packages. Other references can be found in a recent review [16], and in-house Matlab codes used for many published analyses are sometimes available upon request. In general, regarding the statistical analysis of the data, i.e. once the pre-processing has been done, these packages essentially deal with multivariate unsupervised (PCA) or supervised (PLS-DA) analysis or classification approaches (k-nearest neighbors). In our toolbox, we opt for a larger choice of graphical representations rather than of methods. We also include bivariate approaches, which are less commonly used except in the STOCSY representation [6].

To illustrate the potential of our toolbox, we present the data of a ¹H NMR metabolomic study comparing the cerebellum metabolism of control and Alzheimer's disease (AD) model mice. In this case study, we chose to concentrate on the interpretation of the correlation matrix with three levels of interest. The first one is global, as it takes place before the identification of potential discriminant metabolites. It uses pairwise correlations on the whole set of metabolites in the samples. The second and third levels of interest require a prior selection of either only one variable of interest (therefore called single) or several variables (multiple). We illustrate in this case study the pros and cons of three graphical techniques: STOCSY [6], heatmap [17] and correlation networks [18], [19]. We show that, depending on the level of interest, some are more appropriate than others for providing information that is easy to interpret.
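The single and global levels of interest can be sketched numerically as follows: a STOCSY-like profile correlates one chosen driver variable with every other variable, while a correlation network keeps only the pairs whose absolute correlation exceeds a threshold. This plain-Python sketch is illustrative only; the function names and the 0.8 threshold are assumptions and are not taken from the toolbox, and no plotting is attempted.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def single_focus(columns, driver):
    """Single level: correlation of every other variable with one driver,
    sorted by decreasing absolute correlation (a STOCSY-like profile)."""
    profile = [(name, pearson(columns[driver], values))
               for name, values in columns.items() if name != driver]
    return sorted(profile, key=lambda item: abs(item[1]), reverse=True)

def network_edges(columns, threshold=0.8):
    """Global level: edges (a, b, r) of a correlation network, keeping
    each unordered pair once when |r| >= threshold (assumed cut-off)."""
    names = list(columns)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                edges.append((a, b, r))
    return edges
```

The multiple level corresponds to calling `network_edges` on a dictionary restricted to the selected variables. The edge list is the kind of input a graph library would need to draw the network.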

A set of R scripts that requires only basic knowledge in R for the user is available from the authors upon request.

Methods

A synthetic view of the methods available in our toolbox is displayed in Fig. 1. They are presented in the following paragraphs according to the variables the methods can deal with simultaneously.

Software

Routines for computation and graphics were written in the free software environment R using various packages, including xlsReadWrite to read and write Excel files [28] (this dependency restricts the user to 32-bit Windows), igraph [29] and tcltk (based on Tcl/Tk, www.tcl.tk) to build and plot networks, and mixOmics for multivariate analysis [23]. Concretely, only very basic knowledge of R is required. The user has only to source one of the R script files corresponding to the analysis he

Sample collection and tissue extraction

Seventeen cerebellums were collected after cervical dislocation of 8 control (Tg⁻) and 9 transgenic (Tg⁺) AppSwe Tg2576 mice. Tissues were extracted according to Beckonert's procedure [30] with methanol/chloroform/water. The upper methanol/water phase was collected. Methanol was eliminated by vacuum centrifugation (Speed-Vac). Borate buffer (550 μL) at pH 10.0 was added to the remaining aqueous phase, which was then lyophilized and stored at −80 °C. Before NMR analysis, the dried-frozen extract

Reviewer assessments

B. Féraud

Institut de Statistique Biostatistique et Sciences Actuarielles, Université Catholique de Louvain. 20 voie du Roman Pays 1348 Louvain-La-Neuve, Belgique.

I, Baptiste Féraud, PhD researcher (at ISBA, UCL, Belgium) working on 2D-NMR metabonomics under the supervision of Prof. Bernadette Govaerts and Prof. Michel Verleysen declare to have tested this R toolbox in a strictly independent way. I received from Mr. Stéphane Balayssac and Mr. Sébastien Déjean all necessary information and files

Conclusion

The toolbox we developed provides many graphical techniques associated with statistical methods in order to facilitate the interpretation of the results when addressing biological problems. This toolbox can be easily handled by inexperienced R users because: (i) inputs are xls files, (ii) questions are asked of the user to design the desired analysis, (iii) interactive facilities are available to customize some graphics and (iv) outputs are stored in various sub-directories. We have illustrated

Acknowledgments

The authors are grateful to Floriane Gaffet, Nadia Saouate, Thibault Duprat and Leïla Ait Ou Ammi who contributed to the development of the software during their internships.

References (47)

X. Li et al. A wavelet-based data pre-processing analysis approach in mass spectrometry. Computers in Biology and Medicine (2007)

W. Yu et al. Detecting and aligning peaks in analyzing MALDI mass spectrometry data. Computational Biology and Chemistry (2006)

J.L. Izquierdo-García et al. Descriptive review of current NMR-based metabolomic data analysis packages. Progress in Nuclear Magnetic Resonance Spectroscopy (2011)

O. Fiehn. Metabolic networks of Cucurbita maxima phloem. Phytochemistry (2003)

N.P.V. Nielsen et al. Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping. Journal of Chromatography. A (1998)

Y. Rubin et al. The effect of N-acetylaspartate on the intracellular free calcium concentration in Ntera2-neurons. Neuroscience Letters (1995)

A.C. Paula-Lima et al. Activation of GABAA receptors by taurine and muscimol blocks the neurotoxicity of beta-amyloid in rat hippocampal and cortical neurons. Neuropharmacology (2005)

Y. Koga et al. Brain creatine functions to attenuate stress responses through GABAnergic system in chicks. Neuroscience (2005)

D.J. Reed. Regulation of reductive processes by glutathione. Biochemical Pharmacology (1986)

C. Kim et al. The production of superoxide anion and nitric oxide by cultured murine leukocytes and the accumulation of TNF-alpha in the conditioned media is inhibited by taurine chloramine. Immunopharmacology (1996)

C.D. Pederzolli et al. N-acetylaspartic acid promotes oxidative stress in cerebral cortex of rats. International Journal of Developmental Neuroscience (2007)

F.B. Goldstein. Biosynthesis of N-acetyl-l-aspartic acid. Biochimica et Biophysica Acta (1959)

A. Antoniadis et al. Nonparametric pre-processing methods and inference tools for analyzing time-of-flight mass spectrometry data. Current Analytical Chemistry (2007)

A.C. Sauve et al. Normalization, baseline correction and alignment of high throughput mass spectrometry data

R. Rousseau et al. Comparison of some chemometric tools for metabonomics biomarker identification. Chemometrics and Intelligent Laboratory Systems (2007)

O. Cloarec et al. Statistical total correlation spectroscopy: an exploratory approach for latent biomarker identification from metabolic ¹H NMR data sets. Analytical Chemistry (2005)

D. Camacho et al. The origin of correlations in metabolomics data. Metabolomics (2005)

R. Steuer. On the analysis and interpretation of correlations in metabolomic data. Briefings in Bioinformatics (2006)

M. Müller-Linow et al. Consistency analysis of metabolic correlation networks. BMC Systems Biology (2007)

E. Allen et al. Correlation Network Analysis reveals a sequential reorganization of metabolic and transcriptional states during germination and gene-metabolite relationships in developing seedlings of Arabidopsis. BMC Systems Biology (2010)

S. Sato et al. Time-resolved metabolomics reveals metabolic modulation in rice foliage. BMC Systems Biology (2008)

R Development Core Team. R Foundation for Statistical Computing, Vienna, Austria (2012)

G. Kastenmüller et al. metaP-Server: a web-based metabolomics data analysis tool. Journal of Biomedicine and Biotechnology (2011)


¹: These two authors equally contributed to this work.
