1 Introduction

In the last decade, exciting science and innovation in the life sciences have been driven by a systems view. Systems biology has come into scientific focus owing to advances in high-throughput enabling technologies that measure quickly at different biological levels, such as transcripts, proteins and metabolites (Hood 2003; van der Greef et al. 2007; Wolkenhauer 2002). The progress of the systems-based approach depends in large part on developments in biostatistics and bioinformatics to integrate high-dimensional data into a systems view. Many challenges remain in this area, some of which will be discussed.

A systems view recognizes that new properties emerge at the different levels of a complex system and, as a consequence, a system must be studied as a whole rather than by focusing on its elements only. In addition, the multi-level, interconnected, nonlinear and dynamic properties become the focus from which the self-organization of a system can be understood. Characterizing the dynamics, both from a measurement and a biostatistics point of view, is becoming mandatory to reveal new system information, e.g., to understand homeostasis and resilience after perturbation. Health and disease can be understood biologically through concepts such as resilience, but the ability to measure and analyze the resulting complex longitudinal high-dimensional data is mandatory to make progress in research.

The concept of dynamic diseases was coined by Glass and Mackey (1988), and the importance of biorhythms in relation to health, disease and intervention is surfacing as a topic in multi-factorial disease etiology. From a measurement point of view, metabolomics is an attractive tool, as it reveals information close to the phenotypic level and allows for large-scale measurements in a robust way.

To set the stage, we first describe which types of dynamic metabolomic data are the topic of this paper. We will not discuss approaches in metabolic flux analysis, since that topic is covered elsewhere (Kholodenko 2004; Stephanopoulos et al. 1998) and occurs at a different time scale, involving mechanisms other than those in the data we want to discuss. Metabolic network inference from dynamic data will also not be discussed, because it is a topic in itself (Samoilov et al. 2001; van Berlo et al. 2003). The dynamic data we are going to discuss can originate from different sources, depending on the relevant biological question and the study design. In human metabolomics, dynamic data from a challenge test may be available: a person receives a challenge (e.g., a test meal) and blood is sampled afterwards (Bijlsma et al. 2006) (time scale of minutes/hours). This points to the topic of personalized food and medicine, where each person is subjected to a challenge test that serves as a blueprint of the ‘metabolic status’ of that person (van der Greef et al. 2006). In animal studies, serial tissue sampling (besides bodily fluids) might also be available (Kleemann et al. 2007) (time scale of hours/days). Another example in animal studies is toxicology, where a toxic compound is administered at different dosage levels and samples of urine, blood and liver are collected in time (Heijne et al. 2005; Keun et al. 2004) (time scale of hours/days). A completely different type of dynamics occurs in microbial metabolomics, where time-resolved measurements are done on the intracellular metabolome of an organism in a fermentation process (Rubingh et al. 2009) (time scale of hours).

In this paper, we will focus on metabolomics data (mostly) obtained through instruments such as NMR, LC-MS and GC-MS. We will discuss methods to understand the underlying dynamic behavior of biological systems based on analyzing metabolomics data. We will give an overview of approaches taken in other fields, such as chemical engineering, systems theory and psychometrics. Since these fields are very large, only those approaches are discussed which are of potential use in metabolomics. Existing approaches in transcriptomics will also be discussed in this framework. We will discuss the methods in the context of multivariate and high-dimensional data. Multivariate means that for a single sample multiple metabolites are measured; in high-dimensional data, many more metabolites are measured than there are samples. To the best of our knowledge, hardly any methods incorporating dynamics exist so far for metabolomics; we will specify this statement later.

Three real-life data sets will be used as examples throughout this paper to illustrate the working of some of the methods. A brief introduction to the specifics of the examples, the building blocks of dynamic metabolomic data analysis and a definition of ‘dynamic methods’ are also provided to set the stage. Many of the methods reviewed and proposed are not yet used in metabolomics; hence, the area of dynamic metabolomics data analysis is still very open for research.

2 Short description of the examples

Hormones are signaling and regulatory molecules. In humans, many hormones exhibit a circadian rhythm. There are indications that the dynamic behavior of hormones is related to disease states and also changes upon treatment (Kok et al. 2004, 2006). Hormones are secreted in pulses, delivered to the bloodstream and subsequently degraded. In this example, women were hospitalized for a study and, during a 24 h period, blood samples were taken every 10 min (n = 145 per individual). These blood samples were analyzed for certain hormones, among them cortisol and luteinizing hormone (LH). The measured hormone levels are shown in Fig. 1 for one female. The data clearly show pulsatile patterns.

Fig. 1 Measured cortisol (a) and luteinizing hormone (b) levels in a woman during 1 day, showing pulsatility and biorhythms

The second example concerns NMR spectra of urine of rhesus monkeys (Macaca mulatta) measured in time. Samples are taken of ten monkeys (five male and five female) at n = 29 days unevenly spread over a time course of 57 days. This is a normality study: the monkeys were kept in a non-stressed environment to study their natural biorhythms. Details of the study were published elsewhere (Jansen et al. 2004).

The third example is on nutrikinetics: the kinetic fate of nutritional compounds (van Velzen et al. 2009). In a randomized, placebo-controlled, double-blind cross-over study, 20 healthy volunteers were subjected to a tea treatment. NMR measurements were performed on their urine, which was collected over time (n varies between 9 and 14). This allowed for estimating kinetic parameters for metabolites in their urine.

3 Short description of methods

The potentially viable methods are categorized into six groups, where each group shares a similar underlying idea. There is a loose ordering of the categories by the amount of a priori knowledge needed to perform the data analysis or, stated differently, by the strictness of the imposed assumptions.

The first group consists of methods that are based on fundamental models. This means that a priori knowledge should be available about the functional form of the dynamics of the metabolites. The second group consists of methods based on predefined basis functions, such as wavelets; hence, some assumed form of the dynamics must be reasonable given the underlying biology. The third group comprises dimension reduction methods, such as principal component analysis. These methods work if there is an underlying low dimensionality in the metabolomics data. Group four discusses multivariate time series models, which can be used if certain stationarity assumptions hold. Group five deals with analysis-of-variance (ANOVA) type models and, finally, the sixth group discusses methods imposing smoothness, using the intrinsic consecutiveness of time-resolved measurements.

When selecting a specific method for modeling dynamic metabolomics data, it is useful to think in terms of the ‘data generating process’. A possible data generating mechanism is that the biological system under study is perturbed, thereby inducing changes in unobservable biological processes. These in turn affect the manifest variables, which are the measured metabolites. Variations on this theme are possible; this particular example is the idea behind data generating processes for which dimension reduction methods are suitable. Postulating a specific data generating process presupposes knowledge about the biological system under study and the way the experimental design has perturbed that system. Ideally, the knowledge about the form of the dynamic system behavior, as made explicit in the data generating process, is matched to the requirements of the data analysis method. We will therefore make the assumptions of the methods as explicit as possible.

4 Building blocks

4.1 Fundamental models

Fundamental models of biological processes are usually put in the framework of differential or difference equations. These will therefore be discussed in some detail. Good introductory textbooks exist for both linear (Fortmann and Hitz 1977) and nonlinear dynamics (Strogatz 1994). The general form of a first-order differential equation for x(t) is

$$ \dot{x} = f(x(t),t;\alpha), $$
(1)

where f(x(t), t; α) is a (possibly nonlinear) function of x(t) and t, and only the first-order derivative of x(t) is present; the function f contains (possibly unknown) parameters α. An example of a simple differential equation is

$$ \dot{x} = \alpha x, $$
(2)

which is a first-order (only first-order derivatives), linear (only linear terms in x), autonomous (time appears only implicitly through x(t)) and homogeneous (no forcing functions or inputs) differential equation; its solution is \(x(t) = a e^{\alpha t}\) for the initial condition x(0) = a. Depending on the value of α, Eq. 2 has a stable solution (a decaying exponential for α < 0) or an unstable solution (a growing exponential for α > 0); for \({\alpha = 0,}\;\dot{x} = 0,\) and x(t) = x(0) for all t. If x(0) = 0, the derivative \(\dot{x} = 0\) and the solution is then x(t) = 0 for all t; this solution is indicated with x* (a fixed point of Eq. 2). These solutions show the typical behavior of linear first-order differential equations: constant, blow-up or decay towards zero. Hence, no oscillations can take place.
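
A minimal numerical sketch of this behavior, with arbitrary illustrative values for α and the initial condition a, is:

```python
# Minimal sketch: behavior of the linear first-order ODE dx/dt = alpha * x.
# The analytic solution x(t) = a * exp(alpha * t) is compared with a
# numerical integration; alpha = -0.8, 0.0, 0.8 illustrate decay, a
# constant solution and blow-up.
import numpy as np
from scipy.integrate import solve_ivp

a = 1.0                                   # initial condition x(0) = a
t = np.linspace(0.0, 5.0, 51)

for alpha in (-0.8, 0.0, 0.8):            # stable, fixed, unstable
    sol = solve_ivp(lambda tt, x: alpha * x, (t[0], t[-1]), [a], t_eval=t)
    analytic = a * np.exp(alpha * t)
    err = np.max(np.abs(sol.y[0] - analytic))
    print(f"alpha={alpha:+.1f}: x(5)={analytic[-1]:8.3f}, max deviation {err:.1e}")
```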

A second order linear differential equation may look like

$$ \ddot{x} = \alpha_1 x + \alpha_2 \dot{x}, $$
(3)

which can be rewritten as

$$ \begin{aligned} \dot{x}& = y\\ \dot{y}& = \alpha_1 x + \alpha_2 y, \end{aligned} $$
(4)

or, using obvious matrix notation

$$ \left(\begin{array}{l} \dot{x}\\ \dot{y}\\ \end{array}\right) = \left(\begin{array}{ll} 0 & 1 \\ \alpha_1 & \alpha_2 \\ \end{array}\right) \left(\begin{array}{l} \ x\\ \ y \\ \end{array}\right) = {\bf A} \left(\begin{array}{l} x \\ y\\ \end{array}\right). $$
(5)

Hence, higher-order linear differential equations can always be transformed into first-order systems. The solutions of a second-order differential equation are richer in behavior, e.g., oscillations are possible (Strogatz 1994). These solutions are characterized by the eigenvectors and eigenvalues of the system matrix A: there are oscillations if the imaginary parts of the eigenvalues differ from zero.
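
The eigenvalue criterion can be checked directly; in the following sketch the two α-settings are arbitrary choices producing complex and real eigenvalues, respectively:

```python
# Minimal sketch: oscillations in x_dot = A x (Eq. 5) occur when the
# eigenvalues of the system matrix A have nonzero imaginary parts.
import numpy as np

for alpha1, alpha2 in [(-1.0, -0.1), (0.25, 1.5)]:
    A = np.array([[0.0, 1.0],
                  [alpha1, alpha2]])      # companion form of Eq. 3
    eig = np.linalg.eigvals(A)
    kind = "oscillatory" if np.any(np.abs(eig.imag) > 1e-12) else "non-oscillatory"
    print(f"alpha1={alpha1}, alpha2={alpha2}: eigenvalues {np.round(eig, 3)} -> {kind}")
```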

Another way of expressing dynamics is in the form of difference equations using discrete time points. Such an equation can look like

$$ x_{t+1} = g(x_t,t;\alpha), $$
(6)

or, in a simple example

$$ x_{t+1} = \alpha x_t. $$
(7)

There are relationships between Eqs. 1 and 6 (Fortmann and Hitz 1977), but these are beyond the scope of this paper. In the sequel, continuous time functions are denoted as x(t) and discrete time functions as \(x_t\).

Examples of using fundamental models and differential equations will be given in Sect. 7.1 using both the hormones and nutrikinetics data.

4.2 Time series models

A time series of a single metabolite can be approximated with time series models such as an autoregressive process of order 1 (AR(1))

$$ x_{t+1} = \theta x_t + \epsilon_{t+1}, $$
(8)

where θ is the parameter to estimate and \(\epsilon_t\) is a so-called random shock. The parameter θ has to obey a regularity condition (|θ| < 1 for the AR(1) process) to be meaningful. Alternatively, moving average (MA) processes can be used

$$ x_{t+1} = \epsilon_{t+1}-\phi \epsilon_{t}, $$
(9)

or combinations of both (ARMA processes),

$$ x_{t+1} = \theta x_t + \epsilon_{t+1}-\phi \epsilon_{t}, $$
(10)

which is an ARMA(1,1) model. These models are also available for higher orders and in nonlinear versions. It is important to realize that Eqs. 8 and 9 make assumptions about the time series \(x_t\). They both assume stationarity: the mean and standard deviation do not depend on t, and the autocovariance depends only on the lag time τ, defined as the time interval \(\tau = t_2 - t_1\) between two time points \(t_1\) and \(t_2\) of the process. The autocovariance of an AR(1) model decays exponentially as a function of the lag time, and for an MA(1) model the autocovariance is zero for lag τ > 1. Hence, such models are not suitable for modeling periodicity and oscillations, since these would require autocovariances with periodic lags. Although autocovariances or autocorrelations strictly speaking can only be used for stationary time series, they can also convey information about general time series. Figure 2 shows the autocorrelation function of the LH-hormone data of Fig. 1. This autocorrelation function clearly shows periodicity, which relates to the periodicity in the original signals.

Fig. 2 An example of an autocorrelation function of the LH data, showing periodicity
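
A small simulation makes the exponential decay of the AR(1) autocorrelation concrete; θ and the series length are arbitrary illustrative values:

```python
# Minimal sketch: simulate an AR(1) process x_{t+1} = theta * x_t + eps_{t+1}
# and compare the estimated autocorrelation function with the theoretical
# value theta**lag.
import numpy as np

rng = np.random.default_rng(0)
theta, T = 0.7, 2000
x = np.zeros(T)
for t in range(T - 1):
    x[t + 1] = theta * x[t] + rng.standard_normal()

x = x - x.mean()
for lag in range(5):
    acf = (x[:T - lag] @ x[lag:]) / (x @ x)
    print(f"lag {lag}: estimated {acf:+.2f}, theoretical {theta**lag:+.2f}")
```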

Second-order time series models look like

$$ x_{t+1} = \theta_1 x_t + \theta_2 x_{t-1} + \epsilon_{t+1}, $$
(11)

and such models are capable of describing damped oscillations. They are more versatile in describing dynamics with periodic events. An even more versatile class of time series models are ARIMA models, where the capital ‘I’ stands for integrating. Such models are also able to describe non-stationary behavior. There is a host of literature on how to estimate parameters in AR, MA and ARIMA models, see, e.g., Box et al. (1994).

4.3 Correlations

A key feature of multivariate measurements is the covariation of the individual variables, usually measured in terms of covariance or correlation. Covariance is a measure of association of two random variables and appears as a set of parameters in the multivariate distribution function of the two random variables. For a bivariate normal distribution, this comes down to

$$ {\bf x} \sim N(\mu,\Sigma) $$
(12)

with

$$ {\bf x} = \left(\begin{array}{l} x_1\\ x_2\\ \end{array}\right), \mu = \left(\begin{array}{l} \mu_1 \\ \mu_2 \\ \end{array} \right), \Sigma = \left(\begin{array}{ll} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \\ \end{array} \right), $$
(13)

where \(\sigma_{ij}\) is the covariance (or variance if i = j) of variables \(x_i\) and \(x_j\). In the context of dynamic metabolomic data, consider the time sequences \(x_t\) and \(y_t\). If both metabolites are driven by, or probing, the same underlying biological process, then they will show similar behavior. Although this similarity can be described by a covariance measure between \(x_t\) and \(y_t\), this is, strictly speaking, not a covariance (see, e.g., Anderson 2003). The correct way of describing their mutual behavior is by writing

$$ \begin{aligned} x_t & = \gamma_x \xi_t + \nu_{x,t}\\ y_t & = \gamma_y \xi_t + \nu_{y,t}, \\ \end{aligned} $$
(14)

where \(\xi_t\) represents the underlying dynamic process, \(\gamma_x\), \(\gamma_y\) are parameters and \(\nu_{x,t}\), \(\nu_{y,t}\) are disturbances. Depending on the variances of these disturbances relative to the variation in \(\xi_t\) and the sizes of \(\gamma_x\) and \(\gamma_y\), the time series \(x_t\) and \(y_t\) show similar behavior. From now on, we will use the concepts of covariance and correlation for the association between \(x_t\) and \(y_t\), although this is a simplification.
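
The following sketch simulates this data generating process; the sinusoidal \(\xi_t\), the values of \(\gamma_x\), \(\gamma_y\) and the noise levels are all hypothetical choices:

```python
# Minimal sketch of Eq. 14: two metabolite series driven by one underlying
# dynamic process xi_t, which induces an association between them.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100)
xi = np.sin(2 * np.pi * t / 25)           # hypothetical underlying process
gamma_x, gamma_y = 1.0, 0.6
x = gamma_x * xi + 0.2 * rng.standard_normal(t.size)
y = gamma_y * xi + 0.2 * rng.standard_normal(t.size)

# the shared driver makes x_t and y_t strongly 'correlated'
print("correlation(x, y) =", np.corrcoef(x, y)[0, 1].round(2))
```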

4.4 Dimension reduction

When measuring many metabolites, a way of bringing down the complexity of the data is to use (linear) dimension reduction, of which there are essentially two classes of methods: (common) factor analysis and principal component analysis. The factor analysis model for the vector x (J × 1) containing the measured metabolites can be written as

$$ {\bf x} = {\boldsymbol{\Lambda}} {\bf y} + {\boldsymbol{\epsilon}} + {\boldsymbol{\mu}}, $$
(15)

where \({\boldsymbol{\Lambda}}\) (J × R) is a matrix of constants (loadings); y (R × 1) and ε (J × 1) are random vectors. The elements of y are called common factors and the elements of ε specific or unique factors. The vector μ is a vector of means of x. This model is a direct extension of model (14). Upon making assumptions regarding distributions and independence of terms, the parameters of model (15) can be estimated (Mardia et al. 1979). In words, the factor analysis model tries to model the covariance structure of the variables in x by using common factors.

Principal component analysis (PCA) can be interpreted in different ways: as a transformation of the original variables or as a subspace approximation method (see Smilde et al. (2004) for an extensive discussion). The transformation comes down to z′ = x′P, where z is an (R × 1) vector of scores and P a (J × R) matrix of loadings. This equation is invertible for J = R, resulting in x′ = z′P′ or x = Pz, and upon deciding on the value of R (usually smaller than J), x = Pz becomes an approximation of x. This is usually expressed in the equation

$$ {\bf X} = {\bf ZP}' + {\bf E}, $$
(16)

where X is the T × J matrix containing the measured time series; Z (T × R) contains the time series component scores; P (J × R) is the loading matrix and E (T × J) the matrix of residuals. The loading matrix P maximizes the variance of the scores and minimizes the sum of squared residuals. Hence, PCA focuses on the variance of x.
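
A minimal sketch of model (16), computed via the singular value decomposition of a column-centered data matrix (all sizes are arbitrary illustrative values), is:

```python
# Minimal sketch of the PCA model X = Z P' + E (Eq. 16) via the SVD.
import numpy as np

rng = np.random.default_rng(2)
T, J, R = 20, 10, 2
X = rng.standard_normal((T, R)) @ rng.standard_normal((R, J))  # rank-R data
X = X - X.mean(axis=0)                    # center each metabolite over time

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = U[:, :R] * s[:R]                      # scores (T x R)
P = Vt[:R].T                              # loadings (J x R)
E = X - Z @ P.T                           # residuals; ~0 here since rank(X) = R
print("residual sum of squares:", np.sum(E**2).round(12))
```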

Both PCA and factor analysis models reduce the dimensionality of the original problem (J) to R, where R is usually much smaller than J. There are differences between the models (Mardia et al. 1979; Jolliffe 1986), e.g., PCA does not provide facilities for the unique factors, and the factors y are not linear combinations of the x-variables (note that the z variables are indeed linear combinations of the variables in X). If the unique factor contributions are small relative to the common factor contributions, PCA and factor analysis give similar results.

The factors y and scores z are sometimes called latent variables to distinguish them from the manifest variables x. Although this nomenclature is somewhat sloppy in the case of PCA, the term nicely illustrates the basic assumption underlying the PCA and factor analysis models: the variation in x is summarized by a small set of underlying and unobservable variables.

4.5 What are dynamic methods?

It is useful to give a precise definition of a dynamic metabolomics data analysis method. This will be done using the example of PCA, which is not a dynamic method under our definition.

Suppose a metabolomics data set is available in which ten metabolite concentrations are measured at five time points. The resulting matrix has five rows and ten columns representing the measured metabolite values and can be decomposed using PCA. For simplicity, only the first principal component is considered. The PCA results in a score vector \({\bf z}_{orig}\) and a loading vector \({\bf p}_{orig}\). Next, the original data are shuffled, such that the time evolution between the rows is broken. The subsequent PCA of the shuffled data gives scores \({\bf z}_{shuffled}\) and loadings \({\bf p}_{shuffled}\). After this PCA, \({\bf z}_{shuffled}\) can be reshuffled, thereby undoing the initial shuffling. The resulting scores \({\bf z}_{reshuffled}\) are exactly equal to the original scores \({\bf z}_{orig}\), and it also holds that \({\bf p}_{orig} = {\bf p}_{shuffled} = {\bf p}_{reshuffled}\). Hence, PCA is insensitive to the evolutionary nature of the time axis and is thus not a dynamic metabolomics method. The definition of a dynamic metabolomic data analysis method is now simple: such a method should be sensitive to the evolutionary nature of the time axis.
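
The shuffling argument is easy to verify numerically; the random data below merely stand in for a measured five-by-ten matrix:

```python
# Minimal sketch: PCA is blind to the time ordering of the rows of X.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 10))          # 5 time points, 10 metabolites
X = X - X.mean(axis=0)

def first_pc(X):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * s[0], Vt[0]          # scores and loadings of PC 1

z_orig, p_orig = first_pc(X)
perm = rng.permutation(5)                 # break the time ordering
z_shuf, p_shuf = first_pc(X[perm])
inv = np.argsort(perm)                    # undo the shuffle

# identical up to the usual sign indeterminacy of PCA
print(np.allclose(z_shuf[inv], z_orig) or np.allclose(z_shuf[inv], -z_orig))
print(np.allclose(p_shuf, p_orig) or np.allclose(p_shuf, -p_orig))
```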

5 Dynamic metabolomic data analysis

5.1 Fundamental models

When a time series for a single metabolite is measured (denoted as \(x_t\)) and the form of the difference equation is known, then finding the dynamics comes down to estimating the unknown parameters α in the difference equation

$$ x_{t+1} = f(x_t,\alpha), $$
(17)

where an autonomous system is assumed (no explicit t in (17)). Several methods exist for estimating the parameters α. One of these methods is (nonlinear) least squares; however, such problems can be very complicated in terms of irregular error surfaces and highly correlated parameter estimates. Moreover, they run the risk of getting stuck in local minima. This can be avoided to some extent by using natural computational methods such as genetic algorithms or simulated annealing (Apostu and Mackey 2008). A viable alternative is to use smoothness constraints for regularization, thereby making the error surface less rugged and the problem better solvable (Ramsay et al. 2007).
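
As a sketch of this estimation step, assume a known (hypothetical) logistic form for f and fit α by nonlinear least squares on the one-step-ahead residuals:

```python
# Minimal sketch: estimating alpha in x_{t+1} = f(x_t, alpha) by least
# squares; the logistic form of f and all values are illustrative only.
import numpy as np
from scipy.optimize import least_squares

def f(x, alpha):
    return alpha * x * (1.0 - x)          # assumed known functional form

rng = np.random.default_rng(4)
alpha_true, T = 2.5, 100
x = np.empty(T)
x[0] = 0.3
for t in range(T - 1):
    x[t + 1] = f(x[t], alpha_true)
x_obs = x + 0.01 * rng.standard_normal(T) # measurement noise

# for more rugged error surfaces, global or regularized methods (see text)
# would replace this plain least squares call
res = least_squares(lambda a: x_obs[1:] - f(x_obs[:-1], a[0]), x0=[1.0])
print("estimated alpha:", res.x[0].round(3))
```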

An example of using a difference equation in practice is hormone dynamics. A simple model describing measured dynamic hormone behavior in human blood is

$$ x_t = x_{t-1}e^{-k}+\phi_t+\epsilon_t, $$
(18)

where t is the index for time points, the parameter k is the first-order decay constant, \(\phi_t\) is the pulsatility function and \(\epsilon_t\) is the measurement error. The pulsatility function \(\phi_t\) is nonlinear and represents the secreted hormone. This function can be defined in different ways (Vis et al. 2009). Hence, Eq. 18 is an example of a nonlinear non-autonomous difference equation.
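
A rough simulation of Eq. 18 (pulse times, pulse heights, decay constant and noise level are all arbitrary choices) shows how the model generates the sawtooth-like profiles of Fig. 1:

```python
# Minimal sketch of the hormone model x_t = x_{t-1} exp(-k) + phi_t + eps_t:
# first-order decay plus a pulsatility function phi_t.
import numpy as np

rng = np.random.default_rng(5)
T, k = 145, 0.15                          # 145 samples on a 10-min grid
phi = np.zeros(T)
phi[[10, 50, 90, 120]] = [3.0, 2.0, 2.5, 1.8]  # hypothetical secretion pulses

x = np.zeros(T)
for t in range(1, T):
    x[t] = x[t - 1] * np.exp(-k) + phi[t] + 0.05 * rng.standard_normal()
# x now shows pulses followed by exponential decay, as in the measured data
```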

The measured hormone levels are shown as dots in the upper panel of Fig. 3. The pulsatility function was constrained to have only a limited number of pulses (bars in the middle panel). The decay is clearly visible in the slopes of the drawn line (upper panel) and the residual ε is presented in the lower panel. The model fits the data well, and gives a decay rate and information about pulsatile behavior that are important for endocrinologists studying normal physiology and pathophysiology (including diseases).

Fig. 3 Measured and fitted luteinizing hormone (LH) during 1 day. Legend: dots in the upper panel are the original data; the drawn line in the upper panel is the fitted model; bars in the middle panel are the estimated hormone pulses; the line in the lower panel shows the residuals

The idea of using difference equations can be generalized to multiple metabolite measurements. When measurements are available on J metabolites as a function of time, these can be symbolized as \({\bf x}_t\) (J × 1). A second-order nonlinear difference equation for such a vector is then

$$ {\bf x}_{t+1} = F({\bf x}_t,{\bf x}_{t-1};\alpha), $$
(19)

where an autonomous system is assumed, \(F : R^{2J} \rightarrow R^J\) is a vector-valued function and α is a set of parameters. The underlying biology or physiology dictates the specific form of the function F, and the task comes down to estimating the parameters α. In principle, the same methods can be used as in the univariate case, but the multivariate problem is usually much harder to solve. An example for a system of two genes can be found in the gene-expression literature (Cao and Zhao 2008).

For a large number of metabolites and a high-dimensional α, fitting a set of difference equations is difficult. Moreover, in most cases the exact form of F is unknown and alternative models have to be discriminated with a limited set of samples. This poses a challenging experimental design question: at which time points should the samples be taken to optimally discriminate between competing models?

An alternative to modeling all measured metabolites simultaneously in one single set of equations is to first select the most important metabolites and model only those. This route was taken in the nutrikinetics example, where data analysis preselected three metabolites of potential interest. Subsequently, for each metabolite and each subject a set of two coupled fundamental models was used: one describing the behavior under placebo conditions and one the behavior under treatment conditions. The power of this approach is that each subject serves as his/her own placebo, thereby reducing the inter-person variability dramatically. This is especially important in nutritional studies because the effect sizes are usually small (van Velzen et al. 2008). The equations describing the behavior of the cumulative excreted metabolite in the placebo (\(x^{pla}\)) and treatment (\(x^{tea}\)) periods are

$$ x^{pla}_{np} = \alpha^{pla}+\beta t_{np}+\epsilon^{pla}_{np}\\ $$
(20)
$$ x^{tea}_{nt} = \alpha^{tea}+\beta t_{nt}+x^{tea}_{max}(1-e^{-k_e(t_{nt}-\tau)})+\epsilon^{tea}_{nt};\quad t_{nt}\geq\tau $$
(21)
$$ x^{tea}_{nt} = \alpha^{tea}+\beta t_{nt}+\epsilon^{tea}_{nt}; \quad t_{nt}<\tau $$

where ‘pla’ abbreviates placebo and ‘tea’ the treatment with tea; \(t_{np}\) and \(t_{nt}\) indicate the time points of measurement for the placebo and treatment periods, respectively (these are not equal). The parameters to estimate are τ (lag time); \(\alpha^{pla}\), \(\alpha^{tea}\) (offsets); β (linear cumulative increase); \(x^{tea}_{max}\) (maximum output of metabolite); and \(k_e\) (first-order rate constant). The sums of squared values of \(\epsilon^{pla}_{np}\) and \(\epsilon^{tea}_{nt}\) are minimized simultaneously using a least squares fit. The working of these equations is shown in Fig. 4. After fitting the data, the estimated parameters can be used for phenotyping. For instance, the net cumulative urinary excretion after 48 h can be calculated as \(\widehat{x}^{tea}_{net} = \widehat{x}^{tea}_{max}(1-e^{-\widehat{k}_e(48-\widehat{\tau})})\) and, subsequently, can be used to characterize individual metabolic status (van Velzen et al. 2009).

Fig. 4 Nutrikinetic modeling. The dots and stars represent measured metabolite concentrations at different time points. Shown are the original data (a); the net treatment effect (b) and the model residuals (c). Some of the parameters, explained in the text, are also indicated
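
A sketch of this simultaneous fit, with hypothetical sampling grids and parameter values, is given below; the stacked residual vector implements the joint minimization of Eqs. 20-21:

```python
# Minimal sketch of the nutrikinetic fit: placebo and tea residuals are
# stacked and minimized jointly by least squares.
import numpy as np
from scipy.optimize import least_squares

t_pla = np.linspace(0, 48, 10)            # placebo sampling times (h)
t_tea = np.linspace(0, 48, 12)            # treatment sampling times (h)

def model_tea(t, a_tea, beta, x_max, k_e, tau):
    base = a_tea + beta * t
    return np.where(t >= tau, base + x_max * (1 - np.exp(-k_e * (t - tau))), base)

def residuals(p, x_pla, x_tea):
    a_pla, a_tea, beta, x_max, k_e, tau = p
    r_pla = x_pla - (a_pla + beta * t_pla)
    r_tea = x_tea - model_tea(t_tea, a_tea, beta, x_max, k_e, tau)
    return np.concatenate([r_pla, r_tea]) # simultaneous fit of both periods

rng = np.random.default_rng(6)
p_true = (0.1, 0.2, 0.05, 3.0, 0.25, 4.0) # hypothetical 'true' parameters
x_pla = p_true[0] + p_true[2] * t_pla + 0.05 * rng.standard_normal(t_pla.size)
x_tea = model_tea(t_tea, *p_true[1:]) + 0.05 * rng.standard_normal(t_tea.size)

fit = least_squares(residuals, x0=[0, 0, 0.1, 1.0, 0.1, 2.0],
                    args=(x_pla, x_tea))
a_pla, a_tea, beta, x_max, k_e, tau = fit.x
x_net_48h = x_max * (1 - np.exp(-k_e * (48 - tau)))  # phenotyping parameter
print("net cumulative excretion after 48 h:", round(x_net_48h, 2))
```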

5.2 Basis functions

If the underlying biology dictates a certain preset form, or basis function, of the dynamics, then this form can be fitted to the data. Some basis functions (e.g., monotonically decreasing, monotonically increasing and unimodal profiles) are shown in Fig. 5. Examples of the use of splines as basis functions for dynamic data can be found in the gene-expression literature (de Hoon et al. 2002; Storey et al. 2005). No examples are available for similar approaches in metabolomics. One of the reasons may be that postulating basis functions for dynamic metabolomic data is not that easy. Once the basis functions are chosen, the approach is simple because it comes down to simple regression steps. Usually, only a few parameters are required and hence the sample sizes can be small. After fitting the individual metabolomic time profiles on the basis functions, the best fitting ones are selected and metabolites with similar time course behavior are clustered (a small sketch follows Fig. 5). Hence, a special type of correlation is found, namely the covariation with basis functions. These basis functions are guesses of the underlying functions \(\xi_t\) and, hence, fit in the framework of (15). This procedure automatically gives a dimension reduction because the basis functions serve as ‘latent’ variables.

Fig. 5 Examples of simple basis functions
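
A sketch of this fit-and-select step, with three hypothetical basis shapes standing in for those of Fig. 5, is:

```python
# Minimal sketch: regress one metabolite profile on a small set of preset
# basis functions and keep the best-fitting one; metabolites sharing a best
# basis could subsequently be clustered together.
import numpy as np

t = np.linspace(0, 1, 25)
unimodal = t * np.exp(-5 * t)
bases = {
    "decreasing": np.exp(-3 * t),
    "increasing": 1 - np.exp(-3 * t),
    "unimodal":   unimodal / unimodal.max(),
}

rng = np.random.default_rng(7)
profile = 2.0 * bases["unimodal"] + 0.1 * rng.standard_normal(t.size)

best, best_r2 = None, -np.inf
for name, b in bases.items():
    coef = (b @ profile) / (b @ b)        # one-parameter least squares fit
    resid = profile - coef * b
    ss_tot = (profile - profile.mean()) @ (profile - profile.mean())
    r2 = 1 - (resid @ resid) / ss_tot
    if r2 > best_r2:
        best, best_r2 = name, r2
print(f"best basis: {best} (R^2 = {best_r2:.2f})")
```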

For periodic or oscillating time series, Fourier analysis or wavelet transformations can be used. Fourier analysis requires large sample sizes and repeated patterns; in that respect, wavelets are more flexible. For the analysis of high-dimensional metabolomics data, forcing the latent time variables to follow a wavelet or Fourier structure is worthwhile. Combining wavelets with principal component analysis has already been done in chemical engineering (Bakshi 1998). A sophisticated way of using basis functions is by means of hidden Markov models (Schliep et al. 2003). The basis functions are implicitly defined in the emission densities of the hidden nodes, thereby also allowing for some flexibility and adaptation of the functions.

5.3 Dimension reducing methods

Usually in metabolomics many variables are measured; the number can range from 100 to 1000 (Bijlsma et al. 2006). Clearly, finding underlying dynamics in such data has to be simplified by reducing the number of variables. This can be done in several ways: by selecting important variables or by dimension reducing methods. The latter class of methods is very broad and versatile: principal component analysis and factor analysis, including all their lagged and dynamic versions. These will be discussed in some detail.

Variable selection can be done in various ways. If biological knowledge is available, then this should drive the selection. However, in most metabolomics applications the discovery of new biology is the goal, and hence prior information for selecting the most important variables is by definition not available. Then data-driven variable selection techniques have to be used, which is a risky undertaking. Although many methods for variable selection exist in ‘classical’ statistics (e.g., forward selection, backward elimination and stepwise regression for regression problems), the main problem in high-dimensional data sets is overfitting. When testing (almost) all combinations of variables in a high-dimensional problem, the number of combinations becomes so high that overfitting cannot be avoided. Hence, such a selection always has to be accompanied by a good validation strategy to avoid the so-called selection bias (Ambroise and McLachlan 2002). The whole topic of variable selection, including proper validation, deserves a critical review in itself. Obviously, upon assuming that we have selected a number of relevant variables, preferably using a priori biological knowledge, we can use some of the dynamic methods as exemplified in this paper.

Combining a priori knowledge of the underlying dynamics with a dimension reduction approach is best explained by using the factor analysis framework:

$$ \begin{aligned} {\bf x}_t & = {\boldsymbol{\Lambda}} {\bf y}_t + \epsilon_t + \mu \\ {\bf y}_{t+1}& = f({\bf y}_t,t;\alpha), \\ \end{aligned} $$
(22)

where, again, α contains the (unknown) parameters. The issue is the form of the functional relationship f. Either this function is known or it has to be estimated from the data. To the best of our knowledge, models like (22) have not been explored in X-omics data analysis. Model (22) can be simplified (dropping the term μ for simplicity) by postulating

$$ \begin{aligned} {\bf x}_t & = {\boldsymbol{\Lambda}} {\bf y}_t + \epsilon_t\\ {\bf y}_{t+1}& = {\boldsymbol{\Theta}} {\bf y}_t,\\ \end{aligned} $$
(23)

where the dynamics are in the latent variables \({\bf y}_t\) in a simple way. This is a combination of dimension reduction and time series analysis.

The idea of making factor analysis models dynamic can also be implemented differently (dropping again the term μ). Dynamic factor analysis (Molenaar 1985) models the data as

$$ {\bf x}_t = \sum_{l = 0}^L{\boldsymbol{\Lambda}}_l {\bf y}_{t-l} + \epsilon_t, $$
(24)

where \({\bf y}_t\) contains the R factor scores at time t; these factor scores are assumed to be generated by a white noise process (uncorrelated). The index l stands for lag and, hence, lags up to and including L are considered. All the dynamics in \({\bf x}_t\) are captured by the lagged loading matrices \({\boldsymbol{\Lambda}}_l\).

Component models can also be made dynamic. Besides the obvious extension of (16), in which the scores \({\bf z}_t\) are forced to follow a predefined dynamic model, there are two alternative ways of constructing dynamic PCA models, called lagged-PCA and dynamic-PCA. Lagged-PCA is a simplified version of the more general Lagged Simultaneous Component Analysis (Timmerman 2001) for analyzing multiple data sets simultaneously. To explain the idea of lagged-PCA, it is convenient to introduce the backshift matrix \({\bf B}_l\), where l = 0,…, L defines the time lag, which is defined as follows

$$ {\bf B}_l = [0_{T \times (L-l)}| I_T | 0_{T \times l}]. $$
(25)

Using the scores Z ((T + L) × R), the loadings P (J × R), and residuals E (T × J), the lagged-PCA model becomes

$$ {\bf X} = \sum_{l = 0}^L {\bf B}_l {\bf Z} {\bf P}_{l}' + {\bf E}. $$
(26)

A small numerical example for L = 2 is given to illustrate the working of this model:

$$ {\bf B}_0 {\bf Z} = \left(\begin{array}{lllllll} 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array} \right) \left(\begin{array}{ll} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ 13 & 14 \\ \end{array} \right) = \left(\begin{array}{ll} 5 & 6 \\ 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ 13 & 14 \\ \end{array}\right), $$
(27)

which shows the implicitly defined zero-shift matrix \({\bf B}_0\) and scores Z. The first lag is modelled as

$$ {\bf B}_1 {\bf Z} = \left(\begin{array}{lllllll} 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ \end{array}\right) \left(\begin{array}{ll} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ 13 & 14 \\ \end{array}\right) = \left(\begin{array}{ll} 3 & 4 \\ 5 & 6 \\ 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{array}\right), $$
(28)

and the second lag is modelled as

$$ {\bf B}_2 {\bf Z} = \left(\begin{array}{lllllll} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ \end{array}\right) \left(\begin{array}{ll} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ 13 & 14 \\ \end{array}\right) = \left(\begin{array}{ll} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \\ 9 & 10\\ \end{array}\right), $$
(29)

which results in the model of X:

$$ {\bf X} = {\bf B}_0 {\bf Z} {\bf P}_0' + {\bf B}_1 {\bf Z} {\bf P}_1'+{\bf B}_2 {\bf Z} {\bf P}_2' + {\bf E}. $$
(30)

Clearly, the lagged-PCA model has three sets of loadings (\({\bf P}_0\), \({\bf P}_1\) and \({\bf P}_2\)) representing the relationships between the variables in X and the scores Z at different lag times.
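
The backshift construction is easily reproduced programmatically; the sketch below rebuilds the matrices of Eqs. 27-29 for the numerical example above:

```python
# Minimal sketch of the backshift matrices B_l = [0_{T x (L-l)} | I_T | 0_{T x l}]
# (Eq. 25) applied to the example scores Z.
import numpy as np

T, L = 5, 2
Z = np.arange(1, 15).reshape(7, 2)        # (T + L) x R scores, as in Eq. 27

def backshift(l):
    return np.hstack([np.zeros((T, L - l)), np.eye(T), np.zeros((T, l))])

for l in range(L + 1):
    print(f"B_{l} Z =\n{backshift(l) @ Z}")   # reproduces Eqs. 27-29

# given loadings P_l (J x R), the lagged-PCA model of Eq. 30 is
# X_hat = sum over l of backshift(l) @ Z @ P_l.T
```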

In lagged-PCA the backshift operator works on the scores; the operator can also work directly on the X matrix, resulting in dynamic-PCA (Ku et al. 1995). The idea is shown using a simple example for X:

$$ {\bf X} = \left(\begin{array}{ll} 1 & 8 \\ 2 & 9 \\ 3 & 10 \\ 4 & 11 \\ 5 & 12 \\ 6 & 13 \\ 7 & 14 \\ \end{array}\right) $$
(31)

in which the rows represent time points and columns metabolites. Upon using backshift operators, the submatrices

$$ {\bf B}_0 {\bf X} = \left(\begin{array}{ll} 3 & 10 \\ 4 & 11 \\ 5 & 12 \\ 6 & 13 \\ 7 & 14 \\ \end{array}\right), {\bf B}_1 {\bf X} = \left(\begin{array}{ll} 2 & 9 \\ 3 & 10 \\ 4 & 11 \\ 5 & 12 \\ 6 & 13 \\ \end{array}\right), {\bf B}_2 {\bf X} = \left(\begin{array}{ll} 1 & 8 \\ 2 & 9 \\ 3 & 10 \\ 4 & 11 \\ 5 & 12 \\ \end{array}\right), $$
(32)

can be concatenated to form \(\widetilde{{\bf X}}\)

$$ \widetilde{{\bf X}} = \left [{\bf B}_0 {\bf X} | {\bf B}_1 {\bf X} | {\bf B}_2 {\bf X} \right] $$
(33)

which can then be subjected to an ordinary PCA. Hence, the dynamics are in the manifest variables and subsequently, a PCA captures these dynamics. Note that the covariance matrix of \(\widetilde{{\bf X}}\) contains three types of covariances: (i) those between variables (lag = 0), (ii) those within different time points of the same variables (auto-covariance) and (iii) those between different time points of different variables (cross-covariance). Hence, the subsequent PCA-scores capture a mixture of these three, obscuring the individual contributions.
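
Constructing \(\widetilde{{\bf X}}\) and its PCA takes only a few lines; the sketch uses the small example matrix of Eq. 31:

```python
# Minimal sketch of dynamic-PCA: lagged copies of X are concatenated into
# X_tilde (Eqs. 31-33), which is then decomposed by an ordinary PCA.
import numpy as np

X = np.column_stack([np.arange(1, 8), np.arange(8, 15)]).astype(float)  # Eq. 31
L = 2
T = X.shape[0] - L

# rows of X next to their lag-1 and lag-2 versions
X_tilde = np.hstack([X[L - l:L - l + T] for l in range(L + 1)])
print(X_tilde)                            # 5 x 6, as in Eq. 33

X_c = X_tilde - X_tilde.mean(axis=0)
U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
scores = U * s                            # these scores mix lag-0, auto- and
print(np.round(scores[:, 0], 2))          # cross-covariance information
```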

In fact, the matrix \(\widetilde{{\bf X}}\) is a matricized three-way array (Kiers 2000) \({\underline{\bf X}}\) of size T × J × L. Hence, an alternative would be to analyze this array with PARAFAC or Tucker3 models (Smilde et al. 2004). It is difficult to say how many samples are needed for stable estimates in the different dynamic factor analysis, lagged-PCA and dynamic-PCA models; the stability will depend on the measurement error, the intrinsic dynamics and the complexity (i.e., intrinsic rank) of X. A disadvantage of dynamic-PCA is that it ‘cuts off’ parts of X (the higher L, the more severe this cut-off is), thereby reducing the number of samples in the time direction.

Dynamic-PCA also has a close cousin called dynamic-PLS, of which there are three variants. The first version takes lagged x-variables and then performs an ordinary PLS between the expanded (and lagged) X matrix and the phenotype y. This procedure is based on Finite Impulse Response models as used in system identification (Ljung 1987). An extension of this is to also incorporate lagged y-variables in the new X-block (Qin and McAvoy 1996). This is a direct generalization of the ARMA modeling strategy (see below). The drawback of both methods is that the X matrix (which is already huge) is expanded even further with many lagged variables, thereby aggravating the problem of low sample-to-variable ratios. Hence, this does not seem to be a viable route to take, despite the dimension reduction capability of PLS.

An alternative is presented in the process control literature and consists of defining a dynamic filter to account for the dynamics in X and, subsequently, building a (static) model between the filtered X and y with PLS (Kaspar and Ray 1993). Stated otherwise, the dynamics in X are ‘whitened’ and then related to y. Although this approach does not have the drawback of ‘blowing up’ the dimensions of X unfavorably, it is sensitive to the specified form of the filter. Tuning such a filter might not be a trivial task in dynamic metabolomics.

Another way to account for dynamics is to use time as an external variable. This is the approach taken by batch modelling (Wold et al. 1998). The idea is building a PLS model between X (T × J) and a y-vector containing either a maturity variable or the time corresponding to the sampling of the rows of X (see Fig. 6).

Fig. 6 Batch process modeling: a PLS model is built between the time-resolved measured metabolites (collected in X) and a y-variable measuring time. The score vectors \({\bf t}_1\) and \({\bf t}_2\) are obtained from the PLS modeling

A maturity variable is a measured variable indicating the progress of the biological process. In both cases, the PLS model finds features in X related to a time axis and as such is a dynamic approach. This approach has been used in several metabolomics applications (Antti et al. 2002; Jonsson et al. 2006), but it also has some drawbacks. One of the problems is that it will poorly describe features in X that do not align with the imposed time axis (Westerhuis et al. 1999). A newer method has been published based on OPLS models to describe successive differences between two adjacent time points (Rantalainen et al. 2008). The drawback of this method is that the time trajectory information is spread over a set of models, hampering interpretation.

5.4 Time series analysis

There exists a multivariate extension of time series models:

$$ {\bf x}_{t+1} = {\boldsymbol{\Theta}} {\bf x}_{t} + \epsilon_t, $$
(34)

which is a multivariate AR(1) model (or vector AR(1) model) with Θ a J × J matrix of coefficients and \(\epsilon_t\) a (J × 1) vector of random shocks. Again, the matrix Θ has to fulfill some regularity conditions. Extensions to second-order systems and moving average models also exist. Estimating the model parameters Θ is possible, but requires many samples for stable results, especially if J is rather large (Holtz-Eakin et al. 1988).
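
For small J, the coefficient matrix Θ of Eq. 34 can be estimated by multivariate least squares, as in the sketch below (Θ and all sizes are arbitrary illustrative choices):

```python
# Minimal sketch: simulate a VAR(1) process and recover Theta by regressing
# x_{t+1} on x_t; with large J far more samples are needed (see text).
import numpy as np

rng = np.random.default_rng(8)
J, T = 3, 500
Theta = np.array([[0.5, 0.1, 0.0],
                  [0.0, 0.4, 0.2],
                  [0.1, 0.0, 0.3]])       # stable: spectral radius < 1

x = np.zeros((T, J))
for t in range(T - 1):
    x[t + 1] = Theta @ x[t] + 0.1 * rng.standard_normal(J)

# stacked regression: X_next = X_past @ Theta' + noise
Theta_hat = np.linalg.lstsq(x[:-1], x[1:], rcond=None)[0].T
print(np.round(Theta_hat, 2))
```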

For regression type problems, ARMAX models can be used. An example of an ARMAX (1,1) model is

$$ y_{t+1} = \theta y_t + \varphi_1 x_{t+1} + \varphi_2 x_{t} + \epsilon_{t+1}, $$
(35)

where θ, \(\varphi_1\) and \(\varphi_2\) are parameters to be estimated. The time lags used for x and y are both 1, hence the notation ARMAX(1,1) model.

An alternative is to write the set of difference equations in state-space notation,

$$ \begin{aligned} {\bf x}_{t+1} & = {\bf A} {\bf x}_{t} + {\bf B} {\bf u}_t\\ {\bf y}_t & = {\bf C} {\bf x}_t + \epsilon_t, \\ \end{aligned} $$
(36)

where A is the J × J system matrix, C the K × J measurement matrix, \({\bf u}_t\) an M × 1 vector of inputs, B a J × M input transfer matrix and \({\bf y}_t\) (K × 1) the vector of measurements (Fortmann and Hitz 1977; Ljung 1987). For generality, the forcing term \({\bf B}{\bf u}_t\) is introduced. In metabolomics experiments, this forcing term is usually complicated; it can be a diet, a toxic compound or an administered drug. Then B represents the influence of such an intervention directly on the metabolites, which is hard to estimate and is usually not explicitly considered. Since we consider all measured metabolites, C = I (the identity matrix). Then, by rearranging (36) and solving for \({\bf y}_t\), we get

$$ {\bf y}_{t+1} = {\bf A} {\bf y}_t + \epsilon_{t+1}- {\bf A} \epsilon_{t}, $$
(37)

which is a multivariate ARMA(1,1) model, showing the intimate relationship between time series models and state-space models. Some applications of state-space models are reported in the gene-expression literature (Wu et al. 2004) but, to our knowledge, no applications have been reported in metabolomics.

State-space models can also be combined with dimension reduction when the state variables \({\bf x}_t\) are regarded as underlying (latent) variables and \({\bf y}_t\) represents the measured metabolites. The matrix C then relates the manifest variables to the latent variables, and the dimensionality of \({\bf x}_t\) can be much lower than that of \({\bf y}_t\). The dynamics are now imposed on the latent variables. This approach differs from dynamic factor analysis (see Eq. 24), because in the latter case the time instances of the latent variables are considered independent Gaussian variables.

5.5 ANOVA

A different way of tackling dynamic data is by using analysis of variance (ANOVA) models (Searle 1971). In such models, the factor time can be accounted for in both a qualitative and a quantitative way. The qualitative analysis pertains to modeling the factor ‘time’ at its different levels, whereas the quantitative analysis models the factor ‘time’ in terms of (mixtures of) linear, quadratic and/or cubic trends (depending on the number of time points available). A convenient way of quantitative modeling is by using orthogonal polynomials, since the consecutive terms in such polynomials are orthogonal, thereby facilitating the estimation process. In the gene-expression literature, there are examples of both qualitative modeling (Storey et al. 2005) and quantitative modeling (Conesa et al. 2006).

The multivariate extension of ANOVA is called multivariate analysis of variance (MANOVA). This extension is not straightforward (e.g., it is unclear which test statistic to use; Stahle and Wold 1990) and it is not clear how to use it for multivariate time-resolved data (e.g., how to treat the factor time). Moreover, high-dimensional data give singular covariance matrices. One way to generalize ANOVA to the high-dimensional case is by performing separate ANOVAs on the individual metabolite profiles, thereby partitioning the data according to sources of variation. Subsequent simultaneous component analysis (SCA) models on the different parts of the data then perform the necessary dimension reduction. These approaches, called multilevel-SCA (MSCA) and ANOVA-SCA (ASCA), have been used successfully in psychometrics (Timmerman and Kiers 2003), metabolomics (Jansen et al. 2004, 2005; Smilde et al. 2005; Vis et al. 2007), proteomics (Harrington et al. 2005), gene expression (Nueda et al. 2007) and process chemometrics (de Noord and Theobald 2005). There also exists a multiway version: PARAFASCA (Jansen et al. 2008). The ASCA methods do not assume linear time behavior and are therefore general methods for capturing nonlinear time behavior (Smilde et al. 2008).

An alternative, called SMART, is to apply special preprocessing steps to the metabolomics data and perform a subsequent PCA (Keun et al. 2004). SMART has some drawbacks, notably its lack of orthogonal partitioning, which hampers interpretation (Jansen et al. 2005). Moreover, SMART is not a dynamic method according to our definition. ASCA is only a dynamic method if the factor time is treated in a quantitative way in the ANOVA model.

A route taken in the gene-expression literature is to perform single ANOVAs per gene and cluster the results afterwards (Conesa et al. 2006). It is also possible to combine both steps by using mixture modeling to find genes with similar time behavior and then estimate the dynamic behavior for the whole cluster simultaneously (Rodriguez-Zas et al. 2006). This amounts to a considerable reduction of the number of parameters to estimate. This procedure can also be used in metabolomics.

5.6 Smoothness

A very general approach to account for the consecutiveness of time evolving processes is by using smoothness constraints. In terms of curve fitting of a single metabolite time profile, this approach can be described as

$$ \min_{{\bf y}} \left[ ||{\bf x} - {\bf y} ||^2+\lambda||{\bf D} {\bf y} ||^2 \right], $$
(38)

where \({\bf x} = (x_1, \ldots, x_T)'\) is the vector of original time series measurements, \({\bf y} = (y_1, \ldots, y_T)'\) contains the fitted (or smoothed) values, D is a matrix differencing consecutive elements of y and λ ≥ 0 is a metaparameter regulating the constraint \(||{\bf D}{\bf y}||^2\) (Eilers 2003; Ramsay and Silverman 1997). Other constraints are also possible, e.g., penalties on second-order differences in (38).
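
The minimizer of (38) has a closed form, \(({\bf I} + \lambda {\bf D}'{\bf D}){\bf y} = {\bf x}\), which the following sketch solves for a noisy test signal (signal, noise level and λ are arbitrary choices):

```python
# Minimal sketch of the penalized smoother of Eq. 38 with a second-order
# difference matrix D; larger lambda gives a smoother fit.
import numpy as np

rng = np.random.default_rng(9)
T = 100
t = np.linspace(0, 1, T)
x = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(T)

D = np.diff(np.eye(T), n=2, axis=0)       # (T-2) x T second-order differences
for lam in (1.0, 100.0):
    y = np.linalg.solve(np.eye(T) + lam * D.T @ D, x)
    print(f"lambda={lam:6.1f}: residual SS = {np.sum((x - y) ** 2):.2f}")
```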

A way to combine ideas of consecutiveness with dimension reduction in time-resolved high-dimensional metabolomics data is to make the estimated principal component scores ‘as autocorrelated as possible’. The method Maximum Autocorrelation Factors (MAF) does this for spatially resolved data and can be used directly for time-resolved data (Larsen 2002). This method calculates components \({\bf z}_r\), r = 1,…, R, for which the lag-l entries have maximum autocorrelation while being mutually orthogonal across r = 1,…, R. The lag l has to be chosen by the user. Interestingly, the MAF method is equivalent to the Molgedey-Schuster version of Independent Component Analysis (Larsen 2002). Hence, using Molgedey-Schuster Independent Component Analysis on the time-resolved data would also invoke consecutiveness.

Combining smoothness with dimension reduction can be done by applying smooth-PCA (Westerhuis et al. 2008). The smooth-PCA method models the data by solving

$$ \min_{{\bf Z},{\bf P}} \left[ ||{\bf X} - {\bf Z} {\bf P}' ||^2+\lambda||{\bf D} {\bf Z} ||^2 \right], $$
(39)

where again D is a first-order or second-order difference matrix and \(\lambda\geq0\) is a penalty parameter. The higher the value of λ, the smoother the scores in Z will become.
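
A one-component sketch of (39), solved by alternating least squares with a unit-norm loading vector (the data, λ and this particular update scheme are illustrative assumptions, not the implementation of Westerhuis et al. 2008), is:

```python
# Minimal sketch of smooth-PCA: for fixed p (||p|| = 1) the smooth scores are
# z = (I + lambda D'D)^{-1} X p; for fixed z, p is the normalized X'z.
import numpy as np

rng = np.random.default_rng(10)
T, J, lam = 50, 8, 5.0
t = np.linspace(0, 1, T)
X = np.outer(np.sin(2 * np.pi * t), rng.standard_normal(J))
X = X + 0.2 * rng.standard_normal((T, J))
X = X - X.mean(axis=0)                    # center across the time mode

D = np.diff(np.eye(T), n=2, axis=0)
S = np.linalg.inv(np.eye(T) + lam * D.T @ D)  # smoothing operator

z = X[:, 0].copy()                        # crude initialization
for _ in range(100):                      # alternating least squares
    p = X.T @ z
    p = p / np.linalg.norm(p)             # loadings for fixed scores
    z = S @ (X @ p)                       # smoothed scores for fixed loadings

fit = np.outer(z, p)
print("explained variance:", round(1 - np.sum((X - fit) ** 2) / np.sum(X ** 2), 2))
```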

An example will be shown for the monkey data and, for simplicity, results will only be shown for a typical female monkey, although an analysis using all ten monkeys (i.e., smooth-Simultaneous Component Analysis) would also be possible. Prior to analysis, the data were mean-centered across the time mode. A smooth-PCA is compared to a normal (non-smooth) PCA. To calculate the smooth-PCA, a second-order penalty was used in (39) and a special arrangement has to be made to accommodate the non-equidistant sampling scheme (Westerhuis et al. 2008).

The first score vectors are shown for different values of λ, see Fig. 7.

Fig. 7 Scores of the first PC and smooth-PC. Legend: the numbers 0 and 3 refer to the value of λ

For λ = 0, the PCA solution is obtained. Raising λ penalizes roughness more and makes the scores smoother. It is hard to give objective criteria for selecting λ, but a value of 3 seems reasonable, whereas a value of 30 (result not shown) gives an overly smooth solution. The first score vector shows a rhythm with a period of 27–28 days, which may be due to the oestrous cycle (Xu et al. 2007). The corresponding loadings are shown in Fig. 8 and do not differ much between normal PCA and smooth-PCA. The first score explains 21.9% of the variation in the data, whereas the solution with λ = 3 explains 13.8%. Hence, the loss of explained variance should be compensated by better interpretability. A detailed interpretation of these results is outside the scope of this paper.

Fig. 8 Loadings of the first PC and smooth-PC. Legend: the drawn line is PCA and the dotted line is smooth-PCA

6 Discussion

It would be nice to end this paper by giving a scheme for when to apply which method in what situation. However, as in all statistical modeling, dynamic modeling is an art without predefined rules.

First, the type of biological question, the amount of knowledge of the system and the availability of data are important. If a phenotypic variable is available, then this variable might steer the unravelling of the dynamics in the metabolome data. Looking for specific rhythms with known dynamics calls for different methods than exploring dynamic patterns in metabolomics data of relatively unknown organisms.

Second, the experimental design is important: the number of time points, their spacing in time and the number of metabolites measured. The design puts restrictions on the methods to use. Some of the methods require many time-resolved samples (time series methods), whereas other methods can do with a limited number (basis functions). With high-dimensional data, it is worthwhile to consider methods involving dimension reduction.

Third, the type of measurements performed is important. For example, NMR and MS data have different characteristics, and these should be kept in mind when using dynamic methods. Some of the methods are easily adapted to accommodate nonhomogeneous errors. Such an adaptation might be profitable in terms of the quality of the estimated parameters.

Preferably, the choices for measurement and data analysis are driven by the biological question, the data generating process, the experimental design and the assumptions of the data analysis methods.