Introduction

Diet is an important environmental exposure, and many dietary factors (nutrients and non-nutrients) are associated with disease prevention or causation [5]. Measurement of habitual dietary intake is thus an essential component of much health-related research, and that measurement must be both accurate and applicable to very large numbers of free-living individuals. This makes measurement of dietary exposure one of the most challenging problems in nutrition.

Problems with measuring dietary exposure

Conventional tools for collecting quantitative information on dietary exposure, such as diet diaries, food frequency questionnaires and 24-h recalls, can be unreliable methods for characterising and quantifying eating behaviour. These tools are subject to both random and systematic errors. The errors arise from the assessment of the nature and frequency of food consumption and of portion size; from day-to-day variation in intake; from failure to report usual diet (because habits change while taking part in an investigation, or because food choice or amount is misreported); and from the use of food tables to convert intakes of foods to intakes of energy, nutrients and other food components [5]. Bingham and collaborators investigated the accuracy of several dietary assessment methods in the UK arm of the European Prospective Investigation into Cancer and Nutrition (EPIC) by comparing 16 days of weighed records with a food frequency questionnaire (FFQ), a 24-h recall and a 7-day food diary [6]. Compared with the weighed records, the FFQ tended to over-estimate almost all intakes, the 24-h recall under-estimated intakes of carbohydrates, vitamin C and alcohol, and the food diary over-estimated fat intake but under-estimated intakes of carbohydrates and calcium [6]. In addition, such conventional tools are inappropriate and/or unreliable for certain groups, such as obese or elderly people, whose self-reported energy intakes tend to be under-estimates when compared with energy expenditure measured by the doubly labelled water method [50, 63].
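The comparison with doubly labelled water rests on a simple idea: in weight-stable individuals, habitual energy intake (EI) should roughly equal total energy expenditure (TEE), so the ratio of reported EI to measured TEE flags likely misreporting. The Python sketch below illustrates this under stated assumptions; the participant values and the 0.8/1.2 cut-offs are hypothetical and are not taken from [50, 63].

```python
def reporting_ratio(reported_ei_kj: float, tee_kj: float) -> float:
    """Ratio of self-reported energy intake (EI) to measured expenditure (TEE).

    Assumes the participant is weight-stable, so true intake ~= TEE.
    """
    if tee_kj <= 0:
        raise ValueError("TEE must be positive")
    return reported_ei_kj / tee_kj


def classify(ratio: float, lower: float = 0.8, upper: float = 1.2) -> str:
    """Label a reporting ratio; the cut-offs are illustrative assumptions."""
    if ratio < lower:
        return "likely under-reporting"
    if ratio > upper:
        return "likely over-reporting"
    return "plausible"


# Hypothetical participants: (reported EI, DLW-measured TEE), both in kJ/day
participants = {
    "P1": (9000.0, 9500.0),
    "P2": (7000.0, 11000.0),
    "P3": (13000.0, 10000.0),
}

for pid, (ei, tee) in participants.items():
    ratio = reporting_ratio(ei, tee)
    print(f"{pid}: EI/TEE = {ratio:.2f} -> {classify(ratio)}")
```

In practice the cut-offs would need to account for measurement error in both EI and TEE; fixed limits are used here only to keep the sketch short.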

Inaccurate measurement of dietary exposure may make it difficult or even impossible to detect associations between dietary exposure and disease risk. The follow-up study of markers of aflatoxin exposure in relation to liver cancer undertaken by Qian and collaborators is a good example. For individuals with high dietary aflatoxin exposure, the relative risk (RR) of liver cancer was only 0.9, and not significant, when exposure was assessed from the frequency of consumption of 45 foods, but was 59.4 (and highly significant) when exposure was measured using biomarkers in urine samples [46]. Limitations in the accuracy and/or precision of dietary intake measurements may also help explain the conflicting results on the protective effect of micronutrients, such as antioxidant vitamins, against cancers or cardiovascular disease. For example, the associations between breast cancer risk and dietary carotenoids, retinol, vitamin C and tocopherols remain uncertain, as shown by inconsistent results from studies using FFQs, even validated ones [59].

In an attempt to overcome the problems of measuring dietary exposure with self-reported methods, a number of biomarkers have been developed which can be used to validate intake estimates or to act as surrogates for intake measurements. One example is urinary potassium output as a marker of potassium intake. The correlation between intake and excretion can be very good (at least 0.7), even when dietary intakes are calculated from food tables rather than analysed, as long as sufficient 24-h urine samples are obtained [5]. Similarly, there are good correlations between individual estimates of protein intake and 24-h urinary nitrogen output, provided that the completeness of the urine collection is checked using an independent marker, e.g. p-aminobenzoic acid [5]. Further, the fatty acid composition of blood lipids or of adipose tissue can act as a biomarker of fatty acid intakes. Good correlations exist between concentrations of pentadecanoic acid (C15:0) and estimated intakes of fat from milk or dairy products [51, 66]. When total fatty acid intakes were estimated [44], the correlation with the fatty acid composition of subcutaneous adipose tissue remained good (R = 0.50) for polyunsaturated fatty acids, but not for monounsaturated or saturated fatty acids (R = 0.22 and 0.24, respectively), probably because the latter can be synthesised in the tissues as well as being obtained from the diet. Whilst such individual biomarkers are valuable, they are not suitable for describing dietary exposure as a whole because they reflect only a very small range of food constituents. Thus, there is an urgent need to develop an alternative, non-subjective tool that can be used to assess the totality of dietary exposure and which could be applied to relatively large numbers of individuals at relatively low cost.

Advent of metabolomics approaches

Foods contain thousands of compounds which, upon digestion and metabolism, give rise to the metabolites present in body fluids such as blood and urine. In theory, it should be possible to infer which foods have been eaten, and in what amounts, from an assessment of the metabolites in these fluids. However, the digestion, transport, storage, metabolism and excretion of food constituents are complex and dynamic processes, resulting in a myriad of different metabolites present over a very wide range of concentrations. Until very recently, this complexity made it virtually impossible to design a strategy for assessing dietary exposure with the technological reach to address the heterogeneity of metabolites and sufficient capacity to cope with large numbers of samples. Through developments in both technology and bioinformatics to support metabolomics approaches, this situation is changing rapidly [20].

Metabolomics refers to comprehensive and non-selective analytical chemistry approaches aiming to provide a global description of all the metabolites present in a biofluid at a given time [7, 14, 23, 29, 55]. The metabolite content of biofluids may be assessed using spectroscopic platforms, including nuclear magnetic resonance (NMR) spectroscopy and vibrational techniques such as infrared (IR) or Fourier transform IR (FT-IR) spectroscopy, or by capillary electrophoresis coupled either to ultraviolet absorbance detection (CE-UV) or to laser-induced fluorescence detection (CE-LIF). In addition, there is a range of mass spectrometry (MS)-based approaches. Some use no chromatography, e.g. flow injection electrospray ionisation MS (FIE-MS) and direct infusion MS (DIMS); others include a chromatographic step that first attempts to separate metabolites before detection, such as gas chromatography (GC–MS), liquid chromatography (LC-MS) or high pressure liquid chromatography (HPLC-MS). Chromatographic separation may also be followed by tandem MS, or by detection with both NMR and MS [55]. The choice of the most suitable technology is generally a compromise between speed, selectivity and sensitivity.

Metabolomics datasets have specific characteristics which require appropriate statistical tools for their analysis. Where the intention is to measure simultaneously the entire metabolite content of biofluid samples collected from highly complex organisms such as humans, the data produced by metabolomics experiments have enormous dimensionality (from 200–300 signals using GC–MS with time-of-flight detectors to about 2,000 using FIE-MS) and large biological variability [13, 30]. Such dimensionality and variance demand powerful multivariate data analysis tools for sample classification or discrimination [13, 21, 23, 30]. One of the best known is principal component analysis (PCA), an unsupervised method which reveals natural clustering of samples and can be used to identify extreme outliers. For supervised analysis, typical multivariate algorithms used to separate treatment classes are two discriminant methods, linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA), and orthogonal projection to latent structures (OPLS), a form of regression analysis.
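As a concrete illustration of PCA on fingerprint-like data, the Python sketch below computes principal components by singular value decomposition of a mean-centred samples × signals matrix. The data are simulated (20 samples, 2,000 signals, with 50 signals raised in a hypothetical "fed" group), and the log transform is an assumed pre-processing choice rather than a step prescribed in the cited work.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 2):
    """PCA of a samples x signals matrix via singular value decomposition.

    Columns are mean-centred; returns sample scores, signal loadings and the
    fraction of total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Simulated fingerprints (an assumption for illustration): 20 samples x 2,000
# signals; the 10 "fed" samples have five-fold higher intensity at 50 signals.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=0.3, size=(20, 2000))
X[10:, :50] *= 5.0
labels = np.array(["fasted"] * 10 + ["fed"] * 10)

scores, loadings, explained = pca(np.log(X))
print("variance explained by PC1, PC2:", np.round(explained, 3))
print("signals with largest |PC1 loading|:", np.argsort(-np.abs(loadings[:, 0]))[:5])
```

In this simulation the PC1 scores separate the two groups and the largest PC1 loadings point back to the shifted signals. With real data, separation along the first components is not guaranteed, which is one reason supervised methods such as LDA or PLS-DA are also used.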

To date, metabolomics pipelines, which provide guidelines for all the steps from sample collection (with an appropriate study design) to the identification of two or more significantly different groups using pattern recognition statistics, have been applied to microbes, plants and some rodent models. Investigations with microbes have included the development of chemical taxonomy approaches to investigate the genetic diversity of fungal contaminants in food [52] or the identity of bacterial species in mixed populations [57]. In plants, for example, metabolomics has been used to investigate possible unintended consequences of genetically engineering plants to exhibit novel enzyme activity [10]. In rodents, metabolomics approaches have been used in physiological evaluation, drug safety assessment, characterisation of genetically modified animal models of disease, and drug therapy monitoring [8, 34].

Metabolomics approaches applied to measurement of dietary exposure

In humans, metabolomics has been used mainly in studies focusing on diagnosis of disease [9, 11, 15, 25, 32, 39, 40, 56, 58], mode of drug/toxin action [4, 31, 36, 42, 47], and characterisation of novel foods [10, 48]. However, several recent commentary articles have suggested that metabolomics will have great value for nutritional studies [12, 16–20, 38, 62, 65], and thus it is timely to exploit this technology platform to assess dietary exposure.

The first published study in which a metabolomics approach was described in a human nutrition experiment used NMR technology to monitor the effect of supplementing the diet with soy [53]. Only a small number of plasma samples were available and there was considerable inter-person variability but, despite these limitations, careful data pre-processing in combination with powerful discriminatory analysis grouped the samples into two main classes that reflected the dietary intervention.

Since then, a few studies have used the power of metabolomics to link metabolite contents in human biofluids to acute or chronic dietary exposure. Analysis of urine samples obtained from healthy British and Swedish subjects revealed characteristic dietary and cultural differences between the two populations, such as high trimethylamine-N-oxide (TMAO) excretion in the Swedish population and high taurine excretion attributed to the Atkins diet [33]. Urine samples have also been used to investigate responses to ingestion of chamomile (Matricaria recutita) tea. Despite substantial inter-subject variation in metabolite profiles, samples obtained before and after chamomile ingestion were clearly differentiated, with urinary excretion of hippurate and glycine being important discriminatory metabolites [60]. The effects of three experimental diets, described as “vegetarian”, “low meat” and “high meat”, on urinary metabolite contents were also investigated [54]. PCA differentiated the characteristic metabolic signatures of the diets, with creatine, carnitine, acetylcarnitine and TMAO being elevated during the high-meat period. Application of OPLS discriminant analysis allowed the low-meat and vegetarian diet signatures to be characterised; p-hydroxyphenylacetate (a microbial-mammalian co-metabolite) was higher in the vegetarian than in the meat diet samples, signalling an alteration of the bacterial composition or metabolism in response to diet [54]. More recently, using urine samples from the large INTERMAP epidemiological study (4,630 participants from 17 population samples in China, Japan, the UK and the USA), Holmes and collaborators showed that a metabolomics approach can distinguish East Asian from Western populations. In this study, novel associations between urinary metabolites and blood pressure were discovered, suggesting that both dietary factors per se and altered gut microbial metabolism may be related to raised blood pressure [26]. Besides NMR-based technology, reports describing metabolite analysis using MS-based techniques in nutrition studies are emerging. For example, studies of polyphenol concentrations in human urine (measured by HPLC-tandem MS) after consumption of six different polyphenol-rich beverages showed that urinary chlorogenic acid, gallic acid, epicatechin, naringenin and hesperetin could be used as specific biomarkers of the consumption of coffee, wine, tea, cocoa and citrus juices [27].

Selection of metabolomics approaches for characterising dietary exposure in humans

The nutritional metabolomics studies undertaken in humans in the last 5–10 years have already demonstrated the ability of these techniques to measure known compounds, or to discover new ones, whose presence in blood or urine can be correlated with dietary exposure. This biomarker approach, which relies on measuring specific biomarkers in biofluids to reveal the consumption of specific foods (e.g. TMAO, β-carotene or eicosapentaenoic acid concentrations to identify high meat, vegetable or oily fish intakes, respectively), has limitations. Firstly, a biomarker assay for a particular dietary component normally consists of measuring the concentration of one, or a few, blood/urine metabolites using pure standards as a reference. Establishing such assays presupposes sufficient background knowledge, both of the chemistry involved and of the spectrum of metabolic responses expected from ingestion of the target food or food constituent, to allow the appropriate biomarker to be identified. For compounds such as anthocyanins, for example, there is little reliable information on their absorption and metabolism in human subjects, and available studies have reported contradictory results [28]. Secondly, such approaches may demand specific extraction/purification procedures and use detailed, targeted, often low-throughput analytical procedures bespoke for each biomarker.

These targeted metabolomics analyses have been successfully applied in pharmacology, toxicology and medical screening [11, 15, 25, 32, 40, 42, 58], but are not entirely suitable for characterising dietary exposure. The central problem is that well over 60% of the natural metabolites in raw food material have yet to be structurally characterised [14, 55], and consequently the fate of such metabolites after consumption is poorly understood. In addition, standards for many metabolites that might potentially be used as biomarkers are simply not available. With this lack of prior knowledge, it is difficult to generate hypotheses for developing biomarkers indicative of exposure to specific food constituents [43, 45]. Further, any potential bioassay has to cope with the dynamic nature of metabolite concentrations in body fluids following food consumption. A logical approach within these limitations is to use metabolite “profiling” or “fingerprinting” techniques, which allow simultaneous monitoring of multiple components of blood/urine whose collective relative behaviour may provide metabolic signals indicative of food intake. Such profiling/fingerprinting assays should (1) be sensitive enough to survey signals from as many metabolites as possible in a non-biased way, allowing the derivation of global metabolite profile/fingerprint patterns associated with specific features of diet; (2) not require direct metabolite identification to develop a consistent metabolite fingerprint; (3) use relative signal ratios, rather than absolute concentrations, to allow “normalisation” of fingerprints; (4) use data mining procedures that can not only discriminate between metabolite fingerprints but also determine the most important signals responsible for the differences; and (5) use analytical chemistry procedures in which the relevant fingerprint signals can be linked easily to specific metabolites.
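Criteria (3) and (4) can be made concrete with a short sketch. The Python code below, a hypothetical example on simulated data, first divides each sample's fingerprint by its total signal (one common way to obtain relative signal ratios, assumed here rather than taken from the cited studies), then ranks signals by the absolute Welch t statistic between two classes to find those most responsible for the difference. The simulated per-sample dilution factors stand in for differences in urine concentration.

```python
import numpy as np


def total_signal_normalise(X: np.ndarray) -> np.ndarray:
    """Divide each row (sample) by its total signal, giving relative intensities."""
    return X / X.sum(axis=1, keepdims=True)


def welch_t(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Welch t statistic for every column (signal) between groups a and b."""
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se


def rank_signals(X, labels, group_a, group_b, top=5):
    """Indices of the `top` signals with the largest |t| after normalisation."""
    Xn = total_signal_normalise(X)
    t = welch_t(Xn[labels == group_a], Xn[labels == group_b])
    return np.argsort(-np.abs(t))[:top], t


# Simulated data (assumption): 20 samples x 300 signals, each sample scaled by
# an arbitrary dilution factor; signals 7 and 42 are four-fold higher when fed.
rng = np.random.default_rng(42)
base = rng.lognormal(mean=0.0, sigma=0.15, size=(20, 300))
labels = np.array(["fasted"] * 10 + ["fed"] * 10)
base[labels == "fed", 7] *= 4.0
base[labels == "fed", 42] *= 4.0
dilution = rng.uniform(0.3, 3.0, size=(20, 1))
X = base * dilution

top, t = rank_signals(X, labels, "fed", "fasted")
print("most discriminatory signals:", top)
```

Without the normalisation step, the arbitrary dilution factors would add large between-sample variation to every signal and could mask the two true differences.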

Non-targeted metabolite profiling using sensitive “time-of-flight” detectors has been used to analyse extracts of raw food materials [10, 64] and is useful for detecting drug metabolites [45]. Because it attempts to profile all metabolite peaks detected automatically by the instrument software, this approach can find metabolite differences between samples without any prior knowledge of which signals might be discriminatory. However, the chromatographic step that first attempts to separate metabolites before detection requires exquisite control to obtain reproducibility, and demands rigorous data pre-processing to deconvolve, align and annotate peaks correctly [35].
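The peak-alignment problem mentioned above can be illustrated with a deliberately simple scheme. The Python sketch below greedily groups peaks from different samples into one feature when their m/z values agree within a ppm tolerance and their retention times within a fixed window. The `Peak` class, the tolerances and the peak list are hypothetical, and this is not the method of [35]; dedicated tools such as XCMS additionally model and correct retention-time drift, which this sketch does not.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    sample: str
    mz: float         # mass-to-charge ratio
    rt: float         # retention time, minutes
    intensity: float


def align(peaks, ppm_tol=10.0, rt_tol=0.2):
    """Greedily group peaks into features (lists of Peak).

    A peak joins the first feature whose reference (first) peak is within
    ppm_tol in m/z and rt_tol in retention time, provided that the feature
    has no peak from the same sample yet; otherwise it starts a new feature.
    """
    features = []
    for p in sorted(peaks, key=lambda q: (q.mz, q.rt)):
        for feat in features:
            ref = feat[0]
            if (abs(p.mz - ref.mz) / ref.mz * 1e6 <= ppm_tol
                    and abs(p.rt - ref.rt) <= rt_tol
                    and all(q.sample != p.sample for q in feat)):
                feat.append(p)
                break
        else:
            features.append([p])
    return features


def feature_table(features, samples):
    """Feature x sample intensity matrix, with 0.0 where a peak is missing."""
    return [[next((p.intensity for p in feat if p.sample == s), 0.0)
             for s in samples] for feat in features]


peaks = [
    Peak("s1", 180.0655, 5.02, 1000.0),
    Peak("s1", 76.0757, 1.10, 500.0),
    Peak("s2", 180.0661, 5.10, 1100.0),
    Peak("s2", 76.0760, 1.15, 450.0),
    Peak("s2", 132.0768, 2.30, 300.0),
    Peak("s3", 180.0650, 5.45, 900.0),  # retention time has drifted
]

feats = align(peaks)
for row in feature_table(feats, ["s1", "s2", "s3"]):
    print(row)
```

Here the s3 peak near m/z 180.065 has drifted by more than 0.2 min, so it ends up as a separate feature instead of being matched with s1 and s2, which is the kind of error that makes reproducible chromatography essential for profiling.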

A more “global” overview of total metabolite composition can be obtained from much more rapid and reproducible metabolite fingerprinting techniques which do not incorporate a chromatographic step. For example, FT-IR [22] and NMR [37, 61] generate global chemical fingerprints with little need for specialised sample preparation [23, 24]. However, these methods are less sensitive than FIE-MS [3] and generally require a further level of directed analysis to link any differences in wavenumber (FT-IR) or chemical shift (NMR) to specific chemistry [23]. In contrast, MS-based fingerprinting techniques such as FIE-MS or DIMS offer the advantage that the measured variables, mass-to-charge (m/z) ratios, can be linked more directly to individual metabolites using information on accurate atomic mass [1, 2, 10, 42, 49, 52, 58]. FIE-MS fingerprints are obtained following “soft” ionisation of the sample during injection over a period of 1–2 min, and can be regarded as simplified images of total sample composition: the signal at each m/z may integrate contributions from more than one metabolite (e.g. isomers) that give a stable ion at that m/z. During soft ionisation the main products are charged forms of the intact parent molecule; fragmentation products are relatively rare, so the identity of molecules producing signal at a specific m/z can be investigated directly from the predicted mass of the metabolite. The high-throughput nature of FIE-MS also extends beyond data acquisition, as fingerprint data require little, if any, pre-processing prior to analysis. The ability to analyse a large number of samples rapidly with minimal data processing makes the technique attractive for first-pass sample analysis in large-scale epidemiological studies.
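Linking an FIE-MS signal to candidate metabolites from predicted mass can be sketched as a simple lookup. The Python example below computes monoisotopic masses from elemental formulae, adds or subtracts a proton to predict [M+H]+ or [M−H]− ions, and reports library entries within a ppm tolerance of an observed m/z. The mini-library, the 5 ppm tolerance and the restriction to protonated/deprotonated ions are assumptions; real annotation also considers other adducts, isotopes and in-source fragments.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons (Da)
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688  # proton mass, Da

# Hypothetical mini-library of candidate metabolites (elemental formulae)
LIBRARY = {
    "hippuric acid": {"C": 9, "H": 9, "N": 1, "O": 3},
    "trimethylamine N-oxide": {"C": 3, "H": 9, "N": 1, "O": 1},
    "creatine": {"C": 4, "H": 9, "N": 3, "O": 2},
    "leucine": {"C": 6, "H": 13, "N": 1, "O": 2},
    "isoleucine": {"C": 6, "H": 13, "N": 1, "O": 2},
}


def neutral_mass(formula: dict) -> float:
    """Monoisotopic neutral mass computed from element counts."""
    return sum(MONO[el] * n for el, n in formula.items())


def annotate(observed_mz: float, mode: str = "positive", ppm_tol: float = 5.0):
    """Return (name, ppm error) for library entries whose [M+H]+ (positive
    mode) or [M-H]- (negative mode) ion matches observed_mz within ppm_tol."""
    if mode not in ("positive", "negative"):
        raise ValueError("mode must be 'positive' or 'negative'")
    shift = PROTON if mode == "positive" else -PROTON
    hits = []
    for name, formula in LIBRARY.items():
        expected = neutral_mass(formula) + shift
        error_ppm = (observed_mz - expected) / expected * 1e6
        if abs(error_ppm) <= ppm_tol:
            hits.append((name, round(error_ppm, 2)))
    return hits


print(annotate(180.0655))              # hippuric acid as [M+H]+
print(annotate(132.1019))              # leucine and isoleucine share one m/z
print(annotate(178.0510, "negative"))  # hippuric acid as [M-H]-
```

The leucine/isoleucine query returns two candidates at the same m/z, which is the isomer ambiguity noted above: the fingerprint signal integrates both, and resolving them needs additional analysis.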

Perspectives

Such high-throughput, non-targeted metabolite fingerprinting using FIE-MS has been successfully applied and validated in the context of the dynamic interaction between a pathogen and its host, where a time series of leaf samples was collected from Brachypodium after infection with the fungal pathogen Magnaporthe grisea [3, 13, 41]. We are now applying this analytical approach to characterise dietary exposure in humans through measurements on blood and urine. In the MEDE Study (MEtabolomics to characterise Dietary Exposure), blood and urine samples have been collected from healthy volunteers in the fasted state and after consumption of carefully designed test meals. The results from the first phase of the MEDE Study are promising, showing that the metabolite fingerprints generated (1) were reproducible within individuals, (2) discriminated clearly between fasting and fed samples, and (3) displayed an overall variance dominated by sample treatment class rather than by gender or individual (Favé et al., unpublished data). The next phase of the MEDE Study will investigate metabolite fingerprints in biofluids from volunteers exposed to specific test foods, including an oily fish, a wholegrain cereal product, a green vegetable and a fruit. The MEDE Study is designed to provide proof of principle that metabolomics approaches can be used to generate novel biomarkers of dietary intake when biofluids are collected 3–4 times over a few hours after consumption of the test meal. Metabolomics might also be useful in characterising habitual dietary exposure, but here the challenge is likely to be considerably greater. This will require additional systematic research, including investigation of the kinetics of metabolite transfer between available biofluid pools (e.g. blood, urine and saliva) and sites of storage or sequestration (e.g. adipose tissue or bone), to identify metabolites with long whole-body half-lives that would be candidates for assessing sustained (habitual) exposure.
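Why long whole-body half-lives matter for habitual exposure can be illustrated with a one-compartment, first-order elimination model, which is a simplifying assumption rather than a description of real metabolite kinetics. In the Python sketch below, two hypothetical 30-day intake patterns deliver the same total amount; a marker with a short half-life mainly reflects the timing of the most recent intakes, whereas one with a long half-life gives almost the same level for both patterns. The half-lives and intake schedules are illustrative only.

```python
import math


def marker_level(intakes, half_life_days, t_sample):
    """Body level of a marker at t_sample under first-order elimination.

    `intakes` is a list of (day, amount) pairs; each dose decays as
    amount * exp(-k * elapsed) with k = ln 2 / half-life, and doses taken
    after t_sample are ignored.
    """
    k = math.log(2) / half_life_days
    return sum(a * math.exp(-k * (t_sample - d)) for d, a in intakes if d <= t_sample)


# Two hypothetical 30-day intake patterns with the same total intake
daily = [(d, 1.0) for d in range(30)]               # 1 unit every day
alternate = [(d, 2.0) for d in range(0, 30, 2)]     # 2 units every other day

for t_half in (0.25, 20.0):  # e.g. 6 hours versus 20 days (assumed values)
    a = marker_level(daily, t_half, 30)
    b = marker_level(alternate, t_half, 30)
    print(f"half-life {t_half} d: ratio alternate/daily = {b / a:.2f}")
```

For these schedules the ratio simplifies to 2r/(1 + r), where r = 2^(−1/t½) is the fraction remaining after one day, so it approaches 1 only when the half-life is long relative to the interval between intakes.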