A contemporary review on Data Preprocessing (DP) practice strategy in ATR-FTIR spectrum

https://doi.org/10.1016/j.chemolab.2017.02.008

Highlights

  • There has been increasing interest in ATR-FTIR spectroscopy in diverse fields of application.

  • Data preprocessing (DP) has not been given considerable attention by most ATR-FTIR spectroscopists.

  • DP methods of choice have most likely been selected according to examples from the literature and have been limited to derivatives, mean-centering and normalization to sum.

  • Post-DP application assessment is not widely practiced.

  • Rationales which could possibly have contributed to malpractice have been discussed.

Abstract

ATR-FTIR spectroscopy in combination with chemometrics has been practiced over the past decades. Works presented in numerous disciplines provide ample empirical evidence in support of this coupling. However, data preprocessing (DP), which constitutes the first step in chemometric analysis pipelines, is seldom given reasonable attention. The aim of this paper is two-fold: (a) to review the contemporary DP practice strategy of ATR-FTIR users, and (b) to critically discuss the rationales that could have been nurturing such practices. In the first part, basic concepts of chemometrics and ATR-FTIR spectroscopy are described. Then, the status quo of DP practice strategy is outlined and critically discussed in terms of whether the contemporary practice constitutes malpractice or best practice. Finally, rationales that could possibly have contributed to some of the malpractices are discussed.

Introduction

Over the past two decades, technology has evolved rapidly, contributing to the production of High Dimensionality (HD) data in various disciplines [1], [2], [3], [4], [5]. Technological advancements have made the collection of analytical data from a tiny sample possible and feasible within a short period of time [6], [7], [8], [9]. Nonetheless, these advancements are a double-edged sword: cutting-edge analytical instruments tend to produce data that cannot be readily analyzed and interpreted to achieve the targeted goal of analysis [10], [11]. Data preprocessing (DP), also known as data pre-treatment, is used to remove or reduce unwanted signals from HD data prior to modeling. As such, the DP step is always located right after the data collection or acquisition step in the chemometric pipeline for analytical data. An improper selection of DP methods may negatively affect model accuracy and interpretability [12], [13]. The vital roles of DP methods have been discussed in numerous books and references available in the literature [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22].
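
As a concrete illustration of where DP sits in the pipeline, the following minimal Python sketch applies the three DP methods named in the highlights (normalization to sum, mean-centering and derivatives) to toy data. It is a sketch under stated assumptions, not the procedure of any cited study: the function names and the two four-point "spectra" are hypothetical, and the derivative is a plain finite difference rather than the Savitzky-Golay derivative that is more common in practice.

```python
# Illustrative sketch only: toy spectra, not data from any cited study.
# All function names below are hypothetical helpers, not a library API.


def normalize_to_sum(spectrum):
    """Scale one spectrum so that its intensities sum to 1."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("cannot normalize a spectrum whose intensities sum to zero")
    return [value / total for value in spectrum]


def mean_center(spectra):
    """Subtract the per-wavenumber (column) mean from every spectrum."""
    n = len(spectra)
    column_means = [sum(column) / n for column in zip(*spectra)]
    return [[v - m for v, m in zip(row, column_means)] for row in spectra]


def first_derivative(spectrum, step=1.0):
    """Approximate the first derivative by simple finite differences.

    The result is one point shorter than the input. Plain differences are
    used here only to keep the sketch free of external dependencies.
    """
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


# Two hypothetical absorbance spectra measured at the same four wavenumbers.
# The second is the first multiplied by two (a pure intensity difference).
raw = [
    [1.0, 2.0, 3.0, 2.0],
    [2.0, 4.0, 6.0, 4.0],
]

normalized = [normalize_to_sum(s) for s in raw]   # DP applied per spectrum
centered = mean_center(normalized)                # DP applied across spectra
derivatives = [first_derivative(s) for s in raw]  # slope between adjacent points
```

In this toy case the two spectra differ only by a scale factor, so they become identical after normalization to sum and mean-centering then leaves only zeros. The derivatives of the raw spectra still differ by the same factor of two, which shows that the DP methods are not interchangeable and that their order and combination affect what reaches the model.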

Vibrational spectroscopy techniques, including Raman, NIR and MIR spectroscopy, have been coupled with chemometric algorithms to accomplish different analytical tasks [5], [8], [9], [15], [16], [17]. Recently, ATR-FTIR spectroscopy has been preferred over transmission FTIR spectroscopy in diverse fields of application [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31]. This preference is credited to its non-destructiveness, ease of application, relatively low analysis cost and rapid analysis time [6]. Consequently, plenty of papers have been published in diverse application fields with the aim to “develop methods to classify, differentiate or identify particular samples by using ATR-FTIR spectra combined with chemometrics” [23], [24], [25], [26], [27], [28], [29], [30], [31]. However, most of these papers have not devoted considerable effort to systematically selecting and assessing DP methods prior to modeling. The importance of proper selection of DP methods has been overlooked: users tend to simply follow conventional choices of DP methods or to shortlist a few DP methods intuitively. We discuss this matter further in the following sections.

To date, a few reviews have addressed the impact of DP methods on HD data [10], [14], [15], but only one is devoted to DP evaluation tools [12]. To the best of our knowledge, no paper discusses DP practice strategy. Moreover, most reviews or tutorials on DP methods use NIR data [e.g. [14]] or Raman data [e.g. [22], [32]] to demonstrate practical aspects. Hardly any work addresses the impact of DP methods using ATR-FTIR spectra as practical examples. Part of the motivation for writing this article comes from the first author’s experience of applying chemometric tools to an ATR-FTIR spectrum-based problem in forensic science [33], during which hardly any comprehensive reference could be found on strategies for selecting DP methods. Thus, this work is the first review of a novel aspect of DP, i.e. DP practice strategy, using ATR-FTIR spectra as practical examples, based on selected papers published since 2012.

In the subsequent sections, the basic concepts of the two core subjects of concern, i.e. ATR-FTIR spectroscopy and chemometrics, are briefly explained. Following that, the status quo of contemporary DP practice strategy is reviewed according to selected articles published since 2012 and then summarized in a schematic flow chart. Last but not least, rationales that could have supported such practice are discussed. For the sake of clarity, Fig. 1 summarizes the main ideas addressed in this article.

Typical characteristics of ATR-FTIR spectrum

ATR-FTIR spectroscopy is a powerful molecular spectroscopy technique, and its advantages have been described in several references [6], [34], [35], [36], [37], [38], [39]. Fig. 2 illustrates the relationship between ATR-FTIR spectroscopy and other similar techniques, all of which are collectively known as vibrational spectroscopy. Theoretically, an ATR-FTIR spectrum results from the interaction between IR light that penetrates a thin surface layer of the sample and the chemical composition of the sample [6]

General concepts/definition

Informally, chemometrics refers to the group of tools or algorithms applied to process multivariate data acquired on chemical properties of samples via various analytical instruments [44], [45]. Spectroscopy and chromatography are two common analytical techniques used to characterize materials, and both present their output data in a high-dimensional space. High-dimensional data has always made it challenging for applied scientists to achieve their goal of analysis. Over the past two decades, various

Status quo

Over the past few decades, the number of published articles on ATR-FTIR spectroscopy coupled with chemometrics, and related literature, has grown at an exponential rate. However, DP, which constitutes the first step in the chemometric analysis pipeline, does not seem to have been given considerable attention. In this section, the status quo of contemporary DP practice strategy is described based on selected works published since 2012. The respective literature summarized here is not exhaustive, but

Knowledge gaps

Two knowledge gaps from two different perspectives have been identified based on the discussion in the preceding section.

From the perspective of the ATR-FTIR spectroscopy user, DP has never been considered important. Table 1 shows that not all ATR-FTIR spectroscopists habitually pre-processed their data beforehand, especially those who attempted qualitative analysis, e.g. spectral inspection. The few available published reviews devoted to DP methods always use NIR [14], [43] or Raman [32], [136]

Conclusions

We have discussed the contemporary DP practice strategy, based on works using ATR-FTIR spectra as input data. Some of the malpractices and good practices have been critically discussed, and the rationales that have been nurturing the unhealthy practices have also been presented. In conclusion, the contemporary DP practice strategy is under-developed and needs more contributions from various application fields to provide important insights towards achieving an established DP practice

Acknowledgements

The authors thank the UKM and the Malaysian Ministry of Higher Education for funding this work [grant no. FRGS/2/2013/ST06/UKM/02/1]. The collection of all ATR-FTIR spectra was fully funded by the Forensic Laboratory PDRM, Cheras, Malaysia. The authors would also like to offer special thanks to Wan Nur Syazwani Wan Mohamad Fuad for her editorial support.

References (136)

  • J. Zhang et al.

    Characterization of post-mortem biochemical changes in rabbit plasma using ATR-FTIR combined with chemometrics: a preliminary study

    Spectrochim. Acta A Mol. Biomol. Spectrosc.

    (2017)
  • N. Cebi et al.

    An evaluation of Fourier transforms infrared spectroscopy method for the classification and discrimination of bovine, porcine and fish gelatins

    Food Chem.

    (2016)
  • J. Fahrenfort

    Attenuated total reflection: a new principle for the prediction of useful infrared reflection spectra of organic compounds

    Spectrochim. Acta

    (1961)
  • M.J. Anzanello et al.

    Selecting relevant Fourier transform infrared spectroscopy wavenumbers for clustering authentic and counterfeit drug samples

    Sci. Justice

    (2014)
  • A.M. Sila et al.

    Evaluating the utility of mid-infrared spectral subspaces for predicting soil properties

    Chemom. Intell. Lab. Syst.

    (2016)
  • W. Dirwono et al.

    Application of micro-attenuated total reflectance FTIR spectroscopy in the forensic study of questioned document involving red seal inks

    Forensic Sci. Int.

    (2010)
  • A. Kher et al.

    Forensic classification of ballpoint pen inks using high performance liquid chromatography and infrared spectroscopy with PCA and LDA

    Vib. Spectrosc.

    (2006)
  • C.S. Silva et al.

    Classification of blue pen ink using infrared spectroscopy and linear discriminant analysis

    Microchem. J.

    (2013)
  • K.H. Liland

    Multivariate methods in metabolomics - from pre-processing to dimension reduction and statistical analysis

    TrAC Trends Anal. Chem.

    (2011)
  • M. Blanco et al.

    Orthogonal signal correction in near infrared calibration

    Anal. Chim. Acta

    (2001)
  • K. Jetter et al.

    Principles and applications of wavelet transformation to chemometrics

    Anal. Chim. Acta

    (2000)
  • V.D. Hoang

    Wavelet-based spectral analysis

    TrAC Trends Anal. Chem.

    (2014)
  • J.-H. Cheng et al.

    Combining the genetic algorithm and successive projection algorithm for the selection of feature wavelengths to evaluate exudative characteristics in frozen-thawed fish muscle

    Food Chem.

    (2016)
  • S.Y. Song et al.

    Sugar and acid content of Citrus prediction modelling using FT-IR fingerprinting in combination with multivariate statistical analysis

    Food Chem.

    (2016)
  • F.J. Warren et al.

    Infrared spectroscopy as a tool to characterize starch ordered structure-a joint FTIR-ATR, NMR, XRD and DSC study

    Carb. Polym.

    (2016)
  • C.L. Pickering et al.

    Rapid discrimination of maggots utilising ATR-FTIR spectroscopy

    Forensic Sci. Int.

    (2015)
  • M.C.A. Marcelo et al.

    Profiling cocaine by ATR-FTIR

    Forensic Sci. Int.

    (2015)
  • C.-M. Orphanou

    The detection and discrimination of human body fluids using ATR-FTIR spectroscopy

    Forensic Sci. Int.

    (2015)
  • M.J. Anzanello et al.

    HATR-FTIR wavenumber selection for predicting biodiesel/diesel blends flash point

    Chemom. Intell. Lab. Syst.

    (2015)
  • D. Custers et al.

    ATR-FTIR spectroscopy and chemometrics: an interesting tool to discriminate and characterize counterfeit medicines

    J. Pharm. Biomed. Anal.

    (2015)
  • L. Dong et al.

    Evaluation of FTIR spectroscopy as diagnostic tool for colorectal using spectral analysis

    Spectrochim. Acta A Mol. Biomol. Spectrosc.

    (2014)
  • A.B. Snyder et al.

    Rapid authentication of concord juice concentration in a grape juice blend using Fourier-transform infrared spectroscopy and chemometric analysis

    Food Chem.

    (2014)
  • M. Bassbasi et al.

    FTIR-ATR determination of solid non fat (SNF) in raw milk using PLS and SVM chemometric methods

    Food Chem.

    (2014)
  • E. Staniszewska et al.

    Rapid approach to analyse biochemical variation in rat organs by ATR-FTIR spectroscopy

    Spectrochim. Acta A Mol. Biomol. Spectrosc.

    (2014)
  • Y. Ge et al.

    Mid-Infrared attenuated total reflectance spectroscopy for soil carbon and particle size determination

    Geoderma

    (2014)
  • G.A. de Oliveira et al.

    Comparison of NIR and MIR spectroscopic methods for determination of individual sugars, organic acids and carotenoids in passion fruit

    Food Res. Int.

    (2014)
  • M. Khanmohammadi et al.

    Quality based classification of gasoline samples by ATR-FTIR spectrometry using spectral feature selection with quadratic discriminant analysis

    Fuel

    (2013)
  • A.M. Gomez-Caravaca et al.

    Fourier transform infrared spectroscopy-Partial Least Squares (FTIR-PLS) coupled procedure application for the evaluation of fly attach on olive oil quality

    LWT-Food Sci. Technol.

    (2013)
  • J. Haggarty et al.

    Recent advances in liquid and gas chromatography methodology for extending coverage of the metabolome

    Curr. Opin. Biotechnol.

    (2016)
  • R. Karoui et al.

    Mid-infrared spectroscopy coupled with chemometrics: a tool for the analysis of intact food systems and the exploration of their molecular structure-quality relationship-a review

    Chem. Rev.

    (2010)
  • U.P. Fringeli

    ATR and reflectance IR spectroscopy, applications

  • C.K. Muro et al.

    Vibrational spectroscopy: recent developments to revolutionize forensic science

    Anal. Chem.

    (2015)
  • C.A. Nunes

    Vibrational spectroscopy and chemometrics to assess authenticity, adulteration and intrinsic quality parameters of edible oils and fats

    Food Res. Int.

    (2013)
  • R.G. Brereton

    Chemometrics for Pattern Recognition

    (2009)
  • J. Trygg et al.

    Background estimation, denoising and preprocessing

  • S. Bijlsma et al.

    Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation

    Anal. Chem.

    (2006)
  • P. Geladi et al.

    An overview of chemometrics application in near infrared spectrometry

    J. Infrared Spec.

    (1995)
  • M.J. Baker et al.

    Using Fourier transform IR spectroscopy to analyze biological materials

    Nat. Protoc.

    (2014)