A contemporary review on Data Preprocessing (DP) practice strategy in ATR-FTIR spectrum
Introduction
Over the past two decades, technology has evolved rapidly and has contributed to the production of high-dimensional (HD) data in various disciplines [1], [2], [3], [4], [5]. Technological advances have made it feasible to collect analytical data from a tiny sample within a short period of time [6], [7], [8], [9]. Nonetheless, these advances are a double-edged sword: cutting-edge analytical instruments tend to produce data that cannot be readily analyzed and interpreted to achieve the goal of the analysis [10], [11]. Data preprocessing (DP), also known as data pre-treatment, is used to remove or reduce unwanted signals in the HD data prior to modeling. As such, the DP step comes right after data collection or acquisition in the chemometric pipeline for analytical data. An improper selection of DP methods may negatively affect model accuracy and interpretability [12], [13]. The vital role of DP methods has been discussed in numerous books and other references in the literature [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22].
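To make the idea of removing unwanted signal before modeling concrete, the following is a minimal sketch of one widely used DP method, standard normal variate (SNV) scaling. It is an illustration only, not a method prescribed by this article: the function name snv and the absorbance values are hypothetical, and the second spectrum is simply the first one with an artificial offset and scaling applied.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale ONE spectrum
    by its own mean and sample standard deviation."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(a - mu) / sigma for a in spectrum]


# Hypothetical absorbance values (not measured data).
base = [0.10, 0.12, 0.30, 0.55, 0.28, 0.11]

# Same band shape, but with an artificial scaling and baseline offset,
# mimicking an unwanted effect that is unrelated to chemistry.
shifted = [2.0 * a + 0.05 for a in base]

snv_base = snv(base)
snv_shifted = snv(shifted)

# After SNV the two spectra coincide up to floating-point error.
max_gap = max(abs(x - y) for x, y in zip(snv_base, snv_shifted))
print(f"largest difference after SNV: {max_gap:.2e}")
```

Because SNV uses only the statistics of the spectrum it is applied to, it can be run directly after acquisition and before any model is fitted.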
Vibrational spectroscopy instruments, including Raman, NIR and MIR spectroscopy, have been coupled with chemometric algorithms to accomplish different analytical tasks [5], [8], [9], [15], [16], [17]. Recently, ATR-FTIR spectroscopy has been preferred over transmission FTIR spectroscopy in diverse fields of application [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31]. This preference is credited to its non-destructiveness, ease of application, relatively low analysis cost and rapid analysis time [6]. Consequently, many papers have been published in diverse application fields with the aim to “develop methods to classify, differentiate or identify particular samples by using ATR-FTIR spectra combined with chemometrics” [23], [24], [25], [26], [27], [28], [29], [30], [31]. However, most of these papers have not devoted considerable effort to systematically selecting and assessing DP methods prior to modeling. The importance of proper selection of DP methods has been overlooked: users tend simply to follow conventional choices of DP methods or to shortlist a few DP methods intuitively. We discuss this matter further in the following section.
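As a contrast to intuitive shortlisting, the sketch below shows one possible way to assess candidate DP methods systematically: each candidate is scored by leave-one-out cross-validation with the same simple classifier. Everything here is an assumption made for illustration, including the toy spectra, the three candidate methods, the nearest-centroid classifier and all function names; none of it is taken from the reviewed papers.

```python
import math
from statistics import mean, stdev


def identity(s):
    """No preprocessing (baseline candidate)."""
    return list(s)


def snv(s):
    """Standard normal variate scaling of one spectrum."""
    mu, sigma = mean(s), stdev(s)
    return [(a - mu) / sigma for a in s]


def first_difference(s):
    """Crude first derivative: difference of neighbouring points."""
    return [b - a for a, b in zip(s, s[1:])]


def centroid(rows):
    """Mean spectrum of a list of equally long spectra."""
    return [mean(col) for col in zip(*rows)]


def loo_accuracy(spectra, labels, preprocess):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    The candidates above act on each spectrum independently, so they can
    be applied before the loop. A method fitted across samples (e.g. mean
    centring) would have to be fitted inside the loop to avoid leakage.
    """
    X = [preprocess(s) for s in spectra]
    hits = 0
    for i, (x, y) in enumerate(zip(X, labels)):
        groups = {}
        for j, (xj, yj) in enumerate(zip(X, labels)):
            if j != i:  # hold out sample i
                groups.setdefault(yj, []).append(xj)
        predicted = min(groups, key=lambda c: math.dist(x, centroid(groups[c])))
        hits += predicted == y
    return hits / len(X)


# Hypothetical toy data: two band shapes, each recorded with three
# different artificial (scale, offset) effects.
shape_a = [0.10, 0.20, 0.60, 0.20, 0.10]
shape_b = [0.10, 0.50, 0.30, 0.20, 0.10]
effects = [(1.0, 0.00), (1.8, 0.05), (0.6, 0.20)]
spectra = [[k * a + c for a in shape]
           for shape in (shape_a, shape_b) for k, c in effects]
labels = ["A"] * 3 + ["B"] * 3

candidates = {"none": identity, "SNV": snv, "first difference": first_difference}
scores = {name: loo_accuracy(spectra, labels, f) for name, f in candidates.items()}
for name, acc in scores.items():
    print(f"{name}: leave-one-out accuracy = {acc:.2f}")
```

The point of the design is that every candidate is judged by the same validation scheme and the same model, so differences in the score can be attributed to the DP choice rather than to habit or intuition.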
To date, a few reviews have addressed the impact of DP methods on HD data [10], [14], [15], but only one is devoted to DP evaluation tools [12]. To the best of our knowledge, no paper discusses DP practice strategy. Moreover, most reviews or tutorials on DP methods have used NIR [e.g. [14]] or Raman [e.g. [22], [32]] data to demonstrate practical aspects. Work that addresses the impact of DP methods using ATR-FTIR spectra as practical examples is hard to find. Part of the motivation for writing this article comes from the first author’s experience of applying chemometric tools to an ATR-FTIR spectrum-based problem in forensic science [33], during which comprehensive references on strategies for selecting DP methods could hardly be found. Thus, this work is the first review of a novel aspect of DP, i.e. DP practice strategy, using the ATR-FTIR spectrum as a practical example, based on selected papers published since 2012.
In the subsequent sections, the basic concepts of the two core subjects, i.e. ATR-FTIR spectroscopy and chemometrics, are briefly explained. Following that, the status quo of contemporary DP practice strategy is reviewed on the basis of selected articles published since 2012 and then summarized in a schematic flow chart. Finally, the rationales that could have supported such practice are discussed. For clarity, Fig. 1 summarizes the main ideas addressed in this article.
Typical characteristics of ATR-FTIR spectrum
ATR-FTIR spectroscopy is a powerful molecular spectroscopy technique, and its advantages have been described in several references [6], [34], [35], [36], [37], [38], [39]. Fig. 2 illustrates the relationship between ATR-FTIR spectroscopy and other similar techniques, all of which are collectively known as vibrational spectroscopy. Theoretically, an ATR-FTIR spectrum results from the interaction between IR light that penetrates a thin surface layer of the sample and the chemical composition of the sample [6].
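As a brief aside on how thin this surface layer is, the depth sampled by the evanescent wave is commonly estimated with the standard penetration-depth expression. The symbols are not defined in this article and are introduced here only for illustration: \lambda is the wavelength of the IR light, n_1 and n_2 are the refractive indices of the ATR crystal and the sample, and \theta is the angle of incidence, which is assumed to exceed the critical angle.

```latex
d_p = \frac{\lambda}{2\pi\, n_1 \sqrt{\sin^2\theta - \left(\dfrac{n_2}{n_1}\right)^{2}}}
```

Since d_p grows with \lambda, bands at lower wavenumbers are sampled more deeply, which is one reason the relative band intensities of ATR spectra can differ from those of transmission spectra.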
General concepts/definition
Informally, chemometrics refers to the group of tools or algorithms applied to process multivariate data acquired on the chemical properties of samples via various analytical instruments [44], [45]. Spectroscopy and chromatography are two common analytical techniques used to characterize materials, and they present their output data in high-dimensional space. High-dimensional data have always made it challenging for applied scientists to achieve the goal of analysis. Over the past two decades, various
Status quo
Over the past few decades, the number of articles on ATR-FTIR spectroscopy coupled with chemometrics has grown at an exponential rate. However, DP, which constitutes the first step in the chemometric analysis pipeline, does not seem to have been given considerable attention. In this section, the status quo of contemporary DP practice strategy is described based on selected works published since 2012. The literature summarized here is not exhaustive, but
Knowledge gaps
Two knowledge gaps, from two different perspectives, have been identified based on the discussion in the preceding section.
From the perspective of the ATR-FTIR spectroscopy user, DP has not been considered important. Table 1 shows that not all ATR-FTIR spectroscopists habitually pre-process their data beforehand, especially those attempting qualitative analysis, e.g. spectral inspection. The few published reviews devoted to DP methods always use NIR [14], [43] or Raman [32], [136]
Conclusions
We have discussed contemporary DP practice strategy based on works using ATR-FTIR spectra as input data. Some malpractices and good practices have been critically discussed, and the rationales that have nurtured the unhealthy practices have also been presented. In conclusion, contemporary DP practice strategy is under-developed and needs more contributions from various application fields to provide important insights towards achieving an established DP practice
Acknowledgements
The authors thank the UKM and the Malaysian Ministry of Higher Education for funding this work [grant no. FRGS/2/2013/ST06/UKM/02/1]. The collection of all ATR-FTIR spectra was fully funded by the Forensic Laboratory PDRM, Cheras, Malaysia. The authors would also like to offer special thanks to Wan Nur Syazwani Wan Mohamad Fuad for her editorial support.
References (136)
- et al.
Recent advances in capillary electrophoresis instrumentation for the analysis of explosives
TrAC-Trends Anal. Chem.
(2016)
IR and Raman spectroscopies, the study of art works
- et al.
Breaking with trends in pre-processing?
TrAC-Trends Anal. Chem.
(2013)
- et al.
Review of the most common pre-processing techniques for near-infrared spectra
TrAC-Trends Anal. Chem.
(2009)
Chemometrics in spectroscopy. Part 1. Classical chemometrics
Spectrochim. Acta B
(2003)
- et al.
Chemometrics in spectroscopy. Part 2. Examples
Spectrochim. Acta B
(2004)
- et al.
Chemometrics tools used in analytical chemistry: an overview
Talanta
(2014)
- et al.
Determination of chemical changes in heat-treated wood using ATR-FTIR and FT Raman spectrometry
Spectrochim. Acta A Mol. Biomol. Spectrosc.
(2017)
- et al.
Identification and classification of textile fibers using ATR-FT-IR spectroscopy with chemometric methods
Spectrochim. Acta A Mol. Biomol. Spectrosc.
(2017)
- et al.
PLS-LS-SVM based modelling of ATR-IR as a robust method in detection and qualification of alprazolam
Spectrochim. Acta A Mol. Biomol. Spectrosc.
(2017)
Characterization of post-mortem biochemical changes in rabbit plasma using ATR-FTIR combined with chemometrics: a preliminary study
Spectrochim. Acta A Mol. Biomol. Spectrosc.
An evaluation of Fourier transforms infrared spectroscopy method for the classification and discrimination of bovine, porcine and fish gelatins
Food Chem.
Attenuated total reflection: a new principle for the prediction of useful infrared reflection spectra of organic compounds
Spectrochim. Acta
Selecting relevant Fourier transform infrared spectroscopy wavenumbers for clustering authentic and counterfeit drug samples
Sci. Justice
Evaluating the utility of mid-infrared spectral subspaces for predicting soil properties
Chemom. Intell. Lab. Syst.
Application of micro-attenuated total reflectance FTIR spectroscopy in the forensic study of questioned document involving red seal inks
Forensic Sci. Int.
Forensic classification of ballpoint pen inks using high performance liquid chromatography and infrared spectroscopy with PCA and LDA
Vib. Spectrosc.
Classification of blue pen ink using infrared spectroscopy and linear discriminant analysis
Microchem. J.
Multivariate methods in metabolomics - from pre-processing to dimension reduction and statistical analysis
TrAC-Trends Anal. Chem.
Orthogonal signal correction in near infrared calibration
Anal. Chim. Acta
Principles and applications of wavelet transformation to chemometrics
Anal. Chim. Acta
Wavelet-based spectral analysis
TrAC-Trends Anal. Chem.
Combining the genetic algorithm and successive projection algorithm for the selection of feature wavelengths to evaluate exudative characteristics in frozen-thawed fish muscle
Food Chem.
Sugar and acid content of Citrus prediction modelling using FT-IR fingerprinting in combination with multivariate statistical analysis
Food Chem.
Infrared spectroscopy as a tool to characterize starch ordered structure-a joint FTIR-ATR, NMR, XRD and DSC study
Carbohydr. Polym.
Rapid discrimination of maggots utilising ATR-FTIR spectroscopy
Forensic Sci. Int.
Profiling cocaine by ATR-FTIR
Forensic Sci. Int.
The detection and discrimination of human body fluids using ATR-FTIR spectroscopy
Forensic Sci. Int.
HATR-FTIR wavenumber selection for predicting biodiesel/diesel blends flash point
Chemom. Intell. Lab. Syst.
ATR-FTIR spectroscopy and chemometrics: an interesting tool to discriminate and characterize counterfeit medicines
J. Pharm. Biomed. Anal.
Evaluation of FTIR spectroscopy as diagnostic tool for colorectal using spectral analysis
Spectrochim. Acta A Mol. Biomol. Spectrosc.
Rapid authentication of concord juice concentration in a grape juice blend using Fourier-transform infrared spectroscopy and chemometric analysis
Food Chem.
FTIR-ATR determination of solid non fat (SNF) in raw milk using PLS and SVM chemometric methods
Food Chem.
Rapid approach to analyse biochemical variation in rat organs by ATR-FTIR spectroscopy
Spectrochim. Acta A Mol. Biomol. Spectrosc.
Mid-Infrared attenuated total reflectance spectroscopy for soil carbon and particle size determination
Geoderma
Comparison of NIR and MIR spectroscopic methods for determination of individual sugars, organic acids and carotenoids in passion fruit
Food Res. Int.
Quality based classification of gasoline samples by ATR-FTIR spectrometry using spectral feature selection with quadratic discriminant analysis
Fuel
Fourier transform infrared spectroscopy-Partial Least Squares (FTIR-PLS) coupled procedure application for the evaluation of fly attack on olive oil quality
LWT-Food Sci. Technol.
Recent advances in liquid and gas chromatography methodology for extending coverage of the metabolome
Curr. Opin. Biotechnol.
Mid-infrared spectroscopy coupled with chemometrics: a tool for the analysis of intact food systems and the exploration of their molecular structure-quality relationship-a review
Chem. Rev.
ATR and reflectance IR spectroscopy, applications
Vibrational spectroscopy: recent developments to revolutionize forensic science
Anal. Chem.
Vibrational spectroscopy and chemometrics to assess authenticity, adulteration and intrinsic quality parameters of edible oils and fats
Food Res. Int.
Chemometrics for Pattern Recognition
Background estimation, denoising and preprocessing
Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation
Anal. Chem.
An overview of chemometrics application in near infrared spectrometry
J. Infrared Spec.
Using Fourier transform IR spectroscopy to analyze biological materials
Nat. Protoc.