Chromatographic fingerprinting by comprehensive two-dimensional chromatography: Fundamentals and tools

https://doi.org/10.1016/j.trac.2020.116133

Highlights

  • GC × GC(-MS) for effective chromatographic fingerprinting.

  • Different types of features enable effective and comprehensive fingerprinting.

  • Chromatographic fingerprinting gives access to a higher level of information.

  • Machine learning enables full exploitation of fingerprinting results.

  • Chromatographic fingerprinting by GC × GC-MS has intrinsic profiling potential.

Abstract

This contribution reviews state-of-the-art approaches for chromatographic fingerprinting of 2D peak patterns. The concepts of a sample's fingerprint and profile, as established in metabolomics, are translated to comprehensive two-dimensional chromatography (C2DC) separations, embracing the principles of biometric fingerprinting.

Approaches founded on this principle, referred to as chromatographic fingerprinting, are described and discussed in terms of their potential and limitations for providing a higher level of information about sample composition. The different types of features (i.e., datapoint, region, peak, and peak-region) are discussed, and insights on processing tools and advances in the development of new algorithms are provided. Selected examples cover the most relevant application fields of GC × GC. Challenging scenarios with severe chromatographic misalignment, parallel detection, and translation of methods from thermal to differential-flow modulated GC × GC are also considered for their relevance in specific applications. Machine learning/chemometrics tools are briefly introduced, highlighting their fundamental role in supporting fingerprinting workflows.

Introduction

The terms ‘profiling’ and ‘fingerprinting’ have been adopted in metabolomics [1,2] to refer to distinct analytical approaches capable of informing about compositional differences between samples. For profiling, analytical platforms are set to provide detailed information (retention, mass spectrum, detector response, etc.) on qualitative and/or quantitative distributions of samples' components. Profiling can be conducted on a targeted basis [3] if analytes of interest are defined a priori and monitored across all samples. However, if the analytical process is capable of generating individual yet distinctive features for all components, the process can be conceptually extended toward a comprehensive evaluation of all detected constituents and referred to as “untargeted profiling” [4,5]. Fingerprinting, as defined by Fiehn [2], is a high-throughput process capable of unravelling compositional differences between samples, not necessarily achieving accurate quantitative data or compound identifications for all individual constituents. A fingerprint provides a comprehensive set of features ideally corresponding to all chemical constituents and aims to extract the non-evident chemical information included in the whole signal acquired from an analytical instrumental technique. This information mining process is carried out by applying the statistical and mathematical tools of multivariate chemometric analysis. Note that chemometrics does not work magic. The information to be mined must already be embedded in the analytical signal, even if it is hidden from an observer, and the analytical methods used to obtain that signal must be specifically designed and optimized with this crucial fact in mind.

Fingerprinting methodology can be effectively performed by different approaches:

  1. The fingerprint is directly obtained from the sample in its natural state without any pre-treatment except, if applicable, dissolution.

  2. The fingerprint is recorded from a particular fraction or family of compounds after a separation or fractionation step. Thus, the fingerprints would be specific to a compound family (e.g., the volatile organic compounds).

  3. The fingerprint is obtained after a chemical reaction step (e.g., derivatization), so that there is an alteration of the initial chemical composition of the sample and new compounds are produced (e.g., the fatty acid methyl esters).

A sample's fingerprint can be considered as a totally unspecific signal when the first approach is applied and a partially specific signal when the second and third approaches are employed.

In this sense, signals from spectroscopic techniques fit this definition well; nuclear magnetic resonance (NMR), chromatography, mass spectrometry (MS), and Fourier transform infrared (FT-IR) spectroscopy are in fact the most popular fingerprinting methods in metabolomics [6]. Fingerprinting and related concepts have been extended to other fields, e.g., foodomics [7], sensomics [8,9], nutrimetabolomics [10], and petroleomics [11]. With the rapid evolution of analytical techniques, more stable and informative multidimensional platforms are now readily available, offering further possibilities to develop the concept of fingerprinting.

Regarding the analytical signals recorded by each analytical technique, there is a proper nomenclature for the different working data [6], based on the instrumental signals obtained with different measuring setups, namely with different detection systems. In order to fully understand and extract the relevant information of a signal, it is important to define and clarify the meaning of the terms usually employed during data treatment: dimension, way, order, vector, matrix, cube, tensor and array. The terms dimension, way and order refer to the type of signal acquired by the analytical instrument. Each analytical signal is described by a main dimension or way that is related to the signal intensity and one or more complementary dimensions or ways that characterize the position of each intensity value within the signal. The number of complementary dimensions defines the data order. A conventional chromatogram (e.g., 1D GC-FID) is an instance of a two-way signal (retention times and detector intensities) and constitutes first-order data. Note that in the particular case of signals defined by two chromatographic dimensions with two retention times, the term 2D chromatogram is applied, which in turn is a three-way signal. The terms vector, matrix and cube are usually employed to name the mathematical layout in which the working data are arranged once the acquired signal is exported from the instrument. For example, a vector holds first-order data (two-way signal), a matrix holds second-order data (three-way signal), and a cube holds third-order data. The term tensor is used to name all of these collectively. Finally, the term array should refer to a structure consisting of a set of tensors that includes the working data from a group of samples. Every array has one additional dimension, i.e., the ordinal number of each sample, relative to the dimensionality of each sample's data.
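The nomenclature above can be made concrete with a short sketch. It assumes NumPy arrays as the mathematical layout; the array sizes and the helper names `data_order` and `signal_ways` are hypothetical and chosen only for illustration, not taken from the cited tutorial [6].

```python
import numpy as np


def data_order(tensor: np.ndarray) -> int:
    """Data order = number of complementary (position) dimensions."""
    return tensor.ndim


def signal_ways(tensor: np.ndarray) -> int:
    """Ways = complementary dimensions plus the intensity dimension."""
    return tensor.ndim + 1


# Hypothetical sizes, for illustration only.
chrom_1d = np.zeros(600)                # vector: intensity vs. one retention time
chrom_2d = np.zeros((100, 200))         # matrix: intensity vs. two retention times
chrom_2d_ms = np.zeros((100, 200, 50))  # cube: two retention times plus m/z

for name, tensor in [("1D chromatogram", chrom_1d),
                     ("2D chromatogram", chrom_2d),
                     ("2D chromatogram with MS", chrom_2d_ms)]:
    print(f"{name}: order={data_order(tensor)}, ways={signal_ways(tensor)}")

# An array for a group of samples adds one dimension: the sample index.
samples = [np.zeros((100, 200)) for _ in range(5)]
sample_array = np.stack(samples)        # shape (5, 100, 200)
print("sample array shape:", sample_array.shape)
```

Under this convention the array's number of dimensions is simply the order of one sample's data plus one.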

Usually the raw chromatographic signal exported from the instrument consists of several thousand intensity values and could be used as a whole to apply fingerprinting. However, the number of elements may be reduced by applying mathematical methods (e.g., resampling) or scientific-technical operations (e.g., computing peak areas). This strategy is typical of profiling. The reduction of the number of elements may reduce the dimensionality, e.g., obtaining a peak-response vector (first-order data) from a 2D chromatogram (second-order data), although this is not always applicable. A tutorial on analytical chromatographic fingerprinting is provided by Cuadros et al. [6].
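As a minimal sketch of the two reduction routes just mentioned, the following NumPy code resamples a 2D chromatogram by block averaging and, separately, collapses it into a response vector by summing intensities inside rectangular regions. The function names, the block-averaging scheme, and the use of rectangular region sums as a stand-in for integrated peak responses are assumptions made for illustration.

```python
import numpy as np


def resample_mean(chrom_2d, factor_1d, factor_2d):
    """Downsample by averaging non-overlapping blocks (order is unchanged)."""
    n1, n2 = chrom_2d.shape
    n1_trim = n1 - n1 % factor_1d   # drop points that do not fill a block
    n2_trim = n2 - n2 % factor_2d
    trimmed = chrom_2d[:n1_trim, :n2_trim]
    blocks = trimmed.reshape(n1_trim // factor_1d, factor_1d,
                             n2_trim // factor_2d, factor_2d)
    return blocks.mean(axis=(1, 3))


def region_responses(chrom_2d, regions):
    """Sum intensities inside (row0, row1, col0, col1) rectangles.

    The result is a vector (first-order data) derived from a matrix
    (second-order data), i.e. the dimensionality is reduced.
    """
    return np.array([chrom_2d[r0:r1, c0:c1].sum()
                     for (r0, r1, c0, c1) in regions])


chrom = np.arange(24, dtype=float).reshape(4, 6)   # toy 2D chromatogram
smaller = resample_mean(chrom, 2, 3)               # still a matrix, fewer elements
responses = region_responses(chrom, [(0, 2, 0, 3), (2, 4, 3, 6)])
print(smaller.shape, responses)
```

Resampling keeps the data second-order while shrinking it, whereas the region sums change the order, which mirrors the distinction drawn in the paragraph above.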

Most multidimensional analytical (MDA) platforms provide physico-chemical discrimination of a sample's constituents by chromatographic processes, e.g., gas chromatography (GC) and liquid chromatography (LC), accompanied by spectroscopic processes, e.g., MS, to achieve suitable specificity and selectivity, thereby expanding discrimination potential. When chromatography is conducted by comprehensively coupling two separation dimensions, as in the case of comprehensive two-dimensional chromatography (C2DC), the analytical output requires suitable processing to enable data visualization and interpretation.

In particular, in C2DC (e.g., GC × GC, LC × LC, or SFC × SFC), two columns are serially connected and components eluting from the first-dimension (1D) column are periodically trapped and re-injected on-line into a second-dimension (2D) column. In GC × GC, this operation is governed by a modulator, e.g., a thermal or valve-based focusing interface with a brief modulation period (PM), typically between 0.5 and 8 s. The detector, connected to the end of the 2D column, produces sequential data values that vary as a function of the quality/identity and amount of eluting analytes. An analog-to-digital (A/D) converter collects the signal output at a certain frequency and in sequential order. Two-dimensional chromatogram visualization is therefore rendered by arranging the data values from a single modulation period (or cycle) as a column of pixels (picture elements), where each pixel corresponds to a single detector event. This process is known as rasterization. Pixel columns are sequenced along the abscissa (X-axis, left-to-right) according to 1D separation time, and 2D data are presented in a right-handed Cartesian coordinate system, where the ordinate (Y-axis, bottom-to-top) corresponds to the 2D separation elapsed time [12].
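Rasterization as described above amounts to folding the sequential detector trace into a matrix with one column per modulation cycle. The sketch below assumes that the product of sampling frequency and modulation period is a whole number of data points and that any incomplete final cycle is discarded; the function name `rasterize` and its parameters are hypothetical.

```python
import numpy as np


def rasterize(signal, sampling_rate_hz, modulation_period_s):
    """Fold a sequential detector trace into a 2D image.

    Axis 1 (columns) indexes modulation cycles, i.e. first-dimension time.
    Axis 0 (rows) indexes points within a cycle, i.e. second-dimension time.
    """
    points_per_cycle = int(round(sampling_rate_hz * modulation_period_s))
    n_cycles = len(signal) // points_per_cycle
    # Discard an incomplete final cycle, if any (an assumption of this sketch).
    trimmed = np.asarray(signal[: n_cycles * points_per_cycle])
    # Each modulation cycle becomes one column of pixels.
    return trimmed.reshape(n_cycles, points_per_cycle).T


# Toy trace: 2 Hz sampling and a 3 s modulation period give 6 points per cycle.
trace = np.arange(12)
image = rasterize(trace, sampling_rate_hz=2, modulation_period_s=3)
print(image.shape)   # (points per cycle, number of cycles)
print(image[:, 0])   # first modulation cycle as one pixel column
```

Row 0 of the returned matrix holds the earliest second-dimension time, so a plotting routine would need to place the origin at the bottom (for example `origin="lower"` in Matplotlib's `imshow`) to obtain the bottom-to-top ordinate described in the text.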

2D peak patterns generated by C2DC can be treated as a sample's unique fingerprint, with detected compounds providing minutiae features to be used for effective cross-comparative analysis. The term minutiae derives from fingerprint recognition technology, exploited in forensic applications, where the term corresponds to ridge endings and ridge bifurcations on fingertips. Automatic biometric fingerprint verification systems localize and extract a set of minutiae from inked impressions, or detailed images of human fingertips, for cross-matching with stored templates [13].

By translating the concept of biometric fingerprinting into C2DC, any process that detects, re-aligns, and compares minutiae features extracted from 2D peak patterns across a series of 2D chromatograms can be classified as fingerprinting. Moreover, because, at the processing level, the 2D chromatographic fingerprint “contains unspecific and non-evident information which should be extracted by chemometric tools” [6], such an approach can be deemed “chromatographic fingerprinting”. This is in keeping with established views that chromatographic fingerprints refer “to the entire chromatogram from a certain test material which is distinctive of its composition” and that “chromatograms provide a specific and differentiating tool, as an identity card, which could be used in order to ‘identitate’ or identify a certain material” [6]. Fig. 1 illustrates how chromatographic signals can be processed according to fingerprinting or profiling principles to achieve a high level of information. The types of features available will be introduced in Section 3.

In this review, by following this conceptual track, data processing approaches and workflows that comply with the above-mentioned definition are presented, illustrated by selected applications, and critically discussed in view of their capabilities to provide higher levels of information. If 2D chromatographic signals [6], together with all their metadata, informing about components' identity and physico-chemical characteristics (retention times, detector response, spectral signatures, etc.), are subjected to chromatographic fingerprinting, the overall process achieves a truly comprehensive meaning.

Section snippets

Analytical platforms, dimensions of information available and fingerprinting specificity

To maximize the information achievable by 2D chromatographic fingerprinting, the analytical platform must be appropriately configured, and sample preparation, the zeroth dimension of the system [14], should be tuned to avoid biases that compromise the meaning of the investigation. Moreover, as stated by Fiehn [2], to access hidden information in metabolomics, fingerprinting should take into consideration that the resolution of the analytical devices must be high enough to handle critical information.

Data processing principles and tools

Here, our discussion of data processing focuses on feature extraction for pattern recognition (PR), but these data-analysis steps may require preprocessing such as rasterization, modulation-phase adjustment, baseline correction, retention-time alignment, and peak detection. Some recent developments in these areas are discussed here as they relate to feature extraction and analysis, but several reviews discuss methodologies in these areas more comprehensively [12,[40], [41], [42], [43], [44]

Chromatographic fingerprinting with visual images and datapoint features

Comparative visualization is a chromatographic fingerprinting approach that provides prompt and intuitive evidence of compositional differences between sample pairs. It can be classified among the datapoint-feature approaches, since chromatogram pairs are compared pixel-by-pixel with or without pattern re-alignment or transformation. It has been applied to reveal differences in petrochemical applications [93–95], food [80,96–99], body fluids metabolites composition [

Challenging scenarios

Chromatographic fingerprinting faces several challenges when severe misalignment occurs between the chromatograms of a set. As previously discussed, in template-matching fingerprinting, retention-time variations can be compensated for by applying suitable transformations (see Section 3.4). However, severe misalignment might need analyst supervision in setting critical processing parameters. Stilo et al. [141] tackled pattern misalignment and detection inconsistencies, such as those occurring in

Machine learning for effective data exploration

Generally, PR with C2DC has relied on established methods rather than on newly developed ones. A fundamental division of PR is between supervised and unsupervised problems. For supervised PR, a training set of feature vectors with class labels (e.g., healthy or unhealthy) is provided; the training set is then used to develop one or more methods to discern differences between classes. For unsupervised PR, methods must discern both natural groupings/clusters and differences between those clusters.
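The supervised/unsupervised division can be illustrated with two deliberately simple methods applied to the same toy feature vectors: a nearest-centroid classifier, which uses the class labels, and a basic k-means clustering, which does not. Both methods, the feature values, and the function names are illustrative assumptions and are not the specific tools discussed in the review.

```python
import numpy as np


def fit_nearest_centroid(features, labels):
    """Supervised: learn one mean feature vector per labelled class."""
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(classes, centroids, features):
    """Assign each sample to the class with the closest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def kmeans(features, k, n_iter=50, seed=0):
    """Unsupervised: find k groupings without using any labels."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            features[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign, centers


# Toy feature vectors (e.g., two peak-region responses per sample).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array(["healthy"] * 3 + ["unhealthy"] * 3)

classes, centroids = fit_nearest_centroid(X, y)
predicted = predict_nearest_centroid(classes, centroids,
                                     np.array([[0.05, 0.05], [5.0, 5.1]]))
clusters, _ = kmeans(X, k=2)
print(predicted, clusters)
```

The classifier returns class names because it was trained with them, while k-means returns only arbitrary cluster indices whose chemical meaning must be interpreted afterwards.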

Concluding remarks

Chromatographic fingerprinting by C2DC is undoubtedly a profitable strategy for cross-comparative analysis of large sets of samples, with almost comprehensive coverage of their constituent components. Dedicated processing of the instrumental fingerprint is necessary to extract meaningful high-level information from different types of features, while tackling issues related to retention-time misalignment and MS detection inconsistencies.

Multidimensional analytical platforms combining

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Prof. Stephen E. Reichenbach has financial interests in GC Image, LLC. Dr. Federico Stilo, Dr. Ana M. Jimenez-Carvelo, Prof. Luis Cuadros-Rodriguez, Prof. Carlo Bicchi and Prof. Chiara Cordero declare no conflict of interest.

References (165)

  • H. Van De Weghe et al.

    Application of comprehensive two-dimensional gas chromatography for the assessment of oil contaminated soils

    J. Chromatogr., A

    (2006)
  • G. Semard et al.

    Comparative study of differential flow and cryogenic modulators comprehensive two-dimensional gas chromatography systems for the detailed analysis of light cycle oil

    J. Chromatogr., A

    (2011)
  • L. Nicolotti et al.

    Parallel dual secondary column-dual detection: a further way of enhancing the informative potential of two-dimensional comprehensive gas chromatography

    J. Chromatogr., A

    (2014)
  • J. Mommers et al.

    A procedure for comprehensive two-dimensional gas chromatography retention time locked dual detection

    J. Chromatogr., A

    (2016)
  • J.-J. Filippi et al.

    Qualitative and quantitative analysis of vetiver essential oils by comprehensive two-dimensional gas chromatography and comprehensive two-dimensional gas chromatography/mass spectrometry

    J. Chromatogr., A

    (2013)
  • C.E. Freye et al.

    Enhancing the chemical selectivity in discovery-based analysis with tandem ionization time-of-flight mass spectrometry detection for comprehensive two-dimensional gas chromatography

    J. Chromatogr., A

    (2018)
  • C. Cordero et al.

    Comprehensive two-dimensional gas chromatography coupled with time of flight mass spectrometry featuring tandem ionization: challenges and opportunities for accurate fingerprinting studies

    J. Chromatogr., A

    (2019)
  • C. Kulsing et al.

    Concepts, selectivity options and experimental design approaches in multidimensional and comprehensive two-dimensional gas chromatography

    TrAC - Trends Anal. Chem.

    (2020)
  • J.M. Cevallos-Cevallos et al.

    Metabolomic analysis in food science: a review

    Trends Food Sci. Technol.

    (2009)
  • J.T.V. Matos et al.

    Trends in data processing of comprehensive two-dimensional chromatography: state of the art

    J. Chromatogr. B Anal. Technol. Biomed. Life Sci.

    (2012)
  • K.M. Pierce et al.

    Review of chemometric analysis techniques for comprehensive two dimensional separations data

    J. Chromatogr., A

    (2012)
  • B.C. Reaser et al.

    Management and interpretation of capillary chromatography-mass spectrometry data

  • S.E. Reichenbach et al.

    Features for non-targeted cross-sample analysis with comprehensive two-dimensional chromatography

    J. Chromatogr., A

    (2012)
  • K.J. Johnson et al.

    Pattern recognition of jet fuels: comprehensive GC × GC with ANOVA-based feature selection and principal component analysis

    Chemometr. Intell. Lab. Syst.

    (2002)
  • K.M. Pierce et al.

    A principal component analysis based method to discover chemical differences in comprehensive two-dimensional gas chromatography with time-of-flight mass spectrometry (GC × GC-TOFMS) separations of metabolites in plant samples

    Talanta

    (2006)
  • K.M. Pierce et al.

    Pixel-level data analysis methods for comprehensive two-dimensional chromatography

    Data Handl. Sci. Technol.

    (2015)
  • L.C. Marney et al.

    Tile-based Fisher-ratio software for improved feature selection analysis of comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry data

    Talanta

    (2013)
  • S.T. Chin et al.

    Review of the role and methodology of high resolution approaches in aroma analysis

    Anal. Chim. Acta

    (2015)
  • P.Q. Tranchida et al.

    Current state of comprehensive two-dimensional gas chromatography-mass spectrometry with focus on processes of ionization

    TrAC - Trends Anal. Chem.

    (2018)
  • K.L. Berrier et al.

    Advanced data handling in comprehensive two-dimensional gas chromatography

  • M.F. Almstetter et al.

    Comparison of two algorithmic data processing strategies for metabolic fingerprinting by comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry

    J. Chromatogr., A

    (2011)
  • T.F. Smith et al.

    Identification of common molecular subsequences

    J. Mol. Biol.

    (1981)
  • H.D. Bean et al.

    Improving the quality of biomarker candidates in untargeted metabolomics via peak table-based alignment of comprehensive two-dimensional gas chromatography-mass spectrometry data

    J. Chromatogr., A

    (2015)
  • B. Egert et al.

    A peaklet-based generic strategy for the untargeted analysis of comprehensive two-dimensional gas chromatography mass spectrometry data sets

    J. Chromatogr., A

    (2015)
  • A. Barcaru et al.

    Bayesian peak tracking: a novel probabilistic approach to match GCxGC chromatograms

    Anal. Chim. Acta

    (2016)
  • I.A. Titaley et al.

    Automating data analysis for two-dimensional gas chromatography/time-of-flight mass spectrometry non-targeted analysis of comparative samples

    J. Chromatogr., A

    (2018)
  • S.E. Reichenbach et al.

    Informatics for cross-sample analysis with comprehensive two-dimensional gas chromatography and high-resolution mass spectrometry (GCxGC-HRMS)

    Talanta

    (2011)
  • H.G. Schmarr et al.

    Two-dimensional gas chromatographic profiling as a tool for a rapid screening of the changes in volatile composition occurring due to microoxygenation of red wines

    Anal. Chim. Acta

    (2010)
  • H.-G. Schmarr et al.

    Profiling analysis of volatile compounds from fruits using comprehensive two-dimensional gas chromatography and image processing techniques

    J. Chromatogr., A

    (2010)
  • M. Ni et al.

    Peak pattern variations related to comprehensive two-dimensional gas chromatography acquisition

    J. Chromatogr., A

    (2005)
  • K.M. Pierce et al.

    Classification of gasoline data obtained by gas chromatography using a piecewise alignment algorithm combined with feature selection and principal component analysis

    J. Chromatogr., A

    (2005)
  • W.P.H. De Boer et al.

    Two-dimensional semi-parametric alignment of chromatograms

    J. Chromatogr., A

    (2014)
  • Y. Zushi et al.

    Pixel-by-pixel correction of retention time shifts in chromatograms from comprehensive two-dimensional gas chromatography coupled to high resolution time-of-flight mass spectrometry

    J. Chromatogr., A

    (2017)
  • C. Couprie et al.

    BARCHAN: blob alignment for robust CHromatographic ANalysis

    J. Chromatogr., A

    (2017)
  • K.D. Nizio et al.

    Comprehensive multidimensional separations for the analysis of petroleum

    J. Chromatogr., A

    (2012)
  • F. Hilaire et al.

    Comprehensive two-dimensional gas chromatography for biogas and biomethane analysis

    J. Chromatogr., A

    (2017)
  • C. Cordero et al.

    Profiling food volatiles by comprehensive two-dimensional gas chromatography coupled with mass spectrometry: advanced fingerprinting approaches for comparative analysis of the volatile fraction of roasted hazelnuts (Corylus avellana L.) from different ori

    J. Chromatogr., A

    (2010)
  • G. Purcaro et al.

    Toward a definition of blueprint of virgin olive oil by comprehensive two-dimensional gas chromatography

    J. Chromatogr., A

    (2014)
  • D. Bressanello et al.

    Urinary metabolic fingerprinting of mice with diet-induced metabolic derangements by parallel dual secondary column-dual detection two-dimensional comprehensive gas chromatography

    J. Chromatogr., A

    (2014)