(1) Overview

Introduction

Plant cell wall biomass is composed of a range of different types of carbon-based compounds [, , ]. We can use the relative proportion of these carbon components to understand species characteristics, such as litter decomposition []. Traditional methods for estimating carbon components, particularly lignocellulosic biomass, involve wet chemistry assays [] that use sulfuric acid and acetic anhydride, among other chemicals. These chemicals, however, can adversely impact the environment and lead to loss of lignocellulose and other compounds []. Thermogravimetric analysis (TGA) is an alternative method, already in use among biofuel researchers, to approximate the proportions of these compounds in plants [, ]. In this method, we use mass loss data obtained by heating a biomass sample in an N2 environment, termed pyrolysis, to estimate the proportion of different carbon components in a sample.

Mass loss during complete pyrolysis is the sum of the degradation of the main components of the sample, often simplified to the three main components of lignocellulose: hemicelluloses, cellulose, and lignin [, , ]. The rate of mass loss is generally a multi-peaked curve, which can be mathematically separated into its constituent parts with a mixture model, in a process termed ‘deconvolution’ [, ]. The component peaks identified by the mixture model represent the proportion of initial mass lost by each component during pyrolysis. The integral, or area under the curve, of these peaks therefore gives us an estimate of the proportion of each component in the original sample. Carbon component estimation from deconvolution of thermogravimetric loss curves has been validated with estimates achieved with wet chemistry measurements [].

Most researchers who conduct thermogravimetric analysis use commercial software to deconvolve the rate of mass loss curves [for example OriginPro 3, PeakFit 18, Fityk 18, or Datafit ]. However, the majority of these proprietary software employ point-and-click interfaces that hinder independent replication of the deconvolution analysis. The inability to reproduce readily others’ experimental results using these software, a guiding principle of functional trait measurement [], might in part explain why thermogravimetric analysis has not been widely adopted by functional ecologists despite its proven promise [such as in marine and coastal macrophytes 25, and in eucalyptus trees ].

The mixchar package in R is an open-source tool for the deconvolution of thermal decay curves from thermogravimetric analysis. This tool improves upon existing software by making implicit mixture model choices, such as starting values and number of peaks to estimate, programmatic and transparent. Although the nonlinear mixture model used for peak separation at the core of this package could be used for many different purposes, our mixchar package provides specific guidelines for using thermal decay curve analysis to estimate carbon components in plant material. Detailed vignettes and several default plotting options are included in mixchar so that researchers interested in adopting this method can readily do so for the purpose of estimating carbon components in plant biomass samples.

Implementation and architecture

Litter collection and preparation

We collected the litter for development of this package from three freshwater wetlands surrounding Melbourne, Victoria (sites within 60 km of –37.455, 144.985). In the field, we placed the plant litter collected for this analysis in moist plastic bags and stored them in dark coolers until we could transport them to the lab where they were promptly dried at 60°C for 72 hours to ensure our component estimates were an accurate representation of the original composition of the litter samples we collected. We ground our dry litter to <40 μm using a Retsch Centrifugal Mill ZM200.

We pyrolysed 10–20 mg subsamples of dry, ground litter in an N2 environment from 30–800°C at a temperature ramp of 10°C/min using a Netzsch TGA-FTIR thermogravimetric analyser (Department of Biomedical Engineering, University of Melbourne).

We developed and tested the functions of our mixchar package using the thermogravimetric decay data of the litter of 29 different plant species. Two species from our data are available as datasets in the package — the freshwater reed Juncus amabilis (accessed as juncus) and the freshwater fern Marsilea drumondii (accessed as marsilea). The data resulting from the pyrolysis is mass loss (mg) against temperature (Figure 1).

Figure 1 

Mass across temperature for pyrolysis of Juncus amabilis.

library(mixchar)
head(juncus, n = 3)

## temp_C mass_loss
## 1 31.453 -0.000931
## 2 31.452 -0.001340
## 3 31.450 -0.001350

Deconvolution using mixchar

Rate of mass loss

After completing the thermogravimetric analysis, the resulting data can be loaded into R. Using mixchar, the process function calculates the rate of mass loss by taking the derivative of mass loss over temperature. The process() function needs the following dataset features: the initial mass of the sample, the name of the temperature data column, and the name of the mass column (mg). Since TGA-FTIR instruments can export data in variable units, the mass column can be specified either as mass loss data with the mass_loss argument or as mass data with the mass argument.


deriv_juncus <- process(juncus,
                           init_mass = 18.96,             # initial mass of sample
                           temp = 'temp_C',               # temperature data column name
                           mass_loss = 'mass_loss',       # mass loss data column name
                           temp_units = 'C')              # 'C' is the default setting
deriv_juncus

## Derivative thermogravimetry data (DTG) calculated for
## 768 datapoints from 31.5 to 798.52 degrees C.

The process function produces a modified dataframe, which includes the derivative thermogravimetric rate of mass loss data (DTG), the initial mass value that was supplied, and the maximum and minimum temperature values in the data. Plotting the output of the process function yields the mass of sample across temperature curve and the rate of mass loss curve (Figure 2). The rate of mass loss is a multi-peaked curve encompassing three main phases []:

  1. A short period with a pronounced peak of moisture evolution, up until approximately 120°C.
  2. A wide mid-range of high mass loss, caused by devolatilisation of primary biomass carbon components, between approximately 120–650°C.
  3. A final period of little mass loss when carbonaceous material associated with the inorganic fraction combusts, after approximately 650°C.
Figure 2 

Derivative thermogravimetric rate of mass loss across temperature, scaled by initial mass of sample for Juncus amabilis. Line segments 1, 2, and 3 represent mass loss phases.

Subset DTG data

Since the overall DTG curve represents the loss of extractives, water, inorganic matter, and volatiles in addition to the components in which we are interested [], we isolate mass loss from our primary biomass components by subsetting the DTG data to Phase 2. The deconvolve function defaults to temperature bounds at 120°C and 700°C, but these can be modified with the lower_temp and upper_temp arguments.

Non-linear mixture model

Biomass components combust relatively independently because they do not interact very much during thermal volatilisation []. Therefore, the subsetted DTG curve can be mathematically deconvolved into constituent parts using a mixture model. The derivative rate of mass loss equation (dmdT) can be expressed as the sum of n independent reactions (Eq. 1), as follows []:

(1)
dmdT=i=1ncidαidT
(2)
m=MTM0
(3)
ci=M0iMi
(4)
αi=M0iMTiM0iMi

where mass (m) is expressed as a fraction of mass at temperature T (MT) of the initial sample mass (M0) (Eq. 2), ci is the mass of component i that is decayed (Eq. 3), and the mass loss curve of each individual component (dαidT) is the derivative of αi, the conversion of mass at a given temperature (MTi), from the initial (M0i), as a proportion of total mass lost between the initial and final (Mi) temperature for each peak (Eq. 4).

Although the carbon distribution of many species can be described with only n = 3 peaks, corresponding to a single peak for each of hemicelluose, cellulose, and lignin, some litter samples yield a second hemicellulose peak at a lower temperature, resulting in n = 4 independent peaks. This is because the soluble carbohydrates in plant tissue can take many forms, including xylan, amylose, etc., which apparently degrade at different temperatures [see also 3, 15]. deconvolve() will decide whether three or four peaks are best using an internal function that determines if there is a peak below 220°C. Alternatively, upon inspection of the curve, users can specify the number of peaks with the n_peaks argument.

In order to fit the mixture model to the data, we must decide upon the shape of the individual peaks (dαidT) that are summed to produce it. Many different functions have been proposed: the asymmetric bi-Gaussian [], logistic [], Weibull [], asymmetric double sigmoidal [], and the Fraser-Suzuki function [, ]. Researchers have compared several techniques [, , ] and found that the Fraser-Suzuki function best fit these kinetic peaks. This is because the Fraser-Suzuki function allows for asymmetry (a parametric examination of the Fraser-Suzuki function can be found in Figure 3). We therefore use the Fraser-Suzuki function to describe the rate expression of a single peak (Eq. 5) as follows:

(5)
dαidT=hiexp{ln2si2[ln(1+2siTpiwi)]2}
Figure 3 

Parametric study of the Fraser-Suzuki function for deconvolution of derivative thermogravimetric biomass curves: Effect of modifying (a) height; (b) skew; (c) position; and (d) width.

where T is temperature (°C), and the parameters hiC–1), si, pi (°C), and wi (°C) are height, skew, position, and width of the peak, respectively. In total, our model estimates 12 or 16 parameters, one for each parameter of Eq. 5 for either three or four primary components.

Likelihood functions in mixture models have multiple maxima, and therefore expectation-maximisation algorithms are highly dependent on starting value selection [, ]. The vector of starting values for the 12 or 16 estimated parameters is based on curves depicted in the literature [] and from the results of running an identical deconvolution on pure cellulose (carboxy-methyl cellulose) and lignin (alkali lignin from Sigma Aldrich). Hemicelluloses decay in a reasonably narrow band beginning at a lower temperature [], so we use 270°C for position and 50°C for width. Linear cellulose crystals decay at a higher temperature, but decay more rapidly after peak temperatures are reached, so we set its starting position to 310°C and width to 30°C. Lignin typically decays beginning at a high temperature and over a wide interval [], so we begin position and width at 410°C and 200°C, respectively.

In an effort to ensure the same starting vector would be useful across a wide variety of different samples, we employ an extra optimisation step before fitting the model. The deconvolve function first optimises the given starting value vector with 300 restarts of the NLOPTR_LN_BOBYQA algorithm [] with the nloptr [] package. In this way, we can set the given starting value vector so that it works properly on a wide range of samples, and at the same time the starting values we ultimately give to the model are as close as possible to the global maxima for a given dataset.

To fit the non-linear mixture model, we send the optimised starting value vector to the nlsLM() function in the minpack.lm [] package, which uses the Levenberg-Marquardt algorithm to minimise residual sum of squares.

The default starting values and two-stage optimisation worked well for our thermogravimetric decay dataset of 29 plant species, encompassing herbaceous, graminoid, as well as woody species. Although this result is encouraging it is not altogether surprising because these data were pyrolysed using the same TGA-FTIR instrument. For this reason, the package was also tested on thermogravimetric data processed from a different instrument, as well as plants from marine ecosystems. Default settings produced well-fit curves for leaf samples from the seagrass species Thalassia testudinum, and rhizome and root samples from the seagrass species Zostera marina [Figure 4; data for both from 25].

Figure 4 

Mass loss and component estimation using default settings for test samples: (a)Thalassia testudinum leaves; (b)Zostera marina rhizome; and (c)Zostera marina root.

Despite the broad range of testing of mixchar, users may still find they need to explore the literature for reasonable estimates of starting values for their study species. Default settings did not, for example, redundant did not identify the fourth peak in the deconvolution of macroalgae species Ecklonia radiata blades [Figure 5a; ]. In this case, we can use the option to specify our own starting values, with the start_vec, lower_vec, and upper_vec arguments, in order to better guide the model (Figure 5b).

Figure 5 

Component estimation for Ecklonia radiata blades with default (a) and specified starting values (b).


# code given for reference only

start_vec <- c(0.002, -0.15, 250, 50,       # for hemicellulose 1
               0.003, -0.15, 310, 50,       # for hemicellulose 2
               0.006, -0.15, 350, 30,       # for cellulose
               0.001, -0.15, 410, 200)      # for lignin

# change the upper bounds to ensure the starting vector values are within
# the allowed range
ub <- c(2, 0.2, 260, 80,
        2, 0.2, 330, 90,
        2, 0.2, 380, 50,
        2, 0.2, 430, 250)

e.radiata_decon <- deconvolve(e.radiata,
                              n_peaks = 4,
                              start_vec = start_vec,
                              upper_vec = ub,
                              lower_temp = 150,
                              upper_temp = 600)
Component weights

After we fit our curve parameters, we can pass each component’s parameter estimates to a single Fraser-Suzuki function and integrate under the peak to calculate the weight of the component in the overall sample (Eq. 6). To estimate the uncertainty of the weight predictions, deconvolve will calculate the 95% interval of the weight estimates across a random sample of parameter estimates, drawn in proportion to their likelihood. We assume a truncated multivariate normal distribution, since the parameters are constrained to positive values, using the modelling package tmvtnorm [].

(6)
αi=120650hiexp{ln2si2[ln(1+2siTpiwi)]2}dT

We interpret that the peak located around 250–270°C corresponds to primary hemicelluloses (HC), around 310–330°C to cellulose (CL), and around 330–350°C to lignin (LG). If present, the fourth peak located below 200°C corresponds to the most simple hemicelluloses (HC-1). The second dataset included in the package, marsilea, provides an example of a four-peak deconvolution. A worked example can be found in the package vignettes.

Package outputs

The output of the deconvolve function is a list of five items:

  1. the dataset that results from the process function, useful for testing other modelling approaches or plotting options, and accessed with rate_data():
    DTG_data <- rate_data (output_juncus)
    head(DTG_data)
    
    ##        temp_C     derive    mass_T
    ## 5325 120.514 9.570652e-05 17.91630
    ## 5384 121.501 9.885901e-05 17.91445
    ## 5445 122.515 1.003878e-04 17.91252
    ## 5505 123.514 9.133606e-05 17.91079
    ## 5565 124.513 6.493836e-05 17.90956
    ## 5625 125.509 8.578618e-05 17.90794
  2. the temperature values at which the data were cropped for analysis, accessed with temp_bounds():
    temp_bounds(output_juncus)
    
    ## [1] 120 700
  3. the output of the mixture model. Peak 1 is hemicellulose; peak 2 is cellulose; and peak 3 is lignin. If present, the optional fourth peak located at the lowest temperature interval will be listed as peak 0. Accessed with model_fit():
    model_fit(output_juncus)
    
    ## Nonlinear regression model
    ##   model: deriv ~ fs_mixture (temp_C, height_1, skew_1, position_1,
    ##     width_1, height_2, skew_2, position_2, width_2, height_3,
    ##     skew_3, position_3, width_3)
    ##   data: dataframe
    ##   height_1    skew_1     position_1  width_1    height_2     skew_2
    ##   3.944e-03   1.258e-01  2.662e+02   5.106e+01  5.793e-03    1.344e-02
    ##   position_2  width_2    height_3    skew_3     position_3   width_3
    ##   3.173e+02   2.866e+01  1.163e-03   1.085e-01  3.300e+02    2.500e+02
    ##   residual sum-of-squares: 9.299e-06
    ##
    ## Number of iterations to convergence: 23
    ## Achieved convergence tolerance: 1.49e-08
  4. the number of peaks:
    output_juncus$n_peaks
    ## [1] 3
  5. and the mean, 2.5% and 97.5% estimates, median, and standard deviation of the weight of each component that can be accessed with component_weights():
    component_weights(output_juncus)
    
    ##           HC         CL         LG value_type
    ## 1 21.5600422 17.6748693 30.6629891       mean
    ## 2 20.4327310 16.6433643 29.5201899       2.5%
    ## 3 21.5980403 17.6367428 30.6535159        50%
    ## 4 22.7575067 18.6700545 31.8302178      97.5%
    ## 5 0.5978226 0.5128315 0.5914671            5%

Plotting

Plotting the output of the deconvolve function shows the underlying DTG data, the overall mixture model curve, as well as the component peaks of the deconvolution (Figure 6). The default plot is in black and white, but a colour version that uses colour-blind friendly viridis colours [] is available by specifying bw = FALSE.

Figure 6 

Deconvolution of Juncus amabilis example dataset. Mass loss data overlaid with output of deconvolution. Rate of mass loss scaled by initial mass of sample.

The Fraser-Suzuki family of functions are exported (Table 1) to allow users to create their own plots from the model outputs in conjunction with the parameter estimates, accessed as follows:

juncus_parameters <- model_parameters(output_
juncus)
juncus_parameters

##    parameter_name    parameter_value
## 1        height_1       3.944240e-03
## 2          skew_1       1.258171e-01
## 3      position_1       2.661764e+02
## 4         width_1       5.105925e+01
## 5        height_2       5.792848e-03
## 6          skew_2       1.344097e-02
## 7      position_2       3.172997e+02
## 8         width_2       2.866180e+01
## 9        height_3       1.162606e-03
## 10         skew_3       1.085210e-01
## 11     position_3       3.300000e+02
## 12        width_3       2.500000e+02

Table 1

Exported functions.


FUNCTION FAMILYFUNCTION NAMEDESCRIPTION

DatajuncusExample thermogravimetric data for Juncus amabilis

DatamarsileaExample thermogravimetric data for Marsilea drumondii

Basic useprocess()Calculates the derivative rate of mass loss of thermogravimetric data

Basic usedeconvolve()Deconvolves derivative rate of mass loss data

Accessor functiontemp_bounds()Access temperature bounds used to crop data for mixture model

Accessor functionrate_data()Access processed dataframe including mass loss, rate of mass loss, and temperature

Accessor functionmodel_fit()Access fit of nonlinear mixture model

Accessor functioncomponent_weights()Access mean, upper, and lower bounds for component weight estimates

Accessor functionmodel_parameters()Access parameter estimates

Fraser-Suzuki functionfs_function()Fraser-Suzuki equation for a single peak

Fraser-Suzuki functionfs_mixture()Fraser-Suzuki mixture model equation

Fraser-Suzuki functionfs_model()Non-linear model implementation of Fraser-Suzuki mixture model

S3 methodprint(<process>)Default print method for process object (derived from process())

S3 methodplot(<process>)Default plot method for process object (derived from process())

S3 methodprint(<deconvolve>)Default print method for decon object (derived from deconvolve())

S3 methodplot(<deconvolve>)Default plot method for process object (derived from deconvolve())

Quality control

All the functions of mixchar were tested to see if they produce the desired output. The workflow was tested on thermogravimetric data from two different TGA-FTIR instruments, and on samples outside those used to build the package.

The structure of the package successfully passed the CRAN R CMD check with no errors or warnings, or notes and the results from this check can be found on CRAN.

(2) Availability

Operating system

The package was tested on Windows, Mac OS X, and Linux.

Programming language

R version 3.2.0 or higher.

Additional system requirements

An internet connection is required to install the mixchar package.

Dependencies

R packages: graphics, minpack.lm, nloptr, stats, tmvtnorm, zoo.

List of contributors

This package was created by Saras Windecker and Nick Golding.

Software location

Archive

Name: CRAN

Persistent identifier: https://CRAN.R-project.org/package=mixchar

Licence: MIT and open license as found on https://cran.r-project.org/web/packages/mixchar/LICENSE

Publisher: Saras Windecker

Version published: 0.1.0

Date published: 16/08/2018

Code repository

Name: Github

https://github.com/smwindecker/mixchar/

Persistent identifier: DOI: 10.5281/zenodo.1343849

Licence: MIT and open license as found on

https://github.com/smwindecker/mixchar/releases/tag/v0.1.0.

Date published: 11/08/2018

Language

R

(3) Reuse potential

This package was designed with both the user and developer in mind. There are several vignettes available with the package and on the package website (https://smwindecker.github.io/mixchar/) facilitating exploration of package functionality. We expect that this package will be useful to researchers already using thermogravimetric analysis for biomass component estimation, as well as to functional ecologists seeking to test out this approach as an alternative to wet chemistry methods. For all users, this method improves on most current software available to them, as it is fully open-source and transparent.

Finite mixture models are used to cluster continuous multivariate data. Statistical inference of mixture models is notoriously difficult because of their flexibility []. This is especially true for the Fraser-Suzuki function, which has an additional parameter compared to a Gaussian distribution. Many combinations of peaks can create the same overall derivative thermogravimetric curve, and so informed starting values are necessary as they can substantially affect fit. To use mixchar well, we need in some cases to modify the default starting values.

For those who wish to contribute to the package, it is hosted on Github. Contributors can log issues, for example concerning alternative data formats, via the issues tracker (https://github.com/smwindecker/mixchar/issues) or submit a pull request to add functionality to the package.