pcr: an R package for quality assessment, analysis and testing of qPCR data

Mahmoud Ahmed; Deok Ryong Kim

doi:10.7717/peerj.4473

pcr: an R package for quality assessment, analysis and testing of qPCR data

Department of Biochemistry and Convergence Medical Sciences and Institute of Health Sciences, Gyeongsang National University School of Medicine, Jinju, Gyeongnam, South Korea

DOI: 10.7717/peerj.4473

Published: 2018-03-16
Accepted: 2018-02-17
Received: 2017-12-19

Academic Editor: Sahil Seth

Subject Areas: Bioinformatics, Cell Biology, Computational Biology, Molecular Biology
Keywords: qPCR, R package, Data analysis, Quality assessment

Copyright: © 2018 Ahmed and Kim
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Ahmed M, Kim DR. 2018. pcr: an R package for quality assessment, analysis and testing of qPCR data. PeerJ 6:e4473 https://doi.org/10.7717/peerj.4473

The authors have chosen to make the review history of this article public.

Abstract

Background

Real-time quantitative PCR (qPCR) is a broadly used technique in the biomedical research. Currently, few different analysis models are used to determine the quality of data and to quantify the mRNA level across the experimental conditions.

Methods

We developed an R package to implement methods for quality assessment, analysis and testing qPCR data for statistical significance. Double Delta C_T and standard curve models were implemented to quantify the relative expression of target genes from C_T in standard qPCR control-group experiments. In addition, calculation of amplification efficiency and curves from serial dilution qPCR experiments are used to assess the quality of the data. Finally, two-group testing and linear models were used to test for significance of the difference in expression control groups and conditions of interest.

Results

Using two datasets from qPCR experiments, we applied different quality assessment, analysis and statistical testing in the pcr package and compared the results to the original published articles. The final relative expression values from the different models, as well as the intermediary outputs, were checked against the expected results in the original papers and were found to be accurate and reliable.

Conclusion

The pcr package provides an intuitive and unified interface for its main functions to allow biologist to perform all necessary steps of qPCR analysis and produce graphs in a uniform way.

Introduction

Real-time quantitative PCR (qPCR) is a commonly used technique to analyze the relative gene expression level in the biomedical research. In most cases, small scale experiments are designed to quantify the level of mRNA among experimental conditions. Some advanced machines and optimized protocols have simplified the experiments to a highly efficient one-step process, allowing the effective analysis of a large scale of qPCR data. However, all processes for assessing the quality of the data, performing the analysis and reporting the results are not done in the most uniform way across the literature (Bustin & Nolan, 2004). Different analysis models have been proposed and implemented in different software environments (Pabinger et al., 2014). Furthermore, requirements and guidelines for reporting qPCR data were independently introduced (Bustin et al., 2009).

In this report, we introduce an open source R package for performing quality assessment, modeling and testing for statistical significance of qPCR data in a uniform way. In its current version, the pcr package implement two methods for relative quantification of mRNA expression proposed originally by Livak & Schmittgen (2001), in addition to the necessary steps to check the assumption of these methods. Also, we implement a number of methods to check for statistical significance in qPCR data which were introduced in SAS by Yuan et al. (2006). Finally, the package provides unified interface to make the analysis accessible and the ability to make graphs of the different analysis steps for visual inspection and preparation of publication-level figures. We start by describing the process for generating the data in the original papers, briefly introduce the methods and apply them to the original data using the pcr.

Materials & Methods

Data sources

To illustrate the usage of the pcr package and to apply it to qPCR data, we used real qPCR datasets from two published papers. In addition, we compared the results obtained by the pcr package to that of the original paper to ensure the reliability. In their original article, Livak & Schmittgen (2001) collected total RNA from human tissues; brain and kidney. c-myc and GAPDH primers were then used for cDNA synthesis and used as input in the PCR reaction. Seven different dilutions where used as input to the PCR reaction (three replicates each), this dataset was referred to as ct3 and shown in Table 1. Six replicates for each tissue were run in separate tubes. This dataset was referred to as ct1 through this document and shown along with the difference calculations in Tables 2 and 3. At the second work, Yuan et al. (2006) extracted total RNA from Arabidopsis thaliana plant treated and control samples (24 samples each), and performed qPCR analyses using MT7 and ubiquitin primers. This dataset was referred to as ct4 and shown the results of the different testing methods that applied in the original paper in Table 4. In all datasets, the raw C_T (Cycle Threshold), also known as C_q, was recorded (Bustin et al., 2009) and will be used as input to the pcr analysis pipeline.

Table 1:

Average C_T value for c-myc and GAPDH at different input amounts.

Input RNA (ng)	c-myc average C_T	GAPDH average C_T	ΔC_T c-myc-GAPDH
1.0	25.59 ± 0.04	22.64 ± 0.03	2.95 ± 0.05
0.5	26.77 ± 0.09	23.73 ± 0.05	3.04 ± 0.10
0.2	28.14 ± 0.05	25.12 ± 0.10	3.02 ± 0.11
0.1	29.18 ± 0.13	26.16 ± 0.02	3.01 ± 0.13
0.05	30.14 ± 0.03	27.17 ± 0.06	2.97 ± 0.07
0.02	31.44 ± 0.16	28.62 ± 0.10	2.82 ± 0.19
0.02	32.42 ± 0.12	29.45 ± 0.08	2.97 ± 0.14

DOI: 10.7717/peerj.4473/table-1

Table 2:

Relative quantification using comparative (ΔΔC_T) method (separate tubes).

Tissue	c-myc C_T	GAPDH C_T	ΔC_T c-myc-GAPDH	ΔΔC_TΔC_T − ΔC_T,Brain	c-myc_N Rel. to Brain
Brain	30.72	23.7
	30.34	23.56
	30.58	23.47
	30.34	23.65
	30.5	23.69
	30.43	23.68
Average	30.49 ± 0.15	23.63 ± 0.09	6.86 ± 0.17	0.00 ± 0.17	1.0 (0.9–1.1)
Kidney	27.06	22.76
	27.03	22.61
	27.03	22.62
	27.1	22.6
	26.99	22.61
	26.94	24.18
Average	27.03 ± 0.06	22.66 ± 0.08	4.37 ± 0.10	−2.50 ± 0.10	5.6 (5.3–6.0)

DOI: 10.7717/peerj.4473/table-2

Table 3:

Relative quantification using the standard curve method (separate tube).

Tissue	c-myc (ng)	GAPDH (ng)	c-myc_N norm. to GAPDH	c-myc_N Rel. to Brain
Brain	0.033	0.51
	0.043	0.56
	0.036	0.59
	0.043	0.53
	0.039	0.51
	0.040	0.52
Average	0.039 ± 0.004	0.54 ± 0.034	0.07 ± 0.008	1.0 ± 0.12
Kidney	0.40	0.96
	0.41	1.06
	0.41	1.05
	0.39	1.07
	0.42	1.06
	0.43	0.96
Average	0.41 ± 0.016	1.02 ± 0.052	0.40 ± 0.025	5.5 ± 0.35

DOI: 10.7717/peerj.4473/table-3

Table 4:

Statistical significance using different testing methods.

Test	ΔΔC_T (estimate)	p-value	Confidence interval
Multiple regression	−0.6848	<0.0001	(−0.4435, −0.9262)
ANOVA	−0.6848	<0.0001	(−0.4435, −0.9262)
t-test	−0.6848	<0.0001	(−0.4147, −0.955)
Wilcoxon test	−0.6354	<0.0001	(−0.4227, −0.8805)

DOI: 10.7717/peerj.4473/table-4

Statistical methods

In contrast with the absolute quantification of the amount of mRNA in a sample, the relative quantification uses a internal control (reference gene) and/or a control group (reference group) to quantify the mRNA of interest relative to these references. This relative quantification was sufficient to draw conclusions in most of the biomedical applications involving qPCR. A few methods were developed to perform these relative quantification. These methods generally require different assumptions and models for the analysis. The most common two of these methods were described here in the following sections.

The comparative C_T methods

The comparative C_T methods assume that the cDNA templates of the gene/s of interest as well as the control/reference gene have similar amplification efficiency, and also that this amplification efficiency is near perfect. This means that, at a certain threshold during the linear portion of the PCR reaction, the amount of the gene of the interest and the control double each cycle. Another assumption is that the expression difference between two genes or two samples can be captured by subtracting one (gene or sample of interest) from another (reference). The final assumption is that the reference doesn’t change with the treatment or the course in question. The formal derivation of the double delta C_T model is described here Livak & Schmittgen (2001). Briefly, the ΔΔC_T is given by: (1) $Δ Δ C_{T} = Δ C_{T, q} - Δ C_{T, c b}$

And the relative expression by: (2) $2^{- Δ Δ C_{T}}$ where ΔC_T,q is the difference in the C_T (or their average) of a gene of interest and a reference gene in a group of interest. ΔC_T,cb is the the difference in the C_T (or their average) of a gene of interest and a reference gene in a reference group. The error term is given by: (3) $s = \sqrt{s_{1}^{2} + s_{2}^{2}}$ where s₁ is the standard deviation of a gene of interest and s₂ is the standard deviation of a reference gene.

Standard curve

In comparison, this model doesn’t assume perfect amplification but rather actively use the amplification in calculating the relative expression. Therefore, when the amplification efficiency of all genes are 100% both methods should give similar results. The standard curve method is applied using two steps. First, serial dilutions of the mRNAs from the samples of interest are used as input to the PCR reaction. The linear trend of the log input amount and the resulting C_T values for each gene are used to calculate an intercept and a slope. Secondly, these intercepts and slopes are used to calculate the amounts of mRNA of the genes of interest and the control/reference in the samples of interest and the control sample/reference. These amounts are finally used to calculate the relative expression in a manner similar to the later method, just using division instead of subtraction. The formal derivation of the model is described here (Yuan et al., 2006). Briefly, the amount of RNA in a sample is given by: (4) $log amount = \frac{C_{T} - b}{m} .$

And the relative expression is given by: (5) $1 0^{log amount}$ where C_T is the cycle threshold of a gene. b is the intercept of C_T log10 input amount. m is the slope of C_T. And the error term is given by: (6) $s = (c v) (\bar{X})$ where: (7) $c v = \sqrt{c v_{1}^{2} + c v_{2}^{2}}$ where s is the standard deviation. $\bar{X}$ is the average. cv is the coefficient of variation or relative standard deviation.

Statistical significance tests

Assuming that the assumptions of the first methods are holding true, the simple t-test can be used to test the significance of the difference between two conditions (ΔC_T). t-test assumes, in addition, that the input C_T values are normally distributed and the variance between conditions are comparable. Wilcoxon test can be used if sample size is small, and those two last assumptions are hard to achieve.

Two use the linear regression here. A null hypothesis is formulated as following, (8) $C_{T, target,treatment} - C_{T, control,treatment} = C_{T, target,control} - C_{T, control,control} .$

This is exactly the ΔΔC_T value as explained earlier. So the ΔΔC_T is estimated and the null is rejected when ΔΔC_T ≠ 0.

Quality assessment

Fortunately, regardless of the method used in the analysis of qPCR data, The quality assessment can be done in a similar way. It requires an experiment similar to that of calculating the standard curve. Serial dilutions of the genes of interest and controls are used as input to the reaction and different calculations are made. The amplification efficiency is approximated be the linear trend between the difference between the C_T value of a gene of interest and a control/reference (ΔC_T) and the log input amount. This piece of information is required when using the ΔΔC_T model. Typically, the slope of the curve should be very small and the R² value should be very close to one. A value of the amplification efficiency itself is given by 10^−1∕slope and should be close to 2. Other analysis methods are recommended when this is not the case. Similar curves are required for each gene using the C_T value instead of the difference for applying the standard curve method. In this case, a separate slope and intercept for each gene are required for the calculation of the relative expression.

Results & Discussion

Availability & installation

The pcr packages is available on CRAN, the main repository for R packages and can be installed by invoking install.packages in an R (≥3.4.2) session. The package’s source code is also available on github, https://github.com/MahShaaban/pcr along with the development version.

 
 
# install  the  pcr  package  from  CRAN 
install.packages('pcr')

The examples shown in this article are explained in greater details in the package vignette that can be accessed through browseVignette(’pcr’). Moreover, the package documentation provides detailed instruction on the input and the output of each function (e.g., ?pcr_analyze).

Functionality & user interface

The pcr package provides different methods for performing quality assessment, modeling and testing real-time qualitative PCR data through the unified interface of three functions: pcr_assess, pcr_analyze and pcr_test, respectively.

Quality assessment

pcr_assess provides two methods for assessing the quality of qPCR data. These are ‘efficiency’ and ‘standard_curve’ to calculate the amplification efficiency and gene standard curves as described in the methods section. The following code block applies both methods to the dataset ct3, shown in Table 1. Using the argument plot as TRUE in the pcr_assess function provides the a graphic presentation of the amplification and the standard curves as shown in Fig. 1.

Figure 1: Amplification efficiency and standard curves of c-myc and GAPDH.
Seven different dilutions of RNA were used as an input to synthesize cDNA, then to a real-time quantitative PCR reaction with c-myc and GAPDH primers. (A) ΔC_T values were calculated by subtracting the C_T values from three independent samples of the control gene(GAPDH) from the target c-myc. Averages and standard deviations are shown as points and error bars. The blue line represents the linear trend between the ΔC_T and log10 of the input amount. (B) C_T values from three independent samples of c-myc and GAPDH are shown with the corresponding log10 input amounts.

Download full-size image

DOI: 10.7717/peerj.4473/fig-1

 
 
# load  required  libraries 
library(pcr) 
library(ggplot2) 
library(cowplot) 
library(dplyr) 
library(xtable) 
library(readr)

 
 
#  pcr_assess 
##  locate  and  read  data 
fl <− system.file('extdata', 'ct3.csv', package = 'pcr') 
ct3 <− read_csv(fl) 
 
## make a vector  of RNA  amounts 
amount <− rep(c(1, .5, .2, .1, .05, .02, .01),  each = 3) 
 
##  calculate  amplification  efficiency 
res1 <− pcr_assess(ct3, 
                      amount = amount, 
                      reference_gene  = 'GAPDH', 
                      method = 'efficiency') 
 
##  calculate  standard  curves 
res2 <− pcr_assess(ct3, 
                      amount = amount, 
                      method = 'standard_curve ') 
 
##  retain  curve  information 
intercept <− res2$intercept 
slope <− res2$slope

Analysis models

Similarly, pcr_analyze provides two methods to model the C_T values and calculates the relative expression of target genes. ’delta_delta_ct’ performs the ΔΔC_T method described previously. The average relative expression of the target gene in the condition of interest is given by the Eqs. (1) and (2) and the standard deviation by Eq. (3). The calculations are applied to the dataset ‘ct1’, shown in Table 2 and Fig. 2A. ‘relative_curve’ performs the relative standard curve quantification, average relative expression/amount of the target gene in the condition of interest is given by Eqs. (4) and (5) and the standard deviation by Eqs. (6) and (7). The calculation is applied to the same dataset ‘ct1’ and is shown in Table 3 and Fig. 2B.

Figure 2: Relative expression of c-myc in human brain and kidney tissues.
Total RNA from human brain and kidney tissues were used to synthesize cDNA and real-time quantitative PCR reaction with c-myc and GAPDH primers. C_T values from six independent replicates were used to calculate the expression of c-myc in the kidney normalized by GAPDH and relative to the brain using The ΔΔC_T (A) and the standard curve methods (B). Averages and standard deviations are shown as bars and error bars.

Download full-size image

DOI: 10.7717/peerj.4473/fig-2

  
 
#  pcr_analyze 
##  locate  and  read  raw ct data 
fl <− system.file('extdata', 'ct1.csv', package = 'pcr') 
ct1 <− read_csv(fl) 
 
## add  grouping  variable 
group_var <− rep(c('brain', 'kidney'), each = 6) 
 
# calculate  all  values  and  errors  in one  step 
## mode == 'separate_tube' default 
res1 <− pcr_analyze(ct1, 
                        group_var  = group_var, 
                        reference_gene  = 'GAPDH', 
                        reference_group  = 'brain') 
 
##  calculate  standard  amounts  and  error 
res2 <− pcr_analyze(ct1, 
                        group_var  = group_var, 
                        reference_gene  = 'GAPDH', 
                        reference_group  = 'brain', 
                        intercept = intercept, 
                        slope = slope, 
                        method = 'relative_curve ')

Significance testing

Finally, pcr_test can be used to calculate useful statistics, p-values and confidence intervals on the previous models. Two tests are available of the two-group comparisons; ‘t.test’ and ‘wilcox.test’ to test for the difference of the normalized target gene expression (ΔC_T) in one condition to another. Linear regression, ‘lm’, can be applied to estimate these differences between multiple conditions and a reference (Eq. (8)). The following code applies different testing methods to the dataset ‘ct4’. The dataset was published original in (Yuan et al., 2006), along with results of different testing method (Table 4). Table 5 shows the results of the three different tests as implemented in pcr_test.

Table 5:

Different testing methods applied to the same dataset.

	Gene	Estimate	Lower	Upper	Term
t.test	target	−0.68	−0.96	−0.41
wilcox.test	target	−0.64	−0.88	−0.42
lm	target	−0.68	−0.95	−0.41	group_vartreatment

DOI: 10.7717/peerj.4473/table-5

 
 
#  pcr_test 
# locate  and  read  data 
fl <− system.file('extdata', 'ct4.csv', package = 'pcr') 
ct4 <− read_csv(fl) 
 
# make  group  variable 
group <− rep(c('control', 'treatment'), each = 12) 
 
# test  using t −test 
tst1 <− pcr_test(ct4, 
                     group_var  = group, 
                     reference_gene  = 'ref', 
                     reference_group  = 'control', 
                     test = 't.test') 
 
# test  using  wilcox.test 
tst2 <− pcr_test(ct4, 
                     group_var  = group, 
                     reference_gene  = 'ref', 
                     reference_group  = 'control', 
                     test = 'wilcox.test') 
# testing  using  lm 
tst3 <− pcr_test(ct4, 
                     group_var  = group, 
                     reference_gene  = 'ref', 
                     reference_group  = 'control', 
                     test = 'lm')

The pcr package workflow

In Fig. 3, we suggest a workflow based on three steps; Quality assessment, modeling and testing for statistical significance, these can be applied by using the pcr package functions; pcr_assess, pcr_analyze and pcr_test respectively. The workflow is based on the C_T from two kinds of experiments. First, the C_T values for the genes of interest and a reference gene in the different experimental groups. Second, the C_T values of the same genes from a serial dilution experiment. The amplification efficiency can be calculated by calculating the ΔC_T from the first and the slope and the intercept from the second experiment. The ΔΔC_T model and the relative expression are then calculated and tested. In the case of the two groups comparison, any of the testing methods can be applied otherwise the linear regression can be used for multiple group comparisons. Alternatively, the standard curve methods can be applied by calculating the amount of cDNA in each sample from the C_T values and the individual curves for each gene. These amounts are finally used to obtain the relative expression of the genes of interest.

Figure 3: A conceptual workflow of the analysis of RT-qPCR data.
A graphic description of the input, intermediary values and final output of different steps of the analysis pipeline and their relations. Colors represent distinct steps and the numbers are the order in which they are applied in a typical analysis.

Download full-size image

DOI: 10.7717/peerj.4473/fig-3

Comparison with other packages

Pabinger et al. (2014) surveyed the tools used to analyze qPCR data across different platforms. They included nine R packages which provide very useful analysis and visualization methods. Table 6 shows a comparison of the features of these nine packages from the original publication, in addition to the pcr package. Some of these packages focuses one certain model and some are designed to handle high-throughput qPCR data (e.g., chipPCR). Most of these packages are hosted in CRAN and a few on the Bioconductor so they adhere to Bioconductor methods and data containers (e.g., qpcR).

Table 6:

Comparison with available R packages for qPCR quantification.

Package	Raw^a	Efficiency	Abs quant	Rel quant	Error prop.	Norm.	NA handling	Graphs	Stats.	MIQE^b
chipPCR	+	+					+	+		+
ddCT				+		+		+	+	+
dpcR			+				+	+	+	+
EasyqpcR		+		+		+	+			+
HTqPCR				+		+	+	+	+
FPK-PCR	+	+
NormqPCR				+		+	+			+
qpcR	+	+	+	+	+	+	+	+		+
qpcrNorm						+		+	+
pcr (proposed)		+		+	+	+		+	+	+

DOI: 10.7717/peerj.4473/table-6

Notes:

aCalculates the C_T values from raw florescence data.

bComplies with MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) recommendations for reporting RT-qPCR results.

In comparison, pcr provides a unified interface for different quality assessment, analysis and testing models. The different functions take similar inputs to perform different analysis steps which requires minimal formating at the end of the user. The main input and the output are tidy data.frame, and the package source code follows the tidyverse practices. This package targets the small scale qPCR experimental data and the R users. The interface and documentation choices were made with such users in mind and require no deep knowledge in specific data structures or complex statistical models. Users can go from the raw C_T values through different analysis steps to publication graphs in a few simple lines of code.

Limitations & future directions

The current version of the pcr package (1.1.0) provides only methods to estimate the expression of genes in a certain condition relative to another. Other methods were proposed for absolute quantification of the copy number of genes in samples (Whelan, Russell & Whelan, 2003). Also, the comparative C_T methods assume that the PCR reaction has a close to perfect amplification rates. Other methods were proposed to model the data when this assumption is not true (Liu & Saint, 2002; Tichopad et al., 2003). We are planning to implement methods for absolute quantification and dealing with less than perfect amplification efficiency cases in a future version of the package.

Conclusion

To sum, the pcr package is an open source R package for quality assessing, modeling and testing real-time quantitative PCR data. The package provide an intuitive and unified interface for its main functions to allow biologist to perform all necessary steps of qPCR analysis and produce graphs in a uniform way.

[1] Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, Mueller R, Nolan T, Pfaffl MW, Shipley GL, Vandesompele J, Wittwer CT. 2009. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clinical Chemistry 55(4):611-622

[2] Bustin SA, Nolan T. 2004. Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction. Journal of Biomolecular Techniques 15(3):155-166

[3] Liu W, Saint DA. 2002. A new quantitative method of real time reverse transcription polymerase chain reaction assay based on simulation of polymerase chain reaction kinetics. Analytical Biochemistry 302(1):52-59

[4] Livak KJ, Schmittgen TD. 2001. Analysis of relative gene expression data using real-time quantitative PCR and the double delta CT method. Methods 25(4):402-408