Introduction

Comparable measurements are a sine qua non for both science and engineering, and one of the most commonly needed measurements of microbes is the number (or concentration) of cells in a sample. The most common method for estimating the number of cells in a liquid suspension is the use of optical density measurements (OD) at a wavelength of 600 nm (OD600)1. The dominance of OD measurements is unsurprising, particularly in plate readers, as these measurements are extremely fast, inexpensive, simple, relatively non-disruptive, high-throughput, and readily automated. Alternative measurements of cell count—microscopy (with or without hemocytometer), flow cytometry, colony-forming units (CFU), and others, e.g., see refs. 2,3,4,5—lack many of these properties, though some offer other benefits, such as distinguishing viability and being unaffected by cell states such as inclusion body formation, protein expression, or filamentous growth6.

A key shortcoming of OD measurements is that they do not actually provide a direct measure of cell count. Indeed, OD is not even linearly related to cell count except within a limited range7. Furthermore, because the phenomenon is based on light scatter rather than absorbance, it is relative to the configuration of a particular instrument. Thus, in order to relate OD measurements to cell count—or even just to compare measurements between instruments and experiments—it is necessary to establish a calibration protocol, such as comparison to a reference material.

While the problems of interpreting OD values have been studied (e.g., refs. 1,6,7), no previous study has attempted to establish a standard protocol to reliably calibrate estimation of cell count from OD. To assess reliability, it is desirable to involve a large diversity of instruments and laboratories, such as those participating in the International Genetically Engineered Machine (iGEM) competition8, where hundreds of teams at the high school, undergraduate, and graduate levels have been organized previously to study reproducibility and calibration for fluorescence measurements in engineered E. coli9,10. As iGEM teams have a high variability in training and available resources, organizing an interlaboratory study with iGEM also demands that protocols be simple, low cost, and highly accessible. The large scale and high variability between teams also allow investigation of protocol robustness, as well as how readily issues can be identified and debugged in protocol execution.

We thus organized a large-scale interlaboratory study within iGEM to compare three candidate OD calibration protocols: a colony-forming unit (CFU) assay, the de facto standard assay for determining viable cell count; comparison with colloidal silica (LUDOX) and water, previously used for normalizing fluorescence measurements9; and serial dilution of silica microspheres, a new protocol based on a recent study of microbial growth7. Overall, this study demonstrates that serial dilution of silica microspheres is by far the best of these three protocols under the conditions tested, allowing highly precise, accurate, and robust calibration that is easily assessed for quality control and can also evaluate the effective linear range of an instrument. We thus recommend use of silica microsphere calibration within the linear range of OD measurements for cells with compact shape and matching refractive index. Adoption of this recommendation is expected to enable effective use of OD data for estimation of cell count, comparison of plate reader measurements with single-cell measurements such as flow cytometry, improved replicability, and better cross-laboratory comparison of data.

Results

To evaluate the three candidate OD calibration protocols, we organized an interlaboratory study as part of the 2018 International Genetically Engineered Machine (iGEM) competition. The precision and robustness of each protocol were assessed based on the variability between replicates, between reference levels, and between laboratories. The overall efficacy of the protocols was then further evaluated based on the reproducibility of cross-laboratory measurements of cellular fluorescence, as normalized by calibrated OD measurements.

Experimental data collection

Each contributing team was provided with a set of calibration materials and a collection of eight engineered genetic constructs for constitutive expression of GFP at a variety of levels. Specifically, the constructs consisted of a negative control, a positive control, and six test constructs that were identical except for promoters from the Anderson library11, selected to give a range of GFP expression (illustrated in Fig. 1a, with complete details provided in Supplementary Data 1). In particular, the positive and negative controls and the J23101, J23106, and J23117 promoters were chosen based on their prior successful use in the 2016 iGEM interlaboratory study9 as controls and “high”, “medium”, and “low” test levels, respectively. Beyond these, J23100 and J23104 were chosen as potential alternatives for J23101 (about which there were previous reports of difficulty in transformation), and J23116 was chosen as an intermediate value in the large gap in expression levels between J23106 and J23117 (expected values were not communicated to teams, however). These materials were then used to follow a calibration and cell measurement protocol (see the “Methods” section; Supplementary Note: Plate Reader and CFU Protocol and Supplementary Note: Flow Cytometer Protocol).

Fig. 1: Study design.

a Each team cultured eight strains of engineered E. coli expressing GFP at various levels: positive and negative controls plus a library of six test constructs with promoters selected to give a range of levels of expression. Each team also collected four sets of calibration measurements, b fluorescein titration for calibration of GFP fluorescence, plus three alternative protocols for calibration of absorbance at 600 nm: c dilution and growth for colony-forming units (CFU), d LUDOX and water, and e serial dilution of 0.961 μm-diameter monodisperse silica microspheres.

Each team transformed E. coli K-12 DH5-alpha with the provided genetic constructs, culturing two biological replicates for each of the eight constructs. Teams measured absorbance at 600 nm (OD600) and GFP in a plate reader from four technical replicates per biological replicate (for a total of eight replicates and fitting on a 96-well plate) at the 0 and 6 h time points, along with media blanks, thus producing a total of 144 OD600 and 144 GFP measurements per team. Six hours was chosen as a period sufficient for exponential growth, and the zero-hour measurement used only for comparison to exclude samples that failed to grow well. Teams with access to a flow cytometer were asked to also collect GFP and scatter measurements for each sample, plus a sample of SpheroTech Rainbow Calibration Beads12 for fluorescence calibration.

Measurements of GFP fluorescence were calibrated using serial dilution of fluorescein with PBS in quadruplicate, using the protocol from ref. 9, as illustrated in Fig. 1b. Starting with a known concentration of fluorescein in PBS means that there is a known number of fluorescein molecules per well. The number of molecules per arbitrary fluorescence unit can then be estimated by dividing the expected number of molecules in each well by the measured fluorescence for the well; a similar computation can be made for concentration.
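As a rough illustration of the computation described above, the following Python sketch estimates the number of fluorescein molecules per arbitrary fluorescence unit from a dilution series. The well volume (100 μl), the 2-fold dilution factor, the function names, and the example readings are illustrative assumptions rather than values taken from the protocol; the 10 μM starting concentration follows from the kit description in Methods.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def expected_molecules(start_molar, well_volume_l, dilution_factor, n_levels):
    """Expected molecules per well for each level of a serial dilution."""
    return [start_molar * dilution_factor ** i * well_volume_l * AVOGADRO
            for i in range(n_levels)]

def molecules_per_au(expected, measured):
    """Geometric mean over wells of (expected molecules / measured fluorescence)."""
    logs = [math.log(e / m) for e, m in zip(expected, measured)]
    return math.exp(sum(logs) / len(logs))

# Hypothetical example: 10 uM start, 100 ul wells, 2-fold dilution, 4 levels
expected = expected_molecules(10e-6, 100e-6, 0.5, 4)
# Made-up background-subtracted readings in arbitrary units
measured = [40000.0, 20000.0, 10000.0, 5000.0]
scale = molecules_per_au(expected, measured)
print(f"{scale:.3e} molecules per arbitrary unit")
```

A geometric mean is used here because the per-well scaling factors span several dilution levels and are naturally compared as ratios.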

Measurements of OD via absorbance at 600 nm (OD600) were calibrated using three protocols and for each of these a model was devised for the purpose of fitting the data obtained in the study (Methods):

Calibration to colony-forming units (CFU), illustrated in Fig. 1c: Four overnight cultures (two each of positive and negative controls) were sampled in triplicate, each sample diluted to 0.1 OD, then serially diluted, and the final three dilutions spread onto bacterial culture plates for incubation and colony counting (a total of 36 plates per team). The number of CFU per OD per mL is estimated by multiplying colony count by dilution multiple. This protocol has the advantage of being well established and insensitive to non-viable cells and debris, but the disadvantages of an unclear number of cells per CFU, potentially high statistical variability when the number of colonies is low, and being labor intensive.

Comparison of colloidal silica (LUDOX CL-X) and water, illustrated in Fig. 1d: This protocol is adapted from ref. 9 by substitution of a colloidal silica formulation that is more dense and freeze-tolerant (for easier shipping). Quadruplicate measurements are made for both LUDOX CL-X and water, with conversion from arbitrary units to OD measurement in a standard spectrophotometer cuvette estimated as the ratio of their difference to the OD measurement for LUDOX CL-X in a reference spectrophotometer. This protocol has the advantage of using extremely cheap and stable materials, but the disadvantage that LUDOX CL-X provides only a single reference value, and that it calibrates for instrument differences in determination of OD but cannot be used to estimate the number of cells, as all grades of LUDOX particles are far smaller than cells (<50 nm).

Comparison with serial dilution of silica microspheres, illustrated in Fig. 1e: This new protocol, inspired by the relationship between particle size, count, and OD7, uses quadruplicate serial dilution of 0.961-μm-diameter monodisperse silica microspheres in water, following the same procedure as fluorescein dilution but with different materials. These particles are selected to match the approximate volume and optical properties of E. coli, with the particles having a refractive index of 1.4 (per manufacturer specification) and typical E. coli ranging from 1.33 to 1.417. With a known starting concentration of particles, the number of particles per OD600 unit is estimated by dividing the expected number of particles in each well by the measured OD for the well. This protocol has the advantages of low cost and of directly mapping between particles and OD, but the disadvantage that the microspheres tend to settle and are freeze-sensitive.
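The unit conversions behind the three protocols above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's analysis code: the function names are hypothetical, the dilution multiple is assumed to already account for every dilution and plating step, the LUDOX factor is assumed to be applied as reference OD divided by the background-subtracted reading, the OD values passed to the microsphere function are assumed to be background-subtracted, and all numbers (including the reference OD) are made up.

```python
import math
import statistics

def cfu_per_od_per_ml(colony_count, dilution_multiple):
    """CFU estimate from one countable plate: colony count times dilution multiple.
    Assumes the multiple already covers every dilution and plating step."""
    return colony_count * dilution_multiple

def ludox_od_scale(ludox_reads, water_reads, reference_od):
    """Factor converting background-subtracted plate reader values to reference OD."""
    difference = statistics.mean(ludox_reads) - statistics.mean(water_reads)
    return reference_od / difference

def particles_per_od(start_particles, dilution_factor, net_od_by_level):
    """Geometric mean over dilution levels of (expected particles / measured OD)."""
    logs = []
    for level, od in enumerate(net_od_by_level):
        expected = start_particles * dilution_factor ** level
        logs.append(math.log(expected / od))
    return math.exp(sum(logs) / len(logs))

# Made-up example values, for illustration only
cfu = cfu_per_od_per_ml(colony_count=150, dilution_multiple=8e5)
scale = ludox_od_scale([0.055, 0.056, 0.054, 0.055],
                       [0.035, 0.036, 0.034, 0.035],
                       reference_od=0.06)  # hypothetical reference value
beads = particles_per_od(2.0e8, 0.5, [0.40, 0.20, 0.10])
print(cfu, scale, beads)
```

Note that only the CFU and microsphere functions return a count per OD unit; the LUDOX factor rescales OD between instruments and carries no information about particle number.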

Data from each team were accepted only if they met a set of minimal data quality criteria (Supplementary Note: Data Acceptance Criteria), including values being non-negative, the positive control being notably brighter than the negative control, and measured values for calibrants decreasing as dilution increases. In total, 244 teams provided data meeting these minimal criteria, with 17 teams also providing usable flow cytometry data. Complete anonymized data sets and analysis results are available in Supplementary Data 2.

Robustness of calibration protocols

We assessed the robustness of the calibration protocols under test in two ways: replicate precision and residuals. Replicate precision can be evaluated simply in terms of the similarity of values for each technical replicate of a protocol. The smaller the coefficient of variation (i.e., ratio of standard deviation to mean), the more precise the protocol. With regard to residuals, on the other hand, we considered the modeled mechanism that underlies each calibration method and assessed how well it fits the data. Here, the residual is the distance between each measured value provided by a team and the predicted value of a model fit using that same set of data (see Methods for details of each mechanism model and residual calculations). The smaller the residual value, the more precise the protocol. Moreover, the more similar the replicate precision and residuals across teams, the more robust the protocol is to variations in execution conditions.
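The replicate precision metric can be illustrated with a short Python sketch. The function name and readings are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which estimator was used.

```python
import statistics

def coefficient_of_variation(replicates):
    """Ratio of (sample) standard deviation to mean for one replicate set."""
    return statistics.stdev(replicates) / statistics.mean(replicates)

# Made-up quadruplicate OD readings for a single calibrant level
replicate_set = [0.50, 0.52, 0.48, 0.50]
cv = coefficient_of_variation(replicate_set)
print(f"CV = {cv:.4f}")
```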

Figure 2 shows the distribution of the coefficients of variation (CVs) for all valid replicates for each of the calibrant materials (see Methods for validity criteria). For CFU, basic sampling theory implies that the dilution with the largest number of countably distinct colonies (lowest dilution) should have the best CV, and indeed this is the case for 81.6% of the samples. This percentage is surprisingly low, however, and indicates a higher degree of variation than can be explained by the inherent stochasticity of the protocol: CFU sampling should follow a binomial distribution and have a little over 3-fold higher CV with each 10-fold dilution, but on average it was much less. This indicates the presence of a large component of variation with an unknown source, which is further confirmed by the fact that even the best CVs are quite high: the best of the three dilutions for each team has CV ≤ 0.1 for only 2.1% of all data sets and CV ≤ 0.2 for only 16.4% of all data sets.
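The "little over 3-fold" figure can be checked with a small Python sketch. It approximates the binomial sampling of rare cells by a Poisson count, for which the CV equals one over the square root of the expected count; the function name and the colony counts are illustrative assumptions.

```python
import math

def sampling_cv(expected_colonies):
    """CV of a Poisson-distributed colony count with the given expectation."""
    return 1.0 / math.sqrt(expected_colonies)

# Hypothetical plates: 200 expected colonies, then 20 after a 10-fold dilution
cv_ratio = sampling_cv(20) / sampling_cv(200)
print(f"CV increases {cv_ratio:.2f}-fold per 10-fold dilution")
```

The ratio is the square root of 10 regardless of the starting count, so a measured increase well below this points to a source of variation other than sampling.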

Fig. 2: Distribution of the coefficient of variation for valid replicate sets in CFU, LUDOX/water, microspheres, and fluorescein (all teams included).

CFU models are generated from only the best CV dilution (blue); other dilutions are shown separately above. Even the best CV CFU dilutions, however, have a distribution far worse than the other four methods, and are surprisingly often not the lowest dilution (red crosses). Of the others, LUDOX (magenta) and water (light blue) have the best and near-identical distributions, while microspheres (black) and fluorescein (green) are only slightly higher.

LUDOX and water have the lowest CV, at CV ≤ 0.1 for 86.9% (LUDOX) and 88.1% (water) of all replicate sets and CV ≤ 0.2 for 97.1% (LUDOX) and 98.0% (water) of all replicate sets. Microspheres and fluorescein have slightly higher CV, at CV ≤ 0.1 for 80.8% (microspheres) and 76.9% (fluorescein) of all replicate sets and CV ≤ 0.2 for 93.9% (microspheres) and 92.4% (fluorescein) of all replicate sets. The difference between these two pairs likely derives from the fact that the LUDOX and water samples are each produced in only a single step, while the serial dilution of microspheres and fluorescein allows inaccuracies to compound in the production of later samples.

The accuracy of a calibration protocol is ultimately determined by how replicate data sets across the study are jointly interpreted to parameterize a model of the calibration protocol, one part of which is the scaling function that maps between arbitrary units and calibrated units. As noted above, this can be assessed by considering the residuals in the fit between observed values and their fit to the protocol model. To do this, we first estimated the calibration parameters from the observed experimental values (see Methods for the unit scaling computation for each calibration method), then used the resulting model to “predict” what those values should have been (e.g., 10-fold fewer colonies after a 10-fold dilution). The closer the ratio of observed to predicted values was to one, the more the protocol was operating in conformance with the theory supporting its use for calibration, and thus the more likely that the calibration process produced an accurate value.

Here we see a critical weakness of the LUDOX/water protocol: the LUDOX and water samples provide only two measurements, from which two model parameters are set: the background to subtract (set by water) and the scaling between background-subtracted LUDOX and the reference OD. Thus, the dimensionality of the model precisely matches the dimensionality of the experimental samples, and there are no residuals to assess. As such, the LUDOX/water protocol may indeed be accurate, but its accuracy cannot be empirically assessed from the data it produces. If anything goes wrong in the reagents, protocol execution, or instrument, such problems cannot be detected unless they are so great as to render the data clearly invalid (e.g., the OD of LUDOX being less than the OD of water).

The CFU protocol and the two serial dilution protocols, however, all have multiple dilution levels, overconstraining the model and allowing likely accuracy to be assessed. Figure 3 shows the distribution of residuals for these three protocols, in the form of a ratio between the observed mean for each replicate set and the value predicted by the model fit across all replicate sets. The CFU protocol again performs extremely poorly, as we might expect based on the poor CV of even the best replicates: only 7.3% of valid replicate sets have a residual within 1.1-fold, only 14.0% within 1.2-fold, and overall the geometric standard deviation of the residuals is 3.06-fold, meaning that values are only reliable to within approximately two orders of magnitude. Furthermore, the distribution is asymmetric, suggesting that the CFU protocol may be systematically underestimating the number of cells in the original sample. The accuracy of the CFU protocol thus appears highly unreliable.
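The residual statistics quoted here (fraction within a given fold and geometric standard deviation) can be computed as in the following Python sketch. The function names and data are hypothetical, and the sample (n − 1) variance of the log ratios is an assumed convention.

```python
import math

def fold_residuals(observed, predicted):
    """Residuals expressed as the ratio of observed to model-predicted values."""
    return [o / p for o, p in zip(observed, predicted)]

def fraction_within(ratios, fold):
    """Fraction of ratios lying between 1/fold and fold."""
    hits = sum(1 for r in ratios if 1.0 / fold <= r <= fold)
    return hits / len(ratios)

def geometric_std(ratios):
    """Geometric standard deviation: exp of the standard deviation of log ratios."""
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    variance = sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(math.sqrt(variance))

# Made-up observed replicate-set means against a constant model prediction
observed = [100.0, 105.0, 95.0, 115.0, 50.0]
predicted = [100.0] * 5
ratios = fold_residuals(observed, predicted)
within_11 = fraction_within(ratios, 1.1)
within_12 = fraction_within(ratios, 1.2)
gsd = geometric_std(ratios)
print(within_11, within_12, round(gsd, 3))
```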

Fig. 3: Distribution of residuals.

a Model fit residual distribution for each replicate set in the CFU (blue), microsphere, and fluorescein calibration protocols (all teams included). b Expanding the Y axis to focus on the microsphere and fluorescein distributions shows that incorporating a model parameter for systematic pipetting error (black, green) produces a notably better fit (and thus likely more accurate unit calibration) than a simple geometric mean over scaling factors (red, magenta).

The microsphere dilution protocol, on the other hand, produced much more accurate results. Even with only a simple model of perfect dilution, the residuals are quite low (red line in Fig. 3b), having 61.0% of valid replicates within 1.1-fold, 83.6% within 1.2-fold, and an overall geometric standard deviation of 1.152-fold. As noted above, however, with serial dilution we may expect error to compound systematically with each dilution, and indeed the value sequences in individual data sets do tend to show curves indicative of systematic pipetting error. When the model is extended to include systematic pipetting error (see Methods subsection on “Systematic pipetting error model”), the results improve markedly (black line in Fig. 3b), to 82.4% of valid replicates within 1.1-fold, 95.5% within 1.2-fold, and an overall geometric standard deviation improved to 1.090-fold. Fluorescein dilution provides nearly identical results: a perfect dilution model (magenta line in Fig. 3b) has 71.1% of valid replicates within 1.1-fold, 88.2% within 1.2-fold, and an overall geometric standard deviation of 1.148-fold, while including systematic pipetting error improves the model (green line in Fig. 3b) to 88.1% of valid replicates within 1.1-fold, 98.0% within 1.2-fold, and an overall geometric standard deviation of 1.085-fold.
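To give a feel for how a systematic dilution error can be estimated from a series, the Python sketch below fits an effective per-step dilution factor by least squares on log-transformed values. This is a simplified stand-in chosen for illustration and is not the model from the Methods subsection referenced above; the function name and the data are hypothetical.

```python
import math

def fit_effective_dilution(values):
    """Least-squares line through log(value) versus dilution step.

    Returns (fitted initial value, effective per-step dilution factor).
    """
    n = len(values)
    steps = list(range(n))
    logs = [math.log(v) for v in values]
    step_mean = sum(steps) / n
    log_mean = sum(logs) / n
    slope = (sum((s - step_mean) * (y - log_mean) for s, y in zip(steps, logs))
             / sum((s - step_mean) ** 2 for s in steps))
    intercept = log_mean - slope * step_mean
    return math.exp(intercept), math.exp(slope)

# Made-up series in which every step transfers slightly too much material,
# giving an effective factor of 0.55 instead of the nominal 0.5
values = [0.8 * 0.55 ** i for i in range(6)]
initial, factor = fit_effective_dilution(values)
print(round(initial, 3), round(factor, 3))
```

Comparing the fitted factor with the nominal one gives a quick diagnostic: a consistent offset suggests a systematic pipetting bias, whereas scatter around the nominal value suggests random error.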

Based on an analysis of the statistical properties of calibration data, we may thus conclude that the microsphere and fluorescein dilution protocols are highly robust, producing results that are precise, likely to be accurate, and readily assessed for execution quality on the basis of calibration model residuals. The LUDOX/water protocol is also highly precise and may be accurate, but its execution quality cannot be directly assessed due to its lack of residuals. The CFU protocol, on the other hand, appears likely to be highly problematic, producing unreliable and likely inaccurate calibrations.

Reproducibility and accuracy of cell-count estimates

Reproducibility and accuracy of the calibration protocols can be evaluated through their application to calibration of fluorescence from E. coli, as normalized by calibrated OD measurements. Figure 4 shows the fluorescence values computed for each of the three fluorescence/OD calibration combinations, as well as for calibrated flow cytometry, excluding data with poor calibration or outlier values for colony growth or positive control fluorescence (for details see Methods on determining validity of E. coli data). Overall, the lab-to-lab variation was workably small, with the geometric mean of the geometric standard deviations for each test device being 2.4-fold for CFU calibration, 2.21-fold for LUDOX/water calibration, and 2.21-fold for microsphere dilution calibration. These values are quite similar to those previously reported in ref. 9, which reported a 2.1-fold geometric standard deviation for LUDOX/water.

Fig. 4: Measured fluorescence of test devices.

Measured fluorescence of test devices after 6 h of growth using a CFU calibration, b LUDOX/water calibration, c microsphere dilution calibration, and d flow cytometry. In each box, red plus indicates geometric mean, red line indicates median, top and bottom edges indicate 25th and 75th percentiles, and whiskers extend from 9 to 91%. Team count per condition provided in Supplementary Data 3.

Note that these standard deviations are also dominated by the high variability observed in the constructs with J23101 and J23104, both of which appear to have suffered notable difficulties in culturing, with many teams’ samples failing to grow for these constructs, while other constructs grew much more reliably (see Supplementary Fig. 1). Omitting the problematic constructs finds variations of 2.02-fold for CFU calibration, 1.84-fold for LUDOX/water calibration, and 1.83-fold for microsphere dilution calibration. Flow cytometry is also similar in this case, though with somewhat higher variability, at 2.31-fold (possibly due to the much smaller number of replicates and additional opportunities for variation in protocol execution). Altogether, these values indicate that, when filtered using quality control based on the replicate precision and residual statistics established above, all three OD calibration methods are capable of producing highly reproducible measurements across laboratories.

To determine the accuracy of cell-count estimates, we compared normalized bulk measurements (total fluorescence divided by estimated cell count) against single-cell measurements of fluorescence from calibrated flow cytometry, which provides direct measurement of per-cell fluorescence without the need to estimate cell count (see Methods on “Flow cytometry data processing” for analytical details). In this comparison, an accurate cell count is expected to allow bulk fluorescence measurement normalized by cell count to closely match the per-cell fluorescence value produced by flow cytometry. In making this comparison, there are some differences that must be considered between the two modalities. Gene expression typically has a log-normal distribution13, meaning that bulk measurements will be distorted upward compared to the geometric mean of the log-normal distribution observed with the single-cell measurements of a flow cytometer. In this experiment, for typical levels of cell-to-cell variation observed in E. coli, this effect should cause the estimate of per-cell fluorescence to be approximately 1.3-fold higher from a plate reader than a flow cytometer. At the same time, non-cell particles in the culture will tend to distort fluorescence per-cell estimates in the opposite direction for bulk measurement, as these typically contribute to OD but not fluorescence in a plate reader, whereas the vast majority of debris particles can typically be gated out of flow cytometry data. With generally healthy cells in log-phase growth, however, the levels of debris in this experiment are expected to be relatively low. Thus, these two differences are likely to both be small and in opposite directions, such that we should still expect the per-cell fluorescence estimates of plate reader and flow cytometry data to closely match if accurately calibrated.
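The upward distortion of bulk measurements follows from a standard property of the log-normal distribution: its arithmetic mean exceeds its geometric mean by a factor of exp(σ²/2), where σ is the standard deviation of the natural logarithm. The Python sketch below evaluates this factor; the function name is hypothetical and the geometric standard deviation of 2.0 is an illustrative assumption, not a value reported in the study.

```python
import math

def bulk_to_single_cell_ratio(geometric_std):
    """Arithmetic mean divided by geometric mean for a log-normal distribution."""
    sigma = math.log(geometric_std)
    return math.exp(sigma ** 2 / 2.0)

# Hypothetical cell-to-cell variation: geometric standard deviation of 2-fold
ratio = bulk_to_single_cell_ratio(2.0)
print(f"bulk estimate is {ratio:.2f}-fold above the geometric mean")
```

Under this assumed variation the factor comes out close to the approximately 1.3-fold figure quoted above, and it shrinks toward 1 as the population becomes more uniform.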

Of the three OD calibration methods, the LUDOX/water measurement is immediately disqualified as it calibrates only to a relative OD, and thus cannot produce comparable units. Comparison of CFU and microsphere dilution to flow cytometry is shown in Fig. 5. The CFU-calibrated measurements are far higher than the values produced by flow cytometry, a geometric mean of 28.4-fold higher, indicating that this calibration method badly underestimates the number of cells. The degree to which this is due to known issues of CFU, such as cells adhering into clumps, as opposed to the problems with imprecision noted above or yet other possible unidentified causes, is unclear. Whatever the cause, however, CFU calibration is clearly problematic for obtaining anything like an accurate estimate of cell count.

Fig. 5: Fluorescence per cell after 6 h of growth, comparing calibrated flow cytometry to estimates using cell count from CFU and microsphere dilution protocols (LUDOX/water is not shown as the units it produces are not comparable).

Microsphere dilution produces values extremely close to the ground truth provided by calibrated flow cytometry, whereas the CFU protocol produces values more than an order of magnitude different, suggesting that CFU calibration greatly underestimates the number of cells in the sample. Bars show geometric mean and standard deviation. Team count per condition provided in Supplementary Data 3.

Microsphere dilution, on the other hand, produces values that are remarkably close to those for flow cytometry, a geometric mean of only 1.07-fold higher, indicating that this calibration method is quite accurate in estimating cell count. Moreover, we may note that the only large difference between values comes with the extremely low fluorescence of the J23117 construct, which is unsurprising given that flow cytometers generally have a higher dynamic range than plate readers, allowing better sensitivity to low signals.

Discussion

Reliably determining the number of cells in a liquid culture has remained a challenge in biology for decades. For the field of synthetic biology, which seeks to engineer based on standardized biological measurements, it was critical to find a solution to this challenge. Here, we have compared the most common method for calibrating OD to cell number (calculation of CFU) to two alternative methods of calibration: LUDOX/water and microsphere serial dilution. The qualitative and quantitative benefits and drawbacks of these three methods for OD calibration are summarized in Table 1.

Table 1 Summary of the benefits and drawbacks of the three calibration protocols.

These three protocols are all inexpensive, with the reagent cost for both LUDOX/water and microsphere serial dilution being <$0.10 US. The CFU protocol has well-known issues of cell clumping and slow, labor-intensive execution, and counts only live and active cells, which can be either a benefit or a limitation depending on circumstances, though it does benefit from being insensitive to cell shape and optical properties. In addition, the CFU counts in this study exhibited a remarkably high level of variability, which may call into question the use of the CFU method as a standard for determining cell counts. This observed variability is not without precedent—prior work has also demonstrated E. coli CFU counting performing poorly on measures of reproducibility and repeatability in an interlaboratory study14.

The microsphere protocol, on the other hand, has no major drawbacks and provides a number of notable benefits when applied to cells with shapes and optical properties that can be reasonably approximated with appropriately sized microspheres. First, the microsphere protocol is highly robust and reliable, particularly compared with CFU assays. Second, failures are much easier to diagnose with the microsphere protocol, since it has many distinct levels that can be compared. This is particularly salient when compared with the LUDOX/water protocol, which only provides a single calibration point at low absorbance (and thus susceptible to instrument range issues), and to the CFU protocol, where failures may be difficult to distinguish from inherent high variability. With the microsphere protocol, on the other hand, some failures such as systematic dilution error and instrument saturation can not only be detected, but also modeled and corrected for. Finally, the microsphere protocol also permits a unit match between plate reader and flow cytometry measurements (both in cell number and in fluorescence per cell), which is highly desirable, allowing previously impossible data fusion between these two complementary platforms (e.g., to connect high-resolution time-series data from a plate reader with high-detail data about population structure from a flow cytometer). Accordingly, based on the results of this study, we recommend the adoption of silica microsphere calibration for robust estimation of bacterial cell count. As long as OD measurements are within the linear range, this calibration protocol is expected to enable effective use of OD data for estimation of actual cell count, comparison of plate reader measurements with single-cell measurements such as flow cytometry, improved replicability, and better cross-laboratory comparison of data.

With regard to future opportunities for extension, we note that these methods seem likely to be applicable to other instruments that measure absorbance (e.g., spectrophotometers, automated culture flasks) by appropriately scaling volumes and particle densities. Similarly, it should be possible to adapt to other cell types by selecting other microspheres with appropriately adjusted diameters and materials for their optical properties (indeed, per ref. 7, many other commonly used bacteria have quite similar refractive index values), and a wide range of potential options are already readily available from commercial suppliers. Finally, further investigation would be valuable for more precisely establishing the relationship between cell count and particle count. It would also be useful to quantify the degree to which the estimates are affected by factors such as changing optical properties associated with cell state, distribution, shape, and clustering, and to investigate means of detecting and compensating for such effects.

Methods

Participating iGEM teams measured OD and fluorescence among the same set of plasmid-based devices, according to standardized protocols. In brief, teams were provided a test kit containing the necessary calibration reagents, a set of standardized protocols, and pre-formatted Excel data sheets for data reporting. Teams provided their own plate reader instruments, consumables/plasticware, competent E. coli cells, PBS, water, and culture medium. First, teams were asked to complete a series of calibration measurements by measuring LUDOX and water, and also making a standard curve of both fluorescein and silica microspheres. Next, each team transformed the plasmid devices into E. coli and selected transformants on chloramphenicol plates. They selected two colonies from each plate to grow as liquid cultures overnight, then the following day diluted their cultures and measured both fluorescence and OD after 0 and 6 h of growth. Some of these cultures were also used to make serial dilutions for the CFU counting experiment. Teams were asked to report details of their instrumentation, E. coli strains used, and any variations from the protocol using an online survey. Additional details are available in the Supplementary Information.

Calibration materials

The following calibration materials were provided to each team as a standard kit: 1 ml of LUDOX CL-X (Sigma-Aldrich, #420891) and 1.00e−8 moles of fluorescein (Sigma-Aldrich, #46970). About 300 μl of 0.961-μm-diameter monodisperse silica beads (Cospheric, SiO2MS-2.0, 0.961 μm) in ddH2O were prepared to contain 3.00e8 beads.

Fluorescein sample tubes were prepared with 1.00e−8 moles fluorescein in solution in each tube, which was then vacuum dried for shipping. Resuspension in 1 ml PBS would thus produce a solution with initial concentration of 10 μM fluorescein.

Each team providing flow cytometry data also obtained their own sample of SpheroTech RCP-30-5A Rainbow Calibration Particles (SpheroTech). A sample of this material is a mixture of particles with eight levels of fluorescence, which should appear as up to eight peaks (typically some are lost to saturation on the instrument). Teams used various different lots, reporting the lot number to allow selection of the appropriate manufacturer-supplied quantification for each peak.

Constructs, culturing, and measurement protocols

The genetic constructs supplied to each team for transformation are provided in Supplementary Data 1. The protocol for plate readers, exactly as supplied to each participating team, is provided in Supplementary Note: Plate Reader and CFU Protocol. The supplementary protocol for flow cytometry is likewise provided in Supplementary Note: Flow Cytometer Protocol.

Criteria for valid calibrant replicates

For the purpose of analyzing the precision of calibrants, the following criteria were used to determine which replicate sets are sufficiently valid for inclusion in the analysis:

CFU: A dilution level is considered valid if at least 4 of the 12 replicate plates have a number of colonies that are >0 but not too numerous to count (participants were instructed they could report anything over 300 colonies to be too numerous to count). A calibration set is considered valid if there is at least one valid dilution level. Of the 244 data sets, 241 are valid and 3 are not valid.

LUDOX/water: A LUDOX/water calibration is considered valid if it fits the acceptance criteria in Supplementary Note: Data Acceptance Criteria, meaning that all 244 are valid.

Microsphere dilution and fluorescein dilution: For both of these protocols, a dilution level is considered locally valid if the measured value does not appear to be either saturated high or low. High saturation is determined by lack of sufficient slope from the prior level, here set to be at least 1.5x, and low saturation by indistinguishability from the blank replicates, here set to be anything <2 blank standard deviations above the mean blank. The valid range of dilution levels is then taken to be the longest continuous sequence of locally valid dilution levels, and the calibration set considered valid overall if this range has at least three valid dilution levels.
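The following Python sketch illustrates one way to apply these rules. Several details are assumptions rather than statements from the protocol: the slope test is applied to blank-subtracted means, the first (most concentrated) level has no prior level and is treated as passing the slope test, the sample standard deviation is used for the blanks, and all function names are hypothetical.

```python
from statistics import mean, stdev

def locally_valid_levels(level_means, blanks, min_fold=1.5, n_sd=2.0):
    """Flag each dilution level (ordered from most to least concentrated)."""
    blank_mean = mean(blanks)
    low_floor = blank_mean + n_sd * stdev(blanks)
    flags = []
    for i, value in enumerate(level_means):
        not_low = value >= low_floor  # distinguishable from the blanks
        if i == 0:
            not_high = True  # assumption: no prior level to compare against
        else:
            prior_net = level_means[i - 1] - blank_mean
            not_high = prior_net >= min_fold * (value - blank_mean)
        flags.append(not_low and not_high)
    return flags

def longest_valid_run(flags):
    """Return (start_index, length) of the longest run of True values."""
    best_start, best_len, start = 0, 0, None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel closes last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len

# Made-up example: the first two levels are saturated high, the last two low.
level_means = [2.0, 1.95, 1.0, 0.5, 0.25, 0.05, 0.05]
blanks = [0.04, 0.05, 0.06, 0.05]
flags = locally_valid_levels(level_means, blanks)
start, length = longest_valid_run(flags)
set_is_valid = length >= 3
```

Here the second level fails the slope test against the first, so the longest continuous valid range is the three levels starting at the third, which just meets the three-level minimum.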

For microsphere dilution, of the 244 data sets, 235 are valid and 9 are not valid—one due to being entirely low saturated, the others having inconsistent slopes indicative of pipetting problems. Supplementary Fig. 2 Length of Valid Sequence(a) shows that most microsphere dilution data sets have the majority of dilution levels valid, but that only about one-tenth are without saturation issues.

For fluorescein dilution, of the 244 data sets, 243 are valid and 1 is not valid, having an inconsistent slope indicative of pipetting problems. Supplementary Fig. 2 Length of Valid Sequence(b) shows that the vast majority of fluorescein dilution data sets are without any saturation issues.

Note that in both cases, changing the required number of valid dilution levels down to 2 or up to 4 would have little effect on the number of data sets included, adding 7 or removing 8 for microspheres and adding or removing only 1 for fluorescein.

Unit scaling factor computation

CFU

The scaling factor Sc relating CFU/ml to Abs600 is computed as follows:

$${S}_{c,i}=\mu ({C}_{i})\cdot {\delta }_{i}$$
(1)

where μ(Ci) is the mean number of colonies for dilution level i and δi is the dilution fold for level i. For the specific protocol used, there are three effective dilution factors, 1.6e5, 1.6e6, and 1.6e7 (including a 2-fold conversion between 200 and 100 μl volumes).

The overall scaling factor Sc for each data set is then taken to be:

$${S}_{c}=\left\{{S}_{c,i}| \frac{\sigma ({C}_{i})}{\mu ({C}_{i})}=\mathop{\min }\limits_{i}\frac{\sigma ({C}_{i})}{\mu ({C}_{i})}\right\}$$
(2)

i.e., the scaling factor for the valid level with the lowest coefficient of variation, where σ(Ci) is the standard deviation in the number of colonies for dilution level i.

The residuals for this fit are then Sc,i/Sc for all other valid levels.
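Equations (1) and (2) can be illustrated with the short Python sketch below. It assumes that only countable plates from valid dilution levels are passed in and that σ is the sample standard deviation, neither of which is specified above. The function name and the colony counts are hypothetical.

```python
from statistics import mean, stdev

def cfu_scaling_factor(counts_by_level, dilution_folds):
    """Return (S_c, residuals) following Eqs. (1) and (2)."""
    per_level = []
    for counts, fold in zip(counts_by_level, dilution_folds):
        mu = mean(counts)
        per_level.append({"S": mu * fold,               # Eq. (1)
                          "cv": stdev(counts) / mu})    # sigma(C_i) / mu(C_i)
    # Eq. (2): pick the level with the lowest coefficient of variation.
    best = min(range(len(per_level)), key=lambda i: per_level[i]["cv"])
    s_c = per_level[best]["S"]
    residuals = [lvl["S"] / s_c for i, lvl in enumerate(per_level) if i != best]
    return s_c, residuals

# Made-up colony counts for the three effective dilution factors.
counts_by_level = [[250, 260, 240, 250], [30, 20, 25, 25], [2, 4, 1, 5]]
dilution_folds = [1.6e5, 1.6e6, 1.6e7]
s_c, residuals = cfu_scaling_factor(counts_by_level, dilution_folds)
```

With these made-up counts the least diluted level has the lowest coefficient of variation, so its value is selected and the other two levels are reported as ratios to it.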

LUDOX/water

The scaling factor Sl relating standard OD to Abs600 is computed as follows:

$${S}_{l}=\frac{R}{\mu (L)-\mu (W)}$$
(3)

where R is the measured reference OD in a standard cuvette (in this case 0.063 for LUDOX CL-X), μ(L) is the mean Abs600 for LUDOX CL-X samples and μ(W) is the mean Abs600 for water samples.

No residuals can be computed for this fit, because there are two measurements and two degrees of freedom.
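As a minimal illustration of Eq. (3), the Python sketch below computes Sl from replicate readings. The function name and the absorbance readings are hypothetical; only the reference value of 0.063 comes from the text above.

```python
from statistics import mean

def ludox_scaling_factor(ludox_abs600, water_abs600, reference_od=0.063):
    """Eq. (3): S_l = R / (mean LUDOX Abs600 - mean water Abs600)."""
    return reference_od / (mean(ludox_abs600) - mean(water_abs600))

# Made-up replicate readings from a plate reader.
ludox = [0.055, 0.057, 0.056, 0.056]
water = [0.035, 0.036, 0.034, 0.035]
s_l = ludox_scaling_factor(ludox, water)
```

For these made-up readings the net LUDOX absorbance is 0.021, so multiplying a blank-subtracted Abs600 value from this instrument by roughly 3 would give the equivalent standard cuvette OD.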

Microsphere dilution and fluorescein dilution

The scaling factor Sm, relating microsphere count to Abs600, and the scaling factor Sf, relating molecules of fluorescein to arbitrary fluorescence units, are both computed in the same way. In each case the dilution data are transformed into a scaling factor in one of two ways: either as the mean conversion factor Sμ or as one parameter, Sp, of a fit to a model of systematic pipetting error.

Mean conversion factor: If we ignore pipetting error, then the model for serial dilution has an initial population of calibrant p0 that is diluted n times by a factor of α at each dilution, such that the expected population of calibrant for the ith dilution level is:

$${p}_{i}={p}_{0}(1-\alpha ){\alpha }^{i-1}$$
(4)

In the case of the specific protocols used here, α = 0.5. For the microsphere dilution protocol used, p0 = 3.00e8 microspheres, while for the fluorescein dilution protocol used, p0 = 6.02e14 molecules of fluorescein.

The local conversion factor Si for the ith dilution is then:

$${S}_{i}=\frac{{p}_{i}}{\mu ({O}_{i})-\mu (B)}$$
(5)

where μ(Oi) is the mean of the observed values for the ith dilution level and μ(B) is the mean observed value for the blanks.

The mean conversion factor is thus

$${S}_{\mu }=\mu (\{{S}_{i}| i \, {\rm{is}}\ {\rm{a}}\ {\rm{valid}}\ {\rm{dilution}}\ {\rm{level}}\})$$
(6)

i.e., the mean over local conversion factors for valid dilution levels.

The residuals for this fit are then Si/Sμ for all valid levels.
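Equations (4) to (6) can be sketched in Python as follows. The function names are hypothetical, dilution levels are numbered from 1 as in the equations, and the observed values are synthetic numbers chosen so that every level implies the same conversion factor.

```python
from statistics import mean

def expected_population(p0, i, alpha=0.5):
    """Eq. (4): expected calibrant population at dilution level i (1-based)."""
    return p0 * (1 - alpha) * alpha ** (i - 1)

def mean_conversion_factor(p0, level_means, blank_mean, valid_levels, alpha=0.5):
    """Eqs. (5) and (6): local factors S_i, their mean S_mu, and residuals."""
    local = {
        i: expected_population(p0, i, alpha) / (level_means[i - 1] - blank_mean)
        for i in valid_levels
    }
    s_mu = mean(local.values())
    residuals = {i: s_i / s_mu for i, s_i in local.items()}
    return s_mu, residuals

# Synthetic microsphere example: net Abs600 of 0.15, 0.075, and 0.0375.
level_means = [0.20, 0.125, 0.0875]
s_mu, residuals = mean_conversion_factor(
    p0=3.00e8, level_means=level_means, blank_mean=0.05, valid_levels=[1, 2, 3]
)
```

Because the synthetic data follow a perfect 2-fold dilution, each local factor is about 1e9 microspheres per Abs600 unit and every residual is close to 1.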

Systematic pipetting error model: The model for systematic pipetting error modifies the intended dilution factor α with the addition of an unknown bias β, such that the expected biased population bi for the ith dilution level is:

$${b}_{i}={p}_{0}(1-\alpha -\beta ){(\alpha +\beta )}^{i-1}$$
(7)

We then simultaneously fit β and the scaling factor Sp to minimize the sum squared error over all valid dilution levels:

$$\epsilon =\sum_{i}| \mathrm{log}\,\left(\frac{{b}_{i}}{{S}_{p}\cdot (\mu ({O}_{i})-\mu (B))}\right){| }^{2}$$
(8)

where ϵ is the sum squared error of the fit.

The residuals for this fit are then the absolute ratio of fit-predicted to observed net mean \(\frac{{b}_{i}/{S}_{p}}{\mu ({O}_{i}) \, - \, \mu (B)}\) for all valid levels.
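The text does not state which optimizer was used for this fit. One possible approach, sketched below in Python, relies on the observation that Eq. (8) is linear in log space: log(μ(Oi) − μ(B)) is modeled as an intercept plus (i − 1)·log(α + β), so an ordinary least-squares line fit yields both parameters in closed form. This is an illustrative reformulation under that assumption, not necessarily the procedure used in the study, and the function name is hypothetical.

```python
import math

def fit_pipetting_model(p0, net_means, levels, alpha=0.5):
    """Fit beta and S_p of Eqs. (7) and (8) by a log-space line fit.

    net_means are blank-subtracted means, mu(O_i) - mu(B), for the valid
    dilution levels listed in `levels` (1-based).
    """
    xs = [i - 1 for i in levels]
    ys = [math.log(v) for v in net_means]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    ratio = math.exp(slope)  # effective dilution factor, alpha + beta
    if not 0.0 < ratio < 1.0:
        raise ValueError("fitted dilution factor is outside (0, 1)")
    beta = ratio - alpha
    # The intercept equals log(p0 * (1 - alpha - beta) / S_p).
    s_p = p0 * (1 - ratio) / math.exp(intercept)

    # Ratio of fit-predicted to observed net mean at each level.
    residuals = [
        (p0 * (1 - ratio) * ratio ** (i - 1) / s_p) / v
        for i, v in zip(levels, net_means)
    ]
    epsilon = sum(math.log(r) ** 2 for r in residuals)  # Eq. (8)
    return beta, s_p, epsilon, residuals

# Synthetic data generated with alpha + beta = 0.52 and S_p = 1e9.
levels = [1, 2, 3, 4, 5]
net_means = [3.00e8 * (1 - 0.52) * 0.52 ** (i - 1) / 1.0e9 for i in levels]
beta, s_p, epsilon, residuals = fit_pipetting_model(3.00e8, net_means, levels)
```

On this noise-free synthetic series the fit recovers a bias of about 0.02 and a scaling factor of about 1e9, with residuals close to 1. Real data would also need handling for non-positive net means, which this sketch omits.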

Application to E. coli data

The Abs600 and fluorescence a.u. data from E. coli samples are converted into calibrated units by subtracting the mean blank media values for Abs600 and fluorescence a.u., then multiplying by the corresponding scaling factors for fluorescein and Abs600.
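The order of operations (subtract the blank, then scale) can be made explicit with a small Python sketch. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def to_calibrated_units(raw_value, mean_media_blank, scaling_factor):
    """Subtract the mean media blank, then apply the unit scaling factor."""
    return (raw_value - mean_media_blank) * scaling_factor

# Made-up example values and scaling factors.
particles = to_calibrated_units(0.35, mean_media_blank=0.05, scaling_factor=1.0e9)
fluorescein_molecules = to_calibrated_units(5000.0, mean_media_blank=200.0,
                                            scaling_factor=2.0e9)
```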

Criteria for valid E. coli data

For analysis of E. coli culture measurements, a data set was only eligible to be included if both its fluorescence calibration and selected OD calibration were above a certain quality threshold. The particular values used for the four calibration protocols were:

CFU: Coefficient of variation for best dilution level is <0.5.

LUDOX/water: Coefficient of variation for both LUDOX and water is <0.1.

Microsphere dilution: Systematic pipetting error has geometric mean absolute residual <1.1-fold.

Fluorescein dilution: Systematic pipetting error has geometric mean absolute residual <1.1-fold.
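The text does not spell out how the geometric mean absolute residual is calculated. The Python sketch below assumes one plausible reading: each residual ratio r is converted to an absolute fold difference exp(|ln r|), so that 1.05 and 1/1.05 count equally, and the geometric mean of these fold differences is compared with the 1.1-fold threshold. The function name is hypothetical.

```python
import math

def geometric_mean_abs_residual(residuals):
    """Geometric mean of the absolute fold deviations of residual ratios."""
    return math.exp(sum(abs(math.log(r)) for r in residuals) / len(residuals))

# Made-up residual ratios from a dilution fit.
good_fit = [1.05, 1 / 1.05, 1.0]
poor_fit = [1.3, 0.8]
passes_good = geometric_mean_abs_residual(good_fit) < 1.1
passes_poor = geometric_mean_abs_residual(poor_fit) < 1.1
```

Under this reading the first made-up set passes the threshold and the second does not.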

Measurements of the cellular controls were further used to exclude data sets with apparent problems in their protocol: those with a mean positive control value more than 3-fold different than the median mean positive control.
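A minimal Python sketch of this exclusion rule is given below, interpreting "more than 3-fold different" as more than 3 times above or below the median. The function name, data set labels, and values are hypothetical.

```python
from statistics import median

def flag_excluded_sets(mean_positive_controls, max_fold=3.0):
    """Map each data set to True if its mean positive control is an outlier."""
    reference = median(mean_positive_controls.values())
    return {
        name: not (reference / max_fold <= value <= reference * max_fold)
        for name, value in mean_positive_controls.items()
    }

# Made-up mean positive control values for five data sets.
controls = {"A": 1.0e6, "B": 1.2e6, "C": 0.9e6, "D": 5.0e6, "E": 2.5e5}
excluded = flag_excluded_sets(controls)
```

In this made-up example the median is 1.0e6, so data sets D (5-fold higher) and E (4-fold lower) are flagged for exclusion.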

Finally, individual samples without sufficient growth were removed, defined as any sample whose Abs600 is either less than 25% of the 75th percentile Abs600 measurement in the sample set or less than 2 media blank standard deviations above the mean media blank in the sample set.
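The growth filter can be sketched in Python as below. Two details are assumptions: both tests are applied to raw (not blank-subtracted) Abs600 values, and the 75th percentile uses the default interpolation of `statistics.quantiles`, which may differ slightly from the percentile convention of other software. The function name and data are hypothetical.

```python
from statistics import mean, stdev, quantiles

def sufficient_growth_flags(abs600_samples, media_blanks, frac=0.25, n_sd=2.0):
    """Return True for each sample that passes both growth criteria."""
    p75 = quantiles(abs600_samples, n=4)[2]  # third quartile cut point
    growth_floor = frac * p75
    blank_floor = mean(media_blanks) + n_sd * stdev(media_blanks)
    return [a >= growth_floor and a >= blank_floor for a in abs600_samples]

# Made-up sample set in which one culture failed to grow.
abs600_samples = [0.40, 0.42, 0.38, 0.41, 0.05, 0.44, 0.39, 0.43]
media_blanks = [0.040, 0.042, 0.038, 0.040]
keep = sufficient_growth_flags(abs600_samples, media_blanks)
```

The fifth sample sits above the blank threshold but well below 25% of the 75th percentile, so it is the only one removed.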

Flow cytometry data processing

Flow cytometry data was processed using the TASBE Flow Analytics software package15. A unit conversion model from arbitrary units to MEFL was constructed per the recommended best practices of TASBE Flow Analytics for each data set using the bead sample and lot information provided by each team.

Gating was automatically determined using a two-dimensional Gaussian fit on the forward-scatter area and side-scatter area channels for the first negative control (Supplementary Fig. 3).

The same negative control was used to determine autofluorescence for background subtraction (Supplementary Fig. 4).

As only a single green fluorescent protein was used, there was no need for spectral compensation or color translation.

All teams that submitted flow cytometry data used standard SpheroTech Rainbow Calibration beads12 for dye-based calibration to equivalent fluorescent molecules16. In particular, 16 teams used RCP-30-5A beads (various lot numbers) and 1 team used URCP-38-2K beads, and conversion from arbitrary units to MEFL was computed using the peak-to-intensity values provided for each lot. Examples are provided in Supplementary Figs. 5 and 6.

This color model was then applied to each sample to filter events and convert GFP measurements from arbitrary units to MEFL, and geometric mean and standard deviation computed for the filtered collection of events.

Statistics and reproducibility

As reproducibility is the main subject of this study, see the Results section above for its full presentation. In addition to the discussion of statistical analyses in the Results section, we note the following details of statistical analyses:

Coefficient of variation (CV) is computed per its definition, as the ratio of the standard deviation to the mean.

Fluorescence values are analyzed using the geometric mean and geometric standard deviation, rather than the more typical arithmetic statistics, because gene expression is typically log-normally distributed13.
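For readers who want to reproduce these statistics, the following Python sketch shows one way to compute them with the standard library. The analysis in this study was performed in Matlab, so this is only an equivalent illustration. It assumes the sample standard deviation throughout, and the helper names are hypothetical.

```python
import math
from statistics import geometric_mean, mean, stdev

def coefficient_of_variation(values):
    """Ratio of the standard deviation to the mean."""
    return stdev(values) / mean(values)

def geometric_std(values):
    """Geometric standard deviation: exp of the std. dev. of log values."""
    return math.exp(stdev([math.log(v) for v in values]))

# Made-up fluorescence values spanning two orders of magnitude.
fluorescence = [100.0, 1000.0, 10000.0]
gm = geometric_mean(fluorescence)
gsd = geometric_std(fluorescence)
cv = coefficient_of_variation([250, 260, 240, 250])
```

For these made-up values the geometric mean is about 1000 and the geometric standard deviation is about 10-fold, whereas the arithmetic mean of 3700 would be dominated by the largest value.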

Data analysis was performed with Matlab.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.