Background
Clonogenic assay or colony formation assay (CFA) is an
in vitro cell survival assay based on the ability of single cells to grow into colonies [
1]. It is the gold standard to determine cell reproductive death after treatment with ionizing radiation. Whereby the relationship between the radiation doses and the proportion of surviving colonies is usually described by parametric cell survival curves. These can be used for the characterisation of the radiation sensitivity of different tumour cell lines given a specific radiation type [
2], or in combination with other treatment modalities, e.g. a therapeutic agent or radiation sensitizer [
3]. For the analysis of cell survival curves CFAssay uses the commonly used linear-quadratic model (LQ model) [
1]. Apart from radiation the CFA is also applicable to two-way experimental designs, in which typically a control cell line and a genetically modified cell line are treated with a therapeutic drug [
4]. In CFAssay ANOVA based tests are used for two-way designs.
Usually, simple least square (LS) methods are applied on the logarithmic survival fractions in order to calculate parameter values for the LQ model and to perform comparison tests between curves or between mean values of survival fractions. The statistical analysis with the LS method requires that the data can be described accurately with the normal distribution. However, because colony numbers are discrete values following the Poisson distribution between different cell culture plates of a particular experiment, maximum likelihood (ML) based methods are preferable from a statistical point of view. The ML approach for cell survival curves was introduced by [
5] using a one-hit-multi-target model. Assuming that the model (here LQ) is applicable, ML estimations of the model parameters become asymptotically efficient, i.e. are most precise (for practical and theoretical considerations in general see for e.g. [
6]). The efficiency can be demonstrated in a simplified mean value example: For two numbers
k
1 and
k
2 of surviving colonies in plates with
N
1 or
N
2 cells seeded ML yields the pooled mean
m
1=(
k
1+
k
2)/(
N
1+
N
2) and LS yields the arithmetic mean
m
2=(
k
1/
N
1+
k
2/
N
2)/2. When
μ is the expectation of
m
1 and
m
2, the Poisson variance of
m
1 is
μ/(
N
1+
N
2) and that of
m
2 is (
μ/
N
1+
μ/
N
2)/4. The relative efficiency of
m
2 to
m
1 is the ratio of the two variances, 4/(1/
N
1+1/
N
2)/(
N
1+
N
2). For equal cell numbers the relative efficiency is 1.0 and in addition
m
1=
m
2. When the cell numbers are different by a factor of 3, the relative efficiency is 0.75.
In CFAssay the ML method is set as default but for the sake of comparison the LS method can be optionally chosen. The ML method provides two related quantities, the so called deviance and a dispersion parameter, which are useful to assess the quality of the data or the goodness of fit. Both quantities are more sensitive against outliers than the coefficient of determination
R
2 in LS regression. The dispersion parameter is derived from the method of iteratively reweighted least squares which solves the ML equations when they can be formulated as a generalized linear model [
7]. This holds true for the LQ model in the CFA as well as for the comparison of mean survival fractions with help of ANOVA models. Generalized linear models or LS regression are processed by the R-functions glm or lm, respectively. The functions of CFAssay serve as wrappers of these functions to simplify its use for the analyst and to extract numerical results along with the terminology used for the CFA. In addition to the LS or ML method, an option for the weighting of the LS as described in [
1] is provided. For visual inspection of data quality a further function is provided for plotting cell survival curves for each replicated experiment, annotated with the value of its contribution to the total weighted residual sum of squares. The package can be installed directly in R using the commands source(“
http://bioconductor.org/biocLite.R,”) and biocLite(~CFAssay~). Once installed the reference manual can be accessed from R using the command browseVignettes("CFAssay").
Methods
After any irradiation with a dose
d the number of scored colonies
y is proportional to
N, the number of cells seeded and to the average proportion
S=
S(
d) of cells that grow into colonies. Thus, the Poisson probability for
y is given by
$$ prob(y)=e^{-NS}(NS)^{y}/y! $$
(1)
The linear-quadratic cell survival model
Dose dependent surviving fractions in CFAssay are fitted by the LQ-model
$$ S = S(d) = e^{-c-\alpha d-\beta d^{2}} $$
(2)
where d is the radiation dose measured in Gy or another unit, α is the dose effect per Gy and β per Gy2. c=−l
o
g(S(0)), represents the logarithmic plating efficiency, i.e. the surviving fraction of unirradiated cells, which varies between different experiments. Usually, when taken as a fixed value, the plating efficiency is put by division on the left side of the equation1. We leave it on the right side to have the possibility to fit it together with the other two parameters. Statistically, colonies from untreated cells are as well as colonies from treated cells random observations.
ANOVA model for the two-way experiment
For the analysis of the two-way experiment we use multiplicative modelling, i.e. a logarithmic linear (log-linear) model with two linear factors
A and
B and a factor
D for their potential interaction. Then the model can be formulated as
$$ S = e^{c+Ax_{1}+Bx_{2}+Dx_{1}x_{2}} $$
(3)
or as nested parametrization
$$ S = e^{c+Ax_{1}+B_{0}x_{2}+(B_{1}-B_{0})x_{1}x_{2}} $$
(4)
where
A,
B are the effects of cell line modification and of one or of two different treatments,
D the interaction effect and
x
1,
x
2 are 0 or 1, dependent on which factor is applied. Thus, interaction for applied
A and
B means that there is more (or less) effect than the sum. In the second, nested parametrization
B
0 is the effect of treatment in control cells (
x
1=0) and
B
1 the treatment effect after genetic modification of the cell line (
x
1=1), for e.g. siRNA knockdown of a gene of interest. The interaction
D is then the difference between
B
0 and
B
1.
c represents the logarithmic plating efficiencies in replicated experiments, similar as in the LQ model (
2).
Finally, with the ML method the model parameters are determined such that the joint probability according to (
1) for the set of all colony counts
y
ij
at all doses
d
i
(or treatments) and for all replicates
j is maximized. For the Poisson distribution this is equivalent to iteratively minimize the sum of weighted squared differences between observed (
S
ij
=
y
ij
/
N
ij
) and modeled survival fractions
S, with inverse Poisson variances as weights. For overall Poisson distribution, the dispersion parameter, defined as the sum of weighted least squares divided by its expected value ([
6]), should be about 1.0. However, for the CFA it often appears to be >1.0, mainly due to extra variability between replicated experiments, even after correction for plating efficiencies. Therefore the calculated standard deviations in CFAssay are scaled by the square root of the dispersion parameter. This technique is also called quasi-likelihood and uses the quasipoisson family of the R-function glm. With the LS method simply the sum of squared differences between observed (
l
o
g(
S
ij
)=
l
o
g(
y
ij
/
N
ij
)) and modeled logarithmic survival fractions
l
o
g(
S) according to (
2)–(
4) is minimized. For assessment of the goodness-of-fit of the LS method the multiple
R
2 (coefficient of determination) is calculated, which describes the fraction of variability in the total data which can be explained by model dependency and plating efficiencies.
Example data
We demonstrate the ML method for two examples. For cell survival curves it is demonstrated on colony counts of irradiation experiments with a pair of two human head and neck squamous cell carcinoma (HNSCC) cell lines, CAL33 [
8] and OKF6/TERT1 [
9] which were irradiated with five different doses up to 6 Gy. The second example was taken from [
4]. There, the treatment effect of one given dose of the chemotherapeutic drug cisplatin/5-FU was tested for the human oesophageal adenocarcinoma cell line OE19 before and after COX7A2 knockdown by siRNA transfection. It was shown that knockdown of the COX7A2 protein altered chemosensitivity, which appeared statistically as an interaction effect. Data of the two examples are supplied in the CFAssay package.
Discussion
We established the software package CFAssay for statistical analysis of the colony formation assay and to be used with the open source statistic software R [
10]. The package consists of several functions for the calculation of linear-quadratic (LQ) cell survival curve parameters, plotting of survival curves and a statistical test for comparing pairs of survival curves. In addition, it contains a function for ANOVA testing of two-way experimental designs with the CFA. The functions use per default maximum likelihood (ML) based methods, however optionally the least square (LS) method or a weighted LS method with weights calculated according to [
1] can be used for sake of comparison. Results of the ML method are known to be most stable when the data vary according to the Poisson distribution and the model can be assumed to be appropriate. Data of the CFA is usually analysed by the LQ model [
3]. Although, as in the presented examples, where numbers of survived colonies are throughout two-digit (>10) or more, ML and LS lead to comparable results, this cannot be guaranteed in general. However, with the ML method the dispersion parameter provides a sensitive quantity to assess the quality of the data. Large dispersion values can be due to outlying single points of one experiment or to variation between experimental replicates. Deviations from the LQ model should not have a substantial statistical influence for irradation doses below 8 Gy. In the manual we recommend roughly a critical dispersion value of 9.0 in analogy to the three-sigma rule. However, it depends on experience and CFAssay provides diagnostic plots for single experiments.
The LQ model for cell survival and log-linear ANOVA for Poisson distributed counts of surviving colonies belong statistically to a wider class of so called generalized linear models [
7]. Numerical procedures for its solution with the algorithm of iteratively reweighted least squares, which solve the ML equations, are now available with almost every software for statistical analysis (GENMOD in SAS [
11], GENLIN in SPSS [
12], glm in R).
Conclusions
The availability of numerical procedures for the ML method and its features for thorough statistical analysis are a reason why it should be taken into account. R is now the most widely used statistical framework for the professional statistician and also non-statisticians such as biologists. Because it is free-available for everyone, interlaboratory code sharing between users is facilitated. CFAssay significantly simplifies the use of the R functions glm and lm for non-statisticians working with the CFA and allows straightforward analysis and plotting of CFA data. The package is open for extensions to other models for cell survival and related statistical analysis.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
HB contributed to the implementation of the statistical methods, to design of the software and drafting. AM contributed to testing and critical revision of the software design. JH contributed to drafting and critical revision of the manuscript. KU revised the manuscript and gave final approval to the version to be submitted. All authors read and approved the final manuscript.