Fit Curves
Table of Contents
- AMiGA can fit a growth curve using GP regression
- fit summarizes growth curves
- fit manipulates growth curves
- fit models growth using GP regression.
- Bonus: fit can detect and characterize diauxic shifts.
- AMiGA evaluates quality of fit
- The log-based estimates of AUC, K, and Death are relative!
- What happens if the starting OD of a curve is zero?
- Adjusting background OD by subtracting blanks or control samples
- Troubleshooting
- Command-Line arguments
AMiGA can fit a growth curve using GP regression
AMiGA fits growth curves as a Gaussian Process model and uses the model-predicted OD to estimate growth parameters. To fit the curves in your data set, simply run the following command.
python amiga.py fit -i /home/outbreaks/erandomii/ER1_PM2-1.txt
- The fit command ensures that AMiGA fits your growth curves with GP regression.
- The -i or --input argument points to the location of the file of interest.
Fitting growth curves takes about 1 minute per 96-well plate. Users can speed this up a bit by thinning the input to the GP regression model with the --time-step-size parameter. See Command Line Interface for more details.
fit summarizes growth curves
Before model-fitting, AMiGA will first compute the following metrics for all growth curves:
Metric | Description |
---|---|
OD_Baseline | The OD measurement at the first time point |
OD_Min | The minimum OD measurement at any time point |
OD_Max | The maximum OD measurement at any time point |
Fold_Change | The ratio of the change in OD of the case (or treatment) growth curve to the change in OD of the control growth curve |
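As a rough illustration of the metrics in the table above, the following sketch computes them for a pair of made-up curves. The function names and data are hypothetical, and the reading of "change in OD" as OD_Max minus OD_Baseline is an assumption made for this example, not a statement of how AMiGA computes Fold_Change internally.

```python
def summarize_curve(od):
    """Return the basic summary metrics for one growth curve (a list of OD readings)."""
    return {
        "OD_Baseline": od[0],  # OD at the first time point
        "OD_Min": min(od),     # lowest OD at any time point
        "OD_Max": max(od),     # highest OD at any time point
    }


def fold_change(case_od, control_od):
    """Ratio of the change in OD of the case curve to that of the control curve.

    Assumption: 'change in OD' is taken as OD_Max - OD_Baseline.
    """
    case = summarize_curve(case_od)
    ctrl = summarize_curve(control_od)
    case_change = case["OD_Max"] - case["OD_Baseline"]
    ctrl_change = ctrl["OD_Max"] - ctrl["OD_Baseline"]
    return case_change / ctrl_change


# Made-up OD readings for a control well and a treatment well
control = [0.05, 0.10, 0.30, 0.45, 0.40]
treatment = [0.05, 0.15, 0.50, 0.85, 0.80]

print(summarize_curve(treatment))
print(fold_change(treatment, control))
```

Here the treatment curve rises about twice as much as the control curve, so the fold change under this assumed definition is close to 2.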
fit manipulates growth curves
AMiGA will then apply the following transformations to your data.
1. Convert units of time (e.g. from seconds to hours). The choice of units is defined in libs\config.py (see Configure default parameters).
2. [Optional] If the user passed the --skip-first-n argument, AMiGA will ignore the first n time points of the curve.
3. [Optional] If the user passed the --subtract-blanks argument, AMiGA will subtract growth curves corresponding to blank media controls from treatment growth curves.
4. [Optional] If the user passed the --subtract-control argument, AMiGA will subtract growth curves corresponding to control wells (e.g. microbe on minimal media) from treatment growth curves.
5. AMiGA handles negative or zero values by vertically translating the whole growth curve such that all measurements are positive.
6. Transform OD with the natural logarithm (i.e. OD → ln(OD)).
7. Adjust the baseline of the growth curve such that the first time point is centered at approximately ln(OD)=0, which corresponds to OD=1.
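The transformations listed above can be sketched for a single curve as follows. This is a minimal illustration, not AMiGA's code: the function name, the argument names, and in particular the rule used to shift non-positive values (moving the minimum onto a small positive floor of 0.001) are assumptions chosen for the example.

```python
import math


def prepare_curve(od, blank=None, skip_first_n=0):
    """Sketch of the optional and mandatory transformations for one curve."""
    # Ignore the first n time points.
    od = od[skip_first_n:]
    if blank is not None:
        # Subtract the blank (or control) curve point by point.
        blank = blank[skip_first_n:]
        od = [y - b for y, b in zip(od, blank)]
    # Assumed rule: if any value is zero or negative, translate the whole
    # curve upward so that its minimum sits on a small positive floor.
    floor = 0.001  # hypothetical value
    lowest = min(od)
    if lowest <= 0:
        od = [y - lowest + floor for y in od]
    # Natural-log transform.
    log_od = [math.log(y) for y in od]
    # Subtract the first point so that the curve starts at ln(OD) = 0.
    return [y - log_od[0] for y in log_od]


curve = [0.10, 0.12, 0.30, 0.60]
blank = [0.10, 0.10, 0.10, 0.10]
result = prepare_curve(curve, blank=blank)
print(result)
```

In this example, subtracting the blank leaves a zero at the first time point, so the curve is shifted upward before the logarithm is taken, and the returned curve starts at exactly 0.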
The basic summary for adjusted growth curves will be added to the summary file if the user requests subtraction of either blank or control wells. This will generate Adj_OD_Baseline, Adj_OD_Min, and Adj_OD_Max columns for the first value, minimum value, and maximum value of the growth curves at the end of step 4.
At the end of step 5, AMiGA re-measures the first time point and stores it in the summary file as OD_Offset. If AMiGA detects negative or zero values, it will adjust the growth curve accordingly and OD_Offset should be different from OD_Baseline (or from Adj_OD_Baseline if the user applies steps 3 and/or 4).
See Adjusting OD and other sections below for more details on how to interpret Adjusted OD(t) or ln OD(t).
fit models growth using GP regression.
Finally, AMiGA will model a growth curve using GP regression, estimate the following growth parameters, and save them in a tab-separated file with the suffix _summary.txt.
Short Name | Long Name | Description |
---|---|---|
auc_lin | AUC (lin) | Area under the curve (in units of OD, and based on user-specified units of time) |
auc_log | AUC (log) | Area under the curve (in units of log OD, and based on user-specified units of time) |
k_lin | Carrying Capacity (lin) | Carrying capacity (in units of OD, assumes that OD starts at 0) |
k_log | Carrying Capacity (log) | Carrying capacity (in units of log OD, assumes that OD starts at 0) |
death_lin | Death (lin) | Total loss of growth (in units of OD) |
death_log | Death (log) | Total loss of growth (in units of log OD) |
gr | Growth Rate | Maximum specific growth rate |
dr | Death Rate | Maximum specific death rate |
td | Doubling time | Doubling Time |
lagC | Lag Time | Time delay needed to enter exponential growth |
lagP | Adaptation Time | Time delay needed to initiate positive growth |
t_gr | Time at Max. Growth Rate | Time point at which maximum growth rate is reached |
t_dr | Time at Max. Death Rate | Time point at which minimum growth rate (i.e. maximum death rate) is reached |
t_k | Time at Carrying Capacity | Time point at which carrying capacity is reached |
diauxie | Diauxie | Multi-phasic growth (True or False) |
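To give a feel for what a few of these parameters measure, the sketch below estimates gr, t_gr, and td from a made-up curve using simple finite differences. This is a deliberately crude stand-in and not AMiGA's method, which derives the estimates from the fitted GP model. The function name and data are hypothetical, and the doubling time uses the standard relation td = ln(2) / gr.

```python
import math


def rough_growth_parameters(time, od):
    """Crude finite-difference estimates; a stand-in for the GP-based estimates."""
    log_od = [math.log(y) for y in od]
    # Specific growth rate = d ln(OD) / dt, approximated between adjacent points.
    rates = [
        (log_od[i + 1] - log_od[i]) / (time[i + 1] - time[i])
        for i in range(len(od) - 1)
    ]
    gr = max(rates)
    i_max = rates.index(gr)
    # Report the midpoint of the steepest interval as the time of maximum growth.
    t_gr = (time[i_max] + time[i_max + 1]) / 2
    # Standard relation between doubling time and specific growth rate.
    td = math.log(2) / gr if gr > 0 else float("inf")
    return {"gr": gr, "t_gr": t_gr, "td": td}


time = [0.0, 1.0, 2.0, 3.0, 4.0]  # hours
od = [0.05, 0.06, 0.12, 0.20, 0.22]
params = rough_growth_parameters(time, od)
print(params)
```

The OD doubles between hours 1 and 2, so the estimated growth rate is about ln(2) per hour and the doubling time is about one hour.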
Bonus: fit can detect and characterize diauxic shifts.
If AMiGA detects multi-phasic growth (e.g. diauxic shift) in any well, it will also characterize the above growth parameters for each unique growth phase. These additional parameters will be saved in a separate tab-separated file with the suffix _diauxie.txt. See Detect Diauxie for more details.
AMiGA evaluates quality of fit
AMiGA evaluates how well the predicted growth curve matches the input growth curve using two metrics: the mean squared error (MSE) and the carrying capacity error (K_Error).
The MSE is simply defined as
\[\text{MSE} = \frac{1}{n} \sum_{i=1}^{n}\left(\mathrm{OD}_i - \widehat{\mathrm{OD}}_i \right)^2\]where \(\mathrm{OD}_i\) is the input OD at the \(i\)-th time point, \(\widehat{\mathrm{OD}}_i\) is the predicted OD at that time point, and \(n\) is the number of time points.
The K_Error is defined as the deviation of the predicted carrying capacity, \(\hat{K}\), from the expected carrying capacity. The expected carrying capacity, \(K\), is defined as \((\text{OD_Max} - \text{OD_Baseline})\) or \((\text{Adj_OD_Max} - \text{Adj_OD_Baseline})\).
\[\text{K_Error} = \left|\frac{\hat{K}}{K} - 1\right| \times 100\%\]where \(\hat{K}\) is the predicted carrying capacity and \(K\) is the expected carrying capacity.
In the summary file, AMiGA will highlight if the K_Error is above a certain threshold. The default is 20% but can be adjusted in libs\config.py (see Configure default parameters).
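The two quality metrics can be worked through on a small example. The sketch below follows the MSE and K_Error definitions given above; the function names, the observed and predicted values, and the predicted carrying capacity of 0.45 are all made up for illustration.

```python
def mean_squared_error(observed, predicted):
    """Mean of the squared differences between input and predicted OD."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def k_error(k_predicted, od_max, od_baseline):
    """Percent deviation of the predicted K from the expected K = OD_Max - OD_Baseline."""
    k_expected = od_max - od_baseline
    return abs(k_predicted / k_expected - 1) * 100


observed = [0.05, 0.10, 0.30, 0.55, 0.60]
predicted = [0.05, 0.12, 0.28, 0.50, 0.50]

mse = mean_squared_error(observed, predicted)
err = k_error(k_predicted=0.45, od_max=max(observed), od_baseline=observed[0])

threshold = 20  # default threshold (percent) stated in the text
print(f"MSE = {mse:.5f}")
print(f"K_Error = {err:.1f}%  flagged: {err > threshold}")
```

With an expected carrying capacity of 0.55 and a predicted value of 0.45, the K_Error is roughly 18%, which falls just under the default threshold.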
The log-based estimates of AUC, K, and Death are relative!
All growth curves should start with a non-zero OD, which indicates the starting size of the microbial population. To estimate exponential growth rates and other metrics, such as the maximum specific growth rate, AMiGA must transform the OD data with a natural logarithm. Then, to account for variation in the starting OD, the measurement at the first time point is subtracted. In other words:
First, we apply the natural log transformation
\[f(t) = \ln{\text{OD}(t)}\]Second, we subtract the first time point
\[f(t) = \ln{\text{OD}(t)}- \ln{\text{OD}(0)}\]which is equivalent to
\[f(t) = \ln{\left(\frac{\text{OD}(t)}{\text{OD}(0)}\right)}\]So at the first time point
\[f(0) = \ln{\left(\frac{\text{OD}(0)}{\text{OD}(0)}\right)} = \ln{1} = 0\]and all measurements of OD at other time points are thus relative to an arbitrary initial measurement of OD(0). This affects several metrics, in particular AUC_log, K_log, and Death_log.
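A short numerical check makes the consequence of this derivation concrete. In the sketch below (hypothetical function name and made-up data), two curves with the same fold changes but a tenfold difference in starting OD produce the same transformed curve f(t).

```python
import math


def relative_log_od(od):
    """f(t) = ln(OD(t)) - ln(OD(0)) = ln(OD(t) / OD(0))."""
    return [math.log(y) - math.log(od[0]) for y in od]


low_start = [0.01, 0.02, 0.04, 0.08]
high_start = [0.10, 0.20, 0.40, 0.80]  # same fold changes, 10x higher starting OD

f_low = relative_log_od(low_start)
f_high = relative_log_od(high_start)

print(f_low)
print(f_high)
```

Both curves start at 0 and end near ln(8), even though their absolute ODs differ by an order of magnitude, which is why the log-based metrics cannot be read as absolute quantities.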
In other words, whereas the linear estimates of K and AUC are in units of OD and OD x time, respectively,
\[\text{K_lin} = \max_t \text{OD} (t)\] \[\text{AUC_lin} = \int \text{OD} (t) dt\]the log estimates of K and AUC are arbitrary and relative to what the K or AUC would have been if there was no growth at all.
\[\text{K_log} = \max_t \ln{\left(\frac{\text{OD}(t)}{\text{OD}(0)}\right)}\] \[\text{AUC_log} = \int \ln{\frac{\text{OD}(t)}{\text{OD}(0)}} dt\]
What happens if the starting OD of a curve is zero?
All growth curves should start with a non-zero OD, which indicates the starting size of the microbial population. To estimate exponential growth rates and other metrics, such as the maximum specific growth rate, AMiGA must transform the OD data with a natural logarithm. Because the logarithm of zero is not defined, AMiGA will shift the whole growth curve up until all measurements are positive. To find out how AMiGA does this, see Data input and manipulation: Handling non-positive measurements.
Adjusting background OD by subtracting blanks or control samples
Background OD, especially at the first few time points, can bias the inference of several growth parameters including growth rates, lag times, and adaptation times. It is common practice to adjust for background OD using various steps. AMiGA supports several of those steps.
Skipping First Few Time Points: Often, the first few time points have a very low signal-to-noise ratio. Users can opt to ignore the first n time points using the --skip-first-n argument.
Adjusting Baseline of the Growth Curve: Growth curves are translated so that the first time point always starts at approximately ln(OD)=0 (this corresponds to OD=1, i.e., an initial population size of one). This can be done by either (i) subtracting the initial ln(OD) measurement from all subsequent ln(OD) measurements, or (ii) using a polynomial regression-based estimate of ln(OD) at the first time point. The former option is used by default. The latter option is only helpful when fitting pooled experimental or technical replicates where the starting OD varies, albeit only slightly. If the starting OD varies by a lot, you will introduce significant artificial biases into the model. To use the latter method, set config['PolyFit'] to True in libs\config.py.
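The difference between the two baseline options can be illustrated with a small sketch. The function names and data are hypothetical, and the regression here is an assumed simplification: a degree-1 polynomial fitted by least squares to the first three points. The polynomial degree and the data window that AMiGA uses when config['PolyFit'] is True are not specified in this section.

```python
import math


def baseline_by_first_point(log_od):
    """Option (i), the default: subtract the first ln(OD) measurement."""
    return [y - log_od[0] for y in log_od]


def baseline_by_line_fit(time, log_od, n_points=3):
    """Option (ii): subtract a regression-based estimate of ln(OD) at the first time point.

    Assumption: a straight line fitted by least squares to the first n_points.
    """
    t, y = time[:n_points], log_od[:n_points]
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = y_mean - slope * t_mean
    estimate_at_start = intercept + slope * time[0]
    return [v - estimate_at_start for v in log_od]


time = [0.0, 1.0, 2.0, 3.0]
log_od = [math.log(v) for v in [0.050, 0.056, 0.080, 0.160]]

by_first = baseline_by_first_point(log_od)
by_fit = baseline_by_line_fit(time, log_od)
print(by_first[0], by_fit[0])
```

With option (i) the first point is exactly 0, while with option (ii) it is only approximately 0 because the baseline is a smoothed estimate rather than a single measurement.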
Adjusting for background OD due to media: Users can adjust for the background OD due to blank media with the --subtract-blanks argument. See Command Line Interface for more detail on how to use this feature. In brief, users must provide two additional columns in the meta-data: BlankGroup and BlankControl. The group identifies the media type and its values must be unique numbers; the control identifies whether the well corresponds to blank media (1) or a treatment well (0).
Adjusting for background OD due to other considerations: Alternatively, users may want to account for the background growth of their microbes on minimal media or other conditions. Users can adjust for the background OD with the --subtract-control argument. See Command Line Interface for more detail on how to use this feature. In brief, users must provide two additional columns in the meta-data: Group and Control. The group identifies the unique growth conditions and its values must be unique numbers; the control identifies whether the well corresponds to the group-specific control (1) or a group-specific treatment well (0).
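The role of the Group and Control columns can be sketched as follows; the BlankGroup and BlankControl columns follow the same pattern. The well names and OD values are made up, and averaging several control wells per time point is an assumption made for this example rather than a description of AMiGA's internals.

```python
# Hypothetical meta-data rows, one per well, with made-up OD readings.
wells = [
    {"Well": "A1", "Group": 1, "Control": 1, "OD": [0.05, 0.06, 0.08]},
    {"Well": "A2", "Group": 1, "Control": 0, "OD": [0.05, 0.20, 0.60]},
    {"Well": "B1", "Group": 2, "Control": 1, "OD": [0.04, 0.05, 0.05]},
    {"Well": "B2", "Group": 2, "Control": 0, "OD": [0.06, 0.10, 0.30]},
]


def subtract_group_controls(wells):
    """Subtract each group's mean control curve from that group's treatment wells."""
    adjusted = {}
    for group in sorted({w["Group"] for w in wells}):
        members = [w for w in wells if w["Group"] == group]
        controls = [w["OD"] for w in members if w["Control"] == 1]
        # Assumption: multiple control wells in a group are averaged per time point.
        mean_control = [sum(vals) / len(vals) for vals in zip(*controls)]
        for w in members:
            if w["Control"] == 0:
                adjusted[w["Well"]] = [y - c for y, c in zip(w["OD"], mean_control)]
    return adjusted


adjusted = subtract_group_controls(wells)
print(adjusted)
```

Note that well A2 ends up with a value of zero at its first time point after subtraction, which is the kind of non-positive measurement that must be handled before the log transformation.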
Troubleshooting
I am getting negative K and AUC values. Why?
AMiGA analyzes growth curves after transforming with a natural logarithm, so K_log and AUC_log can be negative. For curves where there is practically no growth (ODs hover near zero over time), it is also possible to get negative values for the K_lin and AUC_lin estimates. After fitting growth curves, AMiGA transforms curves back to real OD with \(\mathrm{OD}(t) = \exp{\left[\text{log_OD(t)} + \ln{\text{real_OD}(0)}\right]}\) and then subtracts the first time point to center the curve at zero. This can result in measurements that are negative. If you are getting very large negative AUC_lin or K_lin values, however, that may be an unrelated issue. If you pass the --save-gp-data argument, you can take a look at the predicted growth curve in the *_gp_data.txt file and plot the curve to see if there are actually negative values.
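The back-transformation described above can be reproduced on a made-up example to see how negative values arise. In this sketch the function name, the fitted ln-scale values, and the starting OD of 0.08 are all hypothetical; only the formula comes from the text.

```python
import math


def back_transform(log_od_fit, real_od_start):
    """OD(t) = exp[log_OD(t) + ln(real_OD(0))], then subtract the first time point."""
    od = [math.exp(f + math.log(real_od_start)) for f in log_od_fit]
    return [y - od[0] for y in od]


# Hypothetical fitted curve on the ln scale for a well with practically no growth
flat_fit = [0.0, -0.05, 0.02, -0.10]
centered = back_transform(flat_fit, real_od_start=0.08)
print(centered)
```

Because the fitted curve dips slightly below its starting value, the centered OD contains small negative numbers, which in turn can produce small negative K_lin or AUC_lin estimates.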
Command-Line arguments
To see the full list of arguments that amiga fit will accept, run
python amiga.py fit --help
which will return the following message
usage: amiga.py [-h] -i INPUT [-o OUTPUT] [-f FLAG] [-s SUBSET] [-t INTERVAL] [-tss TIME_STEP_SIZE]
[-sfn SKIP_FIRST_N] [--do-not-log-transform] [--subtract-blanks] [--subtract-control]
[--keep-missing-time-points] [--verbose] [--plot] [--plot-derivative] [--pool-by POOL_BY]
[--save-cleaned-data] [--save-mapping-tables] [--save-gp-data] [--merge-summary] [--fix-noise]
[--sample-posterior]
Fit growth curves
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
-o OUTPUT, --output OUTPUT
-f FLAG, --flag FLAG
-s SUBSET, --subset SUBSET
-t INTERVAL, --interval INTERVAL
-tss TIME_STEP_SIZE, --time-step-size TIME_STEP_SIZE
-sfn SKIP_FIRST_N, --skip-first-n SKIP_FIRST_N
--do-not-log-transform
--subtract-blanks
--subtract-control
--keep-missing-time-points
--verbose
--plot
--plot-derivative
--pool-by POOL_BY
--save-cleaned-data
--save-mapping-tables
--save-gp-data
--merge-summary
--fix-noise
--sample-posterior
See Command Line Interface for more details on these arguments.