Table of Contents


AMiGA can fit a growth curve using GP regression

AMiGA fits growth curves as a Gaussian Process model and uses the model-predicted OD to estimate growth parameters. To fit the curves in your data set, simply run the following command.

python amiga.py fit -i /home/outbreaks/erandomii/ER1_PM2-1.txt
  • fit command ensures that AMiGA fits your growth curves with GP regression.
  • -i or --input argument will point to the location of the file of interest


Fitting growth curves takes about 1 minutes per 96-well plate. Users can speed this a bit by thinning the input to the GP regression model with the --time-step-size parameter. See Command Line Interface for more details.


fit summarizes growth curves

Before model-fitting, AMiGA will first compute the following metrics for all growth curves:

Metric Description
OD_Baseline The OD measurement at the first time point
OD_Min The minimum OD measurement at any time points
OD_Max The maximum OD measurement at any time points
Fold_Change defined as the ratio of change in OD of the case (or treatment) growth curve relative to change in OD of the control growth curve


fit manipulates growth curves

AMiGA will then apply the following transformations to you data.

  1. Convert units of time (e.g. from seconds to hours). Choice of units is defined in libs\config.py (see Configure default parameters).
  2. [Optional] If the user passed the --skip-first-n argument, AMiGA will ignore the first n time points of the curve.
  3. [Optional] If the user passed the --subtract-blanks argument, AMiGA will subtract growth curves corresponding to blank media controls from treatment growth curves.
  4. [Optional] If the user passed the --subtract-control argument, AMiGA will subtract growth curves corresponding to control wells (e.g. microbe on minimal media) from treatment growth curves.
  5. AMiGA handles negative or zero values by vertically translating the whole growth curve such that all measurements are positive.
  6. Transform OD with natural logarithmic (i.e. OD → ln(OD))
  7. Adjust the baseline of the growth curve such that the first time point is centered at approximately ln(OD)=0 which corresponds to OD=1.

The basic summary for adjusted growth curves will be added to the summary file if the user requests subtraction of either blank or control wells. This will generate Adj_OD_Baseline, Adj_OD_Min, and Adj_OD_Max columns for the first value, minimum value, and maximum value of the growth curves at the end of step 4.

At the end of step 5, AMiGA re-measures the first time point and stores it in the summary file as OD_Offset. If AMiGA detects negative or zero values, it will adjust the growth curve accordingly and OD_Offset should be different than OD_Baseline (Adj_OD_Baseline or if user applies steps 3 and/or).


See Adjusting OD and other sections below for more details on how to interpret Adjusted OD(t) or ln OD(t).


fit models growth using GP regression.

Finally, AMiGA will model a growth curve using GP regression, estimate the following growth parameters, and save them in a tab-separated file with the suffix _summary.txt.

Short Name Long Name Description
auc_lin AUC (lin) Area Under the curve (in units of OD, and based on user-specified of time)
auc_log AUC (log) Area Under the curve (in units of log OD, and based on user-specified of time)
k_lin Carrying Capacity (lin) Carrying capacity (in units of OD, assumes that OD starts at 0)
k_log Carrying Capacity (log) Carrying capacity (in units of log OD, assumes that OD starts at 0)
death_lin Death (lin) Total loss of growth (in units of OD)
death_log Death (log) Total loss of growth in units of log-OD)
gr Growth Rate Maximum specific growth rate
dr Death Rate Maximum specific death rate
td Doubling time Doubling Time
lagC Lag Time Time delay needed to enter exponential growth
lagP Adaptation Time Time delay needed to initiate positive growth
t_gr Time at Max. Growth Rate Time point at which maximum growth rate is reached
t_dr Time at Max. Death Rate Time point at which minimum gowth rate (i.e. maximum death rate) is reached
t_k Time at Carrying Capacity Time point at which carrying capacity is eached
diauxie Diauxie Multi-phasic grwoth (True or False)


Bonus: fit can detect and charaterize diauxic shifts.

If AMiGA detects multi-phasic growth (e.g. diauxic shift) in any well, it will also characterize the above growth parameters for each unique growth phase. These additional parameters will be saved in a separate tab-separated file with the suffix _diauxie.txt. See Detect Diauxie for more details.


AMiGA evaluates quality of fit

AMiGA evaluates how well the predicted growth curve matches the input growth curve using two metrics: the mean squared error (MSE) and the carrying capacity error (K_Error).

The MSE is simply defined as

\[MSE = \frac{1}{n} \sum_{t=1}^T\left(\mathrm{OD_i} - \widehat{\mathrm{OD_i}} \right)^2\]

where \(\mathrm{OD}\) is the input OD and \(\widehat{\mathrm{OD_i}}\) is the predicted OD.

The K_Error is defined as the deviation of the predicted carrying capacity (K) from the expected carrying capacity. The expected carrying capacity, \(K\) is defined as \((\text{OD_Max} - \text{OD_Baseline})\) or \((\text{Adj_OD_Max} - \text{Adj_OD_Baseline})\).

\[\text{K_Error} = \left|\frac{\hat{K}}{K} - 1\right| \times 100\%\]

where \(\hat{K}\) is the predicted carrying capacity and \(K\) is the expected carrying capacity..

In the summary file, AMiGA will highlight if the K_Error is above a certain threshold. The default is 20% but can be adjusted in in libs\config.py (see Configure default parameters).


The log-based estimates of AUC, K, and Death are relative!

All growth curves should start with a non-zero OD which indicates the starting size of the microbial population. To estimate exponential growth rates and other metrics, AMiGA must tansform the OD data with a natural logarithm in order to infer certain metrics like the maximum specific growth rate. Then, to account for variation in the starting OD, the measurement at the first time point is subtracted. In other words.

First, we apply natural log transformation

\[f(t) = \ln{\text{OD}(t)}\]

Second, we subtract first time point

\[f(t) = \ln{\text{OD}(t)}- \ln{\text{OD}(0)}\]

which is equivalent to

\[f(t) = \ln{\left(\frac{\text{OD}(t)}{\text{OD}(0)}\right)}\]

So at the first time point

\[f(0) = \ln{\left(\frac{\text{OD}(0)}{\text{OD}(0)}\right)} = \ln{1} = 0\]

and all measurements of OD at other time points is thus relative to an arbitrary initial measurement of OD(0). This affects several metrics in particular AUC_log, K_log, and Death_log.

In other words, whereas the linear estimate of K and AUC is in units of OD and OD x time, respectively,

\[\text{K_lin} = \max_t \text{OD} (t)\] \[\text{AUC_lin} = \int \text{OD} (t) dt\]

the log estimate of K and AUC is arbitrary and relative to what the K or AUC would have been if there was no growth at all.

\[\text{K_log} = \max_t \left(\frac{\text{OD(t)}}{\text{OD}(0)}\right)\] \[\text{AUC_log} = \int \ln{\frac{\text{OD}(t)}{\text{OD}(0)}} dt\]


What happens if the starting OD of a curve is zero?

All growth curves should start with a non-zero OD which indicates the starting size of the microbial population. To estimate exponential growth rates and other metrics, AMiGA must tansform the OD data with a natural logarithm in order to infer certain metrics like the maximum specific growth rate. Because the logarithm of zero is not defined, AMiGA will shift the whole growth curve up until all measurements are positive. To find out how AMiGA can do this, see Data input and manipulation: Handling non-positive measurements?.


Adjusting background OD by subtracting blanks or controls samples

Background OD especially at the first few time points can bias the inference of several growth parameters including growth rates, lag times, and adpatation times. It is a common practice to adjust for background OD using various steps. AMiGA has several of those setps.

Skipping First Few Time Points: Often, the first few time points have very low signal-to-noise ratio. Users can opt to ignore the first n time points using the --skip-first-n argument.

Adjusting Baseline of the Growth Curve: Growth curves are translated so that the first time point always starts at approximately ln(OD)=0 (this corresopnds to OD=1, i.e., initial populatin size of one). This can be done by either (i) subtracting the initial ln(OD) measurement from all subsequent ln(OD) measurements, or (ii) using a polynomial regression-based estimate of ln(OD) at the first time point. The former option is used by default. The latter option is only helpful for fitting of pooled experimental or technical replicates where the starting OD varies albeit only slightly. If the starting OD varies by a lot, you will introduce significant artifical biases into the model. To use the latter method, set config['PolyFit'] to True in libs\config.py.

Adjusting for background OD due to media: Users can adjust for the background OD to blank media with the --subtract-blank argument. See Command Line Interface for more detail on how to use this feature. In brief, users must provide in the meta-data two additional columns: BlankGroup and BlankControl. The group identifies the media type and values must be unique numebrs, and the control identifies if the well corresponds to blank media (1) or treatment well (0).

Adjusting for background OD due to other considerations: Alternatively, users may want to account for the background growth of their microbes on minimal media or other conditions. Users can adjust for the background OD with the --subtract-control argument. See Command Line Interface for more detail on how to use this feature. In brief, users must provide in the meta-data two additional columns: Group and Control. The group identifies the unique growth conditions and values must be unique numebrs, and the control identifies if the well corresponds to the group-specific control (1) or group-specific treatment well (0).


Troubleshooting


I am getting negative K and AUC vlaues. Why?

AMiGA analyzes growth curves after transforming with a natural logarithm. So K_log and AUC_log can be negative. For curves where there is practically no growth (ODs hover near zero over time), it is also possible to get negative values for K_lin and AUC_log estimates. After fitting growth curves, AMiGA transforms curves back to real OD with \(\mathrm{OD}(t) = \exp{\left[\text{log_OD(t)} + \ln{\text{real_OD}(0)}\right]}\) then subtracts the first time point to center at zero. This can results in measurements that are negative. If you are getting very large AUC_lin or K_lin values that are negative, that may be however an unrelated issue. If you pass the --save-gp-data argument, you can take a look at the predicted growth curve in the *_gp_data.txt file and plot the curve to see if there are actually negative values.


Command-Line arguments

To see the full list of arguments that amiga fit will accept, run

python amiga.py fit --help

which will return the following message

usage: amiga.py [-h] -i INPUT [-o OUTPUT] [-f FLAG] [-s SUBSET] [-t INTERVAL] [-tss TIME_STEP_SIZE]
                [-sfn SKIP_FIRST_N] [--do-not-log-transform] [--subtract-blanks] [--subtract-control]
                [--keep-missing-time-points] [--verbose] [--plot] [--plot-derivative] [--pool-by POOL_BY]
                [--save-cleaned-data] [--save-mapping-tables] [--save-gp-data] [--merge-summary] [--fix-noise]
                [--sample-posterior]

Fit growth curves

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
  -o OUTPUT, --output OUTPUT
  -f FLAG, --flag FLAG
  -s SUBSET, --subset SUBSET
  -t INTERVAL, --interval INTERVAL
  -tss TIME_STEP_SIZE, --time-step-size TIME_STEP_SIZE
  -sfn SKIP_FIRST_N, --skip-first-n SKIP_FIRST_N
  --do-not-log-transform
  --subtract-blanks
  --subtract-control
  --keep-missing-time-points
  --verbose
  --plot
  --plot-derivative
  --pool-by POOL_BY
  --save-cleaned-data
  --save-mapping-tables
  --save-gp-data
  --merge-summary
  --fix-noise
  --sample-posterior


See more details for these arguments in Command Line Interface