The objective of this tutorial is to show advanced users how to use AMiGA code in their own scripts instead of relying on the command-line interface (CLI). Here, I will introduce functions and classes that can be used in a standalone manner for importing data and meta-data, processing growth curves, and fitting growth curves using Gaussian Process regression (which is implemented using the GPy package).

Table of Contents

Import Packages
Import AMiGA functions
Read example data
GrowthPlate object
GrowthPlate methods
GrowthModel object
Curve object
Pooling example
Estimating confidence intervals for OD
Estimating confidence intervals for parameters

Import packages

Import AMiGA functions

WARNING: You will have to replace the path to match AMiGA's location on your local machine.

Read "death" example data

We will use the "death" example data set included in the GitHub repository for this tutorial.

Define plate reader interval (time interval between subsequent measurements) of optical density. You can pass a single default value or file-specific values.

readPlateReaderFolder() can be used to parse a single file or multiple files at once. It returns a dictionary with file names as keys and formatted and tabulated data as values.

assembleMappings() can be used to parse meta-data.

trimInput() allows you to merge the data and mapping and also trim or subset the data based on user-defined parameters. The subset entry of the dictionary indicates which conditions I need to keep. The keys (i.e. Substrate, Concentration, and Isolate) must already be included in the meta-data files as column headers. The values must be actual values in these respective columns.
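To make the subsetting logic concrete, here is a minimal sketch of how a subset dictionary can filter a mapping table with pandas. This is not AMiGA's implementation of trimInput(); the helper `subset_mapping`, the well IDs, and the table contents are hypothetical and only illustrate the rule that keys must be column headers and values must occur in those columns.

```python
import pandas as pd

# Hypothetical mapping table; column names mirror the keys mentioned above.
mapping = pd.DataFrame(
    {
        "Isolate": ["isolate_1", "isolate_1", "isolate_2", "isolate_2"],
        "Substrate": ["Fructose", "Negative Control", "Fructose", "Glucose"],
        "Concentration": ["high", "none", "high", "high"],
    },
    index=["A1", "A2", "B1", "B2"],
)


def subset_mapping(mapping, subset):
    """Keep rows whose value in every listed column is among the allowed values."""
    keep = pd.Series(True, index=mapping.index)
    for column, allowed in subset.items():
        if column not in mapping.columns:
            raise KeyError(f"'{column}' is not a column header in the mapping")
        keep &= mapping[column].isin(allowed)
    return mapping[keep]


# Keep one isolate grown on either of two substrates.
subset = {"Isolate": ["isolate_1"], "Substrate": ["Fructose", "Negative Control"]}
selected = subset_mapping(mapping, subset)
```

A key that is missing from the column headers raises an error here instead of silently returning an empty table, which is one reasonable design choice for catching typos in the subset dictionary.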

AMiGA formats the data file as follows:

AMiGA formats the mapping file as follows:

You can easily plot the raw data

The three curves with low OD correspond to growth on minimal media only while the three curves with high OD correspond to growth on minimal media supplemented with fructose.

GrowthPlate object

GrowthPlate is a class that allows you to save data in a standard format amenable to several transformations.

The main attributes of GrowthPlate are time, data, and key.

See ?GrowthPlate for more info.

GrowthPlate methods

You can compute Min OD, Max OD, and Initial OD

You can compute Fold Change relative to control wells

In the meta-data files, I manually added Group and Control columns to indicate how fold-changes should be calculated. For each unique Group, there must be at least one Control sample.
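The role of the Group and Control columns can be sketched with pandas as follows. This is an illustration only: I assume here that fold change is a well's maximum OD divided by the mean maximum OD of the control wells in the same Group, which may differ from what AMiGA actually computes. The column names `Max_OD` and `Fold_Change`, the helper `fold_change`, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical key table: Group and Control follow the description above,
# Max_OD is an assumed summary column with made-up values.
key = pd.DataFrame(
    {
        "Group": [1, 1, 1, 2, 2],
        "Control": [1, 0, 0, 1, 0],
        "Max_OD": [0.2, 0.8, 0.6, 0.5, 0.25],
    },
    index=["A1", "A2", "A3", "B1", "B2"],
)


def fold_change(key, value="Max_OD"):
    """Divide each well's value by the mean control value of its Group."""
    controls = key[key["Control"] == 1].groupby("Group")[value].mean()
    missing = set(key["Group"]) - set(controls.index)
    if missing:
        raise ValueError(f"No Control sample for Group(s): {sorted(missing)}")
    return key[value] / key["Group"].map(controls)


key["Fold_Change"] = fold_change(key)
```

The explicit check for groups without a control mirrors the requirement that each unique Group has at least one Control sample.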

You can adjust the units of time

If your measurements include negative or zero values, AMiGA can raise them to a small positive value

Growth curve analysis requires log transformation, which cannot handle non-positive values. In some cases, OD (or other measurements) may be zero or negative. AMiGA avoids numerical issues by raising such values to the minimum positive value in the curve.
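The idea can be sketched in a few lines of NumPy. This is not AMiGA's code; the helper name `raise_to_min_positive` and the example values are hypothetical, and the sketch handles a single curve.

```python
import numpy as np


def raise_to_min_positive(od):
    """Replace zero or negative values with the smallest positive value in the curve."""
    od = np.asarray(od, dtype=float).copy()
    positive = od[od > 0]
    if positive.size == 0:
        raise ValueError("curve has no positive values")
    od[od <= 0] = positive.min()
    return od


curve = [-0.01, 0.0, 0.02, 0.05, 0.4]  # made-up OD readings
fixed = raise_to_min_positive(curve)
log_od = np.log(fixed)  # now finite everywhere
```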

You can then log-transform your curves

Finally, you can adjust the initial OD so that it is centered at zero

This can be done in two ways:

  1. poly=False, for each curve, simply subtract OD at the first time point from subsequent time points
  2. poly=True, for a group of curves, fit the first 5 time points to a polynomial of degree 3, estimate the initial OD based on this fit, then subtract this value from OD at all time points for all curves.

Either way, recall that subtraction in the log-space is equivalent to division in the linear space.

$\log\left[OD(t)\right] - \log\left[OD(0)\right] = \log\left(\frac{OD(t)}{OD(0)}\right)$
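Both options can be sketched with NumPy on log-transformed data arranged as a time-by-curves array. This is my own illustration under stated assumptions, not AMiGA's implementation: the function names are hypothetical, and for the polynomial option I assume the fit pools the first 5 time points of all curves in the group and evaluates the polynomial at the first time point.

```python
import numpy as np


def subtract_first_point(log_od):
    """poly=False analogue: subtract each curve's first log OD from the whole curve."""
    log_od = np.asarray(log_od, dtype=float)
    return log_od - log_od[0]


def subtract_poly_baseline(time, log_od, n_points=5, degree=3):
    """poly=True analogue: estimate one shared initial log OD from a polynomial fit."""
    time = np.asarray(time, dtype=float)
    log_od = np.asarray(log_od, dtype=float)
    n_curves = log_od.shape[1]
    # Pool the first n_points of every curve; rows are time points, columns are curves.
    t = np.repeat(time[:n_points], n_curves)
    y = log_od[:n_points].ravel()
    coeffs = np.polyfit(t, y, degree)
    baseline = np.polyval(coeffs, time[0])
    return log_od - baseline


# Three identical made-up curves growing exponentially from OD 0.05.
time = np.arange(10) * 0.5
od = np.column_stack([0.05 * np.exp(0.3 * time)] * 3)
log_od = np.log(od)

adj_simple = subtract_first_point(log_od)
adj_poly = subtract_poly_baseline(time, log_od)

# Subtraction in log space equals division in linear space.
ratio = np.exp(adj_simple)  # same as od / od[0]
```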

This may not seem obvious but it affects the interpretation of your growth curves and growth parameters as I discuss elsewhere.

GrowthModel object

The GrowthModel class allows you to model a growth curve (an individual curve or a group of replicates) as a Gaussian Process.

See ?GrowthModel for more info.

Above, we combined time and OD into a single pandas.DataFrame, then melted it into long form, where the columns include the dependent variable (OD) and the independent variable (Time). You can also pass additional categorical covariates, as well as extra columns with arbitrary names.
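For readers unfamiliar with melting, here is a small self-contained pandas sketch of the wide-to-long reshaping described above. The sample IDs, the `Sample_ID` column name, the substrate labels, and the values are hypothetical; only Time and OD are taken from the description.

```python
import pandas as pd

# Made-up wide-format data: one Time column and one OD column per well.
time = pd.DataFrame({"Time": [0.0, 1.0, 2.0]})
od = pd.DataFrame({"A1": [0.05, 0.06, 0.08], "A2": [0.05, 0.10, 0.20]})
wide = time.join(od)

# Long form: one row per (time point, sample) pair.
long = wide.melt(id_vars="Time", var_name="Sample_ID", value_name="OD")

# An additional categorical covariate can be attached per sample.
substrate = {"A1": "Negative Control", "A2": "Fructose"}
long["Substrate"] = long["Sample_ID"].map(substrate)
```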

We will return to the multi-dimensional input later (see Pooling example). For now, we will analyze the simple model where time is the only independent variable. We will also analyze a single growth curve first, and later attempt a model that pools replicate samples.

df attribute contains the input data

x attribute is the model input for independent variables

y attribute is the model input for dependent variable

If GrowthModel recognizes that there are additional variables beyond time, it also computes the measurement variance across all replicate samples. Here, it would compute the measurement variance for growth on fructose and the measurement variance for growth on water. The errors are organized in long form like the tmp dataframe.
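A rough sketch of a per-condition measurement variance in long form is shown below. It is an assumption on my part that the variance is taken across replicates at each time point within each condition, and that the sample variance (ddof=1) is used; AMiGA may compute it differently. The `Variance` column name and the values are hypothetical.

```python
import pandas as pd

# Made-up long-form data: two replicates per substrate at two time points.
long = pd.DataFrame(
    {
        "Time": [0, 0, 0, 0, 1, 1, 1, 1],
        "Substrate": ["Fructose", "Fructose", "Water", "Water"] * 2,
        "OD": [0.10, 0.12, 0.05, 0.05, 0.30, 0.34, 0.05, 0.07],
    }
)

# Sample variance across replicates, broadcast back to every row (long form).
long["Variance"] = long.groupby(["Time", "Substrate"])["OD"].transform("var")
```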

You can now fit the model using GPy in the backend

Curve object

The GrowthModel().run() method not only fits the GrowthModel, but predicts the growth curve, its first-order derivative, and its second-order derivative.
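To show how a Gaussian Process can yield a curve and its derivatives at the same time, here is a minimal NumPy sketch. It is not the GrowthModel or GPy code path: I assume a zero-mean GP with a squared-exponential (RBF) kernel and fixed, hand-picked hyperparameters (no optimization), and the function names and simulated logistic data are hypothetical. The derivatives of the posterior mean follow from differentiating the kernel with respect to the prediction input.

```python
import numpy as np


def rbf(a, b, variance, lengthscale):
    """Squared-exponential kernel between two 1-D arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_predict(x, y, x_new, variance=4.0, lengthscale=2.0, noise=1e-4):
    """Posterior mean of a zero-mean GP and its first and second derivatives."""
    K = rbf(x, x, variance, lengthscale) + noise * np.eye(x.size)
    alpha = np.linalg.solve(K, y)
    k_star = rbf(x_new, x, variance, lengthscale)
    d = x_new[:, None] - x[None, :]
    mean = k_star @ alpha
    # d/dx* of the kernel: -(x* - x) / l^2 * k
    d1 = (-d / lengthscale**2 * k_star) @ alpha
    # d^2/dx*^2 of the kernel: ((x* - x)^2 / l^4 - 1 / l^2) * k
    d2 = ((d**2 / lengthscale**4 - 1.0 / lengthscale**2) * k_star) @ alpha
    return mean, d1, d2


# Simulated logistic growth, log-transformed and baseline-subtracted.
time = np.linspace(0.0, 24.0, 25)
od = 1.0 / (1.0 + 19.0 * np.exp(-0.5 * time))
y = np.log(od / od[0])

grid = np.linspace(0.0, 24.0, 481)
mean, d1, d2 = gp_predict(time, y, grid)
```

In this sketch, `d1` plays the role of the growth rate over time in log space and `d2` describes how that rate changes.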

Because we are jointly modelling multiple curves with different initial values (baseline), GrowthModel will assume that the baseline is zero.

$\log\left[OD(t)\right] - \log\left[OD(0)\right] = 0$

This corresponds to

$\frac{OD(t)}{OD(0)} = 1$

so the growth parameter estimates for AUC, carrying capacity, and death will be relative to the starting OD. See below for more details.

The data method summarizes different formulations of the input and predicted data.

  1. Time
  2. GP_Input: Actual input to the GP model.
  3. GP_Output: Prediction by the GP model for mean growth.
  4. GP_Derivative: Prediction by the GP model for derivative of growth.
  5. OD_Growth_Data: GP_Input but converted to linear scale.
  6. OD_Growth_Fit: GP_Output but converted to linear scale.
  7. OD_Fit: OD_Growth_Fit

Growth parameters are stored in a dictionary

diauxie is a binary variable that indicates if diauxic shift was detected (1) or not (0).

df_dx is a pandas.DataFrame that contains one row for each unique growth phase with columns being the growth parameters. If there is no diauxie, the dataframe has a single row.

Pooling example

We are using the tmp variable from earlier

Similar to the one-dimensional case, the data method summarizes different formulations of the input and predicted data.

  1. Time
  2. GP_Input: Actual input to the GP model.
  3. GP_Output: Prediction by the GP model for mean growth.
  4. GP_Derivative: Prediction by the GP model for derivative of growth.
  5. OD_Growth_Data: GP_Input but converted to linear scale.
  6. OD_Growth_Fit: GP_Output but converted to linear scale.
  7. OD_Fit: OD_Growth_Fit

HOWEVER, here GP_Input and OD_Growth_Data are n times larger than the other variables, where n is the number of pooled samples or pooled replicates.

Here it is plotted differently.

However, we are only interested in the model estimates, which we can store in a single pandas.DataFrame as shown below

Above, we have a dataframe that has meta-data columns indicating the unique conditions, and columns that indicate model predictions:

This is how you can plot the prediction of the latent function

This is how you can plot the prediction of the derivative of the latent function

Here are the parameters: you can extract only the means or the means and standard deviations.

To get the 95% confidence interval of the parameters, here is what you can do.
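As a generic illustration of turning a mean and standard deviation into an interval, the sketch below assumes the parameter estimate is approximately normally distributed, so the 95% interval is the mean plus or minus about 1.96 standard deviations. This is a textbook construction and not necessarily how AMiGA derives its intervals; the helper `normal_ci`, the parameter names, and the values are hypothetical.

```python
from statistics import NormalDist


def normal_ci(mean, std, level=0.95):
    """Two-sided interval for a normally distributed estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return mean - z * std, mean + z * std


# Made-up (mean, std) pairs for two illustrative parameters.
params = {"growth_rate": (0.50, 0.02), "carrying_capacity": (1.20, 0.10)}
intervals = {name: normal_ci(m, s) for name, (m, s) in params.items()}
```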