AMiGA : Pool Replicates

Table of Contents

What is pooling?
How to infer summary statistics for pooled replicates?
How to empirically estimate measurement noise?
Where are the linear estimates of AUC, Carrying Capacity, and Death?
How can I evaluate the goodness of fit for my growth curves?

AMiGA can model individual growth curves to estimate growth parameters. It can also pool experimental and/or technical replicates to model mean growth behavior and estimate summary statistics for growth parameters, in particular mean, standard deviation, and confidence intervals.

What is pooling?

In this example, I have in my working directory (/Users/firasmidani/xperiment/) multiple data files in the data sub-folder. As previously shown, you can analyze all growth curves separately without pooling.

amiga fit -i /Users/firasmidani/experiment/ -o "split_merged" --merge-summary

Below, I however ask AMiGA to pool replicates based on all unique combinations of Isolateand Substrate. If I am analyzing the growth of 2 isolates on 96 substrates, I have 192 unique combinations. For each unique condition (Isolate x Substrate), AMiGA will find all replicate samples and model them jointly. To pool these replicates, you must pass the necessary meta-data to AMiGA. See Preparing metadata. Here, let us assume that my meta-data already includes columns for Isolate and Substrate.

amiga fit -i /Users/firasmidani/experiment/ -o "pooled_analysis" --pool-by "Isolate,Substrate"

How to infer summary statistics for pooled replicates?

Below, the --sample-posterior argument asks AMiGA to infer summary statistics for the estimated growth parameters. If not called, AMiGA simply provides a single estimate of each growth parameter based on the mean latent function of the GP model. If called, AMiGA samples from the posterior distribution of the predicted curve (n times, where n is defined in the libs/config.py file, see Configure default parameters for more details), computes the growth parameter for each sampled curve, then returns the mean and the standard deviation of the mean of each growth parameter in the summary file.

amiga fit -i /Users/firasmidani/experiment/ -o "pooled_analysis" --pool-by "Isolate,Substrate" --sample-posterior 

How to empirically estimate measurement noise?

Gaussian Process regression will model measurement noise as a single time-independent term (\(\sigma_{noise}\)). This noise term influences the standard deviation and thus confidence intervals of the predicted growth parameters. However, measurement noise tends to vary with time and is often smaller during lag and exponential phase and larger during stationary and death phases. The user has the option of empirically modelling the measurement noise. This measurement noise is computed as the variance across all replicates at each time point, which is then smoothed over time with a Gaussian filter. The user can opt for inferring summary statistics with empirically estimated measurement noise using the --fix-noise argument. While fitting growth with empirically estimated measurements fine-tunes confidence intervals, it is more prone to over-fitting and should be used with caution.

amiga fit -i /Users/firasmidani/experiment/ -o "pooled_analysis" --pool-by "Isolate,Substrate" --sample-posterior --fix-noise

Of course, you can further specify the analysis with additional command line arguments.

amiga fit -i /Users/firasmidani/experiment/ -o "pooled_analysis" --pool-by "Isolate,Substrate,PM" --skip-first-n 1 -tss 1  --save-gp-data --save-cleaned-data --save-mapping-data  --sample-posterior --fix-noise --verbose

Where are the linear estimates of AUC, Carrying Capacity, and Death?

Pooling combines the measurmeents across multiple replicates. These replictes likely have different starting Optical Density measurements. In order to estimate AUC, Carrying Capacity in linear units of OD or OD x time, AMiGA would need to make assumptions about the starting OD value which could vary drsastically based on the experimental and measurement noise of your samples. Therefore, AMiGA will only provide estimates of AUC, Carrying Capacity, and Death based on log OD curve. See the section titled “The log-based estimates of AUC, K, and Death are relative!” for more details on how to interpret these estimates.

How can I evaluate the goodness of fit for my growth curves?

For each condition, AMiGA computes a mean squared error (MSE) by comparing model predictions from model input. In particular, the MSE is based on a comparison of GP_Input and GP_Output. See --save-gp-data in Command-line interface for description of these variables. The MSE is included as a column in the summary file.

In addition, the Gaussian Process model optimizes a hyperparameter called Gaussian Noise, \(\sigma_{noise}\). If the user passed the --save-gp-data argument, the predicted Gaussian noise will be included in the _gp_data.txt file. This Gaussian noise is a time-independent parameter and essentially estimates sampling uncertainty. However, be aware that if the user opted to model Gaussian noise as time-dependent parameter with --sample-posterior argument, AMiGA will empirically estimate the noise from the measurement variance. Thus, the Gaussian Noise would be negligibly low becaue the measurement noise is now modelled as another fixed parameter that we refer to as the empirically-estimated measurement noise, \(\sigma(t)\). Whether the user requests to model measurement noise as a time-dependent or time-independent process, both Gaussian noise and empirically-estimated noise will be included as columns in the _gp_data.txt.

AMiGA Anaylsis of Microbial Growth Assays

Pool Replicates

What is pooling?

How to infer summary statistics for pooled replicates?

How to empirically estimate measurement noise?

Where are the linear estimates of AUC, Carrying Capacity, and Death?

How can I evaluate the goodness of fit for my growth curves?