Pool Replicates
Table of Contents
- What is pooling?
- How to infer summary statistics for pooled replicates?
- How to empirically estimate measurement noise?
- Where are the linear estimates of AUC, Carrying Capacity, and Death?
- How can I evaluate the goodness of fit for my growth curves?
AMiGA can model individual growth curves to estimate growth parameters. It can also pool experimental and/or technical replicates to model mean growth behavior and estimate summary statistics for growth parameters, in particular mean, standard deviation, and confidence intervals.
What is pooling?
In this example, my working directory (/Users/firasmidani/experiment/) contains multiple data files in the data sub-folder. As previously shown, you can analyze all growth curves separately without pooling.
python $amiga/amiga.py fit -i /Users/firasmidani/experiment/ -o "split_merged" --merge-summary
Below, however, I ask AMiGA to pool replicates based on all unique combinations of Isolate and Substrate. If I am analyzing the growth of 2 isolates on 96 substrates, I have 192 unique combinations. For each unique condition (Isolate x Substrate), AMiGA will find all replicate samples and model them jointly. To pool these replicates, you must pass the necessary meta-data to AMiGA. See Preparing metadata. Here, let us assume that my meta-data already includes columns for Isolate and Substrate.
python $amiga/amiga.py fit -i /Users/firasmidani/experiment/ -o "pooled_analysis" --pool-by "Isolate,Substrate"
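To make the idea of a "unique condition" concrete, here is a minimal sketch of how wells could be grouped by Isolate and Substrate. It is not AMiGA's implementation. The pool_by helper, the Well column, and all values are hypothetical; only the Isolate and Substrate column names come from the example above.

```python
from collections import defaultdict

# Hypothetical metadata: one dict per well. Only the "Isolate" and "Substrate"
# column names come from the text; well IDs and values are made up.
metadata = [
    {"Well": "A1", "Isolate": "iso_1", "Substrate": "Glucose"},
    {"Well": "A2", "Isolate": "iso_1", "Substrate": "Glucose"},
    {"Well": "B1", "Isolate": "iso_1", "Substrate": "Fructose"},
    {"Well": "C1", "Isolate": "iso_2", "Substrate": "Glucose"},
    {"Well": "C2", "Isolate": "iso_2", "Substrate": "Glucose"},
]


def pool_by(rows, keys):
    """Group well IDs by each unique combination of the given metadata keys."""
    pools = defaultdict(list)
    for row in rows:
        condition = tuple(row[key] for key in keys)
        pools[condition].append(row["Well"])
    return dict(pools)


# Each key is one unique condition; each value lists its replicate wells.
pools = pool_by(metadata, ["Isolate", "Substrate"])
for condition, wells in pools.items():
    print(condition, wells)
```

Each group of wells would then be modelled jointly as replicates of a single condition.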
How to infer summary statistics for pooled replicates?
Below, the --sample-posterior argument asks AMiGA to infer summary statistics for the estimated growth parameters. If not called, AMiGA simply provides a single estimate of each growth parameter based on the mean latent function of the GP model. If called, AMiGA samples from the posterior distribution of the predicted curve n times (where n is defined in the libs/config.py file, see Configure default parameters for more details), computes the growth parameter for each sampled curve, then returns the mean and standard deviation of each growth parameter in the summary file.
python $amiga/amiga.py fit -i /Users/firasmidani/experiment/ -o "pooled_analysis" --pool-by "Isolate,Substrate" --sample-posterior
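The sample-then-summarize procedure can be sketched as follows. This is an illustration under stated assumptions, not AMiGA's code. The sample_curves function is a hypothetical stand-in that adds independent Gaussian jitter to a mean curve, whereas a real GP posterior sample is drawn with the full covariance between time points. Carrying capacity is taken here, as an assumption, to be the maximum of the log OD curve. All numbers are made up.

```python
import random
import statistics

random.seed(0)  # make the illustration reproducible


def sample_curves(mean_curve, sd, n):
    """Hypothetical stand-in for posterior sampling.

    Adds independent Gaussian jitter to each point of the mean curve.
    A real GP posterior sample would respect correlations across time.
    """
    return [[random.gauss(y, sd) for y in mean_curve] for _ in range(n)]


def carrying_capacity(curve):
    """Example growth parameter (assumed definition): maximum of a log OD curve."""
    return max(curve)


# Made-up mean log OD curve and number of samples n.
mean_curve = [0.0, 0.1, 0.5, 1.2, 1.8, 2.0, 2.0]
curves = sample_curves(mean_curve, sd=0.05, n=100)

# Compute the parameter on every sampled curve, then summarize.
estimates = [carrying_capacity(curve) for curve in curves]
summary = {
    "mean": statistics.mean(estimates),
    "std": statistics.stdev(estimates),
}
print(summary)
```

The same pattern applies to any other growth parameter: replace carrying_capacity with a function that computes that parameter from a single curve.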
How to empirically estimate measurement noise?
Gaussian Process regression models measurement noise as a single time-independent term (\(\sigma_{noise}\)). This noise term influences the standard deviation and thus the confidence intervals of the predicted growth parameters. However, measurement noise tends to vary with time: it is often smaller during lag and exponential phases and larger during stationary and death phases. The user has the option of modelling the measurement noise empirically. This measurement noise is computed as the variance across all replicates at each time point, which is then smoothed over time with a Gaussian filter. The user can opt to infer summary statistics with empirically estimated measurement noise using the --fix-noise argument. While fitting growth with empirically estimated measurement noise fine-tunes confidence intervals, it is more prone to over-fitting and should be used with caution.
python $amiga/amiga.py fit -i /Users/firasmidani/experiment/ -o "pooled_analysis" --pool-by "Isolate,Substrate" --sample-posterior --fix-noise
Of course, you can further specify the analysis with additional command line arguments.
python $amiga/amiga.py fit -i /Users/firasmidani/experiment/ -o "pooled_analysis" --pool-by "Isolate,Substrate,PM" --skip-first-n 1 -tss 1 --save-gp-data --save-cleaned-data --save-mapping-data --sample-posterior --fix-noise --verbose
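The empirical noise estimate described in this section (variance across replicates at each time point, then Gaussian smoothing over time) can be sketched as below. This is a minimal illustration, not AMiGA's implementation. The function names, the kernel width sigma, the edge handling, and the replicate values are all assumptions.

```python
import math
import statistics


def variance_per_time_point(replicates):
    """Sample variance across replicates at each time point.

    `replicates` is a list of curves measured at the same time points.
    """
    return [statistics.variance(values) for values in zip(*replicates)]


def gaussian_smooth(values, sigma=1.0):
    """Smooth a sequence with a normalized Gaussian kernel.

    Near the edges, only the available part of the window is used.
    """
    radius = max(1, int(3 * sigma))
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        weights = [math.exp(-0.5 * ((j - i) / sigma) ** 2) for j in range(lo, hi)]
        weighted_sum = sum(w * values[j] for w, j in zip(weights, range(lo, hi)))
        smoothed.append(weighted_sum / sum(weights))
    return smoothed


# Made-up log OD replicates whose spread grows over time.
replicates = [
    [0.00, 0.50, 1.00, 1.50, 1.80],
    [0.00, 0.52, 1.10, 1.70, 2.10],
    [0.00, 0.48, 0.90, 1.30, 1.50],
]

raw_noise = variance_per_time_point(replicates)
empirical_noise = gaussian_smooth(raw_noise, sigma=1.0)
print(empirical_noise)
```

Smoothing keeps the noise estimate from jumping between adjacent time points when only a few replicates are available.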
Where are the linear estimates of AUC, Carrying Capacity, and Death?
Pooling combines the measurements across multiple replicates. These replicates likely have different starting Optical Density (OD) measurements. In order to estimate AUC and Carrying Capacity in linear units of OD or OD x time, AMiGA would need to make assumptions about the starting OD value, which could vary drastically based on the experimental and measurement noise of your samples. Therefore, AMiGA will only provide estimates of AUC, Carrying Capacity, and Death based on the log OD curve. See the section titled “The log-based estimates of AUC, K, and Death are relative!” for more details on how to interpret these estimates.
How can I evaluate the goodness of fit for my growth curves?
For each condition, AMiGA computes a mean squared error (MSE) by comparing model predictions to model input. In particular, the MSE is based on a comparison of GP_Input and GP_Output. See --save-gp-data in Command-line interface for a description of these variables. The MSE is included as a column in the summary file.
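As a worked illustration of the MSE, the sketch below compares two short series standing in for GP_Input and GP_Output. The values are made up and the mean_squared_error helper is hypothetical. It only shows the standard definition of MSE, not AMiGA's own code.

```python
def mean_squared_error(observed, predicted):
    """Average squared difference between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Made-up values standing in for the GP_Input and GP_Output columns.
gp_input = [0.00, 0.10, 0.50, 1.20, 1.80]
gp_output = [0.02, 0.12, 0.45, 1.25, 1.78]

mse = mean_squared_error(gp_input, gp_output)
print(mse)
```

A lower MSE means that the model predictions track the input measurements more closely.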
In addition, the Gaussian Process model optimizes a hyperparameter called Gaussian Noise, \(\sigma_{noise}\). If the user passed the --save-gp-data argument, the predicted Gaussian noise will be included in the _gp_data.txt file. This Gaussian noise is a time-independent parameter and essentially estimates sampling uncertainty. However, be aware that if the user opted to model noise as a time-dependent parameter with the --fix-noise argument, AMiGA will empirically estimate the noise from the measurement variance. In that case, the Gaussian Noise will be negligibly low because the measurement noise is now modelled as another fixed parameter that we refer to as the empirically-estimated measurement noise, \(\sigma(t)\). Whether the user requests to model measurement noise as a time-dependent or time-independent process, both Gaussian noise and empirically-estimated noise will be included as columns in the _gp_data.txt file.