The objective of this tutorial is to show advanced users how to adopt AMiGA code into their own scripts instead of relying on the command-line interface (CLI). Here, I will introduce functions and classes that can be used in a standalone manner for importing data and meta-data, processing growth curves, and fitting growth curves using Gaussian Processes (implemented via the GPy package).
Import Packages
Import AMiGA functions
Read example data
GrowthPlate object
GrowthPlate methods
GrowthModel object
Curve object
Pooling example
Estimating confidence intervals for OD
Estimating confidence intervals for parameters
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Arial'
plt.rcParams['font.size'] = '20'
WARNING: You will have to replace the path to match amiga's location on your local machine.
amiga = '/Users/firasmidani/rab_fm/git/amiga'
sys.path.append(amiga)
from libs.detail import assembleMappings
from libs.read import readPlateReaderFolder
from libs.trim import trimInput, annotateMappings,trimMergeMapping,trimMergeData
from libs.plot import largeTickLabels
from libs.growth import GrowthPlate
from libs.model import GrowthModel
from libs.curve import GrowthCurve
from libs.utils import subsetDf
We will use the "death" example data set included in the GitHub repository for this tutorial.
# define paths to data and mapping folders
path_data = '{}/examples/manuscript/death/data'.format(amiga)
path_mapping = '{}/examples/manuscript/death/mapping'.format(amiga)
Define the plate reader interval (the time between subsequent optical density measurements). You can pass a single default value or file-specific values.
# single default value (units in seconds)
interval = 600
# dictionary that maps each data file name to its time interval value (units in seconds)
interval = {'2020-06-22-R1':600,'2020-06-22-R2':600,'2020-06-22-R3':600}
readPlateReaderFolder()
can be used to parse a single file or multiple files at once. It returns a dictionary with file names as keys and formatted and tabulated data as values.
# read data files
data = readPlateReaderFolder(directory=path_data,interval=interval)
data['2020-06-22-R1']
Time | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | ... | H3 | H4 | H5 | H6 | H7 | H8 | H9 | H10 | H11 | H12 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.105 | 0.099 | 0.096 | 0.100 | 0.099 | 0.094 | 0.094 | 0.080 | 0.082 | ... | 0.070 | 0.071 | 0.073 | 0.069 | 0.065 | 0.070 | 0.069 | 0.069 | 0.071 | 0.073 |
1 | 600.0 | 0.103 | 0.097 | 0.095 | 0.098 | 0.098 | 0.093 | 0.093 | 0.080 | 0.081 | ... | 0.069 | 0.071 | 0.073 | 0.069 | 0.065 | 0.069 | 0.068 | 0.069 | 0.071 | 0.073 |
2 | 1200.0 | 0.102 | 0.097 | 0.095 | 0.098 | 0.098 | 0.093 | 0.092 | 0.080 | 0.080 | ... | 0.069 | 0.071 | 0.073 | 0.068 | 0.065 | 0.070 | 0.069 | 0.069 | 0.071 | 0.073 |
3 | 1800.0 | 0.101 | 0.097 | 0.096 | 0.098 | 0.098 | 0.092 | 0.092 | 0.080 | 0.080 | ... | 0.069 | 0.071 | 0.073 | 0.069 | 0.065 | 0.069 | 0.068 | 0.069 | 0.071 | 0.073 |
4 | 2400.0 | 0.101 | 0.098 | 0.097 | 0.098 | 0.098 | 0.093 | 0.092 | 0.079 | 0.080 | ... | 0.069 | 0.071 | 0.073 | 0.069 | 0.065 | 0.070 | 0.069 | 0.069 | 0.071 | 0.073 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
140 | 84000.0 | 0.342 | 0.210 | 0.149 | 0.381 | 0.390 | 0.494 | 0.497 | 0.373 | 0.385 | ... | 0.074 | 0.071 | 0.073 | 0.070 | 0.066 | 0.069 | 0.068 | 0.069 | 0.071 | 0.073 |
141 | 84600.0 | 0.334 | 0.204 | 0.148 | 0.359 | 0.388 | 0.491 | 0.497 | 0.368 | 0.382 | ... | 0.074 | 0.071 | 0.073 | 0.070 | 0.066 | 0.069 | 0.068 | 0.069 | 0.071 | 0.073 |
142 | 85200.0 | 0.323 | 0.198 | 0.147 | 0.337 | 0.386 | 0.487 | 0.493 | 0.362 | 0.379 | ... | 0.074 | 0.071 | 0.073 | 0.070 | 0.066 | 0.069 | 0.068 | 0.069 | 0.071 | 0.073 |
143 | 85800.0 | 0.313 | 0.194 | 0.147 | 0.317 | 0.385 | 0.481 | 0.491 | 0.356 | 0.376 | ... | 0.074 | 0.071 | 0.073 | 0.070 | 0.066 | 0.069 | 0.068 | 0.068 | 0.071 | 0.073 |
144 | 86400.0 | 0.300 | 0.187 | 0.145 | 0.297 | 0.381 | 0.477 | 0.486 | 0.348 | 0.372 | ... | 0.074 | 0.071 | 0.073 | 0.070 | 0.066 | 0.069 | 0.068 | 0.069 | 0.071 | 0.073 |
145 rows × 97 columns
assembleMappings()
can be used to parse meta-data.
# read mapping files
mappings = assembleMappings(data,path_mapping)
mappings['2020-06-22-R1']
Plate_ID | Substrate | Concentration | Isolate | Ribotype | Group | Control | Flag | Subset | |
---|---|---|---|---|---|---|---|---|---|
Well | |||||||||
A1 | 2020-06-22-R1 | Glucose | Low | CD1007 | RT053 | 1 | 0 | 0 | 1 |
B1 | 2020-06-22-R1 | Glucose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 |
C1 | 2020-06-22-R1 | Fructose | Low | CD1007 | RT053 | 1 | 0 | 0 | 1 |
D1 | 2020-06-22-R1 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 |
E1 | 2020-06-22-R1 | Trehalose | Low | CD1007 | RT053 | 1 | 0 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
D12 | 2020-06-22-R1 | Fructose | High | M120 | RT078 | 12 | 0 | 0 | 1 |
E12 | 2020-06-22-R1 | Trehalose | Low | M120 | RT078 | 12 | 0 | 0 | 1 |
F12 | 2020-06-22-R1 | Trehalose | High | M120 | RT078 | 12 | 0 | 0 | 1 |
G12 | 2020-06-22-R1 | Water | None | M120 | RT078 | 12 | 1 | 0 | 1 |
H12 | 2020-06-22-R1 | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 1 |
96 rows × 9 columns
trimInput()
allows you to merge the data and mapping and also trim or subset the data based on user-defined parameters. The subset entry of the dictionary indicates which conditions to keep. The keys (i.e. Substrate, Concentration, and Isolate) must already be included in the meta-data files as column headers, and the values must be actual values in these respective columns.
# trim samples based on meta-data
params = {
'subset' : {'Substrate':['Water','Fructose'],
'Concentration':['None','High'],
'Isolate':['CD1007']},
'flag' : {},
'hypothesis' : {},
'interval' : {}
}
sub_data,sub_mappings = trimInput(data,mappings,params)
nsamples = sub_mappings.shape[0]
ntimepoints = sub_data.shape[0]
assert nsamples == (sub_data.shape[1] - 1)
print('Shape of Data: {} time points and {} samples'.format(ntimepoints,nsamples))
print('Shape of Mappings: {} samples by {} variables'.format(*sub_mappings.shape))
Shape of Data: 145 time points and 6 samples
Shape of Mappings: 6 samples by 10 variables
full_data,full_mappings = trimInput(data,mappings)
nsamples = full_mappings.shape[0]
ntimepoints = full_data.shape[0]
assert nsamples == (full_data.shape[1] - 1)
print('Shape of Data: {} time points and {} samples'.format(ntimepoints,nsamples))
print('Shape of Mappings: {} samples by {} variables'.format(*full_mappings.shape))
Shape of Data: 145 time points and 288 samples
Shape of Mappings: 288 samples by 10 variables
AMiGA formats the data file as follows:
sub_data
Sample_ID | Time | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|---|
0 | 0.0 | 0.101 | 0.096 | 0.088 | 0.090 | 0.101 | 0.091 |
1 | 600.0 | 0.099 | 0.095 | 0.087 | 0.089 | 0.098 | 0.089 |
2 | 1200.0 | 0.097 | 0.094 | 0.087 | 0.088 | 0.099 | 0.089 |
3 | 1800.0 | 0.097 | 0.094 | 0.087 | 0.088 | 0.096 | 0.088 |
4 | 2400.0 | 0.098 | 0.094 | 0.087 | 0.088 | 0.098 | 0.088 |
... | ... | ... | ... | ... | ... | ... | ... |
140 | 84000.0 | 0.494 | 0.226 | 0.438 | 0.158 | 0.450 | 0.207 |
141 | 84600.0 | 0.497 | 0.227 | 0.437 | 0.157 | 0.450 | 0.208 |
142 | 85200.0 | 0.488 | 0.229 | 0.435 | 0.157 | 0.454 | 0.208 |
143 | 85800.0 | 0.498 | 0.227 | 0.435 | 0.157 | 0.451 | 0.207 |
144 | 86400.0 | 0.494 | 0.231 | 0.433 | 0.156 | 0.447 | 0.208 |
145 rows × 7 columns
AMiGA formats the mapping file as follows:
sub_mappings
Well | Plate_ID | Substrate | Concentration | Isolate | Ribotype | Group | Control | Flag | Subset | |
---|---|---|---|---|---|---|---|---|---|---|
Sample_ID | ||||||||||
0 | D1 | 2020-06-22-R1 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 |
1 | G1 | 2020-06-22-R1 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 |
2 | D1 | 2020-06-22-R2 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 |
3 | G1 | 2020-06-22-R2 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 |
4 | D1 | 2020-06-22-R3 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 |
5 | G1 | 2020-06-22-R3 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 |
You can easily plot the raw data
The three curves with low OD correspond to growth on minimal media only while the three curves with high OD correspond to growth on minimal media supplemented with fructose.
fig,ax = plt.subplots(figsize=[6,4])
# the first column of sub_data is time, and remaining columns are sample-specific
ax.plot(sub_data.Time,sub_data.iloc[:,1:]);
ax.set(**{'xlabel':'Time (seconds)','ylabel':'OD'});
GrowthPlate
is a class that allows you to save data in a standard format amenable to several transformations.
The main attributes of GrowthPlate
are time
, data
, and key
.
See ?GrowthPlate
for more info.
plate = GrowthPlate(data=sub_data,key=sub_mappings)
plate
<libs.growth.GrowthPlate at 0x7fd7fdefcf10>
plate.time
Time | |
---|---|
0 | 0.0 |
1 | 600.0 |
2 | 1200.0 |
3 | 1800.0 |
4 | 2400.0 |
... | ... |
140 | 84000.0 |
141 | 84600.0 |
142 | 85200.0 |
143 | 85800.0 |
144 | 86400.0 |
145 rows × 1 columns
plate.data
Sample_ID | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | 0.101 | 0.096 | 0.088 | 0.090 | 0.101 | 0.091 |
1 | 0.099 | 0.095 | 0.087 | 0.089 | 0.098 | 0.089 |
2 | 0.097 | 0.094 | 0.087 | 0.088 | 0.099 | 0.089 |
3 | 0.097 | 0.094 | 0.087 | 0.088 | 0.096 | 0.088 |
4 | 0.098 | 0.094 | 0.087 | 0.088 | 0.098 | 0.088 |
... | ... | ... | ... | ... | ... | ... |
140 | 0.494 | 0.226 | 0.438 | 0.158 | 0.450 | 0.207 |
141 | 0.497 | 0.227 | 0.437 | 0.157 | 0.450 | 0.208 |
142 | 0.488 | 0.229 | 0.435 | 0.157 | 0.454 | 0.208 |
143 | 0.498 | 0.227 | 0.435 | 0.157 | 0.451 | 0.207 |
144 | 0.494 | 0.231 | 0.433 | 0.156 | 0.447 | 0.208 |
145 rows × 6 columns
plate.key
Well | Plate_ID | Substrate | Concentration | Isolate | Ribotype | Group | Control | Flag | Subset | |
---|---|---|---|---|---|---|---|---|---|---|
Sample_ID | ||||||||||
0 | D1 | 2020-06-22-R1 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 |
1 | G1 | 2020-06-22-R1 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 |
2 | D1 | 2020-06-22-R2 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 |
3 | G1 | 2020-06-22-R2 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 |
4 | D1 | 2020-06-22-R3 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 |
5 | G1 | 2020-06-22-R3 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 |
You can compute Min OD, Max OD, and Initial OD
# compute min od, max od
plate.computeBasicSummary()
plate.key
Well | Plate_ID | Substrate | Concentration | Isolate | Ribotype | Group | Control | Flag | Subset | OD_Baseline | OD_Min | OD_Max | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sample_ID | |||||||||||||
0 | D1 | 2020-06-22-R1 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 | 0.101 | 0.097 | 0.590 |
1 | G1 | 2020-06-22-R1 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 | 0.096 | 0.093 | 0.231 |
2 | D1 | 2020-06-22-R2 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 | 0.088 | 0.087 | 0.539 |
3 | G1 | 2020-06-22-R2 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 | 0.090 | 0.088 | 0.201 |
4 | D1 | 2020-06-22-R3 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 | 0.101 | 0.096 | 0.563 |
5 | G1 | 2020-06-22-R3 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 | 0.091 | 0.087 | 0.210 |
You can compute Fold Change relative to control wells
In the meta-data files, I manually added Group
and Control
columns to indicate how fold-changes should be calculated. For each unique Group, there must be at least one Control sample.
plate.computeFoldChange(subtract_baseline=True)
plate.key
Well | Plate_ID | Substrate | Concentration | Isolate | Ribotype | Group | Control | Flag | Subset | OD_Baseline | OD_Min | OD_Max | Fold_Change | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sample_ID | ||||||||||||||
0 | D1 | 2020-06-22-R1 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 | 0.101 | 0.097 | 0.590 | 3.622222 |
1 | G1 | 2020-06-22-R1 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 | 0.096 | 0.093 | 0.231 | 1.000000 |
2 | D1 | 2020-06-22-R2 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 | 0.088 | 0.087 | 0.539 | 4.063063 |
3 | G1 | 2020-06-22-R2 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 | 0.090 | 0.088 | 0.201 | 1.000000 |
4 | D1 | 2020-06-22-R3 | Fructose | High | CD1007 | RT053 | 1 | 0 | 0 | 1 | 0.101 | 0.096 | 0.563 | 3.882353 |
5 | G1 | 2020-06-22-R3 | Water | None | CD1007 | RT053 | 1 | 1 | 0 | 1 | 0.091 | 0.087 | 0.210 | 1.000000 |
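As a sanity check, the Fold_Change values above are consistent with dividing each well's baseline-subtracted maximum OD by that of its group's control well (this is a hand computation against the table, not a quote of AMiGA's source):

```python
# values copied from the table above (plate 2020-06-22-R1)
od_case = 0.590 - 0.101   # OD_Max - OD_Baseline for the Fructose well D1
od_ctrl = 0.231 - 0.096   # OD_Max - OD_Baseline for its Water control G1
fold_change = od_case / od_ctrl
print(round(fold_change, 6))  # 3.622222, matching the Fold_Change column
```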
You can adjust the units of time
# convert time to desired units
plate.convertTimeUnits(input='seconds',output='hours')
plate.time
Time | |
---|---|
0 | 0.000000 |
1 | 0.166667 |
2 | 0.333333 |
3 | 0.500000 |
4 | 0.666667 |
... | ... |
140 | 23.333333 |
141 | 23.500000 |
142 | 23.666667 |
143 | 23.833333 |
144 | 24.000000 |
145 rows × 1 columns
fig,ax = plt.subplots()
ax.plot(plate.time,plate.data)
ax.set_title('Time converted from seconds to hours');
ax.set(**{'xlabel':'Time (hours)','ylabel':'OD'});
If your measurements include negative or zero values, AMiGA can raise them above zero
Growth curve analysis requires log transformation, which cannot handle non-positive values. In some cases, OD (or other measurements) may be zero or negative. This step avoids numerical issues by raising such values to the minimum positive value in the curve.
plate.raiseData()
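One plausible reading of that description, sketched in plain numpy (an illustration of the idea, not necessarily AMiGA's exact implementation):

```python
import numpy as np

od = np.array([-0.01, 0.00, 0.02, 0.05, 0.10])
min_pos = od[od > 0].min()                 # smallest strictly positive value
od_raised = np.where(od > 0, od, min_pos)  # lift non-positive values up to it
```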
You can then log-transform your curves
plate.logData()
plate.data
Sample_ID | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | -2.292635 | -2.343407 | -2.430418 | -2.407946 | -2.292635 | -2.396896 |
1 | -2.312635 | -2.353878 | -2.441847 | -2.419119 | -2.322788 | -2.419119 |
2 | -2.333044 | -2.364460 | -2.441847 | -2.430418 | -2.312635 | -2.419119 |
3 | -2.333044 | -2.364460 | -2.441847 | -2.430418 | -2.343407 | -2.430418 |
4 | -2.322788 | -2.364460 | -2.441847 | -2.430418 | -2.322788 | -2.430418 |
... | ... | ... | ... | ... | ... | ... |
140 | -0.705220 | -1.487220 | -0.825536 | -1.845160 | -0.798508 | -1.575036 |
141 | -0.699165 | -1.482805 | -0.827822 | -1.851509 | -0.798508 | -1.570217 |
142 | -0.717440 | -1.474033 | -0.832409 | -1.851509 | -0.789658 | -1.570217 |
143 | -0.697155 | -1.482805 | -0.832409 | -1.851509 | -0.796288 | -1.575036 |
144 | -0.705220 | -1.465338 | -0.837018 | -1.857899 | -0.805197 | -1.570217 |
145 rows × 6 columns
fig,ax = plt.subplots()
ax.plot(plate.time,plate.data)
ax.set_title('Log-transformed OD');
ax.set(**{'xlabel':'Time (hours)','ylabel':'log OD'});
Finally, you can adjust the initial OD so that it is centered at zero
This can be done in two ways:
- With poly=False, for each curve, simply subtract the OD at the first time point from all subsequent time points.
- With poly=True, for a group of curves, fit the first 5 time points to a polynomial of degree 3, estimate the initial OD based on this fit, then subtract this value from the OD at all time points for all curves.

Either way, recall that subtraction in log-space is equivalent to division in linear space.
$\log\left[OD(t)\right] - \log\left[OD(0)\right] = \log\left(\frac{OD(t)}{OD(0)}\right)$
This may not seem obvious but it affects the interpretation of your growth curves and growth parameters as I discuss elsewhere.
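A quick numerical check of that identity:

```python
import numpy as np

od0, odt = 0.1, 0.4
lhs = np.log(odt) - np.log(od0)  # subtraction in log-space
rhs = np.log(odt / od0)          # division in linear space, then log
assert np.isclose(lhs, rhs)      # both equal log(4)
```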
plate.subtractBaseline(True,poly=True,groupby=['Substrate'])
fig,ax = plt.subplots()
ax.plot(plate.time,plate.data)
ax.set_title('Baseline corrected');
ax.set(**{'xlabel':'Time (hours)','ylabel':'log OD(t) - log OD(0)'});
GrowthModel
class allows you to model a growth curve (an individual curve or a group of replicates) as a Gaussian Process.
see ?GrowthModel
for more info.
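To build intuition for what GrowthModel does under the hood, here is a minimal numpy sketch of Gaussian Process regression with an RBF kernel on a toy log-OD curve. This illustrates the posterior mean only; AMiGA itself relies on GPy, and the hyperparameter values below are arbitrary choices, not fitted values.

```python
import numpy as np

def rbf(x1, x2, variance=1.0, lengthscale=2.0):
    # squared-exponential (RBF) kernel, the same family used by the models above
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# toy sigmoid-like log-OD curve standing in for real measurements
x = np.linspace(0, 24, 25)
y = np.log(0.1 + 0.9 / (1 + np.exp(-(x - 8.0)))) - np.log(0.1)

noise = 1e-3                                # observation noise variance
K = rbf(x, x) + noise * np.eye(len(x))      # covariance of training inputs
xs = np.linspace(0, 24, 100)                # prediction grid
mu = rbf(xs, x) @ np.linalg.solve(K, y)     # GP posterior mean at xs
```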
tmp = plate.time.join(plate.data)
tmp = tmp.melt(id_vars='Time',var_name='Sample_ID',value_name='OD')
tmp = tmp.drop(['Sample_ID'],axis=1) # T*R x 2 (where R is number of replicates)
tmp = tmp.dropna()
tmp
Time | OD | |
---|---|---|
0 | 0.000000 | -0.000442 |
1 | 0.166667 | -0.020443 |
2 | 0.333333 | -0.040852 |
3 | 0.500000 | -0.040852 |
4 | 0.666667 | -0.030595 |
... | ... | ... |
865 | 23.333333 | 0.822661 |
866 | 23.500000 | 0.827480 |
867 | 23.666667 | 0.827480 |
868 | 23.833333 | 0.822661 |
869 | 24.000000 | 0.827480 |
870 rows × 2 columns
Above, we combined time and OD into a single pandas.DataFrame, then melted it into long form where the columns include the measured (dependent) variable OD and the explanatory (independent) variable Time. You can also pass additional categorical covariates as extra columns with arbitrary names.
tmp = plate.time.join(plate.data)
tmp = tmp.melt(id_vars='Time',var_name='Sample_ID',value_name='OD')
# let's include substrate label as an explanatory variable
tmp.Sample_ID = tmp.Sample_ID.replace(plate.key.loc[:,'Substrate'].to_dict())
# variables must be numerical so I am converting substrate labels to numbers
tmp.Sample_ID = tmp.Sample_ID.replace({'Fructose':1,'Water':0})
# avoid any missing data
tmp = tmp.dropna()
# unnecessary, but to make it clear, I am updating column headers
tmp.columns = ['Time','Substrate','OD']
tmp
Time | Substrate | OD | |
---|---|---|---|
0 | 0.000000 | 1 | -0.000442 |
1 | 0.166667 | 1 | -0.020443 |
2 | 0.333333 | 1 | -0.040852 |
3 | 0.500000 | 1 | -0.040852 |
4 | 0.666667 | 1 | -0.030595 |
... | ... | ... | ... |
865 | 23.333333 | 0 | 0.822661 |
866 | 23.500000 | 0 | 0.827480 |
867 | 23.666667 | 0 | 0.827480 |
868 | 23.833333 | 0 | 0.822661 |
869 | 24.000000 | 0 | 0.827480 |
870 rows × 3 columns
We will later return to multi-dimensional input (see the Pooling example). For now, we will analyze the simple model where time is the only independent variable. We will also analyze a single growth curve, and later attempt a model that pools replicate samples.
tmp.iloc[0:145,:]
Time | Substrate | OD | |
---|---|---|---|
0 | 0.000000 | 1 | -0.000442 |
1 | 0.166667 | 1 | -0.020443 |
2 | 0.333333 | 1 | -0.040852 |
3 | 0.500000 | 1 | -0.040852 |
4 | 0.666667 | 1 | -0.030595 |
... | ... | ... | ... |
140 | 23.333333 | 1 | 1.586973 |
141 | 23.500000 | 1 | 1.593027 |
142 | 23.666667 | 1 | 1.574752 |
143 | 23.833333 | 1 | 1.595037 |
144 | 24.000000 | 1 | 1.586973 |
145 rows × 3 columns
gm = GrowthModel(df=tmp.iloc[0:145,:].drop('Substrate',axis=1),ARD=True)
gm
<libs.model.GrowthModel at 0x7fd7fecbd100>
df
attribute contains the input data
gm.df
Time | OD | |
---|---|---|
0 | 0.000000 | -0.000442 |
1 | 0.166667 | -0.020443 |
2 | 0.333333 | -0.040852 |
3 | 0.500000 | -0.040852 |
4 | 0.666667 | -0.030595 |
... | ... | ... |
140 | 23.333333 | 1.586973 |
141 | 23.500000 | 1.593027 |
142 | 23.666667 | 1.574752 |
143 | 23.833333 | 1.595037 |
144 | 24.000000 | 1.586973 |
145 rows × 2 columns
x
attribute is the model input for independent variables
gm.x
array([[ 0.        ],
       [ 0.16666667],
       [ 0.33333333],
       ...,
       [23.66666667],
       [23.83333333],
       [24.        ]])
y
attribute is the model input for dependent variable
gm.y
array([[-4.42420686e-04],
       [-2.04430874e-02],
       [-4.08519590e-02],
       ...,
       [ 1.57475247e+00],
       [ 1.59503714e+00],
       [ 1.58697258e+00]])
If GrowthModel
recognizes that there are additional variables beyond time, it also computes the measurement variance across all replicate samples. Here, it would compute the measurement variance for growth on fructose and the measurement variance for growth on water. The errors are organized in long form like the tmp
dataframe.
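The variance computation itself is straightforward to sketch: group the long-form table by the covariates and take the variance of OD within each group (hypothetical data below, for illustration only):

```python
import pandas as pd

# hypothetical long-form table: three replicate OD readings per time point
df = pd.DataFrame({'Time': [0, 0, 0, 1, 1, 1],
                   'OD':   [0.10, 0.11, 0.09, 0.20, 0.22, 0.18]})
var_by_time = df.groupby('Time')['OD'].var()  # sample variance per time point
```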
gm.error
array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]])
You can now fit the model using GPy
in the backend
gm.fit()
gm.model
Model: GP regression
Objective: -360.8961273489176
Number of Parameters: 3
Number of Optimization Parameters: 3
Updates: True
GP_regression. | value | constraints | priors |
---|---|---|---|
rbf.variance | 0.8446783061070134 | +ve | |
rbf.lengthscale | 2.792099660898179 | +ve | |
Gaussian_noise.variance | 0.00016762324874182143 | +ve |
The GrowthModel().run()
method not only fits the GrowthModel
, but also predicts the growth curve, its first-order derivative, and its second-order derivative.
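If you only need a rough look at the derivatives, you can approximate them with finite differences on any curve. The toy sigmoid below is a stand-in for a real log-OD fit; this is not how AMiGA computes its GP-based derivatives.

```python
import numpy as np

# toy log-OD curve standing in for a fitted growth curve
t = np.linspace(0, 24, 145)
log_od = np.log(0.1 + 0.9 / (1 + np.exp(-(t - 8.0)))) - np.log(0.1)

d1 = np.gradient(log_od, t)   # first-order derivative (growth rate)
d2 = np.gradient(d1, t)       # second-order derivative (acceleration)
gr = d1.max()                 # maximum growth rate
t_gr = t[d1.argmax()]         # time at which it occurs
```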
curve = gm.run()
curve
<libs.curve.GrowthCurve at 0x7fd7fed8eee0>
Because we may jointly model multiple curves with different initial values (baselines), GrowthModel
will assume that the baseline is zero.
$\log\left[OD(t)\right] - \log\left[OD(0)\right] = 0$
This corresponds to
$\frac{OD(t)}{OD(0)}=1$
so the growth parameter estimates for AUC, carrying capacity, and death will be relative to the starting OD. See below for more details.
curve.baseline
1.0
The data
method summarizes different formulations of the input and predicted data.
curve.data()
Time | GP_Input | GP_Output | GP_Derivative | OD_Growth_Data | OD_Growth_Fit | OD_Fit | |
---|---|---|---|---|---|---|---|
0 | 0.000000 | -0.000442 | -0.014018 | -0.039502 | 0.000000 | 0.000000 | 0.986080 |
1 | 0.166667 | -0.020443 | -0.020189 | -0.034521 | -0.019793 | -0.006067 | 0.980013 |
2 | 0.333333 | -0.040852 | -0.025511 | -0.029305 | -0.039586 | -0.011269 | 0.974811 |
3 | 0.500000 | -0.040852 | -0.029942 | -0.023812 | -0.039586 | -0.015578 | 0.970502 |
4 | 0.666667 | -0.030595 | -0.033431 | -0.018004 | -0.029690 | -0.018959 | 0.967121 |
... | ... | ... | ... | ... | ... | ... | ... |
140 | 23.333333 | 1.586973 | 1.592528 | -0.009288 | 3.889368 | 3.930081 | 4.916161 |
141 | 23.500000 | 1.593027 | 1.590837 | -0.011144 | 3.919058 | 3.921775 | 4.907855 |
142 | 23.666667 | 1.574752 | 1.588765 | -0.013880 | 3.829988 | 3.911614 | 4.897694 |
143 | 23.833333 | 1.595037 | 1.586157 | -0.017576 | 3.928954 | 3.898860 | 4.884940 |
144 | 24.000000 | 1.586973 | 1.582849 | -0.022300 | 3.889368 | 3.882726 | 4.868806 |
145 rows × 7 columns
tmpc = curve.data()
fig,axes = plt.subplots(3,1,figsize=[8,13])
axes = np.ravel(axes)
ax = axes[0]
ax.plot(tmpc.Time,tmpc.GP_Input,label='GP_Input',color='black',lw=5)
ax.plot(tmpc.Time,tmpc.GP_Output,label='GP_Output',color='red',lw=2.5,ls='--')
ax.set_ylabel('ln OD')
ax.legend()
ax = axes[1]
ax.plot(tmpc.Time,tmpc.GP_Derivative,label='GP_Derivative',color='black',lw=5)
ax.set_ylabel('d/dt ln OD')
ax.legend()
ax = axes[2]
ax.plot(tmpc.Time,tmpc.OD_Growth_Data,label='OD_Growth_Data',color='black',lw=5)
ax.plot(tmpc.Time,tmpc.OD_Growth_Fit,label='OD_Growth_Fit',color='red',lw=2.5,ls='--')
ax.plot(tmpc.Time,tmpc.OD_Fit,label='OD_Fit',color='blue',lw=2.5,ls='--')
ax.legend()
ax.set_xlabel('Time (hours)')
ax.set_ylabel('OD')
Text(0, 0.5, 'OD')
Growth parameters are stored in a dictionary
diauxie
is a binary variable that indicates whether a diauxic shift was detected (1) or not (0).
df_dx
is a pandas.DataFrame
that contains one row for each unique growth phase with columns being the growth parameters. If there is no diauxie, the dataframe has a single row.
curve.params
{'auc_lin': 76.27266003059205,
 'auc_log': 30.23957749859053,
 'k_lin': 4.794029686752292,
 'k_log': 1.754422704363531,
 't_k': 9.666666666666666,
 'gr': 0.46112918038997325,
 'dr': -0.022300089604496882,
 'td': 1.5031518499301135,
 't_gr': 6.5,
 't_dr': 24.0,
 'death_lin': 0.9113038876376032,
 'death_log': 0.17157395980796863,
 'lagC': 4.33886417420166,
 'lagP': 1.5,
 't0': 0.0,
 'tf': 24.0,
 'diauxie': 0,
 'df_dx':
    dx_auc_lin  dx_auc_log  dx_k_lin  dx_k_log    dx_t_k     dx_gr   dx_dr  \
 0    76.27266   30.239577   4.79403  1.754423  9.666667  0.461129 -0.0223

       dx_td  dx_t_gr  dx_t_dr  dx_death_lin  dx_death_log   dx_lagC  dx_lagP  \
 0  1.503152      6.5     24.0      0.911304      0.171574  4.338864      1.5

    dx_t0  dx_tf
 0    0.0   24.0  }
curve.params['df_dx']
dx_auc_lin | dx_auc_log | dx_k_lin | dx_k_log | dx_t_k | dx_gr | dx_dr | dx_td | dx_t_gr | dx_t_dr | dx_death_lin | dx_death_log | dx_lagC | dx_lagP | dx_t0 | dx_tf | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 76.27266 | 30.239577 | 4.79403 | 1.754423 | 9.666667 | 0.461129 | -0.0223 | 1.503152 | 6.5 | 24.0 | 0.911304 | 0.171574 | 4.338864 | 1.5 | 0.0 | 24.0 |
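Some of these parameters are related by simple identities you can verify by hand. For example, the doubling time td in the output above equals ln(2) divided by the maximum growth rate gr (an observation about the numbers shown, consistent with the usual exponential-growth relation):

```python
import numpy as np

gr = 0.46112918038997325   # maximum growth rate from curve.params above
td = np.log(2) / gr        # implied doubling time
print(round(td, 6))        # 1.503152, matching curve.params['td']
```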
We are using the tmp variable from earlier
tmp
Time | Substrate | OD | |
---|---|---|---|
0 | 0.000000 | 1 | -0.000442 |
1 | 0.166667 | 1 | -0.020443 |
2 | 0.333333 | 1 | -0.040852 |
3 | 0.500000 | 1 | -0.040852 |
4 | 0.666667 | 1 | -0.030595 |
... | ... | ... | ... |
865 | 23.333333 | 0 | 0.822661 |
866 | 23.500000 | 0 | 0.827480 |
867 | 23.666667 | 0 | 0.827480 |
868 | 23.833333 | 0 | 0.822661 |
869 | 24.000000 | 0 | 0.827480 |
870 rows × 3 columns
conditions = tmp.drop(['Time','OD'],axis=1).drop_duplicates().reset_index(drop=True)
conditions
Substrate | |
---|---|
0 | 1 |
1 | 0 |
gms = {}
curves = {}
for idx,row in conditions.iterrows():
print(row.to_dict())
df_cond = subsetDf(tmp,row.to_dict())
df_cond = df_cond.drop(labels=row.keys(),axis=1)
gms[idx] = GrowthModel(df=df_cond,ARD=True)
curves[idx] = gms[idx].run()
print(gms[idx].model)
print()
{'Substrate': 1}

Name : GP regression
Objective : -672.7603429906137
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
GP_regression. | value | constraints | priors
rbf.variance | 0.8872139291158524 | +ve |
rbf.lengthscale | 3.2030737519782266 | +ve |
Gaussian_noise.variance | 0.0021326326464497967 | +ve |

{'Substrate': 0}

Name : GP regression
Objective : -567.6865741276025
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
GP_regression. | value | constraints | priors
rbf.variance | 0.21469910943594195 | +ve |
rbf.lengthscale | 6.450779141771983 | +ve |
Gaussian_noise.variance | 0.003895365122942465 | +ve |
curves[0].data()
Time | GP_Input | GP_Output | GP_Derivative | OD_Growth_Data | OD_Growth_Fit | OD_Fit | |
---|---|---|---|---|---|---|---|
0 | 0.000000 | -0.000442 | -0.005236 | -0.051763 | 0.000000 | 0.000000 | 0.994777 |
1 | 0.166667 | -0.020443 | -0.013150 | -0.043123 | -0.019793 | -0.007841 | 0.986936 |
2 | 0.333333 | -0.040852 | -0.019590 | -0.034117 | -0.039586 | -0.014177 | 0.980600 |
3 | 0.500000 | -0.040852 | -0.024513 | -0.024943 | -0.039586 | -0.018992 | 0.975785 |
4 | 0.666667 | -0.030595 | -0.027906 | -0.015785 | -0.029690 | -0.022297 | 0.972480 |
... | ... | ... | ... | ... | ... | ... | ... |
430 | NaN | 1.496606 | NaN | NaN | 3.466948 | NaN | NaN |
431 | NaN | 1.496606 | NaN | NaN | 3.466948 | NaN | NaN |
432 | NaN | 1.505456 | NaN | NaN | 3.506650 | NaN | NaN |
433 | NaN | 1.498826 | NaN | NaN | 3.476873 | NaN | NaN |
434 | NaN | 1.489917 | NaN | NaN | 3.437171 | NaN | NaN |
435 rows × 7 columns
Similar to the uni-dimensional data, the data method summarizes different formulations of the input and predicted data. However, here GP_Input and OD_Growth_Data are n-times larger than the other variables, where n is the number of pooled samples or pooled replicates.
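As a self-contained illustration of this shape (synthetic data, not actual AMiGA output), the ratio of total rows to non-NaN Time rows recovers n, which is exactly how n_samples is computed further below:

```python
import numpy as np
import pandas as pd

# synthetic stand-in for curves[0].data(): 3 pooled replicates of 5 time points;
# Time is only defined once, so the remaining rows carry NaN
n, t = 3, 5
df = pd.DataFrame({
    'Time': list(np.linspace(0, 24, t)) + [np.nan] * (t * (n - 1)),
    'GP_Input': np.random.rand(t * n),  # n-times longer than the non-NaN Time column
})

n_samples = df.shape[0] // df.dropna().shape[0]
print(n_samples)  # 3
```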
fig,ax = plt.subplots(figsize=[6,3])
ax.plot(curves[0].data().GP_Input)
ax.plot(curves[0].data().OD_Growth_Data)
[<matplotlib.lines.Line2D at 0x7fd7fe904550>]
Plotting these columns directly strings the pooled replicates together into a single trace. Below, we reshape the data so that each replicate is plotted against time separately.
n_samples = int(curves[0].data().shape[0] / curves[0].data().dropna().shape[0])
time = list(curves[0].data().Time.dropna())
# create dummy Sample ID variable to simplify plotting
sample_id = np.ravel([[ii]*len(time) for ii in range(n_samples)])
# repeat time for number of samples
time = time * n_samples
# put together into a pandas.DataFrame to simplify plotting.
tmpc = pd.DataFrame([sample_id,time,curves[0].data().GP_Input,curves[0].data().OD_Growth_Data],
index=['Sample_ID','Time','GP_Input','OD_Growth_Data']).T
fig,ax = plt.subplots(figsize=[6,4])
tmp_gp_input = tmpc.pivot(index='Time',columns='Sample_ID',values='GP_Input')
tmp_od_growth_data = tmpc.pivot(index='Time',columns='Sample_ID',values='OD_Growth_Data')
ax.plot(tmp_od_growth_data.index.values,tmp_od_growth_data.values,
color=(0,0,1,0.8),label='Input Data');
ax.plot(tmp_gp_input.index.values, tmp_gp_input.values,
color=(0,0,0,0.8),label='Log Input Data');
ax.set_xlabel('Time (Hours)',fontsize=20)
ax.set_ylabel('log OD',fontsize=20)
largeTickLabels(ax,fontsize=20)
ax.legend(bbox_to_anchor=(1.05,.85),fontsize=15)
<matplotlib.legend.Legend at 0x7fd7fdda9070>
However, we are often interested only in the model estimates, which we can store in a single pandas.DataFrame:
data_ls = []
for idx,condition in conditions.iterrows():
time = pd.DataFrame(gms[idx].x_new,columns=['Time'])
# mean and covariance of the latent function
mu0, var0 = np.ravel(gms[idx].y0), np.ravel(np.diag(gms[idx].cov0))
# mean and covariance of the derivative latent function
mu1, var1 = np.ravel(gms[idx].y1), np.ravel(np.diag(gms[idx].cov1))
# Gaussian noise hyperparameter, i.e. time-independent noise term
sigma_noise = np.ravel([gms[idx].noise]*time.shape[0])
mu_var = pd.DataFrame([mu0,var0,mu1,var1,sigma_noise],
index=['mu','Sigma','mu1','Sigma1','Noise']).T
gp_data = pd.DataFrame([list(condition.values)]*len(mu0),columns=condition.keys())
gp_data = gp_data.join(time).join(mu_var)
data_ls.append(gp_data)
gp_data = pd.concat(data_ls,sort=False).reset_index(drop=True)
gp_data
Substrate | Time | mu | Sigma | mu1 | Sigma1 | Noise | |
---|---|---|---|---|---|---|---|
0 | 1 | 0.000000 | -0.005236 | 0.000311 | -0.051763 | 0.001507 | 0.002133 |
1 | 1 | 0.166667 | -0.013150 | 0.000180 | -0.043123 | 0.000986 | 0.002133 |
2 | 1 | 0.333333 | -0.019590 | 0.000117 | -0.034117 | 0.000624 | 0.002133 |
3 | 1 | 0.500000 | -0.024513 | 0.000091 | -0.024943 | 0.000384 | 0.002133 |
4 | 1 | 0.666667 | -0.027906 | 0.000083 | -0.015785 | 0.000235 | 0.002133 |
... | ... | ... | ... | ... | ... | ... | ... |
285 | 0 | 23.333333 | 0.747025 | 0.000114 | -0.013299 | 0.000111 | 0.003895 |
286 | 0 | 23.500000 | 0.744732 | 0.000139 | -0.014223 | 0.000133 | 0.003895 |
287 | 0 | 23.666667 | 0.742283 | 0.000172 | -0.015165 | 0.000158 | 0.003895 |
288 | 0 | 23.833333 | 0.739676 | 0.000217 | -0.016125 | 0.000187 | 0.003895 |
289 | 0 | 24.000000 | 0.736907 | 0.000274 | -0.017100 | 0.000220 | 0.003895 |
290 rows × 7 columns
Above, we have a dataframe with meta-data columns that indicate the unique conditions (here, Substrate) and columns that hold the model predictions: mu and Sigma are the mean and variance of the latent function, mu1 and Sigma1 are the mean and variance of its derivative, and Noise is the time-independent Gaussian noise hyperparameter.
def mean_and_bands(ax,df,add_noise=True,derivative=False,confidence=95):
alpha = (100 - confidence)/100
z_value = (1 - alpha/2)
from scipy.stats import norm
scaler = norm.ppf(z_value) # z-score for the requested confidence level
if not derivative:
mu = df.mu
cov = df.Sigma
else:
mu = df.mu1
cov = df.Sigma1
if add_noise:
Sigma = cov + df.Noise
else:
Sigma = cov
x_time = df.Time
y_avg = mu
y_low = mu - scaler*np.sqrt(Sigma)
y_upp = mu + scaler*np.sqrt(Sigma)
return (x_time,y_avg,y_low,y_upp)
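The scaler arithmetic inside mean_and_bands can be checked on its own with synthetic numbers (the mu and Sigma values below are illustrative, not model output):

```python
import numpy as np
from scipy.stats import norm

confidence = 95
alpha = (100 - confidence) / 100
scaler = norm.ppf(1 - alpha / 2)  # two-sided z-score, ~1.96 for 95% confidence

mu, Sigma = 0.5, 0.01             # synthetic mean and variance
low = mu - scaler * np.sqrt(Sigma)
upp = mu + scaler * np.sqrt(Sigma)
print(round(scaler, 2), round(low, 3), round(upp, 3))  # 1.96 0.304 0.696
```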
fig,ax = plt.subplots(figsize=[6,3])
for idx, row in conditions.iterrows():
tmp_gp = subsetDf(gp_data,row.to_dict())
x_time,y_avg,y_low,y_upp = mean_and_bands(ax,tmp_gp,add_noise=True)
ax.plot(x_time,y_avg,label=row.to_dict())
ax.fill_between(x=x_time,y1=y_low,y2=y_upp,alpha=0.1)
ax.legend(bbox_to_anchor=(1.05,0.65))
ax.set_xlabel('Time (Hours)',fontsize=15)
ax.set_ylabel('log OD',fontsize=15)
largeTickLabels(ax,fontsize=15)
ax.legend(loc=4,fontsize=15)
ax.set_title('Latent Function',fontsize=15);
This is how you can plot the predicted derivative of the latent function:
fig,ax = plt.subplots(figsize=[6,3])
for idx, row in conditions.iterrows():
tmp_gp = subsetDf(gp_data,row.to_dict())
x_time,y_avg,y_low,y_upp = mean_and_bands(ax,tmp_gp,add_noise=True,derivative=True)
ax.plot(x_time,y_avg,label=row.to_dict())
ax.fill_between(x=x_time,y1=y_low,y2=y_upp,alpha=0.1)
ax.legend(bbox_to_anchor=(1.05,0.65))
ax.set_xlabel('Time (Hours)',fontsize=15)
ax.set_ylabel('d(log OD)/dt',fontsize=15)
largeTickLabels(ax,fontsize=15)
ax.legend(loc=1,fontsize=15)
ax.set_title('Derivative of Latent Function',fontsize=15);
Here are the growth parameters: you can extract only the point estimates (params), or call sample() to obtain posterior means and standard deviations.
curves[0].params
{'auc_lin': 74.08227944357785,
 'auc_log': 29.9000213026098,
 'k_lin': 4.826653566677286,
 'k_log': 1.7615460660033073,
 't_k': 9.833333333333334,
 'gr': 0.4351388239735125,
 'dr': -0.031763527580060735,
 'td': 1.5929334326696118,
 't_gr': 6.5,
 't_dr': 10.666666666666666,
 'death_lin': 1.0837211996563036,
 'death_log': 0.20599226291130046,
 'lagC': 4.2228335721544195,
 'lagP': 1.3333333333333333,
 't0': 0.0,
 'tf': 24.0,
 'diauxie': 0,
 'df_dx':
    dx_auc_lin  dx_auc_log  dx_k_lin  dx_k_log    dx_t_k     dx_gr     dx_dr  \
 0   74.082279   29.900021  4.826654  1.761546  9.833333  0.435139 -0.031764
       dx_td  dx_t_gr    dx_t_dr  dx_death_lin  dx_death_log   dx_lagC  \
 0  1.592933      6.5  10.666667      1.083721      0.205992  4.222834
     dx_lagP  dx_t0  dx_tf
 0  1.333333    0.0   24.0 }
curves[0].sample().posterior
{'mean(auc_lin)': 74.10920229508673, 'mean(auc_log)': 29.909787281052278,
 'mean(k_lin)': 4.826549276315117, 'mean(k_log)': 1.7616705912045432,
 'mean(t_k)': 9.77, 'mean(gr)': 0.43516704639222903,
 'mean(dr)': -0.04250706437113829, 'mean(td)': 1.5933109835105137,
 'mean(t_gr)': 6.468333333333334, 'mean(t_dr)': 16.451666666666668,
 'mean(death_lin)': 1.0807124949457554, 'mean(death_log)': 0.20545658824209517,
 'mean(lagC)': 4.218940790579507, 'mean(lagP)': 1.3816666666666666,
 'std(auc_lin)': 0.47017564084833186, 'std(auc_log)': 0.05223884098437252,
 'std(k_lin)': 0.04345900582303403, 'std(k_log)': 0.006464686892533931,
 'std(t_k)': 0.12027839013537672, 'std(gr)': 0.007583535801856875,
 'std(dr)': 0.019480464747271952, 'std(td)': 0.02787967442728814,
 'std(t_gr)': 0.07375985543538997, 'std(t_dr)': 5.855024365916592,
 'std(death_lin)': 0.0848040693320632, 'std(death_log)': 0.01757131804219809,
 'std(lagC)': 0.04009110330066765, 'std(lagP)': 0.2275805995530579}
curves[1].sample().posterior
{'mean(auc_lin)': 17.939726555262727, 'mean(auc_log)': 12.282595533668786,
 'mean(k_lin)': 1.1984968020718156, 'mean(k_log)': 0.7886102060637665,
 'mean(t_k)': 15.148333333333333, 'mean(gr)': 0.11726505720599784,
 'mean(dr)': -0.017902999132222958, 'mean(td)': 5.915721239766617,
 'mean(t_gr)': 7.621666666666667, 'mean(t_dr)': 22.91833333333333,
 'mean(death_lin)': 0.11128637148719928, 'mean(death_log)': 0.05200550227790204,
 'mean(lagC)': 4.639883952243414, 'mean(lagP)': 2.466666666666667,
 'std(auc_lin)': 0.4530562966680967, 'std(auc_log)': 0.07339149503021895,
 'std(k_lin)': 0.02147779307782217, 'std(k_log)': 0.005843482820310608,
 'std(t_k)': 1.3706673397396452, 'std(gr)': 0.003343309427975638,
 'std(dr)': 0.010388819559635084, 'std(td)': 0.16937329540452845,
 'std(t_gr)': 0.1738437834810161, 'std(t_dr)': 2.2391395202724174,
 'std(death_lin)': 0.03263552233775529, 'std(death_log)': 0.01557920368807274,
 'std(lagC)': 0.1106022696053838, 'std(lagP)': 0.1895117478716537}
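To compare conditions side by side, these posterior dictionaries can be reshaped into a tidy table. A minimal sketch, using a small synthetic dictionary in the same 'mean(param)'/'std(param)' key format (the values are placeholders, not real estimates):

```python
import pandas as pd

# synthetic posterior dict mimicking the key format of curves[i].sample().posterior
posterior = {'mean(gr)': 0.435, 'std(gr)': 0.0076,
             'mean(k_log)': 1.762, 'std(k_log)': 0.0065}

rows = {}
for key, value in posterior.items():
    stat, param = key[:-1].split('(')  # e.g. 'mean(gr)' -> ('mean', 'gr')
    rows.setdefault(param, {})[stat] = value

summary = pd.DataFrame(rows).T  # one row per parameter, columns: mean, std
print(summary)
```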
def getConfInts(mean,std,conf=0.95):
from scipy.stats import norm
alpha = 1-conf
z_value = (1-alpha/2)
scaler = norm.ppf(z_value)
low, upp = mean-scaler*std, mean+scaler*std
return low,upp
tmpc = curves[0].sample().posterior
getConfInts(tmpc['mean(auc_lin)'],tmpc['std(auc_lin)'],conf=0.975)
(73.06644897171864, 75.1391559675049)
params = [ii.split('(')[1][:-1] for ii in tmpc.keys() if 'mean' in ii ]
for param in params:
mu = tmpc['mean({})'.format(param)]
std = tmpc['std({})'.format(param)]
low,high = getConfInts(mu,std,conf=0.975)
print('{}:\t{:.2f}\t({:.2f},{:.2f})'.format(param,mu,low,high))
auc_lin:	74.10	(73.07,75.14)
auc_log:	29.90	(29.79,30.01)
k_lin:	4.82	(4.72,4.93)
k_log:	1.76	(1.74,1.78)
t_k:	9.78	(9.55,10.00)
gr:	0.43	(0.42,0.45)
dr:	-0.04	(-0.08,-0.00)
td:	1.59	(1.54,1.64)
t_gr:	6.47	(6.33,6.62)
t_dr:	16.11	(1.93,30.30)
death_lin:	1.09	(0.91,1.27)
death_log:	0.21	(0.17,0.24)
lagC:	4.22	(4.14,4.30)
lagP:	1.32	(0.70,1.94)