Table of Contents


Heatmaps are very useful for identifying patterns in your data. You can use the heatmap function of AMiGA to visually compare a specific growth parameter under different conditions. For example, the below will plot a heatmap with normalized growth rate norm(auc) as the value, Strain on the x-axis, and Substrate on the y-axis. The heatmap will be saved as a PDF file in the summary folder.

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC -x Isolate -y Substrate -v "norm(auc)" --title "Normalized AUC"

Here, all of the above arguments are required except for the title which is optional. This will generate a file Biolog_ERandomii_Norm_AUC.pdf which will be a very heatmap of 191 rows (substrates) and 4 columns (isolates). This is a very large heatmap so I am limiting its display below. We can however use the --filter argument to reduce the size of the heatmap and focus on substrates that matter, i.e. ones that support growth.

amiga heatmap


Reducing a heatmap by filtering for specific vlaues

Let us assume that you are analyzing Biolog PM plate where and you only want to select for substrates that support the growth of any of the isolates above a certain threshold, then you should use the -f or --filter argumnet. See the help section below for more details.

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC_Filtered -x Isolate -y Substrate -v "norm(auc)" --title "Normalized AUC" --filter "row any > 1.2" --verbose

amiga heatmap


Plotting a heatmap on data with multiple replicates

If the summary file included multiple replicates of the same condition, you will have to indicate whether you want to average these replicates using either the mean or median with the -p or --operation argument. In the next example, we will plot the Strain instead of Isolates for the columns. Because our data contains two isolates per strain, we need to aggregate them so we will prompt the heatmap function to use the --operation or -p of mean (i.e. averaging values across replicate conditions). An alternative is using the median.

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC_Filtered_Average -x Strain -y Substrate -v "norm(auc)" -p "mean" --title "Normalized AUC" --filter "row any > 1.2" --verbose

amiga heatmap


Adjusting the aesthetics of a heatmap

This is done by passing the --kwargs argument which directly pipes user arguments to the Seaborn heatmap function (see below). The key and value of each agument are separted by a colon. For example,

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC_Filtered_Average_Greys -x Strain -y Substrate -v "norm(auc)" -p "mean" --title "Normalized AUC" --filter "row any > 1.2" --kwargs "cmap:gray;vmin:1;vmax:2" --verbose

Some of the most useful arguments include:

  • cmap: defines the colormap used by the heatmap. See list of available colormaps here.
  • vmin, center, and vmax: define the minimum, center, and maximum value of the colorbar.
  • row_cluster and col-cluster: See next section

WARNING: not all of the arguments will be accepted. AMiGA here is limited to accepting boolean values (i.e. True or False), strings (i.e. alpha-numeric characters), and numebrs. Some of the arguments may also be over-ridden by AMiGA. In future updates, I may give more flexibility in accepting more of these arguments.

amiga heatmap


Clustering a heatmap by rows or columns

To cluster the heatmap either by rows and/or columns, make sure to include either row_cluster:True and/or col_cluster:True in the --kwargs argument. For example,

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC_Filtered_Average_Greys_Clustered -x Strain -y Substrate -v "norm(auc)" -p "mean" --title "Normalized AUC" --filter "row any > 1.2" --kwargs "cmap:gray;vmin:1;vmax:2;row_cluster:True;col_cluster:True" --verbose

amiga heatmap


Users can also cluster either rows or the columns with the --cluster-x or --cluster-y arguments, respectively.

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC_Filtered_Average_Greys_Clustered -x Strain -y Substrate -v "norm(auc)" -p "mean" --title "Normalized AUC" --filter "row any > 1.2" --kwargs "cmap:gray;vmin:1;vmax:2" --verbose --cluster-x --cluster-y


Sorting a heatmap by rows or columns

If the heatmap has a single row or column, the cells will be sorted by value. The user can also request sorting by a specific meta-data variable. This variable must be included in the input summary file as a column.

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC_Filtered_Average_Greys_Clustered -x Strain -y Substrate -v "norm(auc)" -p "mean" --title "Normalized AUC" --filter "row any > 1.2" --kwargs "cmap:gray;vmin:1;vmax:2" --verbose --sort-x-by Isolate --cluster-y-by Substrate_Type

This hypothetical example would sort the x-axis (columns) by Isolate name and the y-axis (rows) by Substrate_Typ. In this example, Substrate_Type is for example either Monosaccharide,Disaccharide,Sugar alcohol, …, etc.


Adding colors to label either the rows or columns

This is useful to evaluate if your data clustered in a certain way. Here, we want to plots the four isolates as column but distinguish their Strain with a color label. Notice that the -x arugment is now Strain and we added two more arguments --color-x-by and --color-scheme-x.

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC_Filtered_Average_Greys_Clustered -x Strain -y Substrate -v "norm(auc)" -p "mean" --title "Normalized AUC" --filter "row any > 1.2" --kwargs "cmap:gray;vmin:1;vmax:2;row_Cluster:True;col_cluster:True" --color-x-by "Strain" --color-scheme-x "USA_1995:navy;MEX_2020:green" --verbose
  • --color-y-by indicates that the user want to color rows (on y-axis) by the Strain value corresponding to each row.
  • --color-scheme explicitly assigns colors to each strain. Color assignments are sepearted by a semicolon (;), and each Strain (because it is the argument for --color-y-by) is followed by a colon (:) and then the color of choice. See list of colors available here.

amiga heatmap

The legend is saved as a separate PDF.

amiga heatmap


Users can also add colors to label the rows

python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_normalized.txt -o Biolog_ERandomii_Norm_AUC_Filtered_Average_Greys_Clustered_XY_Labeled -x Strain -y Substrate -v "norm(auc)" -p "mean" --title "Normalized AUC" --filter "row any > 1.2" --kwargs "cmap:gray;vmin:1;vmax:2;row_Cluster:True;col_cluster:True" --color-x-by "Strain" --color-scheme-x "USA_1995:navy;MEX_2020:green" --color-y-by "Substrate" --color-scheme-y "D-Sorbitol:gold;2-Aminoethanol:red;L-Lactic Acid:orange;Sorbic Acid:purple" --verbose

amiga heatmap

The legends are saved as a separate PDF.

amiga heatmap

amiga heatmap
Users cans also define colors using a text file

If you have lots of potential labels and you don’t want to pass them through the command line, create a tab-delimited table wand save it as a text file anywhere you would like. The header for the color column must be “Color”, but the header for the first column does not matter. For example,

head /home/outbreaks/erandomii/colors.txt
Strain Color
USA_1995 navy
MEX_2020 green
python amiga.py heatmap -i /home/outbreaks/erandomii/summary/Biolog_ERandomii_summary_filtered.txt -o Biolog_ERandomii_norm_gr_clustered -x Strain -y Substrate -v norm(gr) -p 'mean' --kwargs "cmap:gray;vmin:1;vmax:2;row_cluster:True;col_cluster:True" --title "Normalized Growth Rate" --color-y-by "Strain" --color-file-y "/home/outbreaks/erandomii/colors.txt" --color-x-by "PM" --color-scheme-x "1:lightgray;1:darkgray" --verbose
  • Here I used the --color-file-y instead to pass the text file. You can do th same thing for labeling the coloumns with --color-file-x but you will have to create a different text file that maps colors to the variable on the x-axis.

Users cans also adjust the proportion of the color bars or colorbar

Users can also adjust the propotion of the figure dedicated to the color labels. See --color-x-ratio, --color-y-ratio, and --colorbar-ratio.


Can I simply ask AMiGA to labels rows or columns without specifying colors?

Not at the moment. This is a feature that I may include at a later time.


What if I forgot to specify a color for some of the labels?

AMiGA will generate a random color that is very distinct from the colors that you did specify. It will use this random color to label any of the rows or columns that you did not specify. You can also pass your choice for labeling colors of undefined labels, for example --missing-color white will keep the label white for any labels for which you do not explicilty define a color.


Where is the legend or key for the labels?

AMiGA saves the legends as separate PDFs. Please look for it in the same location of your summary text file. It will have the suffix of either _x_legend.pdf o _y_legend.pdf.


Command-Line arguments

To see the full list of arguments that amiga heatmap will accept, run

python amiga.py heatmap --help

which will return the following message

usage: amiga.py [-h] -i INPUT -o OUTPUT [-s SUBSET] -v VALUE -x X_VARIABLE -y
                Y_VARIABLE [-p {mean,median}] [-f FILTER] [-t TITLE]
                [--kwargs KWARGS] [--verbose] [--save-filtered-table]
                [--width-height WIDTH_HEIGHT WIDTH_HEIGHT]
                [--colorbar-ratio COLORBAR_RATIO] [--x-rotation X_ROTATION]
                [--highlight-labels HIGHLIGHT_LABELS]
                [--color-x-by COLOR_X_BY] [--color-y-by COLOR_Y_BY]
                [--color-file-x COLOR_FILE_X] [--color-file-y COLOR_FILE_Y]
                [--color-scheme-x COLOR_SCHEME_X]
                [--color-scheme-y COLOR_SCHEME_Y]
                [--color-x-ratio COLOR_X_RATIO]
                [--color-y-ratio COLOR_Y_RATIO]
                [--missing-color MISSING_COLOR] [--cluster-x] [--cluster-y]
                [--sort-x] [--sort-y] [--sort-x-by SORT_X_BY]
                [--sort-y-by SORT_Y_BY] [--keep-rows-missing-data]
                [--keep-columns-missing-data]
                [--x-tick-labels-scale X_TICK_LABELS_SCALE]
                [--y-tick-labels-scale Y_TICK_LABELS_SCALE]
                [--color-bar-labels-scale COLOR_BAR_LABELS_SCALE]

Plot a heatmap

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
  -o OUTPUT, --output OUTPUT
  -s SUBSET, --subset SUBSET
  -v VALUE, --value VALUE
  -x X_VARIABLE, --x-variable X_VARIABLE
  -y Y_VARIABLE, --y-variable Y_VARIABLE
  -p {mean,median}, --operation {mean,median}
  -f FILTER, --filter FILTER
  -t TITLE, --title TITLE
  --kwargs KWARGS
  --verbose
  --save-filtered-table
  --width-height WIDTH_HEIGHT WIDTH_HEIGHT
  --colorbar-ratio COLORBAR_RATIO
                        Proportion of figure size devoted to color bar.
                        Default is 0.1
  --x-rotation X_ROTATION
  --highlight-labels HIGHLIGHT_LABELS
  --color-x-by COLOR_X_BY
  --color-y-by COLOR_Y_BY
  --color-file-x COLOR_FILE_X
  --color-file-y COLOR_FILE_Y
  --color-scheme-x COLOR_SCHEME_X
  --color-scheme-y COLOR_SCHEME_Y
  --color-x-ratio COLOR_X_RATIO
                        Proportion of the heatmap devoted to the column color
                        labels. Default is 0.1
  --color-y-ratio COLOR_Y_RATIO
                        Proportion of the heatmap devoted to the row color
                        labels. Default is 0.1
  --missing-color MISSING_COLOR
  --cluster-x
  --cluster-y
  --sort-x-by SORT_X_BY
  --sort-y-by SORT_Y_BY
  --keep-rows-missing-data
                        Drops columsn that have any missing data
  --keep-columns-missing-data
                        Drops rows that have any missing data
  --x-tick-labels-scale X_TICK_LABELS_SCALE
                        Must be between 0 (smallest) and 1 (largest).
  --y-tick-labels-scale Y_TICK_LABELS_SCALE
                        Must be between 0 (smallest) and 1 (largest).
  --color-bar-labels-scale COLOR_BAR_LABELS_SCALE
                        Must be between 0 (smallest) and 1 (largest).

-i or --input

Accepts a string which is the path to a working directory of summary files or a specific summary file. The path can be relative or absolute.

-o or --output

Accepts a string which defines the filename (no extension and no path) for your summary or data files. If no filename is passed , AMiGA will instead give output files a unique time stamp (e.g. 2020-08-26_09-09-59).

-s or --subset

Defines which files to be included in the analysis. See Data Subsetting for more details.

-v or --value

Defines which growth parameter to summaize with a heatmap.

-x or --x_variable

Defines the x-axis variable.

-y or --y_variable

Defines the y-axis variable.

-p or --operation

Defines the operation to be used for grouping replicates. Options are mean or median. If not used, AMiGA will assume that summary data do not include any replicates. If the summary data do include replicates, and no opeation is passed, this will result in a fatal error.

-f or --filter

Dictates how the heatmap will be reduced to smaller size by selecting for specific values. This uses an ad hoc symbolic language to communicate with AMiGA. Each filtering criteria requires four items. For example, row any >= 1.2 dictates that

  • for each row in the heatmap (alternative is col),
  • if any of the values (alterantive is all)
  • is >= (alternatives are >, <, <=, =)
  • relative to a certain value, here 1.2.

Multiple criteria can be concatenated with an OR. For example, row any >= 1.2 OR row any = 1 OR row any <= 0.8 argument will force AMiGA to only display rows where any of the values are either larger than or equal to 1.2, equal to 1.0, or lower than or equal to 0.8.

-t or --title

Text argument for title of heatmap.

--kwargs

User can pass here some arguments to the Seaborn heatmap function. Arguments are separated by semi-colons. The key and value of each agument are separted by a colon. For example, center:1;vmin:0.


See more details for these arguments in Command Line Interface

--save-filtered-table

Save the input data after it has been reduced or filtered using the --filter argument. File will have the suffix _filtered.txt.

--width-height

Two numbers that define the width and height in inches (e.g. --width-height 5 10)

--colorbar-ratio

Ratio of width to height for the colorbar. Default is 0.1.

--color-x-by

Specifies the column in the summary file which you would like to use for labeling rows in the heatmap.

--color-y-by

Specifies the column in the summary file which you would like to use for labeling rows in the heatmap.

--color-file-x

Used to pass a tab-delimited text file containing a table that maps colors to values of the variable defined by --color-x-by.

--color-file-y

Used to pass a tab-delimited text file containing a table that maps colors to values of the variable defined by --color-y-by.

--color-scheme-x

Explicitly assigns colors to each value of the variable defined by --color-x-by. Color assignments are sepearted by a semicolon (;), and each Strain (because it is the argument for --color-x-by) is followed by a colon (:) and then the color of choice. See list of colors available here

--color-scheme-y

Explicitly assigns colors to each value of the variable defined by --color-y-by. Color assignments are sepearted by a semicolon (;), and each Strain (because it is the argument for --color-y-by) is followed by a colon (:) and then the color of choice. See list of colors available here

--color-x-ratio Defines the proportion of the heatmap devoted to the column color labels. Default is 0.1

--color-y-ratio Defines the roportion of the heatmap devoted to the row color labels. Default is 0.1.

--missing-color

Defines the color of the label for any samples for which the user does not define the color.

--cluster-x A boolean argument that requests clustering for rows.

--cluster-y A boolean argument that requests clustering for columns.

--sort-x-by Defines which variable to use for sorting the columns. Variable must be included as a column in the input summary file. Sorting will not occur if user also requests clustering wiht --cluster-x.

--sort-y-by Defines which variable to use for sorting the rows. Variable must be included as a column in the input summary file. Sorting will not occur if user also requests clustering wiht --cluster-y.

--keep-rows-missing-data

If your heatmap (matrix) has any missing values, you can opt to drop columns with missing values if you would like to retain all rows. Otherwise, rows with missing values will be dropped.

--keep-columns-missing-data

If your heatmap (matrix) has any missing values, you can opt to drop rows with missing values if you would like to retain all columns. Otherwise, rows with missing values will be dropped.

--x-tick-labels-scale Must between 0 and 1. Sets the size of the column labels with 1 being the maximum possible size.

--y-tick-labels-scale Must between 0 and 1. Sets the size of the row labels with 1 being the maximum possible size.

--color-bar-labels-scale Must between 0 and 1. Sets the size of the color bar numeric labels with 1 being the maximum possible size.

--x-rotation

Number that defines the rotation of the column labels. Default is 90.

--highlight-labels

User can pass an argument to highlight specific axes labels in bold red font. For example, y:MDA_113,SH_1149;x:D-Trehalose,D-Fructose will highlight the MAD_113 and SH_1149 labels on the y-axis and the D-Trehalose and D-Fructose labels on the x-axis. Labels will be highlighted only if detected.