Kostas lab | Genome matrix

Frequently Asked Questions

§ Online submission

What type of data can I use?: Any collection of genomes (for ANI) or proteomes (for AAI). Include only genomes or only proteomes in a single submission, never a mix of both. Once you have one file per organism, place the files in the same folder and build an archive (.zip, .tar, .tar.gz, or .tar.bz2). Draft genomes are accepted, but they should be advanced drafts (e.g., >80% complete) for the estimates to be accurate.
How many genomes can I upload?: We're currently limiting this service to 50 genomes. Note that the running time grows quadratically with the number of genomes, and large collections may take over a day to finish running.
How should I upload draft genomes?: You don't have to treat draft genomes differently. Just use one file per genome in the archive.
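
As an illustration of the packaging step described above, here is a minimal Python sketch that builds a .tar.gz with one member per genome. The function name `build_archive`, the folder layout, and the flat member naming are assumptions made for this example, not something the service prescribes; the 50-file check simply mirrors the limit stated above.

```python
import tarfile
from pathlib import Path

MAX_GENOMES = 50  # limit stated in this FAQ


def build_archive(genome_dir, archive_path):
    """Pack every regular file in genome_dir into a .tar.gz archive.

    Hypothetical helper: assumes each file in genome_dir is one genome
    (or one proteome) and that archive_path lies outside genome_dir.
    """
    files = sorted(p for p in Path(genome_dir).iterdir() if p.is_file())
    if len(files) > MAX_GENOMES:
        raise ValueError(f"{len(files)} files found; the limit is {MAX_GENOMES}")
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for path in files:
            # Store each file at the top level of the archive.
            tar.add(str(path), arcname=path.name)
    return [p.name for p in files]
```

For example, `build_archive("my_genomes", "my_genomes.tar.gz")` would return the list of file names it packed. Draft genomes need no special handling here: each one is still a single file.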

§ Output

ANI/AAI matrix plot

The graphic output is a symmetrical matrix of ANI or AAI values. Whenever possible, consistent groups at the species level (≥95% ANI or ≥90% AAI) are highlighted with red rectangles. You can download a high-resolution version of this image from the link below, as well as the list of values and the distance matrix (as raw text). Note that the matrix may contain several zeroes (100s in the distance matrix). These are values that fell below the range in which the estimate is accurate, which typically happens for ANI between organisms of different genera (<80% ANI). If your matrix contains too many such values, the clustering may not be accurate, and you should use AAI (proteins) instead. For more details, see Goris et al. 2007.
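
The relationship between the identity matrix and the distance matrix can be sketched in a few lines of Python. This is a rough illustration, not the service's code: it assumes the distance is simply 100 minus the identity (inferred from the note that zeroes appear as 100s in the distance matrix), and the function name `summarize` and the list-of-lists input are hypothetical.

```python
# Species-level thresholds quoted in the text above.
SPECIES_THRESHOLD = {"ANI": 95.0, "AAI": 90.0}


def summarize(identity, metric="ANI"):
    """Summarize a square matrix of percent identities (list of lists).

    Returns (distance matrix, fraction of pairs below the accurate range,
    index pairs at or above the species-level threshold).
    """
    n = len(identity)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Assumption: distance = 100 - identity, so a reported 0 becomes 100.
    distance = [[100.0 - value for value in row] for row in identity]
    # A zero marks a comparison that fell below the accurate range.
    below_range = sum(1 for i, j in pairs if identity[i][j] == 0)
    fraction_below = below_range / len(pairs) if pairs else 0.0
    threshold = SPECIES_THRESHOLD[metric]
    same_species = [(i, j) for i, j in pairs if identity[i][j] >= threshold]
    return distance, fraction_below, same_species
```

A high `fraction_below` is the situation described above in which switching from ANI to AAI is advisable; this FAQ does not state how many such values is too many, so no cutoff is hard-coded here.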

Distance clustering plot

The matrix above is used for hierarchical clustering of the input genomes, and the resulting tree is displayed. Note that this is simply a clustering and should not be read as a phylogenetic tree, although in many cases the two correlate well.
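
To make the idea of clustering from a distance matrix concrete, here is a small pure-Python sketch of agglomerative clustering. The linkage method used by the service is not stated in this FAQ; average linkage is chosen here purely as an assumption for illustration, and the function name `average_linkage` is hypothetical.

```python
def average_linkage(labels, dist):
    """Cluster items by repeatedly merging the two closest groups.

    labels: list of genome names; dist: square list-of-lists of distances.
    Returns the tree as nested tuples of labels.
    Assumption: average linkage (mean distance between group members).
    """
    # Each cluster maps an id to (subtree, list of member indices).
    clusters = {i: (labels[i], [i]) for i in range(len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                members_a, members_b = clusters[a][1], clusters[b][1]
                total = sum(dist[i][j] for i in members_a for j in members_b)
                mean = total / (len(members_a) * len(members_b))
                if best is None or mean < best[0]:
                    best = (mean, a, b)
        _, a, b = best
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

The nested tuples group genomes by overall similarity only. The branching order reflects the distances in the matrix, not an evolutionary model, which is why the result should not be treated as a phylogeny.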

List of ANI/AAI values

The list of ANI or AAI values is a raw-text, tab-delimited table with a header row. Each row corresponds to one pairwise comparison, and the columns are:

SeqA: ID of the first genome.
SeqB: ID of the second genome.
ANI/AAI: Value of ANI or AAI (%).
SD: Standard deviation of identity (%) between reciprocal best matching fragments (ANI) or proteins (AAI).
N: Number of reciprocal best matches found.
Omega: The smaller of the two genomes' fragment counts (ANI) or protein counts (AAI). This is the maximum value that N can take.
Frx: N/Omega ratio (%); the percentage of the genome that is shared.
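
The table can be read with Python's standard `csv` module. The sketch below is an assumption-laden example rather than an official parser: it reads the seven columns by position, in the order listed above, because the exact header labels in the downloaded file are not given here, and the function name `read_pairs` is hypothetical.

```python
import csv


def read_pairs(handle):
    """Parse the tab-delimited list of ANI/AAI values from an open file.

    Assumes one header row followed by rows whose first seven columns are
    SeqA, SeqB, ANI/AAI, SD, N, Omega, Frx, in that order.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        seq_a, seq_b, identity, sd, n, omega, frx = row[:7]
        rows.append({
            "seq_a": seq_a,
            "seq_b": seq_b,
            "identity": float(identity),
            "sd": float(sd),
            "n": int(n),
            "omega": int(omega),
            "frx": float(frx),
        })
    return rows
```

Since Frx is defined as the N/Omega ratio expressed as a percentage, `100 * row["n"] / row["omega"]` should agree with `row["frx"]` up to rounding, which makes a convenient sanity check after loading a file.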

ANI/AAI-Matrix

Genome-based distance matrix calculator
