Genome-based distance matrix calculator

Frequently Asked Questions

§ Online submission

What type of data can I use?
Any collection of genomes (for ANI) or proteomes (for AAI). Include only genomes or only proteomes, never a mix of both. Once you have one file per organism, place all the files in the same folder and pack it into an archive (.zip, .tar, .tar.gz, or .tar.bz2). Draft genomes are accepted, but they should be reasonably complete (e.g., >80%) for accurate estimates.
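If you prefer to script the packaging step, the following Python sketch packs every file in a folder into one of the archive types listed above, using only the standard library. The function name `build_archive` is hypothetical, and the sketch stores the files at the root of the archive; whether the service expects that layout or a top-level folder is an assumption, so adapt it if needed.

```python
import tarfile
import zipfile
from pathlib import Path

# tarfile write modes for the tar-based archive types (assumed naming by extension)
TAR_MODES = {".tar.gz": "w:gz", ".tar.bz2": "w:bz2", ".tar": "w"}


def build_archive(folder: str, archive_path: str) -> list[str]:
    """Pack every regular file in `folder` (one genome or proteome per file)
    into `archive_path`, choosing the format from its extension.
    Files are stored at the archive root. Returns the archived file names."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    if archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for p in files:
                zf.write(str(p), arcname=p.name)
        return [p.name for p in files]
    for suffix, mode in TAR_MODES.items():
        if archive_path.endswith(suffix):
            with tarfile.open(archive_path, mode) as tf:
                for p in files:
                    tf.add(str(p), arcname=p.name)
            return [p.name for p in files]
    raise ValueError("archive name must end in .zip, .tar, .tar.gz or .tar.bz2")
```

For example, `build_archive("genomes", "genomes.zip")` packs every file in `genomes/`. Write the archive outside the input folder, so that an archive left over from a previous run is not packed along with the genomes.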
How many genomes can I upload?
The service is currently limited to 50 genomes. Running time grows quadratically with the number of genomes, because every genome is compared against every other one, so large collections may take more than a day to finish.
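To see why the running time grows quadratically, count the comparisons needed to fill an n × n matrix: there are n(n − 1)/2 unordered pairs of genomes. The sketch below computes that count; the function name is hypothetical, and the `ordered` flag assumes, without confirmation from this FAQ, that each pair might be compared in both directions.

```python
def pairwise_comparisons(n_genomes: int, ordered: bool = False) -> int:
    """Number of genome pairs in an all-vs-all comparison.

    With ordered=True, each pair is counted twice (A vs B and B vs A),
    an assumption about how the comparisons might be run."""
    pairs = n_genomes * (n_genomes - 1) // 2
    return 2 * pairs if ordered else pairs
```

With 25 genomes there are 300 pairs, and with 50 genomes there are 1,225, so doubling the collection roughly quadruples the work.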
How should I upload draft genomes?
You don't have to treat draft genomes differently. Just use one file per genome in the archive.

§ Output

ANI/AAI matrix plot
The graphic output is a symmetric matrix of ANI or AAI values. Whenever possible, consistent species-level groups are highlighted with red rectangles (≥95% ANI or ≥90% AAI). You can download a high-resolution version of this image, as well as the list of values and the distance matrix (as raw text), from the links below. The matrix may contain zeroes (shown as 100 in the distance matrix). These mark values below the accurate range of the method, typically ANI between organisms of different genera (<80% ANI). If your matrix contains many such values, the clustering may be inaccurate, and you should use AAI (proteins) instead. For more details, see Goris et al. 2007.
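The relationship between the two matrices, and the species-level highlighting, can be sketched as follows. This assumes the distance is 100 minus the ANI or AAI value (consistent with zeroes appearing as 100s), and it approximates the red rectangles with single-linkage grouping at the threshold; the service may define "consistent groups" more strictly. The function names are hypothetical.

```python
def to_distance(identity):
    """Convert an ANI/AAI matrix (percent identity) into distances (100 - identity).
    Cells reported as 0 (below the accurate range) become 100."""
    return [[100.0 - v for v in row] for row in identity]


def fraction_below_range(identity):
    """Fraction of off-diagonal cells reported as 0 (below the accurate range)."""
    n = len(identity)
    off_diag = [identity[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(1 for v in off_diag if v == 0) / len(off_diag)


def species_groups(names, identity, threshold=95.0):
    """Single-linkage grouping: two genomes share a group whenever a chain of
    pairs with identity >= threshold connects them. Use 90.0 for AAI."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if identity[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Singletons are not highlighted, so return only multi-genome groups
    return [g for g in groups.values() if len(g) > 1]
```

For a made-up matrix over genomes A, B, C, D where A–B is 97.1, A–C 82.3, B–C 82.0 and D is 0 against everything else, `species_groups` returns `[["A", "B"]]` and `fraction_below_range` returns 0.5; a matrix with that many below-range cells is the case where switching to AAI is recommended.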
Distance clustering plot
The matrix above is used for hierarchical clustering of the input genomes, and the resulting tree is displayed. Note that this is only a clustering and should not be treated as a phylogenetic tree, although in many cases the two correlate well.
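This FAQ does not say which clustering method the service uses. As an illustration only, the sketch below applies average-linkage (UPGMA) clustering to a distance matrix and returns the tree as a Newick string without branch lengths; UPGMA is an assumption here, and the function name is hypothetical.

```python
def upgma_newick(names, dist):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.
    Returns the resulting tree as a Newick string (topology only)."""
    # Active clusters: id -> (newick subtree, number of genomes in it)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed as (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        a, b = min(d, key=d.get)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        del d[(a, b)]
        for c in clusters:
            # Size-weighted average distance from the merged cluster to c
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (d_ac * size_a + d_bc * size_b) / (size_a + size_b)
        clusters[next_id] = (f"({tree_a},{tree_b})", size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree + ";"
```

UPGMA assumes a constant rate of change across lineages, so its tree summarizes similarity rather than inferring a phylogeny, which matches the caveat above. Also, all below-range pairs share a distance of 100, so their relative placement in the tree is essentially arbitrary.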