Expected processing time: about 3 hours (last updated 6 months ago).
This database contains all prokaryotic genomes with complete or chromosome status in the NCBI Genome database, except for 26 blacklisted genomes.
Silvanigrella_aquatica_CP017838: taxid:1912593 to taxid:1915309.
Staphylococcus_epidermidis_CP018841: taxid:1929941 to taxid:1282.
Staphylococcus_aureus_subsp__aureus_CP025490, single plasmids, as well as
Cupriavidus_sp__NH9_NZ_CP017758, missing chromosome I.
Jan-19-2018: The following datasets were soft-unlinked for indexing (they will be included in the next update):
Jan-15-2018: Blacklisted one dataset composed exclusively of plasmid sequences:
Halobacterium_salinarum_NC_002121. Total reference genomes: 10,637.
Dec-11-2017: Database update. 381 datasets eliminated and 884 datasets added. Total reference genomes: 10,638.
Nov-07-2017: The following datasets were soft-unlinked for indexing, but will be included in the next update:
Oct-12-2017: Database update. 129 datasets eliminated and 549 added. Total reference genomes: 10,224 (post-update).
Aug-30-2017: Manually modified taxonomic rank
dataset in dataset
which was confusing MiGA into thinking it had a registered kingdom.
Aug-30-2017: A filesystem error interrupted processing of the following datasets, which will be unlinked for this update and re-downloaded in the next one:
Aug-15-2017: Database update. 75 datasets eliminated and 573 added. Total reference genomes: 9,559 (post-update).
Jul-07-2017: The following datasets were temporarily unlinked to complete
Natrialbaceae_archaeon_JW_NM_HA_15_NZ_CP019893. These datasets will be
included in the next update.
Jul-02-2017: The dataset
Stenotrophomonas_maltophilia_NC_001383 is composed
only of plasmid sequences and was manually removed.
Jun-23-2017: Database update. 91 datasets eliminated and 430 added. Total reference genomes: 8,724 (pre-update) - 9,063 (post-update).
May-12-2017: Database update. 222 datasets eliminated and 324 added. Total reference genomes: 8,622 (pre-update) - 8,724 (post-update).
Apr-25-2017: Database update. 7 datasets eliminated and 73 added. Total reference genomes: 8,557 (pre-update) - 8,622 (post-update). Database not indexed for this update.
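The totals in the update entries above follow the relation post-update = pre-update - eliminated + added. The sketch below recomputes that value from the figures recorded for Jun-23, May-12 and Apr-25-2017. It assumes no other manual additions or removals happened during the same update. `UpdateEntry` and `expected_post` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class UpdateEntry(NamedTuple):
    """One 'Database update' entry; counts copied from the changelog (hypothetical helper type)."""
    date: str
    pre: int         # total reference genomes before the update
    eliminated: int  # datasets removed by the update
    added: int       # datasets added by the update
    post: int        # total reported after the update


ENTRIES = [
    UpdateEntry("Jun-23-2017", 8_724, 91, 430, 9_063),
    UpdateEntry("May-12-2017", 8_622, 222, 324, 8_724),
    UpdateEntry("Apr-25-2017", 8_557, 7, 73, 8_622),
]


def expected_post(entry: UpdateEntry) -> int:
    """Post-update total implied by the counts, assuming no other manual changes."""
    return entry.pre - entry.eliminated + entry.added


for e in ENTRIES:
    # A nonzero difference points to changes not captured by the eliminated/added counts.
    diff = e.post - expected_post(e)
    print(f"{e.date}: expected {expected_post(e):,}, reported {e.post:,}, difference {diff:+d}")
```

A nonzero difference does not by itself indicate an error. For example, the Mar-06-2017 entry records a manual removal made during the same update.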
Apr-21-2017: The following datasets were composed only of plasmids and were eliminated:
Candidatus_Tremblaya_princeps_LN998829: This dataset has a sequence named chromosome I, but it contains only 51 genes (140 kbp), so it is likely a plasmid.
The Burkholderia_pseudomallei_NZ_CM007659 dataset contains only
the second chromosome of B. pseudomallei, resulting in a completeness of
2.7% (3 essential genes), so it was removed. The current database has
8,562 reference datasets.
Apr-17-2017: Database update. 87 datasets were eliminated and 218 datasets added. The following datasets were eliminated based on the previous update or the completeness report (completeness <1% and no 16S rRNA gene):
Total reference genomes: 8,437 (pre-update) - 8,563 (post-update).
Apr-16-2017: Manually modified domain in the taxonomy of
Apr-15-2017: Note for next update: Check out
seems to be composed only of plasmids. Evaluate completeness to clean the
Mar-06-2017: Database update. 239 datasets were eliminated and 663
datasets added. The dataset
Mycobacterium_tuberculosis_NC_025025 consists of a single
6,898-bp plasmid with no chromosome sequence, so it was manually removed.
Total reference genomes: 8,015 (pre-update) - 8,438 (post-update). The dataset
Legionella_fallonii_LLAP_10_NZ_LN614827 was manually removed because
of a corrupt database file (it will be incorporated in the next update),
resulting in 8,437 datasets.
complete project so it can be used on the website, but I'll keep running the
distances of this dataset in the meantime.