All available genomes from the Bacillus cereus sensu lato clade, including the species:
Exploration of all available genomes belonging to the genus Marinobacter.
All available genomes from the Mycobacterium tuberculosis complex, including the species:
This database contains all the prokaryotic genomes with complete and chromosome status in the NCBI Genome database, with the exception of 26 blacklisted genomes.
Silvanigrella_aquatica_CP017838: taxid:1912593 to taxid:1915309.
Staphylococcus_epidermidis_CP018841: taxid:1929941 to taxid:1282.
Staphylococcus_aureus_subsp__aureus_CP025490, single plasmids, as well as
Cupriavidus_sp__NH9_NZ_CP017758, missing chromosome I.
Jan-19-2018: Soft-unlinked the following datasets for indexing (will be included in the next update):
Jan-15-2018: Blacklisted 1 dataset composed exclusively of plasmid sequences
Halobacterium_salinarum_NC_002121. Total reference genomes: 10,637.
Dec-11-2017: Database update. 381 datasets eliminated and 884 datasets added. Total reference genomes: 10,638.
Nov-07-2017: The following datasets were soft-unlinked for indexing, but will be included in the next update:
Oct-12-2017: Database update. 129 datasets eliminated and 549 added. Total reference genomes: 10,224 post-update.
Aug-30-2017: Manually modified taxonomic rank
dataset in dataset
which was confusing MiGA into thinking it had a registered kingdom.
Aug-30-2017: A filesystem error caused an interruption of the following datasets, which will be unlinked for this update and re-downloaded in the next update:
Aug-15-2017: Database update. 75 datasets eliminated and 573 added. Total reference genomes: 9,559 (post-update).
Jul-07-2017: The following datasets were temporarily unlinked to complete
Natrialbaceae_archaeon_JW_NM_HA_15_NZ_CP019893. These datasets will be
included in the next update.
Jul-02-2017: The dataset
Stenotrophomonas_maltophilia_NC_001383 is only
composed of plasmid sequences and was manually removed.
Jun-23-2017: Database update. 91 datasets eliminated and 430 added. Total reference genomes: 8,724 (pre-update) - 9,063 (post-update).
May-12-2017: Database update. 222 datasets eliminated and 324 added. Total reference genomes: 8,622 (pre-update) - 8,724 (post-update).
Apr-25-2017: Database update. 7 datasets eliminated and 73 added. Total reference genomes: 8,557 (pre-update) - 8,622 (post-update). Database not indexed for this update.
Apr-21-2017: The following datasets were composed only of plasmids and were eliminated:
Candidatus_Tremblaya_princeps_LN998829: This dataset has a sequence named chromosome I, but it only contains 51 genes (140 Kbp), so it's likely a plasmid.
Burkholderia_pseudomallei_NZ_CM007659 dataset only contains
the second chromosome of B. pseudomallei, resulting in a completeness of
2.7% (3 essential genes); it was therefore removed. The current database has
8,562 reference datasets.
Apr-17-2017: Database update. 87 datasets were eliminated and 218 datasets added. The following datasets were eliminated based on the previous update or completeness report (<1% and no 16S):
Total reference genomes: 8,437 (pre-update) - 8,563 (post-update).
Apr-16-2017: Manually modified domain in the taxonomy of
Apr-15-2017: Note for next update: Check out
seems to be composed only of plasmids. Evaluate completeness to clean the
Mar-06-2017: Database update. 239 datasets were eliminated and 663
datasets added. The dataset
Mycobacterium_tuberculosis_NC_025025 is only a
plasmid with 6,898 bp and no chromosome sequence, and was manually removed.
Total reference genomes: 8,015 (pre-update) -> 8,438 (post-update). The
Legionella_fallonii_LLAP_10_NZ_LN614827 was manually removed because
of a corrupt database file (it'll be incorporated in the next update),
resulting in 8,437 datasets.
complete project so it can be used in the website, but I'll keep running the
distances of this dataset in the meantime.
Exploration of genomes classified as "Candidatus Pelagibacter ubique".
This database contains all reference prokaryotic genomes in the NCBI RefSeq database.
Acetobacter_aceti, which only had from family-down (perhaps a network issue while downloading?).
The RefSoil collection is a manually curated set of genomes derived from NCBI's RefSeq database, containing only organisms previously shown to be associated with soils, as described in Choi et al., 2017, ISME J.
This project hosts a set of 957 metagenome-assembled genomes (MAGs) from the TARA Oceans metagenomes, as described by Delmont et al., 2018, Nat Microbiol. All public data in the study can be found at Recovering HBDs from TARA Oceans Metagenomes.
The taxonomy of the datasets was inferred by MiGA using NCBI Prok as a reference with p-value < 0.05.
Exploration of available genomic data from the phylum Thaumarchaeota.
This project hosts metagenome-assembled genomes (MAGs) from multiple collections compiled by Tsementzi et al. (in preparation). The set included here is the "high-quality set", with genome quality > 50%, where quality is computed from CheckM estimates as:
quality = completeness - 5 x contamination.
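As a minimal sketch of the quality criterion above, the following Python snippet computes the score and keeps only MAGs with quality strictly above 50%. The `Mag` class, the function names, and the example values are hypothetical and for illustration only; they are not part of MiGA or CheckM, and the completeness and contamination values are assumed to be percentages as reported by CheckM.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    name: str             # hypothetical dataset identifier
    completeness: float   # CheckM completeness estimate, in percent
    contamination: float  # CheckM contamination estimate, in percent


def quality(mag: Mag) -> float:
    """quality = completeness - 5 x contamination, in percent."""
    return mag.completeness - 5.0 * mag.contamination


def high_quality(mags, threshold=50.0):
    """Keep the MAGs whose quality is strictly above the threshold."""
    return [m for m in mags if quality(m) > threshold]


# Made-up values, not taken from any real collection.
mags = [
    Mag("mag_a", 95.0, 2.0),  # 95 - 5 x 2 = 85, kept
    Mag("mag_b", 70.0, 5.0),  # 70 - 5 x 5 = 45, dropped
    Mag("mag_c", 60.0, 2.0),  # 60 - 5 x 2 = 50, dropped (not above 50)
]
kept = [m.name for m in high_quality(mags)]
print(kept)
```

Note that the factor of 5 penalizes contamination heavily: a genome that is 70% complete with 5% contamination falls below the cutoff, while one that is 95% complete with 2% contamination passes comfortably.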
The taxonomy of the datasets was inferred by MiGA using NCBI Prok as a reference at p-value < 0.05.
High-quality metagenome-assembled genomes (MAGs) from 5 lakes and 2 estuarine locations along the Chattahoochee River, Southeast USA.
MAGs obtained from a collection of 100 metagenomes using Subtractive Iterative Binning.
All available genomes from the genus Xanthomonas.
This project excludes the dataset
including a large scaffold
(119 Kbp) mostly covered by a single homopolymer (poly-T).