This database contains all the prokaryotic genomes with complete and chromosome status in the NCBI Genome database, with the exception of 14 blacklisted genomes that appear to be composed only of plasmids.
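As a rough illustration of the blacklisting criterion, the sketch below flags a dataset as plasmid-only when every sequence header mentions a plasmid. The function name and the example headers are hypothetical and are not part of MiGA. The sketch assumes headers that label plasmid sequences with the word "plasmid".

```python
def is_plasmid_only(headers):
    """Return True when there is at least one sequence and every
    header mentions a plasmid (case-insensitive substring match)."""
    return bool(headers) and all("plasmid" in h.lower() for h in headers)


# Hypothetical headers for illustration only (not real records).
example = [
    ">seq1 Example organism plasmid pA, complete sequence",
    ">seq2 Example organism plasmid pB, complete sequence",
]
print(is_plasmid_only(example))
```

A header check like this can miss cases where a plasmid-like sequence is labeled as a chromosome, so it would only be a first pass before evaluating completeness.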
Salmonella_enterica_subsp__salamae_serovar_55_k_z39_str__1315K_NZ_CP022139, which was confusing MiGA into thinking it had a registered kingdom.
Natrialbaceae_archaeon_JW_NM_HA_15_NZ_CP019893. These datasets will be included in the next update.
Stenotrophomonas_maltophilia_NC_001383 is composed only of plasmid sequences and was manually removed.
Candidatus_Tremblaya_princeps_LN998829: This dataset has a sequence named chromosome I, but it only contains 51 genes (140 kbp), so it's likely a plasmid.
The Burkholderia_pseudomallei_NZ_CM007659 dataset only contains the second chromosome of B. pseudomallei, resulting in a completeness of 2.7% (3 essential genes); it was therefore removed. The current database has 8,562 reference datasets.
Lactococcus_lactis_NC_004164. Total reference genomes: 8,437 (pre-update) -> 8,563 (post-update).
Escherichia_coli_NC_011752, it seems to be composed only of plasmids. Evaluate completeness to clean the collection.
Mycobacterium_tuberculosis_NC_025025 is only a plasmid with 6,898 bp and no chromosome sequence, and was manually removed. Total reference genomes: 8,015 (pre-update) -> 8,438 (post-update).
The dataset Legionella_fallonii_LLAP_10_NZ_LN614827 was manually removed because of a corrupt database file (it'll be incorporated in the next update), resulting in 8,437 datasets.
Candidatus_Tremblaya_princeps_LN998829 to complete project so it can be used in the website, but I'll keep running the distances of this dataset in the meantime.
Acetobacter_aceti, which only had taxonomy from family down (perhaps a network issue while downloading?).
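The completeness figure quoted above for Burkholderia_pseudomallei_NZ_CM007659 (2.7% from 3 essential genes) can be reproduced with a minimal sketch. The reference-set size of 111 genes is an assumption inferred from those two numbers, and the function name is hypothetical rather than MiGA's own API.

```python
def completeness(found_genes: int, reference_genes: int) -> float:
    """Percentage of the reference essential genes detected in a genome."""
    if reference_genes <= 0:
        raise ValueError("reference_genes must be positive")
    return 100.0 * found_genes / reference_genes


# Assumption: 111 reference genes, inferred from "2.7% (3 essential genes)".
ASSUMED_REFERENCE_GENES = 111

value = completeness(3, ASSUMED_REFERENCE_GENES)
print(f"{value:.1f}%")  # prints 2.7%
```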
The RefSoil collection is a manually curated set of genomes derived from NCBI's RefSeq database, containing only organisms previously shown to be associated with soils, as described in Choi et al., 2017, ISME J.
This project hosts a set of 957 metagenome-assembled genomes (MAGs) from the TARA Oceans metagenomes, as described by Delmont et al. All public data in the study can be found at Recovering HBDs from TARA Oceans Metagenomes.
The taxonomy of the datasets was inferred by MiGA using NCBI Prok as a reference at p-value < 0.05.