Public data

from Kostas lab

Chirag et al. (2018) includes references to the following material:

Name       Description                          Files
D1         Dataset 1 (NCBI RefSeq)              Sequences (1.6 Gb), IDs* (36 Kb)
D2         Dataset 2 (Bacillus cereus)          Sequences (911 Mb), IDs* (113 Kb)
D3         Dataset 3 (Escherichia coli)         Sequences (6.2 Gb), IDs* (2.2 Mb)
D4         Dataset 4 (Bacillus anthracis)       Sequences (670 Mb), IDs* (3.1 Kb)
D5         Dataset 5 (Parks et al MAGs)         Sequences (5.8 Gb), IDs* (2.4 Mb)
NCBI_Prok  NCBI Genome - Prokaryotic section    Sequences (95 Gb), FastANI matrix (6.2 Gb), IDs* (30 Mb)


* The ID files are gzipped, tab-delimited plain-text files with the following columns:
  1. Name of the dataset as used in the manuscript.
  2. IDs in the NCBI nuccore database, separated by commas. The exception is D2, in which some datasets carry identifiers from the Centers for Disease Control and Prevention, Division of High-Consequence Pathogens and Pathology (prefixed with CDC:DHCPP:).
  3. When available, links to the publicly available dataset in MiGA.
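
The three-column layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the function name read_id_file and the record keys are hypothetical, and it assumes the files have no header row, that the third column may be missing or empty, and that CDC identifiers are told apart only by the CDC:DHCPP: prefix.

```python
import csv
import gzip
from typing import Dict, Iterator

# Prefix that marks CDC identifiers, as stated in the column description.
CDC_PREFIX = "CDC:DHCPP:"


def read_id_file(path: str) -> Iterator[Dict[str, object]]:
    """Yield one record per non-empty line of a gzipped, tab-delimited ID file."""
    # "rt" opens the gzip stream as text; newline="" is what the csv module expects.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row or not row[0].strip():
                continue  # skip blank lines
            # Column 2: comma-separated identifiers (assumed possibly absent).
            raw_ids = row[1] if len(row) > 1 else ""
            ids = [i.strip() for i in raw_ids.split(",") if i.strip()]
            # Column 3: MiGA link, only "when available".
            link = row[2].strip() if len(row) > 2 and row[2].strip() else None
            yield {
                "name": row[0].strip(),
                "nuccore_ids": [i for i in ids if not i.startswith(CDC_PREFIX)],
                "cdc_ids": [i for i in ids if i.startswith(CDC_PREFIX)],
                "miga_link": link,
            }
```

Calling list(read_id_file(path)) on a downloaded ID file would give one dictionary per dataset. Records with a non-empty cdc_ids list should only be expected in D2.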