Chirag et al. (2018), includes references to the following material:
Name | Description | Files |
---|---|---|
D1 | Dataset 1 (NCBI RefSeq) | Sequences (1.6 Gb), IDs* (36 Kb) |
D2 | Dataset 2 (Bacillus cereus) | Sequences (911 Mb), IDs* (113 Kb) |
D3 | Dataset 3 (Escherichia coli) | Sequences (6.2 Gb), IDs* (2.2 Mb) |
D4 | Dataset 4 (Bacillus anthracis) | Sequences (670 Mb), IDs* (3.1 Kb) |
D5 | Dataset 5 (Parks et al. MAGs) | Sequences (5.8 Gb), IDs* (2.4 Mb) |
NCBI_Prok | NCBI Genome - Prokaryotic section | Sequences (95 Gb), FastANI matrix (6.2 Gb), IDs* (30 Mb) |
* The ID files are gzipped tab-delimited raw text files with the following columns:
- Name of the dataset as used in the manuscript.
- IDs in the NCBI nuccore database separated by commas, except for D2, in
which some datasets contain identifiers from the Centers for Disease Control
and Prevention, Division of High-Consequence Pathogens and Pathology
(prefixed with `CDC:DHCPP:`).
- When available, links to the publicly available dataset in MiGA.
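
The column layout above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the three columns appear in the order listed, that the files have no header row, and that the MiGA link column may be empty or absent. The function names and the file name in the usage note are hypothetical.

```python
import csv
import gzip

# Prefix used for the CDC identifiers in D2, as described above.
CDC_PREFIX = "CDC:DHCPP:"


def parse_id_rows(lines):
    """Yield one dict per row of a tab-delimited ID table.

    Assumed column order: dataset name, comma-separated IDs, optional MiGA link.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        ids = []
        if len(row) > 1:
            ids = [i.strip() for i in row[1].split(",") if i.strip()]
        link = row[2].strip() if len(row) > 2 and row[2].strip() else None
        yield {
            "name": row[0],
            # Split NCBI nuccore IDs from CDC-prefixed identifiers.
            "nuccore_ids": [i for i in ids if not i.startswith(CDC_PREFIX)],
            "cdc_ids": [i for i in ids if i.startswith(CDC_PREFIX)],
            "miga_link": link,
        }


def read_id_file(path):
    """Read a gzipped ID file and return its rows as a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(parse_id_rows(handle))
```

For example, `read_id_file("D2_ids.tsv.gz")` (a placeholder file name) would return one record per dataset, with any `CDC:DHCPP:` identifiers kept apart from the nuccore IDs.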