Enveomics collection

A toolbox for microbial genomics and metagenomics

aai.rb

Calculates the Average Amino acid Identity between two genomes.
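As a rough illustration of what the reported value summarizes, the sketch below computes a mean identity, its standard deviation, and the number of proteins used from a list of percent identities. It assumes that the AAI is the mean identity over the reciprocal best matches and that the standard deviation is the population form; neither detail is stated on this page. The function name `summarize_identities` and the input values are hypothetical.

```python
import math

def summarize_identities(identities, proteins_in_smaller_genome):
    """Summarize percent identities of reciprocal best matches (hypothetical helper)."""
    n = len(identities)
    if n == 0:
        return None  # no matches, so there is nothing to report
    mean = sum(identities) / n
    # Population standard deviation: an assumption, since this page does not
    # say whether the tool divides by n or by n - 1.
    variance = sum((x - mean) ** 2 for x in identities) / n
    return {
        "aai": mean,
        "sd": math.sqrt(variance),
        "proteins_used": n,
        "proteins_in_smaller_genome": proteins_in_smaller_genome,
    }

# Made-up identities for three matched protein pairs.
summary = summarize_identities([90.0, 80.0, 70.0], proteins_in_smaller_genome=5)
print(summary)
```

The four keys mirror the columns listed for the Tab output further down this page.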

    See source code, Artistic license 2.0.

§ References

    Konstantinidis & Tiedje, 2005, J Bacteriol; Altschul et al, 1990, J Mol Biol (BLAST); Kent, 2002, Genome Res (BLAT); Buchfink et al, 2015, Nat Methods (Diamond); Rodriguez-R & Konstantinidis, 2016, PeerJ Preprints.

§ Requirements

§ Usage

aai.rb --seq1 in_file --seq2 in_file [opts]
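A minimal sketch of driving the script from Python, using only options documented on this page (--seq1, --seq2, --auto, --quiet, --threads). It assumes that `aai.rb` is executable and on the PATH, and that --auto prints a single number. The helper names and the file names are hypothetical.

```python
import subprocess

def build_aai_command(seq1, seq2, threads=None, script="aai.rb"):
    """Build the argument list for one aai.rb run (hypothetical helper)."""
    cmd = [script, "--seq1", seq1, "--seq2", seq2, "--auto", "--quiet"]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd

def parse_auto_output(stdout_text):
    """Return the AAI as a float, or None when the script printed nothing."""
    text = stdout_text.strip()
    if not text:
        return None  # --auto prints nothing if the calculation fails
    return float(text)

def run_aai(seq1, seq2, threads=None):
    """Run aai.rb and parse its --auto output (assumes aai.rb is on the PATH)."""
    result = subprocess.run(
        build_aai_command(seq1, seq2, threads),
        capture_output=True,
        text=True,
    )
    return parse_auto_output(result.stdout)

# Show the command that would be run for two hypothetical protein files.
print(" ".join(build_aai_command("genome1.faa", "genome2.faa", threads=4)))
```

Calling `run_aai("genome1.faa", "genome2.faa")` would return a float, or `None` if the calculation failed.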

§ Arguments

Sequence 1*
 --seq1 in_file  FASTA file containing the proteins of genome 1.
Alternatively, supply the NCBI accession of a genome (nucleotides) in the format ncbi:CP014272 instead of a file.
Sequence 2*
 --seq2 in_file  FASTA file containing the proteins of genome 2.
Alternatively, supply the NCBI accession of a genome (nucleotides) in the format ncbi:NC_004337 instead of a file.
Length
 --len integer  Minimum alignment length (in aa).
Length fraction
 --len-fraction float  Minimum alignment length as a fraction of the shorter sequence (range 0-1).
Identity
 --id float  Minimum alignment identity (in %).
Bit-score
 --bitscore float  Minimum bit score (in bits).
Hits
 --hits float  Minimum number of hits.
Nucleotides
 --nucl   The input sequences are nucleotides (genes), not proteins.
Max ACTG
 --max-actg float  Maximum fraction of A, C, T, G, and N characters allowed in the sequences before they are assumed to be nucleotides.
Executables
 --bin in_dir  Path to the directory containing the binaries of the search program.
Program
 --program select  Search program to be used.
Make sure that the search program you want to use is installed. If you have downloaded the program but not installed it, use the Executables option (--bin) above.
Threads
 --threads integer  Number of parallel threads to be used.
SQLite3 DB
 --sqlite3 out_file  Path to the SQLite3 database to create (or update) with the results.
Name 1
 --name1 string  Name of Sequence 1 to use in the SQLite3 DB. By default, determined from the file name.
Name 2
 --name2 string  Name of Sequence 2 to use in the SQLite3 DB. By default, determined from the file name.
Don't save RBM
 --no-save-rbm   Don't save the reciprocal best matches in the --sqlite3 database.
Lookup first
 --lookup-first   Indicates if the AAI should be looked up first in the database. Requires SQLite3 DB (--sqlite3), Auto (--auto), Name 1 (--name1), and Name 2 (--name2). Incompatible with Result (--res), Tab (--tab), Out (--out), and RBM (--rbm).
Precision
 --dec integer  Decimal positions to report.
RBM
 --rbm out_file  Saves a file with the reciprocal best matches.
Out
 --out out_file  Saves a file describing the alignments used for two-way AAI.
Result
 --res out_file  Saves a file with the final results.
Tab
 --tab out_file  Saves a file with the final two-way results in tab-delimited form. The columns are, in order: AAI, standard deviation, proteins used, and proteins in the smaller genome.
Auto
 --auto   Outputs ONLY the AAI value to STDOUT (or nothing, if the calculation fails).
Quiet
 --quiet   Run quietly (no STDERR output).
* Mandatory.
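The Max ACTG option above can be read as a simple input-type heuristic. The sketch below is one way to express it and is not taken from the script itself. It assumes the fraction is computed over all letters of all sequences and compared with a strict "greater than"; this page states neither detail. The function names and the 0.95 threshold are hypothetical.

```python
def actgn_fraction(sequences):
    """Fraction of letters that are A, C, T, G, or N (case-insensitive)."""
    total = 0
    matching = 0
    for seq in sequences:
        for ch in seq.upper():
            if ch.isalpha():  # ignore gaps, stops, and whitespace
                total += 1
                if ch in "ACTGN":
                    matching += 1
    return matching / total if total else 0.0

def looks_like_nucleotides(sequences, max_actg):
    """True when the ACTGN fraction exceeds the threshold (assumed strict)."""
    return actgn_fraction(sequences) > max_actg

proteins = ["MKTAYIAKQR", "GAVLN"]  # made-up protein fragments
genes = ["ACGTNACGT"]               # made-up nucleotide fragment

print(actgn_fraction(proteins))
print(looks_like_nucleotides(proteins, 0.95))
print(looks_like_nucleotides(genes, 0.95))
```

Protein sequences still contain A, C, T, G, and N as amino acid codes, which is why the check uses a fraction threshold and not the mere presence of those letters.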