Counts the different AA substitutions in the best-hit BLAST alignments, from a BLASTP pairwise format output (-outfmt 0 in BLAST+, -m 0 in legacy BLAST).
Calculates the completion percentage of a partial BLAST result. The value produced slightly underestimates the actual progress, due to unflushed output and trailing queries that may have been processed but generated no results.
Estimates the sequencing depth of subject sequences. The values reported by this script may differ from those of BlastTab.seqdepth.pl, because this script uses the aligned length of the read while BlastTab.seqdepth.pl uses the aligned length of the subject sequence.
Estimates the average sequencing depth of subject sequences (genes or contigs) assuming a Zero-Inflated Poisson (ZIP) distribution to correct for non-covered positions. It uses the corrected method of moments estimators (CMMEs) as described by Beckett et al. [1]. Note that [1] has a mistake in eq. (2.4), which should read: pi-hat-MM = 1 - (X-bar / lambda-hat-MM). Also note that a more elaborate mixture distribution can arise from coverage histograms (e.g., see [2] for an additional correction called 'tail distribution' and mixtures involving the negative binomial), so take these results cum grano salis.
[1] http://anisette.ucs.louisiana.edu/Academic/Sciences/MATH/stage/stat2012.pdf
[2] Lindner et al., Bioinformatics, 2013.
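The plain method of moments estimators for a ZIP model can be sketched as follows. This is a minimal illustration, not the script's implementation: the function name `zip_moment_estimates`, the use of the population variance, and the Poisson fallback when the data are not overdispersed are all assumptions, and the exact correction defined in [1] may differ.

```python
from statistics import mean, pvariance


def zip_moment_estimates(depths):
    """Return (lambda_hat, pi_hat) for per-position depths under a ZIP model.

    For a ZIP with zero-inflation pi and Poisson rate lambda:
        mean     = (1 - pi) * lambda
        variance = mean * (1 + pi * lambda)
    Solving both equations gives the estimators below.
    """
    x_bar = mean(depths)
    s2 = pvariance(depths)  # assumption: population variance
    if x_bar == 0 or s2 <= x_bar:
        # Assumption: without overdispersion, fall back to a plain Poisson.
        return float(x_bar), 0.0
    lambda_hat = x_bar + s2 / x_bar - 1
    pi_hat = 1 - x_bar / lambda_hat  # the corrected form of eq. (2.4)
    return lambda_hat, pi_hat


# Half of the positions are uncovered, the other half have depth 4.
lambda_hat, pi_hat = zip_moment_estimates([0, 0, 0, 0, 4, 4, 4, 4])
```

Here `lambda_hat` is the estimated depth of the covered fraction and `pi_hat` the estimated fraction of excess zeros.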
Sums the weights of all the queries hitting each subject. Often (but not necessarily) the BLAST files contain only best matches. The weights can be any number, but a common use of this script is to add up counts (integer weights). For example, in a BLAST of predicted genes vs. some annotation source, the weights could be the number of reads recruited by each gene.
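The idea of summing query weights per subject can be sketched as below. This is an illustration only: it assumes tabular BLAST input with the query in the first column and the subject in the second, and the function name `sum_weights_by_subject` and the default weight of 0 for queries without a weight are hypothetical choices.

```python
from collections import defaultdict


def sum_weights_by_subject(blast_lines, weights, default=0):
    """Add up the weight of every query hitting each subject."""
    totals = defaultdict(float)
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        query, subject = line.split("\t")[:2]
        # Assumption: queries without a listed weight contribute `default`.
        totals[subject] += weights.get(query, default)
    return dict(totals)


# Genes as queries, annotations as subjects, read counts as weights.
hits = ["gene1\tannotA\t99.0", "gene2\tannotA\t98.5", "gene3\tannotB\t90.1"]
reads_per_gene = {"gene1": 10, "gene2": 5, "gene3": 7}
totals = sum_weights_by_subject(hits, reads_per_gene)
```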
Generates a list of hits from a BLAST result concatenating the subject sequences. This can be used, e.g., to analyze BLAST results against draft genomes. This script creates two files using <map.bls> as prefix with extensions .rec (for the recruitment plot) and .lim (for the limits of the different sequences in <seq.fa>).
Calculates the N50 value of a set of sequences. Alternatively, it can calculate other N** values. It also calculates the total number of sequences, the total added length, and the longest sequence length.
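As a reference for what N50 and other N** values mean, here is a minimal sketch using the common definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least the given fraction of the total length. The function name `n_stat` is hypothetical, and the script may handle ties or rounding differently.

```python
def n_stat(lengths, fraction=0.5):
    """Return the N** value (N50 by default) of a set of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # only reached through floating-point rounding


lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = n_stat(lengths)       # 10 + 9 + 8 = 27 reaches half of the total (54)
n90 = n_stat(lengths, 0.9)
total_length = sum(lengths)
longest = max(lengths)
```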
Calculates the quartiles of the length in a set of sequences. The Q2 is also known as the median. Q0 is the minimum length, and Q4 is the maximum length. It also calculates TOTAL, the added length of the sequences in the file, and AVG, the average length.
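The reported values can be illustrated with Python's standard library. This is a sketch under assumptions: the function name `length_summary` is hypothetical, and the interpolation rule for Q1 and Q3 (here the 'inclusive' method of `statistics.quantiles`) may not match the one used by the script.

```python
from statistics import quantiles


def length_summary(lengths):
    """Return Q0 to Q4, TOTAL and AVG for a list of sequence lengths."""
    if len(lengths) < 2:
        raise ValueError("at least two sequence lengths are required")
    q1, q2, q3 = quantiles(lengths, n=4, method="inclusive")
    total = sum(lengths)
    return {
        "Q0": min(lengths),  # minimum length
        "Q1": q1,
        "Q2": q2,            # median
        "Q3": q3,
        "Q4": max(lengths),  # maximum length
        "TOTAL": total,
        "AVG": total / len(lengths),
    }


summary = length_summary([1, 2, 3, 4, 5])
```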
Interposes sequences in FastA format from two files into one output file. If more than two files are provided, the script will interpose all the input files.
Interposes sequences in FastQ format from two files into one output file. If more than two files are provided, the script will interpose all the input files.
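Interposing amounts to taking one record from each input in turn. The sketch below shows this for FastQ under two stated assumptions: every record spans exactly four lines, and all inputs hold the same number of records (extra records in longer inputs are dropped here). The names `read_fastq` and `interpose` are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield FastQ records as lists of four lines each."""
    iterator = iter(lines)
    while True:
        record = list(islice(iterator, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FastQ record")
        yield record


def interpose(*sources):
    """Yield lines alternating one record from each source."""
    # zip stops at the shortest source (assumption: inputs are paired).
    for records in zip(*(read_fastq(source) for source in sources)):
        for record in records:
            yield from record


forward = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGCC", "+", "IIII"]
reverse = ["@r1/2", "TTAA", "+", "IIII", "@r2/2", "CCGG", "+", "IIII"]
interposed = list(interpose(forward, reverse))
```

The same functions accept open file handles, since they only iterate over lines.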
There are several FastQ formats. This script takes a FastQ file in any of them, identifies the type of FastQ (that is, the offset), and generates a FastQ file with the given offset.
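Offset detection and conversion can be sketched as follows. The detection rule is a simple heuristic and an assumption, not necessarily the script's logic: any quality character below ASCII 59 can only come from an offset of 33, and everything else is treated as offset 64. Files containing only high-quality offset-33 scores would be misclassified by this rule. Both function names are hypothetical.

```python
def guess_offset(quality_lines):
    """Guess the quality offset (33 or 64) from quality strings."""
    lowest = min(ord(char) for line in quality_lines for char in line)
    # Heuristic assumption: offset-64 encodings never go below ASCII 59.
    return 33 if lowest < 59 else 64


def convert_quality(quality, source_offset, target_offset):
    """Re-encode one quality string with a different offset."""
    shift = target_offset - source_offset
    return "".join(chr(ord(char) + shift) for char in quality)


offset = guess_offset(["II5"])                  # '5' is ASCII 53
converted = convert_quality("II5", offset, 64)  # 'I' (73) becomes 'h' (104)
```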
Takes a table of OTU abundance in one or more samples and calculates the Rao (Q_alpha), Rao-Jost (Q_alpha_eqv), Shannon (Hprime), and inverse Simpson (1_lambda) indices of alpha diversity for each sample.
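Two of these indices are easy to illustrate from raw counts. The sketch below computes Shannon (H') with the natural logarithm and the inverse Simpson index for one sample; the choice of logarithm base and the function name are assumptions. The Rao indices additionally require distances between OTUs and are not covered here.

```python
from math import log


def shannon_and_inverse_simpson(counts):
    """Return (Hprime, inverse Simpson) for the OTU counts of one sample."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("the sample has no observations")
    # OTUs with zero counts do not contribute to either index.
    proportions = [count / total for count in counts if count > 0]
    shannon = -sum(p * log(p) for p in proportions)
    inverse_simpson = 1 / sum(p * p for p in proportions)
    return shannon, inverse_simpson


# Four equally abundant OTUs: H' = ln(4) and inverse Simpson = 4.
hprime, inv_simpson = shannon_and_inverse_simpson([10, 10, 10, 10, 0])
```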
Takes a table of OTU abundance in one or more samples and calculates the Chao1 index (with 95% confidence interval) for each sample. To use it with QIIME OTU tables, run it ignoring one column on the left and with a header.
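For reference, the Chao1 point estimate depends only on the number of observed OTUs and the numbers of singletons and doubletons. The sketch below uses the bias-corrected form, which stays defined when there are no doubletons; whether the script uses this form or the classic one is an assumption, and the confidence interval is not computed here.

```python
def chao1(counts):
    """Return the bias-corrected Chao1 richness estimate for one sample."""
    observed = sum(1 for count in counts if count > 0)
    singletons = sum(1 for count in counts if count == 1)
    doubletons = sum(1 for count in counts if count == 2)
    # Bias-corrected form: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# 6 observed OTUs, 3 singletons, 2 doubletons: 6 + (3 * 2) / (2 * 3) = 7
estimate = chao1([1, 1, 1, 2, 2, 5, 0])
```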
Estimates the Ka/Ks ratio from the SNPs in a VCF file. Ka and Ks are corrected using pseudo-counts, but no corrections for multiple substitutions are applied.
Generates iTOL-compatible files from a .jplace file (produced by RAxML's EPA or pplacer) that can be used to draw pie charts in the nodes of the reference tree.
Estimates the log2-ratio of different amino acids in homologous sites using an AAsubs file (see BlastPairwise.AAsubs.pl). It provides the point estimate (.obs file), the bootstrap of the estimate (.boot file), and the null model based on label permutation (.null file).
Concatenates several multiple alignments in FastA format into a single multiple alignment. The IDs of the sequences (or the ID prefixes, if using --ignore-after) must coincide across files.
Calculates the Rand Index and the Adjusted Rand Index between two clusterings. The clustering format is a raw text file with one cluster per line, each defined as comma-delimited members, and a header line (ignored). Note that this is equivalent to the OGs format for 1 genome.
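The clustering format and both indices can be illustrated with a short sketch. It parses the format described above (header line ignored, one comma-delimited cluster per line) and computes the indices from pair counts. Restricting the comparison to members present in both clusterings is an assumption, as are the function names.

```python
from collections import Counter
from math import comb


def parse_clusters(text):
    """Parse one cluster per line, comma-delimited, skipping the header."""
    lines = text.strip().splitlines()[1:]
    return [line.strip().split(",") for line in lines if line.strip()]


def rand_indices(clusters_a, clusters_b):
    """Return (Rand Index, Adjusted Rand Index) between two clusterings."""
    label_a = {m: i for i, c in enumerate(clusters_a) for m in c}
    label_b = {m: i for i, c in enumerate(clusters_b) for m in c}
    shared = sorted(set(label_a) & set(label_b))  # assumption: shared members only
    total_pairs = comb(len(shared), 2)
    if total_pairs == 0:
        raise ValueError("at least two shared members are required")
    joint = Counter((label_a[m], label_b[m]) for m in shared)
    rows = Counter(label_a[m] for m in shared)
    cols = Counter(label_b[m] for m in shared)
    together_both = sum(comb(n, 2) for n in joint.values())
    together_a = sum(comb(n, 2) for n in rows.values())
    together_b = sum(comb(n, 2) for n in cols.values())
    # Agreements: pairs joined in both plus pairs separated in both.
    rand = (total_pairs + 2 * together_both - together_a - together_b) / total_pairs
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    adjusted = 1.0 if maximum == expected else (together_both - expected) / (maximum - expected)
    return rand, adjusted


first = parse_clusters("members\na,b\nc,d\n")
second = parse_clusters("members\na,c\nb,d\n")
rand, adjusted = rand_indices(first, second)
```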