Enveomics collection

A toolbox for microbial genomics and metagenomics

BlastTab.seqdepth_ZIP.pl

Estimates the average sequencing depth of subject sequences (genes or contigs) assuming a Zero-Inflated Poisson (ZIP) distribution to correct for non-covered positions. It uses the corrected method of moments estimators (CMMEs) as described by Beckett et al. [1].

Note that [1] contains an error in eq. (2.4), which should read: pi-hat-MM = 1 - (X-bar / lambda-hat-MM).

Also note that coverage histograms can follow a more elaborate mixture distribution (e.g., see [2] for an additional correction called 'tail distribution' and for mixtures involving the negative binomial), so take these results cum grano salis.

[1] http://anisette.ucs.louisiana.edu/Academic/Sciences/MATH/stage/stat2012.pdf
[2] Lindner et al., Bioinformatics, 2013.
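The moment-based estimation described above can be illustrated with a minimal Python sketch. It is not the script's own implementation. It assumes that the moment estimators are lambda-hat-MM = (S^2 + X-bar^2) / X-bar - 1 and pi-hat-MM = 1 - (X-bar / lambda-hat-MM), that S^2 is the sample variance, and that the "correction" falls back to a plain Poisson (lambda = X-bar, pi = 0) whenever the moment estimate of pi would be negative. The function name `zip_cmme` and the example depths are hypothetical.

```python
def zip_cmme(depths):
    """Return (lambda_hat, pi_hat) for a list of per-position depths.

    Sketch of corrected method-of-moments estimators for a
    Zero-Inflated Poisson; the exact correction is an assumption.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("need at least one position")
    x_bar = sum(depths) / n
    # Sample variance (n - 1 denominator); an assumption of this sketch.
    s2 = sum((d - x_bar) ** 2 for d in depths) / (n - 1) if n > 1 else 0.0
    # Without overdispersion the moment estimate of pi is not positive,
    # so fall back to a plain Poisson: lambda = mean, pi = 0.
    if x_bar == 0 or s2 <= x_bar:
        return x_bar, 0.0
    lam = (s2 + x_bar ** 2) / x_bar - 1.0
    pi = 1.0 - x_bar / lam  # eq. (2.4) as corrected in the text above
    return lam, pi


# Made-up depth vector: half of the positions are not covered.
example_depths = [0, 0, 0, 0, 4, 4, 4, 4]
lam_hat, pi_hat = zip_cmme(example_depths)
print(f"lambda = {lam_hat:.4f}, pi = {pi_hat:.4f}")
```

In this example the observed mean (2.0) is pulled down by the uncovered positions, while the estimated lambda is higher because part of the zeroes is attributed to zero-inflation.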

    See source code, Artistic license 2.0.

§ References

    Rodriguez-R & Konstantinidis, 2016, PeerJ Preprints.

§ Requirements

§ Usage

cat blast... | BlastTab.seqdepth_ZIP.pl [opts] genes_or_ctgs.fna > genes_or_ctgs.cov

§ Arguments

blast*
 in_file  One or more Tabular BLAST files of reads vs genes (or contigs).
Script
 task 
genes_or_ctgs.fna*
 in_file  A FastA file containing the genes or the contigs (db).
genes_or_ctgs.cov*
 out_file  Output file with the following columns:
           (1) Subject ID.
           (2) Estimated average sequencing depth (CMME lambda).
           (3) Zero-inflation (CMME pi).
           (4) Observed average sequencing depth.
           (5) Observed median sequencing depth.
           (6) Observed median sequencing depth excluding zeroes.
           (7) Number of mapped reads.
           (8) Length of the subject sequence.
* Mandatory.
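
As an illustration of how the eight output columns could be consumed downstream, here is a minimal Python sketch. It assumes the output is tab-delimited with no header line and that all numeric fields parse as plain numbers; neither is confirmed by this page. The names `DepthRecord` and `read_cov`, and the sample line, are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class DepthRecord(NamedTuple):
    subject_id: str            # (1) Subject ID
    cmme_lambda: float         # (2) Estimated average depth (CMME lambda)
    cmme_pi: float             # (3) Zero-inflation (CMME pi)
    obs_mean: float            # (4) Observed average depth
    obs_median: float          # (5) Observed median depth
    obs_median_nonzero: float  # (6) Observed median depth excluding zeroes
    mapped_reads: int          # (7) Number of mapped reads
    length: int                # (8) Length of the subject sequence


def read_cov(handle):
    """Yield one DepthRecord per non-empty line (tab-delimited assumed)."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if len(row) != 8:
            raise ValueError(f"expected 8 columns, got {len(row)}")
        yield DepthRecord(
            row[0],
            float(row[1]), float(row[2]), float(row[3]),
            float(row[4]), float(row[5]),
            int(row[6]), int(row[7]),
        )


# Made-up line, for illustration only.
sample = "gene_1\t3.2\t0.4\t2.0\t2\t4\t15\t900\n"
for rec in read_cov(io.StringIO(sample)):
    print(rec.subject_id, rec.cmme_lambda, rec.cmme_pi)
```

With a real file, `read_cov(open("genes_or_ctgs.cov"))` would be used in place of the in-memory sample.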