hmmscan vs. hmmsearch speed: the numerology

From today’s email…

Suppose, for example, you want to search 300 million metagenomic sequence reads, each about 200nt long, against the Pfam database. What’s the best way to do that task with HMMER3? The bottom line: use hmmsearch, not hmmscan. For the numerology of why (and chapter and verse on how hmmsearch and hmmscan scale to large multithreaded and MPI tasks, their limitations, advice on how we do it, and some clues about what’s coming in the future), keep reading…
Continue reading →

Extracting HMMER results to sequence files: Easel miniapplications

Easel logo

The HMMER and Infernal code includes some hidden tools: the Easel library, and its “miniapplications”. Easel is our code library (in the easel subdirectory of both HMMER and Infernal), and the miniapplications (in easel/miniapps) are a set of command line utilities that we use for manipulating sequence data. For example, esl-reformat is a utility for reformatting from one sequence file format to another, and esl-sfetch is a tool for retrieving sequence(s) or subsequence(s) from a large sequence flatfile. These utilities work together with HMMER and Infernal to enable sequence analysis in a flexible, arcane, unix-y command line sort of way.

For example, yesterday someone wrote to ask, suppose I want to extract all the sequences that were hit by a HMMER hmmsearch, and save them in a separate file in FASTA format — how do I do that? This is a good example for introducing Easel’s miniapplications.
Continue reading →