HMMER3 is stubborn

We’ve had a couple of reports of some less-than-intuitive behavior of HMMER3 on poor-scoring sequences. As one correspondent described it, HMMER3 is stubborn: it refuses to score and align certain low-scoring sequences no matter what options you try to set. This behavior is worth explaining in public, partly because it gives me an opportunity to describe briefly the fact that H3 has two processing pipelines: the “acceleration” pipeline and the “domain postprocessing” pipeline. Only the acceleration pipeline is written up for publication, reasonably well documented, and easily controlled by options. The domain postprocessor is ad hoc, not terribly satisfying, not well documented, and not easily configurable, and it has a side effect that drops some poor-scoring sequences entirely.
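To make the "controllable by options" point concrete, here is a minimal Python sketch of a filter cascade in the spirit of the acceleration pipeline. It assumes three stages (MSV, Viterbi, Forward) gated by P-value cutoffs equal to hmmsearch's defaults for `--F1`, `--F2`, and `--F3`, and a `max_mode` flag that loosely imitates `--max`. The `Target` class, its precomputed `pvalues`, and `acceleration_pipeline` are hypothetical illustrations, not HMMER code; real HMMER computes each score on the fly and also has a bias filter that is omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Per-stage P-value cutoffs, cheapest stage first. Values follow hmmsearch's
# default --F1 (MSV), --F2 (Viterbi), and --F3 (Forward) thresholds.
DEFAULT_THRESHOLDS: Dict[str, float] = {"msv": 0.02, "viterbi": 1e-3, "forward": 1e-5}


@dataclass
class Target:
    name: str
    # Hypothetical precomputed P-values, one per stage. Real HMMER computes
    # these by scoring the sequence, running a stage only if the last one passed.
    pvalues: Dict[str, float]


def acceleration_pipeline(targets: List[Target],
                          thresholds: Optional[Dict[str, float]] = None,
                          max_mode: bool = False) -> List[str]:
    """Return names of targets that pass every filter stage, in input order.

    max_mode loosely mimics --max: every target bypasses the filters.
    """
    if thresholds is None:
        thresholds = DEFAULT_THRESHOLDS
    survivors = []
    for t in targets:
        # all() stops at the first failing stage, like a filter cascade.
        if max_mode or all(t.pvalues[s] <= cut for s, cut in thresholds.items()):
            survivors.append(t.name)
    return survivors
```

In this sketch every knob lives in the cascade. Whatever survives would then go on to the domain postprocessor, which the sketch does not model; per the post, that is the stage that can still drop poor-scoring sequences, which is why loosening the filter thresholds alone may not help.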

hmmscan vs. hmmsearch speed: the numerology

From today’s email…

Suppose, for example, you want to search 300 million metagenomic sequence reads, each about 200 nt long, against the Pfam database. What’s the best way to do that with HMMER3? The bottom line: use hmmsearch, not hmmscan. For the numerology of why (and chapter and verse on how hmmsearch and hmmscan scale to large multithreaded and MPI tasks, their limitations, advice on how we do it, and some clues about what’s coming in the future), keep reading…
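Both programs perform the same number of profile/sequence comparisons; what differs is the loop order. hmmsearch holds one profile and streams the sequence database past it, while hmmscan holds one sequence and streams the profile database past it. The following is a deliberately simplified Python cost model, not the post's own numerology. It assumes cost is dominated by the bytes of whatever gets streamed and ignores compute, caching, and threading. The function name `bytes_streamed` and the Pfam model count and per-profile size in the demo are hypothetical placeholders.

```python
def bytes_streamed(n_seqs, n_hmms, seq_bytes, hmm_bytes):
    """Return (hmmsearch_bytes, hmmscan_bytes) under a deliberately simple model.

    Both programs make the same n_seqs * n_hmms profile/sequence comparisons.
    hmmsearch holds one profile and streams every sequence past it;
    hmmscan holds one sequence and streams every profile past it.
    Only bytes of the streamed item are counted; compute and caching are ignored.
    """
    comparisons = n_seqs * n_hmms
    return comparisons * seq_bytes, comparisons * hmm_bytes


if __name__ == "__main__":
    # 300 million ~200-byte reads come from the example above; the model count
    # (10,000) and per-profile size (20,000 bytes) are made-up placeholders.
    search, scan = bytes_streamed(300_000_000, 10_000, 200, 20_000)
    print(scan / search)  # 100.0
```

Under these made-up sizes the ratio is simply `hmm_bytes / seq_bytes`. Whenever a profile is much larger than a short read, the hmmscan loop order moves far more data for identical work.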

Extracting HMMER results to sequence files: Easel miniapplications

The HMMER and Infernal code includes some hidden tools: the Easel library, and its “miniapplications”. Easel is our code library (in the easel subdirectory of both HMMER and Infernal), and the miniapplications (in easel/miniapps) are a set of command-line utilities that we use for manipulating sequence data. For example, esl-reformat is a utility for reformatting from one sequence file format to another, and esl-sfetch is a tool for retrieving sequence(s) or subsequence(s) from a large sequence flatfile. These utilities work together with HMMER and Infernal to enable sequence analysis in a flexible, arcane, unix-y command-line sort of way.
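Fetching from a large flatfile is fast because of indexing: esl-sfetch can build an SSI index (`esl-sfetch --index`) so it can seek directly to a record instead of scanning the whole file. The sketch below shows the same idea in Python, using an in-memory dict of byte offsets rather than Easel's SSI format. The functions `build_index` and `fetch` are hypothetical illustrations, not Easel APIs, and they ignore features such as reverse-complement fetching.

```python
import io


def build_index(fh):
    """Map each FASTA record name to the byte offset of its '>' header line.

    `fh` must be a seekable binary file object (e.g. open(path, "rb")).
    """
    index = {}
    fh.seek(0)
    offset = 0
    for line in fh:  # binary lines keep their trailing b"\n", so len() counts it
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()  # name = first word after '>'
            index[name] = offset
        offset += len(line)
    return index


def fetch(fh, index, name, start=None, end=None):
    """Return the sequence called `name`, or its 1-based inclusive range start..end."""
    fh.seek(index[name])  # jump straight to the record instead of scanning
    fh.readline()         # skip the header line
    chunks = []
    for line in fh:
        if line.startswith(b">"):  # reached the next record
            break
        chunks.append(line.strip())
    seq = b"".join(chunks).decode()
    if start is not None and end is not None:
        return seq[start - 1:end]
    return seq


if __name__ == "__main__":
    demo = io.BytesIO(b">seq1 example\nACGT\nTTGA\n>seq2\nGGGCCC\n")
    idx = build_index(demo)
    print(fetch(demo, idx, "seq1"))        # ACGTTTGA
    print(fetch(demo, idx, "seq1", 3, 6))  # GTTT
```

Building the index costs one pass over the file. After that, each lookup is a seek plus a read of only the record it needs, which is the payoff when a flatfile holds millions of sequences.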

For example, yesterday someone wrote to ask: “Suppose I want to extract all the sequences that were hit by a HMMER hmmsearch and save them in a separate file in FASTA format. How do I do that?” This is a good example for introducing Easel’s miniapplications.
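On the command line, a typical Easel route is to write hmmsearch's per-target table with `--tblout`, pull out the target names, index the sequence file with `esl-sfetch --index`, and fetch the named sequences with `esl-sfetch -f` (check `esl-sfetch -h` for your version). As a self-contained illustration of the same steps, here is a pure-Python sketch. It assumes only that non-comment `--tblout` lines start with the target name. The functions `hit_names`, `parse_fasta`, and `extract_hits` are hypothetical helpers, not part of HMMER or Easel.

```python
from typing import Iterable, Iterator, List, Tuple


def hit_names(tblout_text: str) -> List[str]:
    """Unique target names from hmmsearch --tblout text, in order of first appearance."""
    names, seen = [], set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):  # skip blanks and comment lines
            continue
        name = line.split()[0]  # first whitespace-delimited field: target name
        if name not in seen:    # a target can appear once per query profile
            seen.add(name)
            names.append(name)
    return names


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header_without_'>', sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_hits(fasta_text: str, names: Iterable[str], width: int = 60) -> str:
    """Return FASTA text for records whose name is in `names`, in file order."""
    wanted = set(names)
    out = []
    for header, seq in parse_fasta(fasta_text):
        if (header.split() or [""])[0] in wanted:
            out.append(">" + header)
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + ("\n" if out else "")
```

One design difference to note: this sketch emits hits in the order they occur in the sequence file, whereas fetching by a list of names returns them in list order. It also reads the whole file into memory, which is exactly what an indexed fetch avoids on large databases.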

HMMER3 at your (web) service

Over at hmmer.janelia.org, you’ll notice a significant change on the right side of the page. See the “Search” button? You don’t have to use HMMER at the UNIX command line any more. Thanks to support from the Howard Hughes Medical Institute, and hard work from Rob Finn and Jody Clements here in the skunkworks at HMMER Labs, HMMER searches are now available on interactive web servers.