HMMER3.1 beta test 1 released


The HMMER dev team is happy to announce an upgrade release of HMMER3: release 3.1. A beta test version of the code is publicly available as a tarball for download, or from, where you'll also find precompiled binaries for Mac and Linux.

HMMER 3.1 includes nhmmer and nhmmscan, programs for DNA/DNA homology searches with profile HMMs. nhmmer has already been incorporated into RepeatMasker, in collaboration with Arian Smit and colleagues, and is the software underlying the Dfam database of profiles for mobile DNA elements.

HMMER 3.1 database searches are about twice as fast as HMMER 3.0 was, fulfilling old campaign promises.

HMMER 3.1 includes hmmpgmd, the parallel search daemon underlying HMMER Web Services at

This code is expected to be stable, but we’re releasing it as a beta test just to be careful. After some time in the wild, we’ll make a release candidate, and if you folks haven’t chewed any of that up too badly, we’ll make the final 3.1 release reasonably soon.

Congratulations to Travis Wheeler, 3.1's build master (note the TJW, not SRE, on the release notes below the fold). This is the first HMMER release managed by someone besides me (Sean).

Meanwhile… slowly, slowly, HMMER4 takes shape, as the gnomes of HMMER Labs toil sleeplessly on their latest monstrosities. The long-awaited return of glocal alignment has been delayed to HMMER4, because the changes required turned out to be, um, quite extensive.

Detailed release notes for 3.1b1 are below the fold.

Join Rob’s HMMER team


Rob Finn's HMMER web services team is expanding. We're looking for applicants for two new positions to help Rob and Jody push forward on some important ideas for our services.

We're pushing toward using more phylogenetic information (species trees) as we run database homology searches and deliver the results: organizing everything on trees, rather than treating the protein database as a bag of unrelated sequences, as we (the community) have tended to do in the past.

We'll need help on the data visualization side (navigating search results organized on the tree of life), on the computing back end (accelerating our searches by searching representative subsets of complete proteomes rather than "all" sequences, which will let us deliver fully interactive search times measured in milliseconds), and on collaborative efforts with the primary protein sequence and genome data resources, as we (the community) organize our data ecosystem around complete annotated genomes rather than individual protein sequences.

The positions, written in HR-speak, are advertised on HHMI's web site here and here.

HMMER3 is stubborn

We've had a couple of reports of some less-than-intuitive behavior of HMMER3 on poor-scoring sequences. As one correspondent put it, HMMER3 is stubborn: it refuses to score and align certain low-scoring sequences no matter what options you set. It's worth explaining this behavior in public, partly because it gives me an opportunity to describe briefly the fact that H3 has two processing pipelines: the "acceleration" pipeline and the "domain postprocessing" pipeline. Only the acceleration pipeline is written up for publication, reasonably well documented, and easily controlled by options. The domain postprocessor is ad hoc, not terribly satisfying, not well documented, and not easily configurable, and it kicks back a side effect that drops some poor-scoring sequences entirely.
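To make the "controllable by options" part concrete, here is a minimal sketch of the acceleration pipeline's filter cascade, written in Python purely for illustration. The threshold values are HMMER3's documented defaults for the `--F1` (MSV), `--F2` (Viterbi), and `--F3` (Forward) options; the function name and its arguments are hypothetical. The composition-bias filter is omitted, and for simplicity all three P-values are passed in up front, whereas the real pipeline computes each one only for sequences that survive the previous stage. The domain postprocessor is deliberately not modeled.

```python
# Hypothetical sketch of HMMER3's acceleration filter cascade.
# Thresholds are the documented defaults of --F1, --F2, --F3;
# names are illustrative only. Bias filter and domain postprocessing
# are not modeled.

F1_MSV = 0.02       # --F1: MSV filter P-value threshold
F2_VITERBI = 1e-3   # --F2: Viterbi filter P-value threshold
F3_FORWARD = 1e-5   # --F3: Forward filter P-value threshold


def passes_acceleration_pipeline(p_msv, p_viterbi, p_forward,
                                 f1=F1_MSV, f2=F2_VITERBI, f3=F3_FORWARD,
                                 use_max=False):
    """Return True if a target survives the MSV, Viterbi, and Forward filters.

    use_max=True mimics the spirit of --max (all filters off), so every
    target would go on to the domain postprocessor, which is not modeled
    here.
    """
    if use_max:
        return True
    # Stages run in order of increasing cost; a target must have
    # P <= threshold at every stage to continue.
    for p_value, threshold in ((p_msv, f1), (p_viterbi, f2), (p_forward, f3)):
        if p_value > threshold:
            return False
    return True


if __name__ == "__main__":
    print(passes_acceleration_pipeline(0.01, 5e-4, 1e-6))  # passes all three
    print(passes_acceleration_pipeline(0.5, 5e-4, 1e-6))   # rejected by MSV
    print(passes_acceleration_pipeline(0.5, 0.5, 0.5, use_max=True))
```

Because every threshold in this stage is an option, it can always be relaxed or switched off entirely with `--max`; the stubbornness described above comes from the later domain postprocessing stage, which this sketch leaves out.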

hmmscan vs. hmmsearch speed: the numerology

From today’s email…

Suppose, for example, you want to search 300 million metagenomic sequence reads, each about 200nt long, against the Pfam database. What’s the best way to do that task with HMMER3? The bottom line: use hmmsearch, not hmmscan. For the numerology of why (and chapter and verse on how hmmsearch and hmmscan scale to large multithreaded and MPI tasks, their limitations, advice on how we do it, and some clues about what’s coming in the future), keep reading…
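A back-of-envelope sketch of one factor behind that advice, under stated assumptions: both programs perform the same number of query/target comparisons, but they nest the loops differently. hmmscan treats each read as a query and runs it against the whole profile database, so any fixed per-query cost is paid once per read; hmmsearch treats each profile as a query, so that cost is paid once per profile. The read count and length come from the example above; `N_PROFILES` is a hypothetical round number, not the size of any actual Pfam release, and this is not a reproduction of the full numerology.

```python
# Illustrative loop-structure arithmetic for the metagenomic example.
# N_READS and READ_LEN_NT are from the example; N_PROFILES is an
# assumed round number used only for illustration.

N_READS = 300_000_000
READ_LEN_NT = 200
N_PROFILES = 10_000  # hypothetical, not an actual Pfam family count


def workload(n_queries, n_targets):
    """Outer-loop iterations and total query/target comparisons."""
    return {
        "outer_iterations": n_queries,
        "comparisons": n_queries * n_targets,
    }


# hmmscan: each read is a query, searched against the profile database.
scan = workload(n_queries=N_READS, n_targets=N_PROFILES)
# hmmsearch: each profile is a query, searched against the read database.
search = workload(n_queries=N_PROFILES, n_targets=N_READS)

total_nt = N_READS * READ_LEN_NT

if __name__ == "__main__":
    print(f"total sequence: {total_nt:.1e} nt")
    print(f"comparisons, either program: {scan['comparisons']:.1e}")
    print(f"per-query setup paid: hmmscan {scan['outer_iterations']:,} times, "
          f"hmmsearch {search['outer_iterations']:,} times")
```

The comparison count is identical either way; what changes is how many times the per-query overhead is multiplied, which is why the choice of outer loop matters for a task dominated by huge numbers of short queries.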