HMMER3 alpha test release

I have one concluding favor, to request of my Reader; that he will not expect to be equally diverted and informed by every Line, or every Page of this Discourse, but give some Allowance to the Author’s Spleen, and short Fits or Intervals of Dullness…

Jonathan Swift, A Tale of a Tub

HMMER3 alpha test code is now available as a tarball at our FTP site. Later this evening, it’ll also be linked from the HMMER web page.

It includes a User Guide with introductory notes about installation and tutorial examples of command line usage. The guide is telegraphic, documenting basic usage of the new HMMER3 programs. It does not yet go into all the command line options — partly because a few of them don’t work as advertised yet.

Linux binaries are included, as well as source code. If you compile it yourself, be aware that you need a compiler that handles Intel Streaming SIMD Extensions (SSE) well, specifically SSE2 code. The GNU C compiler gcc will compile HMMER fine, but the binaries it produces run markedly slower. We now use the Intel C compiler, icc, for production code, and the Linux binaries in the tarball were compiled with icc.

The purpose of the alpha test period is in part to smoke out any remaining bugs, of course, but also to give people a chance to give feedback on how it all behaves at the command line. The core of H3’s functionality seems stable to me, but all the stuff that you see (the applications, the command line options, the i/o formats) is deliberately still prototypical and fluid. The invited alpha testers are a group of power users at centers that do high-throughput annotation. In the test phase, I’d like to be free to make changes that make HMMER3 work best in anyone’s analysis pipeline. The corollary to this is that you can’t count on HMMER3 output formats just yet, so don’t go writing any heavy-duty parsers.

There are eleven “invited” alpha testers, but anyone else adventurous enough to test the code is welcome to. The code is publicly accessible. The best place to leave comments and discuss issues/problems/bugs is here at Cryptogenomicon.


  1. I was very much impressed by the speed. The new version will solve the main problem that we currently have. Could you add the --acc option to hmmscan, as you had in hmmpfam?



  2. I could not execute the binaries on CentOS5 or Fedora10 running on a Dell Optiplex 755 with a Core2Quad processor. Are they for the 64-bit architecture? I had no problems with binaries I compiled myself with icc.



  3. It seems that the “cut-off” functions have not yet been implemented. When I use the “--cut_ga” option, for example, hmmscan skips all the heuristic filters and takes a long time. Am I doing something wrong?



  4. Yes, the speed is indeed impressive. I’m also very glad that, upon inspection with one of my favorite proteins, the new HMMER3 overcomes an obscure design problem that HMMER2 had, in which hits that were significant but had negative scores would not appear if there’s more than one hit of the same family. I’m guessing this has to do with the heuristics that HMMER2 didn’t have.

    Since you ask about output formats, I’m always a fan of tab-delimited tables! A simple header for the columns at the most, no other junk. I was thinking it’d be nice to have a table with ALL the information in a single row per domain (with the sequence family scores included, not in their own rows in a separate table), sort of the way the wrapper has it, but without loss of information from the standard HMMER output ( omitted certain things). In case this sounds confusing, this is the script I’m talking about:

    Yey for speed!
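The one-row-per-domain layout suggested in the comment above could look roughly like the following Python sketch. The column names and the example values are invented for illustration and are not HMMER3's actual output fields; as the post warns, the real formats are still subject to change.

```python
import csv
import io

# Hypothetical column names, for illustration only (not HMMER3's real fields).
COLUMNS = ["sequence", "family", "seq_score", "seq_evalue",
           "dom_index", "dom_score", "dom_evalue", "ali_from", "ali_to"]


def flatten_hits(hits):
    """Yield one flat row per domain, repeating the per-sequence scores on each row."""
    for hit in hits:
        for index, dom in enumerate(hit["domains"], start=1):
            yield [hit["sequence"], hit["family"],
                   hit["seq_score"], hit["seq_evalue"],
                   index, dom["score"], dom["evalue"],
                   dom["ali_from"], dom["ali_to"]]


def write_table(hits, handle):
    """Write a single header line followed by tab-delimited rows, nothing else."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(flatten_hits(hits))


# Made-up example data: one sequence with two domains, one with a single domain.
example_hits = [
    {"sequence": "seqA", "family": "fam1", "seq_score": 120.5, "seq_evalue": 1e-30,
     "domains": [
         {"score": 70.2, "evalue": 1e-18, "ali_from": 5, "ali_to": 98},
         {"score": 50.3, "evalue": 1e-12, "ali_from": 130, "ali_to": 220},
     ]},
    {"sequence": "seqB", "family": "fam2", "seq_score": 33.0, "seq_evalue": 1e-5,
     "domains": [
         {"score": 33.0, "evalue": 1e-5, "ali_from": 12, "ali_to": 80},
     ]},
]

if __name__ == "__main__":
    buffer = io.StringIO()
    write_table(example_hits, buffer)
    print(buffer.getvalue(), end="")
```

Repeating the per-sequence score on every domain row costs some redundancy, but it keeps each line self-contained, so the table can be filtered or sorted with ordinary tools without joining against a second table.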


