HMMER 3.1 beta test 2 released


The HMMER dev team is happy to announce a bugfix release of HMMER3.1, release 3.1 beta 2, aka 3.1b2. Following Google’s ineffable lead in having perpetual beta test periods, 3.1 has been in beta test now for two years. When we said before that 3.1 will be released reasonably soon… “reasonably soon” continues to be a term of art for the dev team. Did we mention, it’s a stable beta release?

Anyhoo, moving right along. The 3.1b2 code is publicly available as a tarball for download, or from, where you’ll also find precompiled binary releases for Mac and Linux.

The most significant upgrade in 3.1b2 is that the nhmmer program for DNA/DNA comparison now includes a somewhat radical heuristic acceleration technique that gets us about 10x more speed. Travis Wheeler has used an FM-index data structure to accelerate remote homology search in nhmmer. FM-index techniques are well known now in the computational biology community for fast near-exact matching (in read mappers, for example), and there have been some proofs of principle for accelerating Smith/Waterman especially with scoring systems set for close matches; Travis’s code is a full-fledged implementation in production code for remote homology. Travis is still working on it and writing it up. Meanwhile, you can try it out. If you format a DNA database with the new makehmmerdb command, and then use nhmmer --tformat hmmerfm to search the binary FM-index database, you’ll use the new acceleration.
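For readers who haven't met an FM-index before, here is a minimal toy sketch in Python of the basic idea: exact substring lookup by backward search over a Burrows-Wheeler transform. To be clear, this is not Travis's code and it says nothing about how nhmmer uses the index for remote homology; the function names (build_fm_index, find) are made up for illustration, and the naive suffix sort would never scale to a real DNA database.

```python
def build_fm_index(text):
    """Build a toy FM-index for `text` (illustration only, not nhmmer's format)."""
    # Assumption: '$' does not occur in text and sorts before every other symbol.
    text = text + "$"
    # Naive suffix array: fine for a demo, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each suffix in sorted order.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c] = number of characters in text that sort strictly before c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, b in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (b == c)
    return sa, first, occ


def find(pattern, index):
    """Return sorted start positions of exact matches of `pattern`."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    # Backward search: extend the match one character at a time, right to left.
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGTTACG")
print(find("ACG", index))  # [0, 4, 9]
print(find("TTA", index))  # [7]
print(find("GGG", index))  # []
```

The point of the structure is that each character of the query costs a constant number of table lookups, regardless of how long the indexed text is, which is why read mappers like it for near-exact matching.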

Another significant upgrade is the inclusion of the hmmlogo program, which is essentially a commandline interface for producing the data underlying the Skylign profile logo server.

Also, eight, count ’em eight bugs have been fixed. Of the ones we count, anyway.

Congratulations again to Travis Wheeler, who continues as 3.1’s build master, even though he is now afar in his new mountain lair faculty position at the University of Montana, as the HMMER dev team continues to scatter and flee from Virginia.

The horrible grinding noise you hear is the HMMER4 development code branch. Do not be alarmed. All is well. It will be ready… reasonably soon.

Detailed release notes for 3.1b2 are below the fold.

Cryptic genetic variation in software: hunting a buffered 41 year old bug

In genetics, cryptic genetic variation means that a genome can contain mutations whose phenotypic effects are invisible because they are suppressed or buffered, but under rare conditions they become visible and subject to selection pressure.

In software code, engineers sometimes also face the nightmare of a bug in one routine that has no visible effect because of a compensatory bug elsewhere. You fix the other routine, and suddenly the first routine starts failing for an apparently unrelated reason. Epistasis sucks.

I’ve just found an example in our code, and traced the origin of the problem back 41 years to the algorithm’s description in a 1973 applied mathematics paper. The algorithm, for sampling from a Gaussian distribution, is used worldwide, because it’s implemented in the venerable RANLIB software library still used in lots of numerical codebases, including GNU Octave. It looks to me as though the only reason the code has been working is that a compensatory “mutation” has been selected for in everyone else’s code except mine.



In July 2015, our laboratory will move to Harvard University. I’ve accepted an offer to join Harvard’s faculty in two departments: Molecular & Cellular Biology (in the Faculty of Arts and Sciences), and Applied Mathematics (in the School of Engineering and Applied Sciences). The laboratory will be on the first floor of the venerable, historic Biolabs building on the Cambridge campus. Much like we optimized our Janelia location to be the closest lab to the pub, now we will be strategically located closest to the food truck on Divinity Avenue. I’m told that our future space was once occupied by Eric Lander, so who knows what we’re going to find during construction. Charred effigies of Craig Venter, I trust.


Belated congratulations to Rob Finn, who has moved to EMBL-EBI to lead the Protein Families team. The HMMER web services pilot project that HHMI has funded at Janelia, under Rob’s leadership, is now in transition to EBI, and Rob and I will write more about this in the future.

Slightly-less-belated congratulations to Travis Wheeler, who has left the lab to start his new job as an assistant professor at the University of Montana this fall. The nhmmer software project and his lead role in the Dfam mobile element database move with him. We mourn the death of our collaborator and friend Jerzy Jurka this past July. Together with Arian Smit and others, sometime soon we hope to have more to say about the legacy of Jerzy’s seminal Repbase database.

Timely (perhaps even premature) congratulations to Eric Nawrocki, who is in the process of accepting a position at the National Center for Biotechnology Information, where he seems likely to start in January. Eric will remain lead developer of the Infernal RNA homology search codebase. We’ll probably have more to say about that too, and what it means for Infernal development and the Rfam structural RNA database.

It’s getting a little underpopulated here in our Janelia monastery cells. But there’s good reason for that, all part of the master plan… another exodus story that I’ll post about soon.

Wellcome Trust Computational RNA Biology meeting, 11-13 November 2014

The Wellcome Trust Sanger Institute is hosting a meeting on computational RNA biology on 11-13 November 2014, co-organized by Alex Bateman (EMBL-EBI), Anton Enright (EMBL-EBI), Mihaela Zavolan (University of Basel), and myself. I’m especially keen on this meeting because we’ve invited several people who have been active in developing computational methods for incorporating new chemical probing data, such as SHAPE, into RNA structure prediction, something that I spent quite some time thinking about for a recent review I wrote for Annual Review of Biophysics. Registration deadline is 26 September. See the registration page at the Wellcome Trust Sanger Institute for more information, including the invited speakers list.

Open position: Director, Advanced Computation and Technology group

Janelia has a scientific director-level position open in our computation and technology group:

“The Janelia Farm Research Campus of the Howard Hughes Medical Institute seeks an exceptional individual to lead a world-class technology effort in advanced computation, data analysis, and instrumentation design and fabrication. The Director of Janelia’s Advanced Computation and Technology group is responsible for leading an array of data-driven technology teams, ranging from data acquisition to data analysis. In collaboration with Janelia research scientists, this person must drive a strategic vision and set priorities for focused investment of effort in sophisticated engineering and computational technology, enabling Janelia’s two scientific goals: (1) understanding the basic mechanisms used by nervous systems to store and process information and (2) developing new methods for image acquisition and analysis. This individual may not necessarily be a neuroscientist or even a biologist themselves; rather, the ideal individual will have extensive technical and engineering expertise in some area of large-scale data analysis, while being excited about applying this expertise in fundamental neuroscience research.”

We’re running ads in all the usual places, such as this one in Nature, where you’ll find some more information about the position. Please spread the word. I’ll be happy to answer questions about the position informally, for anyone who may be interested, or who may know someone who’d be interested.