Over at hmmer.janelia.org, you’ll notice a significant change on the right side of the page. See the “Search” button? You don’t have to use HMMER at the UNIX command line anymore. Thanks to support from the Howard Hughes Medical Institute, and hard work from Rob Finn and Jody Clements here in the skunkworks at HMMER Labs, HMMER searches are now available on interactive web servers.
Three different types of searches are available. The main one is “phmmer”, which searches your protein sequence against sequence databases. You can also use “hmmscan” to search your sequence against the Pfam domain database (“phmmer” does this for you automatically too), or “hmmsearch” to upload an HMMER3 profile of your own and search it against the sequence database.
If you’ve got a lot of searches to do, you don’t have to cut and paste into a browser. We also provide a RESTful web services interface to all three search servers, so you can access our web services directly through your own scripts. Documentation on the web services API can be found here.
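For scripted use, a client builds a request for one of the three programs and sends it to the server. The sketch below only builds the request (it never touches the network) using Python's standard `urllib`. Everything specific to the service is a hypothetical assumption for illustration, not the documented API: the base URL, the `/search/<program>` path, and the form field names `seq`, `seqdb` and `hmmdb`. Check the official web services documentation for the real names.

```python
"""Sketch: building a search request for a HMMER-style RESTful service.

ASSUMPTIONS (hypothetical, not taken from the official API docs):
  * endpoint is BASE_URL + "/search/" + program
  * the query goes in a form field named "seq" for all three programs
  * the target database goes in "seqdb" or "hmmdb" (see TARGET_FIELD)
  * the server returns JSON when asked via the Accept header
"""
import urllib.parse
import urllib.request

BASE_URL = "https://hmmer.janelia.org"  # hypothetical base URL

# Hypothetical form field naming the target database, per program.
TARGET_FIELD = {
    "phmmer": "seqdb",     # protein sequence vs. sequence database
    "hmmsearch": "seqdb",  # your profile HMM vs. sequence database
    "hmmscan": "hmmdb",    # protein sequence vs. profile database (e.g. Pfam)
}


def build_search_request(program: str, query: str, database: str) -> urllib.request.Request:
    """Return an unsent POST request for a single search."""
    if program not in TARGET_FIELD:
        raise ValueError(f"unknown program: {program!r}")
    form = {"seq": query, TARGET_FIELD[program]: database}
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        url=f"{BASE_URL}/search/{program}",
        data=body,
        headers={"Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "pdb" is a made-up database name used only for this example.
    req = build_search_request("phmmer", ">query\nMKVLAAGIVG", "pdb")
    print(req.get_method(), req.full_url)
    # To actually submit: urllib.request.urlopen(req), then json.load() the response.
```

Keeping request construction separate from sending makes it easy to loop over many query sequences in a batch script and to inspect exactly what would be submitted.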
Our goal is to achieve truly interactive search times. We want a typical search to return results to you in about 100–200 ms, way faster than a BLAST search, and as fast as or faster than a Google search. We’re not quite there yet, but we’re not far off; a typical search of the full NR database takes about 1 second. That’s not a typo: yes, one second, for a full profile HMM search with the full Forward summed-over-all-alignments HMM algorithm, with confidence probabilities calculated for every aligned residue. And we’re confident we can get that last ten-fold with a little more brawn and brain: more servers, and slick new algorithms slotting into place in HMMER3.
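“Summed over all alignments” is the key difference between the Forward algorithm and the Viterbi algorithm: Viterbi scores only the single best path through the model, while Forward adds up the probability of every possible path, using dynamic programming instead of exponential enumeration. The toy below illustrates that on a made-up two-state HMM. It is not HMMER’s profile HMM; the states and probabilities are invented for illustration, and production code rescales or works in log space to avoid numerical underflow on long sequences.

```python
"""Toy Forward vs. Viterbi on a tiny two-state HMM (not a profile HMM).

All states, transition and emission probabilities are made up.
"""
from itertools import product

STATES = ("A", "B")
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}


def forward(obs):
    """P(obs), summed over every state path, in O(len(obs) * |STATES|^2)."""
    f = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for sym in obs[1:]:
        f = {t: sum(f[s] * TRANS[s][t] for s in STATES) * EMIT[t][sym] for t in STATES}
    return sum(f.values())


def viterbi(obs):
    """Probability of the single most likely path: max replaces sum."""
    v = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for sym in obs[1:]:
        v = {t: max(v[s] * TRANS[s][t] for s in STATES) * EMIT[t][sym] for t in STATES}
    return max(v.values())


def brute_force(obs):
    """Enumerate all |STATES|^len(obs) paths explicitly (exponential cost)."""
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = START[path[0]] * EMIT[path[0]][obs[0]]
        for prev, cur, sym in zip(path, path[1:], obs[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][sym]
        total += p
    return total


if __name__ == "__main__":
    seq = "xyyx"
    # Forward matches the brute-force sum; Viterbi is never larger.
    print(forward(seq), brute_force(seq), viterbi(seq))
```

Forward gives the same answer as brute-force enumeration at a tiny fraction of the cost, and it is always at least as large as the Viterbi score, because the best path is just one term in the sum.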
Longer term, we aim to change the way sequence database searches look to you. The good old batch search of a bag of sequences, reported as a tabular list of hits, was designed for the days when you were lucky to get any informative hit at all. Nowadays, the problem is that you get thousands of hits, and you have to sort through hits from a hundred unannotated genomes with Linnaean names you’ve never heard of to find the one you want. We’re going to shift our displays from simple lists to phylogenetic trees, and we’re going to use that to drive fundamental changes in the underlying organization of the data and the computational searches themselves. Someday you’ll be searching on an organized phylogenetic tree of whole proteomes. Combined with interactive search speed, that will let us say piffle to the so-called data tsunami. We’ll run your search on a framework phylogeny of relatively constant size, and let you interactively burrow and focus into any clade of interest, however deep deep sequencing ever goes.
Behind the scenes here sit 144 HHMI-funded cores, in twelve 12-core machines, running a new daemon program called “hmmpgmd”. hmmpgmd is what Michael Farrar was working on up until the night of his death last December. I’ve taken over where he left off. Michael couldn’t quite get enough speed out of standard cluster parallelization approaches like MPI, so hmmpgmd implements our own hand-rolled IP socket communication protocol. Whenever we finally release 3.1 or 3.0.1 or whatever we call the next code version, you’ll see hmmpgmd lurking as an executable; if you press us, we might even tell you how to use it yourself (or even document it properly, gasp).
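Any hand-rolled protocol over TCP has to solve one basic problem: TCP delivers a byte stream, not discrete messages, so the sender must mark where each message ends. The sketch below shows one common generic solution, a 4-byte length prefix followed by the payload, using Python’s standard `socket` and `struct` modules. This is a hypothetical illustration of the pattern only; the post does not describe hmmpgmd’s actual wire format, and the message contents in the demo are made up.

```python
"""Hypothetical length-prefixed message framing over a stream socket.

NOT hmmpgmd's wire format (which is not described here); it only shows
the general pattern a hand-rolled TCP protocol needs.
"""
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message: a length header followed by the payload bytes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> bytes:
    """Receive one complete message sent by send_msg()."""
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    return _recv_exactly(sock, length)


if __name__ == "__main__":
    # A connected socket pair stands in for a TCP link from master to one worker.
    master, worker = socket.socketpair()
    send_msg(master, b"search: MKVLAAGIVG")  # made-up request
    request = recv_msg(worker)
    send_msg(worker, b"ack " + request)      # made-up reply
    print(recv_msg(master))
    master.close()
    worker.close()
```

The receiver always knows how many bytes to wait for, so partial reads and several messages arriving in one packet are handled the same way.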
There’s lots to do, and much that we can improve. We think there’s a lot of potential here, and we’re juggling and prioritizing our long to-do lists. Please give us your feedback on what you’d like to see in this nascent server. And please don’t be shy about using it: the more you drive our usage statistics up, the more likely it is that I’ll be able to convince HHMI to continue supporting this as a public service.