It has been a couple of weeks now since we released jackhmmer on the HMMER website and so far (touch wood etc…), it seems to be performing as we had hoped – here on ‘the farm’ we are getting very excited with the results we are observing. Continue reading
Have you ever wondered how a new protein family would look in context of other Pfam domains? Well, look no further than the hmmer website! At the end of last week we released a new way of visualizing search results according to ‘domain architecture’ (applies to both phmmer and hmmsearch). Continue reading
We’ve had a couple of reports of some less-than-intuitive behavior of HMMER3 on poor-scoring sequences. As one correspondent described it, HMMER3 is stubborn. It will refuse to score and align certain low-scoring sequences no matter what options you try to set. It’s probably worth explaining this behavior in public, partly because it’s an opportunity for me to briefly describe the fact that H3 has two processing pipelines: the “acceleration” pipeline, and the “domain postprocessing” pipeline. Only the acceleration pipeline is written up for publication, reasonably well documented, and well controllable by options. The domain postprocessor is ad hoc, not terribly satisfying, not well documented, not easily configurable — and it kicks back a side effect that drops some poor-scoring sequences entirely.
From today’s email…
Suppose, for example, you want to search 300 million metagenomic sequence reads, each about 200nt long, against the Pfam database. What’s the best way to do that task with HMMER3? The bottom line: use hmmsearch, not hmmscan. For the numerology of why (and chapter and verse on how hmmsearch and hmmscan scale to large multithreaded and MPI tasks, their limitations, advice on how we do it, and some clues about what’s coming in the future), keep reading…
The HMMER and Infernal code includes some hidden tools: the Easel library, and its “miniapplications”. Easel is our code library (in the
easel subdirectory of both HMMER and Infernal), and the miniapplications (in
easel/miniapps) are a set of command line utilities that we use for manipulating sequence data. For example, esl-reformat is a utility for reformatting from one sequence file format to another, and esl-sfetch is a tool for retrieving sequence(s) or subsequence(s) from a large sequence flatfile. These utilities work together with HMMER and Infernal to enable sequence analysis in a flexible, arcane, unix-y command line sort of way.
For example, yesterday someone wrote to ask, suppose I want to extract all the sequences that were hit by a HMMER hmmsearch, and save them in a separate file in FASTA format — how do I do that? This is a good example for introducing Easel’s miniapplications.
Over at hmmer.janelia.org, you’ll notice a significant change over on the right side of the page. See the “Search” button? You don’t have to use HMMER at the UNIX command line any more. Thanks to support from the Howard Hughes Medical Institute, and hard work from Rob Finn and Jody Clements here in the skunkworks at HMMER Labs, HMMER searches are now available on interactive web servers.
The Guide now briefly describes the HMMER3 acceleration pipeline for profile/sequence comparison, and the methods used to identify domain “envelopes”, maximum expected accuracy alignments, and posterior probabilities of aligned residue confidence. The Guide also now documents the
--domtblout tabular output formats.
This should help address questions about what all the fiddly columns mean in these outputs, especially in
This version of the Guide included in HMMER distro tarballs is still the old version. The updated Guide will appear in distro tarballs for 3.1, which is coming Real Soon Now.