HMMER 3.1 beta test 2 released

The HMMER dev team is happy to announce a bugfix release of HMMER3.1, release 3.1 beta 2, aka 3.1b2. Following Google’s ineffable lead in having perpetual beta test periods, 3.1 has been in beta test now for two years. When we said before that 3.1 will be released reasonably soon… “reasonably soon” continues to be a term of art for the dev team. Did we mention, it’s a stable beta release?

Anyhoo, moving right along. The 3.1b2 code is available for download as a tarball, or from hmmer.org, where you’ll also find precompiled binary releases for Mac and Linux.

The most significant upgrade in 3.1b2 is that the nhmmer program for DNA/DNA comparison now includes a somewhat radical heuristic acceleration technique that gets us about 10x more speed. Travis Wheeler has used an FM-index data structure to accelerate remote homology search in nhmmer. FM-index techniques are well known now in the computational biology community for fast near-exact matching (in read mappers, for example), and there have been some proofs of principle for accelerating Smith/Waterman especially with scoring systems set for close matches; Travis’s code is a full-fledged implementation in production code for remote homology. Travis is still working on it and writing it up. Meanwhile, you can try it out. If you format a DNA database with the new makehmmerdb command, and then use nhmmer --tformat hmmerfm to search the binary FM-index database, you’ll use the new acceleration.
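For readers who have not met an FM-index before, here is a toy sketch of the underlying idea: counting exact occurrences of a pattern by backward search over a Burrows-Wheeler transform. This is not the nhmmer implementation, which targets remote homology rather than exact matching and is far more compact. The function names `build_fm_index` and `count_matches` are made up for illustration, and the sketch assumes the text does not contain the `$` sentinel.

```python
def build_fm_index(text):
    """Build a naive FM-index (C table and occurrence table) for a short text."""
    text += "$"  # sentinel, assumed absent from the input and smaller than any residue
    # Naive suffix array: fine for a toy example, far too slow for real databases.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text that sort strictly before c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_matches(index, pattern):
    """Count exact occurrences of pattern using backward search."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of suffix array rows matching the suffix so far
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("acgtacgtac")
print(count_matches(index, "acg"))
```

Each pattern character costs a constant number of table lookups regardless of database size, which is what makes the structure attractive as a fast first-pass filter.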

Another significant upgrade is the inclusion of the hmmlogo program, which is essentially a commandline interface for producing the data underlying the Skylign profile logo server (skylign.org).

Also, eight, count ’em eight bugs have been fixed. Of the ones we count, anyway.

Congratulations again to Travis Wheeler, who continues as 3.1’s build master, even though he is now afar in his new mountain lair faculty position at the University of Montana, as the HMMER dev team continues to scatter and flee from Virginia.

The horrible grinding noise you hear is the HMMER4 development code branch. Do not be alarmed. All is well. It will be ready… reasonably soon.

Detailed release notes for 3.1b2 are below the fold.

HMMER 3.1b2 release notes

http://hmmer.org/
TJW, Sun Feb 22 07:59:45 2015
________________________________________________________________

3.1b2 includes the following large changes:

New heuristic for accelerating nhmmer roughly 10-fold.

We have developed a new algorithm that accelerates DNA search in
nhmmer. The acceleration can be tuned, such that greater speed will
tend to decrease sensitivity. The default settings yield roughly
10-fold acceleration while retaining nearly complete sensitivity
among hits with E-value < 1e-3 (with a modest loss in sensitivity
among marginal hits with E > 1e-3).

This algorithm requires that the sequence database first be
preprocessed into a binary file format. The new tool makehmmerdb
performs this task.
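As a rough sketch of that two-step workflow, the helper below only assembles the two command lines; it does not run them. The file names and the function name `fm_search_commands` are placeholders, and the argument order (sequence file then binary file for makehmmerdb, query then target for nhmmer) is our reading of the tools' usage, so check each program's -h output before relying on it.

```python
def fm_search_commands(fasta_db, fm_db, query_hmm):
    """Return (build, search) argv lists for an FM-index accelerated nhmmer run."""
    # Step 1: preprocess the DNA sequence database into the binary format.
    build = ["makehmmerdb", fasta_db, fm_db]
    # Step 2: search the binary database, telling nhmmer which target format to expect.
    search = ["nhmmer", "--tformat", "hmmerfm", query_hmm, fm_db]
    return build, search


build, search = fm_search_commands("genome.fa", "genome.fmdb", "query.hmm")
print(" ".join(build))
print(" ".join(search))
```

The lists can be passed to `subprocess.run` as-is once HMMER 3.1b2 is on your PATH.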

New method in hmmbuild for deciding if a sequence is a fragment.

If hmmbuild determines that a sequence is a fragment, all leading and
trailing gap symbols (all gaps before the first residue and after the
last residue) are treated as missing data symbols, and thus do not
count as observed gaps.

In H3.0 and H3.1b1, a sequence was called a fragment if its length was
less than a specified fraction of the alignment length. In the case of
alignments with many sequences, this often resulted in all sequences
being labeled as fragments, which could lead to unexpected terminal
match states when a small fraction of sequences contained a long
terminal extension. Now, a sequence is labeled a fragment if its range
in the alignment (the number of alignment columns between the first
and last positions of the sequence) is not greater than a specified
fraction of the full alignment length. This should improve HMMER’s
ability to model alignments with ragged ends.
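The difference between the two rules can be sketched as follows. This is an illustration, not hmmbuild's code: the function names are hypothetical, the 0.5 threshold is just an example value for the "specified fraction", the span is counted inclusively, and `~` is assumed as the missing-data symbol.

```python
GAP_CHARS = "-."  # characters treated as gaps in this sketch (assumption)


def residue_columns(aligned):
    """Indices of alignment columns where this sequence has a residue."""
    return [i for i, c in enumerate(aligned) if c not in GAP_CHARS]


def is_fragment_old(aligned, thresh=0.5):
    """H3.0 / H3.1b1 rule: residue count is less than thresh * alignment length."""
    return len(residue_columns(aligned)) < thresh * len(aligned)


def is_fragment_new(aligned, thresh=0.5):
    """3.1b2 rule: span from first to last residue is not greater than thresh * alignment length."""
    cols = residue_columns(aligned)
    span = cols[-1] - cols[0] + 1 if cols else 0
    return span <= thresh * len(aligned)


def mask_terminal_gaps(aligned, missing="~"):
    """Replace leading and trailing gaps of a fragment with missing-data symbols."""
    cols = residue_columns(aligned)
    if not cols:
        return missing * len(aligned)
    first, last = cols[0], cols[-1]
    return missing * first + aligned[first:last + 1] + missing * (len(aligned) - last - 1)


# A sequence that spans the whole alignment but has long internal gaps.
spanning = "AC" + "-" * 8 + "GT" + "-" * 8 + "AC"
# A sequence that covers only a short stretch of the alignment.
partial = "-" * 4 + "ACGTAC" + "-" * 12

print(is_fragment_old(spanning), is_fragment_new(spanning))
print(is_fragment_new(partial), mask_terminal_gaps(partial))
```

Under the old rule the spanning sequence is called a fragment simply because the alignment is gappy; under the new rule only the partial sequence is, and only its terminal gaps are masked.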

Other changes include:

-:- The DNA search tool, nhmmer, depends on a value MAXL, which hmmbuild
computes as an assertion of the maximum length at which HMMER
expects to see an instance of the model. This value could previously
become excessively long when building a model from an alignment with
many long insertions. The MAXL value computed by hmmbuild for DNA
alignments is now limited to 20*M, where M is the # of match states.

-:- A new tool, called hmmlogo, that computes letter height and indel
parameters that can be used to produce a profile HMM logo. This tool
can be thought of as a command-line interface for the data underlying
the Skylign logo server (skylign.org).
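The MAXL limit described above amounts to a simple cap. A minimal sketch, with a hypothetical function name and made-up numbers; only the 20*M limit comes from the notes:

```python
def capped_maxl(raw_maxl, match_states, factor=20):
    """Limit MAXL for DNA models to factor * M, where M is the number of match states."""
    return min(raw_maxl, factor * match_states)


# A 100-state model whose uncapped estimate ballooned because of long insertions.
print(capped_maxl(5000, 100))
```

Estimates already below the limit pass through unchanged.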

Bugfixes:

-:- #h100 hmmalign would segfault on a zero length input sequence.

-:- #h101 hmmsearch would segfault when searching a DNA HMM against a
protein db (on Linux only).

-:- #h102 Marginal hits late in a target sequence database were subject
to being filtered in an nhmmer search. This was due to a score
filter that (a) was intended to accelerate search, but had
essentially no impact on speed, and (b) was an overly
aggressive filter. Removed the filter.

-:- #h103 Error printing very small E-values. Closely related to #h98,
but occurring in the main thread (#h98 fixed the same problem
in worker threads).

-:- #h104 HMMER would not compile on OpenBSD, because netinet/in.h was
not included. This header file is included via arpa/inet.h
on most other systems, but not on OpenBSD.

-:- #h105 Errors encountered while running ‘make clean’ and ‘make distclean’
in binary builds. This was the result of the Makefile trying to
remove the userguide folder and LICENSE.txt file, which are
already removed in the release process. The Makefile now accounts
for this possibility.

-:- #h106 H3 failed to read some old H2 HMM files. This happened in the
cases that (1) there was an empty DESC field in the file, or (2)
the model was not normalized. Both cases have been resolved.

-:- #h107 hmmsim only worked for amino acid models. It now also works
for nucleotide models.

6 thoughts on “HMMER 3.1 beta test 2 released”

  1. Thanks for a terrific tool (have been using it for many years directly and indirectly – as part of other programs).

    Today was the first time I tried to use hmmbuild (from hmmer-3.1b2-linux-intel-x86_64) and discovered that it fails to build HMMs from alignments that have no gap characters “-”, so I added a “-” at the same location in each sequence of the MSA and then it worked fine.

    Just posting that here in case someone else has the same problem and gets this error:
    Alignment input open failed.
    Segmentation fault (core dumped)

    • I don’t know what you’re referring to here. HMMER doesn’t require gap characters; it will build a model from an ungapped sequence alignment just fine. It should never core dump on any input. Please email me a reproducible example, and I’ll look into it.

  2. I have a similar issue. Try this set of attC sites.
    >attC_1 all_bases
    gccgaacccggcgctgcacctgacaccgcccgctgacttgggacgcaccgctcatccggtgagggcggtgca
    ggtgagctctttgttcggcacacagagg
    > attC_2 all_bases
    gccgaacccggcgctgcacctgacaccgccggctgacttgtgacgcatcgctcattcgatgatggcggtgca
    ggtgagctctttgttcggcaacagaggg
    > attC_3 all_bases
    gccgaacccggcgctgcacctgacaccgcccgctgacttgtgacgcaccgctcattcgatggtggcggtgca
    ggtgagctgattgttcggcacacagagg

    • That could be an A2M file, an aligned FASTA file, or an unaligned FASTA file, and the format autodetector is getting confused. We’ve already fixed this bug in dev code, and the fix will appear soon. To work around the problem in the meantime, specify that you want this file parsed as an aligned fasta file, either using the “--informat” option or an “.afa” suffix:

      % hmmbuild --informat afa your.hmm your.afa

  3. I notice a difference between the 3.0 and the 3.1b2 file formats.

    When I use the 3.1b2 hmmpress to press 3.0 HMMs, things work fine.
    When I use the 3.1b2 hmmpress to press 3.1b2 HMMs, things work fine.

    But when I concatenate the two HMM files, and try to press the resulting file, I get

    “Working…
    Error: bad file format in HMM file old_new.HMM”

    Is there a way for me to mix 3.0 and 3.1b2 HMMs in one and the same file pressable using 3.1b2 (or to convert 3.0 HMMs to 3.1b2)? I have a truckload of 3.0 HMMs, and recreating them from the original FASTA files would take a very long time (in the sense of manual work, file juggling etc.).
