HMMER 3.0


Our quest is at an end.

— Monty Python and the Holy Grail

Four years in development, and a year in testing: HMMER3 has reached its first public production release. Do we have time for a beer and a small celebration before we write the manuscripts and move straight on to 3.1 development? No? Thought not.

HMMER3 is available for download as a source code tarball. Over at hmmer.org, there are also links for downloading tarballs including precompiled binaries for Linux/Intel ia32, Linux/Intel x86_64, and Mac OS X Intel platforms.

The release notes for 3.0 follow:



HMMER 3.0 release notes

http://hmmer.org/
SRE, Sun Mar 28 09:12:01 2010

This is the first release of HMMER 3.0.

H3 has been in testing since January 2009. It is now ready for production use. This means we’ll actually accept the blame now if it doesn’t work. It has been stable for many months. It is already widely deployed in its beta test versions for Pfam, InterPro, and other protein databases.

We are already working on 3.1. 3.1 is expected to bring in several new features that did not make it into 3.0, including DNA/DNA searches, and a wider set of alignment formats beyond Stockholm and aligned FASTA. But before that happens, we’re going to take a sort of breather, and finish the manuscripts that describe how H3 works.

There are only small differences in 3.0 relative to the previous 3.0rc2 release:

  • The User’s Guide now documents the UCSC SAM profile software’s A2M format, which H3 can export but not read, at present. (A2M is not aligned FASTA.)
  • Issues detected by the cppcheck static analyzer have been fixed.

21 thoughts on “HMMER 3.0”

  1. Congrats on the release! You and the team are making a lot of people more productive with the new implementation (though having significantly less time waiting for results to come back is cutting down on the blogging).

    Just wanted to say thanks to you and the SELAB folks for the efforts, for releasing early, for the open source nature of the code, and for your willingness to communicate to the masses what you are doing and why.


  2. Hoooraaaaay! Amazing. The single most important and fundamental tool in all of computational biology, created by the most brilliant and influential mind (and team!) in the field. Eddy-Method forward scores are today what Karlin-Altschul statistics were to the last generation of algorithms. I’m preempting any modesty on your part – it is true! (Also thank you for the Mac OSX binaries…)

    Great job on the release, the world is significantly better off because of you and your work. Unimaginably outstanding. Please don’t defect to neuroscience!


  3. Go for the beer and the celebration, your work has been impressive. And thanks for the pre-compiled binaries, it makes my life SO much easier!


  4. Boohoo! Let me be the first one to whine that the nucleotide functionality isn’t there yet – and I REALLY need it! I don’t think there should be time for beer yet. When 3.1 comes out, I’d be happy to deliver a case. Or a keg. Of your choice.

    Seriously. Wonderful job, and a million congratulations. But when will 3.1 be available? Is there a beta for that yet that we can test…?


  5. Great work. Congratulations!
    Here is a tip for anyone who wants to write a fast parser.

    —————————————————————
    One can modify line 1181 of p7_tophits.c to read

    fprintf(ofp, "%-*s %-*s %5d %-*s %-*s %5d %9.2g %6.1f %5.1f %3d %3d %9.2g %9.2g %6.1f %5.1f %5d %5d %5ld %5ld %5d %5d %4.2f %s\t%s\t%s\t%s\t%s\t%s\n",

    and then replace line 1204 with

    (th->hit[h]->desc ? th->hit[h]->desc : "-"),
    th->hit[h]->dcl[d].ad->model,
    th->hit[h]->dcl[d].ad->mline,
    th->hit[h]->dcl[d].ad->aseq,
    (th->hit[h]->dcl[d].ad->csline ? th->hit[h]->dcl[d].ad->csline : "-"),
    (th->hit[h]->dcl[d].ad->rfline ? th->hit[h]->dcl[d].ad->rfline : "-")
    );

    Then, for example, one can run

    hmmsearch --domtblout /dev/stdout -o /dev/null file.hmm file.fasta

    This writes a modified domain table to standard output, with the alignment information appended to the end of each line.
    Note that the alignment output in the current code is probably not final, and the authors want to work on it more.


  6. As a feature request, could you make this work:

    hmmscan -o /dev/null --domtblout - Pfam-A.hmm - out.txt

    The input can already be taken from standard input using this notation, and I can keep the regular output from going to standard output, but can I make the “domain table” go to standard output as suggested above?

    Of course, the idea is to be able to actually pipe the “domain table” into some other program, without having to write an intermediate temporary file.

    -Alex


    Let me try again; the redirection symbols got lost because they look like HTML:

    hmmscan -o /dev/null --domtblout - Pfam-A.hmm - < in.fa > out.txt


  8. Any more thoughts on starting up a forum/mailing list?

    I’ve got a bucket full of questions regarding Infernal but I don’t necessarily want to address them directly to Sean, as I believe that other people might have had similar issues before…


  9. Congratulations on the final release. I have been a user of the previous betas for the last 6 months. Thank you so much for the great effort and support.

    I second Alex Ochoa’s suggestions; they are much needed.

    Tab-separated output instead of space-separated output might be useful as well, so that users could open the output directly in a spreadsheet instead of going through parsers.


  10. Hi, this is great work and very helpful to me. But I have a question: I built an HMM file and read it, but I have no idea how HMMER 3.0 calculates the HMM probabilities (such as match emission, insert emission, and transition probabilities). Could you show me how it works? Your answer would be appreciated.

    Best regards!


  11. Thanks Sean, this is a truly valuable contribution to the field and we are all grateful to all the developers for it. On the subject of feature requests, I have two:

    1) Something similar to what Alex and Hiroshi want. The option to produce only machine-readable tab-delimited output *including alignments* with one domain per line. Trivial parsing and minimal I/O.
    2) It seems the multi-threading is done at the internal level (I am just guessing), and it doesn’t completely fill a many-processor machine. So it would be great to have a sequence-level threading option, as this would fill all the available CPUs. In the meantime I have written a wrapper that does this; just specify the number of threads and use it as you would hmmscan:
    http://www.cs.bris.ac.uk/~gough/software/hmmscan.pl

    USE AT YOUR OWN RISK! -report any issues-


  12. The hmmsearch --tblout option provides output in the following format. I could not find the meaning of each column heading in the HMMER User’s Guide. Kindly help.

    Columns, in order:
    target name, accession, query name, accession,
    Full Sequence: E-value, score, bias
    Best 1 domain: E-value, score, bias
    exp, reg, clu, ov, env, dom, rep, inc


  13. If they’re not self-explanatory for you, please wait until we have time to document this in the user guide for everyone at the same time. Answering a question like this individually isn’t a good use of our limited time. Apologies.


  14. I got a jackhmmer error: “fatal exception (source file ../../src/p7_alidisplay.c, line 429): backconverted subseq didn’t end at expected length”.
    The error was reported with a query sequence from the UniProt database, while the subject was a frame translated from an unassembled 454 read using transeq with the default universal genetic code.
    I should have identified the ORFs in the reads first and carried out the search on the translated ORFs, but I encountered the error in a trial run.
    I’m using the HMMER 3.0 (March 2010) pre-compiled package for x86_64 Linux, over an ssh session to a remote server running Linux version 2.6.31-23-server (buildd@crested) (gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)).


  15. Please confirm that you’re using HMMER 3.0, not a beta release. (Do “jackhmmer -h” to see the version number.) HMMER 3.0b2 and earlier had a bug in how jackhmmer handles “*” characters in translated DNA sequence. That bug (which gave exactly the message you report) was fixed in July 2009, and the fix is in the H3.0 code. I’m suspecting that you have a 3.0b2 installed in your ${PATH}.

    Meanwhile, a workaround is to not put * characters in protein sequences. You probably don’t want to do this anyway with HMMER — it treats one sequence as one sequence, it does not break a sequence containing *’s into multiple peptide sequences.


  16. Thank you, Dr. Sean, for the fast reply. I loved your book and the indispensable HMMER package; great insight.

    The MD5 sum of the gzipped tarball I used is 96A069DC0D853966456DA5228233F89D, which matches that of the Linux x86_64 HMMER 3.0 binaries downloadable from the janelia.org FTP server. Also, instead of an export command, I have added the following to my .profile to bypass any HMMER package pre-installed on the server:
    if [ -d "$HOME/bin/hmmer-3.0/binaries" ] ; then
      PATH="$HOME/bin/hmmer-3.0/binaries:$PATH"
    fi
    So jackhmmer -h gives:
    # jackhmmer :: iteratively search a protein sequence against a protein database
    # HMMER 3.0 (March 2010); http://hmmer.org/
    # Copyright (C) 2010 Howard Hughes Medical Institute.
    # Freely distributed under the GNU General Public License (GPLv3).
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    To my surprise, after checking the offending sequences (the query, >A9HXI5_BORPD (PF01068) from UniProt, and the subject in my database of translated 454 fragments), I found that the error happened with no stop involved in the translated fragment sequence.
    Would providing the fragment sequence and the last HMM checkpoint file be enough to reproduce the error?

