Infernal 1.0


Infernal 1.0 — the first production release of our software for RNA sequence/structure homology search and alignment — has been released. Source code and documentation are available over at infernal.janelia.org.

Infernal development is now led by Eric Nawrocki in the lab.

Infernal’s actually been used by the Rfam database since about 2002, in pre-1.0 versions. This is the first version that we think is really ready for production use, more than just a testbed for algorithms. The 1.0 code went through several release candidates in the past few months. We think we have all the obvious bugs shaken out of it.

The problem Infernal addresses is RNA homology search. You have an RNA sequence or a multiple alignment of related RNA sequences, and you want to search the sequence databases for homologs. Sequence similarity search (HMMER) might suffice, but you need about 60-70% sequence identity to detect significant RNA/DNA sequence alignments, and homologous sequences can erode below that detection level in as little as a few tens of millions of years.

If you were looking for a protein, you’d search by comparing amino acid sequences, not by comparing DNA/RNA sequences. Partly because of the larger 20-letter alphabet, you get more statistical power from amino acid comparisons: sequence tools can see down to about 20-30% pairwise amino acid identity, which will often be preserved across a billion years or more.
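To make the identity thresholds quoted above concrete, here is a minimal Python sketch of one common way to compute pairwise percent identity from two aligned sequences. The function name `percent_identity`, the toy sequences, and the choice to skip gapped columns are assumptions made for illustration; there are several definitions of percent identity in use, and this is not code from Infernal or HMMER.

```python
GAP_CHARS = {"-", "."}  # assumed gap symbols in the aligned input


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned RNA fragments, invented for this example.
    print(round(percent_identity("GCAU-GC", "GCUUAGC"), 1))
```

With this definition, a pair of RNA sequences scoring well under 60% would fall into the range where, per the discussion above, sequence comparison alone starts to struggle.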

For a functional RNA, you obviously can’t resort to amino acid sequence comparisons. What you can do, though, is to use conserved RNA secondary structure as additional signal — at least, if your RNA of interest has a known conserved RNA secondary structure.

But how do you combine sequence conservation and RNA secondary structure conservation in a single consistent scoring system for homology search? That turns out to be a solved problem in other fields: there’s a class of “formal grammars” called stochastic context-free grammars that solves it beautifully, provided we only capture classical “nested” secondary structure, and give up on capturing any information from RNA pseudoknots. That’s acceptable; pseudoknotted base pairs are important, but always outnumbered (usually greatly outnumbered) by base pairs in standard stem-loops. Infernal implements models called profile stochastic context-free grammars. It’s essentially like HMMER and profile HMMs, extended to RNA secondary structure.
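The distinction between "nested" structure and pseudoknots can be sketched in a few lines of Python. The example below assumes the structure is written in the widely used dot-bracket notation, where matching parentheses mark base pairs and dots mark unpaired positions. The helper names `parse_dot_bracket` and `has_crossing` are hypothetical and are not part of Infernal; this only illustrates the nesting constraint, not a profile SCFG.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j), 0-based, of a nested dot-bracket string."""
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def has_crossing(pairs: list[tuple[int, int]]) -> bool:
    """True if any two pairs interleave (i < k < j < l), i.e. form a pseudoknot."""
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return True
    return False


if __name__ == "__main__":
    stem_loop = parse_dot_bracket("((..))")
    print(stem_loop, has_crossing(stem_loop))
    # A pair set that a single bracket type cannot express: the pairs interleave.
    print(has_crossing([(0, 4), (2, 6)]))
```

Any pair set produced by `parse_dot_bracket` is nested by construction, which is the kind of structure a context-free grammar can describe. Interleaved pairs like the second example are the pseudoknotted ones that get left out.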

The disadvantage of profile SCFG methods is that they’re computationally intensive. We continue to work hard on accelerating Infernal, but it’s still slow. The only people who can really make routine use of it are people with a lot of computing resources at their disposal — like the Rfam team in Cambridge.

3 thoughts on “Infernal 1.0”

  1. The leap from HMMER2 to HMMER3 (internal smarts + SIMD + Cell / GPU) is a huge one in both speed and sensitivity. Do you foresee such a leap in the Infernal implementation occurring?


  2. Diana Kolbe, a student in the lab, is indeed working on extending some of the techniques I use in HMMER3 to Infernal. The key problem to solve for speed is the SIMD vector parallelization of the 3D dynamic programming algorithms in Infernal, which are just enough different from the 2D DP algorithms in HMMER to make it a research problem, not a no-brainer. But Diana and I are hopeful that she’ll be able to develop a vectorization strategy that works.
