I’ve been thinking about how we do “bioinformatics” in experimental biology, and I had the opportunity to talk about it recently. The following is the transcript of a keynote address I gave at Janelia’s meeting on High Throughput Sequencing for Neuroscience last weekend.
In genetics, cryptic genetic variation means that a genome can contain mutations whose phenotypic effects are invisible because they are suppressed or buffered, but under rare conditions they become visible and subject to selection pressure.
In software code, engineers sometimes also face the nightmare of a bug in one routine that has no visible effect because of a compensatory bug elsewhere. You fix the other routine, and suddenly the first routine starts failing for an apparently unrelated reason. Epistasis sucks.
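To make the pattern concrete, here is a hypothetical toy example in Python. It is not the code described in this post, and all function names are invented for illustration. `find_index_buggy` returns a 1-based index when a 0-based one is intended. `lookup_next` compensates, without meaning to, by indexing with that result directly instead of adding 1. Together they give the right answer. Fixing only the first routine makes the second one start failing.

```python
# Hypothetical toy example of two bugs that mask each other.

def find_index_buggy(items, target):
    """Intended to return the 0-based index of target, or -1."""
    for i, item in enumerate(items):
        if item == target:
            return i + 1  # BUG: returns a 1-based index; should be `return i`
    return -1

def find_index_fixed(items, target):
    """Correctly returns the 0-based index of target, or -1."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def lookup_next(items, target, find_index):
    """Return the element that follows target in items."""
    idx = find_index(items, target)
    return items[idx]  # compensating BUG: should be items[idx + 1]

items = ["a", "b", "c"]
print(lookup_next(items, "a", find_index_buggy))  # correct by accident
print(lookup_next(items, "a", find_index_fixed))  # the "fix" exposes the hidden bug
```

With the buggy index routine, `lookup_next` returns `"b"` as intended. After the index routine is fixed, it returns `"a"`, and the failure appears in a routine nobody touched.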
I’ve just found an example in our code, and traced the origin of the problem back 41 years to the algorithm’s description in a 1973 applied mathematics paper. The algorithm, for sampling from a Gaussian distribution, is used worldwide because it’s implemented in the venerable RANLIB software library, which is still used in many numerical codebases, including GNU Octave. It looks to me as if the only reason the code has been working is that a compensatory “mutation” has been selected for in everyone else’s code except mine.
In July 2015, our laboratory will move to Harvard University. I’ve accepted an offer to join Harvard’s faculty in two departments: Molecular & Cellular Biology (in the Faculty of Arts and Sciences), and Applied Mathematics (in the School of Engineering and Applied Sciences). The laboratory will be on the first floor of the venerable, historic Biolabs building on the Cambridge campus. Much like we optimized our Janelia location to be the closest lab to the pub, now we will be strategically located closest to the food truck on Divinity Avenue. I’m told that our future space was once occupied by Eric Lander, so who knows what we’re going to find during construction. Charred effigies of Craig Venter, I trust.
Belated congratulations to Rob Finn, who has moved to EMBL-EBI to lead the Protein Families team. The HMMER web services pilot project that HHMI has funded at Janelia, under Rob’s leadership, is now in transition to EBI, and Rob and I will write more about this in the future.
Slightly-less-belated congratulations to Travis Wheeler, who has left the lab to start his new job as an assistant professor at the University of Montana this fall. The nhmmer software project and his lead role in the Dfam mobile element database move with him. We mourn the death of our collaborator and friend Jerzy Jurka this past July. Together with Arian Smit and others, we hope soon to have more to say about the legacy of Jerzy’s seminal Repbase database.
Timely (perhaps even premature) congratulations to Eric Nawrocki, who is in the process of accepting a position at the National Center for Biotechnology Information, where he seems likely to start in January. Eric will remain lead developer of the Infernal RNA homology search codebase. We’ll probably have more to say about that too, and what it means for Infernal development and the Rfam structural RNA database.
It’s getting a little underpopulated here in our Janelia monastery cells. But there’s good reason for that, all part of the master plan… another exodus story that I’ll post about soon.
The Wellcome Trust Sanger Institute is hosting a meeting on computational RNA biology on 11-13 November 2014, co-organized by Alex Bateman (EMBL-EBI), Anton Enright (EMBL-EBI), Mihaela Zavolan (University of Basel), and myself. I’m especially keen on this meeting because we’ve invited several people who have been active in developing computational methods for incorporating new chemical probing data, such as SHAPE, into RNA structure prediction, something I spent quite some time thinking about for a recent review I wrote for the Annual Review of Biophysics. The registration deadline is 26 September. See the registration page at the Wellcome Trust Sanger Institute for more information, including the list of invited speakers.
Janelia has a scientific director-level position open in our computation and technology group:
“The Janelia Farm Research Campus of the Howard Hughes Medical Institute seeks an exceptional individual to lead a world-class technology effort in advanced computation, data analysis, and instrumentation design and fabrication. The Director of Janelia’s Advanced Computation and Technology group is responsible for leading an array of data-driven technology teams, ranging from data acquisition to data analysis. In collaboration with Janelia research scientists, this person must drive a strategic vision and set priorities for focused investment of effort in sophisticated engineering and computational technology, enabling Janelia’s two scientific goals: (1) understanding the basic mechanisms used by nervous systems to store and process information and (2) developing new methods for image acquisition and analysis. This individual may not necessarily be a neuroscientist or even a biologist themselves; rather, the ideal individual will have extensive technical and engineering expertise in some area of large-scale data analysis, while being excited about applying this expertise in fundamental neuroscience research.”
We’re running ads in all the usual places, such as this one in Nature, where you’ll find more information about the position. Please spread the word. I’ll be happy to answer questions about the position informally, for anyone who may be interested, or who may know someone who’d be interested.
The HMMER dev team is happy to announce an upgrade release of HMMER3, release 3.1. A beta test version of the code is publicly available as a downloadable tarball, or from hmmer.org, where you’ll also find precompiled binary releases for Mac and Linux.
HMMER 3.1 includes nhmmer and nhmmscan, programs for DNA/DNA homology searches with profile HMMs. nhmmer has already been incorporated in RepeatMasker, in collaboration with Arian Smit and colleagues, and is the software underlying the Dfam database of profiles for mobile DNA elements.
HMMER 3.1 database searches are about twice as fast as HMMER 3.0 was, fulfilling old campaign promises.
HMMER 3.1 includes hmmpgmd, the parallel search daemon underlying HMMER Web Services at hmmer.org.
This code is expected to be stable, but we’re releasing it as a beta test just to be careful. After some time in the wild, we’ll make a release candidate, and if you folks haven’t chewed any of that up too badly, we’ll make the final 3.1 release reasonably soon.
Congratulations to Travis Wheeler, 3.1’s build master — note the TJW on the notes below the fold, not an SRE — the first HMMER release managed by someone besides me (Sean).
Meanwhile… slowly, slowly, HMMER4 takes shape, as the gnomes of HMMER Labs toil sleeplessly on their latest monstrosities. The long awaited return of glocal alignment has been delayed into HMMER4, because the changes required turned out to be, um, quite extensive.
Detailed release notes for 3.1b1 are below the fold.