A new bioRxiv preprint from Rafael Tavares, Anna Marie Pyle, and Srinivas Somarowthu challenges conclusions of our 2017 Nature Methods paper describing R-scape, a method for detecting support for conserved RNA secondary structure in sequence alignments by statistical analysis of base pair covariations. In that paper, among other things, we showed that the evidence presented by Somarowthu et al. (2015) in support of a putative conserved structure for the HOTAIR lncRNA was not statistically significant, using the same alignment that they had analyzed. The new Tavares paper argues that when R-scape’s default statistic is replaced with a different one called RAFS, statistically significant evidence for conserved structure is detected in their HOTAIR alignment and others.
Tavares’ conclusions depend on an assumption that the RAFS statistic is an appropriate measure of RNA base pair covariation, but RAFS was not designed to measure covariation alone. RAFS gives positive signals for common patterns of primary sequence conservation even in the absence of any covariation. The problem is severe: Tavares’ analysis reports “significantly covarying base pairs” in 100% identical sequence alignments, which have no variation and therefore no covariation. The base pairs that Tavares et al. identify as significantly covarying actually arise from primary sequence conservation patterns. Their analysis still reports similar numbers of “significant covarying” base pairs in negative controls in which we permute residues in independent alignment columns to destroy covariation. There remains no significant covariation support for evolutionarily conserved RNA structure in the HOTAIR lncRNA or other lncRNA structures and alignments we have analyzed.
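To make the negative control concrete, here is a minimal Python sketch of the idea, not the code we actually ran. The function names (`shuffle_columns`, `mutual_information`) and the toy alignment are hypothetical, and plain mutual information stands in for a covariation measure; it is neither R-scape's default statistic nor RAFS. Shuffling residues within each column independently keeps every column's composition (the primary sequence conservation pattern) exactly as it was, while scrambling which residues co-occur in the same sequence.

```python
import math
import random
from collections import Counter


def shuffle_columns(alignment, seed=0):
    """Permute residues within each alignment column independently.

    Each column keeps its residue composition, but pairings between
    columns are randomized. `alignment` is a list of equal-length strings.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    An illustrative stand-in for a covariation measure only.
    """
    n = len(alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    left = Counter(seq[i] for seq in alignment)
    right = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    # Toy alignment: columns 0 and 3 covary (G-C in half the sequences, A-U in the rest).
    covarying = ["GAAC"] * 4 + ["AAAU"] * 4
    # Toy alignment with 100% identity: no variation, so no covariation.
    identical = ["GAAC"] * 8

    print("covarying:", mutual_information(covarying, 0, 3))
    print("identical:", mutual_information(identical, 0, 3))
    print("shuffled: ", mutual_information(shuffle_columns(covarying, seed=1), 0, 3))
```

On an alignment of identical sequences the mutual information is zero for every pair of columns, which is what a measure of covariation alone should report. On a tiny alignment like this one, a single shuffle can retain some pairing signal by chance, so a real control would look at many shuffles of a realistically sized alignment.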
We have posted a PDF of a full response to the Tavares et al. preprint on the lab’s web site.
HMMER 3.2.1 is now available, fixing some small issues, mostly portability issues having to do with compilation on PowerPC machines and ancient x86 machines.
Thanks to the GCC Compile Farm folks for access to ppc64be and i686 machines that I’m using for testing now!
The glorious master plan was to finish HMMER4 while hoping that HMMER3 stayed stable. Alas, HMMER4 development has been even slower than expected, and bugs and bitrot have accumulated on HMMER3. Here’s a new HMMER 3.2 release to tide us all over. I’m managing HMMER releases again, with Travis Wheeler having moved a while ago to a faculty position at U. Montana.
You can get the HMMER 3.2.1 source tarball from here.
Astronomy began when the Babylonians mapped the heavens. Our descendants will certainly not say that biology began with today’s genome projects, but they may well recognize that a great acceleration in the accumulation of biological knowledge began in our era.
Graeme Mitchison wrote those opening lines of our book Biological Sequence Analysis in Richard Durbin’s parents’ house in London. We four coauthors had borrowed the house for a month to write together, knowing that we had to get Richard out of the Sanger Centre or no progress would be made. The living room looked like a spy ring’s safe house, drapes drawn and full of improvised desks, computers, a printer, and papers. We paired off in warring alliances to write, to cook, to argue, and to take long walks on Hampstead Heath to cool down. At one point over a late dinner and wine, Anders Krogh proposed that one could make a hidden Markov model to recognize each of our writing styles. Richard proposed that mine could be recognized trivially by a high emission probability of the word “simple”. I recall snapping something back. I was struggling to draft our introduction and feeling defensive. At some point Graeme took it from me and in a few strokes replaced my clumsy efforts with the chapter that began with the beautiful lines above.
I’m looking for four teaching fellows (TFs) for my course MCB112 Biological Data Analysis in the fall 2018 semester. TFs are typically Harvard G2 or G3 students (second- or third-year PhD students, in Harvard-speak), but can be more senior students or even postdocs. I teach the course in Python and Jupyter Notebook, using numpy and pandas, so experience in these things is a plus. Email me if you’re interested, or if you know someone else at Harvard who might be interested, let them know.
Tim Sackton’s Harvard bioinformatics core, in collaboration with Catherine Dulac’s lab in the MCB Department, is still searching for a bioinformatics scientist to work on single cell RNA-seq analysis in the mouse brain. See the ad in Science for more details!
The Harvard FAS Informatics group, led by Tim Sackton, has an open bioinformatics scientist position, to work with Catherine Dulac’s lab on large-scale single-cell RNA-seq in mouse brain. For more information on the position and how to apply, see the ad on the Harvard jobs site.