Imagine you’re a legal US resident, with a legal position in the US. You’re away visiting your family in another country, and when you try to fly back home to the US, the US government won’t let you back in. You could be a PhD student or a postdoc in a research lab here, living here for years, all your stuff and your friends are here – doesn’t matter. Tough for you. Go home to your country, US border officials tell you.
Bullshit, right? Right. Now go sign this.
These are not American values. It has to stop.
I’ve been thinking about how we do “bioinformatics” in experimental biology, and I had the opportunity to talk about it recently. The following is the transcript of a keynote address I gave at Janelia’s meeting on High Throughput Sequencing for Neuroscience last weekend.
So I read in the newspaper this week that the ENCODE project has disproven the idea of junk DNA. I sure wish I’d gotten the memo, because this week a collaboration of labs led by myself, Arian Smit, and Jerzy Jurka just released a new data resource that annotates nearly 50% of the human genome as transposable element-derived, and transposon-derived repetitive sequence is the poster child for what we colloquially call “junk DNA”.
The newspapers went on to say that ENCODE has revolutionized our understanding of noncoding DNA by showing that far from being junk, noncoding DNA contains lots of genetic regulatory switches. Well, that’s also odd, because another part of my lab is (like a lot of other labs in biology these days) studying the regulation of genes in a model animal’s brain (the fruit fly Drosophila). We and everyone else in biology have known for fifty years that genes are controlled by regulatory elements in noncoding DNA. (Well, I’ve only known for thirty years, not fifty, I admit — only since Mrs. Dell’Antonio kicked me out of high school biology class and gave me a molecular genetics textbook to read by myself.)
Now, with all respect to my journalist friends, I’ve learned not to believe everything I read in the newspapers. I figured I’d better read the actual ENCODE papers. This is going to take a while. I’ve only read the main Nature paper carefully so far (there are 30+ of them, apparently, across multiple journals). But it’s already clear that at least the main ENCODE paper doesn’t say anything like what the newspapers say.
The ENCODE project and our existing knowledge of genomes are both vastly more substantial than the discussion the ENCODE authors are provoking in the press right now.
Science is running a poll titled “The Well-Behaved Scientist” this week that asks “how should we promote publication of data that can be replicated and reproduced?” Of the ideas on their list — more funding from funding agencies, more rewards from institutions — conspicuous in its absence is the rather fundamental idea that the purpose of scientific journals, including Science, is to publish reproducible research.
The National Academy of Sciences has just released the report Sequence-Based Classification of Select Agents: A Brighter Line. A committee of 13 of us, chaired by Jim LeDuc (Director, Galveston National Laboratory, University of Texas Medical Branch) and cat-herded by India Hook-Barnard (NAS), has been working on this report for the past year or so. It’s good to see it done. There is an NAS press release, and Nature and Science, among others, have already picked up the story.
The NIH National Human Genome Research Institute is going through a process of making plans for the next five years of research in genomics, including computational genomics. In the past couple of months I’ve been at two planning meetings – the Cloud Computing workshop (31 March – 1 April) and most recently the NHGRI Informatics and Analysis Planning Meeting (21-22 April). Goncalo Abecasis and I got the job of trying to summarize the consensus of the Informatics and Analysis meeting. I’m not sure how good I am at identifying consensus, but I just sent off four pages of notes to Vivien Bonazzi, one of NHGRI’s informatics program officers, describing some of my personal views of the future, strongly colored by the discussions at the planning meetings. I thought I’d share the same comments here. Transparency in government and all that. Just imagine all the potential conflicts of interest here; good thing I’m paid by a dead billionaire, not so much by federal tax dollars.
Anyone with an interest in how the sausage is made at NIH might want to peek under the hood, below.
There are two ways of spreading light: to be the candle or the mirror that reflects it.
— Edith Wharton, Vesalius in Zante
On April 14, the US Patent and Trademark Office awarded us a trademark on HMMER. This is a good moment to explain how we plan to deal with intellectual property.
HMMER is scientific software, and its methods are described in journal publications. That means that it must be made available in a form that enables any scientist to understand, reproduce, and extend it, like any other result of a scientific paper. For software, this is essentially the same as what people mean by “open source”. Our intent is to make HMMER widely and freely available to the entire scientific community as open source code. At the same time, we have to recognize that HMMER is a large, growing, and increasingly valuable codebase, not just a one-off result, so we’re taking steps to make sure we can sustain it as a long term, coherent open source project.