ENCODE says what?

So I read in the newspaper this week that the ENCODE project has disproven the idea of junk DNA. I sure wish I’d gotten the memo, because this week a collaboration of labs led by me, Arian Smit, and Jerzy Jurka just released a new data resource that annotates nearly 50% of the human genome as transposable element-derived, and transposon-derived repetitive sequence is the poster child for what we colloquially call “junk DNA”.

The newspapers went on to say that ENCODE has revolutionized our understanding of noncoding DNA by showing that far from being junk, noncoding DNA contains lots of genetic regulatory switches. Well, that’s also odd, because another part of my lab is (like a lot of other labs in biology these days) studying the regulation of genes in a model animal’s brain (the fruit fly Drosophila). We and everyone else in biology have known for fifty years that genes are controlled by regulatory elements in noncoding DNA. (Well, I’ve only known for thirty years, not fifty, I admit — only since Mrs. Dell’Antonio kicked me out of high school biology class and gave me a molecular genetics textbook to read by myself.)

Now, with all respect to my journalist friends, I’ve learned not to believe everything I read in the newspapers. I figured I’d better read the actual ENCODE papers. This is going to take a while. I’ve only read the main Nature paper carefully so far (there are 30+ of them, apparently, across multiple journals). But it’s already clear that at least the main ENCODE paper doesn’t say anything like what the newspapers say.

The ENCODE project and our existing knowledge of genomes are both vastly more substantial than the discussion the ENCODE authors are provoking in the press right now.

The human genome has a lot of junk DNA

Genome size varies a lot. You might think that apparently more complex organisms like humans would have more DNA than simpler organisms like single-celled amoebae, but that turns out not to be true. Salamanders have 10-fold more DNA than us; lungfish, about 30-fold more.

So maybe we don’t really know how to define or measure “complexity”; maybe we’re just being anthropocentric when we think of ourselves as complex. Who’s to say that amoebae are less complex than humans? Ever looked at an amoeba? (They’re pretty awesome.) Still. The key observation isn’t just that very different creatures have very different genome sizes; it’s that similar species can have very different genome sizes. This fact, surprising at the time, begged for a good explanation. If two species are similar, yet their genomes are 10x different in size, what’s all that extra DNA doing?

This observation about genome sizes (called the “C-value” paradox, for technical reasons) raised the idea that maybe genomes could expand (and shrink) rapidly (on an evolutionary timescale) as a result of some neutral (non-adaptive) processes — that maybe organisms could tolerate DNA that didn’t have a direct functional effect on the organism itself, but was instead being created and maintained by neutral or even parasitic mechanisms of evolution. Somebody (it’s a good bet that T. Ryan Gregory knows who) dubbed this “junk” DNA, and that was probably an unfortunate term, because it’s incited people’s anger from the day it was coined. It’s not polite to tell someone their beautiful house is full of junk. Even if it is.

A key discovery that satisfactorily explained the C-value paradox was the discovery that genomes, especially animal and plant genomes, contain large numbers of transposable (mobile) elements that replicate all by themselves, often at the (usually slight) expense of their host genome. For instance, about 10% of the human genome is composed of about a million copies of a small mobile element called Alu. Another big fraction of the genome is composed of a mobile element called L1. Transposons are related to viruses, and we think that for the most part they are parasitic in nature. They infect a genome, replicating, spreading, and multiplying; eventually they die, mutate, and decay away, leaving their DNA sequences. Sometimes when an Alu replicates and hops into a new place in our genome, it breaks something. Usually (partly because the genome is mostly nonfunctional) a new Alu just hops somewhere else in the junk, and has no appreciable effect on us.
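
A quick back-of-the-envelope check on that 10% figure, as a minimal sketch in Python. The copy number and the typical ~300 bp length of a full Alu are round approximations, not measured values:

```python
# Rough arithmetic only: ~1 million Alu copies at ~300 bp each,
# against a ~3.1 Gb human genome.
alu_copies = 1_000_000          # approximate copy number cited above
alu_length_bp = 300             # typical full-length Alu (approximate)
genome_size_bp = 3_100_000_000  # human genome, ~3.1 Gb

alu_total_bp = alu_copies * alu_length_bp
fraction = alu_total_bp / genome_size_bp
print(f"Alu-derived: {alu_total_bp / 1e6:.0f} Mb, "
      f"about {100 * fraction:.0f}% of the genome")
# -> Alu-derived: 300 Mb, about 10% of the genome
```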

So it turns out that when we look at all these different genome sizes, almost all of the puzzling size variation is explained by genomes having different “loads” of transposable elements. Some creatures, like pufferfish, have only low loads of transposons. Some creatures, like salamanders, lungfish, amoebae, corn, and lilies, are loaded with massive numbers of transposons. As it happens, the human genome is annotated as about 50% transposon-derived sequence — right at that 50/50 borderline where someone can say “the human genome is mostly junk” and someone else can say “the human genome is mostly not junk”.

In 1980, two key papers — by Orgel and Crick, and by Sapienza and Doolittle — nicely laid out the argument that genomes contain “selfish” or “junk” DNA, largely transposon-derived, sometimes quite large amounts of it. These papers are quite beautiful and scholarly. They are careful to say, for example, that it would be surprising if evolution did not sometimes co-opt useful functions from this great amount of extra DNA sequence slopping around. Indeed, we are now finding many interesting examples of transposon-derived stuff being co-opted for organismal function (but these are the exception, not the rule). Without trying to be snide or pedantically academic, I’ll note that the main ENCODE paper cites neither Orgel/Crick nor Sapienza/Doolittle; what this means is, regardless of what we read in the newspapers, ENCODE is not actually trying to interpret their data in light of the current thinking about junk DNA, at least in the actual paper.

Transposon-derived sequences are the poster child for “junk DNA” because we can positively identify transposon-derived sequences by computational analysis, and reconstruct the evolutionary history of transposon invasions of genomes. There’s likely to be other nonfunctional DNA “junk” too, in the DNA that we can’t currently put any annotation at all on, but the key point is that the dead bones of many transposons are something we can affirmatively identify.

Noncoding DNA is part junk, part regulatory, part unknown

It is crucial to understand that “noncoding” DNA is not synonymous with “junk” DNA. The current view of the human genome, which ENCODE has now systematically and comprehensively confirmed and extended, is that it is about 1% protein-coding, in perhaps 20,000 “genes” averaging about 1500 coding bases each (where the concept of a “gene” is amorphous, but useful; we know one when we see one). Genes are turned on and off by regulatory DNA regions, such as promoters and enhancers — as has been worked out over fifty years, starting with how bacterial viruses work. In animals like humans, most people (ok, I) would guess that there are maybe 10-20 regulatory regions per gene, each maybe 100-300 bases long; so, very roughly, maybe on the order of about 1000-6000 bases of noncoding regulatory information per 1500 coding bases in a gene. I’m only giving hand-wavy, back-of-the-envelope notions here because it’s actually quite difficult to pin these numbers down exactly; our current knowledge of regulatory DNA sequences in detail is distressingly incomplete. That’s something ENCODE is trying to help figure out, in systematic fashion, and it’s where a lot of ENCODE’s substantive value is. The point is, we already knew there was likely at least as much regulatory DNA as coding DNA, and probably more; we just don’t have a very satisfying handle on it all yet, which is why we thought we needed an ENCODE project to survey things more comprehensively.
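
To make the hand-waving explicit, here is the same back-of-the-envelope arithmetic as a sketch; every number in it is one of the guesses above, not a measurement:

```python
# Back-of-the-envelope only, using the guessed round numbers above.
genome_bp = 3_100_000_000            # ~3.1 Gb human genome
n_genes = 20_000
coding_bp_per_gene = 1_500

coding_total = n_genes * coding_bp_per_gene
print(f"coding: {coding_total / 1e6:.0f} Mb "
      f"({100 * coding_total / genome_bp:.1f}% of the genome)")

# Guessed ranges: 10-20 regulatory regions per gene, 100-300 bp each.
reg_lo = n_genes * 10 * 100
reg_hi = n_genes * 20 * 300
print(f"regulatory: {reg_lo / 1e6:.0f}-{reg_hi / 1e6:.0f} Mb "
      f"({100 * reg_lo / genome_bp:.1f}-{100 * reg_hi / genome_bp:.1f}%)")

# -> coding: 30 Mb (1.0%); regulatory: 20-120 Mb (0.6-3.9%)
```

Those outputs are where the “something like 1-4%” in the next paragraph comes from.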

So when you read a Mike Eisen saying “those damn ENCODE people, we already knew noncoding DNA was functional”, and a Larry Moran saying “those damn ENCODE people, there is too a lot of junk DNA”, they aren’t contradicting each other. They’re talking about different (sometimes overlapping) fractions of human DNA. About 1% of it is coding. Something like 1-4% is currently expected to be regulatory noncoding DNA given what we know (and our knowledge about regulatory sites is especially incomplete). About 40-50% of it is derived from transposable elements, and thus already affirmatively annotated as “junk” in the colloquial sense that transposons have their own purpose (and their own biochemical functions and replicative mechanisms), like the spam in your email. And there’s some overlap: some mobile-element DNA has been co-opted as coding or regulatory DNA, for example.

Now that still leaves a lot of the genome. What’s all that doing? Transposon-derived sequence decays rapidly, by mutation, so it’s certain that there’s some fraction of transposon-derived sequence we just aren’t recognizing with current computational methods, so the 40-50% number must be an underestimate. So most reasonable people (ok, I) would say at this point that the human genome is mostly junk (“mostly” as in, somewhere north of 50%).

At the same time, we still have only a tenuous grasp on the details of gene regulation, even though we think we understand the broad strokes now. Nobody should bet against finding more and more regulatory noncoding DNA, either. The human genome surely contains a lot of unannotated functional DNA. The purpose of the ENCODE project was to help us sort this out. Its data sets, and others like them, will be fundamental in giving us a comprehensive view of the functional elements of the human genome.

ENCODE’s definition of “functional” includes junk

ENCODE has assigned a “biochemical function” to 80% of the genome. The newspapers add, “therefore it’s not junk”, but that’s a critically incorrect logical leap. It presumes that junk DNA doesn’t have a “biochemical function” in the sense that ENCODE chose to operationally define “function”. So in what sense did ENCODE define the slippery concept of biological function, to allow them to assign a human genome fraction (to two significant digits, ahem)?

ENCODE calls a piece of DNA “functional” if it reproducibly binds to a DNA-binding protein, is reproducibly marked by a specific chromatin modification, or if it is transcribed. OK. That’s a fine, measurable operational definition. (One might wonder, why not just call “DNA replication” a function too, and define 100% of the genome as biochemically functional, but of course, as Ewan Birney (the ENCODE czar) would tell you, I would never be that petty. No sir.) I am quite impressed by the care that the ENCODE team has taken to define “reproducibility”, and to process their datasets systematically.
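
In pseudocode-ish Python, the operational definition boils down to a disjunction. This is my caricature of the logic, not ENCODE’s actual pipeline, and the field names are mine:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """A stretch of genomic DNA with three reproducibly measured signals."""
    transcribed: bool        # detected in RNA-seq-type assays
    protein_bound: bool      # detected in ChIP-type assays
    chromatin_marked: bool   # carries a specific chromatin modification

def encode_functional(seg: Segment) -> bool:
    # Note the OR: satisfying any single permissive criterion is enough.
    return seg.transcribed or seg.protein_bound or seg.chromatin_marked

# A segment needs only one of the three signals to count as "functional":
print(encode_functional(Segment(True, False, False)))   # True
print(encode_functional(Segment(False, False, False)))  # False
```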

But as far as questions of “junk DNA” are concerned, ENCODE’s definition isn’t relevant at all. The “junk DNA” question is about how much DNA has essentially no direct impact on the organism’s phenotype – roughly, what DNA could I remove (if I had the technology) and still get the same organism. Are transposable elements transcribed as RNA? Do they bind to DNA-binding proteins? Is their chromatin marked? Yes, yes, and yes, of course they are – because at least at one point in their history, transposons are “alive” for themselves (they have genes, they replicate), and even when they die, they’ve still landed in and around genes that are transcribed and regulated, and the transcription system runs right through them.

Thought experiment: if you made a piece of junk for yourself — a completely random DNA sequence! — and dropped it into the middle of a human gene, what would happen to it? It would be transcribed, because the transcription apparatus for that gene would rip right through your junk DNA. ENCODE would call the RNA transcript of your random DNA junk “functional”, by their technical definition. And even if it weren’t transcribed, that would be because it acted as a different kind of functional element (your random DNA could accidentally create a transcriptional terminator).

The random genome project

So a-ha, there’s the real question. The experiment that I’d like to see is the Random Genome Project. Synthesize a hundred-million-base chromosome of entirely random DNA, and do an ENCODE project on that DNA. Place your bets: will it be transcribed? bound by DNA-binding proteins? chromatin marked?

Of course it will.

The Random Genome Project is the null hypothesis, an essential piece of understanding that would be lovely to have before we all fight about the interpretation of ENCODE data on genomes. For random DNA (not transposon-derived, not coding, not regulatory), what’s our null expectation for all these “functional” ENCODE features arising by chance alone?
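
Here’s a toy version of that null expectation, hypothetical and scaled down: a 1 Mb random “chromosome” and an arbitrary 8-mer standing in for a transcription factor’s binding motif (the real project would of course use the ENCODE assays themselves):

```python
import random

random.seed(42)
n = 1_000_000                                  # 1 Mb of random "genome"
genome = "".join(random.choices("ACGT", k=n))

motif = "TGACTCAG"                             # arbitrary 8-mer, a stand-in
k = len(motif)                                 # for a TF binding site

hits = sum(genome[i:i + k] == motif for i in range(n - k + 1))
expected = (n - k + 1) / 4 ** k                # exact matches, one strand
print(f"observed {hits} 'binding sites', expected ~{expected:.1f} by chance")
# Roughly 15 perfect matches per Mb on one strand alone; allow the
# mismatch tolerance of real TFs and scan both strands, and random
# DNA is littered with plausible-looking "functional" sites.
```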

(Hat tip to The Finch and Pea blog, a great blog that I hadn’t seen before the last few days, where you’ll find essentially the same idea.)

Evolution works on junk

Even if you did the Random Genome Project and found that a goodly fraction of a totally random DNA sequence was “functional”, transcribed and bound and chromatin-marked, would this somehow diminish your view of the human genome?

Personally, I don’t think we can understand genomes unless we try to recognize all the different noisy, neutral evolutionary processes at work in them. Without “noise” — without a background of specific but nonfunctional transcription, binding, and marking — evolution would have less traction, less de novo material to grab hold of and refine and select, to make it more and more useful. Genomes are made of repurposed sequence, borrowed from whatever happened to be there, including the “junk DNA” of invading transposons.

As Sydney Brenner once said, there’s a difference between junk and garbage; garbage is stuff you throw out, junk is stuff you keep because it just might be useful someday.

Conflict of interest/full disclosure: I was a member of the national advisory council to the NIH National Human Genome Research Institute at the time ENCODE was conceived and planned – so I’m not quite as innocent and disinterested in policy questions of NIH NHGRI big science projects and media engagement strategy as this post may have made it sound.

55 Comments

  1. Mike Eisen called for a neutral model of genome function, which I think is close to your random genome. (Of course, actually physically realizing the random genome would require some improvements in nucleus injection, not to mention cheaper DNA synthesis. Unless perhaps you allowed a big random insertion in an existing viable genome…)

    There are also numerous models of transposon activity, from which one could imagine building a neutral model of genome *architecture* (cf. the Michael Lynch book you linked to), at least in theory. Those models are rather parameter-rich, though (as would be Eisen’s neutral model of function). There is currently nothing as simple as Kimura’s neutral model for evolution, or Hubbell’s neutral model of biodiversity (both rather beautiful null hypotheses).


  2. We have a design for the “random genome project” on the drawing board, cheap and do-able today, with only one leeetle teensy experimental wrinkle that might be problematic. But as they say in science fiction, you’re allowed one miracle in any good story — just no more than one. If someone wanted to come spend some time in the lab, say from horrible cold Berkeley to beautiful Washington DC…


  3. I can’t resist ratcheting this up yet another level. About to put the kids to bed, so it will be shorter than optimal. Bottom line: y’all only talk about ‘junk’ in the ASSEMBLED human genome, which in fact is only ~70% of the full human genome… the rest is the real poster child for junk: simple satellite repeats in centromeric and pericentromeric regions. Not of course protein coding, but clearly required for genome propagation and function (centromeres for sure, probably nuclear architecture), and they behave in amazing ways during evolution (highest mutability, homogenized by molecular drive mechanisms, whatever those are).
    I too found the whole discussion about the ENCODE surprises ridiculous. The above describes one reason… duh, those of us who have worked on centromeres and heterochromatin and TEs and satellites have for years thought about function differently from those focused on protein-coding genes. But in addition, modENCODE dealt with all of this in papers over a year ago… of course, that was flies and worms, and they just don’t matter compared to human tissue culture cells 🙂


  4. One thing in particular caught my eye reading this: ‘The “junk DNA” question is about how much DNA has essentially no direct impact on the organism’s phenotype – roughly, what DNA could I remove (if I had the technology) and still get the same organism.’

    I’m an outsider to the field, so I ask these questions earnestly. Couldn’t junk DNA have non-obvious effects? That is, perhaps in terms of its biochemical activity it is for all intents and purposes neutral, but might it provide critical structure to chromosomes? Or in some way affect the dynamics of gene transcription? Though you say transcription would ‘rip right through’, would the time spent transcribing necessarily be trivial? It would be surprising to me if you could remove that much mass/length/structure from the genome and not change some aspect of the developmental process. Maybe the changes would be trivial. I suppose if they weren’t, the sense of “functional” would have to be expanded so as to encompass non-biochemical effects.

    Just a thought. Again, I’m an outsider. Maybe others have already dealt with these questions!


  5. It may not be necessary to introduce Mb of random DNA into mammalian cells. To the best of my knowledge there is at least one Drosophila species that captured a complete Wolbachia genome recently (in geological terms). For a start it may be sufficient to build a complete E. coli genome into a mini-gene and to introduce it into some mammalian cell line. My prediction is that by ENCODE’s definition it will be as functional as the cell’s own DNA.


  6. Gary: yeah, the assembly’s up to 2.9 Gb now (about 90% of the genome) but you’re right. I noticed that ENCODE gives the genome size as 2.9G (the assembly), not 3.1-3.2G (which I think is what it’s really supposed to be), which does probably show something about mindset. But that’s been a common error throughout genomics, starting from the days of “we assembled the whole human genome (cough cough, we mean, the part of the genome that we could assemble)”.

    Sparc: shush. shush. you’re absolutely correct. It’s D. ananassae. You’re giving away lab secrets. And there are plenty of examples of people introducing artificial constructs into cells and seeing the vector backbone “function”. I’d still rather do it with completely random DNA someday, because with the kind of arguments that go on in this field, someone is sure to claim that Wolbachia and E. coli retain a homeopathic memory of their ancestry with animal genomes, so they’re recognized by the Drosophila or human transcriptional apparatus.

    Tim: uh, you don’t sound like an outsider. Yes, “junk” could have all sorts of indirect effects. For example, if you introduce a 10x load of extra DNA that soaks up a bunch of DNA-binding proteins, the cell is going to have to compensate by making more of those proteins; if you then suddenly removed the extra DNA, you can expect to see a big perturbation. And like Gary said above you, some very functional DNA, like telomeres and centromeres, is highly repetitive (Drosophila telomeres are made out of transposons, one of the great examples of co-option). And the timing of transcription does make a big difference, just as you intuit: in Drosophila embryos (and probably elsewhere) there are examples of genes that use their junked-up length as a regulatory mechanism: in early embryos, cells divide so fast that the gene never finishes transcription, but when cell division slows down, the gene has time to finish, and complete mRNA gets expressed. Personally I think it’s all sort of gorgeous, how many ways evolution hacks the system together.


  7. Sydney Brenner had an amusing alternative hypothesis for the C-value paradox: that the “junk” DNA might be required to maintain the viscosity of the nucleus. Like many of his jokes, this had a serious thought behind it. He also admonished us to remember that what you discard is called “garbage”, but “junk” is what you keep. In this case, the transposing sequences keep themselves, so to speak. Over the years I have found these Brenner sayings useful.

    On a more serious note: I think all this should be sent to Nature as a rebuttal. I also think it is an object lesson that hyping one’s own work in the hope of impressing Congress is very likely to backfire. We scientists need to strive for objectivity as best we can, lest we make ourselves no longer credible.


  8. Fantastic post. Sean, your response to Tim’s comment, and comments by Michael Eisen regarding DNA binding proteins and transcription factors interacting randomly, or at least non-functionally, throughout the genome, gave me pause… perhaps some of this ‘junk’ DNA, derived over time from transposable elements, etc., has indeed acquired a novel but less obvious function as a giant parking garage, or repository, where DNA binding proteins can stably settle between jobs or prior to function, instead of regenerating these proteins de novo every time they are required by their respective cellular processes. Of course this would require an entire new level of regulation – shuttling proteins to and from the job site. Given this hypothesis, it seems that removing large amounts of this ‘junk’ DNA would have deleterious effects on the cell. So my question is: is there any lab out there designing the experiments or technology to remove significant amounts of this putative ‘junk’, to ultimately test the function of all this extra DNA?


  9. As DNA synthesis costs come down, the Random Genome Project should become feasible…

    As it is, synthesis is cheap enough that I was able to recently synthesize 84 kb of random DNA to serve as my baseline, control distribution in a high-throughput enhancer assay. The assay is plasmid-based, and so admittedly done in an artificial context, but the results are striking – 1) it’s easy to see activity from random DNA, and 2) many classes of genomic sites that look like they should be functional don’t behave differently from random DNA.


  10. Regarding the comments on the unassembled portion of the genome. Do we have the ribosomal RNA genes assembled yet? This paper by Stults et al. suggests we don’t. So, just for interest I ran a BLAT against the human genome using the UCSC genome browser for our 5S rRNA, SSU rRNA and LSU rRNA (including 5.8S rRNA) genes. The top ranked regions for each were chr1:228,766,136-228,766,255, chrUn_gl000220:109078-110946 and chrUn_gl000220:112024-118417.

    What I find interesting is that if we believe UCSC’s conservation track, then one of the most conserved sets of genes on Earth is not. If we believe the expression data, then one of the most abundant transcripts in any cell is not. As for the rest of the data, I think we can make a pretty good case that the ribosome is “biochemically functional”, and that we should therefore be including much of chromosomes 1q42, 13p12, 14p12, 15p12, 21p12, and 22p12 in these calculations.

    I realise the rRNAs are an enormous pain to deal with. I ran Rfam for four years and the ribosomal RNA families broke just about every bioinformatic pipeline I ever wrote. However, I think the ribosome deserves better.


  11. @John Little: I think you asked the critical question I was trying to get at. And you gave an interesting possibility for a ‘function’ to junk DNA that’s of a different nature than that of non-junk DNA. I think if the null were to be rejected in the experiment you proposed that would force a reconceptualization of what functional can mean with respect to regions of the genome.

    @Sean: Thanks for the response. I agree, it’s incredibly gorgeous the way it’s cobbled together. And I am an outsider relative to the level of expertise here. My home base is in neuroscience. This discussion has definitely grabbed my interest though 🙂


  12. Terrific. I too cringe at the “functional” definition. A better definition is “a change whose fitness effect is detectable by natural selection”. Set aside for the moment our inability to measure fitness effects; this is true, unfortunate, and irrelevant to the question of whether the definition is correct. The crucial point is that many biochemically measurable changes will fail this challenge in a way that is meaningful: such changes are effectively neutral and are indistinguishable, in evolutionary terms, from no change at all. (This is not a novel thought, and I am basically channeling Lynch here.)

    One can envision a research program which attempts to ascertain which measurable changes have evolutionary consequences. As far as I know, little is being done on this front. The Random Genome Project is an extreme but highly informative take on a quasi-inverse point: absent selection, what measurable features arise? The point is that the RGP, as a kind of null hypothesis, is specifically a null in the case of zero selection — not the case of zero function. Creating a genome with zero function might require effort!

    Selectability is not the only, nor the best, definition. A change might wreak havoc on an organism but, by virtue of a tiny effective population size, be undetectable by selection, something quite unsatisfying to a biochemist. Still, it provides a way to cull the many measurable-but-“irrelevant” changes by providing a meaningful framework for evaluating relevance. Some such framework is clearly needed. What other framework is there?


  13. The quote from Sydney above that “the “junk” DNA might be required to maintain the viscosity of the nucleus” is really interesting. In a similar vein, we have done some studies, which show that some DNA sequences in the genome may just be “filler” sequences to keep adjacent functional ones from doing too much.


  14. Allan: you’ve already foreseen phase II of the Random Genome Project. The random genome would produce transcripts (“genes”), and my bet is that if we applied standard experimental techniques to them — mutate them, knock them out by RNAi, overproduce them on constructs, look for perturbations of other (real) gene expression levels — we would see reproducible and significant phenotypes (albeit marginal ones, of the sort that we see all the time in reverse genetics studies). I completely agree, I don’t think a random chromosome would be “functionless” in the system. I think this is part of the RGP’s value as a null hypothesis.

    Feng: I agree, and I think that’s part of the slipperiness of the term “function”, and why the term “junk” is only a colloquialism. The junk on my desk is junk, but if you suddenly removed it, my coffee cup would fall over and spill into my laptop; the junk has become part of the system.


  15. Great post, Sean; I had the same reaction, and I couldn’t have written it better myself.
    Susumu Ohno’s office was next to my room when I wrote my dissertation. By that time (early 90s) he wasn’t as sharp anymore, but I seem to remember him explaining the difference between junk and garbage, which now seems to be credited to Brenner.


  16. To the people discussing a structural role for junk DNA: in that context it bears mentioning that in terms of the amount of hereditary information passed on, “spacer” DNA would have to be pretty small potatoes in comparison to sequence-dependent DNA. The latter represents on the order of 2*n bits for an n-base sequence, the former on the order of log_2(n) bits. It takes 4096 bits to specify a specific sequence of 2048 bases, but just specifying a length of 2048 requires only 12 bits. (Well, you might say you could need more since other sequences might run much longer, but it still only takes 32 bits to specify the length of any sequence shorter than the genome itself.)


  17. […] When the ENCODE consortium publications were released last week, a media blitzkrieg ensued. Soon after, there was a backlash by scientists based on some of the claims that they were seeing made. Some of the issues were due to flawed representations in the press that were legitimate targets of the scientists. Some of the attacks on the science writers were unfair. Some folks had issues with the publication process. Some pushback on the “big science” structure and funding arose. Another thread of discussion was about some of the global claims by the ENCODE team—largely about the parsing of the term “functional”. But this parsing discussion was actually quite informative and useful—the good kind of “inside baseball” that goes on among scientists. Although to people outside the field it may be misunderstood, that’s the way we challenge each other and it’s not personal—it’s about the data. It was like watching a huge world-wide lab meeting take place over a few days via twitter and blogs, and it was really pretty cool. (My favorite take on that drama so far was Sean Eddy’s piece: ENCODE says what?) […]


  18. I don’t think a random genome is a good null model for ENCODE: a random genome will tell you something about functional sequences, but not in the right context. ENCODE is trying to figure out which sites in the human genome are functional, not which sequences are functional. For example, the consensus sequence of a transcription factor binding site could be functional near a transcription start site, but non-functional in the middle of a gene desert. In other words, the function of a DNA element is dependent on its genomic context, which will be completely destroyed in a random genome.

    My biggest problem with ENCODE is that they attempt to find regulatory elements by doing experiments under only one condition. Since regulation is only necessary in varying environments, it seems like they would be more likely to find functional regulatory elements if they performed their measurements in multiple conditions.


  19. Hi Sean. Great job! A modest suggestion: I think we need to update Sydney’s (or Susumu’s) ‘junk / garbage’ terminology, which seems a bit outmoded in our sustainability-oriented era. Here in green Seattle, we have a multiplicity of terms for household waste (also for rain, as you might imagine), and the different types have different fates. There is ‘garbage’, which goes to a landfill. There are ‘recyclables’, which get recycled. And there is ‘yard waste’, which includes all sorts of compostable, biodegradable stuff (including food & leaves) and which goes somewhere to molder and slowly decay. I submit that the ENCODE folks are in fact correct: the genome is not mostly ‘junk’, it is instead mostly ‘yard waste’ (perhaps with a few recyclables thrown in).
    One other comment: although often the press deserves much of the blame for miscommunicating scientific findings, in this particular case it should all go to the ENCODE project scientists who talked to them. I won’t name names, since many of them are my friends, but in these news articles there are some appalling quotes from people who should know better. My perhaps overly cynical suspicion is that they have an undeclared financial interest in claiming most of the genome has (unknown) functions, since that means more research money to figure it all out. But one consequence could be to make the public think genome scientists are clueless. We don’t exactly look competent when we confidently say first ‘junk’, then ‘oops, no junk’, particularly when we were right the first time.


  20. First of all I would like to express my appreciation for having access to this type of discussion, where informed opinions prevail over politics. I am not so well-versed in the subject, so I would like to ask a question, and I hope not to sound too naïve.
    What are the implications of recognizing the active role of this part of DNA in the whole metabolism of cells for the current practice of genetic design (GMOs)? Could the outcome of this research be interpreted as an indication that manipulating and redesigning DNA sequences (with the current state of knowledge) could trigger malfunctions in a very sophisticated “switchboard”?


  21. Five reasons why my theory on the function of ‘junk DNA’ is better than theirs

    I intend to submit the paper below for publication in a peer-reviewed journal. Before submitting it, and having it reviewed by a handful (if that) of peers, I decided to post it here on the Blogosphere Preprint Server, which is rapidly becoming the front-line platform for transparent and comprehensive evaluation of scientific contributions.

    The ENCODE project has produced high quality and valuable data. There is no question about that. And the micro-interpretation of the data has been of equal status. The problem is with the macro-interpretation of the results, which some consider to be the most important part of the scientific process. Apparently, the leaders of the ENCODE project agreed with this criterion, as they came out with one of the most startling biological paradigms since, well, since the Human Genome Project showed that the DNA sequences coding for proteins and functional RNA, including those having well defined regulatory functions (e.g. promoters, enhancers), comprise less than 2% of the human genome.

    According to ENCODE’s ‘big science’ conclusion, at least 80% of the human genome is functional. This includes much of the DNA that has been previously classified as ‘junk DNA’ (jDNA). As metaphorically presented in both scientific and lay media, ENCODE’s results mean the death of jDNA.

    However, the eulogy for jDNA (all of it) was written more than two decades ago, when I proposed (and conceptually proved) that ‘jDNA’ functions as a sink for the integration of proviruses, transposons and other inserting elements, thereby protecting functional DNA (fDNA) from inactivation or alteration of its expression (see a copy of my paper posted here: http://sandwalk.blogspot.com/2012/06/tributre-to-stephen-jay-gould.html; also, see a recent comment in Science, that I posted at Sandwalk: http://sandwalk.blogspot.com/2012/09/science-writes-eulogy-for-junk-dna.html ).

    So, how does ENCODE theory stack up ‘mano-a-mano’ against my theory? Here are five reasons why mine is superior:

    #5. In order to label 80% of the human genome functional, ENCODE changed the definition of ‘functional’; apparently, 80% of the human genome is ‘biochemically’ functional, which from a biological perspective might be meaningless. My model on the function of jDNA is founded on the fact that DNA can serve not only as an information molecule, a function that is based on its sequence, but also as a ‘structural’ molecule, a function that is not (necessarily) based on its sequence, but on its bare or bulk presence in the genome.

    #4. Surprisingly, ENCODE theory is not explicitly immersed in one of the fundamental tenets of modern biology: nothing in biology makes sense except in the light of evolution. Indeed, there is no talk about how jDNA (which contains approximately 50% transposon and viral sequences) originated and survived evolutionarily. On the contrary, my model is totally embedded in and built on evolutionary principles.

    #3. One of the major objectives of the ENCODE project was to help connect the human genome with health and disease. Labeling 80% of these sequences ‘biochemically functional’ might create the aura that these sequences contain genetic elements that have not yet been mapped out by the myriad of genome-wide studies; well, that remains to be seen. In the context of my model, the protective function of jDNA, particularly in somatic cells, is vital for preventing neoplastic transformations, or cancer; therefore, a better understanding of this function might have significant biomedical applications. Interestingly, this major tenet of my model can be experimentally addressed: e.g. transgenic mice carrying DNA sequences homologous to infectious retroviruses, such as murine leukemia viruses (MuLV), might be more resistant to cancer induced by experimental MuLV infections as compared to controls.

    #2. The ENCODE theory is the culmination of a 250-million-US-dollar project. Mine, zilch; well, that’s not true, my model is based on decades of remarkable scientific work by thousands and thousands of scientists who paved the road for it.

    #1. The ENCODE theory has not yet passed the famous Onion Test ( http://www.genomicron.evolverzone.com/2007/04/onion-test/ ), which asks: why do onions have a genome much larger than we humans do? Do we live in an undercover onion world? The Onion Test is so formidable and inconvenient that, to my knowledge, it has yet to make it through peer review into the conventional scientific literature or textbooks. So, does my model pass the Onion Test? I think it does, but for a while I’m going to let you try to figure out how! And maybe, when I submit my paper for publication, I’ll use your ideas, if the reviewers ever ask me for an answer. Isn’t that smart?


  22. In my parodic comment above, “Five reasons why my theory on the function of ‘junk DNA’ is better than theirs”, I brought forward an old model (1) of genome evolution and of the origin and function of the genomic sequences labeled ‘junk DNA’ (jDNA), which in some species represent up to 99% of the genome.

    Since then, I posted in Science five mini-essays outlining some of the key tenets associated with this model, which might solve the C-value and jDNA enigmas ( http://comments.sciencemag.org/content/10.1126/science.337.6099.1159).

    As discussed in the original paper (1) and these mini-essays, the so called jDNA serves as a defense mechanism against insertional mutagenesis, which in humans and many other multicellular species can lead to cancer.

    Expectedly, as an adaptive defense mechanism, the amount of protective DNA varies from one species to another based on the insertional mutagenesis activity and the evolutionary constraints on genome size.

    1. Bandea CI. A protective function for noncoding, or secondary DNA. Med. Hypotheses 31:33-34, 1990.


  23. Sean, I read the pre-print at:
    http://selab.janelia.org/publications/Eddy12/Eddy12-preprint.pdf

    Please be aware that my criticism originates in a genuine desire to understand the genome – that is my life’s work. I believe that we have a long way to go to understand the genome, and the technology we have today to interrogate the genome is far from capable of explaining some of the “apparently” paradoxical genome size questions.

    Your preprint’s primary use is in clearly stating the assumptions and reasoning you made, and that’s what’s important here: being explicit makes it a lot easier to argue about the underlying facts. For example, you make the assumption “Selfish DNA elements function for themselves, rather than having an adaptive function for their host” (although you allude to a far more subtle interplay between organismal transposition and its abuse by self-replicating entities).

    The problem is that all you and ENCODE are doing when debating the 80% figure is arguing about definitions, which isn’t particularly interesting. What would be more interesting is to look at the ENCODE data in detail, and understand the WHY of what they observe. If we can explain it with null hypotheses, rather than overreaching conclusions, that’s a good thing. If instead the new data helps us understand something vexing, that’s a *great* thing.

    Ultimately, I think we can resolve the entire debate by having ENCODE:
    1) admit that their function definition is very loose
    2) admit some of their claims are overreaching
    3) spend a lot more time coming up with significantly more sensitive and accurate methods to determine actual “functionalism” in genomic DNA.

    and having ENCODE’s detractors:
    1) spend a lot more time looking at the ENCODE data
    2) try to disprove some of their own beliefs and assumptions. I have a hunch that ENCODE’s data is telling us something.

