Computational challenges in large-scale sequencing

Thanks to everyone who provided comments and suggestions for my talk for NIH NHGRI’s five-year planning process meeting on The Future of the Large-Scale Sequencing Program, March 23-24 in Bethesda. A copy of my slides is here, if you’re interested.


  1. I borrowed that slide from Dan Meiron, and actually both of us have used it as an example of what *not* to do; i.e., as a concrete example of how overly hyped many claims of “data deluge” are. (It’s hard to tell that from the slide without hearing how I talked about it.) Note, for example, that the y-axis of that plot is already on a log scale; so for there to be an exponential increase in data on that plot, the rate of data accumulation would have to be superexponential, not exponential. The plot adds new acquisition capabilities together in layers *in log space*, which means that somehow the data acquired by new means are *multiplying*, not summing with, previous technologies, which is nonsense. Meiron’s version goes on to show a big red ‘X’ over the whole slide. Even a back-of-the-envelope calculation of complete imaging of the entire Earth’s surface at super-high spatial/temporal resolution can’t reach data rates that high. Military imaging data acquisition does have some of the same data throughput issues we see in other fields, but it’s just nowhere, nowhere near as bad as that bogus slide implies.
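To see why stacking curves on a log-scale axis is nonsense, here’s a quick sketch with made-up data rates (the numbers are purely illustrative, not from the slide): adding two curves’ log values is the same as multiplying the underlying rates, so the stacked total comes out absurdly large.

```python
import math

# Hypothetical daily data rates for two acquisition technologies (made-up figures).
tech_a = 1e12  # bytes/day from an established technology
tech_b = 1e9   # bytes/day from a newer technology

# Correct way to combine technologies: sum the linear rates.
combined = tech_a + tech_b  # ~1.001e12 bytes/day -- barely more than tech_a alone

# What the slide effectively does: stack the curves on a log-scale y-axis,
# i.e. add their log10 values, which *multiplies* the rates.
bogus_log10 = math.log10(tech_a) + math.log10(tech_b)
bogus = 10 ** bogus_log10  # 1e21 bytes/day -- nine orders of magnitude too big

print(f"correct sum: {combined:.3e} bytes/day")
print(f"log-stacked: {bogus:.3e} bytes/day")
```

The log-stacked total equals the *product* of the two rates, which is why the slide’s curve appears to explode superexponentially.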


