Farrar’s striped SIMD Smith/Waterman

Michael Farrar’s striped SIMD Smith/Waterman source code [swsse2; Bioinformatics, 2007] is now available under an open source license for the first time. You can download it [here].

In 2007, Michael Farrar introduced a beautiful method for speeding up sequence alignment algorithms. Farrar’s “striped” algorithm rearranges dynamic programming matrix indices in a counterintuitive way that meshes neatly with the parallel vector instructions on modern CPU hardware (so-called SIMD vector instructions). Farrar did this for fun, writing in his free time while working as an embedded systems engineer at a telecommunications company.
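To make the index rearrangement concrete, here is a minimal sketch of the striped layout only, not Farrar's code and not a full alignment. It assumes a query of length `query_len` and a vector with `lanes` elements; the function names `striped_layout` and `sequential_layout` are made up for illustration. In the sequential layout a vector holds consecutive query positions. In the striped layout, vector `k` holds positions `k`, `k + seg_len`, `k + 2*seg_len`, and so on, where `seg_len` is the number of vectors needed to cover the query.

```python
import math


def sequential_layout(query_len, lanes):
    """Conventional layout: each vector holds `lanes` consecutive query positions."""
    n_vectors = math.ceil(query_len / lanes)
    return [
        [i if i < query_len else None for i in range(k * lanes, (k + 1) * lanes)]
        for k in range(n_vectors)
    ]


def striped_layout(query_len, lanes):
    """Striped layout: lane j of vector k holds query position j * seg_len + k.

    Positions past the end of the query are padding, shown here as None.
    """
    seg_len = math.ceil(query_len / lanes)  # number of vectors per column
    vectors = []
    for k in range(seg_len):
        vec = []
        for j in range(lanes):
            i = j * seg_len + k
            vec.append(i if i < query_len else None)
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    # Illustrative sizes only: a 10-residue query and 4 lanes per vector.
    print("sequential:", sequential_layout(10, 4))
    print("striped:   ", striped_layout(10, 4))
```

With this arrangement, query positions `i` and `i + 1` usually sit in the same lane of neighboring vectors rather than in neighboring lanes of one vector. Most of the dependency between vertically adjacent cells therefore runs from one vector to the next and does not have to be resolved inside a single vector operation.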

He made the source code accompanying his 2007 Bioinformatics paper available only under a restrictive no-commercial-use license that is incompatible with open source licenses. He also filed for a patent on the technology. When I developed methods for accelerating profile HMM algorithms in my open source HMMER3 code, I used his striping idea in my implementation, and I obtained a license from him for using the technology in case the patent issued. (It didn’t.) I recruited him to my laboratory at HHMI Janelia Farm in 2008. While at Janelia, he contributed his striped Smith/Waterman code to Bill Pearson’s FASTA package.

Farrar died in December 2010, tragically and unexpectedly. He is survived by his wife Annelee and their three children.

The reputation and use of his code and his technique have grown and spread. In some cases, people have copied his source code and incorporated it into open source bioinformatics codes. To my knowledge, with the exception of FASTA, none of these derivatives are properly licensed, because of the license conflict. Of course, it wasn’t possible to negotiate new agreements with him. In two cases that I know of, his code was copied in published works with his name, copyright, and license information removed, perhaps to obscure the fact that these authors knew there was a licensing problem. In my view, plagiarizing his work is even worse than violating his license.

I think a good long-term solution to the problem is to release his source code under an open source license. Over the past few weeks I’ve discussed this with his wife Annelee, his heir. She agrees with me that he would have wanted his code to be used, and that the important thing is for him to be credited for his work. Annelee has agreed to let me release his source code under a BSD open source license, and I’ve done this. You can download an open source code tarball from [here].

If you are using a copy of Farrar’s code, you can replace it with this version, or replace the copyright and license statements with what’s in the open source version.

One Comment

  1. Reblogged this on Picking Up The Tabb and commented:
    Have you ever found yourself thinking “it doesn’t really matter what license I use with the source code I develop?” This post from Sean Eddy reveals the interesting story of code to accelerate local alignment of two biological sequences. In it, Michael Farrar made it possible to accelerate this core process of bioinformatics using “SIMD” parallel execution. He published his approach in 2007 and then suddenly died in 2010. His approach has only now been issued under a proper open source license. I think you will find Dr. Eddy’s tale quite enlightening!


