Abstract: Public archives of DNA sequencing data are now extremely large and growing rapidly. Yet they remain very hard to use; the situation resembles that of the World Wide Web before accurate, blazing-fast search engines existed. As a result, public datasets are underutilized, and science suffers from the resulting lack of reuse and reproducibility.
I will describe work my lab and others have undertaken toward the goal of making it easy for researchers to use the vast stores of public genomic data available today. Primarily, I will describe tools and resources we have developed, including Rail-RNA, Snaptron and recount2, that together function as a cloud-based search engine for public RNA sequencing data. I will also share results from scientific studies enabled by this search engine. Finally, I will suggest future directions for making it easier to leverage public data in everyday life science research.
Bio: Ben Langmead is an Assistant Professor of Computer Science at Johns Hopkins University. His lab aims to create methods and software that enable scientists to easily analyze DNA sequencing data. He has released high-impact software tools (e.g., Bowtie, Bowtie 2) addressing common genomics research questions, as well as scalable software tools (e.g., Myrna, Rail-RNA) and resources (Intropolis, recount2, Snaptron) that make it easier for scientists to pose scientific queries against large archives of sequencing data. The lab takes software seriously, always aiming to create usable, maintainable, well-documented tools that live a long and healthy life in the community. He is the recipient of a Sloan Research Fellowship (2014), a National Science Foundation CAREER Award (2014) and the Benjamin Franklin Award for contributions to open access (2016).