Joint NSLS-Biology talk: Machine Learning techniques applied to peptide screening


Events Summary

Elaine DiMasi of the NSLS and Sean R. McCorkle are hosting this talk by computer scientist Burr Settles on his use of machine-learning techniques for massive peptide screening.
Time: Monday, May 4 at 10 AM. Place: Biology Seminar Room.

Abstract

The speaker will be here all day. If you'd like to meet with him before or after the talk, drop a line to mccorkle@bnl.gov or dimasi@bnl.gov.

Adaptive High-Throughput Peptide Screening and Analysis via Active Learning
Burr Settles, Dept. of Computer Sciences, University of Wisconsin-Madison

Modern high-throughput technologies allow biomedical researchers to take thousands of experimental measurements at once. Typically, these come from fixed, pre-defined libraries; however, recent advances enable scientists to specify which measurements to take in a given high-throughput experiment. For example, enzymologists can now affordably synthesize libraries of arbitrary peptides and measure their binding affinities with a particular protein of interest. While this gives researchers more flexibility, it introduces the problem of how to design such libraries. Consider that there are over 10 trillion possible peptide 10mers, while only a few hundred or thousand may be reasonably tested.
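As a rough check on the size of that search space, the count can be worked out directly. This sketch assumes peptides are built from the 20 standard amino acids (the abstract does not state the alphabet) and uses a hypothetical screen of 1,000 peptides for comparison.

```python
# Assumption: peptides are drawn from the 20 standard amino acids.
ALPHABET_SIZE = 20
PEPTIDE_LENGTH = 10

# Every position can hold any residue, so the count is 20 ** 10.
n_peptides = ALPHABET_SIZE ** PEPTIDE_LENGTH
print(f"{n_peptides:,} possible 10mers")  # 10,240,000,000,000

# Hypothetical screen size, taken from the "few hundred or thousand" range.
screen_size = 1_000
coverage = screen_size / n_peptides
print(f"fraction of the space one screen covers: {coverage:.1e}")
```

Under these assumptions a single screen samples roughly one peptide in every ten billion, which is why the choice of library matters.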

We propose a novel machine learning approach not only to the analysis and interpretation of high-throughput screens, but also to the design of the libraries themselves via "active learning." In active learning, a statistical model is allowed to choose the data from which it learns (in this case, peptide sequences and their associated binding affinities), and the goal is to minimize the amount of data required to learn a model of high accuracy. While most work in active learning assumes the model can select each instance and re-train sequentially, these high-throughput screens require the model to select hundreds of peptides at once, i.e., in "batches."
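The select-in-batches loop described above can be illustrated with a minimal sketch. It is not the speaker's algorithm: it uses plain uncertainty sampling with a small logistic model, and the residue-count features, the `toy_assay` function standing in for the wet-lab screen, and the pool and batch sizes are all made up for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumption)

def featurize(peptide):
    """Residue counts: a deliberately crude, hypothetical encoding."""
    return [peptide.count(a) for a in AMINO_ACIDS]

def predict_proba(weights, peptide):
    """Logistic model: estimated probability that the peptide binds."""
    z = sum(w * x for w, x in zip(weights, featurize(peptide)))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train(labeled, epochs=50, lr=0.1):
    """Fit logistic-regression weights by plain stochastic gradient ascent."""
    weights = [0.0] * len(AMINO_ACIDS)
    for _ in range(epochs):
        for peptide, label in labeled:
            error = label - predict_proba(weights, peptide)
            for i, x in enumerate(featurize(peptide)):
                weights[i] += lr * error * x
    return weights

def select_batch(weights, pool, batch_size):
    """Pick the pool peptides whose predictions are nearest 0.5."""
    ranked = sorted(pool, key=lambda p: abs(predict_proba(weights, p) - 0.5))
    return ranked[:batch_size]

def toy_assay(peptide):
    """Made-up stand-in for the screen: 'binds' if it has two or more lysines."""
    return 1 if peptide.count("K") >= 2 else 0

rng = random.Random(0)
pool = ["".join(rng.choice(AMINO_ACIDS) for _ in range(10)) for _ in range(500)]
pool = list(dict.fromkeys(pool))  # drop any accidental duplicates

# Seed the model with a small random library, then query in batches.
labeled = [(p, toy_assay(p)) for p in pool[:20]]
pool = pool[20:]

for round_number in range(3):
    weights = train(labeled)                       # re-train once per batch
    batch = select_batch(weights, pool, batch_size=50)
    labeled += [(p, toy_assay(p)) for p in batch]  # "run" the screen
    chosen = set(batch)
    pool = [p for p in pool if p not in chosen]
    print(f"round {round_number}: {len(labeled)} peptides labeled")
```

Note that the model is re-trained only once per batch rather than after every peptide, which is the constraint the abstract highlights. Simply taking the most uncertain peptides can yield a batch of near-duplicates, so batch methods typically also weigh diversity within the batch; that refinement is omitted here.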

This talk describes ongoing work on understanding the binding behavior of human sirtuin proteins, which have been implicated in transcription regulation, aging, and diabetes. I will discuss "batch active learning" algorithms designed for machine learning models which, based on preliminary work, are both (i) accurate for this task and (ii) easily interpretable by biological researchers.

 

Department of Computer Science • Stony Brook University, Stony Brook, NY 11794-4400 • 631-632-8470 or 631-632-8471