Course Information


Class Description

CSE 549 covers commonly used machine learning algorithms and their applications to computational biology. The class is structured so that problems motivate the application of the methods. Problems are divided into sections according to the corresponding type of data: sequences, matrices, graphs, and 3D structures. In each section, the problems are described first, and then example machine learning methods used to solve them are discussed. We will learn about entropy, relative entropy, and mutual information in the context of DNA-binding site identification; mixture models in the context of finding nucleosome positions; graph structure learning in the context of gene network construction; graph searching in the context of biomolecule searching; feature selection in the context of biomarker discovery; and feature extraction in the context of protein searching. The class combines the book and slides, which describe the problems and machine learning methods, with paper reading, which shows how the methods are actually applied. There will be a midterm exam, a final exam, and a semester project of your choosing.

Instructor

Assistant Professor Sael Lee
Office: Academic Bldg. B422
Email: sael at sunykorea dot ac dot kr
Phone: +82 (32) 626-1215

Meeting Time

Lecture: Mo/We 15:30~16:50, Academic Bldg. B204

Office Hours

TBA, at B422 (or email for an appointment)

Prerequisites

None

Textbook

Required: N.A.

Recommended:
Pattern Recognition and Machine Learning, 2007, C.M. Bishop
Information Theory, Inference and Learning Algorithms, D. MacKay
Elements of Information Theory, T.M. Cover, J.A. Thomas
Bioinformatics: Sequence and Genome Analysis, David W. Mount
Introduction to Bioinformatics, 2008, A. M Lesk

Grading

Midterm will be worth 30% of your grade.
Final Exam will be worth 30% of your grade.
Project will be worth 40% of your grade.
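
The weighting above can be sketched as a short calculation. This is only an illustration: the function name `course_score`, the 0-100 score scale, and the sample scores are assumptions, not part of the official grading procedure.

```python
# Weights come from the Grading section; everything else here is hypothetical.
WEIGHTS = {"midterm": 0.30, "final": 0.30, "project": 0.40}


def course_score(scores):
    """Return the weighted course score for component scores on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Each component contributes its score multiplied by its weight.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up sample scores, used only to show the call.
print(course_score({"midterm": 80, "final": 70, "project": 90}))
```

Because the project carries the largest weight, a change in the project score moves the total more than the same change in either exam.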

Project

The final project will not be limited to topics in computational biology, but will require you to apply methods and ideas that have been discussed in class. (proposal 10% + report 25% + presentation 5% = 40%)




Notice




Pdf version of this syllabus can be found here.




Course Materials




# | DATE | CONTENT | READING | SLIDES
1 | 2/27 | Introduction: Whys and whats of computational biology | | Slide01
  | 3/1 | NO CLASS: Independence Movement Day | |
2 | 3/6 | Defining the problem: Information Content in Biology and DNA Binding | F. Fabris JIM 2009 | Slide02
  | 3/8 | Method 1: Entropy, relative entropy and mutual information | Ch.2 of Elements of Info. Theory | Slide03
3 | 3/13 | Method 1 cont.: Entropy, relative entropy and mutual information | | Slide03
  | 3/15 | Example Solutions: DNA-binding site identification using information theory | TD. Schneider Nano Commun Netw. 2010; Erill and O'Neill BMC Bioinformatics 2009 | Slide04
4 | 3/20 | Project description and QA; Defining the problem: Finding nucleosome positions | project doc; Jiang C, Pugh BF. Nat Rev Genet. 2009 | Slide05; Slide06
  | 3/22 | Method 2: Mixture models | | Slide07
5 | 3/27 | Method 2: Mixture models | Chapter 9 of PRML | Slide08
  | 3/29 | Example Solutions: Finding nucleosome positions using mixture models | Polishko et al. Bioinformatics. 2012 | Slide09
6 | 4/3 | Defining the problem: Biomarker discovery | | Slide11
  | 4/5 | Method 3: Feature selection | Chapter 3&7 PRML | Slide12-13
7 | 4/10 | Example Solutions: Biomarker discovery by feature selection | Abeel et al. Bioinformatics. 2010; I. Guyon et al. JML 2002 | Slide14
  | 4/12 | Defining the problem: Protein Structure and Dynamics | | Slide15
8 | 4/17 | Midterm Exam | Proposal Due |
  | 4/19 | Method 4: Feature Extraction: PCA | | Slide15
9 | 4/24 | Method 4: Feature Extraction: Kernel PCA | Chapter 12 of PRML | Slide16
  | 4/26 | Example Solutions: Protein Dynamics with Feature Extraction | Bakan A, Bahar I. PNAS 2009 | Slide18
10 | 5/1 | Problem 5: Biological Network Analysis | | bio-net
  | 5/3 | NO CLASS: Buddha's Birthday | |
11 | 5/8 | Method 5: Random Walk with Restart (RWR) | | Random Walk
  | 5/10 | Method 5 cont.: Graph Kernels | | Graph Kernels
12 | 5/15 | Application: Gene Function Prediction | | Graph applications
  | 5/17 | Problem 6: Genome Analysis | |
13 | 5/22 | Method 6: Deep Learning | |
  | 5/24 | Method 6: Deep Learning | http://www.heatmapping.org tutorial1; http://www.heatmapping.org tutorial2 |
14 | 5/29 | Application 6: Deep Applications | | DNN Applications
  | 5/31 | Application 6: DNN Applications 2 | |
15 | 6/5 | NO CLASS: Adjustment Day | |
  | 6/7 | Review | |
  | 6/12 | PROJECT PRESENTATION | project report deadline |
  | 6/20 | FINAL EXAM | |




Course Policy


Attendance policy

Everyone is strongly urged to attend class regularly and participate actively. You are responsible for learning all the material covered in class. Lecture slides and supplementary handouts will cover most of the material; however, in-class participation through discussion and questions is a valued learning activity.

The SUNY Korea Attendance Policy states "If a student has over 20% unexcused absence, the student's final course grade will be an 'F'."

Assignments grading policy

Assignments will be handed out in class and are due at the start of class on the due date. Legible handwritten copies of the assignments should be turned in.

The total points of each assignment will differ depending on the difficulty of the problems. However, the maximum total points of any assignment will be at most twice the minimum total points of any assignment. Expect more difficult problems toward the end of the semester.

I will drop the lowest grade from among your assignment scores. No late assignments will be accepted.

Project grading policy

You will be required to propose and execute a final project based on the content we cover in class. The project grade is based 10% on the content of the proposal, 25% on the final report, and 5% on the project presentation, which add up to 40% of your grade. The SUNY-SB Blackboard facility will be used for submissions. Blackboard will record your time of submission. It is your responsibility to check that your uploads completed properly and that you received a proper grade. Grades will be e-mailed to you individually in a timely fashion.

Academic misconduct policy

There is no excuse for cheating. Cheating will be considered academic misconduct and handled according to the Stony Brook regulations. If cheating occurs during an exam or is evident in submitted assignments, you will get a grade of F. Discussion of assignments is acceptable; however, submitted assignments must show originality. This means that assignments that nearly duplicate those of your peers, or that duplicate materials found on the web, will be considered cheating. Everyone involved in cheating will be penalized.




University Policy


Americans with Disabilities Act

If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, Room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential.

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. Faculty in the Health Sciences Center (School of Health Technology & Management, Nursing, Social Welfare, Dental Medicine) and School of Medicine are required to follow their school-specific procedures. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the Academic Judiciary website.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of University Community Standards any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn. Faculty in the HSC Schools and the School of Medicine are required to follow their school-specific procedures. Further information about most academic matters can be found in the Undergraduate Bulletin, the Undergraduate Class Schedule, and the Faculty-Employee Handbook.