CSE 519 - Data Science

Fall 2023

Data Science is a rapidly emerging discipline at the intersection of statistics, machine learning, data visualization, and mathematical modeling. This course is designed to provide a hands-on introduction to Data Science by challenging student groups to build predictive models for upcoming events, and validating their models against the actual outcomes.

  • Course Time: 10:00-11:20AM Tuesday and Thursday
    Place: 104 Frey Hall
  • Steven Skiena's office hours are 11:30AM-1:00PM Tuesday-Thursday, and by appointment. You can also catch me right after class.
  • The course teaching assistants will be:
  • Videos and slides from my Fall 2020 lectures are available here. The best stuff should always be available at www.data-manual.com.
  • Sign up for the Piazza class discussion board at https://piazza.com/stonybrook/fall2023/cse519.
  • Syllabus
  • Lecture Schedule

    Textbook

    We will use my book The Data Science Design Manual, Springer-Verlag, 2017. The associated website www.data-manual.com points to many resources, including lecture notes/videos, errata, a problem solution Wiki, and sample Python notebooks for generating figures from the book.

    I welcome feedback on the book. Please keep track of any errata you find and send them to me, ideally in one batch at the end of the semester.

    Homework Assignments

    Lecture Notes

    I will give about 25 formal lectures this semester. All classes will be recorded on Zoom and made available on Blackboard.

    Ritika Nevatia has made the lecture notes she took in class one year available to all interested students. You may check them out if you wish.

    Old lecture notes are available from the previous offering in Fall 2014.

    Short Course on Computational Social Science

    I taught a minicourse on machine learning and NLP for social scientists at the European University Institute (EUI) in Florence, Italy in November 2022. This course was largely (but not completely) based on my slides from CSE 519. I give my lecture slides from this course below.

    Short Course on Word and Graph Embeddings

    I taught a minicourse on Word and Graph Embeddings at BigDat 2023 on Gran Canaria in Spain's Canary Islands. I give the links to my lecture videos and slides below.

    Semester Projects

    Roughly half of the course grade will come from a course project. Students will typically work in small groups (2-3 people) on independent research projects. I will distribute a list of possible projects about six weeks into the semester.

    Recommended Readings

    The field of data science is still emerging, but there are several books that are useful to read and consult:

    Videos: The Quant Shop

    The Quant Shop is a series of eight 30-minute programs on Data Science, which are a product of the Fall 2014 offering of this course. Watch them for inspiration at the Quant Shop Vimeo channel.

    Related Links

    Professor

    Steven S. Skiena
    251 New Computer Science Building
    Department of Computer Science
    Stony Brook University
    Stony Brook, NY 11794-2424, USA
    skiena@cs.stonybrook.edu
    631-632-9026