CSE 357, Fall 2023: Statistical Methods for Data Science

News:
08/15: Piazza course sign-up link
08/15: Our first lecture will be on Aug 28th (Mon) at 1:00pm in CS 2120.
08/15: Course website up.

When: Mon, Fri, 1:00pm - 2:20pm
Where: CS 2120

Instructor: Anshul Gandhi
Instructor Office Hours: Mon, Fri, 2:20pm - 3:20pm

Course TAs: Quan, Nikhil, Aaditya
TA Office Hours: Tues, 5-6pm, NCS 336

Course Description

This interdisciplinary course introduces the mathematical concepts required to interpret results and draw conclusions from data in an applied manner. The course presents techniques for applied statistical inference and data analysis, such as parameter and distribution estimators, hypothesis testing, Bayesian inference, and likelihood, along with their implementation in Python.

More informally, this 3-credit, undergraduate-level course covers the probability and statistics topics that data scientists need to analyze and interpret data. The course involves theoretical topics and some programming assignments. It is targeted primarily at junior and senior undergraduate students who are comfortable with probability concepts and basic programming. Undergraduates from Computer Science, Applied Mathematics and Statistics, and Electrical and Computer Engineering are well suited to this class. Topics covered include Probability Theory, Random Variables, Stochastic Processes, Statistical Inference, Hypothesis Testing, and Regression. For more details, refer to the syllabus below.

The class is in person and is expected to be interactive; students are encouraged to participate in class discussions.

Grading will be on a curve and based primarily on assignments and exams. For more details, refer to the grading section below.

Prerequisites: C or higher in CSE 216 or CSE 260; AMS 310; CSE major. Comfort with probability theory and proficiency in Python (programming assignments will be in Python) will be helpful.

Learning Objectives: An understanding of core concepts of probability theory and standard statistical techniques. An understanding of random variables, distributions, and hypothesis testing. An ability to apply quantitative research methods (correlation and regression) and modern techniques of optimization and machine learning, such as clustering and prediction.

Syllabus & Schedule

Date Topic Readings Notes
Aug 28 (Mon) [Lec 01]
Course introduction, class logistics

Sep 01 (Fri) [Lec 02]
Probability review - 1
  • Basics: sample space, outcomes, probability
  • Events: mutually exclusive, independent
  • Calculating probability: sets, counting, tree diagram
  Readings: AoS 1.1 - 1.5; MHB 3.1 - 3.4
  Notes: assignment 1 out

Sep 04 (Mon) Labor Day observed, no class

Sep 08 (Fri) [Lec 03]
Probability review - 2
  • Conditional probability
  • Law of total probability
  • Bayes' theorem
  Readings: AoS 1.6, 1.7; MHB 3.3 - 3.6

Sep 11 (Mon) [Lec 04]
Random variables - 1
  • Mean, Moments, Variance
  • pmf, pdf, cdf
  • Bernoulli(p)
  • Indicator RV
  • Binomial(n, p)
  • Geometric(p)
  Readings: AoS 2.1 - 2.3, 3.1 - 3.4; MHB 3.7 - 3.9

Sep 15 (Fri) [Lec 05]
Random variables - 2
  • Uniform(a, b)
  • Exponential(λ)
  • Normal(μ, σ²) and its properties
  Readings: AoS 2.4, 3.1 - 3.4; MHB 3.7 - 3.9, 3.14.1
  Notes: assignment 1 due; assignment 2 out

Sep 18 (Mon) [Lec 06]
Random variables - 3
  • Joint probability distribution
  • Linearity and product of expectation
  • Linearity of variance (for independent RVs)
  Readings: AoS 2.5 - 2.7; MHB 3.10, 3.13

Sep 22 (Fri) [Lec 07]
Probability inequalities
  • Weak Law of Large Numbers
  • Central Limit Theorem
  Readings: AoS 4.1 - 4.2, 5.3 - 5.4; MHB 3.14.2, 5.2

Sep 25 (Mon) [Lec 08]
Non-parametric inference - 1
  • Basics of inference
  • Empirical PMF
  • Sample mean
  • Bias, standard error (se), mean squared error (MSE)
  Readings: AoS 6.1, 6.2, 6.3.1
  Notes: assignment 2 due; assignment 3 out; required dataset for A3: pokemon.csv

Sep 29 (Fri) [Lec 09]
Non-parametric inference - 2
  • Empirical Distribution Function (or eCDF)
  • Statistical Functionals
  • Plug-in estimator
  Readings: AoS 6.3.1, 7.1 - 7.2
  Notes: Python scripts: binomial, eCDF

Oct 02 (Mon) [Lec 10]
Confidence intervals
  • Percentiles, quantiles
  • Normal-based confidence intervals
  Readings: AoS 6.3.2, 7.1

Oct 06 (Fri) [Lec 11]
Parametric inference - 1
  • Basics of parametric inference
  • Method of Moments Estimator (MME)
  • Properties of MME
  Readings: AoS 6.3.1 - 6.3.2, 9.1 - 9.2
  Notes: assignment 3 due

Oct 09 (Mon) Fall Break, no class

Oct 13 (Fri) [Lec 12]
Mid-term 1 review

Oct 16 (Mon) [Lec 13]
Python review (optional)
  Notes: examples, pokemon_with_sno

Oct 20 (Fri) Mid-term 1

Oct 23 (Mon) [Lec 14]
Parametric inference - 2
  • Likelihood
  • Maximum Likelihood Estimator (MLE)
  • Properties of MLE
  Readings: AoS 9.3 - 9.4, 9.6
  Notes: assignment 4 out; required data: iris.csv, q7_b_X.csv, q7_b_Y.csv

Oct 27 (Fri) [Lec 15]
Hypothesis testing - 1
  • Basics of hypothesis testing
  • The Wald test
  Readings: AoS 10 - 10.1; DSD 5.3 - 5.3.1

Oct 30 (Mon) [Lec 16]
Hypothesis testing - 2
  • Type I and Type II errors
  • The Wald test
  Readings: AoS 10 - 10.1; DSD 5.3.1

Nov 03 (Fri) [Lec 17]
M1 discussion
Hypothesis testing - 3
  • Z-test
  Readings: AoS 10.10.2; DSD 5.3.2
  Notes: assignment 4 due; assignment 5 out; required datasets: a5_q4.csv, height_female.csv, height_female_200.csv, height_male.csv

Nov 06 (Mon) [Lec 18]
Hypothesis testing - 4
  • t-test
  • Kolmogorov-Smirnov test (KS test)
  Readings: AoS 15.4, 10.2; DSD 5.3.3, 5.5

Nov 10 (Fri) [Lec 19]
Hypothesis testing - 5
  • Kolmogorov-Smirnov test (KS test)
  • p-values
  Readings: AoS 10.2, 10.5; DSD 5.5

Nov 13 (Mon) [Lec 20]
Hypothesis testing - 6
  • p-values
  • Permutation test
  Readings: AoS 3.3, 10.3 - 10.4; DSD 2.3

Nov 17 (Fri) [Lec 21]
Hypothesis testing - 7
  • Pearson correlation coefficient
  • Chi-square test for independence
  Readings: AoS 3.3, 10.3 - 10.4; DSD 2.3
  Notes: assignment 5 due; assignment 6 out; required datasets: sample_covid.csv, sample_football.csv

Nov 20 (Mon) [Lec 22]
Hypothesis testing - 8
  • Chi-square test for independence
  Readings: AoS 3.3, 10.3 - 10.4; DSD 2.3

Nov 24 (Fri) Thanksgiving break, no class

Nov 27 (Mon) [Lec 23]
Regression - 1
  • Basics of Regression
  • Simple Linear Regression
  Readings: AoS 13.1, 13.3 - 13.4; DSD 9.1

Dec 01 (Fri) [Lec 24]
Regression - 2
  • Ordinary Least Squares
  • Multiple Linear Regression
  Readings: AoS 13.5; DSD 9.1

Dec 04 (Mon) [Lec 25]
Mid-term 2 review
  Notes: assignment 6 due on Dec 06

Dec 08 (Fri) [Lec 26]
Mid-term 2 review

Dec 11 (Mon) Mid-term 2


Resources

Grading (tentative)

Academic Integrity Statement

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. Faculty in the Health Sciences Center (School of Health Professions, Nursing, Social Welfare, Dental Medicine) and School of Medicine are required to follow their school-specific procedures. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/commcms/academic_integrity/index.html.

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Student Conduct and Community Standards any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn. Faculty in the HSC Schools and the School of Medicine are required to follow their school-specific procedures. Further information about most academic matters can be found in the Undergraduate Bulletin, the Undergraduate Class Schedule, and the Faculty-Employee Handbook.

Student Accessibility Support Center Statement

If you have a physical, psychological, medical, or learning disability that may impact your course work, please contact the Student Accessibility Support Center, Stony Brook Union Suite 107, (631) 632-6748, or at sasc@stonybrook.edu. They will determine with you what accommodations are necessary and appropriate. All information and documentation is confidential.

Please report any errors to the Instructor.