STAT 100: Introduction to Statistics and Data Science


Course goals:

In this course, you will learn how to think critically with data by engaging in the entire data analysis process. By the end of the course, you will have improved your data acumen and your ability to think statistically. More concretely, you will be better able to accomplish the following tasks, which have been broken down by steps of the data analysis workflow:

Data analysis workflow: question formulation → data acquisition → data wrangling → exploration and visualization → modeling and inference → communicating findings

 

Question formulation:

  • Translate a research problem into a set of questions that can be answered with data.
  • Formulate data questions as measurable statements about parameters in a model.

 

Data acquisition:

  • Determine the necessary data to conduct analyses.
  • Reflect on how design structures and data collection impact potential conclusions.
  • Identify potential ethical concerns surrounding data collection and data privacy.

 

Data wrangling:

  • Explore datasets to determine what wrangling may be required (e.g., removing missing values, filtering out variables or observations, collapsing categories of a categorical variable).
  • Apply basic data wrangling operations.

 

Exploration and visualization:

  • Understand key principles of designing and creating effective data visualizations.
  • Master creating graphs and drawing sound conclusions from them.
  • Compute and interpret summary statistics.

 

Modeling and inference:

  • Understand and be able to explain key probabilistic and inferential concepts, such as sampling, variability, random variables, distributions, confidence, and significance.
  • Determine the correct model for a given problem and set of data.
  • Appropriately apply and draw inferences from a statistical model, including quantifying and interpreting the uncertainty in model estimates.
  • Consider the ethical implications of various modeling practices.

 

Communicating findings:

  • Develop a reproducible workflow using Quarto documents.
  • Interpret and communicate results of statistical analyses effectively for both a statistical and a non-statistical audience.
  • Reflect on the data involved in an analysis and show curiosity about other ways of examining and thinking about the data.

Course format:

This is a lecture-based course with required discussion sections.

Typical enrollees:

Students tend to take Stat 100 because they are looking for an introduction to data analysis. Variation is a key concept in Stat 100, and it can be observed in the wide variety of students who take the course. A recent Stat 100 class included students from across the class years (from first-years to graduate students) studying 78 different programs and concentrations, the most common being Economics, Government, Molecular & Cellular Biology, Neuroscience, and Social Studies.

When is the course typically offered?

This course is typically offered every semester.

What can students expect from you as an instructor?

They can expect a well-structured course with lots of resources for support. The Stat 100 teaching team includes a head instructor, a preceptor, and many teaching fellows. In addition to lecture and section, we offer both group and one-on-one office hours and weekly wrap-up sessions.

Assignments and grading:

Your grade will be based on your performance on the following key components of the course:

 

  • Weekly Quizzes: Each week you will take a short quiz, accessed on Canvas, related to the lecture material. 
    • The lowest quiz grade will be dropped.
    • No late quizzes will be accepted.
  • Problem Sets: Each week's problem set will be released on Wednesday and is due the following Tuesday by 5:00pm.
    • Don't wait until the day a problem set is due to work on it! Problem sets are designed to encourage consistent application and practice and are not structured to be completed in a single day.
    • Each problem set is equally weighted in the final grade.
    • To help with various circumstances (expected and unexpected), the lowest problem set grade will be dropped. Additionally, you have up to 4 extension days that you can use as you need (e.g., 1 additional day on each of 4 p-sets, or 4 additional days on 1 p-set). Extensions are rounded up to the nearest whole day (e.g., 2 extra hours = 1 extension day). If you need to use any extension days, message your Section TF the following information so that they can update Gradescope and our records:
      • Which problem set you need to apply an extension to.
      • How many days you want to use.
    • Once you have used up the 4 extension days, no further extensions will be granted except in the event of unexpected family circumstances or a long-term illness. In such a case, please email Prof. McConville and provide a note from your resident dean as documentation.
  • Exams: We will have a mid-term exam and a final exam.
    • Both exams will include an in-class component and an oral component.
  • Engagement:
    • In our class Slack workspace, you must submit at least 2 content-related posts by March 3rd. These posts could be questions about the material, answers to questions, and/or links to useful resources.
    • You must drop by office hours at least once. 
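
To make the problem set extension arithmetic above concrete, here is a minimal sketch in Python. It is an illustration only, not official course tooling. The function names are hypothetical, and it assumes that lateness is measured in hours past the deadline and that one extension day equals 24 hours.

```python
import math

# From the policy: up to 4 extension days across the whole semester.
EXTENSION_BUDGET_DAYS = 4


def extension_days_needed(hours_late: float) -> int:
    """Round lateness up to whole extension days (e.g., 2 extra hours = 1 day).

    Assumption: one extension day is 24 hours counted from the deadline.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def remaining_extension_days(hours_late_per_pset: list[float]) -> int:
    """Return the extension days left after the given late submissions.

    Raises ValueError if the submissions would exceed the 4-day budget.
    """
    used = sum(extension_days_needed(h) for h in hours_late_per_pset)
    if used > EXTENSION_BUDGET_DAYS:
        raise ValueError(f"{used} days requested, but only {EXTENSION_BUDGET_DAYS} are available")
    return EXTENSION_BUDGET_DAYS - used


# One p-set submitted 2 hours late and another 30 hours late.
print(extension_days_needed(2))             # 1
print(extension_days_needed(30))            # 2
print(remaining_extension_days([2, 30]))    # 1
```

In this example, submitting one p-set 2 hours late costs 1 extension day and another 30 hours late costs 2, which leaves 1 of the 4 days.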

Sample reading list:

Readings for the course will come from the following textbooks. Both are freely available online.

 

Enrollment cap, selection process, notification:

There is no enrollment cap for Fall 2024.

Past syllabus:

Here is the syllabus for the Spring 2024 offering of Stat 100.

Absence and late work policies:

Section attendance is required.  See the “Assignments and grading” section for the p-set late work policy.