Course Syllabus

What is this Course?

The aim of this course is to help students develop skills for computational research with a focus on stochastic approaches, emphasizing implementation and examples. Stochastic methods make it feasible to tackle very diverse problems when the solution space is too large to explore systematically, or when the microscopic rules of a complex system are known but its macroscopic behavior is not. Methods are illustrated with examples from a wide variety of fields, such as demography, health care, and finance. Topics include stochastic optimization methods such as stochastic gradient descent (SGD) and simulated annealing, Bayesian data analysis, Markov chain Monte Carlo (MCMC), and variational inference. In this course we also study the broader social impact of statistical models and algorithms when deployed in real-life downstream applications. While the technical content of this course connects theory with implementation/engineering, students are also required to connect the technical materials to downstream tasks, especially focusing on assessing potential negative real-life impacts.

Is this course right for me?

Students are expected to be proficient in Python programming and have working familiarity with linear algebra, multivariable calculus and calculus-based statistics. It is also highly recommended that students complete an introductory sequence in data science before enrolling (for example, in Homework 0 students are asked to analytically derive various distributions associated with Bayesian multi-linear regression models).

The breadth of materials covered and the mathematical maturity required by the materials make AM207 a very time-consuming course. In the past, students have reported spending 15 to 20+ hours per week outside of class on class-related activities (assignments, projects and readings).

Students who enroll in AM207 have traditionally been diverse in background. It is entirely possible to succeed in the course with gaps in your technical preparation! However, be aware that this course is demanding in both time and attention; students who lack the suggested preparation typically invest many extra hours in order to pick up skills on the fly. Students considering AM207 are encouraged to gauge their preparedness as well as their ability to commit sufficient time to this course.

You can complete the skills check exercise below to assess your preparedness:

AM207_skillcheck.ipynb 

Skill_check_Data.txt 

Learning Outcomes

After successful completion of this course, you will be able to:

  1. Build basic Bayesian and non-Bayesian statistical models for continuous, ordinal, categorical and sequential data
  2. Learn point estimates of model parameters using stochastic optimization methods
  3. Perform inference on models using sampling methods as well as variational inference approaches
  4. Evaluate the effectiveness of your inference methods
  5. Evaluate the usefulness/appropriateness of your models
  6. Implement inference methods from scratch in Python
  7. Build statistical models and perform inference using Python libraries
  8. Think broadly and critically about the entire modeling pipeline: from data collection to assessing broader downstream impact

General Information:

This course follows a flipped classroom structure. The lectures are pre-recorded and made available at the beginning of each week. Students are expected to watch the relevant lecture videos and study the lecture materials before the class meeting. For each video there is a concept quiz for you to check your understanding; the quizzes will be graded holistically.

For each lecture, there are two alternate time slots: TTh 09:45 AM - 11:00 AM and TTh 2:15 PM - 3:30 PM. FAS students should register for only one time slot. Extension students should only attend one time slot. 

Note that the fall DCE policy is that Extension students are not allowed on campus due to vaccination restrictions. Extension students are welcome to attend class meetings and office hours via Zoom.

Each class meeting will consist of 1) a discussion portion where students discuss the materials that they have studied, and 2) a practical exercise portion where students work in small teams on a coding or qualitative analysis exercise applying the concepts from lecture/readings to a small example. Students are expected to actively participate in both the class discussion and the practical exercise.

You will not be able to complete the exercise if you do not study the lecture videos and materials before class!

The in-class practical exercises will be collected at the end of each class and graded for effort. 

There will be 9 weekly individual assignments and a team project. All assignments (including the project) will emphasize both the mastery of theoretical concepts and Python implementation. There will be a Canvas website for this course; assignments, lecture notes and all course-related information/announcements will be posted online. Regular class attendance and participation are essential for this course and are expected of all FAS students.

Extension students who attend live class meetings via Zoom have the option of completing in-class exercises with a group. Otherwise they are not expected to submit in-class exercises.

Schedule:

Course schedule

Course Materials:

We recommend that you get the textbook Bayesian Data Analysis by Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin (3rd Edition). We will be using this text as a reference for statistical modeling but the course will not follow the content or structure of the text faithfully! In addition to readings from the textbook, relevant reference papers will be recommended for topics. To complete the assignments, you must either install (on your own machine) Jupyter Notebook with Python 3.7 or familiarize yourself with Deepnote.

Grading: 

Activities                Percentage of Final Grade
Concept Quizzes           5%
Homework #0-8             60%
In-class Participation    10% (optional for Extension students taking the class on-demand)
Project                   25%

Your lowest homework grade will be down-weighted by half. 
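
As a rough illustration of how the percentages above combine, here is a minimal sketch in Python. It is not the official grade calculation: it assumes every component is scored on a 0 to 100 scale, and it interprets "down-weighted by half" as giving the lowest homework a weight of 0.5 in the homework average while all other homework has weight 1. The function names `homework_average` and `final_grade` are hypothetical.

```python
def homework_average(scores):
    """Weighted mean of homework scores, with the lowest score given half weight.

    ASSUMPTION: "down-weighted by half" means weight 0.5 for the single
    lowest score and weight 1.0 for every other score.
    """
    if not scores:
        raise ValueError("need at least one homework score")
    # Index of the first occurrence of the lowest score
    lowest_index = min(range(len(scores)), key=scores.__getitem__)
    weights = [0.5 if i == lowest_index else 1.0 for i in range(len(scores))]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def final_grade(quizzes, homework_scores, participation, project):
    """Combine components (each assumed to be on a 0-100 scale) using the syllabus weights."""
    return (
        0.05 * quizzes
        + 0.60 * homework_average(homework_scores)
        + 0.10 * participation
        + 0.25 * project
    )


if __name__ == "__main__":
    # Hypothetical scores for Homework #0-8; one weak homework counts half as much.
    hw = [92, 88, 95, 60, 90, 85, 91, 94, 89]
    print(round(final_grade(quizzes=90, homework_scores=hw, participation=100, project=88), 2))
```

Under this interpretation, a single weak homework pulls the homework average down less than it would in an unweighted mean, but it is not dropped entirely.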

Homework:

Homework will be assigned weekly. You are welcome to seek help on the individual homework assignments from other students, your TFs and your instructor. While collaboration is encouraged, copying is strictly forbidden. Submissions that are highly similar will be flagged and all such submissions may be returned ungraded.

Late submission policy: Each student is allowed 3 late days over the semester, to be applied to any one or two homework assignments. Outside of these allotted late days, late homework will not be accepted. Homework, like all assignments in this class, will be graded for correctness as well as clarity of exposition and presentation (a “right” answer that is given without an explanation, or that is presented in a difficult-to-follow format, will receive no credit).

Project:

During the semester, you will work on a project reading, understanding and implementing a model or inference method from a staff-approved research paper. The deliverable is a Jupyter notebook tutorial containing a summary of the main ideas of the paper (with concrete pedagogical examples) and code implementing the main methods of the paper. You must work in a team of 2 to 4 people.

Extension students are welcome to form project teams with Extension or FAS students. 

 

Expectations and Policies:

Respect for Diversity: It is the mission of the teaching staff that students from all diverse backgrounds and perspectives be well served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. We aim to create a learning environment that is inclusive and respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions for how to better our classroom community are always encouraged and appreciated.

As a large part of this course requires students to work in groups, in alignment with our teaching mission, we ask that students explicitly reflect on and implement practices for building teams that are diverse along many axes. Students who enroll in AM207 traditionally come from a wide range of technical, cultural and other demographic backgrounds; we hope that each student group can benefit from these diverse perspectives and experiences. The teaching staff is happy to help you brainstorm how to create an inclusive and productive working culture for your team.

 

Accessibility: The Extension School is committed to providing an accessible academic community. The Accessibility Office offers a variety of accommodations and services to students with documented disabilities. Please visit

https://www.extension.harvard.edu/resources-policies/resources/disability-services-accessibility

for more information.


Academic Integrity/Honesty: You are responsible for understanding Harvard Extension School policies on academic integrity,

https://www.extension.harvard.edu/resources-policies/student-conduct/academic-integrity,

and how to use sources responsibly. Not knowing the rules, misunderstanding the rules, running out of time, submitting the wrong draft, or being overwhelmed with multiple demands are not acceptable excuses. There are no excuses for failure to uphold academic integrity. To support your learning about academic citation rules, please visit the Harvard Extension School Tips to Avoid Plagiarism,

https://www.extension.harvard.edu/resources-policies/resources/tips-avoid-plagiarism,

where you’ll find links to the Harvard Guide to Using Sources and two free online 15-minute tutorials to test your knowledge of academic citation policy. The tutorials are anonymous open-learning tools.

 

Help for the Course:

Office Hours & Office Hour Policy:

There will be two weekly virtual instructor office hours for the course. Please feel free to take full advantage of my office hours. If you wish to meet with me outside of office hours, please contact me via email or speak to me in person.

In addition to instructor office hours, there will be at least one virtual TF office hour each day, Friday through Wednesday.

Overall, there will be at least two virtual office hours per day (except on Thursdays).

The office hours are themed, focusing on different aspects of the homework completion process: earlier office hours are devoted to understanding background concepts and setting up problems, and office hours closer to the due date are devoted to interpretation and broader impact analysis:

1. Friday OHs: focus on understanding materials from the week

2. Saturday & Sunday OHs: focus on background concepts and homework problem setup

3. Monday & Tuesday OHs: focus on trouble-shooting and interpretation

4. Wednesday OHs: focus on interpretation and broader impact analysis

Questions that are not within the scope of the focus of the OH will be given lower priority and will be answered only as time allows (e.g. questions about how to set up homework problems will be given low priority during Wednesday office hours).

To maximize the benefit you get from office hours, we are requiring students to submit their questions (anonymously if you so wish) prior to each office hour. The staff can then structure their answers in the most productive and pedagogical way that they can see. For this reason, office hours are not drop-in. Students should arrive on time at the beginning of each office hour; drop-in questions and questions that were not previously submitted on Piazza will be given lower priority and answered only as time allows.

Piazza & Piazza Policy:

There will be a course Piazza, where students are encouraged to discuss their questions and ideas about the course material. Discussions will be moderated by the teaching staff, but the staff is not in charge of answering Piazza questions! To get your questions answered by the teaching staff you must attend office hours or make a separate appointment.

Email Policy:

The best way to get help directly from the teaching staff is through office hours and individual appointments. Due to the large size of the class, we ask that students not email staff with content questions or specific grading questions, that is, technical questions regarding class materials, homework assignments or projects, or questions about grading details (e.g. "what does lambda mean in Question 2 of Problem 1?", "why did I get 0.5 points deducted here?"). I welcome urgent questions about class policy, grading and logistics (e.g. absences, catastrophic tech failures, your TF has accidentally given you a zero for an assignment you submitted, etc.); these questions can be submitted directly via email to weiweipan@g.harvard.edu.

Grading Questions:

The grading scheme for every component of the course is formative, that is, your grade is not the result of adding up a bunch of little points that can be deducted for minor mistakes/omissions. Rather, you should treat your grade like a general signal beacon (what did you do well on, where can you improve). As such, grades are not appealable. In fact, it is not worth your time and generally not beneficial to your final course grade to argue for every point. If you feel, however, that a significant grading mistake has been made on an assignment, you can request that the grading TF reevaluate your work. Such requests should be sent directly to weiweipan@g.harvard.edu.

 
