Course Syllabus

Class will meet in Lyman 425, 10:30-11:45 am MWF.

Teaching staff

Instructor: Vinothan N. (Vinny) Manoharan
Office: https://harvard.zoom.us/my/vinny
Telephone: 617-495-3763
email: vnm@seas.harvard.edu
Office Hours: 3:00-4:00 pm on Mondays (unless otherwise posted), or by appointment

Teaching Fellow: Solomon Barkley
Office: Main class zoom meeting
Telephone:
email: barkley@g.harvard.edu
Office Hours: 12:00-1:00 pm on Fridays, or by appointment

Course aims

This is a course about data in physics experiments, and, in particular, what you do with it after you acquire it. That includes parsing it, visualizing it, and, most importantly, drawing conclusions from it. Most physics courses start from general physical laws (for example, Maxwell's equations) and derive specific predictions from them. That process is called deductive inference. But as a PhD student you are expected to contribute to the discovery of new physical laws. This course aims to teach you the techniques for reasoning from the data to determine the validity of a particular theory or model, or to determine the most likely value of a parameter (for example, the percentage of dark matter in the universe) for a given model. This process is called statistical inference. It is fundamentally different from deductive inference but just as important, and all experimentalists need to be familiar with it.
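As a small illustration of what "determining the most likely value of a parameter" can look like in practice, here is a minimal sketch in Python. It is not taken from the course materials. The counts are made up, the function names are hypothetical, and the choices of a Poisson model, a flat prior, and a simple grid of candidate rates are all assumptions made for the example.

```python
import math

def log_likelihood(rate, counts):
    """Log of the Poisson likelihood of the observed counts for a given rate."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)

def grid_posterior(counts, rates):
    """Normalized posterior probability on a grid of rates, assuming a flat prior."""
    log_post = [log_likelihood(r, counts) for r in rates]
    peak = max(log_post)  # subtract the peak before exponentiating to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up data: counts recorded in five equal time intervals (illustrative only).
counts = [3, 5, 4, 6, 2]

# Candidate rates from 0.1 to 10.0 counts per interval (an assumed grid).
rates = [0.1 * i for i in range(1, 101)]

posterior = grid_posterior(counts, rates)
best_rate = rates[posterior.index(max(posterior))]
print(f"Most probable rate on the grid: {best_rate:.1f} counts per interval")
```

With a flat prior, the most probable rate on the grid coincides with the maximum-likelihood estimate, which for a Poisson model is the sample mean of the counts.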

Doing statistical inference on modern data sets requires a computer and tools more powerful than a spreadsheet. This course therefore covers not only statistical methods, but also the methods of dealing with data on the computer—including loading, filtering, plotting, visualizing, and simulating it. We'll do everything in Python, because it is a general-purpose language, it is easy to learn, and it has powerful tools for data analysis. It's also free.

This course assumes nothing about your ability to program. We will start from the very basics and build up to advanced calculations.

Learning objectives

The main objective is to prepare you for research. By the end of the course, you should be both competent and confident in using the tools of statistical inference to analyze experimental results and derive conclusions from them. You should also be able to critically analyze published results that rely on either frequentist or Bayesian analysis. The frequentist viewpoint is favored in most particle physics experiments, while the Bayesian viewpoint is nowadays commonly used in biophysics and astrophysics. As a practicing physicist, you need to understand both.

Modern statistical inference relies heavily on computation. By the end of the course, you should be able to program proficiently in Python and follow good programming practices, including vector-based computation, modular code, and revision control. Through the final project, you'll become familiar with tools for collaborating on code, and you'll learn how to write well-documented code that can be easily shared with others.

Another objective is for you to become familiar with the types of data and data analysis used in other subfields. To this end, many of our classes will include discussions, so that you can learn from your classmates. Participation is therefore essential to your learning in this course.

Is this course for you?

If you are an experimental physicist who is familiar with the process of obtaining experimental data—including designing experiments to minimize systematic error, doing experiments, and estimating uncertainties—then yes, this course is for you.

If, however, you do not have any background in doing experiments, then the course is probably not for you. I would argue that one should first learn how to do experiments before learning how to analyze the data from them. All the fancy data analysis methods in the world won't help you if you cannot critically evaluate the methods used to obtain the data.

More specific advice: If you are

  • An experimentalist with a good background in numerical and computational techniques: You might find the early part of the course slow-going, in which case you might prefer to take a course such as ENG-SCI 255 or APMTH 207. Both of these courses deal with statistical inference, though in different contexts (not necessarily physics). Both also assume that students come in with programming experience. Another course with a statistical inference component (in a biological context) is MCB198/AM215.
  • A theorist: If you already have significant experimental experience, you'll find the course useful. Otherwise I would recommend that you take a laboratory course such as PHYSICS 191R or PHYSICS 247R first.
  • An undergraduate: As above, if you have experimental experience (in a research context), then yes. If you are not doing research, then no.
  • Interested in data science: Our approach is different from that of data science, in that we are generally testing mechanistic models or theories. Students interested in data science might want to take APCOMP 209.

Outline of topics

List subject to change:

  • Introduction to Bayesian and frequentist inference
  • Bayes' theorem and how to apply it
  • Bayesian parameter estimation and hypothesis testing
  • Frequentist parameter estimation and hypothesis testing
  • The maximum-entropy approach
  • Linear and nonlinear model fitting
  • Markov-chain Monte Carlo methods
  • Time-series analysis

If time permits, we might also discuss the following:

  • Causal inference
  • Hierarchical Bayesian models
  • Machine learning and physics

Textbooks

There are two textbooks. Both should be available at the COOP. Both can also be purchased as eBooks:

  1. Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support, by Phil Gregory (Cambridge University Press). See also errata for the paperback edition and errata for the original printing. Note also that you can get the eBook version through the Harvard Library (http://dx.doi.org.ezp-prod1.hul.harvard.edu/10.1017/CBO9780511791277).
  2. A Student's Guide To Python for Physical Modeling, by Jesse M. Kinder and Philip Nelson (Princeton University Press). Note: please use the 2018 edition, second printing (or more recent) and not the 2015 edition. See also the book webpage for errata, examples, and updates.

Other sources, which will be placed on reserve at Cabot library, include

  • Statistics for Nuclear and Particle Physicists, by Louis Lyons (Cambridge University Press)
  • Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, by R.J. Barlow (Wiley)
  • Statistical Data Analysis, by Glen Cowan (Oxford University Press)
  • Data Analysis: A Bayesian Tutorial, by D.S. Sivia with J. Skilling (Oxford University Press)
  • Causality: Models, Reasoning, and Inference, by Judea Pearl (2nd ed, Cambridge University Press); available online through the library

Also, the book Effective Computation in Physics, by Kathryn D. Huff and Anthony Scopatz, might prove helpful for learning more advanced Python programming techniques and coding practices. It is available online through the library.

Assignments and grading

Homeworks: Homeworks are assigned weekly and due on Wednesdays. These assignments will involve coding and inference. There will also be some short assignments that consist of brief presentations or peer reviews of code. Extensions on homeworks are at the discretion of the TF.

Participation: Attendance, participation, and discussion are essential to this course. Please note that class begins at 10:30 am sharp.

Project: During the second half of the course, you will do a final project involving the analysis of actual data (either obtained by you or available elsewhere). Ideally, the project is ambitious enough that it could eventually lead to a publication, but not so ambitious that it will take you more than a month to do it. The project will be structured so that you will get feedback at each step.

Homeworks will count for approximately 40% of your grade, participation 20%, and the final project 40% (values subject to change).

Section

I will usually lecture on Mondays and Wednesdays. Fridays are reserved for section, with a couple of exceptions for make-up lectures (these will be announced). Whereas the lecture component will give background and information on the analysis techniques, section will cover implementation. We will focus on good programming practices and learning to use the most recent Python tools. You should bring your computer to section.
