E-PSCI 102: Data Analysis and Statistical Inference in the Earth and Environmental Sciences

EPS/ESE 102: Data Analysis and Statistical Inference in the Earth and Environmental Sciences

Course description:

A practice- and application-oriented course covering statistical inference, hypothesis testing, regression, Monte Carlo methods, analysis of variance (ANOVA), time series analysis, and data filtering and visualization. The course also introduces machine learning (ML) methods and Bayesian analysis. It emphasizes hands-on learning using real data drawn from atmospheric and geophysical observations. Students will take measurements using smartphone sensors and provided instruments to reinforce the lecture material and to complete two projects. Coding will be conducted in R and Python.

Instructors:                 Roger R. Fu                Email:  rogerfu@fas.harvard.edu        Office hours:  3-4 pm Wednesdays (https://harvard.zoom.us/j/95221949006?pwd=ZjhMUkFWM0xOZTIxVXVrRkZBdkdRdz09)

                                    Steven C. Wofsy       Email:  steven_wofsy@harvard.edu    Office hours:  By appointment

Teaching Fellow:        Alec Brenner            Email:  alecbrenner@g.harvard.edu     Office hours:  TBD

Meetings:                   Wednesdays and Fridays, 1:30-2:45 PM.

Prerequisites:              Mathematics at the level of Math 21a,b is preferred, although students with single-variable calculus preparation are encouraged to contact the instructors.  No programming experience is required.

Grading:                      40% Problem sets

                                    35% Two data acquisition and analysis projects

                                    15% Short response questions (1 per class)

                                    10% Class participation

Late policy:                Late assignments incur a 50% penalty unless the instructor grants permission before the due date.

                                   No assignments accepted after two weeks.

Text:                          The Statistical Sleuth, by Ramsey and Schafer – available as an e-book.

                                   We anticipate access to the book through the Harvard College Library.

===================================================================================

Class Calendar (subject to minor changes)

Date

Content

RRF

SCW

1/27/21

Introduction and overview. Part 1. Frequentist conception of statistics. Part 2.

x

x

1/29/21

Statistical inference: Concepts of model, predictor, response, parameter, maximum likelihood estimate.

Lecture 1 short problem for 03 Feb. 2021

 

x

2/3/21

Linear systems: Deterministic and Stochastic (Markov Chains)

Lecture 2 short measurement and analysis (sound) for 05 Feb 2021

x

2/10/21

Hypothesis testing 1: t distribution, t-test

x

 

2/12/21

Hypothesis testing 2: Central limit theorem; Kolmogorov-Smirnov test

x

2/17/21

Regressions 1: Ordinary least squares.  

Explanation of option 1 for midterm project: magnetic field mapping. 

GPS walking measurement scripts

x

 

2/19/21

Regressions 2: Type II regressions (MLE and major axis). Roll out of mid-term project option 1: Magnetic field mapping and fitting

x

x

2/24/21

Regressions 2: Type II regressions (York fit). Roll out of mid-term project option 1: Magnetic field mapping and fitting

x

x

2/26/21

Regressions 3 and hypothesis testing with chi-squared distribution.  Roll out of mid-term project option 2: HazeL measurements of atmospheric particulates. 

x

x

3/3/21

Model selection, AIC/BIC, ANOVA, overfitting

 

x

3/5/21

Bootstrap resampling

x

 

3/10/21

Regularization

x

x

3/12/21

Markov Chain Monte Carlo: Example applications

Code: Integral  MCMC

 

x

3/17/21

Machine learning 1: Introduction to concepts and regression problem

x

 

3/19/21

Machine learning 2: Components of a CNN 1

x

3/24/21

Machine learning 3: Components of a CNN 2

x

3/26/21

Mid-term project presentations

4/2/21

Final Project roll out  

GHG and T data set. (intro)

x

x

4/7/21

Data conditioning: Filtering, smoothing, and interpolation:

Locally-weighted least squares (loess/lowess), penalized splines, Savitzky-Golay

Noisy signal data set     Next_Lecture_Problem    Savitsky_Golay coefficients 

Weekly data from Mauna Loa    Monthly data from Mauna Loa

 

x

4/9/21

Time series 1: Autocorrelation in data

x

4/14/21

Time series 2: Moving average, red shifted noise

x

4/16/21

Data conditioning 1: Denoising

x

4/21/21

Time Series Examples: paleoclimate, spectral analysis

x

 

4/23/21

Bayesian inference 1: Bayes' theorem, basic examples and intuition, Bayesian conception of parameters and probability

x

4/28/21

Bayesian inference 2: Choice of priors, limitations, and when Bayesian inference is most appropriate

x
