Course Syllabus

CS S109A

Introduction to Data Science

Syllabus - Summer 2018

Pavlos Protopapas and Kevin Rader 

Lectures: Northwest Science Building B108. Mondays & Wednesdays 12:00 PM - 03:00 PM.

Labs: Northwest Science Building B108. Fridays 12:00 PM - 03:00 PM.

Official Course GitHub Site

Welcome to S109A, Introduction to Data Science. This course is the first half of a one-year introduction to data science. The course focuses on the analysis of messy, real-life data to make predictions using statistical and machine learning methods.

The material of the course is divided into 3 modules. Each module will integrate the five key facets of an investigation using data:

  1. data collection - data wrangling, cleaning, and sampling to get a suitable data set;
  2. data management - accessing data quickly and reliably;
  3. exploratory data analysis - generating hypotheses and building intuition;
  4. prediction, statistical learning, and inference; and
  5. communication - summarizing results through visualization, stories, and interpretable summaries.

Students who have previously taken CS 109, AC 209, or Stat 121 cannot take CS S109A for credit.

Course Logistics

Prerequisites

You are expected to have programming experience at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended). HW0 is designed to test your knowledge of the prerequisites. Successful completion of this assignment will show that this course is suitable for you. HW0 will not be graded, but you are required to submit it.

Course Components

Lectures

The class consists of two weekly lectures and one lab, which is designed as a class activity. Lectures are held Mon and Wed 12-3pm in Northwest Science Building B108. A live-stream feed and a recorded version will also be available (recordings will be available within 24 hours). We will have quizzes after each lecture is released online to assess and challenge your understanding of the material and to help us identify gaps.

Labs

Attendance at labs is optional but strongly encouraged. Labs are designed as hands-on in-class activities. The instructor will go over practice problems similar to the homework problems and review difficult material. Labs will be held on Fri 12-3pm in Northwest Science Building B108.

Office Hours

On-campus OH will be held in the IACS lobby in Maxwell Dworkin, 33 Oxford Street, unless otherwise stated below. Online OH will be held via Zoom at: https://harvard-dce.zoom.us/j/7607382317

Pavlos: Mondays 4:30-6:00pm MD G-109 [on campus and online].

Kevin: Mondays 3-4:30pm (after class) [on campus and online].

Patrick: Mondays 6-7pm [on campus and online].

Brandon: Mondays 8-9pm [online].

Richard: Tuesdays 5-6pm [on campus and online].

Sol: Tuesdays 6-7pm [online].

Nick: Thursdays 9-10am [on campus and online].

Evan: Thursdays 10-11am [online].

David: Fridays 3-4pm MD G-111 (after lab) [on campus and online].

Joe: Sundays 10-11am [on campus and online].

Will: Sundays noon-1pm [on campus and online].

Assignments

There will be an initial self-assessment homework called HW0, followed by 6 graded weekly homework assignments. You will be working in Jupyter Notebooks, which you can run on your own computer. HW0 will be published on June 15.

Quizzes

Quizzes will be taken at the end of class, and the material will be based on what was discussed in lecture. 40% of the quizzes will be dropped from your grade.

Final Project

There will be a final group project (2-4 students) due Thursday, August 9. See the Project Guidelines for more details.

Recording

Lectures and labs will be live-streamed, and will be recorded and made available 24 hours later via Canvas.

 

Recommended Textbook

An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani.

There will be assigned readings from the text leading up to each lecture. The book is available from the following sources:

Free electronic version: http://www-bcf.usc.edu/~gareth/ISL/

HOLLIS: http://link.springer.com.ezp-prod1.hul.harvard.edu/book/10.1007%2F978-1-4614-7138-7

Amazon: https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370

Course Policies

Getting Help

For questions about homework, course content, package installation, or JupyterHub, first try to troubleshoot on your own. If you still need help, the process is:

  1. Post the question on Piazza, where your peers may be able to answer it. Note that questions on Piazza are visible to everyone.
  2. Go to Office Hours. This is the best way to get help.
  3. For private matters send an email to the Helpline: [cs109a2018summer@gmail.com]. The Helpline is monitored by all the teaching staff.
  4. For personal and private matters send an email to the instructors.

Questions on Graded Homework and Regrading Policy

We take great care to make sure all homework is graded properly. However, if you feel that your assignment was not fairly graded, you may:

  1. Contact the grader by emailing the Helpline with subject line "Regrade HW1: Grader=johnsmith" within 2 days.
  2. If you are still unhappy with the initial response, submit your reason via email to the Helpline with subject line "Regrade HW1: Second request" within 2 days of receiving the initial response. Note: once regrading is done, you may receive a grade that is higher or lower than the initial grade.

Late Day Policy

You are allowed up to 3 late days in total for homework submissions, with a maximum of 1 late day on any single assignment, no questions asked. Homework will not be accepted more than 24 hours after the due date, and solutions will be posted one day after the due date. If you exceed your 3 late days, 1 point (20%) will be deducted for each late day after that. Late minutes count as a whole day, e.g. if you submit 30 minutes late, this will count as 1 late day.
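As a rough illustration of how this policy plays out over a term, the following Python sketch walks through a sequence of submissions. The function name, the constants, and the reading that the 1-point deduction applies once per late submission beyond the 3-day allowance are assumptions made for illustration, not official course code.

```python
# Hypothetical sketch of the late day policy; names and the per-submission
# interpretation of the penalty are assumptions, not official course code.
FREE_LATE_DAYS = 3          # total late days allowed without penalty
MAX_MINUTES_LATE = 24 * 60  # submissions later than this are not accepted
PENALTY_POINTS = 1          # 1 point on the 1-5 homework scale


def late_day_outcomes(minutes_late):
    """Return one (status, penalty_points) tuple per homework, in order."""
    used = 0
    outcomes = []
    for minutes in minutes_late:
        if minutes <= 0:
            outcomes.append(("on time", 0))
        elif minutes > MAX_MINUTES_LATE:
            outcomes.append(("not accepted", 0))
        else:
            used += 1  # any lateness counts as one whole late day
            penalty = PENALTY_POINTS if used > FREE_LATE_DAYS else 0
            outcomes.append(("late", penalty))
    return outcomes


# Minutes late for six homework submissions, in order.
example = late_day_outcomes([0, 30, 600, 1440, 5, 2000])
```

In this example the second, third, and fourth submissions use up the three free late days, the fifth is late with a 1-point deduction, and the sixth is more than 24 hours late and is not accepted.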

Communication from Staff to Students

Class announcements and official communication from staff will be through Canvas. All homework and quizzes will be posted and submitted in Canvas.

MAKE SURE you have your settings set so you can receive emails from Canvas. No official communication or announcements will be done via Piazza.

Submitting an assignment

You are to complete all homework in a Jupyter Notebook. When you are done, convert your notebook to a PDF and submit both the .ipynb file and the .pdf file. You can submit multiple times up to the deadline.

You are encouraged but not required to submit in pairs. We will be using the Groups function in Canvas to do this; details will be announced later. One assignment will be completed individually without any collaboration with peers.

All assignments will be due on Tuesdays at 11:59pm in Canvas and will be posted one week in advance.

Collaboration Policy

We encourage you to talk about and discuss the assignments with your fellow students (and on Piazza), but you are not allowed to look at any other student's assignment or code outside of your pair. Discussion is encouraged; copying is not allowed.

Grading Guidelines

Homework will be graded based on 1) how correct your code is (the Notebook cells should run; we will not troubleshoot your code), 2) how you have interpreted the results (we want text, not just code; it should read as a report), and 3) how well you present the results. The scale is 1-5.

For more details, check out The CS109A Grade

Software

We will be using Jupyter Notebooks, Python 3, and various Python modules. You can access the notebook viewer either on your own machine by installing the Anaconda platform, which includes Jupyter/IPython as well as all packages that will be required for the course, or by using the SEAS Jupyter Hub from Canvas. Details will be given in class.

Grading Score

Your final score for the course will be computed using the following weights:

Component              Weight
Paired Homeworks       40%
Individual Homework    20%
Quizzes                15%
Project                25%
Total                  100%
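To make the weighting concrete, here is a minimal Python sketch of the weighted sum. It assumes each component has already been converted to a percentage between 0 and 100; that conversion, the function name, and the dictionary keys are illustrative assumptions rather than the official grade calculation.

```python
# Hypothetical sketch: weights come from the syllabus, everything else
# (names, 0-100 component scale) is an assumption for illustration.
WEIGHTS = {
    "paired_homework": 40,
    "individual_homework": 20,
    "quizzes": 15,
    "project": 25,
}


def final_score(component_scores):
    """Combine per-component percentages (0-100) into a final percentage."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100")
    weighted = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return weighted / 100


score = final_score(
    {
        "paired_homework": 90,
        "individual_homework": 80,
        "quizzes": 70,
        "project": 85,
    }
)
```

With these sample percentages the weighted sum is (40*90 + 20*80 + 15*70 + 25*85) / 100, so the project and paired homeworks dominate the result.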

 
