CS 109A, STAT 121A, AC 209A
Introduction to Data Science
Pavlos Protopapas, Kevin A. Rader, Rahul Dave and Margo Levine
Git Repository for Lecture and Lab Material: https://github.com/cs109/a-2017
Class Time: Mon and Wed 1:00‐2:30 pm in Harvard's Northwest Building (NW), B-103
Labs: Thur 4:00-5:30 pm and Fri 10:00-11:30 am in Northwest (NW) Basement Lobby (content is identical, students may only attend one).
For Instructor Office Hours, TF Office Hours, and Sections see Home Page in Canvas.
Welcome to CS109a/STAT121a/AC209a, also offered by the DCE as CSCI E-109A, Introduction to Data Science. This course is the first half of a one‐year introduction to data science. The course focuses on the analysis of messy, real-life data to make predictions using statistical and machine learning methods.
The material of the course is divided into 3 modules. Each module will integrate the five key facets of an investigation using data:
- data collection ‐ data wrangling, cleaning, and sampling to get a suitable data set;
- data management ‐ accessing data quickly and reliably;
- exploratory data analysis – generating hypotheses and building intuition;
- prediction or statistical learning; and
- communication – summarizing results through visualization, stories, and interpretable summaries.
Only one of CS 109a, AC 209a, or Stat 121a can be taken for credit. Students who have previously taken CS 109, AC 209, or Stat 121 cannot take CS 109a, AC 209a, or Stat 121a for credit.
You are expected to have programming experience at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended). HW0 is designed to test your knowledge of the prerequisites. Successful completion of this assignment will show that this course is suitable for you. HW0 will not be graded, but you are required to submit it.
The class consists of two weekly lectures and one lab, which is designed as a class activity. Attendance at lectures is mandatory. They are held Mon and Wed 1:00 pm ‐ 2:30 pm in Northwest Building (NW), Lecture Hall B-103. We will have in-class quizzes to assess your understanding of the material and to help us identify gaps.
Attendance at labs is optional but strongly encouraged. Labs are designed as hands-on in-class activities. The instructor will go over practice problems similar to the homework problems and review difficult material.
Two lab sessions with identical content are held Thur 4:00-5:30 pm and Fri 10:00-11:30 am in NW Basement Lobby. You should plan to attend one of the two.
Lectures and labs are supplemented by 1-hour sections led by teaching fellows. There are two types of sections:
- a) Standard Sections, which will be a mix of review of material and practice problems similar to the HW. All 3 sessions are identical. The first one is on 9/11.
- b) Advanced Sections, which will cover advanced topics such as the mathematical underpinnings of the methods seen in lecture and lab and extensions of those methods. The material covered in the Advanced Sections is required for all AC 209A students. For dates see the Course Calendar. We are offering 3 session times at the moment but will adjust depending on attendance. All 3 sessions are identical. The first one is on 9/20.
There will be an initial self-assessment homework called HW0 and 8 more graded homework assignments. Some of them will be due in a week and some of them in two weeks. You will be working in Jupyter Notebooks which you can run in your own environment or in the SEAS JupyterHub cloud (accessed from Canvas).
Quizzes will be taken at the end of class and the material will be based on what was discussed in lecture.
40% of the quizzes will be dropped from your grade.
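As a rough illustration of the 40% drop rule, the sketch below computes a quiz average after dropping scores. The syllabus does not say which quizzes are dropped or how the count is rounded, so dropping the lowest scores and rounding the count down are assumptions made only for this example, and `quiz_average` is a hypothetical helper name.

```python
DROP_PERCENT = 40  # share of quizzes dropped, as stated above


def quiz_average(scores):
    """Average quiz score after dropping 40% of the quizzes.

    ASSUMPTION: the lowest scores are the ones dropped, and the number
    of dropped quizzes is rounded down. Neither detail is in the syllabus.
    """
    if not scores:
        raise ValueError("no quiz scores given")
    n_drop = len(scores) * DROP_PERCENT // 100  # integer math avoids float rounding
    kept = sorted(scores)[n_drop:]              # discard the n_drop lowest scores
    return sum(kept) / len(kept)


# Five hypothetical quiz scores: the two lowest (1 and 2) are dropped.
print(quiz_average([1, 2, 3, 4, 5]))  # 4.0
```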
There will be one midterm (take-home) to be done individually (see Calendar for dates).
There will be a final group project (2-4 students) due during the Exams period. More details to come in November. See Calendar for specific dates.
Lectures will be recorded and made available in real time for DCE students and 24 hours later for on-campus students via Canvas.
Labs will also be recorded, but only for distance students.
An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani.
The book is available here:
Free electronic version: http://www-bcf.usc.edu/~gareth/ISL/
For questions about homework, course content, package installation, or JupyterHub, once you have tried to troubleshoot yourselves, the process to get help is:
- Post the question in Piazza and hopefully your peers will answer. Note that in Piazza questions are visible to everyone. The TFs monitor the posts but will respond no earlier than 24 hours from the posting time.
- Go to Office Hours; this is the best way to get help.
- For private matters send an email to the Helpline: firstname.lastname@example.org. The Helpline is monitored by all the TFs.
- For personal matters send an email to the instructors.
Questions on Graded Homework and Regrading Policy
We take great care in making sure all homework is graded properly. However, if you feel that your assignment was not fairly graded, you may:
- Contact the grader by emailing the helpline with subject line "Regrade HW1: Grader=johnSeitz" within 3 days.
- If still unhappy with the initial response, then submit a reason via email to the Helpline with subject line "Regrade HW1: Second request" within 3 days of receiving the initial response. Note: once regrading is done, you may receive a grade that is higher or lower than the initial grade.
Late Day Policy
You are allowed up to 6 days of late homework submissions, with a maximum of 2 days on any single assignment, no questions asked. No homework will be accepted more than 48 hours late. Solutions will be posted a week after the due date. Any other late homework submissions will not be accepted without a written note from UHS or your resident dean’s office. If you exceed your 6 late days, 1 point will be deducted for late days after that. Late minutes count as a whole day, e.g. if you submit 30 minutes late, this will count as 1 day.
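The counting rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the staff's actual bookkeeping: the function names and the deadline are hypothetical, and the sketch assumes the deduction is 1 point per late day beyond the 6-day allowance, which the policy text does not spell out.

```python
import math
from datetime import datetime, timedelta

TOTAL_LATE_DAYS = 6   # course-wide allowance stated in the policy
MAX_DAYS_PER_HW = 2   # per-assignment cap (48 hours)


def late_days(due, submitted):
    """Late days charged for one submission; any part of a day counts as a whole day."""
    delay = submitted - due
    if delay <= timedelta(0):
        return 0
    return math.ceil(delay / timedelta(days=1))


def tally(days_per_hw):
    """Return (free late days used, late days beyond the allowance).

    ASSUMPTION: each late day beyond the allowance costs 1 point.
    """
    for days in days_per_hw:
        if days > MAX_DAYS_PER_HW:
            raise ValueError("more than 48 hours late: not accepted")
    total = sum(days_per_hw)
    return min(total, TOTAL_LATE_DAYS), max(0, total - TOTAL_LATE_DAYS)


due = datetime(2017, 9, 20, 23, 59)                  # hypothetical deadline
print(late_days(due, due + timedelta(minutes=30)))   # 1
print(tally([1, 2, 2, 2]))                           # (6, 1)
```

In the second call, four assignments use 7 late days in total, so 6 are covered by the allowance and 1 falls outside it.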
Communication from Staff to Students
Class announcements and official communication from staff will be through Canvas. All homework and quizzes will be posted and submitted in Canvas.
MAKE SURE you have your settings set so you can receive emails from Canvas. No official communication or announcements will be done via Piazza.
Submitting an assignment
Your final score for the course will be computed using the following weights:
Homework will be graded based on 1) how correct your code is (the Notebook cells should run; we are not troubleshooting code), 2) how you have interpreted the results (we want text, not just code; it should be a report), and 3) how well you present the results. The scale is 1-5.
We will be using Jupyter Notebooks, Python 3 and various Python modules. You can access the notebook viewer either on your own machine by installing the Anaconda platform, which includes Jupyter/IPython as well as all packages that will be required for the course, or by using the SEAS JupyterHub from Canvas. Details in class.
Accommodations for students with disabilities
Students needing academic adjustments or accommodations because of a documented disability must present their Faculty Letter from the Accessible Education Office (AEO) and speak with Kevin by the end of the third week of the term: Friday, September 15. Failure to do so may result in us being unable to respond in a timely manner. All discussions will remain confidential.