Instructors: Hanspeter Pfister (Computer Science), Mark Glickman (Statistics), Verena Kaynig-Fittkau (IACS)
Welcome to Data Science 2 (DS2)! The course is listed as CS109b, STAT121b, and AC209b, and offered through the Harvard University Extension School as distance education course CSCI E-109b. All lectures and labs will be recorded and the videos will be available for registered students on Canvas within 24 hours after meeting times.
The requirements for these four labelings of the course are the same, except that for students registered for AC209b, who will be receiving graduate-level credit, homeworks and the final project will be held to a higher standard.
What is this class about?
Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, interactive visualizations, basic Bayesian methods, nonlinear statistical models, unsupervised learning, and deep learning. The major programming languages used will be R and Python.
This course can only be taken after successful completion of CS 109a, AC 209a, Stat 121a, or CSCI E-109a. Students who have previously taken CS 109, AC 209, Stat 121, or CSCI E-109 cannot take this class for credit.
An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (Springer: New York, 2013)
The class meets twice a week for lectures. Attending lectures is a crucial component of learning the material presented in this course. At the end of each lecture we will ask you to fill out and submit a one-minute paper to collect feedback.
Distance students can view a video of each lecture within 24 hours. There will be no live video feed for distance students.
Lectures are supplemented by weekly optional programming labs. Labs are an important aspect of the course, as we will supplement material from lectures with examples, discuss programming environments (e.g., R), and teach you important skills. At the end of each lab we will ask you to fill out and submit a one-minute paper to collect feedback.
Distance students can view a video of each lab within 24 hours. There will be no online labs for distance students.
Two midterm exams will cover material from lectures, assigned readings, labs, and homework assignments. If you do not keep up with the readings, attend lectures, and complete the homework and labs, you will be at a severe disadvantage during the midterms. The midterms will be timed, open-book, take-home exams.
Towards the end of the course you will work on a month-long project in deep learning. The goal of the project is to train a deep learning network on a dataset that we will provide to you. You will train the network, assess its performance, debug and improve it, and communicate your results. Everybody will be working on the same project.
You will work closely with other classmates in a 3-4 person project team. You can form your own teams and use Piazza to find prospective team members. If you can’t find teammates, we will assign you to a team randomly. We recognize that individual schedules, different time zones, preferences, and other constraints might limit your ability to work in a team. If this is the case, ask us for permission to work alone.
The homework is going to provide an opportunity to learn advanced data science skills and to bolster your understanding of the material. See the homework as an opportunity to learn, and not to “earn points.” The homework will be graded to reflect this objective.
The course schedule includes required readings in the course textbook. The goal of the reading assignments is to prepare for class, to familiarize yourself with new terminology and definitions, and to determine which part of the subject needs more attention. The homework assignments may contain questions about these readings. When answering questions about the reading material please be brief and to the point!
This course can be taken for a letter grade only; there is no pass/fail option. The course grade comprises:
- Two Midterms (20% each)
- Homework Assignments (40%)
- Project (20%)
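As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale, which this syllabus does not specify; the function name and the example scores are hypothetical, and the mapping from a weighted score to a letter grade is not defined here.

```python
# Weights come from the grade breakdown above.
# The 0-100 score scale and all names below are assumptions for illustration.
WEIGHTS = {
    "midterm_1": 0.20,
    "midterm_2": 0.20,
    "homework": 0.40,
    "project": 0.20,
}


def weighted_course_score(scores):
    """Combine per-component scores (assumed 0-100) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example_scores = {"midterm_1": 85, "midterm_2": 90, "homework": 95, "project": 80}
print(weighted_course_score(example_scores))
```

Because homework carries twice the weight of any other single component, a change in the homework score moves the result twice as much as the same change in one midterm or in the project.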
There are several mandatory class meetings such as guest lectures, project demos, etc. Please check the schedule, plan accordingly, and do not miss these classes.
Any concerns about grading errors must be noted in writing and submitted to your TF within one week of receiving your grade.
Project Peer Assessment
In the professional world, three important features affect your productivity and success: your own effort, the effort of people you depend on, and the way you work together. For this reason we have chosen a team-based approach that values all three of those features. During the team-based project you will provide an assessment of the contributions of the members of your team, including yourself. Your teammates’ assessment of your contributions and the accuracy of your self-assessment will be considered as part of your project grade.
All assignment-related due dates are final. Homework assignments will be posted on the website on Mondays and will be due the following Monday (listed in the course schedule). No homework assignments or project milestones will be accepted for credit after the deadline. If you have a verifiable medical condition or other special circumstances that interfere with your coursework please let us know as soon as possible.
We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the honor code and our policies may result in serious penalties, up to and including automatic failure in the course and referral to the Ad Board.
Each midterm must be completed entirely on your own and may not be discussed with anybody else!
You may discuss your homework and labs with other people, but you are expected to be intellectually honest and give credit where credit is due. In particular:
- you have to write your solutions entirely on your own;
- you cannot share written materials or code with anyone else;
- you should not view any written materials or code created by anyone else for the assignment;
- you should list all your collaborators (everyone you discussed the assignment with) in your submission;
- you may not submit the same or similar work to this course that you have submitted or will submit to another; and
- you may not provide or make available solutions to individuals who take or may take this course in the future.
If the assignment allows it, you may use third-party libraries and example code, so long as the material is available to all students in the class and you give proper attribution. Do not remove any original copyright notices and headers.
Any student receiving accommodations through the Accessible Education Office should present their AEO letter to the Head TF as soon as possible. Failure to do so may prevent us from making appropriate arrangements.
All course materials, including handouts, slides, and midterms, will be posted on Canvas.
We use Piazza as our discussion forum and for all announcements, so it is important that you are signed up as soon as possible. Piazza should always be your first resource for seeking answers to your questions. You can also post privately so that only the staff sees your message.
The staff will hold weekly office hours, either in person or, for distance education students, via Zoom or Skype. Office hour times and locations will be listed in Canvas. Office hours give you an opportunity to review and discuss course material and to get further guidance on your homework directly from your teaching fellow, usually with only a handful of classmates present. Online students can also make special arrangements directly with their assigned teaching fellows to meet on Zoom or Skype.
Some of the material in this course is based on other classes. We have also heavily drawn on materials and examples found online and tried our best to give credit by linking to the original source. Please contact us if you find materials where the credit is missing or that you would rather have removed.