Course Syllabus

Course GitHub: https://github.com/cs109/2018-cs109b. Lab material and Advanced Section material are there. We recommend cloning the repository on your local machine. Links to this content are also in Modules.
Course Calendar

Spring 2018

Instructors

Pavlos Protopapas (Computer Science), Mark Glickman (Statistics)

Welcome to Data Science 2 (DS2)! The course is listed as CS109b, STAT121b, and AC209b, and offered through the Harvard University Extension School as distance education course CSCI E-109b. 

The requirements for these four labelings of the course are the same, except that for students registered for AC209b there may be additional work.

What is this class about?

Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, basic Bayesian methods, nonlinear statistical models, unsupervised learning, and topic models. The final module will consist of multiple deep learning subjects such as CNNs, RNNs and Autoencoders. The major programming languages used will be R and Python.

Prerequisites

This course can only be taken after successful completion of CS 109a, AC 209a, Stat 121a, or CSCI E-109a. Students who have previously taken CS 109, AC 209, Stat 121, or CSCI E-109 cannot take this class for credit.

Recommended Textbooks

ISLR: An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (Springer: New York, 2013)


DL: Deep Learning by Goodfellow, Bengio and Courville.

Free electronic versions are available (ISLR, DL), as are hard copies through Amazon (ISLR, DL).

Course Components

Lectures

The class meets twice a week for lectures. Attending lectures is a crucial component of learning the material presented in this course. At the end of each lecture we will ask you to take a short graded quiz on the material presented in class. 

A live video feed of lectures, labs, and advanced sections will be available to distance education students only.

Recordings will be available to all other students within 24 hours.

Labs

Lectures are supplemented by weekly programming labs. Labs are an important aspect of the course, as we will supplement material from lectures with examples, discuss programming environments (e.g., R), and teach you important skills.

Labs will be live-streamed to distance education students.

Midterm

One midterm exam will cover material from lectures, assigned readings, labs, and homework assignments. If you do not keep up with the readings, attend lectures, and complete the homework and labs, you will be at a severe disadvantage during the midterm. The midterm will be a timed, open-book, take-home exam.

Project

Towards the middle of the course you will work on a project. The goal of the project is to carry out a complete end-to-end data science process, drawing on material from both semesters, while working in a team of 3-4 people. We will supply a small set of project choices. Teams may propose a different project with sufficient notice; such proposals are subject to approval by the course staff.

Students can form their own teams and use Piazza to find prospective team members. If you can’t find a partner, we will assign you to a team at random. We recognize that individual schedules, different time zones, preferences, and other constraints might limit your ability to work in a team. If this is the case, ask us for permission to work alone.

Students in 109b can be part of a team of 209b students. The entire team will be evaluated to 209b standards. Teams can be a mix of DCE and non-DCE students.

Reading Assignments

The course schedule may include readings in the course textbooks. The goal of the reading assignments is to prepare for class, to familiarize yourself with new terminology and definitions, and to determine which part of the subject needs more attention. The homework assignments may contain questions about these readings. When answering questions about the reading material please be brief and to the point!

Grading

This course can be taken for a letter grade only -- there is no pass/fail option. The course grade comprises:

  • Homework Assignments (45%)
  • One Midterm (20%)
  • Project (25%)
  • Quizzes (10%), of which you can drop 40%
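
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not the official grading script. The function names are hypothetical, every component is assumed to be scored on a 0-100 scale, the homework input is assumed to be your average homework score, and "drop 40%" is assumed to mean that the lowest 40% of your quiz scores (rounded down) are ignored.

```python
# Weights taken from the grading list in this syllabus.
WEIGHTS = {"homework": 0.45, "midterm": 0.20, "project": 0.25, "quizzes": 0.10}


def quiz_average(scores, drop_percent=40):
    """Average quiz score after ignoring the lowest-scoring quizzes.

    Assumption: "drop 40%" means the lowest 40% of quiz scores,
    rounded down to a whole number of quizzes, are not counted.
    """
    if not scores:
        return 0.0
    n_drop = (len(scores) * drop_percent) // 100  # integer count to drop
    kept = sorted(scores)[n_drop:]                # keep the highest scores
    return sum(kept) / len(kept)


def course_grade(homework, midterm, project, quiz_scores):
    """Weighted course score on a 0-100 scale (all inputs on 0-100)."""
    components = {
        "homework": homework,
        "midterm": midterm,
        "project": project,
        "quizzes": quiz_average(quiz_scores),
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())
```

For example, under these assumptions, `course_grade(90, 80, 85, [100, 0, 50, 80, 90])` drops the two lowest quizzes, averages the remaining three to 90, and combines the four components as 0.45·90 + 0.20·80 + 0.25·85 + 0.10·90.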

Any concerns about grading errors must be noted in writing and submitted to the Helpline within 3 days of receiving your grade.

Homework Grading

The homework provides an opportunity to learn advanced data science skills and to bolster your understanding of the material. See the homework as an opportunity to learn, and not to “earn points.” The homework will be graded to reflect this objective. You have the option to submit in pairs by making Groups in Canvas. More details are given in each assignment.

Project Peer Assessment

In the professional world, three important features affect your productivity and success: your own effort, the effort of people you depend on, and the way you work together. For this reason we have chosen a team-based approach that values all three of those features. During the team-based project you will provide an assessment of the contributions of the members of your team, including yourself. Your teammates’ assessment of your contributions and the accuracy of your self-assessment will be considered as part of your project grade.

Course Policies

No Late Policy

Homework assignments will be posted on the website on Wednesdays and will be due the following Wednesday (listed in the course schedule). No homework assignments or project milestones will be accepted for credit after the deadline. If you have a verifiable medical condition or other special circumstances that interfere with your coursework please let us know as soon as possible by sending an email to the Helpline.

Collaboration Policy

We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the honor code and our policies may result in serious penalties, up to and including automatic failure in the course and referral to the Ad Board.

The midterm must be completed entirely on your own, and may not be discussed with anybody else!

Homework may be completed in collaboration with at most one other student for part or all of the homework. If you are submitting the same homework, you are required to do so as a group. Homework may not be divided up: all collaborators must work on all problems. If you worked with someone but submitted different papers, please include your collaborator's name in the Comments (details to follow).

You are expected to be intellectually honest and give credit where credit is due. In particular:

  • you have to write your solutions entirely on your own or with your collaborator;
  • you should not view any written materials or code created by anyone else for the assignment;
  • you should list your collaborator;
  • you may not provide or make available solutions to individuals who take or may take this course in the future.

If the assignment allows it, you may use third-party libraries and example code, so long as the material is available to all students in the class and you give proper attribution. Do not remove any original copyright notices and headers.

Accessibility

Any student receiving accommodations through the Accessible Education Office should present their AEO letter to the Head TF as soon as possible. Failure to do so may prevent us from making appropriate arrangements.

Course Resources

Online Materials

All course materials, including handouts, slides, problem sets, and midterms, will be posted on Canvas.

Discussion Forum

We use Piazza as our discussion forum. All official announcements will be made via Canvas, so make sure you have your Canvas notifications turned on. Piazza should always be your first resource for seeking answers to your questions.

Getting Help

For questions about homework, course content, or package installation, once you have tried to troubleshoot on your own, the process to get help is:
1. Post the question in Piazza and hopefully your peers will answer. Note that in Piazza questions are visible to everyone. You can also post privately so that only the staff sees your message.
2. Go to TF Office Hours.
3. For admin issues and requests for regrades and extensions, send an email to the Helpline:
  • FAS student Helpline: cs109b2018@gmail.com
  • DCE student Helpline: cse109b2018@gmail.com
4. For personal matters send an email to either or both of the instructors.

Instructor Office Hours

Pavlos: Monday 3-4pm MD G109 

Mark: By appointment

Staff Office Hours

Most TF Office Hours will be available in person and concurrently online via Zoom for distance education students and local students. Office hour times and locations will be listed in Canvas. Office hours give you an opportunity to review and discuss course material and to get further guidance on your homework directly from your teaching fellow, with perhaps a handful of classmates present.

Credits

Some of the material in this course is based on other classes. We have also heavily drawn on materials and examples found online and tried our best to give credit by linking to the original source. Please contact us if you find materials where the credit is missing or that you would rather have removed.
