Course Syllabus

CS 222

CS 222 -- Algorithms at The Ends of the Wire

also CSCI E-210 (Extension School)

 

Welcome


Preliminary Syllabus for 2020


Instructor: Michael Mitzenmacher
E-mail: michaelm AT eecs.harvard.edu
Office: Maxwell Dworkin 331
Phone: 496-7172
Office Hours: TBD (subject to change, depending on conflicts). Or by appointment.

Teaching Assistant: TBD
E-mail: TBD
Office Hours: TBD

Syllabus: www.eecs.harvard.edu/~michaelm/CS222/syllabus.html

Handouts: www.eecs.harvard.edu/~michaelm/CS222/class.html

Objectives

This course is loosely based on two themes: connecting theory and practice, and how to deal with really big data (especially over networks). The topics change from offering to offering, and the below is subject to change. The course will consist of multiple independent units, covering the major themes of data sketches and streaming algorithms, information theory/coding algorithms, compression, and hashing-based algorithms generally.

Although the course will emphasize theoretical foundations, the course is meant to show the synthesis of theory and practice; we will often read pairs of papers, one from the "theory community" and one from the "systems community", on the same theme. The course is also meant to promote skills required of graduate students, such as criticial and creative reading and analysis of papers, and research.

The main work of the course will consist of the following: reading and analyzing a number of current and classic research papers; homework problems based on the material; participating in a mock program committee; and undertaking a final research project. (I have been told the workload of the class is reasonable but non-trivial. If you were looking for a graduate class with no work requirements, please look elsewhere.)

During the semester, you will frequently be reading two research papers to prepare for each class. This is more work than it sounds like! You must come to class prepared consistently; if your schedule will not permit that, you should not take the class.

Given the current non-physical nature of classes, we will be experimenting with different formats this year. There will be less emphasis on lectures and homework, and more empahsis on other activities, most notably the final project. The intention is to make recorded lectures from a previous year available.

As an example of how to deal with this format change, I am hoping to arrange several online "interviews" with colleagues who at the interface of theory and practice, to discuss how they see the connections between them and how they have worked to bring the areas together. I am hoping they will offer perspective on the benefits and challenges of this type of research. This may replace more in-class lecture discussion.

Another variation of format for the class is that we will run a "mock program committee". I will choose papers from the arxiv and from recent conferences thematically related to the course, and the class will act as a program committee to choose the best ones. This will give you an idea of how program committees work (or don't), and let you see some more up-to-date research, as class reading will be more focused on "classics". It seems a useful for exercise gaining an understanding how program committees function, which seems helpful to graduate students.

Finally, a major component of the class will be a final project, which you will work on throughout the entire course. The hope is that this project may form the foundation of either a research paper or, for undergraduates, a senior thesis. (In previous years some small number of projects do continue with additional work to become research papers. Expectations, however, are realistic; research is exploration, and this project is understood to be the beginning, not necessarily the end, of such an exploration.) Although you will need to obtain approval for your project choice, the topic of the final project will primarily be up to you. This project can either be theoretical or implementation based in nature. Generally people work in pairs for the final project, but you can work alone or, with permission, a group of three or larger. For graduate students, your project can be related to your main line of research; it should not, however, be something you were already working on.

Prerequisites

In my version of Harvard-speak, "prerequisite" means highly desirable background. You will decide if you are ready to take the course. Keep in mind that you will be responsible for reading (and being able to discuss and review) a large number of modern research papers, many of which will have some advanced mathematics (primarily probability related) included.

Students should have taken at least CS 124 or its equivalent. Students should be able to program in a standard programming language, at the level to independently produce working simulations of various processes. Knowledge of probability will be extremely helpful; if your probability background is weak you should expect to refresh your probability skills on your own. Generally, mathematics will be fundamental to the course, so you should expect to spend time learning some additional mathematics on your own if necessary. Similarly, some prior knowledge of networks and network issues will be very helpful. For students wishing to review important aspects of probability, there are many books available. Sheldon Ross has written several excellent introductory books which should be available in the library. My personal favorite is "Introduction to Probability Models." A more advanced book for those with more background is "Elements of Information Theory" by Cover and Thomas. Another good book is "Information Theory, Inference, and Learning Algorithms" by David Mackay, which has the benefit of being online: This link should work. Of course, my completely biased opinion is that the best book for a computer scientist to buy is by Mitzenmacher and Upfal, "Randomized Algorithms and Probabilistic Analysis." I'd recommend students with less background in probability get one (or more) of these books as a reference.

Assessment

Your performance will be measured in five ways. (The percentage contributions to your grade given below are approximate and subject to change.)

  • Problem sets (10%): There will be 2-3 short problem sets. Generally they are meant to ensure that introductory material and the major ideas are being absorbed. They will generally be due one to two weeks after they are given out. These sets will involve both mathematical and/or theoretical work and implementation work. These assignments are governed by the collaboration policy, given below.

     

  • Paper summaries (15%): You will also have to regularly turn in paper summaries in a form to be discussed for papers that we read during the semester. This year, I hope to use a discussion board format, where people will post their summaries and then comment on other summaries or build up larger discussion. The point of the summaries is really to ensure that you gain some understanding of the paper and concepts. You will be allowed to skip two (or possibly more, we'll see how the new format works) summaries of your choice during the course of the semester. Summaries will be due before the class in which the paper is discussed.

    Summaries are to be approximately 250-350 words (about 1 page). Longer is not better. Again, for summaries this year, we will be doing something different than previous years -- we will be putting them in a discussion board, and you will be expected to read some other summaries and use them as a guide to understanding the papers. These will serve as a basis for in-class discussion, which I expect to be done more in zoom subgroups among students than in a lecture-style format since we will not be physically together in lectures.

     

  • Class participation (5%): You will be expected to discuss readings, and solve problems based on the reading. At times you will have to work together in groups in class to answer questions posed.

     

  • PC participation (20%): You will be expected to write well-thought out reviews for papers you are assigned as part of the mock PC, and engage in discussions to choose the final selection of papers from the PC.

     

  • Final Project (50%): The final project will be your major output in the course. The goal of the final project is to develop a full understanding of an important open research area, and, to the extent possible, to work on an open research problem. The final project will include a major final paper.

     

At this time there is no intention to hold a final or midterm exam.

All assignments will be due at the beginning of class on the appropriate day. Late assignments are not acceptable without the prior consent of the instructor. Consent will generally be given for suitable reasons. Being busy in other classes is generally not a suitable reason.

 

Collaboration policy

I would like to emphasize the rules on working with others on homework assignments. For problem sets, limited collaboration in planning and thinking through solutions to homework problems is allowed, but no collaboration is allowed in writing up solutions. You are allowed to work with other students currently taking the class in discussing, brainstorming, and verbally walking through solutions to homework problems. But when you are through talking, you must write up your solutions independently and may not check them against each other. There may be no passing of homework papers between collaborators; nor is it permissible for one person simply to tell another the answer. Paper summaries are meant to be done almost entirely on your own. Brief discussions with other students after you have all read the paper are permissible, but the paper summary should be entirely your own work.

If you collaborate with other students in the course in the planning and design of solutions to homework problems, then you should give their names on your homework papers.

Under no circumstances may you use solution sets to problems that may have been distributed by the course in past years, or the homework papers of students who have taken the course past years. Nor should you look up solution sets from other similar courses.

Violation of these rules may be grounds for giving no credit for a homework paper and also for serious disciplinary action.

 

Required Text

Generally papers will be made available either in class or at the class Web site. No text is required.

 

Equipment Requirements

I've been told to put on the syllabus that ipads with stylus or related devices are required for homeworks, office hours, class discussions etc.  So I am doing so!  This will allow you to borrow equipment from Harvard -- the contact number for this is (617) 495-7777.  Of course if you don't manage to borrow one we will find ways to deal but it will certainly be a lot better if you do -- it's a lot easier to work on problems with peers (or with me) with a device.  

 

Class Information/Notes

Class notes, homework assignments, and other information will be made available on the Web. For access go to the class web site. We will also use Piazza to disseminate further information. In many cases, the class web site may be the only location where information is posted or available, so look in from time to time!

 

Incomplete List of Possible Topics

The following is an incomplete list of topics based on previous course offerings. Again, the topics may change somewhat for the coming year.

 

  • Summarization and sketching algorithms
    • Data sketches
    • Bloom filters and variants
    • Data streams and streaming algorithms
    • Machine-learning predictions and sketching
  • Information retrieval and the Web
    • Ranking documents: PageRank, Kleinberg's algorithm
    • Other uses of link information
    • Link prediction
  • Compression and Coding
    • Basics of information theory
    • Huffman compression (non-adaptive, adaptive)
    • Arithmetic coding
    • Lempel-Ziv compression and its variants
    • Burrows-Wheeler compression
    • JPEG/MPEG/Audio
    • Erasure and error-correcting codes
    • Codes in network transmission and data storage

 

Course Summary:

Date Details Due