Course Syllabus

Lab Session

Time: M 5:30-7:20
Location: FXB G11

Attendance is mandatory during problem set assignments.

Instructor Information

Curtis Huttenhower
Associate Professor of Computational Biology
Department of Biostatistics
617-432-4912
chuttenh@hsph.harvard.edu
Office Hours: M 10:00-11:00, SPH1 413

Eric Franzosa
Research Scientist
Department of Biostatistics
franzosa@hsph.harvard.edu
Office Hours: M 10:00-11:00, SPH1 412

Teaching Assistant

Xue Zou
xuz943@mail.harvard.edu
Office Hours: W 12:30-1:30, CLSB 11021

Genomic Data Manipulation will present a practical introduction to the tools and techniques needed to obtain, analyze, and interpret a variety of modern genome‐scale data types. The course will provide a brief overview of Python programming and statistical methods for high‐dimensional data analysis, geared toward biological investigators interpreting their own data or integrating it with results from public repositories. We will discuss the types of experimental results commonly encountered in genomic data analysis (protein‐protein interaction networks, gene expression, high‐throughput sequencing, etc.) and freely available online sources for these data. The course will include several weeks of seminar‐format discussions on current research in genomic data analysis and conclude with a final project of your choice analyzing real‐world experimental data. Undergraduate‐level statistical expertise is strongly recommended, although no prior programming experience is assumed. Previously listed as BIO508.

Texts and Reading Materials 

Required: Bioinformatics and Functional Genomics, Pevsner (3rd edition)

Required: Practical Computing for Biologists, Haddock and Dunn

Recommended: Principles of Biostatistics, Pagano and Gauvreau (2nd edition)

Grading Criteria

  • 10% participation
  • 50% problem sets (6x)
  • 15% presentations
  • 25% final project
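
As an illustration of how these weights combine, here is a minimal Python sketch. The weights and the count of six problem sets come from the list above; the equal weighting of problem sets, the function name, and the example scores are assumptions made for illustration only, not official grading policy.

```python
# Weights from the grading criteria above, as percent of the final grade.
WEIGHTS = {
    "participation": 10,
    "problem_sets": 50,
    "presentations": 15,
    "final_project": 25,
}
NUM_PROBLEM_SETS = 6  # "6x" in the grading criteria


def final_grade(scores):
    """Combine per-component scores (each 0-100) into a weighted percentage."""
    # The component weights must account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Assumption: all problem sets count equally toward the 50%.
per_problem_set = WEIGHTS["problem_sets"] / NUM_PROBLEM_SETS

# Hypothetical component scores, not real data.
example = {
    "participation": 100,
    "problem_sets": 90,
    "presentations": 80,
    "final_project": 85,
}
print(f"Each problem set: {per_problem_set:.2f}% of the final grade")
print(f"Example final grade: {final_grade(example)}")
```

Under the equal-weighting assumption, each problem set contributes about 8.33% of the final grade, so the problem sets together outweigh the final project.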

Course Objectives

At the end of the course, the student will be able to:

  • Process and manipulate an array of 'omics data types (sequence, expression, structure, etc.) and understand how they may be generated or obtained from public repositories.
  • Apply basic computational and statistical tools to analyze these data types.
  • Demonstrate the use of general computational tools (incl. Python) for manipulating genomic/experimental data.
  • Critically analyze broad areas of current research in quantitative biology.

Outcome Measures

The following assignments measure students’ competence in the course objectives above:

  • Biweekly problem assignments through the first 3/4 of the course, focusing on performing a few specific manipulations of real 'omics datasets.
  • Paper presentations at the course midpoint demonstrating a grasp of the issues arising in current 'omics publications.
  • A final project of the student's choice, requiring substantial practical manipulation of one or more 'omics datasets using a range of tools discussed in the course.

Additional Information

  1. Scientific computing and computational experiments in biology, including good practices, testing positive and negative controls, and appropriate computational environments.
  2. Introduction to biological sequence analyses and to Python as a computing environment.
  3. Introduction to biological network analyses, continued Python practice.
  4. High-throughput sequencing technologies and data, quantitative biology in a command line environment.
  5. Genome sequencing, assembly, and annotation.
  6. File and data manipulation in Python, initial "journal club" research presentations.
  7. "Journal club" research presentations.
  8. Metagenomics and introduction to statistics for molecular biology.
  9. Regular expressions, overview of descriptive statistics and inference for quantitative biology.
  10. Gene expression data acquisition, normalization, clustering, enrichment testing, and interpretation.
  11. Transcriptional regulatory module discovery and analysis, epigenetics.
  12. Comparative genomics, molecular evolution, phylogenetics, and proteomics.
  13. Synthetic genetic screens, interaction networks, and epistasis; initial final project presentations.
  14. Final project presentations.

Course Evaluations

Completion of the evaluation is a requirement for each course. Your grade will not be available until you submit the evaluation. In addition, registration for future terms will be blocked until you have completed evaluations for courses in prior terms.