BST 281: Genomic Data Manipulation

Lab Session

Time: R 3:45-5:15
Location: FXB G03

The lab session is optional and focuses on problem sets and refreshers on technical material.


Instructor Information

Curtis Huttenhower
Professor of Computational Biology
Department of Biostatistics
Office Hours: F 11:00-12:00, SPH1 413

Eric Franzosa
Research Scientist
Department of Biostatistics
Office Hours: F 11:00-12:00, SPH1 413


Teaching Assistant

Mike MacArthur
Office Hours: F 9:00-10:00, FXB second floor atrium


Genomic Data Manipulation will present a practical introduction to the tools and techniques needed to obtain, analyze, and interpret a variety of modern genome-scale data types. The course assumes basic prior familiarity with Python programming and command-line environments, which we will use throughout to apply statistical methods for molecular data analysis. The material is geared toward biological investigators interpreting their own data or integrating it with results from public repositories. We will discuss the types of experimental results commonly encountered in genomic data analysis (high-throughput sequencing, gene expression, protein-protein interaction networks, etc.) and freely available online sources for these data. The course will include several weeks of seminar-format discussions of current research in genomic data analysis and a midterm in which students present published research in journal club format, and it will conclude with a final project of your choice analyzing real-world experimental data. Undergraduate-level statistical expertise and applied computing skills are strongly recommended.


Texts and Reading Materials 

Required: Introduction to Genomics, Lesk (3rd edition)

Required: Bioinformatics and Functional Genomics, Pevsner (3rd edition)

Recommended: Principles of Biostatistics, Pagano and Gauvreau (2nd edition)


Grading Criteria

  • 10% participation
  • 50% problem sets (6x)
  • 15% presentations
  • 25% final project


Course Objectives

At the end of the course, the student will be able to:

  • Process and manipulate an array of 'omics data types (sequence, expression, structure, etc.) and understand how they may be generated or obtained from public repositories.
  • Apply basic computational and statistical tools to analyze these data types.
  • Demonstrate the use of general computational tools (incl. Python) for manipulating genomic/experimental data.
  • Critically analyze broad areas of current research in quantitative biology.


Outcome Measures

Students' competence in the course objectives above will be assessed through the following assignments:

  • Biweekly problem sets through the first three quarters of the course, each requiring a few specific manipulations of real 'omics datasets.
  • Paper presentations at the course midpoint demonstrating a grasp of the issues arising in current 'omics publications.
  • A final project of the student's choice, requiring substantial practical manipulation of one or more 'omics datasets using a range of tools discussed in the course.


Course Topics

  1. Scientific computing and computational experiments in biology, including good practices, testing with positive and negative controls, and appropriate computational environments.
  2. Biological sequence analysis, including genome assembly and annotation.
  3. Meta'omics and microbial community sequencing.
  4. Comparative genomics, molecular evolution, and phylogenetics.
  5. Biological networks and protein-protein interactions.
  6. Genetic screens and epigenetic interactions; first "journal club" research midterm presentations.
  7. "Journal club" research midterm presentations.
  8. Basic descriptive and inferential statistics for molecular biology.
  9. Genetic association testing.
  10. Gene expression data acquisition, normalization, clustering, enrichment testing, and interpretation.
  11. Transcriptional regulatory module discovery and analysis, epigenetics.
  12. Proteomics and metabolomics, with a focus on mass spectrometry.
  13. Systems biology, biophysics, dynamical systems modeling, and high-dimensional data visualization.
  14. Final project presentations.


Course Evaluations

Completing the course evaluation is required for every course. Your grade will not be available until you submit the evaluation. In addition, registration for future terms will be blocked until you have completed evaluations for courses in prior terms.

This course content is offered under a CC Attribution license. Content in this course can be considered under this license unless otherwise noted.