BST 281: Genomic Data Manipulation
Lab Session
Time: R 3:45-5:15
Location: FXB G03
Optional, with a focus on problem sets and refreshers on technical material
Instructor Information
Curtis Huttenhower
Professor of Computational Biology
Department of Biostatistics
617-432-4912
chuttenh@hsph.harvard.edu
Office Hours: F 11:00-12:00, SPH1 413
Eric Franzosa
Research Scientist
Department of Biostatistics
franzosa@hsph.harvard.edu
Office Hours: F 11:00-12:00, SPH1 413
Teaching Assistant
Mike MacArthur
macarthur@g.harvard.edu
Office Hours: F 9:00-10:00, FXB second floor atrium
Genomic Data Manipulation will present a practical introduction to the tools and techniques needed to obtain, analyze, and interpret a variety of modern genome‐scale data types. The course assumes basic prior familiarity with Python programming and command line environments, and we will use both to understand statistical methods for molecular data analysis, geared toward biological investigators interpreting their own data or integrating it with results from public repositories. We will discuss the types of experimental results commonly encountered in genomic data analysis (high‐throughput sequencing, gene expression, protein‐protein interaction networks, etc.) and freely available online sources for these data. The course will include several weeks of seminar‐format discussions on current research in genomic data analysis and a midterm presenting published research in journal club format, and it will conclude with a final project of your choice analyzing real‐world experimental data. Undergraduate‐level statistical expertise and applied computing skills are strongly recommended.
Texts and Reading Materials
Required: Introduction to Genomics, Lesk (3rd edition)
Required: Bioinformatics and Functional Genomics, Pevsner (3rd edition)
Recommended: Principles of Biostatistics, Pagano and Gauvreau (2nd edition)
Grading Criteria
- 10% participation
- 50% problem sets (6x)
- 15% presentations
- 25% final project
Course Objectives
At the end of the course, the student will be able to:
- Process and manipulate an array of 'omics data types (sequence, expression, structure, etc.) and understand how they may be generated or obtained from public repositories.
- Apply basic computational and statistical tools to analyze these data types.
- Demonstrate the use of general computational tools (incl. Python) for manipulating genomic/experimental data.
- Critically analyze broad areas of current research in quantitative biology.
Outcome Measures
Assignments to measure the students’ competence in the course objectives above:
- Biweekly problem assignments through the first 3/4 of the course, focusing on performing a few specific manipulations of real 'omics datasets.
- Paper presentations at the course midpoint demonstrating a grasp of the issues arising in current 'omics publications.
- A final project of the student's choice, requiring substantial practical manipulation of one or more 'omics datasets using a range of tools discussed in the course.
Additional Information
- Scientific computing and computational experiments in biology, including good practices, testing positive and negative controls, and appropriate computational environments.
- Biological sequence analysis, including genome assembly and annotation.
- Meta'omics and microbial community sequencing.
- Comparative genomics, molecular evolution, and phylogenetics.
- Biological networks and protein-protein interactions.
- Genetic screens and epistatic interactions, initial "journal club" research midterm presentations.
- "Journal club" research midterm presentations.
- Basic descriptive and inferential statistics for molecular biology.
- Genetic association testing.
- Gene expression data acquisition, normalization, clustering, enrichment testing, and interpretation.
- Transcriptional regulatory module discovery and analysis, epigenetics.
- Proteomics and metabolomics, with a focus on mass spectrometry.
- Systems biology, biophysics, dynamical systems modeling, and high-dimensional data visualization.
- Final project presentations.
Course Evaluations
Completion of the evaluation is a requirement for each course. Your grade will not be available until you submit the evaluation. In addition, registration for future terms will be blocked until you have completed evaluations for courses in prior terms.
Course Summary:
Date | Details | Due |
---|---|---|
Mon Jan 28, 2019 | Calendar Event Scientific Computing and Computational Experiments | 3:45pm to 5:15pm |
Wed Jan 30, 2019 | Calendar Event Biological Sequences: Concepts, Technologies and Data Sources | 3:45pm to 5:15pm |
Thu Jan 31, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon Feb 4, 2019 | Calendar Event Biological Sequences: Computational Methods for Alignment and Mapping | 3:45pm to 5:15pm |
Wed Feb 6, 2019 | Calendar Event Genomes: Assembly, Annotation and Algorithms | 3:45pm to 5:15pm |
Thu Feb 7, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon Feb 11, 2019 | Calendar Event Metagenomics: Prokaryotic 'Omics and Amplicon Sequencing Techniques | 3:45pm to 5:15pm |
Mon Feb 11, 2019 | Assignment Problems 01: Quantitative biology and sequence analysis | due by 11:59pm |
Wed Feb 13, 2019 | Calendar Event Metagenomics: Shotgun Metagenomics and Metatranscriptomics | 3:45pm to 5:15pm |
Thu Feb 14, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Wed Feb 20, 2019 | Calendar Event Comparative Genomics, Molecular Evolution | 3:45pm to 5:15pm |
Thu Feb 21, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Fri Feb 22, 2019 | Assignment Problems 02: Microbial 'omics | due by 11:59pm |
Mon Feb 25, 2019 | Calendar Event Biological Networks: Introduction, Graph Model, Cell Circuits and Network Biology | 3:45pm to 5:15pm |
Wed Feb 27, 2019 | Calendar Event Biological Networks: Protein-Protein Interactions | 3:45pm to 5:15pm |
Thu Feb 28, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon Mar 4, 2019 | Calendar Event Biological Networks: Genetic Screens and Epistatic Interactions | 3:45pm to 5:15pm |
Wed Mar 6, 2019 | Calendar Event Midterm: Journal Club Paper Presentations | 3:45pm to 5:15pm |
Thu Mar 7, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon Mar 11, 2019 | Calendar Event Midterm: Journal Club Paper Presentations | 3:45pm to 5:15pm |
Wed Mar 13, 2019 | Calendar Event Midterm: Journal Club Paper Presentations | 3:45pm to 5:15pm |
Thu Mar 14, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Fri Mar 15, 2019 | Assignment Midterm: Journal Club Presentation | due by 11:59pm |
Fri Mar 15, 2019 | Assignment Problems 03: Comparative genomics and biological networks | due by 11:59pm |
Mon Mar 25, 2019 | Calendar Event Quantitative Methods: Descriptive Statistics | 3:45pm to 5:15pm |
Wed Mar 27, 2019 | Calendar Event Quantitative Methods: Inference and Hypothesis Testing | 3:45pm to 5:15pm |
Thu Mar 28, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon Apr 1, 2019 | Calendar Event Quantitative Genetics: Introduction to Linkage and Association | 3:45pm to 5:15pm |
Wed Apr 3, 2019 | Calendar Event Quantitative Genetics: Genome-Wide and Family-Based Association Studies | 3:45pm to 5:15pm |
Thu Apr 4, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon Apr 8, 2019 | Calendar Event Gene Expression: Microarray and RNA-seq Data Acquisition and Normalization | 3:45pm to 5:15pm |
Mon Apr 8, 2019 | Quiz M11.1: Microarrays | due by 11:59pm |
Mon Apr 8, 2019 | Quiz M11.2: RNA-seq | due by 11:59pm |
Wed Apr 10, 2019 | Calendar Event Gene Expression: Clustering, Enrichment Analyses and Interpretation | 3:45pm to 5:15pm |
Thu Apr 11, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon Apr 15, 2019 | Calendar Event Transcriptional Regulation: Regulatory Modules, Binding Sites, microRNAs | 3:45pm to 5:15pm |
Wed Apr 17, 2019 | Calendar Event Epigenetics | 3:45pm to 5:15pm |
Thu Apr 18, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Fri Apr 19, 2019 | Assignment Problems 04: Quantitative methods, transcriptomics, and genetics | due by 11:59pm |
Mon Apr 22, 2019 | Calendar Event Proteomics | 3:45pm to 5:15pm |
Wed Apr 24, 2019 | Calendar Event Metabolomics | 3:45pm to 5:15pm |
Thu Apr 25, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon Apr 29, 2019 | Calendar Event Systems Biology: Biophysics and Dynamical Systems Modeling | 3:45pm to 5:15pm |
Wed May 1, 2019 | Calendar Event Scientific Data Visualization | 3:45pm to 5:15pm |
Thu May 2, 2019 | Calendar Event Lab | 3:45pm to 5:15pm |
Mon May 6, 2019 | Calendar Event Final: Project Presentations | 3:45pm to 5:15pm |
Wed May 8, 2019 | Calendar Event Final: Project Presentations | 3:45pm to 5:15pm |
Fri May 10, 2019 | Assignment Problems 05: Regulatory sequences, proteomics, metabolomics, and systems biology | due by 11:59pm |
Mon May 13, 2019 | Calendar Event Final: Project Presentations | 3:45pm to 5:15pm |
Wed May 15, 2019 | Calendar Event Final: Project Presentations | 3:45pm to 5:15pm |
Wed May 15, 2019 | Assignment Final Project: Group Data Packet | due by 11:59pm |
Wed May 15, 2019 | Assignment Final Project: Individual Write-up | due by 11:59pm |
Calendar Event Biological networks | Codecademy 3: conditionals and control flow | ||
Calendar Event Biological sequences: concepts and data | ||
Calendar Event Comparative genomics, molecular evolution | ||
Calendar Event Course overview | Python setup | ||
Calendar Event Epigenetics | IPython regulation and comparative genomics | ||
Calendar Event Gene expression: clustering, enrichment analyses, and interpretation | IPython transcriptional analysis | ||
Calendar Event Gene expression: microarray and RNA-seq data acquisition and normalization | IPython transcriptional analysis | ||
Calendar Event Genetic interaction analysis and networks | IPython protein and genetic networks | ||
Calendar Event High-throughput sequencing technologies and data | Codecademy 8-9: loops and practice | ||
Calendar Event Introduction to Python: environment, data types, operators, and functions | IPython for biological sequences | ||
Calendar Event Journal club paper presentations | ||
Calendar Event Journal club paper presentations | ||
Calendar Event Journal club paper presentations | ||
Calendar Event Metagenomics | ||
Calendar Event Programming for bioinformatics: file utilities and command environments | Codecademy: command line | ||
Calendar Event Programming for bioinformatics: modules, files, and data I/O | Codecademy 12: I/O | ||
Calendar Event Programming for bioinformatics: regular expressions | RegexOne | ||
Calendar Event Project presentations | ||
Calendar Event Project presentations | ||
Calendar Event Project presentations | ||
Calendar Event Project presentations | ||
Calendar Event Project questions and wrapup | ||
Calendar Event Protein structure, proteomics, and protein-protein interaction networks | IPython protein and genetic networks | ||
Calendar Event Python: control flow, defining functions and references | Codecademy 4-6: functions, lists and dictionaries, practice | ||
Calendar Event Quantitative methods: descriptive statistics | IPython for biostatistics | ||
Calendar Event Quantitative methods: inference and hypothesis testing | IPython for biostatistics | ||
Calendar Event Scientific computing and computational experiments | Codecademy 1-2: syntax, strings, and output | ||
Calendar Event Sequencing and genomes: assembly and algorithms | ||
Calendar Event Transcriptional regulation: regulatory modules, binding sites, microRNAs |
