G8325 Topics in Advanced Statistics
Modern Statistical Learning and Computing, with Application to Biology

In this course we will explore current statistical learning and computing methodologies through case studies from the recent computational biology literature. These case studies will also provide a brief survey of the major research areas in statistical applications to biology. Specific topics in computing and simulation will be covered along the way.

Developments in biotechnology over the past two decades have given biologists new tools to study the biological activities of various organisms (including human beings) in more detail, on a larger scale, and in real time. Data generated from such experiments are often of unprecedented scale and complexity, and their analysis poses new challenges to the field of statistics. New methods have been developed to address these difficult tasks, aided by recent advances in computing power.

The course will start with several introductory lectures on biology, introductory statistical computing, and important general concepts in statistical learning. Each week will begin with a statistical methodology topic, followed by a discussion of a current paper from the computational biology literature. Topics to be covered include, but are not limited to: EM algorithms, Monte Carlo optimization, bootstrap methods, clustering and classification, and tree-based learning methods. Biological topics to explore include, but are not limited to: gene expression analysis, haplotype inference, phylogeny analysis, and sequence analysis.