1
Backward Haplotype Transmission Association (BHTA) Algorithm
  • A Fast Multi-point Screening Method for Complex Traits


  • Tian Zheng
  • Department of Statistics
  • Columbia University
2
Acknowledgements



  • This is joint work with Professor Shaw-Hwa Lo at Columbia University.
3
Outline
  • Introduction
  • BHTA-based methods
    • Algorithm details
    • Marker selection procedures
    • Examples
  • Summary and Future efforts
4
Example of Complex Traits: Bardet-Biedl Syndrome (BBS)
5
Screening for Complex Traits
  • Complex diseases —“… are caused by multiple genes interacting with each other and with environmental factors to create a gradient of genetic susceptibility to disease.” (Weeks and Lathrop 1995)
  • “…there seems to be a great need for the development of multilocus tests of association that make use of haplotype information, since these might prove to be much more efficient.”—J. Pritchard and M. Przeworski (June 2001). Am. J. Hum. Genet. 69: 1-14
6
Current Multi-locus Approaches for complex traits
  • Marker-by-marker strategies
  • Sums of single-marker statistics
    • Hoh, J et al (2001) Genome Res. 11: 2115-2119
  • Multifactor-dimensionality reduction (MDR)
    • Ritchie, MD et al (2001) Am. J. Hum. Genet. 69: 138-147
7
Ideal Marker Screening Method
  • Start with a large set of candidate genetic markers;
  • Screen out the markers with little information regarding the disease traits;
  • Take into account possible interactions among the disease genes;
  • Be time- and memory-efficient.
8
BHTA—Introduction
  • Backward Haplotype Transmission Association (BHTA) algorithm—A fast multi-point screening algorithm based on haplotypic transmission disequilibrium.
    • Multipoint Screening
    • Fast and memory-efficient
    • Use haplotypic transmission information—taking into account interactions among disease susceptibility loci
    • Automatically select a set of important markers as screening result
9
BHTA Algorithm Details
  • Notations & statistics formulations
  • Screening mechanisms
  • Two-step marker selection procedure based on BHTA algorithm


10
Haplotype Transmission Disequilibrium
  • Assume a parent of a patient has two haplotypes:
11
Haplotype Transmission Disequilibrium (continued)
12
Notation for Case-Parent Trio Haplotype Data
  • Data: a random sample of n patients and their parents.
    • 2n parent-patient transmission pairs;
    • Each pair consists of two haplotypes;
    • one transmitted and the other untransmitted.
  • For the $i$-th pair, let $H_i^{T}$ be the haplotype transmitted to the diseased child, and $H_i^{U}$ be the untransmitted haplotype.
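The data layout above can be sketched in a few lines. This is only an illustration: the dictionary keys `father` and `mother`, the `(haplotype_a, haplotype_b, transmitted_index)` triple, and the function name `transmission_pairs` are hypothetical choices, not notation from the slides.

```python
def transmission_pairs(trios):
    """Flatten n case-parent trios into 2n (transmitted, untransmitted) pairs.

    Hypothetical layout: each trio is a dict with keys "father" and "mother";
    each parent is (haplotype_a, haplotype_b, transmitted_index), where
    transmitted_index (0 or 1) marks the haplotype passed to the patient.
    """
    pairs = []
    for trio in trios:
        for parent in ("father", "mother"):
            hap_a, hap_b, transmitted_index = trio[parent]
            haplotypes = (hap_a, hap_b)
            transmitted = haplotypes[transmitted_index]
            untransmitted = haplotypes[1 - transmitted_index]
            pairs.append((transmitted, untransmitted))
    return pairs


# Two toy trios typed at three markers (alleles coded 0/1).
trios = [
    {"father": ((1, 0, 1), (0, 0, 1), 0), "mother": ((1, 1, 0), (0, 1, 0), 1)},
    {"father": ((0, 0, 0), (1, 0, 1), 1), "mother": ((1, 1, 1), (0, 0, 0), 0)},
]
pairs = transmission_pairs(trios)
print(len(pairs))  # 2n = 4 transmission pairs
```

Storing each pair as (transmitted, untransmitted) keeps later counting steps independent of which parent the pair came from.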
13
Information Measure: Haplotype transmission Disequilibrium (HTD)
  • Define counts:



  • HTD is defined to measure the amount of linkage/LD information contained in the set of markers being tested
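The count definitions and the HTD formula did not survive on this slide, so the sketch below rests on an assumption: it counts how often each haplotype is transmitted and untransmitted, and takes HTD to be the sum of squared count differences divided by the number of transmission pairs. The function names and the toy data are illustrative only.

```python
from collections import Counter


def transmission_counts(pairs, markers):
    """Count transmitted and untransmitted haplotypes restricted to `markers`.

    `pairs` is a list of (transmitted, untransmitted) haplotypes (tuples of
    alleles); `markers` lists the marker positions to keep.
    """
    n_t, n_u = Counter(), Counter()
    for transmitted, untransmitted in pairs:
        n_t[tuple(transmitted[j] for j in markers)] += 1
        n_u[tuple(untransmitted[j] for j in markers)] += 1
    return n_t, n_u


def htd(pairs, markers):
    # ASSUMPTION: HTD = sum over haplotypes of (n_T - n_U)^2 / (number of pairs).
    n_t, n_u = transmission_counts(pairs, markers)
    haplotypes = set(n_t) | set(n_u)
    return sum((n_t[h] - n_u[h]) ** 2 for h in haplotypes) / len(pairs)


# Toy data: four transmission pairs typed at two markers.
pairs = [((1, 0), (0, 0)), ((1, 1), (0, 1)), ((1, 0), (0, 1)), ((0, 0), (1, 0))]
print(htd(pairs, [0]))     # 2.0
print(htd(pairs, [0, 1]))  # 1.5
```

Under this assumed form, a haplotype that is transmitted about as often as it is left untransmitted contributes almost nothing, which matches the idea of measuring linkage/LD information.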


14
Idea of screening
  • Assume m markers $M_1, \dots, M_m$ are being screened;
  • The $j$-th marker $M_j$ is to be evaluated;
  • Consider $\{M_1, \dots, M_m\} \setminus \{M_j\}$ (the $M_j$-deleted set)
  • The amount of information contributed by $M_j$ can be evaluated using the HTD difference (the information drop), which can be used to measure the importance of $M_j$ to the current marker set.
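The screening idea can be illustrated as follows. The sketch restates a compact HTD helper so that it runs on its own, and that helper uses an assumed form (sum of squared transmitted-minus-untransmitted counts over the number of pairs), since the exact formula is not recoverable from the slides. The name `information_drop` and the toy data are hypothetical.

```python
from collections import Counter


def htd(pairs, markers):
    """ASSUMED HTD form: sum over haplotypes of (n_T - n_U)^2 / number of pairs."""
    diff = Counter()
    for transmitted, untransmitted in pairs:
        diff[tuple(transmitted[j] for j in markers)] += 1    # transmitted count
        diff[tuple(untransmitted[j] for j in markers)] -= 1  # untransmitted count
    return sum(d * d for d in diff.values()) / len(pairs)


def information_drop(pairs, markers, j):
    """HTD of the current marker set minus HTD of the set with marker j deleted."""
    deleted_set = [r for r in markers if r != j]
    return htd(pairs, markers) - htd(pairs, deleted_set)


# Toy data: four transmission pairs typed at two markers.
pairs = [((1, 0), (0, 0)), ((1, 1), (0, 1)), ((1, 0), (0, 1)), ((0, 0), (1, 0))]
drops = {j: information_drop(pairs, [0, 1], j) for j in [0, 1]}
print(drops)  # {0: 1.0, 1: -0.5}
```

In this toy data, deleting marker 0 loses information (positive drop), while deleting marker 1 actually raises the assumed HTD (negative drop), so marker 1 would be the natural candidate for removal.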
15
Marker Importance Measure
  • Mathematical inference and formulation of Haplotype Transmission Association (HTA)





  •    where                                                is the haplotype risk ratio.
16
Properties of HTA Statistic
17
BHTA Screening algorithm
  • Backward Haplotype Transmission Association algorithm:
    • Step0: Start with the full marker set $S = \{M_1, \dots, M_m\}$
    • Step1: Calculate HTA for each marker $M_j$ retained in $S$
    • Step2: If all HTA are negative, stop; otherwise delete the marker with highest HTA;
    • Step3: If             , stop; otherwise continue to step1;
    • Stop: return all markers retained in the current marker set $S$ as selected markers
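The backward elimination loop above can be sketched generically. The HTA statistic itself is passed in as a function, because its formula is not recoverable from the slides; the toy scores below are invented for illustration. The Step 3 stopping condition is also missing from the source, so the `min_size` parameter (stop once only that many markers remain) is an assumption.

```python
def bhta(markers, hta, min_size=1):
    """Backward elimination driven by a user-supplied HTA score.

    `hta(retained, marker)` returns the HTA score of `marker` given the
    currently retained markers. `min_size` is an ASSUMED stopping rule.
    """
    retained = list(markers)                           # Step 0: start with all markers
    while len(retained) > min_size:                    # Step 3 (assumed condition)
        scores = {j: hta(retained, j) for j in retained}   # Step 1
        if all(score < 0 for score in scores.values()):    # Step 2: all negative, stop
            break
        retained.remove(max(scores, key=scores.get))   # Step 2: delete highest HTA
    return retained                                    # Stop: the selected markers


# Toy HTA scores (hypothetical): negative for informative markers,
# positive for uninformative ones.
toy_scores = {"A": -2.0, "B": 0.5, "C": 1.5, "D": -0.3}


def toy_hta(retained, marker):
    return toy_scores[marker]


selected = bhta(["A", "B", "C", "D"], toy_hta)
print(selected)  # ['A', 'D']
```

Here C is deleted first, then B, after which every remaining score is negative and the loop stops. In a real run the scores would be recomputed from the transmission counts after each deletion.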
18
Example 1: a model with no marginal effects (Simulated)
19
Information Flow during BHTA
20
Issue of sparseness
  • If the number of possible haplotypes is significantly larger than the number of observations, markers are deleted randomly.
  • This is because the counts of haplotype transmissions are mostly 0's or 1's, hence the HTA scores based on these counts all cluster around zero, which makes the selection non-informative.
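A small simulation makes the sparseness issue concrete. It assumes biallelic markers (so m markers give 2^m possible haplotypes) and draws haplotypes uniformly at random, which ignores linkage disequilibrium; both are simplifying assumptions, and the function names are illustrative.

```python
import random
from collections import Counter


def possible_haplotypes(m, alleles_per_marker=2):
    """Number of distinct haplotypes over m markers (biallelic by assumption)."""
    return alleles_per_marker ** m


def singleton_fraction(n_haplotypes, m, seed=0):
    """Fraction of observed haplotypes seen exactly once under uniform sampling."""
    rng = random.Random(seed)
    # Encode a haplotype over m biallelic markers as an m-bit integer.
    counts = Counter(rng.getrandbits(m) for _ in range(n_haplotypes))
    return sum(1 for c in counts.values() if c == 1) / len(counts)


m, n_patients = 30, 500
n_transmitted = 2 * n_patients  # one transmitted haplotype per transmission pair
print(possible_haplotypes(m))   # 1073741824
print(singleton_fraction(n_transmitted, m))
```

With about a billion possible haplotypes and only 1000 observed transmissions, nearly every observed haplotype appears once, so transmitted and untransmitted counts are almost all 0 or 1.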
21
Two-step BHTA-based marker selection procedure
  • Step 1: Randomly select k (15, for instance) markers out of the original set of K markers. Run BHTA on the selected markers and record the markers returned.
  • Step 2: Repeat step 1 B times (e.g. B=5000).
  • Selection: Markers whose return frequencies exceed the third quartile plus 1.5 times the IQR (inter-quartile range) of all return frequencies are selected into the resulting set.
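The two-step procedure can be sketched as below. The screening step is passed in as a function `screen(subset)` that stands in for a BHTA run; the toy screen used in the example is hypothetical and simply returns the planted signal markers it is shown. Return frequency is taken here as the raw count of returns, and the quartiles come from Python's `statistics.quantiles` with its default method; both are assumptions about details the slide does not specify.

```python
import random
from collections import Counter
from statistics import quantiles


def two_step_selection(all_markers, screen, k=15, B=5000, seed=0):
    """Repeat a screening run on random k-marker subsets and keep outliers."""
    rng = random.Random(seed)
    returns = Counter({marker: 0 for marker in all_markers})
    for _ in range(B):                        # Step 2: repeat B times
        subset = rng.sample(all_markers, k)   # Step 1: random k-marker subset
        returns.update(screen(subset))        # record the markers returned
    freqs = [returns[marker] for marker in all_markers]
    q1, _, q3 = quantiles(freqs, n=4)
    threshold = q3 + 1.5 * (q3 - q1)          # third quartile + 1.5 * IQR
    selected = [marker for marker in all_markers if returns[marker] > threshold]
    return selected, returns


# Toy example: 40 markers, of which markers 3 and 17 are planted as "signal".
signal = {3, 17}


def toy_screen(subset):
    return [marker for marker in subset if marker in signal]


selected, returns = two_step_selection(list(range(40)), toy_screen, k=10, B=200)
print(selected)  # [3, 17]
```

Because each run sees only k markers, the number of possible haplotypes per run stays small, which is how the procedure sidesteps sparseness on large marker sets.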
22
Relative Chance of Deletion During BHTA
23
Two-Step Selection Procedure Results on Example 1
24
Example 2: Simulated BBS-type Complex Traits
25
Summary
  • BHTA-based marker selection methods
  • are haplotype-based, so they can handle possibly complicated interactions among disease loci;
  • are formulated without assuming a specific disease model;
  • can be applied to large-scale studies with hundreds of candidate markers;
  • are time efficient: the algorithm's complexity is on the order of nm, where n is the number of patients and m is the number of markers.
26
Current Efforts
  • Variance of HTA and corresponding adjustments to the screening algorithm;
  • Extension to genotype data;
  • Extension to mapping of quantitative trait loci;
  • Extension to gene expression data analysis.