1
A New Approach Toward Complex Traits

  • Shaw-Hwa Lo, Tian Zheng
  • Columbia University
  • Thursday, August 05, 2004
2
Talk outline
  • The dilemma and the solution
  • The information-driven screening
  • Illustration using a genome scan on IBD
  • Conclusion
  • Reference and current efforts
3
The dilemma
  • Complex traits: “… are caused by multiple genes interacting with each other and with environmental factors to create a gradient of genetic susceptibility to disease.” (Weeks and Lathrop 1995)
4
Screening problem
  • Difficulties with such a screening
    • Large number of markers
    • Mild individual effects of markers
    • Possible gene-gene interactions
    • Moderate number of patients
5
An information-driven solution
  • A backward selection algorithm that
    • applies an association information measure with respect to the disease on a set of markers under study;
    • reduces the set by deleting an unimportant marker that contributes the least to the association information contained in the set;
    • stops when deleting any of the remaining markers will result in a substantial loss of information.
  • The algorithm is run on random marker subsets small enough to be handled by the available number of patients.
  • Final selection is based on aggregated screening results on a large number of random subsets.
6
Notation for Case-Parent Trio Haplotype Data
  • Data: a random sample of n patients and their parents.
    • 2n parent-patient transmission pairs;
    • Each pair consists of two haplotypes;
    • one transmitted and the other untransmitted.
  • For each pair, one haplotype is labeled as transmitted to the diseased child and the other as untransmitted.
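A minimal sketch of how such transmission pairs might be tabulated. It assumes each haplotype is encoded as a tuple of 0/1 alleles over dichotomized markers; that encoding, the function name `tally_transmissions`, and the toy data are illustrative assumptions, not taken from the slides.

```python
from collections import Counter

def tally_transmissions(pairs):
    """Count how often each haplotype is transmitted and untransmitted.

    `pairs` is a list of (transmitted, untransmitted) haplotype tuples,
    one entry per parent-patient transmission pair (2n entries for n trios).
    """
    transmitted = Counter(t for t, _ in pairs)
    untransmitted = Counter(u for _, u in pairs)
    return transmitted, untransmitted

# Toy data (made up): 2 trios give 4 transmission pairs over 2 markers.
pairs = [
    ((1, 0), (0, 0)),
    ((1, 0), (0, 1)),
    ((0, 0), (1, 0)),
    ((1, 1), (0, 0)),
]
n_t, n_u = tally_transmissions(pairs)
```

Per-haplotype counts of this kind are the raw material for any transmission-based association measure; haplotypes that never occur simply have a count of zero in the `Counter`.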
7
Information Measure: Haplotype transmission Disequilibrium (HTD)
  • Define counts:



  • HTD is defined to measure the amount of association information contained in the set of markers being tested


8
Idea of screening
  • Assume m markers are being screened;
  • One marker at a time is evaluated;
  • Consider the set of the remaining m - 1 markers (the marker-deleted set);
  • The amount of information contributed by the marker can be evaluated using the HTD difference between the full set and the deleted set, i.e. the information drop, which can be used to measure the importance of the marker to the current marker set.
9
Marker Importance Measure
  • Haplotype Transmission Association (HTA)



  • Expectation of HTA reflects the importance of the marker under evaluation:



  • where                                                        is the haplotype risk ratio.
10
Properties of HTA Statistic
11
BHTA Screening algorithm
  • Backward Haplotype Transmission Association algorithm:
    • Step0: Start with the full set of markers under screening;
    • Step1: Calculate HTA for each marker retained in the current set;
    • Step2: If all HTA are negative, stop; otherwise delete the marker with the highest HTA;
    • Step3: If             , stop; otherwise continue to step1;
    • Stop: return all markers retained in the current set as selected markers
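The backward loop can be sketched as follows. This is not the authors' implementation: `score` is a hypothetical callable standing in for the HTA statistic, whose formula is not reproduced here, and the stopping size `min_size` is an assumption because the Step3 condition is not legible in the slide.

```python
def backward_screen(markers, score, min_size=1):
    """Generic backward elimination following the step structure above.

    `score(marker, retained)` is a placeholder for the HTA statistic.
    `min_size` is an assumed stopping size for Step3.
    """
    retained = list(markers)                    # Step0: start with the full set
    while len(retained) > min_size:             # Step3 (assumed condition)
        # Step1: score every marker still retained
        scores = {m: score(m, retained) for m in retained}
        # Step2: stop if every score is negative
        if all(s < 0 for s in scores.values()):
            break
        # Step2 (otherwise): delete the marker with the highest score
        worst = max(retained, key=lambda m: scores[m])
        retained.remove(worst)
    return retained                             # Stop: markers still retained

# Toy score (made up): markers A and C always look informative.
toy_score = lambda m, retained: -1.0 if m in {"A", "C"} else 1.0
selected = backward_screen(["A", "B", "C", "D"], toy_score)
```

In the toy run, B and D are deleted in turn and the loop stops once only negatively scored markers remain.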
12
Example 1: a model with no marginal effects (Simulated)
13
Information Flow during BHTA
14
Issue of sparseness
  • If the number of possible haplotypes is significantly larger than the number of observations, markers are deleted randomly.
  • This is because the counts of haplotype transmissions are mostly 0's or 1's, hence the HTA scores based on these counts all cluster around zero, which makes the selection non-informative.
15
Two-step BHTA-based marker selection procedure
  • Step 1: Randomly select k (15, for instance) markers out of the original set of K markers. Run BHTA on the selected markers and record the markers returned.
  • Step 2: Repeat step 1 B times (e.g. B=5000).
  • Selection: important markers are selected based on the distribution of the return frequencies.
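A minimal sketch of the resampling and aggregation steps, under stated assumptions: `screen` is a hypothetical callable that takes a marker subset and returns the markers it retains (standing in for a BHTA run), and the toy screen below is made up purely to exercise the loop.

```python
import random
from collections import Counter

def two_step_selection(all_markers, screen, k=15, B=5000, seed=0):
    """Run `screen` on B random subsets of size k and count returns."""
    rng = random.Random(seed)
    returns = Counter()
    for _ in range(B):
        subset = rng.sample(all_markers, k)   # Step 1: random k-marker subset
        returns.update(screen(subset))        # record the markers returned
    return returns                            # return frequency per marker

# Toy setup (made up): 50 markers, of which 3 and 17 are always retained.
all_markers = list(range(50))
important = {3, 17}
toy_screen = lambda subset: [m for m in subset if m in important]
freq = two_step_selection(all_markers, toy_screen, k=15, B=2000)
```

Each marker is drawn into roughly B·k/K subsets, so return counts are comparable across markers, and the selection step then looks for markers whose counts stand out from the bulk of that distribution.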
16
Relative Chance of Deletion During BHTA
17
Two-Step Selection Procedure Results on Example 1
18
Example using Inflammatory Bowel Disease
  • IBD consists principally of two chronic idiopathic inflammatory diseases of the gastrointestinal tract: ulcerative colitis (UC) and Crohn’s disease (CD);
  • Genetic basis: relatives of individuals with either CD or UC are at increased risk of developing either form of IBD;
  • Several loci relevant to IBD etiology (IBD1-IBD7) have been identified through different studies since 1996.
19
Data
  • 235 case-parent trios
  • 402 microsatellite markers on all 23 chromosomes, with an average inter-marker distance of 12 cM.
20
Implementation of BHTA
  • Step I: Imputation, haplotype inference and marker dichotomization
  • Step II: BHTA screening on 10 independent imputations (each with 10,000 screenings)
  • Step III: Aggregation of return results and selection of important markers
21
Selecting the important markers
22
Another method to identify the important group (the interesting group; Efron, JASA 2004)

f(z) = p g(z) + (1 - p) h(z),

false discovery rate: fdr(z) = g(z) / f(z) < 0.1,

where g(z) is a normal density with mean = 1090 and SD = 82.
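A minimal sketch of the rule fdr(z) = g(z)/f(z) < 0.1, using the g(z) stated above. The slide does not say how f(z) is estimated; the Gaussian kernel density estimate, its bandwidth, the cap at 1, and the synthetic return counts below are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

def local_fdr(z, values, null=NormalDist(1090, 82), bandwidth=40.0):
    """fdr(z) = g(z) / f(z), with f estimated by a Gaussian kernel density.

    `null` is g(z) from the slide; the kernel estimate of f and the
    `bandwidth` value are assumptions, not the authors' procedure.
    """
    kernel = NormalDist(0.0, bandwidth)
    f = sum(kernel.pdf(z - v) for v in values) / len(values)
    return min(1.0, null.pdf(z) / f)   # capped at 1 for readability

# Synthetic return counts (made up): a null bulk plus a high-return group.
rng = random.Random(1)
values = [rng.gauss(1090, 82) for _ in range(350)]
values += [rng.gauss(1500, 60) for _ in range(50)]

# Markers whose estimated fdr falls below the 0.1 threshold.
interesting = [v for v in values if local_fdr(v, values) < 0.1]
```

Values near the null mean get an fdr close to 1, while values far in the upper tail, where g(z) is negligible relative to f(z), fall below the threshold.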
23
Selecting the important markers
24
Results
  • 48 markers were identified as important markers, spread across many of the 23 chromosomes;
  • Our selected markers overlap with all previously reported IBD loci, except IBD6;
  • The importance of each marker is further evaluated by its return frequencies.
25
 
26
 
27
Selected markers at IBD loci
28
 
29
Conclusion
  • The major weaknesses of conventional approaches are due in part to the fact that only a fraction of the information in the data is used;
  • Our new approach aims to draw substantially more information from the data;
  • More understanding of complex traits can be derived if data already collected could be suitably reanalyzed by this approach;
  • This new approach will be useful in the future when the information of a large number of dense markers becomes available;
  • Information about gene-gene interactions can be derived from joint return patterns in BHTA results.
30
Reference and current efforts
  • Lo SH, Zheng T (2002) Backward haplotype transmission association (BHTA) algorithm - a fast multiple-marker screening method. Hum Hered 53:197-215
  • Extensions to case-control studies and quantitative traits are currently under study.