|
1
|
- Shaw-Hwa Lo, Tian Zheng
- Columbia University
- Thursday, August 05, 2004
|
|
2
|
- The dilemma and the solution
- The information-driven screening
- Illustration using a genome scan on IBD
- Conclusion
- Reference and current efforts
|
|
3
|
- Complex traits: “… are caused by multiple genes interacting with each
other and with environmental factors to create a gradient of genetic
susceptibility to disease.” (Weeks and Lathrop 1995)
|
|
4
|
- Difficulties with screening markers for complex traits:
- Large number of markers
- Mild individual effects of markers
- Possible gene-gene interactions
- Moderate number of patients
|
|
5
|
- A backward selection algorithm that
- applies an association information measure with respect to the disease
to a set of markers under study;
- reduces the set by deleting the unimportant marker that contributes the
least to the association information contained in the set;
- stops when deleting any of the remaining markers would result in a
substantial loss of information.
- The algorithm is run on random marker subsets whose size can be handled
by the number of patients.
- The final selection is based on aggregated screening results over a large
number of random subsets.
|
|
6
|
- Data: a random sample of n patients and their parents.
- 2n parent-patient transmission pairs;
- Each pair consists of two haplotypes;
- one transmitted to the diseased child and the other untransmitted.
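The data layout on this slide can be sketched in Python. This is a minimal sketch under stated assumptions: parental phase is already known (haplotype inference is part of Step I of the IBD analysis), each parent is stored as two haplotypes plus the index of the one passed to the child, and the names `Trio`, `TransmissionPair` and `transmission_pairs` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical encoding: one allele code per marker, e.g. 0/1 after dichotomization.
Haplotype = Tuple[int, ...]


@dataclass(frozen=True)
class TransmissionPair:
    transmitted: Haplotype    # haplotype the parent passed to the diseased child
    untransmitted: Haplotype  # the parent's other haplotype


@dataclass(frozen=True)
class Trio:
    # Each parent (assumed phased): (haplotype_a, haplotype_b, index 0 or 1 of the transmitted one)
    father: Tuple[Haplotype, Haplotype, int]
    mother: Tuple[Haplotype, Haplotype, int]


def transmission_pairs(trios: List[Trio]) -> List[TransmissionPair]:
    """Turn n case-parent trios into 2n parent-patient transmission pairs."""
    pairs = []
    for trio in trios:
        for hap_a, hap_b, t in (trio.father, trio.mother):
            if t == 0:
                pairs.append(TransmissionPair(hap_a, hap_b))
            else:
                pairs.append(TransmissionPair(hap_b, hap_a))
    return pairs


# One trio typed at two markers: the father passed (1, 1), the mother passed (0, 0).
trio = Trio(father=((0, 1), (1, 1), 1), mother=((0, 0), (1, 0), 0))
pairs = transmission_pairs([trio])
```

Each trio contributes one pair per parent, which gives the 2n pairs for n patients.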
|
|
7
|
- Define counts of how often each haplotype is transmitted and untransmitted;
- HTD is defined to measure the amount of association information
contained in the set of markers being tested.
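The counts and HTD can be sketched as follows. The slide's formulas were not recovered, so the HTD form here is an assumption made for illustration. Let n_T(h) and n_U(h) be the numbers of pairs in which haplotype h, restricted to the markers being tested, is transmitted or untransmitted. HTD is then taken to be the sum over h of (n_T(h) - n_U(h))^2, divided by the 2n pairs. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Haplotype = Tuple[int, ...]
Pair = Tuple[Haplotype, Haplotype]  # (transmitted, untransmitted)


def restrict(hap: Haplotype, markers: Sequence[int]) -> Haplotype:
    """Keep only the alleles at the marker positions being tested."""
    return tuple(hap[m] for m in markers)


def transmission_counts(pairs: List[Pair], markers: Sequence[int]) -> Tuple[Counter, Counter]:
    """Return n_T(h) and n_U(h) for haplotypes defined on `markers`."""
    n_t = Counter(restrict(t, markers) for t, _ in pairs)
    n_u = Counter(restrict(u, markers) for _, u in pairs)
    return n_t, n_u


def htd(pairs: List[Pair], markers: Sequence[int]) -> float:
    """ASSUMED form: sum over haplotypes of (n_T - n_U)^2, divided by the 2n pairs."""
    n_t, n_u = transmission_counts(pairs, markers)
    haplotypes = set(n_t) | set(n_u)
    return sum((n_t[h] - n_u[h]) ** 2 for h in haplotypes) / len(pairs)


# Toy data: 4 pairs typed at 2 markers; allele 1 at marker 0 is always transmitted.
pairs = [((1, 0), (0, 0)), ((1, 1), (0, 1)), ((1, 0), (0, 1)), ((1, 1), (0, 0))]
n_t, n_u = transmission_counts(pairs, [0, 1])
score = htd(pairs, [0, 1])
```

`Counter` returns 0 for unseen haplotypes, so a haplotype that is only ever untransmitted still enters the sum.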
|
|
8
|
- Assume m markers are being screened;
- One marker in the set is to be evaluated;
- Consider the set with that marker deleted;
- The amount of information contributed by the marker can be evaluated
using the HTD difference, i.e. the information drop, which can be used to
measure the importance of the marker to the current marker set.
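The information drop for one marker can be sketched as the HTD of the current set minus the HTD of the set with that marker deleted. This sketch assumes an illustrative HTD form, the sum over haplotypes of (n_T(h) - n_U(h))^2 divided by 2n, because the slide formulas were not recovered. The names `htd` and `information_drop` are hypothetical. Deleting a marker merges haplotypes that differ only at that marker, so their count differences are pooled.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (transmitted, untransmitted) haplotypes


def htd(pairs: List[Pair], markers: Sequence[int]) -> float:
    """ASSUMED HTD form: sum_h (n_T(h) - n_U(h))^2 / (2n) on haplotypes restricted to `markers`."""
    diff = Counter()
    for t, u in pairs:
        diff[tuple(t[m] for m in markers)] += 1  # transmitted count
        diff[tuple(u[m] for m in markers)] -= 1  # minus untransmitted count
    return sum(d * d for d in diff.values()) / len(pairs)


def information_drop(pairs: List[Pair], markers: Sequence[int], j: int) -> float:
    """HTD lost when marker j is deleted; a large drop marks j as important."""
    reduced = [m for m in markers if m != j]
    return htd(pairs, markers) - htd(pairs, reduced)


# Marker 0 tracks transmission; marker 1 is noise.
pairs = [((1, 0), (0, 0)), ((1, 1), (0, 1)), ((1, 0), (0, 1)), ((1, 1), (0, 0))]
drop_0 = information_drop(pairs, [0, 1], 0)  # 4.0 - 0.0 = 4.0
drop_1 = information_drop(pairs, [0, 1], 1)  # 4.0 - 8.0 = -4.0
```

In this toy example, deleting the noise marker raises HTD (a negative drop). Merging its haplotypes pools count differences of the same sign.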
|
|
9
|
- Haplotype Transmission Association (HTA)
- The expectation of HTA reflects the importance of the marker under
evaluation; it is expressed in terms of the haplotype risk ratio.
|
|
11
|
- Backward Haplotype Transmission Association algorithm:
- Step 0: Start with the full set of markers under study;
- Step 1: Calculate HTA for each marker retained in the current set;
- Step 2: If all HTA are negative, stop; otherwise delete the marker with
the highest HTA;
- Step 3: If the stopping condition on the retained set is met, stop;
otherwise continue to Step 1;
- Stop: return all markers retained in the set as selected markers.
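The BHTA loop can be sketched in Python. This is a minimal sketch, not the authors' implementation, and it rests on three assumptions. First, it uses an illustrative HTD form (the sum over haplotypes of (n_T(h) - n_U(h))^2, divided by 2n), because the slide formulas were not recovered. Second, it defines a marker's HTA as HTD after deleting the marker minus HTD before. This sign convention is inferred from Step 2, which deletes the marker with the highest HTA. Third, the Step 3 condition was not recovered, so the loop is assumed to stop when a single marker remains.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (transmitted, untransmitted) haplotypes


def htd(pairs: List[Pair], markers: Sequence[int]) -> float:
    """ASSUMED HTD form: sum_h (n_T(h) - n_U(h))^2 / (2n)."""
    diff = Counter()
    for t, u in pairs:
        diff[tuple(t[m] for m in markers)] += 1
        diff[tuple(u[m] for m in markers)] -= 1
    return sum(d * d for d in diff.values()) / len(pairs)


def hta(pairs: List[Pair], markers: Sequence[int], j: int) -> float:
    """ASSUMED sign: HTD after deleting j minus HTD before; positive means j looks unimportant."""
    return htd(pairs, [m for m in markers if m != j]) - htd(pairs, markers)


def bhta(pairs: List[Pair], markers: Sequence[int]) -> List[int]:
    retained = list(markers)                                     # Step 0
    while len(retained) > 1:                                     # Step 3 (ASSUMED condition)
        scores = {j: hta(pairs, retained, j) for j in retained}  # Step 1
        worst = max(retained, key=lambda j: scores[j])
        if scores[worst] < 0:                                    # Step 2: all HTA negative
            break
        retained.remove(worst)                                   # Step 2: delete highest HTA
    return retained                                              # Stop


# Marker 0 tracks transmission, marker 1 is noise: BHTA keeps only marker 0.
main_effect = [((1, 0), (0, 0)), ((1, 1), (0, 1)), ((1, 0), (0, 1)), ((1, 1), (0, 0))]
# Only the joint haplotypes (1, 1) and (0, 0) are over-transmitted: both markers are kept.
interaction = [((1, 1), (1, 0)), ((0, 0), (0, 1)), ((1, 1), (0, 1)), ((0, 0), (1, 0))]
kept_main = bhta(main_effect, [0, 1])
kept_interaction = bhta(interaction, [0, 1])
```

In the second toy data set, neither marker shows any association on its own, yet both are retained. This is the kind of gene-gene interaction a multiple-marker screen is meant to capture.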
|
|
14
|
- If the number of possible haplotypes is much larger than the number of
observations, markers are effectively deleted at random.
- This is because the haplotype transmission counts are mostly 0's or 1's,
so the HTA scores based on these counts all cluster around zero, which
makes the selection non-informative.
|
|
15
|
- Step 1: Randomly select k (15, for instance) markers out of the original
set of K markers. Run BHTA on the selected markers and record the
markers returned.
- Step 2: Repeat step 1 B times (e.g. B=5000).
- Selection: important markers are selected based on the distribution of
the return frequencies.
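The random-subset procedure can be sketched as follows. The screening routine is passed in as a function so that any BHTA implementation can be plugged in. The function names are hypothetical, and `toy_screen` is a stand-in used only to exercise the loop, not BHTA itself. The slide does not specify how the return-frequency distribution becomes a final selection, so the sketch stops at the frequencies.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def random_subset_screening(
    markers: Sequence[int],
    k: int,
    B: int,
    screen: Callable[[List[int]], List[int]],
    seed: int = 0,
) -> Counter:
    """Run `screen` on B random k-marker subsets; count how often each marker is returned."""
    rng = random.Random(seed)
    returned = Counter()
    for _ in range(B):                         # Step 2: repeat B times
        subset = rng.sample(list(markers), k)  # Step 1: random k of the K markers
        returned.update(screen(subset))        # record the markers the screen returns
    return returned


def toy_screen(subset: List[int]) -> List[int]:
    """Stand-in for BHTA that pretends markers 3 and 7 are the only important ones."""
    return [m for m in subset if m in (3, 7)]


counts = random_subset_screening(range(20), k=15, B=200, screen=toy_screen)
frequencies = {m: c / 200 for m, c in counts.items()}  # return frequency per screening
```

Because each subset is small, every BHTA run works on a marker set whose haplotype counts the available sample can support. Markers outside the subset simply receive no count in that round.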
|
|
18
|
- IBD consists principally of two chronic idiopathic inflammatory diseases of
the gastrointestinal tract: ulcerative colitis (UC) and Crohn’s disease
(CD);
- Genetic basis: relatives of individuals with either CD or UC are at
increased risk of developing either form of IBD;
- Several loci relevant to IBD etiology, IBD1 to IBD7, have been identified
by different studies since 1996.
|
|
19
|
- 235 case-parent trios
- 402 microsatellite markers on all 23 chromosomes, with an average
inter-marker distance of 12 cM.
|
|
20
|
- Step I: Imputation, haplotype inference and marker dichotomization
- Step II: BHTA screening on 10 independent imputations (each with 10,000
screenings)
- Step III: Aggregation of return results and selection of important
markers
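Step III can be sketched as pooling the return counts from the independent imputations. Two choices here are assumptions, because the slides do not state the selection rule: normalizing by the total number of screenings, and selecting with a fixed cut-off. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def aggregate_return_frequencies(
    per_imputation: List[Counter], screenings_per_imputation: int
) -> Dict[int, float]:
    """Fraction of all screenings (over every imputation) in which each marker was returned."""
    total = Counter()
    for counts in per_imputation:
        total.update(counts)
    n_screenings = screenings_per_imputation * len(per_imputation)
    return {marker: c / n_screenings for marker, c in total.items()}


def select_markers(freqs: Dict[int, float], threshold: float) -> List[int]:
    """ASSUMED rule: keep markers whose aggregated return frequency exceeds `threshold`."""
    return sorted(m for m, f in freqs.items() if f > threshold)


# Two imputations with 100 screenings each (the analysis above used 10 x 10,000).
freqs = aggregate_return_frequencies([Counter({1: 30, 2: 5}), Counter({1: 50, 3: 10})], 100)
selected = select_markers(freqs, threshold=0.04)
```

Pooling over imputations averages out the randomness introduced when missing genotypes and phase are imputed differently in each run.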
|
|
24
|
- 48 markers were identified as important markers, spread across many of
the 23 chromosomes;
- Our selected markers overlap with all previously reported IBD loci,
except IBD6;
- The importance of each marker is further evaluated by its return
frequencies.
|
|
29
|
- The major weakness of conventional approaches is due in part to the
fact that they use only a fraction of the information in the data;
- Our new approach aims to draw substantially more information from the
data;
- More can be learned about complex traits if data already collected are
suitably reanalyzed with this approach;
- This new approach will be useful in the future, when data on large
numbers of dense markers become available;
- Information about gene-gene interactions can be derived from joint
return patterns in BHTA results.
|
|
30
|
- Lo SH, Zheng T (2002). Backward haplotype transmission association
(BHTA) algorithm: a fast multiple-marker screening method. Hum Hered
53:197-215.
- Extensions to case-control studies and quantitative traits are currently
under study.
|