|
1
|
- A Fast Multi-point Screening Method for Complex Traits
- Tian Zheng
- Department of Statistics
- Columbia University
|
|
2
|
- This is joint work with Professor Shaw-Hwa Lo at Columbia University.
|
|
3
|
- Introduction
- BHTA-based methods
- Algorithm details
- Marker selection procedures
- Examples
- Summary and Future efforts
|
|
4
|
|
|
5
|
- Complex diseases —“… are caused by multiple genes interacting with each
other and with environmental factors to create a gradient of genetic
susceptibility to disease.” (Weeks and Lathrop 1995)
- “…there seems to be a great need for the development of multilocus tests
of association that make use of haplotype information, since these might
prove to be much more efficient.”—J. Pritchard and M. Przeworski (June
2001). Am. J. Hum. Genet. 69: 1-14
|
|
6
|
- Marker-by-marker strategies
- Sums of single-marker statistics
- Hoh, J et al. (2001) Genome Res. 11: 2115-2119
- Multifactor-dimensionality reduction (MDR)
- Ritchie, MD et al. (2001) Am. J. Hum. Genet. 69: 138-147
|
|
7
|
- Start with a large set of candidate genetic markers;
- Screen out the markers with little information regarding the disease
traits;
- Take into account possible interactions among the disease genes;
- Be time- and memory-efficient.
|
|
8
|
- Backward Haplotype Transmission Association (BHTA) algorithm—A fast multi-point
screening algorithm based on haplotypic transmission disequilibrium.
- Multipoint Screening
- Fast and memory-efficient
- Use haplotypic transmission information—taking into account
interactions among disease susceptibility loci
- Automatically select a set of important markers as screening result
|
|
9
|
- Notations & statistics formulations
- Screening mechanisms
- Two-step marker selection procedure based on BHTA algorithm
|
|
10
|
- Assume a parent of a patient has two haplotypes:
|
|
11
|
|
|
12
|
- Data: a random sample of n patients and their parents.
- 2n parent-patient transmission pairs;
- Each pair consists of two haplotypes;
- one transmitted and the other untransmitted.
- For each pair, separate symbols denote the haplotype transmitted to the
diseased child and the untransmitted haplotype.
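The data layout described above can be sketched as follows. This is only an illustration: the 0/1 allele coding, the assumption that phase and transmission status are already resolved, and the names `TransmissionPair` and `make_pairs` are hypothetical and do not come from the slides.

```python
from typing import List, NamedTuple, Tuple

Haplotype = Tuple[int, ...]  # one allele code per marker (0/1 coding is assumed)


class TransmissionPair(NamedTuple):
    transmitted: Haplotype    # haplotype passed to the diseased child
    untransmitted: Haplotype  # the parent's other haplotype


def make_pairs(trios) -> List[TransmissionPair]:
    """Flatten n trios into 2n parent-patient transmission pairs.

    Each trio is assumed to arrive as (father, mother), where each parent is
    a (transmitted, untransmitted) tuple with phase already resolved.
    """
    pairs = []
    for father, mother in trios:
        pairs.append(TransmissionPair(*father))
        pairs.append(TransmissionPair(*mother))
    return pairs


# Two toy trios typed at three markers (made-up data).
trios = [
    (((1, 0, 1), (0, 0, 1)), ((1, 1, 0), (0, 1, 0))),
    (((1, 0, 1), (1, 1, 1)), ((0, 0, 0), (1, 0, 1))),
]
pairs = make_pairs(trios)
print(len(pairs))  # 2n = 4 pairs for n = 2 patients
```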
|
|
13
|
- Define counts:
- HTD is defined to measure the amount of linkage/LD information contained
in the set of markers being tested
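A rough sketch of the counting step, under stated assumptions. The per-haplotype transmitted and untransmitted counts follow directly from the pair data. The function `disequilibrium_score` is a stand-in (squared differences between the two counts, averaged over pairs) chosen for illustration, and is not claimed to be the HTD formula from the slide. All names are hypothetical.

```python
from collections import Counter


def transmission_counts(pairs):
    """Count how often each haplotype is transmitted and untransmitted."""
    n_t = Counter(t for t, _ in pairs)
    n_u = Counter(u for _, u in pairs)
    return n_t, n_u


def disequilibrium_score(pairs):
    """Stand-in score (an assumption, not the slide's HTD definition).

    Sums squared transmitted-minus-untransmitted counts over all observed
    haplotypes and divides by the number of pairs.
    """
    n_t, n_u = transmission_counts(pairs)
    haplotypes = set(n_t) | set(n_u)
    return sum((n_t[h] - n_u[h]) ** 2 for h in haplotypes) / len(pairs)


# Four toy (transmitted, untransmitted) pairs at two markers (made-up data).
pairs = [
    ((1, 0), (0, 0)),
    ((1, 0), (0, 1)),
    ((1, 0), (0, 0)),
    ((0, 1), (1, 0)),
]
n_t, n_u = transmission_counts(pairs)
print(n_t[(1, 0)], n_u[(1, 0)])     # 3 1
print(disequilibrium_score(pairs))  # 2.0
```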
|
|
14
|
- Assume m markers are being screened;
- One marker in the set is to be evaluated;
- Consider the set obtained by deleting that marker (the deleted set);
- The amount of information contributed by the marker can be evaluated using
the HTD difference, that is, the information drop, which can be used to
measure the importance of the marker to the current marker set.
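The idea of an information drop can be illustrated by scoring the current marker set, scoring the set with one marker deleted, and taking the difference. In this sketch the score is a stand-in (squared count differences), and the sign convention (full set minus deleted set) is an assumption that may differ from the HTA statistic on the slides. The names `score` and `information_drop` are hypothetical.

```python
from collections import Counter


def score(pairs, markers):
    """Stand-in score on haplotypes restricted to `markers` (assumed form)."""
    n_t = Counter(tuple(t[i] for i in markers) for t, _ in pairs)
    n_u = Counter(tuple(u[i] for i in markers) for _, u in pairs)
    haplotypes = set(n_t) | set(n_u)
    return sum((n_t[h] - n_u[h]) ** 2 for h in haplotypes) / len(pairs)


def information_drop(pairs, markers, r):
    """Score of the current set minus score of the set with marker r deleted."""
    deleted = [i for i in markers if i != r]
    return score(pairs, markers) - score(pairs, deleted)


# Toy data: marker 0 always differs between transmitted and untransmitted
# haplotypes, marker 1 is balanced noise (made-up data).
pairs = [
    ((1, 0), (0, 0)),
    ((1, 1), (0, 1)),
    ((1, 0), (0, 1)),
    ((1, 1), (0, 0)),
]
print(information_drop(pairs, [0, 1], 0))  # 4.0: deleting marker 0 loses signal
print(information_drop(pairs, [0, 1], 1))  # -4.0: deleting marker 1 sharpens it
```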
|
|
15
|
- Mathematical inference and formulation of Haplotype Transmission
Association (HTA)
- where
is the haplotype risk ratio.
|
|
16
|
|
|
17
|
- Backward Haplotype Transmission Association algorithm:
- Step 0: Start with the full set of markers being screened;
- Step 1: Calculate HTA for each marker retained in the current set;
- Step 2: If all HTA are negative, stop; otherwise delete the marker with
the highest HTA;
- Step 3: If the stopping condition on the retained set is met, stop;
otherwise return to Step 1;
- Stop: Return all markers retained in the current set as the selected markers.
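The backward elimination loop can be sketched as below. The HTA statistic is passed in as a function, since its formula is not reproduced here. The `min_markers` floor used as the Step 3 stopping condition is an assumption, and the names `backward_select`, `hta`, and `toy_hta` are hypothetical.

```python
from typing import Callable, List, Sequence


def backward_select(
    markers: Sequence[int],
    hta: Callable[[int, List[int]], float],
    min_markers: int = 1,  # assumed stopping size for Step 3
) -> List[int]:
    """Repeatedly delete the marker with the highest HTA until all are negative."""
    retained = list(markers)                              # Step 0
    while len(retained) > min_markers:                    # Step 3 (assumed form)
        scores = {r: hta(r, retained) for r in retained}  # Step 1
        worst = max(scores, key=scores.get)
        if scores[worst] < 0:                             # Step 2: all negative
            break
        retained.remove(worst)                            # Step 2: delete highest
    return retained


# Toy HTA values that ignore the retained set (made-up numbers).
toy_hta = {0: -2.0, 1: 3.0, 2: 0.5, 3: -1.0}
selected = backward_select([0, 1, 2, 3], lambda r, retained: toy_hta[r])
print(selected)  # [0, 3]
```

In a real run the HTA of each marker would be recomputed from the data for the current retained set at every pass, which is why `hta` receives `retained` as its second argument.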
|
|
18
|
|
|
19
|
|
|
20
|
- If the number of possible haplotypes is much larger than the number of
observations, markers are deleted essentially at random.
- This is because the counts of haplotype transmissions are then mostly 0's
or 1's, so the HTA scores based on these counts all cluster around zero,
which makes the selection non-informative.
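A quick arithmetic check shows how fast this happens. The sketch assumes biallelic markers, so m markers give 2^m possible haplotypes, and a hypothetical sample of n = 200 patients; neither figure comes from the slides.

```python
def possible_haplotypes(m: int, alleles_per_marker: int = 2) -> int:
    """Number of distinct haplotypes over m markers (biallelic by assumption)."""
    return alleles_per_marker ** m


n = 200        # hypothetical number of patients
pairs = 2 * n  # parent-patient transmission pairs

for m in (5, 10, 15, 20):
    h = possible_haplotypes(m)
    print(f"m={m:2d}  haplotypes={h:8d}  pairs={pairs}  ratio={h / pairs:.1f}")
```

With 15 markers there are already 32768 possible haplotypes for 400 pairs, so most haplotypes are seen at most once.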
|
|
21
|
- Step 1: Randomly select k (for instance, 15) markers out of the original
set of K markers. Run BHTA on the selected markers and record the
markers returned.
- Step 2: Repeat Step 1 B times (e.g. B = 5000).
- Selection: Markers whose return frequencies exceed the third quartile plus
1.5 times the IQR (inter-quartile range) are selected into the resulting set.
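The random-subset procedure can be sketched as follows. The BHTA run itself is passed in as a function, the cutoff is read as the third quartile plus 1.5 times the IQR, and the quartiles use the default method of Python's `statistics.quantiles`. These choices and the names `random_subset_screen` and `toy_bhta` are assumptions made for illustration.

```python
import random
import statistics
from collections import Counter


def random_subset_screen(all_markers, run_bhta, k=15, B=5000, seed=0):
    """Run BHTA on B random k-marker subsets and keep frequently returned markers."""
    markers = list(all_markers)
    rng = random.Random(seed)
    returned = Counter({m: 0 for m in markers})
    for _ in range(B):
        subset = rng.sample(markers, k)    # Step 1: draw k of the K markers
        returned.update(run_bhta(subset))  # record the markers BHTA returns
    freqs = [returned[m] for m in markers]
    q1, _, q3 = statistics.quantiles(freqs, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)          # upper fence on return frequency
    selected = [m for m in markers if returned[m] > cutoff]
    return selected, returned


# Toy stand-in for BHTA: it returns markers 3 and 7 whenever they are drawn.
def toy_bhta(subset):
    return [m for m in subset if m in {3, 7}]


selected, returned = random_subset_screen(range(40), toy_bhta, k=15, B=200)
print(selected)  # [3, 7]
```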
|
|
22
|
|
|
23
|
|
|
24
|
|
|
25
|
- BHTA-based marker selection methods
- are haplotype-based, so they can handle possibly complicated
interactions among disease loci;
- are formulated without assuming a specific disease model;
- can be applied to large-scale studies with hundreds of candidate
markers;
- are time-efficient: algorithm complexity is on the order of nm (n is the
number of patients, and m is the number of markers).
|
|
26
|
- Variance of HTA and corresponding adjustments to the screening
algorithm;
- Extension to genotype data;
- Extension to mapping of quantitative trait loci;
- Extension to gene expression data analysis.
|