Backward Haplotype Transmission Association (BHTA) Algorithm
—A Fast Multi-point Screening Method for Complex Traits.
Job interview talk, Spring 2002
(PDF, 910KB)
Some graphs were later found to contain errors. For updated results, see the later BHTA presentation below.
Backward Haplotype Transmission Association (BHTA) Algorithm
—A Fast Multi-point Screening Method for Complex Traits. Invited seminar talk, UAB, Fall 2003
(PowerPoint converted web pages)
Abstract: Family-based association methods have shown great promise over the past decade in detecting genetic factors for complex traits of common human diseases. Yet successes in identifying disease susceptibility genes have been restricted largely to simple Mendelian cases. This is because complex diseases such as diabetes, asthma and heart disease usually involve multiple, interacting genetic determinants, which can be distributed widely across the genome. Current genome scans for the susceptibility loci of complex diseases, involving hundreds of markers, usually carry out hundreds to thousands of individual marker-wise tests. These tests fail to take into account possible interactions among the disease susceptibility loci, and significance is difficult to establish because of small sample sizes and the effects of multiple comparisons. As a result, researchers are calling for haplotype-based methods, which should be more informative and cost-efficient. However, current haplotype algorithms can handle only a small number of markers, say 4-5, because they are computationally intensive. This is far from enough to meet the needs of the field. In my talk, I will introduce an original haplotype-based algorithm, the Backward Haplotype Transmission Association (BHTA) Algorithm, which we proposed to address some of the current issues in association mapping. It can help researchers identify a small set of "important" markers from an original large set of markers at many candidate regions. The mechanism of the algorithm will be discussed in detail, and its performance will be demonstrated with a couple of computational examples, using both real medical data and data simulated under a newly found complex disease model.
A Haplotype-Based Multiple Marker Screening Method.
Presented by Shaw-Hwa Lo
Invited talk, Harvard University [date?]
(DVI, 51KB)
Abstract: In this talk we will introduce a haplotype-based algorithm, the Backward Haplotype Transmission Association (BHTA) Algorithm, which addresses some of the current issues in association mapping for complex traits. The proposed algorithm is intended to select a small set of "important" markers from a very large set of markers throughout the genome. The mechanism of the algorithm will be presented and its performance will be demonstrated with a couple of examples, including one real data set with 31 markers from a study of Tourette syndrome. This talk is based on joint work with Tian Zheng.
Haplotype Transmission Association (HTA)
—An "Importance" Measure for Selecting Genetic Markers.
Joint Statistical Meeting 2002, New York
(PDF, 126KB)
Abstract: Family-based association methods have shown great promise in detecting genetic factors for human diseases during the past decade. Yet the successes have been restricted largely to simple Mendelian cases, because complex diseases such as diabetes and asthma usually involve multiple, interacting genetic determinants, which can be distributed widely across the genome. Current genome scans for the susceptibility loci of complex diseases, involving hundreds of markers, usually carry out up to thousands of marker-wise tests. These tests fail to detect possible interactions among the disease genes, and significance is difficult to establish because of moderate sample sizes and the effects of multiple comparisons. In this paper, we propose a haplotype-based statistic, haplotype transmission association (HTA), which can be proven to be a measure of the amount of linkage/linkage disequilibrium information contributed by each candidate marker. Using screening based on the properties of HTA, a large set of candidate markers can be reduced to a small set of "important" ones. The resulting marker set is more informative, and further detailed studies carried out on it will be more cost-efficient.
Greedy Learning from Multiple Gene Profiles
for Selecting Informative Genes in Molecular Cancer Classification.
Presented by Xin Yan
at Joint Statistical Meeting 2003, San Francisco
(PowerPoint converted web pages)
Abstract: Gene-expression data derived from microarrays provide a promising tool for molecular cancer classification. However, because of the high dimensionality and the intrinsic variation of the experiment itself, feature extraction, or selection of a subset of “informative genes”, is always desirable before classification analysis. In the past few years, many marginal statistical methods have been applied to gene expression data despite the fact that gene interactions are non-negligible. Here we present a new multi-gene method that tries to use as much of the information in the data as possible. A gene profile association score (GPAS), together with a screening process, is used to pull out important genes. Empirically, using the genes selected by GPAS greatly improved classification accuracy for independent cancer samples. In addition, while agreeing with marginal methods such as the t-test, the gene voting scheme and correlation with the outcomes, GPAS also favored a few genes with large P-values, which might be interesting for biological explanation.
Detecting genetic association in case-control studies using BGTA method.
Presented by Hui Wang
at Joint Statistical Meeting 2003, San Francisco
(PowerPoint converted web pages)
Information-driven marker selection for large scale genomic studies on complex traits.
ENAR, Spring 2004
(The presentation file cannot be located)
Abstract: The mapping of complex traits is one of the most important and central areas of human genetics today. Many current genomic studies in this field involve a large number of markers, covering almost all regions of the genome. Before any valid test or detailed study can be performed on the data, a screening procedure must be carried out, in which markers that contribute noise and little information to the data are identified and excluded. Methods based on tests of each marker’s marginal association with the trait have been used in most current efforts. This marker-by-marker screening strategy has the attractive advantage of easy implementation. However, it ignores valuable information about functional interaction among the genomic loci (represented by markers). This information is especially important when dealing with multifactorial traits that are due to many epistatic genes, each of modest effect. Genes that contribute substantially to an interactive control of the trait under study might be missed by marker-by-marker methods because of their weak marginal effects. In this talk, we present a selection procedure based on an information-driven screening algorithm, which can be applied to a variety of data types. The marker selection procedure starts with random subsets of the large marker set (thousands of loci). For each subset, the screening algorithm deletes markers that contribute only noise to this ‘local’ set, drives the total set transmission disequilibrium information to a ‘maximum’, and returns the retained markers as the ‘local’ important set. Markers with significantly high return frequencies are selected for further detailed studies. The properties and performance of this selection procedure will be demonstrated using simulated complex disease models. An application to a real medical data set will also be presented.
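The random-subset procedure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the talk: the `score` function is a hypothetical stand-in (a sum of squared case-minus-control counts over multi-marker genotype cells) for the transmission disequilibrium information measure, and the function names, the stopping rule and the toy data are all invented for the example.

```python
import random
from collections import Counter

def score(rows, labels, markers):
    """Hypothetical stand-in for the information measure: sum over
    multi-marker genotype cells of (cases - controls) squared."""
    diff = Counter()
    for row, lab in zip(rows, labels):
        cell = tuple(row[m] for m in markers)
        diff[cell] += 1 if lab == 1 else -1
    return sum(d * d for d in diff.values())

def backward_screen(rows, labels, markers):
    """Greedy backward deletion: repeatedly drop the marker whose removal
    raises the score the most; stop when no removal raises it."""
    kept = list(markers)
    while len(kept) > 1:
        current = score(rows, labels, kept)
        gain, worst = max(
            (score(rows, labels, [k for k in kept if k != m]) - current, m)
            for m in kept
        )
        if gain <= 0:
            break
        kept.remove(worst)
    return kept

def return_frequencies(rows, labels, n_markers, subset_size, n_draws, seed=0):
    """Screen random marker subsets and report, for each marker, how often
    it was returned relative to how often it was drawn."""
    rng = random.Random(seed)
    drawn, returned = Counter(), Counter()
    for _ in range(n_draws):
        subset = rng.sample(range(n_markers), subset_size)
        drawn.update(subset)
        returned.update(backward_screen(rows, labels, subset))
    return {m: returned[m] / drawn[m] for m in range(n_markers) if drawn[m]}

# Toy data (made up): marker 0 determines the trait, markers 1 and 2 are noise.
toy_rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
toy_labels = [row[0] for row in toy_rows]

print(backward_screen(toy_rows, toy_labels, [0, 1, 2]))
print(return_frequencies(toy_rows, toy_labels, 3, 2, 20))
```

In this toy setting, deleting a noise marker merges cells whose case-control differences have the same sign, which raises the stand-in score, while deleting the informative marker merges cells of opposite sign and lowers it. A marker would then be flagged when its return frequency is high; the significance threshold for "high" is not specified here.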
A New Approach Toward Complex Traits.
International Conference on Analysis of Genomic Data.
(PowerPoint converted web pages)
Invited talk presented by Shaw-Hwa Lo
Boston, May 10-11, 2004
“How many people do you know in prison?”: Using overdispersion in count data to estimate social structure in networks.
Invited seminar talk.
Department of Statistics, Columbia University
(PDF presentation, 1,567KB)
(Do not try to print this file)
Nov 29, 2004
Abstract: In recent years, researchers have been studying a wide range of social networks using a variety of data sources, including public records, studies of limited populations, and snowball or network samples. Killworth et al. (1998) and McCarty et al. (2001) developed and evaluated a method using simple random sample survey data to estimate the sizes of subpopulations. Here we show how, using a multilevel overdispersed Poisson regression model, these data can also be used to estimate aspects of social structure in the population. We apply the method to the Killworth et al. data and find that names (e.g., "Michael," "James," "Jennifer") appear to be approximately evenly distributed throughout the acquaintanceship network, whereas some other subpopulations (e.g., "American Indians," "gun dealers") show high levels of non-uniformity. Our work goes beyond previous research in this field by using variation as well as average responses to learn about the social network.
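The idea of using variation as well as averages can be illustrated with a much cruder diagnostic than the multilevel model in the talk: the variance-to-mean ratio of the reported counts for one group. Under a plain Poisson model this ratio is close to 1, and much larger values suggest overdispersion. The sketch below is an assumption-laden illustration only; the counts are made up, the function name is hypothetical, and it ignores differences in respondents' network sizes, which the multilevel model accounts for.

```python
from statistics import mean, pvariance

def dispersion_ratio(counts):
    """Variance-to-mean ratio of the counts; about 1 under a Poisson model."""
    return pvariance(counts) / mean(counts)

# Made-up answers from eight respondents to "How many X do you know?"
evenly_spread_group = [0, 1, 1, 2, 1, 0, 2, 1]   # e.g. a common first name
clustered_group = [0, 0, 0, 0, 0, 0, 0, 8]       # a group known by few, but heavily

print(dispersion_ratio(evenly_spread_group))
print(dispersion_ratio(clustered_group))
```

Both toy groups have the same mean count, so an analysis based on averages alone could not tell them apart; only the spread differs.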
Information-driven marker selection for large-scale genomic studies.
Department of Biostatistics, Yale University. April 2005.
A Nonparametric Multipoint Screening Method for QTL Mapping.
Invited talk
International Conference on Statistics
in Honour of Professor Kai-Tai Fang's 65th Birthday
Hong Kong, June 2005. (PDF presentation)
Abstract: It is believed that most human disorders are polygenic, which means that the variation in the quantitative traits of such disorders cannot be attributed to a single gene. Rather, multiple genes, with complicated interactions, may contribute to the spectrum of variation of such traits. To study the genetics of such traits, one should inspect multiple loci simultaneously. In this talk, we present an efficient and robust statistical screening algorithm for the mapping of quantitative trait loci (QTL). The algorithm is based on a measure of association between the trait and the genotypes at the multiple marker loci under investigation. Through the use of multi-locus genotypes, one can take into consideration both the marginal and the joint association information with respect to the trait. The algorithm evaluates the markers iteratively and screens out those marker loci that contain little information about the trait. We will show the advantages of this method through theoretical justification and simulation studies.
Also given at JSM 2005.
Information-driven screening strategies for complex traits
University seminar on Genetic Epidemiology, Columbia University.
November 2005. (PDF presentation)
Evaluating the repeatability of two studies of a large number of objects: modified Kendall rank-order association test.
Invited Talk
ICSA 2006 applied statistics symposium
June 2006. (see below for an updated set of slides)
Design and Analysis of "How Many X" surveys.
Invited Talk
Seattle, JSM 2006. (PDF presentation)
Abstract
We consider issues in the design and analysis of "How many X's do you know?" surveys as a means of studying social networks. We first discuss the analysis of such data using multilevel regression, so that the properties of a network can be studied in relation to the characteristics of the individuals in the network and subgroups of interest in the population. In designing new surveys of this type, we must consider network effects and response effects, including the depth of social connections, the effects of imperfect recall, the choice of groups of known size to use for normalization, and methods for learning about small and moderately sized subpopulations.
Evaluating the repeatability of two studies of a large number of objects: modified Kendall rank-order association test.
Department of Statistics, Harvard University, 2006
[PDF presentation]
Abstract: Assessing the reproducibility of research studies can be difficult, especially when the number of objects involved is large. In such situations, only a small set of those objects is truly relevant to the scientific questions. For example, in microarray analysis, although data sets contain expression levels for tens of thousands of genes, only a small fraction of these genes are expected to be regulated by the treatment in a single experiment. In such cases, it is acknowledged that the reproducibility of two studies is high only for objects with real signals. One way to assess reproducibility is to measure the association between the two sets of data. Traditional association methods, however, lack adequate power to detect the real signals. In this talk we present a modified Kendall rank-order test of association, based on truncated ranks. Simulation results show that the proposed procedure considerably increases the capacity to detect the real signals. Applications to gene expression analysis and genetic epidemiology will be discussed.
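One plausible reading of "truncated ranks" is sketched below; the abstract does not give the exact definition or the test's null distribution, so the truncation rule here (every object ranked below the top k is tied at rank k + 1), the function names and the toy values are assumptions made for illustration. The statistic computed is the plain Kendall concordance sum, in which tied pairs contribute zero.

```python
def truncated_ranks(values, k):
    """Rank objects from largest (rank 1) to smallest, then tie every
    object outside the top k at rank k + 1 (assumed truncation rule)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = min(position, k + 1)
    return ranks

def sign(v):
    return (v > 0) - (v < 0)

def kendall_sum(a, b):
    """Concordant minus discordant pairs; pairs tied in a or b count zero."""
    n = len(a)
    return sum(
        sign(a[i] - a[j]) * sign(b[i] - b[j])
        for i in range(n)
        for j in range(i + 1, n)
    )

# Toy scores from two studies (made up): the first three objects carry
# real signal in both studies, the last three are noise.
study_1 = [9.0, 8.0, 7.0, 1.0, 2.0, 3.0]
study_2 = [9.5, 8.5, 7.5, 3.0, 1.0, 2.0]

full = kendall_sum(truncated_ranks(study_1, 6), truncated_ranks(study_2, 6))
top3 = kendall_sum(truncated_ranks(study_1, 3), truncated_ranks(study_2, 3))
print(full, top3)
```

With the ranks truncated at k = 3, the disagreement among the three noise objects no longer enters the statistic, while every comparison involving a signal object is kept.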
Studying co-regulation and inter-regulation of genes via eQTL mapping. Presented at HangZhou Conference, Tsinghua, JSM 2007, IEEE BIBE07, Biostat Upenn, Purdue. [PDF presentation]
Abstract: eQTL mapping seeks loci in the human genome that show linkage to, or association with, the expression of a gene in microarray hybridization experiments. Loci identified in this way may contain important information on the regulatory factors of the gene under study. In this talk, I will discuss co-regulation and inter-regulation patterns identified via similar strategies.
Constructing gene association networks for complex human disorders using BGTA.
ICSA applied statistics symposium 2008 [PDF presentation]
How many people do you know: efficiently estimating personal network size. Department of Statistics, Rutgers University 2009 [PDF presentation]
Posters
Zheng T, Wang S, Cong L, Ding Y, Ionita-Laza I, Lo SH (2006) Joint study of genetic regulators for Expression Traits Related to Breast Cancer. GAW 15 contribution. [PDF]
Cong L, Zheng T, Ionita-Laza I, Ding Y, Lo SH (2006) A Comprehensive Analysis of Association between Several Candidate Genes and Rheumatoid Arthritis in Samples from North American Rheumatoid Arthritis Consortium (NARAC). GAW 15 contribution. [PDF]
Ding Y, Zheng T, Ionita-Laza I, Cong L, Lo SH (2006) Constructing Association Network in Two-Stage Analysis: An application of Backward Genotype-Trait Association (BGTA) Algorithm to NARAC Data. GAW 15 contribution. [PDF]