Selected Research Projects

Click to see the relevant papers.


We develop classification methods and their corresponding theory in various contexts.


We study statistical aspects of network data, including modeling, estimation, and algorithms.

Nonparametric statistics

We study nonparametric estimation methods with applications.

Tuning parameter selection

We investigate the tuning parameter selection problem in various contexts.

Variable selection

We propose different methods for performing variable selection in various contexts.

Selected Publications

Click on the ‘Project’ button under each paper to view other related papers.

Complete Publication List

(2018). A Kronecker Product Model for Repeated Pattern Detection on 2D Urban Images. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Journal Link

(2017). How Many Communities Are There? Journal of Computational and Graphical Statistics.

Code Project Journal Link

(2016). Neyman-Pearson Classification under High-Dimensional Settings. Journal of Machine Learning Research.

Project Journal Link


  • nproc: an R package that, given samples from class 0 and class 1 and a classification method, generates the corresponding Neyman-Pearson classifier with pre-specified type I error control, along with Neyman-Pearson Receiver Operating Characteristic (NP-ROC) bands. Relevant paper.
  • SIS: an R package for implementing different Sure Independence Screening methods. Relevant paper.
  • RAMP: an R package for fitting the entire solution path for high-dimensional regularized generalized linear models with interaction effects under the strong heredity constraint. Relevant paper.
  • FANS: MATLAB code for implementing the FANS (Feature Augmentation via Nonparametrics and Selection) classification method for high-dimensional data. Relevant paper.
  • CLBIC: R code for implementing Composite Likelihood BIC for selecting the number of communities. Relevant paper.
  • apple: an R package for calculating the Approximate Path for Penalized Likelihood Estimators for Generalized Linear Models. Relevant paper.
  • ROAD: a MATLAB package for the Regularized Optimal Affine Discriminant method for high-dimensional classification. Relevant paper.
  • xtab: an R function for generating LaTeX tables from a data matrix.


I am teaching GR5205/GU4205 Linear Regression Models in Fall 2018.

I have taught the following courses at Columbia University.

  • GR6102: Statistical Modeling and Data Analysis (II)
  • GR6101: Statistical Modeling and Data Analysis (I)
  • W2024: Applied Linear Regression Analysis
  • W4315: Linear Regression Models
  • G8325: Advanced Topics in Statistics (Statistical Analysis for Network Data)
  • G8325: Advanced Topics in Statistics (High-dimensional Variable Selection)
  • W1211: Introduction to Statistics (with calculus)

Recent Posts

Random thoughts and notes

More Posts

I tried to use jemdoc after updating macOS, and it no longer works. I tried to reinstall it, but the installation instructions on jemdoc’s website fail because of the new file management system on the Mac. In addition, the original jemdoc only works with Python 2.7, so it will not run under Python 3. The following is a simple fix. Download Jemdoc Mathjax. Unzip the file.


This semester I am teaching Linear Regression Models to a class of around 150 students. Because the classroom is very large, writing on the blackboard is impractical and I need to rely on slides. Sometimes, however, I would like to show derivations. I found that the following works well for me and would like to share it. Install Zoom on the iPad Pro and on the computer connected to the projector.


## Habanero server
Login: ssh
Interactive job: srun --pty -t 1-00:00 -A stats /bin/bash
## Multicore version
srun --pty -t 1-00:00 -c 16 -A stats /bin/bash
module load R/3.4.1
Batch job submission: sbatch -a 1-5
Check job status: squeue -u fy2158
Example .sh file:
#!/bin/sh
#SBATCH --account=stats # The account name for the job.
#SBATCH --job-name=test # The job name.
#SBATCH -c 1 # The number of cpu cores to use.
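The batch-submission notes above can be pulled together into one array-job script. The following is only a minimal sketch, not the script from the original post. The file name my_simulation.R is hypothetical, the --time value is assumed to match the one-day limit of the interactive example, and the script relies on SLURM_ARRAY_TASK_ID, the environment variable Slurm sets for each task of an array job.

```shell
#!/bin/sh
#SBATCH --account=stats      # The account name for the job (from the notes above).
#SBATCH --job-name=test      # The job name.
#SBATCH -c 1                 # The number of cpu cores to use.
#SBATCH --time=1-00:00       # Assumed: same one-day limit as the interactive example.

# Slurm sets SLURM_ARRAY_TASK_ID for each task of an array job (sbatch -a 1-5);
# default to 1 so the script can also be tried outside the scheduler.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
R_SCRIPT="my_simulation.R"   # Hypothetical file name, not from the original notes.

echo "Array task ${TASK_ID}"

# Load R only where the cluster's module command is available.
if command -v module >/dev/null 2>&1; then
    module load R/3.4.1
fi

# Run the R script with the task index as its argument, if both exist.
if command -v Rscript >/dev/null 2>&1 && [ -f "$R_SCRIPT" ]; then
    Rscript "$R_SCRIPT" "$TASK_ID"
fi
```

Saved under a name such as run.sh (also hypothetical), it would be submitted with sbatch -a 1-5 run.sh, so that each of the five tasks receives its own index.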



  • 212 851 2139
  • 1255 Amsterdam Avenue, Room 1012, Department of Statistics, New York, NY 10027-5997
  • Wednesday 1pm to 1:30pm or email for appointment