Selected Research Projects

Click to see the relevant papers.


We develop classification methods as well as their corresponding theory under various contexts.


We study various statistical aspects of network type data including modeling, estimation and algorithms.

Nonparametric statistics

We study nonparametric estimation methods with applications.

Tuning parameter selection

We investigate the tuning parameter selection problem under various contexts.

Variable selection

We propose different methods for performing variable selection in various contexts.

Selected Publications

Click on the ‘Project’ button under each paper to view other related papers.

Complete Publication List

(2018). A Kronecker Product Model for Repeated Pattern Detection on 2D Urban Images. IEEE Transactions on Pattern Analysis and Machine Intelligence.

PDF Journal Link

(2017). How Many Communities Are There? Journal of Computational and Graphical Statistics.

PDF Code Project Journal Link

(2016). Neyman-Pearson Classification under High-Dimensional Settings. Journal of Machine Learning Research.

PDF Project Journal Link


  • nproc: an R package that, given samples from class 0 and class 1 and a classification method, constructs the corresponding Neyman-Pearson classifier with pre-specified type I error control, along with Neyman-Pearson Receiver Operating Characteristic (NP-ROC) bands. Relevant paper.
  • SIS: an R package for implementing different Sure Independence Screening methods. Relevant paper.
  • RAMP: an R package for fitting the entire solution path for high-dimensional regularized generalized linear models with interaction effects under the strong heredity constraint. Relevant paper.
  • FANS: MATLAB code for implementing the FANS (Feature Augmentation via Nonparametrics and Selection) classification method for high-dimensional data. Relevant paper.
  • CLBIC: R code for implementing Composite Likelihood BIC for selecting the number of communities. Relevant paper.
  • apple: an R package for calculating the Approximate Path for Penalized Likelihood Estimators for Generalized Linear Models. Relevant paper.
  • ROAD: a MATLAB package implementing the Regularized Optimal Affine Discriminant method for high-dimensional classification. Relevant paper.
  • xtab: an R function for generating LaTeX tables from a data matrix.

Recent Posts

Random thoughts and notes

##Habanero server
Login: ssh
interactive job:
srun --pty -t 1-00:00 -A stats /bin/bash
##multicore version
srun --pty -t 1-00:00 -c 16 -A stats /bin/bash
module load R/3.4.1
batch job submission:
sbatch -a 1-5
check job status
squeue -u fy2158
Example .sh file
#!/bin/sh
#SBATCH --account=stats # The account name for the job.
#SBATCH --job-name=test # The job name.
#SBATCH -c 1 # The number of cpu cores to use.
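The `-a 1-5` option in the batch submission note requests a job array, and Slurm passes each task its index through the `SLURM_ARRAY_TASK_ID` environment variable. The following is a minimal sketch of how a job script might use that index, not the script actually used on the cluster. The output file name `result_<id>.txt` and the R script name `sim.R` are hypothetical, and the fallback to task 1 is added only so the script can be dry-run outside the scheduler.

```shell
#!/bin/sh
#SBATCH --account=stats      # The account name for the job (as in the notes above).
#SBATCH --job-name=test      # The job name.
#SBATCH -c 1                 # The number of cpu cores to use.

# Slurm exports SLURM_ARRAY_TASK_ID to each task of a job array
# (e.g. 1..5 for `sbatch -a 1-5`). Fall back to 1 so the script
# can also be dry-run outside the scheduler.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical per-task output file name; adjust to your own workflow.
OUTFILE="result_${TASK_ID}.txt"

echo "array task ${TASK_ID} would write to ${OUTFILE}"

# On the cluster, the actual work would go here, for example
# (sim.R is a hypothetical script name):
#   module load R/3.4.1
#   Rscript sim.R "${TASK_ID}" > "${OUTFILE}"
```

Using the task index to name the output keeps the five array tasks from overwriting each other's results.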


Mac comes with Python 2.7 installed as the system default. However, we sometimes want to use Python 3. The following easy steps let you run Python 3 alongside the system default version 2.7.

Step 1: Install brew if not yet done.

/usr/bin/ruby -e "$(curl -fsSL"

Step 2: Install pyenv and Python 3.6.0, then create and activate a virtual environment (this assumes virtualenv is already installed).

brew install pyenv
pyenv install 3.6.0
virtualenv -p /Users/yangfeng/.pyenv/versions/3.6.0/bin/python3.6 myenv
. ./myenv/bin/activate && python -V
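After activating the environment, it can be useful to confirm that `python` now resolves to a 3.x interpreter. The snippet below is a small sketch of such a check, not part of the original steps. The helper name `major_of` is made up for illustration, and the check only parses the string printed by `python -V`.

```shell
#!/bin/sh
# Extract the major version from a string such as "Python 3.6.0".
major_of() {
  v="${1#Python }"            # drop the leading "Python "
  printf '%s\n' "${v%%.*}"    # keep everything before the first dot
}

if command -v python >/dev/null 2>&1; then
  # Python 2.7 prints its version to stderr, hence 2>&1.
  version="$(python -V 2>&1)"
  if [ "$(major_of "$version")" = "3" ]; then
    echo "OK: $version"
  else
    echo "python still resolves to: $version"
  fi
else
  echo "no python on PATH"
fi
```

If the second message appears, the virtual environment is probably not active in the current shell.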


Finally finished converting my website to the Hugo Academic theme!



I am teaching GR5205/GU4205 Linear Regression Models in Fall 2018.

I have been teaching the following courses at Columbia University.

  • GR6102: Statistical Modeling and Data Analysis (II)
  • GR6101: Statistical Modeling and Data Analysis (I)
  • W2024: Applied Linear Regression Analysis
  • W4315: Linear Regression Models
  • G8325: Advanced Topics in Statistics (Statistical Analysis for Network Data)
  • G8325: Advanced Topics in Statistics (High-dimensional Variable Selection)
  • W1211: Introduction to Statistics (with calculus)