# Selected Research Projects

We develop classification methods, together with their supporting theory, in various contexts.

We study statistical aspects of network data, including modeling, estimation, and algorithms.

We study nonparametric estimation methods and their applications.

We investigate the tuning parameter selection problem in various contexts.

We propose methods for performing variable selection in various contexts.

# Selected Publications

(2017). How Many Communities Are There? Journal of Computational and Graphical Statistics.

(2017). Model Selection for High Dimensional Quadratic Regression via Regularization. Journal of the American Statistical Association.

(2016). Neyman-Pearson Classification under High-Dimensional Settings. Journal of Machine Learning Research.

(2011). Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models. Journal of the American Statistical Association.

# Recent Manuscripts

(2018). Large-Scale Model Selection with Misspecification. arXiv preprint arXiv:1803.07418.

(2018). Partial Distance Correlation Screening for High Dimensional Time Series. arXiv preprint arXiv:1802.09116.

(2018). Sparse Linear Discriminant Analysis under the Neyman-Pearson Paradigm. arXiv preprint arXiv:1802.02557.

(2017). A note on estimation in a simple probit model under dependency. arXiv preprint arXiv:1712.09694.

# Software

• nproc: given samples from class 0 and class 1 and a classification method, generates the corresponding Neyman-Pearson classifier with pre-specified type I error control, together with Neyman-Pearson Receiver Operating Characteristic (NP-ROC) bands.
• SIS: an R package implementing several Sure Independence Screening methods.
• RAMP: an R package for fitting the entire solution path of high-dimensional regularized generalized linear models with interaction effects under the strong heredity constraint.
• FANS: MATLAB code implementing the FANS (Feature Augmentation via Nonparametrics and Selection) classification method for high-dimensional data.
• CLBIC: R code implementing the Composite Likelihood BIC for selecting the number of communities.
• apple: an R package for computing the Approximate Path for Penalized Likelihood Estimators for Generalized Linear Models.
• ROAD: a MATLAB package implementing the Regularized Optimal Affine Discriminant method for high-dimensional classification.
• xtab: an R function for generating LaTeX tables from a data matrix.
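The nproc entry mentions building a Neyman-Pearson classifier with pre-specified type I error control. A minimal sketch of the order-statistic thresholding idea behind such classifiers follows, assuming a base classifier that outputs real-valued scores (larger means more likely class 1) and a held-out class 0 sample that was not used for training. If the n held-out class 0 scores are sorted in increasing order and the k-th smallest is used as the threshold, then P(type I error > alpha) is at most the Binomial tail sum over j = k, ..., n of C(n, j) (1 - alpha)^j alpha^(n - j). Picking the smallest k whose bound is at most delta keeps the type I error below alpha with probability at least 1 - delta. This is an illustration only, not the nproc interface; all function names are hypothetical.

```python
from math import comb


def violation_bound(n, k, alpha):
    """Bound on P(type I error > alpha) when the threshold is the k-th
    smallest of n held-out class 0 scores: a Binomial(n, 1 - alpha) tail."""
    return sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
               for j in range(k, n + 1))


def np_order_index(n, alpha, delta):
    """Smallest 1-based k with violation_bound(n, k, alpha) <= delta.
    The bound shrinks as k grows, so the first hit is the minimum."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    raise ValueError(f"n={n} is too small: need (1 - alpha)**n <= delta")


def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Threshold = k-th smallest held-out class 0 score."""
    scores = sorted(class0_scores)
    k = np_order_index(len(scores), alpha, delta)
    return scores[k - 1]


def np_classify(score, threshold):
    """Predict class 1 only when the score exceeds the threshold."""
    return 1 if score > threshold else 0
```

With alpha = delta = 0.05, at least 59 held-out class 0 scores are needed, because 0.95^59 (about 0.049) is the first power of 0.95 below 0.05; with exactly 59 scores the threshold is the largest of them.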

# Recent Posts

Random thoughts and notes

### Slurm workload manager tips

#### Habanero server

Log in:

ssh fy2158@habanero.rcs.columbia.edu

Start an interactive job with a one-day time limit under the stats account:

srun --pty -t 1-00:00 -A stats /bin/bash

Multicore version (16 CPU cores):

srun --pty -t 1-00:00 -c 16 -A stats /bin/bash

Load R once the session starts:

module load R/3.4.1

Submit a batch job as a job array with tasks 1 to 5:

sbatch -a 1-5 helloworld.sh

Check job status:

squeue -u fy2158

Example .sh file:

#!/bin/sh
#test.sh
#SBATCH --account=stats # The account name for the job.
#SBATCH --job-name=test # The job name.
#SBATCH -c 1 # The number of CPU cores to use.
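Each task of an array job submitted with sbatch -a 1-5 sees its own index in the SLURM_ARRAY_TASK_ID environment variable, and Slurm sets SLURM_CPUS_PER_TASK when -c is given. A minimal sketch of a Python worker that a batch script could call (for example as python worker.py, a hypothetical file name) to run a different setting per task; the parameter grid, output file names, and function names are all hypothetical. The same pattern works in R via Sys.getenv.

```python
import os

# Hypothetical parameter grid: one setting per array index (sbatch -a 1-5).
SETTINGS = {1: 0.1, 2: 0.2, 3: 0.5, 4: 1.0, 5: 2.0}


def current_task_id(default=1):
    """Array index assigned by Slurm; falls back to `default` outside Slurm."""
    return int(os.environ.get("SLURM_ARRAY_TASK_ID", default))


def allocated_cores(default=1):
    """Cores requested with -c; SLURM_CPUS_PER_TASK is set only when -c is given."""
    return int(os.environ.get("SLURM_CPUS_PER_TASK", default))


def main():
    task = current_task_id()
    param = SETTINGS[task]
    cores = allocated_cores()
    # One output file per task so parallel tasks never overwrite each other.
    out_path = f"result_{task}.txt"
    with open(out_path, "w") as fh:
        fh.write(f"task={task} param={param} cores={cores}\n")
    print(f"task {task}: param={param}, cores={cores}, wrote {out_path}")


if __name__ == "__main__":
    main()
```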

### Using Python 3 in virtualenv on Mac

macOS ships with Python 2.7 preinstalled. However, we sometimes want to use Python 3. The following steps let you run Python 3 alongside the system default Python 2.7.

Step 1: Install Homebrew if it is not already installed.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"


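Step 2: Install pyenv and Python 3.6.0, then create and activate a virtualenv that uses it (this assumes virtualenv itself is installed, e.g. with pip install virtualenv).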

brew install pyenv
pyenv install 3.6.0
virtualenv -p "$(pyenv root)/versions/3.6.0/bin/python3.6" myenv
. ./myenv/bin/activate && python -V
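After activation, python -V should report Python 3.6.0. As a slightly stronger check, here is a minimal sketch (assuming Python 3.3 or later) that prints the interpreter version and whether it is running inside a virtual environment; the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """True inside a virtual environment.

    venv and virtualenv >= 20 make sys.prefix differ from sys.base_prefix;
    legacy virtualenv (< 20) sets sys.real_prefix instead.
    """
    return hasattr(sys, "real_prefix") or sys.base_prefix != sys.prefix


if __name__ == "__main__":
    version = sys.version.split()[0]
    print(version, "(virtualenv)" if in_virtualenv() else "(system)")
```

Run it with the activated environment's python; it should report 3.6.0 and "(virtualenv)".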


### My new website made with Hugo

Finally finished converting my website to Hugo using the Academic theme!

# Teaching

I have been teaching the following courses at Columbia University.

• GR6102: Statistical Modeling and Data Analysis (II)
• GR6101: Statistical Modeling and Data Analysis (I)
• W2024: Applied Linear Regression Analysis
• W4315: Linear Regression Models
• G8325: Advanced Topics in Statistics (Statistical Analysis for Network Data)
• G8325: Advanced Topics in Statistics (High-dimensional Variable Selection)
• W1211: Introduction to Statistics (with calculus)