Thursday, July 20, 2006

Fitting constrained least square regression in R

Prepared the following for an email inquiry.

Yes. There is something in R for such tasks. This requires a special package called mgcv, which should be installed in standard R configuration. See http://sekhon.berkeley.edu/library/mgcv/html/pcls.html. Especially, check out the option Ain and bin.

Here is an example I wrote:
(copy-paste these lines into R console)

## load the special package first.
library(mgcv)
options(digits=3)

## generate some fake data
x.1<-rnorm(100, 0, 1)
x.2<-rnorm(100, 0, 1)
x.3<-rnorm(100, 0, 1)
x.4<-rnorm(100, 0, 1)
y<-1+0.5*x.1-0.2*x.2+0.3*x.3+0.1*x.4+rnorm(100, 0, 0.01)
## make your own design matrix with one column corresponding to the intercept
x.mat<-cbind(rep(1, length(y)), x.1, x.2, x.3, x.4)

## this is the regular least-square regression
ls.print(lsfit(x.mat, y, intercept=FALSE))
## since you already have an X column for intercept, so no need for lsfit to assume another intercept term.

## the penalized constrained least square regression

M<-list(y=y,
w=rep(1, length(y)),
X=x.mat,
C=matrix(0,0,0),
p=rep(1, ncol(x.mat)),
off=array(0,0),
S=list(),
sp=array(0,0),
Ain=diag(ncol(x.mat)),
bin=rep(0, ncol(x.mat)) )

pcls(M)

Monday, July 03, 2006

News Headline starts with "Statisticians"

This piece of news caught my attention simply because it has one of my favorite words, "statistician," in its headline.

This news article outlines a research conducted by a statistician in Oxford University on "how interconnected the human life is".

See the following argument:

It's simple math. Every person has two parents, four grandparents and eight great-grandparents. Keep doubling back through the generations — 16, 32, 64, 128 — and within a few hundred years you have thousands of ancestors.

It's nothing more than exponential growth combined with the facts of life. By the 15th century you've got a million ancestors. By the 13th you've got a billion. Sometime around the 9th century — just 40 generations ago — the number tops a trillion.

But wait. How could anybody — much less everybody — alive today have had a trillion ancestors living during the 9th century?

The answer is, they didn't. Imagine there was a man living 1,200 years ago whose daughter was your mother's 36th great-grandmother, and whose son was your father's 36th great-grandfather. That would put him on two branches on your family tree, one on your mother's side and one on your father's. In fact, most of the people who lived 1,200 years ago appear not twice, but thousands of times on our family trees, because there were only 200 million people on Earth back then.

Simple division — a trillion divided by 200 million — shows that on average each person back then would appear 5,000 times on the family tree of every single individual living today.

But things are never average. Many of the people who were alive in the year 800 never had children; they don't appear on anybody's family tree. Meanwhile, more prolific members of society would show up many more than 5,000 times on a lot of people's trees.


Keep going back in time, and there are fewer and fewer people available to put on more and more branches of the 6.5 billion family trees of people living today. It is mathematically inevitable that at some point, there will be a person who appears at least once on everybody's tree.