Traffic map update

Commenters pointed out that the map to which I linked yesterday actually shows the number of people entering each station, not, as implied by the visual structure of the map, the traffic on the subway lines between the stations. I agree with the commenters that line width doesn’t seem like a good way to show information that is at the station level. Better to use differently-sized circles or something like that.

But this sets up a fun statistical problem: estimate the traffic on the subway lines given the data on the number of people entering each station (along with any other available data, and whatever modeling assumptions are needed to complete the picture). I guess there must be people at the transportation dept. doing this sort of thing, but I wouldn’t be surprised if they’re using deterministic solve-for-x algorithms that could be improved by a more statistical approach.
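To make the problem concrete, here is a toy sketch in Python. It is only an illustration under strong assumptions, not a description of what any transportation department does. It assumes a single line with stations in order, and it assumes (with no support from the data) that a rider entering at one station exits at any other station with probability proportional to that station's entries. The function name `estimate_link_traffic` and the example counts are made up for illustration.

```python
def estimate_link_traffic(entries):
    """Estimate riders on each segment of a single line from station entries.

    Assumption (not from the data): a rider entering at station i exits at
    station j != i with probability proportional to entries[j].
    """
    n = len(entries)
    total = sum(entries)

    # od[i][j] = estimated number of riders travelling from station i to j
    od = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = total - entries[i]  # entries at all stations except i
        for j in range(n):
            if j != i and others > 0:
                od[i][j] = entries[i] * entries[j] / others

    # Segment k joins station k and station k + 1. A trip between i and j
    # (in either direction) uses segment k when min(i, j) <= k < max(i, j).
    link = [0.0] * (n - 1)
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j), max(i, j)):
                link[k] += od[i][j]
    return od, link


# Hypothetical entry counts for a three-station line
entries = [100, 50, 50]
od, link = estimate_link_traffic(entries)
print(link)
```

Each row of the estimated matrix sums to the observed entries at that station, so the station-level data are reproduced exactly; everything about where riders go is carried by the assumption. A more statistical approach would put a model on that destination choice and estimate it from whatever extra data are available.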

P.S. Richard Clegg writes in:

As you surmised, this is a well-studied problem. Actually, in the field of road transport this would be broken into two separate but related problems: the origin-demand matrix estimation problem (given a set of observations, what set of demands from origin to destination best explains them) and the related traffic assignment problem (given an origin-demand matrix and a network with limited capacity on links, how does one assign traffic onto network links).

In particular the traffic assignment problem has some attractive statistical properties if certain assumptions are made.

I replied:

About 25 years ago I worked on finite-element methods for thermal models, so I figured the mathematics would be similar. As noted on the blog, I suspect that adding some stochastic elements to the problem could improve things as well as extend the range of problems to which these methods could be applied.

And Clegg added the following:

For the origin-demand matrix problem there are a variety of approaches, both frequentist and Bayesian. I am far from an expert here (but hope to be more expert soon, since I am involved with a grant proposal on the subject which I am hoping will be funded). For the traffic assignment problem there are a number of approaches, “deterministic” and “stochastic” to varying degrees. In the stochastic approach you make certain assumptions about how users disperse across routes of different costs (by assuming an error distribution on the user’s perception of route costs; as it turns out, a Gumbel distribution often produces “nice” answers). There are even the so-called “doubly stochastic” problems where the demand from each origin to each destination is assumed to have a distribution and then users perceive routes imperfectly according to another distribution. If you google “Stochastic user equilibrium” you will find more about the problem than you ever wanted to know.

Sounds good. I also expect there’s some room for improvement using hierarchical modeling.
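
One reason the Gumbel assumption gives “nice” answers: if perceived route costs are the true costs plus independent Gumbel errors, the probability of choosing each route takes the closed-form multinomial logit shape. Here is a minimal Python sketch of that route-choice step only. The function name `logit_route_shares`, the dispersion parameter `theta`, and the example costs are hypothetical, and the costs are held fixed here, whereas a full stochastic user equilibrium would let costs depend on the resulting flows.

```python
import math


def logit_route_shares(costs, theta=1.0):
    """Share of travelers on each route under a logit choice model.

    costs: true cost of each alternative route (assumed fixed here)
    theta: dispersion parameter (hypothetical name); larger values mean
           travelers perceive costs more accurately
    """
    # Subtract the minimum cost before exponentiating for numerical stability;
    # this does not change the resulting shares.
    cheapest = min(costs)
    weights = [math.exp(-theta * (c - cheapest)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: 1000 travelers, three routes with costs in minutes
shares = logit_route_shares([10.0, 12.0, 15.0], theta=0.5)
flows = [1000 * s for s in shares]
print(flows)
```

As `theta` grows, nearly everyone takes the cheapest route, which recovers the deterministic assignment as a limiting case; as `theta` goes to zero, travelers spread evenly across routes regardless of cost.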