Nations of Europe: adding priors to multidimensional scaling


Yesterday we were looking at the musical taste proximity between European countries. But what about the proximity between European nations in terms of genes? The field of population genetics investigates this problem. I have taken some Y chromosome data and computed the genetic distance between each pair of nations.

The result, obtained with MDS, is as follows:
from_geo.png

I've color-coded the different language groups. We can see that North Africans are quite different, and that within Europe there is a clear gradient from East to West, with several clusters. The islands of Gotland and Sardinia are composed of a diverse mix of different populations.

The interesting point, however, is that I've initialized the positions of the points to their geographic positions, which can roughly be interpreted as a prior. This is a bit unusual: usually the points are initialized randomly, or with some sort of linear dimension reduction technique, such as Torgerson's procedure.


Multidimensional scaling (MDS) is a long-established way of embedding a set of points, described only in terms of their pairwise similarities, into a lower-dimensional space so that the Euclidean distances in this space reflect the similarities. While there are closed-form solutions to the problem when the transformation is linear, usually based on the SVD, one can achieve lower stress with iterative nonlinear methods such as SMACOF.
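To make the closed-form case concrete, here is a minimal sketch of Torgerson's classical scaling in NumPy. It is an illustration, not the code behind the plots in this post; the function name `classical_mds` and its arguments are made up for the example, and it uses an eigendecomposition of the double-centered matrix rather than an explicit SVD.

```python
import numpy as np

def classical_mds(D, k=2):
    """Torgerson's classical scaling: embed an n x n distance matrix D in k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]          # indices of the k largest eigenvalues
    vals_k = np.clip(vals[top], 0.0, None)    # guard against small negative eigenvalues
    return vecs[:, top] * np.sqrt(vals_k)     # coordinates, one row per point
```

When the distances really are Euclidean distances between points in k dimensions, this recovers the configuration up to rotation, reflection and translation; for genetic distances it only gives the best linear approximation, which is why it is typically used as a starting point.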

SMACOF is a deterministic hill-climb, but its result depends on the starting point. The starting point can be considered a nuisance, but it can also be considered the equivalent of a regularization term or a prior. Csiszár, for example, pointed out that iterative scaling finds, among all the solutions that satisfy the constraints, the one that is closest to the starting point. While this doesn't exactly fit the regularization term or prior setting, it is nevertheless a very appropriate way of stabilizing MDS: with so many co-dependent parameters, an MDS posterior distribution seems incomprehensible (although Fig. 3 in Jackman's "Multidimensional Analysis of Roll Call Data via Bayesian Simulation" does provide an interesting visualization, albeit with the use of informative priors). In this sense, SMACOF and iterative scaling can be seen as updating the prior.
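A minimal sketch of how the starting point enters SMACOF, assuming unit weights and raw stress. The names `smacof`, `raw_stress`, `delta` and `init` are chosen for this illustration and are not the code used for the plots here.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def raw_stress(X, delta):
    """Sum of squared distance errors, each pair counted once."""
    return 0.5 * ((pairwise_distances(X) - delta) ** 2).sum()

def smacof(delta, init, n_iter=500, tol=1e-10):
    """Unweighted SMACOF: repeat the Guttman transform, starting from `init`."""
    delta = np.asarray(delta, dtype=float)
    X = np.asarray(init, dtype=float).copy()
    n = X.shape[0]
    stress = raw_stress(X, delta)
    for _ in range(n_iter):
        d = pairwise_distances(X)
        # ratio_ij = delta_ij / d_ij, set to 0 on the diagonal and for coincident points
        ratio = np.divide(delta, d, out=np.zeros_like(d), where=d > 0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        X = B @ X / n                          # Guttman transform
        new_stress = raw_stress(X, delta)
        improvement, stress = stress - new_stress, new_stress
        if improvement < tol:                  # stress never increases under this update
            break
    return X
```

Calling `smacof(delta, init=geo_xy)`, with a hypothetical array `geo_xy` of map coordinates, and `smacof(delta, init=random_xy)` runs exactly the same update; only the local solution reached differs, which is the sense in which the initialization acts like a prior.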

This is the original placement:
geo.png

Two dimensions are insufficient to show the complexity of the data. Here you can see the stresses (green means too far, red is too close):
from_geo_stress.png
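The per-pair stresses behind such a plot can be computed as below. This is a sketch with hypothetical names (`pair_residuals`, `names`); the sign convention follows the colors described above, with positive residuals for pairs drawn too far apart and negative for pairs drawn too close.

```python
import numpy as np

def pair_residuals(X, delta, names):
    """List (name_a, name_b, residual, color) for every pair, largest misfit first."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))      # distances in the embedding
    i, j = np.triu_indices(len(names), k=1)    # each unordered pair once
    res = d[i, j] - delta[i, j]                # > 0: too far, < 0: too close
    out = []
    for k in np.argsort(-np.abs(res)):
        color = "green" if res[k] > 0 else "red" if res[k] < 0 else "none"
        out.append((names[i[k]], names[j[k]], float(res[k]), color))
    return out
```

Drawing a line between each pair, colored this way and with width proportional to the absolute residual, shows where two dimensions are not enough.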

Finally, this is what comes out from initializing the points randomly:
from_random.png
Definitely not as easy to understand as the geographic original, yet it has better (lower) "stress" than the prior-based one. The sensible geographic prior nicely helps orient the result.

In summary, the benefit of priors goes beyond the Bayesian methodology.

13 Comments

This is interesting!

It is not surprising to see Norway and Sweden very close, but that the Germans and the Dutch are closer to the Norwegians than the Danes are is surprising!
Even more surprising is that Iceland is closer to Denmark than it is to Norway, given that Iceland was settled by immigrants from Norway. Or did immigrants come from other places too?

Is this overinterpretation? How representative are the samples from each country (the n's)? How were the samples taken? Is there any way to show sampling uncertainty on the plot (maybe the bootstrap, or other ideas)? With the bootstrap I would show each country as a circle and not as a point, or something.

It seems to me that a Bayesian prior and initial values for MDS are in principle two very different things. The initial values for MDS do not change the globally optimal solution; they merely change which local solution you may settle on. A prior, however, actually changes the "optimal" solution (whatever posterior quantities you want).

If one extends the logic of initial values as priors, one could say that initial values of an MCMC chain are priors, and we could ditch the idea of convergence altogether. This seems unsatisfying, though.

To me they look pretty similar, if you rotate the second plot about 90 degrees anti-clockwise. I'm curious why you think that the one using geographic starting points is better: is it just that it matches geographic data better?

Bob

Very nice result, but aren't you being a bit unfair to the randomly-initialised example, in that a simple rotation through 90° anti-clockwise would make it much more comprehensible? The Irish, North African, and Finnish points would be in positions much more in keeping with geography.


Are you able to apply a rotation and reflection at the end of the process that will minimise the mismatch between the least-stressed result and the map of Europe?


Also, could you supply a faint grey outline map of Europe that is itself stretched and compressed to most approximately map onto the least-stress outcome? That may slightly aid comprehensibility without compromising the stress of the Bayesian result: you are stressing the map instead.

very nice, is there a set of equations / piece of code to take a closer look at this? thanks.

Kjetil, it's a tiny bit of evidence; the sample sizes are about 100 per country. Perhaps it was a group of Danes that went first to Norway and then to Iceland. Anyway, I know too little about this.

Richard, MDS has so many parameters that looking for a globally optimal solution without heavy assumptions is quite difficult. There are many ways of doing it; the formal prior distribution you suggest is quite likely superior, but it would be harder to implement. One could also include distance from the geographic positions as an additional type of stress. On the other hand, the MCMC initialization constrains the result considerably less than the gradient descent initialization. I would maintain that using geography as the basis for the prior is a good idea.
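One way to write down the "additional type of stress" idea is sketched below. The symbols are assumptions made for illustration: x_i is the embedded position of nation i, g_i its geographic position, \delta_{ij} the genetic distance, and \lambda \ge 0 a weight that would have to be chosen.

```latex
\sigma_\lambda(X) \;=\; \sum_{i<j} \bigl( \lVert x_i - x_j \rVert - \delta_{ij} \bigr)^2
\;+\; \lambda \sum_{i} \lVert x_i - g_i \rVert^2
```

With \lambda = 0 this is ordinary raw stress; as \lambda grows the solution is pulled back toward the map. Unlike an initialization, this penalty changes the optimum itself, which is closer to what a formal prior would do.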

Derek & Bob, I find initializations easier and cleaner to deal with than some sort of post-rotation. Moreover, I will post a pretty animation that roughly shows the interpolation from the prior to the eventual solution. I will also post the code, so that the visualization can be improved by others.

Patoche, I will provide that in a few days.

Hi, Aleks. You've piqued my interest. Do you have the musical taste analysis online somewhere?

Bill, I don't have good data, but it would be a fun experiment.

It would be interesting to have more ideas for inference in MDS.

(1) I know it's not the point of the exercise, but I would question some of your color choices. The Finno-Ugric languages are a group at the same level as the Indo-European languages are. By making them all pink while still using different colors for the different IE subgroups, you're implying that Finnish and Hungarian are more closely related than, say, Irish and Greek, which in fact they are not.



I would also question grouping the Baltic languages together with the Slavic ones. And finally, why the heck is Ossetian pink? Ultimately it's an Iranian language, albeit with numerous borrowings from its neighbors (but none of those neighbors are Finno-Ugric).



(2) How the heck did Ossetian end up so close to Italian? That's just weird. Is there some interesting connection I don't know, or is it just a random artifact of the small sample?

About Richard's comment: indeed, priors and initial values are different things. Setting a prior is a part of the Bayesian method, and initializing is a technical solution for non-convex problems.
However, when initialization is required, I often use the maximum of the prior, because I hope that the MAP is close to it, and because my local optimizer is indeed "local".

Could we go further and study more formally the link between prior and initialization? The question is: imagine we have a global optimizer and are able to find the real MAP x1 with a prior p1. With a local optimization we'll find a local optimum x2 != x1. What would be the equivalent "prior" p2 leading to x2 as the real MAP, with the same likelihood function?

It seems to me that p2(x) = p1(x) * f_{N(init, optimizer)}(x), with f the characteristic function of a set of neighbors of the initialization point, where the neighborhood is defined by the "locality" properties of the local optimizer. (There are several choices for p2, e.g. a Dirac on x2!)

Is it possible to study a particular local optimizer to get information about the neighborhood? Would it be useful?
E.g., in some problems, restricting the search space can be nice, if the posterior is symmetric due to symmetries of the model (as in mixtures). In other problems, it leads to a sub-optimal value.

Pierre

ubs, I merely assigned colors to the categories in the paper that the data came from. Ideally, the perceptual color differences should reflect some sort of a meaningful metric of language similarity. I was also surprised about 47 Ossetians being close to 99 Italians, and just made sure that this appears in the data and isn't an artifact of the MDS.

Pierre, you raise very interesting questions! Many interesting models are not convex, and even for convex ones, the initialization is key to efficient fitting or MCMC sampling.

If you want to visualize how similar the two solutions are, it makes sense to apply a Procrustes rotation (http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/procrustes.html)
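A minimal NumPy sketch of such an alignment, for readers who want to try it without the R package linked above. The function name `procrustes_align` and the `allow_scale` flag are made up for this illustration.

```python
import numpy as np

def procrustes_align(X, target, allow_scale=True):
    """Rotate/reflect (and optionally scale) X, then translate it, to best match target."""
    X = np.asarray(X, dtype=float)
    target = np.asarray(target, dtype=float)
    Xc = X - X.mean(axis=0)                    # center both configurations
    Tc = target - target.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Tc)
    R = U @ Vt                                 # least-squares orthogonal matrix
    scale = s.sum() / (Xc ** 2).sum() if allow_scale else 1.0
    return scale * Xc @ R + target.mean(axis=0)
```

Rotation, reflection and translation leave the MDS stress unchanged, so aligning the randomly initialized solution to the geographic one this way only affects readability. Scaling does change the stress, so pass `allow_scale=False` if the aligned configuration should remain an equally good MDS solution.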
