February 26, 2006

Interesting Cases, Support Vectors, and Ape Art

What makes an observation interesting? Through the example of devious quizzes that ask you to distinguish ape art from modern art, we will investigate the fundamental idea of support vector machines: an SVM is a classifier specified in terms of weights assigned to interesting observations. This is different from most regression models in statistics, which are specified in terms of weights assigned to variables or interactions.

Building predictive models from data is a frequent pursuit of statisticians and even more frequent for machine learners and data miners. The main property of a predictive model is that we do not care much about what the model is like: we primarily care about the ability to predict the desired property of a case (instance). On the other hand, for most statistical applications of regression, the actual structure of the model is of primary interest.

Predictive models are not just an object of rigorous analysis. We have predictive models in our heads. For example, we may believe that we can distinguish good art from bad art. Mikhail Simkin has been entertaining the public with devious quizzes. A recent example is An artist or an ape?, where one has to classify a picture based on whether it was painted by an abstractionist or by an ape. Another quiz is Sokal & Bricmont or Lenin?, where you have to decide if a quote is from Fashionable Nonsense or from Lenin. There are also tests that check your ability to discern famous painters, authors and musicians from less famous ones. The primary message of the quizzes is that the boundaries between categories are often vague. If you are interested in how well test takers perform, Mikhail did an analysis of the True art or fake? quiz.

These quizzes bring us to another notion, which has rocked the machine learning community over the past decade: the notion of a support vector machine. The most visible originator of the methodology is Vladimir Vapnik. There are also close links to the methodology of Gaussian processes, and the work of Grace Wahba.

An SVM is nothing but a hyperplane in some space defined by the features. The hyperplane separates the cases of one class (ape pictures) from the cases of another class (painter pictures). Since many hyperplanes can separate one class from the other, the optimal one is taken to be the one equidistant from the best ape picture and the worst painter picture, that is, from the closest cases on either side. Using the "kernel trick" we can conjure another space where individual dimensions may correspond to interactions of features, polynomial terms, or even individual instances.
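To make the kernel trick a little more concrete, here is a minimal sketch in Python. The post itself contains no code, so the function names and the two sample points are purely illustrative assumptions. It checks that a degree-2 polynomial kernel, computed on the original two features, gives the same number as an ordinary inner product in a larger space whose dimensions are the squared terms and the interaction of the features.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D feature space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def explicit_features(x):
    """Explicit map to the 3-D space: two squared terms and one interaction."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

# Two hypothetical cases described by two features each.
x = (1.0, 2.0)
z = (3.0, -1.0)

k_implicit = poly_kernel(x, z)                                  # never leaves 2-D
k_explicit = dot(explicit_features(x), explicit_features(z))    # works in 3-D

print(k_implicit, k_explicit)
```

The two numbers agree (up to rounding), which is why a hyperplane in the larger space can be fitted without ever constructing that space explicitly.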

[Figure: svm.png]

In the above image, we can see the separating green hyperplane halfway between the blue and red points. Some of the points are marked with yellow dots: those points are sufficient to define the position of the hyperplane. Also, they are the ones that constrain the position of the hyperplane. And this is the key idea of support vector machines: the model is not parameterized in terms of the weights assigned to features but in terms of weights associated with each case.
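The idea of weights on cases can be sketched in a few lines of Python with NumPy. Everything here is an assumption made for illustration: the six toy points, the value of C, and the choice of solver (a plain dual coordinate descent, with the offset folded in as a constant feature, which also regularizes the offset and so differs slightly from the textbook formulation). It is not the method used to draw the figure.

```python
import numpy as np

# Hypothetical toy data: three "painter" cases (+1) and three "ape" cases (-1).
X = np.array([[2.0, 0.0], [3.0, 1.0], [3.0, -1.0],
              [0.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

def fit_dual_svm(X, y, C=10.0, sweeps=200):
    """Dual coordinate descent for a linear SVM; returns one weight per case."""
    # Append a constant feature so the offset is learned as the last entry of w.
    Xa = np.hstack([X, np.ones((len(X), 1))])
    alpha = np.zeros(len(X))
    w = np.zeros(Xa.shape[1])
    for _ in range(sweeps):
        for i in range(len(X)):
            # Gradient of the dual objective 0.5*a'Qa - sum(a) w.r.t. alpha[i].
            grad = y[i] * (w @ Xa[i]) - 1.0
            # Exact one-dimensional minimizer, clipped to the box [0, C].
            new_alpha = np.clip(alpha[i] - grad / (Xa[i] @ Xa[i]), 0.0, C)
            # Keep w equal to sum_i alpha_i * y_i * x_i at all times.
            w += (new_alpha - alpha[i]) * y[i] * Xa[i]
            alpha[i] = new_alpha
    return alpha, w

alpha, w = fit_dual_svm(X, y)
support = np.flatnonzero(alpha > 1e-8)   # cases with non-zero weight

print("case weights:", alpha.round(3))
print("support vectors:", support)
print("hyperplane (w1, w2, offset):", w.round(3))
```

On this toy data only the two cases closest to the boundary end up with a non-zero weight; the other four could be deleted without moving the hyperplane.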

The heavily weighted cases, the support vectors, are also interesting to look at out of pure human curiosity. An objective of experimental design would be to run experiments that result in new support vectors: otherwise the experiments would not be interesting. This flavor of experimental design is referred to as "active learning" in the machine learning community. The support vectors are the cases that seem the trickiest to predict. My guess is that Mikhail intentionally selects such cases for his quizzes so as to make them fun.

Posted by Aleks at February 26, 2006 9:22 AM

Comments

The property of learning a function parametrized by weights on examples is a general property of optimizing functionals which are the sum of an empirical loss functional and the norm in a Reproducing Kernel Hilbert Space. The SVM is one example, but there are many others. The business about the examples closest to the boundary, and actually the whole geometric conception of SVMs in terms of distance to the hyperplane ("margin"), is a bit of a red-herring. In practice, SVMs are nearly always used in a context where "errors" are allowed but penalized, and in this case, all points which are errors are also support vectors. In this framework, the nice geometric notion of the support vectors being objects closest to the boundary is lost --- there's still a boundary, but there can be errors which are arbitrarily close to it, and the margin is not well-defined. I prefer to think of the SVM as arising from a particular choice of loss function (the hinge loss) in a functional optimization problem.

Loved the ape quiz.

Posted by: rif at February 27, 2006 1:59 PM.
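rif's point about the hinge loss can be illustrated with a minimal Python sketch. The one-dimensional scoring function and the four cases below are hypothetical and the hyperplane is fixed by hand rather than fitted; the sketch only shows which cases incur a loss. In a fitted soft-margin SVM, every case with positive hinge loss (misclassified, or correctly classified but inside the margin) is a support vector.

```python
def hinge_loss(y, score):
    """Hinge loss: zero once a case is on the right side with margin >= 1."""
    return max(0.0, 1.0 - y * score)

def score(x, w=1.0, b=0.0):
    """Hypothetical fixed linear scoring function f(x) = w*x + b."""
    return w * x + b

# (feature value, label) pairs; labels are +1 or -1.
cases = [(2.0, 1), (0.5, 1), (-1.0, 1), (-3.0, -1)]

losses = [hinge_loss(label, score(x)) for x, label in cases]

# Cases that are penalized: inside the margin or on the wrong side.
violators = [i for i, loss in enumerate(losses) if loss > 0]

print(losses)
print(violators)
```

The third case is an outright error and the second is correct but too close to the boundary; both are penalized, which is why "closest to the boundary" stops being a clean description of the support vectors once errors are allowed.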

The complexity of the "statistical learning" approach is staggering and comparable to the scope of the statistical school. But a statistician would use the likelihood function instead of an empirical loss functional, the prior instead of a regularizer, probability of having generated the data instead of hinge loss. These tools are analogous, and differences may be irrelevant. RKHS is very nice, but it takes one or two lectures to explain it properly.

However, expressing the model in terms of weights assigned to cases is something that one doesn't see too often in statistics, and would be interesting to see more often.

Posted by: Aleks at February 28, 2006 9:18 AM.

I agree completely with you. Have you considered presenting something like Gaussian process regression? It's fairly simple, it's the same equations as regularized least squares "under the covers", but it has a nice Bayesian interpretation, it gives you a confidence interval on your outputs, AND the function you learn is expressed in terms of weights assigned to the observations, just like an SVM.

Posted by: rif at February 28, 2006 11:59 AM.
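As a rough illustration of rif's suggestion, here is a minimal Gaussian process regression sketch in Python with NumPy. The squared-exponential kernel, its length scale, the noise variance, and the sine-curve data are all assumptions chosen for the example. The learned function is a weighted sum of kernel evaluations at the observations, with one weight per observation, and the same linear solve gives a predictive variance.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Hypothetical observations.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.sin(x_train)
noise_var = 0.01   # assumed observation-noise variance (the regularizer)

# Same linear system as regularized least squares with a kernel.
K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
weights = np.linalg.solve(K, y_train)   # one weight per observation

def predict(x_new):
    """Posterior mean and variance of the latent function at x_new."""
    k_star = rbf(x_new, x_train)
    mean = k_star @ weights                                  # weighted sum over cases
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mean, var

# One input at an observed location, one far away from all the data.
mean, var = predict(np.array([2.0, 100.0]))

print("weights on observations:", weights.round(3))
print("mean:", mean.round(3), "variance:", var.round(3))
```

Near the data the variance is small; far from the data the prediction falls back to the prior mean of zero with the prior variance of one.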

I'm going to ignore the serious statistical issues here and just focus on the fun stuff. I'm pleased to say that I got 100% correct on the Artist or Ape quiz, and got 83% correct on the True Art or Fake quiz. I'm ashamed to admit that I got exactly 50% on the Sokal and Bricmont or Lenin quiz.


But as far as telling us anything, two of these quizzes really don't.


(1) Artist or Ape is uninformative because I'm pretty sure the author of the quiz chose the Artist pictures that look _most_ like an ape might have drawn them, and chose the Ape pictures to look _most_ like an artist might have drawn them. This doesn't really tell us anything about whether modern artists draw like apes draw.


(2) True Art or Fake is uninformative because the "Fake" art was generated by someone who had seen the True Art and was deliberately trying to compose something that looked similar. The "skill" in drawing a Mondrian isn't in drawing the lines and coloring in the squares, it's conceiving of the idea of making a pattern like that out of lines and squares. I happen to not like Mondrian, but I give the guy some credit: until he came along, nobody was making (or at least selling or showing) art like that. Sure, NOW you can copy him and make something that looks pretty similar, but you're still copying him. If I stand exactly where Ansel Adams stood to take his famous Half Dome picture, and expose the film at the same time of day and develop and print it the same way, I'll have something that is nearly indistinguishable from an Ansel Adams picture, but I will not have demonstrated that Adams was as talentless as I am.


Sokal and Bricmont or Lenin seems more fair, though. It suffers somewhat from the same shortcoming as (1) above --- these are the S-B quotes that are most like Lenin quotes and vice versa --- but given that S-B is just one book about one subject, the fact that there are _any_ non-trivial S-B quotes that sound just like non-trivial Lenin quotes is already noteworthy.


If I were to try to put all of these thoughts into a scholarly statistical context, I would say that there is a strong selection bias in the quizzes. Using these results to say that artists paint like apes, or that modern artists are no more talented than non-artists, is like comparing the temperature of the warmest winter days and the coolest summer days and saying that winter temperatures are about the same as summer.


All of that said, the quizzes are a ton o' fun and I'll be looking for more.

Posted by: pnprice at February 28, 2006 9:07 PM.

The bias asserted by rif in the previous comment does not exist.

Sokal & Bricmont's views on philosophy of science are identical to those of Lenin.

Not all abstract paintings look like a work of an ape: many contain geometric figures, which apes are unable to draw. However, the paintings within the branch of abstract art, called "Abstract Expressionism", all look like they were produced by an ape.

Rif's comments about imitating Mondrian are irrelevant as I did not imitate him (or any other artist). Instead I presented the paintings created using The new method in Abstract Art, invented by myself.

Regarding warmest winter and coolest summer days: in some places indeed there is no difference between the seasons of the year. In San Francisco you don't swim in summer and don't ski in winter. One can say that there is no winter or summer in San Francisco (just like in the San Francisco Museum of Modern Art there is no art).

Posted by: Mikhail Simkin at April 2, 2006 10:46 PM.

Pretty interesting stuff, the use of dots inspired me to make a painting

Posted by: Abstract at May 30, 2008 2:36 AM.
