Learning maximum likelihood

Antonio Ramos writes:

In most social sciences, maximum likelihood is taught before topics such as multilevel modeling or Bayesian statistics more generally. However, in the preface of your multilevel book you say that basic statistics and regression classes are sufficient preparation for studying your book. So my question is: should we learn maximum likelihood first, or is that just historical convention, without much pedagogical basis?

My reply: Maximum likelihood is fine; we discuss it in chapter 18, I believe, where we present Bayesian methods as a generalization of maximum likelihood. All of this is important to learn, but I think you can get started in serious applied statistics (which is what our book is about) without necessarily knowing it already.
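
To make the "generalization" point concrete, here is a minimal sketch (the normal-mean model, the toy data, and the prior scales are made up for illustration): the log posterior is just the log likelihood plus the log prior, so with a very flat prior the posterior mode essentially reproduces the maximum likelihood estimate, while an informative prior pulls it away.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy data, treated as draws from Normal(mu, 1); only mu is unknown.
y = np.array([0.3, 1.2, 0.8, 1.9, 1.1])

def log_likelihood(mu):
    # Log likelihood of the Normal(mu, 1) model, up to an additive constant.
    return -0.5 * np.sum((y - mu) ** 2)

def log_prior(mu, prior_sd):
    # Normal(0, prior_sd) prior on mu; prior_sd is an illustrative choice.
    return -0.5 * (mu / prior_sd) ** 2

def posterior_mode(prior_sd):
    # Maximize log likelihood + log prior, i.e., find the posterior mode.
    res = minimize_scalar(lambda mu: -(log_likelihood(mu) + log_prior(mu, prior_sd)))
    return res.x

mle = minimize_scalar(lambda mu: -log_likelihood(mu)).x
print("MLE:", mle)                                                    # the sample mean
print("Posterior mode, Normal(0, 1) prior:", posterior_mode(1.0))     # shrunk toward 0
print("Posterior mode, Normal(0, 1e6) prior:", posterior_mode(1e6))   # nearly the MLE
```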

3 thoughts on “Learning maximum likelihood”

  1. I think it helps that I learned how to write down a (log) likelihood function first, in ML class, before the more tedious algebra induced by adding a (log) prior (i.e., multiplying by a prior)…

  2. mjm: I would like to expand on your comment

    The likelihood function (as a function), its role in extracting and summarizing the sample information (as formalized by the assumed data model), its contrast with other sources of sample information and with prior information (prior/model conflict), its combination with them (i.e., no pooling, partial pooling, complete pooling), and [some explicit consideration and plotting of all this] are, I believe, really worth knowing about for perceptively applying statistics.

    But that's not what's taught, or what seems to get picked up, through the topic of "maximum" likelihood. Rather, it seems to give a sense of a thoughtless route to a good estimate (the "mle" and an SE for it), even when the (log) likelihood function is very non-quadratic, and the means of reducing the p-dimensional likelihood function to a one-dimensional estimate is usually the estimated likelihood (rather than the integrated or profile likelihood), in spite of the Neyman-Scott lessons from long ago (a small simulation of that problem appears below).

    But admittedly the literature has not emphasized this, and some did not think it important once you have the "BLUE", "MINQUE", "3rd-order asymptotic CIs", or posteriors…

    Keith
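
Keith's Neyman-Scott point can be seen in a few lines of simulation (the pairing structure, true sigma, and sample sizes below are just one illustrative setup): with one nuisance mean per pair of observations, the plug-in maximum likelihood estimate of the variance converges to half the true value, while a correction that accounts for the estimated means (a REML-style divisor) does not.

```python
import numpy as np

rng = np.random.default_rng(0)

def neyman_scott(n_pairs, sigma=1.0):
    # Each pair shares its own nuisance mean mu_i; both observations have sd sigma.
    mu = rng.normal(0.0, 5.0, size=n_pairs)
    x1 = rng.normal(mu, sigma)
    x2 = rng.normal(mu, sigma)
    pair_mean = (x1 + x2) / 2
    rss = np.sum((x1 - pair_mean) ** 2 + (x2 - pair_mean) ** 2)
    mle = rss / (2 * n_pairs)   # plug-in MLE: divide by the total number of observations
    adjusted = rss / n_pairs    # REML-style divisor, accounting for the n estimated means
    return mle, adjusted

for n in (100, 10_000, 1_000_000):
    mle, adj = neyman_scott(n)
    print(f"n_pairs={n}: MLE of sigma^2 = {mle:.3f}, adjusted = {adj:.3f}")
# The MLE settles near 0.5 (half the true sigma^2 of 1); the adjusted estimate near 1.
```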
