Any suggestions on how to learn applied statistics?

David VandenBos writes:

I stumbled upon your blog a few weeks ago . . . However, a good amount of your technical articles go over my head because of my lack of statistics education/training/experience. Do you have any basic reading suggestions for learning applied statistics? My organization captures tons of info and safely tucks it away into databases, but I’m really interested in learning how to get it out and make use of it.

Does anybody have any suggestions? I like my book with Jennifer, but maybe there’s something more basic to start with? There’s also this online book on statistical graphics by Rafe Donahue, which is actually fun to read.

P.S. I don’t think any of the usual intro stat books would be good here. I think they focus too much on conventional topics and not enough on applied statistics. Not really the fault of these books: they’re designed for the undergraduate curriculum, not for practitioners.

16 thoughts on “Any suggestions on how to learn applied statistics?”

  1. I have to look back too far for my specific suggestions to be relevant today, but I can see that my choices have something in common: they all use a LOT of examples, and the examples are done in the type of excruciating detail that enables you to replicate them, even if you are an idiot.

    Dog-eared books on my shelf include:

    *Leslie Kish, Survey Sampling (1964?), although that may be because I took sampling from Kish, Groves, Frankel and Kalton, not quite realizing what a treat that was.

    *Nick Thomopoulos, Forecasting (1980s). Basic forecasting techniques and completely worked-out examples. I've never met Nick, but I later had one of Nick's students as a boss, and another one currently works for me.

    *Murray Spiegel, Statistics, Schaum's Outline series, about 1957. This is a guilty pleasure, since it's basically a crib sheet for intro stat books. But there are some good problems in there, and I can often adapt these to provide advice to others. The nice part is that the worked-out solutions are there, not just the final answers.

    Julian Simon's book on Resampling Statistics (1973 and later) shows how to perform the standard analyses using resampling techniques, and there's an unpublished paper of his taking Mosteller's 50 probability problems and showing how to solve those by resampling. Simple resampling techniques are often a good way to solve applied problems AND to convince yourself that the formulaic answer you've derived to an unusual problem is at least in the ballpark and doesn't contain some horrible error.
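    As a rough illustration of the "sanity check by resampling" idea, here is a minimal sketch (not taken from Simon's book) that bootstraps the standard error of a sample mean and compares it with the textbook formula s/sqrt(n). The data values are hypothetical, invented for the example, and the percentile helper is a simple nearest-rank version chosen for brevity. If the two standard errors disagree wildly, something is wrong with one of the calculations.

```python
import random
import statistics


def bootstrap_se_of_mean(data, n_resamples=5000, seed=1):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in range(n)]
        means.append(statistics.fmean(resample))
    return statistics.stdev(means), means


def percentile(sorted_values, q):
    """Simple nearest-rank percentile of an already sorted list (0 < q < 1)."""
    idx = min(len(sorted_values) - 1, max(0, int(q * len(sorted_values))))
    return sorted_values[idx]


# Hypothetical data: 30 measurements (illustrative values, not from any real study).
data = [12.1, 9.8, 11.4, 10.2, 13.5, 8.9, 10.7, 12.8, 9.5, 11.1,
        10.9, 12.4, 9.1, 10.3, 11.8, 13.0, 8.7, 10.5, 11.6, 12.2,
        9.9, 10.8, 11.3, 12.6, 9.4, 10.1, 11.9, 13.2, 9.7, 10.6]

# Formulaic answer: sample standard deviation divided by sqrt(n).
formula_se = statistics.stdev(data) / len(data) ** 0.5

# Resampling answer: spread of the means of many bootstrap resamples.
boot_se, boot_means = bootstrap_se_of_mean(data)
boot_means.sort()
ci = (percentile(boot_means, 0.025), percentile(boot_means, 0.975))

print(f"formula SE:   {formula_se:.3f}")
print(f"bootstrap SE: {boot_se:.3f}")
print(f"95% percentile interval for the mean: {ci[0]:.2f} to {ci[1]:.2f}")
```

    The two standard errors should land in the same ballpark (the bootstrap version runs slightly smaller because resampling uses the n rather than n - 1 variance). The same loop works unchanged for statistics with no convenient formula, such as a median or a trimmed mean.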

    I've also found statistical software manuals very helpful, notably the first SPSS manual and Lee Wilkinson's SYSTAT manuals. In theory, you are supposed to know the technique before you try to apply it, but I think that's backwards. Read how to use the technique, and then learn some theory.

    A good manual will have some good examples and some good data sets. It will also have some classic references at the end of each chapter, which will use terminology close to the terminology in the software. Plus, it's fun to play around with the data sets. Create some missing data holes. Impute to fill them in. Corrupt the data by multiplying some observations by 100. Log the data. See how that changes things.
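    The "play around with the data" exercise can be scripted. This is a minimal sketch assuming simulated positive-valued data (lognormal, generated on the spot, so no real data set is implied) and the crudest possible imputation, filling holes with the observed mean. It prints the mean and median after each step so you can see which summaries move and which hold steady.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical positive-valued data (e.g., durations), simulated for illustration.
clean = [rng.lognormvariate(2.0, 0.5) for _ in range(200)]


def summarize(label, values):
    print(f"{label:<22} mean={statistics.fmean(values):8.2f}  "
          f"median={statistics.median(values):7.2f}")


summarize("clean", clean)

# 1. Create missing-data holes: blank out roughly 10% of the values at random.
holey = [None if rng.random() < 0.10 else x for x in clean]
observed = [x for x in holey if x is not None]

# 2. Impute: fill every hole with the mean of the observed values.
#    This keeps the mean but artificially shrinks the spread.
fill = statistics.fmean(observed)
imputed = [fill if x is None else x for x in holey]
summarize("mean-imputed", imputed)

# 3. Corrupt: multiply five observations by 100, mimicking a unit or keying error.
corrupted = list(clean)
for i in rng.sample(range(len(corrupted)), 5):
    corrupted[i] *= 100
summarize("5 values x100", corrupted)

# 4. Log the data: on the log scale each corrupted point shifts by log(100),
#    about 4.6, instead of by a factor of 100, so it dominates far less.
summarize("log(clean)", [math.log(x) for x in clean])
summarize("log(corrupted)", [math.log(x) for x in corrupted])
```

    Typically the mean jumps after corruption while the median barely moves, and mean imputation leaves the mean unchanged but makes the data look less variable than they are.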

    There's a general theme here: applied stats means you have to apply something, and you are often working without a net. So, it's critical that you get a lot of practice (a) diagnosing what the data need, and (b) correctly calculating what needs calculating. A lot of stat articles and books aren't nearly clear enough when you actually get down to calculating. [my favorite complaint is authors who "omit subscripts for clarity". Who the heck are they kidding?]

    But, of course, the books listed above are outdated.

    One current example is Brady West et al.'s book on Linear Mixed Models. http://www-personal.umich.edu/~bwest/almmussp.htm… The book consists of a theory chapter and then several examples, each of which is worked out in 5 software packages: SAS, R, SPSS, Stata and HLM. This enlarges the book's market, but it also lets you see how the same concepts are handled in different software under different names.

    I haven't read all of Gelman and Hill yet, but will note that I have loaned it out to 4 co-workers. Three of them returned the book after buying their own copy so they could mark it up. That's a pretty good compliment.

  2. I can't recommend a book, but I think before you start reading statistics, you should read something about scientific methodology more generally – why people do randomized controlled trials, when to control for variables other than your main independent one, etc. After all, statistics is just a bunch of techniques.
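    To make the point about randomized trials and control variables concrete, here is a small simulation sketch. Everything in it is hypothetical: a binary background factor z raises the outcome by 5 units, the "treatment" x has no effect at all, and in the observational version units with z = 1 are much more likely to get treated. "Controlling for" z is done here by the simplest method, comparing treated and untreated units within each level of z and averaging the two differences (the strata are equally sized by construction).

```python
import random
import statistics

rng = random.Random(7)
N = 20000


def simulate(randomized):
    """Return (z, x, y) rows. z raises y by 5; x has NO effect on y.
    Without randomization, z drives who receives x, so x and z are confounded."""
    rows = []
    for _ in range(N):
        z = 1 if rng.random() < 0.5 else 0
        if randomized:
            p_treat = 0.5                      # coin-flip assignment, independent of z
        else:
            p_treat = 0.8 if z == 1 else 0.2   # self-selection driven by z
        x = 1 if rng.random() < p_treat else 0
        y = 10 + 5 * z + rng.gauss(0, 1)       # note: x does not appear here
        rows.append((z, x, y))
    return rows


def diff_in_means(rows):
    """Naive comparison: mean outcome of treated minus mean outcome of untreated."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def stratified_diff(rows):
    """Control for z: compare within each level of z, then average the two differences."""
    diffs = [diff_in_means([r for r in rows if r[0] == level]) for level in (0, 1)]
    return statistics.fmean(diffs)


observational = simulate(randomized=False)
trial = simulate(randomized=True)
print(f"naive, observational:    {diff_in_means(observational):+.2f}")
print(f"controlling for z:       {stratified_diff(observational):+.2f}")
print(f"naive, randomized trial: {diff_in_means(trial):+.2f}")
```

    With this setup the naive observational comparison should come out near +3 (treated units are 80% z = 1 versus 20% for untreated, and 5 × 0.6 = 3), even though the true effect is zero. Both the stratified comparison and the randomized trial should land near zero.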

    (This is assuming the reader doesn't already know about these things, which may be wrong.)

  3. I have and like your book, but it's definitely not for faint-hearted complete newbies. So far, I have also found John Fox's book (An R and S-Plus Companion to Applied Regression) very helpful. Depending on where the reader comes from, a book like Jim Lindsey's introductory statistics text may be useful as a first "deprogrammer" – I remember how hard it was for me to leave the club of the p-value worshippers.

  4. I recommend Data Analysis Using SQL and Excel by Gordon Linoff. I find that getting the data out of a database and into a form that can be used for statistical analysis is often the hardest part. There are a lot of subtle things that can go wrong with SQL queries.

    My usual workflow is as follows: figure out what data is needed for the analysis, and then write the SQL queries to get the data out of the database. The data is then moved to Excel for further cleaning (dates and null values sometimes come out of the database looking weird) and some analysis, often using pivot tables, which are similar to grouping in SQL. Once the data set is set up properly, it can be moved into Stata or some other statistics program.
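    A small sketch of the extraction step, using Python's built-in sqlite3 module with an in-memory database. The table name, columns and rows are hypothetical, made up for the example, and this is not code from Linoff's book. It shows a GROUP BY doing the job of a pivot table, and one of the subtle traps: COUNT(*) counts every row, while COUNT(column) and AVG() silently skip NULLs.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (station TEXT, incident_date TEXT, response_min REAL)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?, ?)",
    [
        ("A", "2010-01-03", 6.5),
        ("A", "2010-01-17", None),   # missing value: stored as NULL, returned as None
        ("A", "2010-02-02", 7.0),
        ("B", "2010-01-09", 9.0),
        ("B", "2010-02-11", 8.0),
        ("B", "2010-02-25", 10.0),
    ],
)

# GROUP BY plays the role of a pivot table: one row per station and month.
# strftime() turns the ISO date text into a year-month key.
# COUNT(*) counts all rows; COUNT(response_min) and AVG() ignore NULLs.
query = """
    SELECT station,
           strftime('%Y-%m', incident_date) AS month,
           COUNT(*)            AS n_rows,
           COUNT(response_min) AS n_non_null,
           AVG(response_min)   AS avg_response
    FROM incidents
    GROUP BY station, month
    ORDER BY station, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

    Comparing n_rows with n_non_null in each group is a cheap way to notice missing values before the data leave the database, rather than discovering them later in a spreadsheet.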

    Here are links to the book's Amazon page and companion page.

    http://www.amazon.com/Data-Analysis-Using-SQL-Exc

    http://www.data-miners.com/sql_companion.htm

  5. I just taught a course to a group of native Spanish-speaking students so that they can write articles for scientific journals with clear writing and useful statistics. They reminded me of the young factory engineers I used to teach applied descriptive statistics to, in the tradition of Juran and Deming.

    What worked:
    -baby steps and good simple examples
    -simple online references using ASQC and wiki pages for definitions
    -challenging fuzzy cause/effect assertions with Ishikawa diagrams
    -thinking about a research project as a PDCA Shewhart cycle
    -forming a testable hypothesis using one well described effect and one speculative but well defined variable
    -having a field notes discipline and check sheets that help collect, preserve and organize actual data
    -start with paper, pencil and calculator; do NOT mess with Microsoft Excel
    -the current high bar for communicating clearly with numbers:
    "The best stats you've ever seen" http://www.ted.com/talks/lang/eng/hans_rosling_sh

  6. I'm curious about this. I've been studying stats in grad school for about 5 years now, and I still feel like a beginner. In my experience, it takes a lot of practice to understand even some basic things. I often see colleagues copying and pasting code from the UCLA stat site, tweaking it, and looking for a low p-value. Even for a simple logistic regression, assessing the diagnostics takes some investment of time and was not trivial for me. For more advanced methods, there are several nuances that became obvious to me only after a LOT of reading!
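    For a sense of what "assessing diagnostics" for a logistic regression involves, here is a minimal sketch using NumPy (an assumed third-party dependency). It fits the model by iteratively reweighted least squares on simulated, hypothetical data, then computes two standard diagnostics: Pearson residuals, (y - p) / sqrt(p(1 - p)), and leverages, the diagonal of the weighted hat matrix. A statistics package will normally report these for you; writing them out once shows what the numbers mean.

```python
import numpy as np


def fit_logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Fit logistic regression by iteratively reweighted least squares (Newton-Raphson).
    X must already include a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: (X' W X)^{-1} X' (y - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return beta, p


def diagnostics(X, y, p):
    """Pearson residuals and leverages (diagonal of W^{1/2} X (X'WX)^{-1} X' W^{1/2})."""
    w = p * (1.0 - p)
    pearson = (y - p) / np.sqrt(w)
    Xw = X * np.sqrt(w)[:, None]
    leverage = np.einsum("ij,jk,ik->i", Xw, np.linalg.inv(Xw.T @ Xw), Xw)
    return pearson, leverage


# Simulated data (hypothetical): one predictor, true intercept -0.5 and slope 1.2.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))).astype(float)

beta, p = fit_logistic_irls(X, y)
pearson, leverage = diagnostics(X, y, p)
print("estimates:", beta.round(3))
print("largest |Pearson residual|:", np.abs(pearson).max().round(2))
print("highest-leverage observations:", np.argsort(leverage)[-3:])
```

    Two quick self-checks: at the fitted values the score X'(y - p) should be essentially zero, and the leverages should sum to the number of coefficients (here 2). Points with both a large Pearson residual and high leverage are the ones worth a closer look.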

    With that said, I have also found that experts often have difficulty explaining concepts to students. Naturally, they may no longer realize what is not obvious to a beginner.

    A few books of exceptional clarity are:
    Agresti, Intro to Categorical Data Analysis (and any book by Agresti)
    De Leeuw, Introducing Multilevel Modeling
    Fitzmaurice, Applied Longitudinal Analysis

    An important feature of the above list is that code to reproduce the analyses is available.

  8. I know the question is about which books are good, but I'm surprised that no one has suggested that maybe they should just hire a professional statistician. If they want to get the most out of their data, it will take a lot (and I mean a lot!) of reading, not to mention practice, to get to the point of doing some good. If we think our skills as statisticians are so easy for others to acquire just by reading a good book, how can we expect to get jobs?

  9. Anon – or by simply taking a course or two!

    I also believe that many courses carry an implicit message that "you should be able to do the correct analyses on your own".

    And I do know of departments that, instead of recruiting a statistician, hire someone from their own discipline who has taken a couple of good statistics courses.

    On the other hand, I think it is hard for anyone (even us statisticians) to apply statistics well.

    Keith

  10. The UCLA statistical computing pages are a resource I go back to again and again. Great examples, plus screencasts and lots of other resources. Very, very much applied.

  11. Good afternoon,
    Thanks for all the great suggestions. Just a few thoughts and comments to round out the discussion. I'm in the process of following your leads on books and web resources.
    By way of background, my grad program (Master of Public Administration) prepared me with one quantitative methods course. As you can guess, that's not nearly enough for anything more than a cursory understanding (e.g., I know the difference between mean, median and mode). My guess is that this level of education, just enough to get you into trouble, is not limited to the program I attended.
    Anon and Keith, I really like your suggestion. In my field (fire service) we frequently recommend that you let the pros do what they're trained for. You usually get much better (cheaper, faster, more accurate) results with less angst along the way. Unfortunately, because we're a small government, permission to hire is more a political decision than a logical one. What I'd really like is to have a pro help us set up a system for our regular, important analyses that we can maintain ourselves in the longer term. But that's still on the wish list.
    Most of what we're trying to accomplish is similar to basic econ questions: making the best use of limited resources (firefighters, fire engines, etc.) based on a reasonable analysis of historical data and modeling of potential future need. Any grad students looking for a project?

    Thanks again for all your feedback, I really appreciate your help.
    David

  12. Late to the game, but I liked Statistics Hacks from O'Reilly as an intro for people with little background. Very basic, but totally applied, if you will.
