Contest for developing an R package recommendation system

After I spoke tonight at the NYC R meetup, John Myles White and Drew Conway told me about this competition they’re administering for developing a recommendation system for R packages. They seem to have already done some work laying out the network of R packages–which packages refer to which others, and so forth.

I just hope they set up their system so that my own packages (“R2WinBUGS”, “R2jags”, “arm”, and “mi”) get recommended automatically. I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output.

P.S. Ajay Shah asks what I mean by that last sentence. My quick answer is that it’s good to be able to visualize the coefficients and the uncertainty about them. The default options of print(), summary(), and plot() in R don’t do that:

– print() doesn’t give enough information
– summary() gives everything to a zillion decimal places and gives useless things like p-values
– plot() gives a bunch of residual and diagnostic plots but no graphs of the fitted model and data.

I like display() because it gives the useful information that’s in summary() but without the crap. I like coefplot() too, but it still needs a bit of work to be generally useful. And I’d also like to have a new function that automatically plots the data and fitted lines.
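
Here is a minimal sketch of what I mean, assuming the arm package is installed. The data are simulated purely for illustration, and plot_fit() is a made-up helper (it is not part of arm) standing in for the "plot the data and fitted lines" function I'd like to have; it only handles a regression with a single predictor.

```r
# Simulated data (an assumption for illustration; not from any real analysis)
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# Compact summary and coefficient plot from arm, if the package is available
if (requireNamespace("arm", quietly = TRUE)) {
  arm::display(fit)    # estimates and standard errors, rounded
  arm::coefplot(fit)   # estimates with +/- 1 and 2 standard-error bars
}

# Hypothetical helper: scatterplot of the data with the fitted line overlaid
plot_fit <- function(fit, x, y) {
  plot(x, y, pch = 20, xlab = "x", ylab = "y")
  abline(fit)              # abline() accepts a single-predictor lm fit
  invisible(coef(fit))     # return the coefficients without printing them
}
plot_fit(fit, x, y)
```

With more than one predictor there is no single fitted line to draw, which is part of why a general version of such a function takes some thought.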

8 thoughts on “Contest for developing an R package recommendation system”

  1. This seems like a variant on task views, based on popularity. Their system will help you find a package to perform a task based on what others are using, while in a task view one or more experts make recommendations. A combination of both concepts would be better: I prefer to use packages with many users (easier to get help), but a recommendation of a specific package by an expert would carry a lot of weight, regardless of its user base.

  2. Fascinating, and some insurmountable opportunities for career management here – getting your package automatically recommended will likely get you more speaking invites – but how long will it be before this has any real weight in tenure decisions…

    On the other hand, it might be the way to have the most impact on statistical practice.

    But there needs to be some awareness that such prediction-algorithm-based recommendation systems may unfairly exclude many from the "action" (which, of course, happens fairly often in informal recommendation systems).

    And I would not be too surprised to hear that groups of people agree to recommend others' packages if they recommend theirs…

    K?

  3. "I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output."

    I.e. Friends don't let friends use summary(lm)?

  4. I do regressions in R and I don't use display() or coefplot(), because I don't usually care too much about the coefficients or their significance, I care about the predictive accuracy. So I do n-fold cross-validation and measure the task-relevant metrics on the results.

  5. Andrew — I wanted to take a look at display() and coefplot(), but could not install or load your 'arm' package.

    > library(arm)
    Error: package 'lme4' required by 'arm' could not be found
    starting httpd help server … done
    Error: package 'lme4' required by 'arm' could not be found
    Error: package 'lme4' required by 'arm' could not be found
    Warning: dependency ‘lme4’ is not available
    trying URL 'http://cran.opensourceresources.org/bin/macosx/leopard/contrib/2.11/arm_1.3-06.tgz'
    Content type 'application/x-tar' length 201808 bytes (197 Kb)
    opened URL

  6. James,

    If you have the Mac OS X DVD, you should install the Xcode developer tools from it. Otherwise, install them from here.

    http://r.research.att.com/tools/

    After this, run the following in R:

    <pre>
    install.packages("lme4", type = "source")
    </pre>

    Then you will have lme4 in R and you can install arm without a problem!

  7. I'm actually also curious as to why coefplot() doesn't work on lmer fits. Is this functionality simply not built in yet, or was it left out intentionally for a good reason that I'm not seeing?

    Thanks for the help.
