New R GUI: is this the wave of the future?

Ian Fellows writes:

Since you are an R user at the intersection of the social sciences and statistics, I thought some recent work I’ve done might be of interest to you. SPSS has long dominated the teaching and practice of statistics in the social sciences (at least among non-statisticians). I’ve created a new menu-driven data analysis graphical user interface aimed at replacing SPSS (or at least that’s the long-term lofty goal). It has just been released under GPL-2 on CRAN. Feel free to check out some screenshots in the online wiki manual (not yet complete).

I don’t know SPSS, but just yesterday someone told me that people can run R from SPSS and get a convenient menu system, so if this free software had the same capability, that would be great. Here’s the description:

Deducer 0.1 has been released to CRAN

Deducer is designed to be a free, easy to use alternative to proprietary software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and data analysis tasks, and an Excel-like spreadsheet in which to view and edit data frames. The goal of the project is twofold.

1. Provide an intuitive interface so that non-technical users can learn and perform analyses without programming getting in their way.
2. Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible, the GUI should not get in their way if they just want to do some programming.

Deducer is integrated into the Windows RGui and the cross-platform Java console JGR, and is also usable and accessible from the command line. Screenshots and examples can be viewed in the online wiki manual.

Comments and questions are more than welcome. A discussion group has been created for any questions or recommendations.

Deducer Features:

Data manipulation:
1. Factor editor
2. Variable recoding
3. Data sorting
4. Data frame merging
5. Transposing a data frame
6. Subsetting

Analysis:
1. Frequencies
2. Descriptives
3. Contingency tables
a. Nicely formatted tables with optional
i. Percentages
ii. Expected counts
iii. Residuals
b. Statistical tests
i. Chi-squared
ii. Likelihood ratio
iii. Fisher’s exact
iv. Mantel-Haenszel
v. Kendall’s tau
vi. Spearman’s rho
vii. Kruskal-Wallis
viii. Mid-p values for all exact/Monte Carlo tests
4. One sample tests
a. T-test
b. Shapiro-Wilk
c. Histogram/box-plot summaries
5. Two sample tests
a. T-test (Student and Welch)
b. Permutation test
c. Wilcoxon
d. Brunner-Munzel
e. Kolmogorov-Smirnov
f. Jitter/box-plot group comparison
6. K-sample tests
a. ANOVA (usual and Welch)
b. Kruskal-Wallis
c. Jitter/boxplot comparison
7. Correlation
a. Nicely formatted correlation matrices
b. Pearson’s
c. Kendall’s
d. Spearman’s
e. Scatterplot paneled array
f. Circle plot
g. Full correlation matrix plot
8. Generalized Linear Models
a. Model preview
b. Intuitive model builder
c. Diagnostic plots
d. Component residual and added variable plots
e. ANOVA (Type II and III, implementing LR, Wald, and F tests)
f. Parameter summary tables and parameter correlations
g. Influence and collinearity diagnostics
h. Post-hoc tests and confidence intervals with (or without) adjustments for multiple testing
i. Custom linear hypothesis tests
j. Effect mean summaries (with confidence intervals) and plots
k. Exports: residuals, standardized residuals, Studentized residuals, predicted values (linear and link), Cook’s distance, DFBETA, DFFITS, hat values, and covariance ratio
l. Observation weights and subsetting
9. Logistic Regression
a. All GLM features
b. ROC Plot
10. Linear Model
a. All GLM features
b. Heteroskedastic robust tests

I’m not thrilled with how focused this all is on p-values, but I guess that doesn’t really matter; once more capabilities are added, it’s fine that this other stuff is out there. I hope this catches on. I also hope they can set it up to run bayesglm() and lmer()/glmer(), and I recommend that the default regression summaries use the display() rather than the summary() function.

And does it make graphs? That’s key, no?
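A minimal sketch of the recommended summary style, assuming the arm package (which provides both bayesglm() and display()) is installed; the model is invented for illustration using R’s built-in mtcars data:

```r
# A minimal sketch, assuming the arm package is installed.
# bayesglm() is a drop-in replacement for glm() with weakly
# informative default priors; display() prints a compact summary.
if (requireNamespace("arm", quietly = TRUE)) {
  library(arm)
  fit <- bayesglm(am ~ wt + hp, family = binomial, data = mtcars)
  display(fit)  # compact summary: coefficients, SEs, n, deviance
} else {
  message("install.packages('arm') to run this sketch")
}
```

Unlike summary(), display() rounds sensibly and omits significance stars, which is the point of the recommendation.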

23 thoughts on “New R GUI: is this the wave of the future?”

  1. For those interested and posterity, below is an e-mail I wrote in response, with Dr. Gelman's reply:

    ————————————-

    Andrew,

    Thanks for the suggestion. I was not aware of the display function until you mentioned it. It looks like a very good general tool for model summary. For the linear model I actually rolled my own summary function. Deducer does heteroskedastic robust linear regression (I am suspicious of the a priori equal variance assumption), for which there is no appropriate summary function. lmer is on my short list of targets, but anything Bayesian is very far down the road (I haven't even done histograms yet!!!).
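    The robust-variance idea can be sketched in base R. This is just the standard HC1 sandwich estimator for illustration, not Deducer's actual code (in practice one would use sandwich::vcovHC with lmtest::coeftest); the data are simulated:

```r
# Heteroskedasticity-robust (HC1) standard errors for a linear model,
# computed by hand in base R, with simulated heteroskedastic data.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = abs(x))  # error variance depends on x
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- resid(fit)
bread <- solve(crossprod(X))      # (X'X)^-1
meat <- crossprod(X * e)          # X' diag(e^2) X
vc <- bread %*% meat %*% bread * n / (n - ncol(X))  # HC1 small-sample factor
robust_se <- sqrt(diag(vc))
cbind(estimate = coef(fit), robust_se = robust_se)
```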

    After I finished writing my original e-mail I realized that I hadn't mentioned visualizations. A major oversight given your propensities. One thing I've tried hard to do is have every analysis be accompanied by at least one visualization.

    *one sample tests have histograms as well as box and jitter plots

    *t-tests and anovas have box/jitter plots

    *correlations have scatter and circle plots as well as a pretty cool correlation matrix function

    *Regressions have diagnostic, component residual, added variable and effect plots

    *Logistic regression also has ROC plots

    Next on my to-do list (though I'm going to take a little break) is common plotting functions: histograms, scatter, box, jitter, etc. One thing I've found is that it is almost trivial to make a GUI dialog that works, but very time consuming to make one that works well. So these things won't be coming at light speed. I'm also aiming at a plug-in architecture, so that people can contribute their own extensions.

    Cheers,

    Ian

    p.s. I've been waiting with bated breath for your definitive smack down of box plots that you mentioned you were going to do. I'm on the fence about them. I could have missed the post though.

    ———————————-

    Hi, Ian. I'll just say that bayesglm() runs transparently; it "feels" just like running glm() except the results are more stable, so it's effortless to implement. For regressions, I recommend removing the diagnostic, component residual, added variable, and effect plots, and instead replacing them with (a) scatterplots of y vs. the linear predictor with the 45-degree line drawn in, and (b) the results of coefplot(). Residual plots etc. are fine, but the first step is to understand the model. This is a point Jennifer and I discuss in chapters 3 and 4 of our book. Of course, these are just suggestions. As a freeware developer myself, I very much appreciate your efforts in this area, and I hope others follow up on it too.

    Andrew

    P.S. Thanks for the reminder. I'll try to get around to writing my boxplot smackdown.
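    The two suggested displays are easy to sketch in R; the data and model below are invented for illustration, and coefplot() assumes the arm package:

```r
# Sketch of suggestion (a): observed y vs. the linear predictor,
# with the 45-degree line drawn in. Data and model are invented.
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(100)
fit <- lm(y ~ x1 + x2, data = d)

pdf(NULL)  # null graphics device so the sketch runs non-interactively
plot(predict(fit), d$y, xlab = "linear predictor", ylab = "y")
abline(0, 1)  # points hugging this line indicate a good fit
invisible(dev.off())

# Suggestion (b) is one call, given the arm package:
#   arm::coefplot(fit)   # point estimates with +/- 1 and 2 SE bars
```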

  2. This is a great effort because it could be a source of free statistical power for people without the resources to learn R and/or buy SPSS-like software.

  3. John Fox's Rcmdr has similar capabilities (I think; I don't use either), and has some degree of modularity. I believe he wrote it for teaching intro stats at McMaster University.

  4. There are just way too many steps in the installation process. And two of the installation steps require typing commands!

    I think it looks great, but people who like GUIs aren't going to use it unless there's a combined installation package (i.e. one step: download & double-click to install).

  5. Better as "a ladder that once climbed up can be safely kicked aside"

    Learning to do some programming in R is likely as valuable as anything else actually learned in an intro stats course, and though training wheels will likely do more good than harm, a "keep them barefoot and pregnant" interface won't!

    (And many with Master's and even PhD's in Statistics still can't do basic programming other than calling standard routines in one or two stats programs.)

    Keith
    p.s. Recall the old S-Plus "user friendly" GUI and spreadsheet stuff.

  6. I was also thinking about Rcmdr when I started reading the post. As an end user and an R novice, Rcmdr has given me the opportunity to get my data in and do some data screening, and then use particular packages for what I am really interested in doing. As a stats consultant and first-time methods instructor, I plan at least to always recommend exiting any SPSS dialog box with `Paste` (SPSS users know what I am talking about).
    Again about Rcmdr: John Fox has added different plugins for specific procedures (also available at CRAN), and even though he doesn't use Rcmdr himself, he is open to creating plugins.

  7. @Ben Bolker & Manolo Romero
    Regarding Rcmdr… I really like what Dr. Fox has done with it, but have found that the GUI is a bit clunky (due to the toolkit) and doesn't really help me do my analyses faster (which is NOT one of its stated goals). For example, if I want to do 10 t-tests, I have to go through the t-test dialog 10 times, and each time it doesn't remember what my options were the last time. To do the same thing in Deducer requires 6 mouse clicks, and the result is given in a nicely formatted table. I've tried to keep in mind both expert and beginner users in the design, so that ease of use and efficiency go hand-in-hand.

    @Keith O'Rourke:
    I think there is a useful role for graphical interfaces among expert users, just as expert computer users use both the terminal and a GUI file manager.

    @Jesse
    Also take a look at JGR. No debugging though…

  8. I prefer to script and automate things rather than run them in a GUI. Even R's own interpreter is dangerous, because I'll forget what steps I did (yes, I know you can save your state, but that's not exactly easy to search or share with others).

    I like to script everything in files under version control. Then I load them rather than typing directly, so everything I do gets saved. I then write meta-scripts to put whole pipelines together and don't store intermediate files. I run Unix scripts the same way. At least when you're running BUGS, munging the basic data's an insignificant part of the processing time. (Speaking of BUGS, I never run that through its GUI, either.)
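    That workflow can be sketched in a few lines of R. The step files here are written on the fly purely so the sketch is self-contained; in practice they would be real scripts kept under version control:

```r
# Sketch of the scripted workflow described above: each pipeline step
# lives in its own R file, and a meta-script source()s them in order,
# so every action is recorded in the files rather than in interactive
# state. The step files are created here only to make this runnable.
dir <- tempdir()
writeLines("x <- 1:10",      file.path(dir, "01-load.R"))
writeLines("m <- mean(x)",   file.path(dir, "02-analyze.R"))
for (f in c("01-load.R", "02-analyze.R")) {
  source(file.path(dir, f))  # run each step; no untracked state
}
m  # mean of 1:10
```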

  9. This brings up an issue: what do readers think is the best plotting/visualization package out there, one that is capable of quickly visualizing aspects of the data for exploratory data analysis but that can also produce publication-quality figures? I know Andrew likes R and many others do, but coming from CS I find it very unnatural as a programming environment compared to Matlab or Python.

    Python is excellent but its matplotlib plotting package is very incomplete compared to R, and gets many things statisticians care about (like axes) very wrong sometimes….

  10. Part of me admires the R effort, pushing the frontiers of new analysis forward instead of ossifying it to whatever is available currently in SAS (etc.)

    Part of me gets furious, and that's the part these efforts at making R general-purpose are aimed at. Academic versions of Minitab are $99 (perpetual); SAS is $125 (3-year license, limited to 1500 observations). SPSS offers similar huge discounts.

    Note the Minitab and SAS costs are lower than the cost of a typical textbook.

    If these efforts actually succeed, we will replace taxpaying businesses that offer huge academic discounts with R — heavily subsidized (certainly indirectly) and paying no taxes.

  11. Zbicyclist: I have no problem with software developers making money. I just don't want to have to learn Minitab or Stata or whatever just to teach a statistics course. When I use software I'm not familiar with, the result is that students don't learn much about how to use statistical software. When I teach using R, at least some of the students learn something.

  12. @zbicyclist
    God knows I'm not a free software zealot, but I have a hard time crying for multinational corporations. Even if I did this project completely on work time (I didn't) it wouldn't be subsidization, rather investment. My unit alone pays several thousand dollars a year in SPSS licensing fees, and yet we only have a couple copies of SPSS 17, the rest of the researchers are stuck with 12. It's not hard to see that just a small government user base would lead to a net savings.

    It's not like SPSS is (was?) the paragon of great software, either. Version 16 was a terrible mess of bugs. There was a sequence of dialogs you could go through where clicking Save would actually delete the data set permanently. That kind of error could cost lives. See problem 20: http://support.spss.com/ProductsExt/SPSS/Patches/

    At least users of my software know there are going to be a terrible mess of bugs :)

    I'm all for commercial software, just make it good.

  13. I have to take issue with zbicyclist's comment (quite likely a commercial software developer).

    If I understand your argument correctly, you are saying that taxpayers are subsidizing the production of software that is given away, by development at public universities and with grant money.

    First, that is not an accurate view of the world (most free software is not subsidized by taxpayers). Second, the goal of public universities (in this context) is to educate students. It would be silly, dishonest, and immoral for university faculty members to make decisions that are in the best interest of private software developers. We're supposed to make decisions that are in the best interest of the students. What's next, arguing that professors should not hand out "subsidized" lecture notes, because that hurts private textbook sales?

    To carry your argument further, it is inappropriate for someone to get a degree at a public university and then work for SAS, because that might hurt SPSS.

  14. Also, it might happen that free software could help the development of companies that pay taxes (Python, PHP, Linux).

  15. I will accept Anonymous Coward's compliment that I seem like the type of person who merits being paid handsomely for my software development expertise, although I really do algorithmic development that others turn into code.

  16. Try Martin Theus's Mondrian for exploratory data analysis and use R for publication-quality graphics. Looking for software that does both seems to me very ambitious. Graphics software for EDA should be fast, flexible, and efficient, drawing thousands of graphics for one user. Graphics software for presentation should be precise, meticulous, and comprehensive, drawing one graphic for thousands of viewers.

  17. For EDA in a highly interactive and visual context I find JMP (by SAS) to be excellent, and extremely good for teaching intro stats because it is so intuitive. I have used it for years, and will be trying to couple it with R for the more specialized packages available there (e.g., bootstrapping, nomogram generation).

  18. I've played around with a few user interfaces for R. Being a Windows user, I found ESS with Emacs too foreign to be useful. JGR and Tinn-R seem good. For now I'm using StatET with Eclipse. It has many nice features for editing code and managing projects in R. See: http://jeromyanglim.blogspot.com/2009/03/user-int

    However, it would be great if a GUI could be developed that combined the benefits of a great code editor such as StatET with Eclipse and the benefits of a menu-driven GUI. In particular I am thinking about a menu-driven system which can generate code. This could be something like R Commander or the above project. But it would need to integrate with a good code editor.

    This would be particularly beneficial for non-everyday users and learners of R. At present, R places rather large demands on the analyst's memory for keywords and arguments. Improvements to the code editor, plus an optional menu-driven interface, would facilitate the process of acquiring this vocabulary, as well as providing a quick refresher if a keyword is forgotten.

Comments are closed.