Software for Research, Part 3: [R], RStudio and ggplot2 for Statistics


[R] is an excellent open-source statistics language. It's cross-platform and free, and I think it will eventually displace proprietary stats packages thanks to its rapid development, speed and ease of use. So there's no time like the present to start using it.

This figure from the site would appear to support that view (number of posts per month in the main discussion groups):


Like all new programs, it has a bit of a learning curve, especially if you're not used to the command line. But don't let that turn you off: for any statistics beyond the most basic, you're going to end up working with scripts anyway, as it's simply the most efficient way to run analyses. Graphical menus, while useful to begin with, quickly become a hindrance.

There are a few tricks for making your first analysis with [R] a bit simpler. This post covers some of these tips (and will be expanded over time to cover more).
But first, here's the reason most people use [R]: very pretty plots. Note too the very unusual scale on this plot (a reciprobit plot), something that many programs just can't do:
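
The code behind the figure isn't shown here, so what follows is only a minimal sketch of how a reciprobit-style plot could be put together with ggplot2. It assumes the ggplot2 package is installed, uses simulated reaction times rather than real data, and every name in it (rt, latency, cum_prob) is made up for illustration. The idea is simply to put the x axis on a reciprocal scale and the cumulative probability on a probit scale.

```r
# Sketch only: simulated data, not the data behind the figure above.
library(ggplot2)

set.seed(1)

# Hypothetical reaction times in seconds. The reciprocal of latency is drawn
# from a normal distribution, so the points should fall close to a straight
# line once both axes are transformed.
latency <- 1 / rnorm(200, mean = 5, sd = 1)

rt <- data.frame(latency = sort(latency))
n <- nrow(rt)

# Empirical cumulative probability, offset by half a step so that every value
# stays strictly between 0 and 1 (the probit of exactly 0 or 1 is infinite).
rt$cum_prob <- (seq_len(n) - 0.5) / n

p <- ggplot(rt, aes(x = latency, y = cum_prob)) +
  geom_point() +
  # Built-in reciprocal transformation from the scales package.
  # Newer ggplot2 releases also accept this argument under the name `transform`.
  scale_x_continuous(
    trans = "reciprocal",
    breaks = c(0.125, 0.15, 0.2, 0.3, 0.5)
  ) +
  # Built-in probit transformation (normal quantiles of the probability).
  scale_y_continuous(
    trans = "probit",
    breaks = c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99),
    labels = function(prob) paste0(100 * prob, "%")
  ) +
  labs(
    x = "Latency (s, reciprocal scale)",
    y = "Cumulative probability (probit scale)"
  )

print(p)
```

With the plain reciprocal transformation, short latencies end up on the right of the axis. Reciprobit plots are often drawn the other way round, which would need a custom transformation (for example one based on -1/x); that is left out to keep the sketch short.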


Installing [R]
The installation procedure varies a bit between operating systems, but it's clearly described on the main [R] site. Pick a local mirror and download the Linux/Mac/Windows version from there.
It's also really worth downloading RStudio, a nice user interface for [R], from their site.

Using [R] and ggplot
There are a number of ways to work with [R]. If you're familiar with other statistics programs, you may have used scripting options before. [R] is much more powerful than just running sequential scripts: it is a programming environment in its own right.
While you can work in the [R] console directly, most people tend to use a type of 'scrapbook workflow' in which a text editor is used to write a 'source file'. This is then run by [R], either by copying it into the [R] console or by using a plugin that links your text editor to [R].
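
As a rough illustration of that workflow, here is a small sketch using only base R. The script contents and variable names are invented for the example, and the file is written to a temporary location purely to keep the example self-contained; normally you would write and save the source file in your text editor.

```r
# Write a small, hypothetical source file to a temporary location.
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(2.1, 3.4, 1.9, 5.6)",
  "x_mean <- mean(x)",
  "x_sd <- sd(x)"
), script)

# Run the whole file in one go. echo = TRUE prints each command as it is
# evaluated, which looks much like pasting the lines into the console.
source(script, echo = TRUE)

# Objects created by the script are now available in the session.
print(x_mean)
print(x_sd)
```

Re-running source() after each edit to the file is essentially what an editor plugin does for you behind the scenes.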

[R] works nicely with a text editor that does code colouring. My favourite on OS X is TextMate, which has a bundle specifically for interacting with [R], here.

Here is a full list of text editors that play nice with R on all platforms.

If you haven't worked in this way before, it can be a little daunting to start off with, as you need to keep track of files and graphical windows while also typing commands. Eventually you end up with a workflow that suits you best, but RStudio does all of this for you, so it's a great place to start.

I've been using RStudio since the start of October (2011) and have been very impressed with its clean workflow and the way in which the source file and console are integrated. It also has a web interface which runs on a Linux server. Try the one running on my test server if you'd like to try [R] without installing it yourself:

username: rstudio_test

password: testing


More about RStudio
It's rare that a software project makes a large impact early on, but the concept, programming and support which accompany RStudio are superlative. I've had a few questions along the way and my thanks go to Josh Paulson, one of the RStudio developers, who has been extremely helpful. The online support docs are also in good order.

A software project also needs recognition and I was pleasantly surprised to hear a colleague say the other day over coffee, "Have you tried RStudio?". Clearly the developers are filling a niche with a lot of demand!

So to get back to the advantages of running RStudio server side:

The best thing about the server version is that you can pick up your session from where you left off no matter which computer you are using. In fact, it's such a comfortable way of working with [R] that I would recommend it for routine use. It also opens up the possibility of sharing analyses easily (although user management needs to be done manually at the moment). will eventually offer a hosted solution but if you'd like an account on my server, drop me a message in the contact form.

Getting help - Stack Overflow

There are a number of [R] mailing lists, but I have found Stack Overflow to be one of the most useful resources. Responses are usually very quick, and the format of the site makes it a great reference too.


Michael MacAskill

Hi, great post, although I've stumbled across it a few years late. Any chance of sharing the ggplot code you used to get the scales right for the reciprobit plot?



11-08-2014 12:31 am