We use R a lot. R takes care of many of our basic data management needs. R is an awesome statistical analysis package. R allows you to produce exceptional data graphics. The only problem is … R has a wicked learning curve. In this post we provide tips on learning R for the first time and pointers to some of the most useful books and documentation we’ve come across.
R’s wicked learning curve is probably not surprising given that R was written by and for academic statisticians in a loosely coupled, open source environment. It is helpful to understand a little of the history of R and how it relates to both S and S-PLUS.
R, the statistical software environment, is an implementation of S, the statistical programming language. The S language was developed at AT&T Bell Labs by John Chambers and others in the late 1970s and early 1980s. The goal of the language was “to turn ideas into software, quickly and faithfully.” Back in the days when coding statistical analyses involved making calls to Fortran subroutines, the S language provided a way for statisticians to harness numerical analyses without becoming full-time programmers.
The two main implementations of the S programming language are open source R and commercial S+. The most obvious difference between them, besides price, is the more integrated nature of S+, complete with IDE (Integrated Development Environment). In R’s favor is the large and growing community of R developers writing packages that continually enhance R’s functionality. For scientific applications, where the data and analyses are less routine than in the business world, we favor R.
Learning R for the First Time
When approaching R for the first time it is important to let go of some of what you know about programming. To many programmers, R has annoyingly unexpected behavior: R has several different object types that behave differently in different situations; R remembers things you wouldn’t expect; R package methods, being developed by individuals, don’t always agree on argument names or behavior. This is frustrating if you are only concerned with writing code.
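A minimal sketch of the kind of surprise we mean, using only base R. The variable names are ours, chosen for illustration; the point is that factors, matrices and lists each respond differently to conversions and indexing that look the same on the surface.

```r
# A factor stores integer codes plus labels, so converting it directly
# to numeric returns the codes, not the original values.
f <- factor(c("10", "20", "5"))
codes  <- as.numeric(f)                 # the internal level codes
values <- as.numeric(as.character(f))   # the numbers you probably wanted

# Extracting one row of a matrix silently drops to a plain vector
# unless drop = FALSE is given.
m <- matrix(1:6, nrow = 2)
row_vec <- m[1, ]
row_mat <- m[1, , drop = FALSE]

# Single brackets on a list return a list; double brackets return the element.
lst <- list(a = 1:3, b = "text")
sub_list <- lst["a"]
element  <- lst[["a"]]

print(codes)
print(values)
print(is.matrix(row_vec))
print(is.matrix(row_mat))
print(class(sub_list))
print(class(element))
```

None of this is a bug. Each behavior makes sense once you know what the object is for, which is why it pays to learn the object types early.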
However, R has an incredible amount of statistical and data visualization smarts that make doing statistics easy. So it is important to learn about R from the point of view of a statistician rather than the point of view of a programmer. Our favorite introduction to statistics with R is John Verzani’s “Using R for Introductory Statistics” which is available in print or as a PDF.
We recommend going through the entire book a chapter at a time. It is important to understand the statistical concepts built into R before attempting to harness them to do work. For many tasks, there is an R function that already does what you want. Those who refuse to read up and learn about this powerful tool will end up writing hundreds of lines of ‘programmer code’ where only a line or two of ‘R code’ is needed.
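To illustrate the difference, here is a small sketch that computes group means twice: once with explicit loops and once with the base R function tapply(). The data values and variable names are made up for the example.

```r
# Made-up measurements and the group each one belongs to.
weights <- c(4.2, 5.1, 3.9, 6.0, 5.5, 4.8)
groups  <- c("a", "b", "a", "b", "b", "a")

# 'Programmer code': loop over groups, then over observations.
group_names <- unique(groups)
loop_means <- numeric(length(group_names))
names(loop_means) <- group_names
for (i in seq_along(group_names)) {
  total <- 0
  count <- 0
  for (j in seq_along(weights)) {
    if (groups[j] == group_names[i]) {
      total <- total + weights[j]
      count <- count + 1
    }
  }
  loop_means[i] <- total / count
}

# 'R code': one call does the same job.
r_means <- tapply(weights, groups, mean)

print(loop_means)
print(r_means)
```

Both versions give the same means. The second is shorter, and it also reads like the statistical operation it performs.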
While you are going through Verzani’s examples you should take extra time to examine R’s built-in documentation. Like Unix man pages, help in R is easily accessible, ASCII formatted and informationally dense. Reading the help for each function you use will soon get you familiar with the full power of R’s functions. You can access R’s help facility by typing ?help at the R prompt.
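A few ways into the help system, sketched with base R functions only. We use mean() as the example topic; any function name would do.

```r
# Open the help page for a function; the two forms are equivalent.
?mean
help("mean")

# Run the examples from the bottom of a help page.
example(mean)

# List objects whose names match a pattern, handy when you only
# remember part of a function name.
found <- apropos("^mean")
print(found)
```

When you don’t know the function name at all, help.search("some phrase") searches the installed documentation by keyword.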
The last thing you’ll need while getting started is a list of the most important commands. Our favorite list is the R Reference Card
described below. Print out all four pages and tape them to your desk
while working through Verzani’s book. If you’re really intent on
learning R, the payoff will be commensurate with the time you invest in it.
- Buy or download Verzani’s “Using R for Introductory Statistics”
- Download and print out the R Reference Card
- Work through the examples in Verzani’s book, using ?help to learn more.
- Explore the functions listed in the reference card.
The rest of this post will be a compilation of R resources that we
have used and heartily recommend. If you have your own favorites,
please add them as a comment.
- Using R for Introductory Statistics (pdf)
- John Verzani’s primer is an excellent place to begin learning about both statistics and R.
- Applied Spatial Data Analysis with R
- Authors Bivand, Pebesma and Gómez-Rubio are key developers of R’s
spatial capabilities. This book, released in 2008, explains many of the
newer developments that enable R to take on some of the spatial
analyses that had previously been the domain of GIS systems. If you
work with spatial data, this book is a must.
Official R Web Pages
- R Home page
- The starting point for documentation, downloads, packages, etc.
Unofficial Web Sites
- Quick R
- The documentation on this site, maintained by Rob Kabacoff, is the
simplest to navigate, easiest to understand compilation of R
documentation we have come across. If you’re looking for web-based
documentation, come here first.
- R Graph Gallery
- Romain Francois maintains this site of amazing data visualizations
created with R. Each visualization comes with the code that generated
it so this is an excellent way to get inspired about data graphics.
Romain also maintains a related blog highlighting his R development efforts.
- Using Color in R
- Earl Glynn’s slide presentation is an excellent review of the use of
color in scientific graphics in general and R in particular. Color
blindness and palettes for dichromats are covered on slides 33-37.
- R Reference Card
- This reference card is one that you will eventually want to have
memorized. In the meantime, print it out and post it on your wall or
have it open in another window while you’re learning R.
- R Color Chart
- If you are a data-vis perfectionist you will want to have this chart handy.
- Data Analysts Captivated by R’s Power (New York Times — Jan 6, 2009)