Marie-Hélène Burle   email msb2@sfu.ca   twitter MHBurle   github prosoitos

Surviving R:
Finding information, trouble-shooting, getting help

Table of Contents

Clarifications

R, RStudio, and the tidyverse

  • R vs RStudio
  • Base R vs tidyverse (we will see the differences between tibbles vs data frames in a concrete example)
  • Core tidyverse vs other tidyverse packages

R, RStudio, and the tidyverse are 3 totally different things!

Because many RStudio users never open R directly, they do not realize that R is doing the work. R (and I mean R, not RStudio) needs to be updated regularly and you certainly cannot uninstall it.

  • R is a software environment and programming language. Without it, nothing works
  • RStudio is an IDE (integrated development environment) allowing a nicer experience while working in R
  • the tidyverse is an R meta-package (a set of R packages)

So… you can use the tidyverse without using RStudio, you can use RStudio without using any of the tidyverse packages, but you cannot use RStudio without using R. As soon as you use RStudio, you are using R.

Updating

Since all 3 are totally distinct, they need to be updated independently:

  • updating RStudio will in no way bring you to an up to date version of R
  • updating R will not update the tidyverse
  • etc.

Your task: update R, RStudio (if you use it), and all your packages

Warning: there is always a risk to break something with updates. Do them when you have time, don't do them just before an important event during which you will need R.

If you don't want to take this risk now, then don't! but do your updates when you get home.

Some packages will not install or update unless you have administrative privilege:

Run install.packages() directly in R (not RStudio), in a session with administrative privilege.

Working directory, working environment, history

What are all these weird dot files?

.RData/.Rda
.Rds
.Rhistory
.Rproj

What should I save?

Change your RStudio default options now

What is what in R?

  • Data types
  • Data class
  • Functions

Learn the vocabulary!

You really need it:

  1. to google the right terms and find answers to your problems
  2. to understand the help files

Common sources of problems

  • Working environment and file locations

    Where is R running? You might not be where you think you are…

  • File extensions not visible in finder or windows explorer

Go to Finder or Windows explorer now and make your file extensions visible

  • Transformation not saved in an object
  • Unexpected coercion

    str() is your friend.

  • NA

    Think about what they mean:

    • your missing data might not be true NA
    • different NA mean different things and need to be handled differently

Finding information

In R itself

R, as well as any package you have installed, comes with a lot of documentation. This can be invaluable if you do not have internet access.

Manuals

General manuals on R can be found by running:

help.start()

Packages

To get information on a package called <package>, you can run:

package?<package>

For instance package?data.table.

And to get a list of the functions in a package, run:

help(package = <package>)

For instance help(package = data.table).

Note that, for this to work, the package doesn't need to be loaded. But of course, it does need to be installed on your machine.

In the case of packages hosted on CRAN, a pdf containing the information for all the package functions can also be download from the web. Such package manuals are easy to find by googling CRAN <package>.

Functions

Any serious package contains the documentation of every function in help files.

You can get a detailed description of a function called <function> by running:

?<function>

For instance ?map.

For this to work, the package containing this function needs to be loaded. So the above line will only work if you have previously loaded the package purrr. Alternatively, you can run ?purrr::map.

How to make sense of the function help files?

Let's walk through some help files together.

Vignettes

Packages also sometime contain additional information in "vignettes": tutorials on how to use the package.

List vignettes

To list all the vignettes for a package called <package>, run:

vignette(package = "<package>")

For instance vignette(package = "dplyr").

Note that, for this to work, the package doesn't need to be loaded. But of course, it does need to be installed on your machine.

To list all the vignettes from the loaded packages:

vignette(all = F)

To list all the vignettes from all installed packages:

vignette()

Open a vignette

Once you have found the name of a vignette pertaining to the topic you are interested in, you can open it with:

vignette("<vignette>")

For instance vignette("two-table").

Versions information

R and all loaded packages:

sessionInfo()

or

sessioninfo::session_info()

One package only:

packageVersion("<package>")

R only:

version

Online

Online books

Several excellent books on R are-on top of their paper version-available as bookdowns. There are also great manuals and tutorials.

Getting started with R and the tidyverse

The book R for Data Science by Garrett Grolemund and Hadley Wickham is a must read for all beginner/intermediate R users, as well as advanced users not familiar with the tidyverse. This book will get you started with good habits and is an excellent introduction to R.

Go to this book right now (you can find it by googling "r for data science") and bookmark the following chapters:

  • 3 Data visualisation
  • 5 Data transformation
  • 10 Tibbles
  • 11.3.4 Dates, date-times, and times
  • 12.3.1 Gathering
  • 18 Pipes
  • 20 Vectors

Writing readable and well-formatted code

While syntax matters greatly in code execution (e.g. missing quotes, commas, or parenthesis will affect the meaning of your code), R will equally run formatted and non-formatted code.

Code, however, should not simply be written for the machine and should be made as human readable as possible. This is key, for instance, for code sharing and code review. While there are no official R formatting guidelines, Hadley Wickham wrote a short book on R formatting. Google's R Style Guide offers another popular (and quite similar) set of recommendations. Whichever formatting rules you choose, it is important that you commit to them for the sake of formatting consistency.

Of note, when you work on someone else's code, you should adopt their style, again, for the sake of consistency.

Understanding R as a programming language

The book Advanced R by Hadley Wickham will give you a better understanding of R as a programming language and help you get to the next level of R writing. Don't get turned off by the term "advanced". The book is very readable and is useful for R users at all levels to better understand the various types of data, the functioning of R, etc.

The first edition of that book, which focuses on base R rather than on the tidyverse, is also well worth a read.

Writing your own packages

The book R packages by Hadley Wickham will get you started if you want to write your own packages.

The on-line manual Writing R Extensions by the R Core Team gives a more dense and exhaustive documentation if you need something that is not in Hadley's book.

GIS in R

The tutorials An Introduction to Spatial Data Analysis and Visualisation in R by Guy Lansley and James Cheshire as well as the book Geocomputation with R by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow will teach you how to map data and conduct spacial data analysis in R or how to bridge R and QGIS.

Cheatsheets

Who doesn't love cheatsheets? Good news: RStudio and others created great cheatsheets on the tidyverse and a few other packages. If you use the tidyverse, those are absolute must have.

Go to that page right now (you can find it by googling "rstudio cheatsheet") and download:

  • Work with Strings Cheat Sheet
  • Data Import Cheat Sheet
  • Data Transformation Cheat Sheet
  • RStudio IDE Cheat Sheet
  • Data Visualization Cheat Sheet

Note: some of these cheatsheet are accessible from within RStudio, under the help menu.

At SFU

The Research Commons

The SFU Research Commons offers consultations, workshops, and online resources for R.

The Research Commons is also a partner of Software Carpentry and Data Carpentry, now merged under the Carpentries. The Carpentries organize workshops-including workshops on R-regularly. You can find their upcoming workshops on their website.

Library

The SFU library owns several classic books on R. Don't hesitate to talk to a librarian if you need help finding them. And remember that you can also suggest new book acquisitions if important books are missing from the collection.

The Scientific Programming Study Group

SciProg, short for Scientific Programming Study Group, is an SFU student lead group open to anyone interested in learning or sharing programming resources through workshops, hackathons, and other events. R workshops are regularly offered. If you are interested in learning about a particular topic (or if you are interested in giving workshops), get in touch!

Trouble-shooting

  1. Read the error message*
  2. Look for typos (R is case sensitive)
  3. Re-start your R process
  4. Make sure your working directory is where you think it is and your files are where you think they are
  5. Update R, RStudio if you use it, and your packages
  6. Look at the help files of the functions involved
  7. Google using judicious keywords
  8. If relevant, look for explanations and examples in Hadley's books and/or RStudio cheatsheets
  9. Simplify your non running code until it starts running or alternatively start very simple and add elements until the code breaks

*Don't panic as soon as you see something red: some information (for instance when you install new packages) and warnings are also red. They are important to read, but they are not error messages. Most students panic as soon as they see error messages and they do not read those. Error messages are not there to punish you: they are very useful bits of information that are critical to finding a solution. While they may not always make sense, read them several times. You might understand part of it and it can give you hints on how to get started. Error messages are also very useful to look for help on google.

If, after doing all of these, you are stick stuck, then ask for help:

Getting help

Where to ask for help

At SFU

Maybe you can ask for help to your supervisor, or your peers.

The SFU Research Commons offers one-on-one consultations to help you with your R code.

Online

R has a wonderful community and you can also ask for help online.

But different sites and forums have different cultures and you should familiarise yourself with a site before making your first post. You also have to make a reproducible example first or you may get your head chopped off.

Options

How to ask for help

This is critical…

The golden rules

The R community is full of people keen to help you: you will be amazed. But if you want to receive good help, you need to do your part. In order for others to understand your issue and be able to help you, the code that you post online needs to follow 4 (even better 5) rules, which are that it:

  1. makes sense without being run,
  2. can be run,
  3. does not contain sensitive or personal data,
  4. does not use data which needs to be downloaded,
  5. (optionally) does not contain more than is necessary to reproduce the problem.

Let's go over each point. The posted code:

1. Makes sense without being run

This means that it includes the code and its output: not everybody wants to run your code and they may be able to see what is going on just by looking at this.

2. Can be run

Anyone copying your code and running it on their machine should get the output you got. This is necessary for others to test potential solutions without having to do the work of first making up data that looks like yours.

3. Does not contain sensitive or personal data

If your data is sensitive, it needs to be anonymised or you need to make a toy example which mimics the structure of your data.

4. Does not use data which needs to be downloaded

If your code uses, for instance, data from a .csv file, the code alone will not run. Uploading your .csv file or a .rds file for others to download is tedious and many people will not be keen to do so. At best, your question will be ignored, at worse, you will get negative feed-back. You either need to make a toy example which has all the characteristics necessary to replicate your problem or you need to provide a sample of your data using dput(). Hadley Wickham explains how to use dput() to create a reproducible example in his first version of Advanced R.

5. (Optionally) does not contain more than is necessary to reproduce the problem

While not absolutely necessary, reducing your code to the simplest and smallest sample necessary to reproduce your problem will make it easier for others to pinpoint what is going on. Additionally, it is likely that you will find the problem yourself in the process of producing this "minimal reproducible example". The accepted answer to the very popular question how to make a great R reproducible example? on the site Stack Overflow gives all the characteristics of a minimal reproducible example.

Data anonymisation

You can anonymise sensitive information yourself, or you can use the package anonymizer.

Last updated: November 24, 2018