# Functional programming in R (with *purrr*)

## Introduction

### What is functional programming?

Functional programming is a programming paradigm based on the evaluation of functions, as opposed to *imperative programming*, which is based on statements that change the state of a program. While some languages are strictly functional (e.g. Haskell), R allows both imperative code (e.g. loops) and functional code (e.g. many base functions, the `apply()` family, `purrr`).

### Iterations

Iterations are the repetition of a process (e.g. applying the same function to several variables, several datasets, or several files).

The classic methods in R are:

- loops
- the `apply()` family of functions
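To make the two classic approaches concrete, here is a minimal sketch that computes column means once with a `for` loop and once with `sapply()`. The data frame `measurements` and its values are made up for this illustration.

```r
# Hypothetical toy data, invented for this illustration
measurements <- data.frame(
  mass   = c(41.2, 44.8, 43.1),
  tarsus = c(26.5, 27.3, 27.0)
)

# Imperative version: pre-allocate the result, then fill it in a loop
means_loop <- numeric(ncol(measurements))
names(means_loop) <- names(measurements)
for (col in names(measurements)) {
  means_loop[col] <- mean(measurements[[col]])
}

# Functional version: hand the function itself to sapply()
means_apply <- sapply(measurements, mean)
```

Both versions give the same named numeric vector; the functional one simply leaves the bookkeeping (allocation, indexing, naming) to `sapply()`.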

### The purrr package

One of the `tidyverse` core packages, `purrr` was written in 2015 by Lionel Henry (also the maintainer), Hadley Wickham, and RStudio, Inc.

#### Goal

`purrr` is a set of tools allowing consistent functional programming in R in a `tidyverse` style (using `magrittr` pipes and following the same naming conventions found in other `tidyverse` packages).

As Hadley Wickham says, in many ways, `purrr` is the equivalent of `dplyr`, but while `dplyr` focuses on data frames, `purrr` works on vectors: it works on the elements of atomic vectors, lists, and data frames. Since R's most basic data structure is the vector, this makes `purrr` extremely powerful and flexible.

#### Logistics

Install it with:

install.packages("tidyverse") ## or install.packages("purrr")

Load it with:

library(tidyverse) ## or library(purrr)

As always, once the package is loaded, you can get information on the package with:

?purrr

and on any of its functions with:

?function_name ## e.g. for the map function: ?map

## Let's dive in

### Load packages

First, let's load the packages that we will use. It is always a good idea to load all the packages you will use at the top of the script, so that anyone running it knows what it requires.

    library(tidyverse) # we will use purrr and other core packages
    library(magrittr)  # we will use several types of pipes

### Create some fake banding data

Let's create some imaginary bird banding data:

    banding <- tibble(
      bird = paste0("bird", 1:50),
      sex = sample(c("F", "M"), 50, replace = TRUE),
      population = sample(LETTERS[1:3], 50, replace = TRUE),
      mass = rnorm(50, 43, 4) %>% round(1),
      tarsus = rnorm(50, 27, 1) %>% round(1),
      wing = rnorm(50, 112, 3) %>% round(0)
    )

    banding

### Map: apply functions to elements of a list

Imagine that you want to calculate the mean for each of the morphometric measurements (mass, tarsus, and wing).

How would you usually do this?

Spend 5 minutes writing code you would usually use.

To apply functions to elements of a list, you can use `map()`, one of the key functions of the `purrr` package.

#### Usage

map(.x, .f, ...)

- `.x`: a list or atomic vector
- `.f`: a function, formula, or atomic vector
- `...`: additional arguments passed to `.f`

For every element of `.x`, apply `.f`.

What we have, in the simplest case, is:

map(list, function)
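As a minimal sketch of this simplest case, the example below applies `mean()` to each element of a small list. The list `measurements` and its values are made up for illustration.

```r
library(purrr)

# A small made-up list of numeric vectors
measurements <- list(
  mass   = c(41.2, 44.8, 43.1),
  tarsus = c(26.5, 27.3, 27.0)
)

# Apply mean() to every element; the result is a list with the same names
means <- map(measurements, mean)

# Arguments given after .f are passed on to it (here: trim = 0.1 for mean())
trimmed <- map(measurements, mean, trim = 0.1)
```

The output has one element per input element and keeps the input names, so `means$mass` holds the mean of `measurements$mass`.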

#### In our example

How could we use `map()` to calculate the means of all 3 measurement types?

A data frame is a list! It is a list of vectors.

Without running it on your computer, try to guess what the result of the following will be:

length(banding)

Now, run it. What do you get? Why?

So, back to our example, we do have a list: a list of vectors. That's what our banding data frame is! So there is no problem applying `map()` to it.

Answer

map(banding[4:6], mean)

or using a pipe

banding[4:6] %>% map(mean)

However, the output of `map()` is always a list, and a list is not really convenient here. Other map functions return vectors or data frames. To get a numeric vector as the output, we use `map_dbl()`:

Answer

map_dbl(banding[4:6], mean)

or

banding[4:6] %>% map_dbl(mean)
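The following sketch contrasts `map()` with some of its typed variants on a small made-up list (the object names are hypothetical). It also shows that a typed variant fails rather than silently returning a different type.

```r
library(purrr)

# Made-up list for illustration
measurements <- list(
  mass = c(41.2, 44.8, 43.1),
  wing = c(110, 113, 112)
)

as_list   <- map(measurements, mean)       # list of length 2
as_double <- map_dbl(measurements, mean)   # named numeric vector
as_int    <- map_int(measurements, length) # named integer vector
as_text   <- map_chr(measurements, class)  # named character vector

# The means are not whole numbers, so asking for integers raises an error
not_int <- try(map_int(measurements, mean), silent = TRUE)
```

Choosing the variant up front documents the type you expect and turns a surprise into an immediate error.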

Similarly, you can calculate the variance, the sum, look for the largest value, or apply any other function to our data.

Spend 2 min writing code for these.

Answer

    map_dbl(banding[4:6], var)
    map_dbl(banding[4:6], sum)
    map_dbl(banding[4:6], max)

#### Stepping things up

Now, imagine that you would like to plot the relationship between tarsus and mass for each population.

How would you usually do that?

Spend 5 min writing code for this.

And feel free to chat.

Answer

You could write a for loop:

    for (i in unique(banding$population)) {
      print(
        ggplot(banding %>% filter(population == i), aes(tarsus, mass)) +
          geom_point()
      )
    }

But this is the functional programming method:

    banding %>%
      split(.$population) %>%
      map(~ ggplot(., aes(tarsus, mass)) + geom_point())

Let's save those graphs in a variable called `graphs` that we will use later.

    graphs <- banding %>%
      split(.$population) %>%
      map(~ ggplot(., aes(tarsus, mass)) + geom_point())
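The split-then-map pattern used for the plots works with any function. As a sketch, the example below uses a made-up miniature data set (`banding_small` is a hypothetical name) and computes a mean per population instead of drawing a plot, which makes the intermediate list easier to inspect.

```r
library(purrr)

# Made-up miniature version of the banding data
banding_small <- data.frame(
  population = c("A", "A", "B", "B", "C"),
  mass       = c(40, 44, 42, 46, 43)
)

# split() returns a named list with one data frame per population
by_population <- split(banding_small, banding_small$population)

# map_dbl() then applies the formula to each of these data frames
mean_mass <- map_dbl(by_population, ~ mean(.x$mass))
mean_mass
# A = 42, B = 44, C = 43
```

The names of the list produced by `split()` carry through to the output, which is why `graphs` ends up with one named element per population.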

#### Formulas

Formulas are a shorter notation for anonymous functions.

##### With one element

The code:

map(function(x) x + 3)

which contains the anonymous function `function(x) x + 3`, can be written as:

map(~ . + 3)

This code abbreviation is called a "formula".

Your turn: write the following anonymous function as a formula.

map(function(x) mean(x) + 3)

Answer

map(~ mean(.) + 3)

##### With 2 elements

The code:

map2(function(x, y) x + y)

can be shortened to:

map2(~ .x + .y)

##### Referring to elements

1st element | 2nd element | 3rd element | |
---|---|---|---|
`.` | | | |
`.x` | `.y` | | |
`..1` | `..2` | `..3` | etc. |

Your turn: write the following anonymous function as a formula.

pmap(function(x1, x2, y) lm(y ~ x1 + x2))

Answer

pmap(~ lm(..3 ~ ..1 + ..2))
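The snippets in this section show only the function argument. Here is a complete, runnable sketch of the three notations on made-up integer vectors (the result names are hypothetical).

```r
library(purrr)

# One input: `.` (or `.x`) stands for the current element
with_function <- map_dbl(1:3, function(x) x + 3)
with_formula  <- map_dbl(1:3, ~ . + 3)

# Two inputs: `.x` and `.y`
two_inputs <- map2_dbl(1:3, 4:6, ~ .x + .y)

# Any number of inputs, passed as a list: `..1`, `..2`, `..3`, ...
three_inputs <- pmap_dbl(list(1:3, 4:6, 7:9), ~ ..1 + ..2 + ..3)
```

`with_function` and `with_formula` are identical, `two_inputs` is `5 7 9`, and `three_inputs` is `12 15 18`.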

#### `map_if`/`modify_if` and `map_at`/`modify_at`

We built our data frame with `tibble()` which, as is the norm in the `tidyverse`, does not transform strings into factors:

    banding <- tibble(
      bird = paste0("bird", 1:50),
      sex = sample(c("F", "M"), 50, replace = TRUE),
      population = sample(LETTERS[1:3], 50, replace = TRUE),
      mass = rnorm(50, 43, 4) %>% round(1),
      tarsus = rnorm(50, 27, 1) %>% round(1),
      wing = rnorm(50, 112, 3) %>% round(0)
    ) %T>%
      str()

Several base R functions, however, do, or at least did: since R 4.0.0, `data.frame()` and `read.csv()` default to `stringsAsFactors = FALSE`.

Let's build the same data with the base R function `data.frame()`, setting `stringsAsFactors = TRUE` explicitly so that we get factors in every version of R:

    banding <- data.frame(
      bird = paste0("bird", 1:50),
      sex = sample(c("F", "M"), 50, replace = TRUE),
      population = sample(LETTERS[1:3], 50, replace = TRUE),
      mass = rnorm(50, 43, 4) %>% round(1),
      tarsus = rnorm(50, 27, 1) %>% round(1),
      wing = rnorm(50, 112, 3) %>% round(0),
      stringsAsFactors = TRUE
    ) %T>%
      str()

The reason several base R functions transform strings into factors is historic. This used to be essential to save space. But this is not relevant anymore and has become somewhat of an annoyance.

If you have such a data frame, you may wish to transform the factors into characters.

How can you do this?

`map()` has the derivatives `map_if()` and `map_at()`, which let you apply a function only when a condition is met or only at certain locations. Here, we can use `map_if()`:

banding %>% map_if(is.factor, as.character) %T>% str()

However, `map_if()` and `map_at()` always return lists. If you want the output to be of the same type as the input, use `modify_if()` and `modify_at()` instead.

    banding <- data.frame(
      bird = paste0("bird", 1:50),
      sex = sample(c("F", "M"), 50, replace = TRUE),
      population = sample(LETTERS[1:3], 50, replace = TRUE),
      mass = rnorm(50, 43, 4) %>% round(1),
      tarsus = rnorm(50, 27, 1) %>% round(1),
      wing = rnorm(50, 112, 3) %>% round(0),
      stringsAsFactors = TRUE # needed since R 4.0.0 to get factor columns
    )

    banding %>%
      modify_if(is.factor, as.character) %>%
      head() %T>%
      str()

This could also be accomplished with `mutate_if()`:

banding %>% mutate_if(is.factor, as.character)

But the `map()` functions also work with lists and are more flexible than `mutate()` and its derivatives.
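The difference in return type between `map_if()` and `modify_if()` can be checked directly. This is a minimal sketch on a made-up three-row data frame (`birds` is a hypothetical name) whose factor column is created explicitly, so it behaves the same in any version of R.

```r
library(purrr)

# Made-up data frame with one factor column
birds <- data.frame(
  sex  = factor(c("F", "M", "F")),
  mass = c(41.2, 44.8, 43.1)
)

as_list  <- map_if(birds, is.factor, as.character)    # plain list
as_frame <- modify_if(birds, is.factor, as.character) # still a data frame

is.data.frame(as_list)  # FALSE
is.data.frame(as_frame) # TRUE
```

In both results the `sex` column has become a character vector and `mass` is untouched; only the container differs.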

#### Usage

    modify(.x, .f, ...)
    modify_if(.x, .p, .f, ...)
    modify_at(.x, .at, .f, ...)

- `.x`: a list or atomic vector
- `.f`: a function, formula, or atomic vector
- `...`: additional arguments passed to `.f`
- `.p`: a predicate function. Only the elements for which `.p` evaluates to `TRUE` will be modified
- `.at`: a character vector of names or a numeric vector of positions. Only the elements corresponding to `.at` will be modified

For every element of `.x`, apply `.f`, and return a modified version of `.x`.

So basically, in its simplest form, we have:

modify(list, function)
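Selecting by location works the same way. The sketch below uses `modify_at()` on a made-up data frame (`birds` is a hypothetical name) to round two columns chosen by name and leave the rest alone.

```r
library(purrr)

# Made-up data for illustration
birds <- data.frame(
  bird = c("bird1", "bird2"),
  mass = c(41.23, 44.87),
  wing = c(110.4, 113.6)
)

# Round only the columns named in .at; the others are returned unchanged
rounded <- modify_at(birds, c("mass", "wing"), round)
```

`rounded` is still a data frame, with `mass` equal to `41 45`, `wing` equal to `110 114`, and `bird` unchanged.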

### Walk: apply side effects to elements of a list

Now, we want to save the 3 graphs we previously drew into 3 files.

How would you do this?

Spend 5 minutes writing code you would usually use.

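To apply side effects to elements of a list, we use the `walk` family of functions. A side effect is anything a function does besides returning a value, such as printing, plotting, or writing a file.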

#### Usage

walk(.x, .f, ...)

- `.x`: a list or atomic vector
- `.f`: a function, formula, or atomic vector
- `...`: additional arguments passed to `.f`

#### Apply to our example

We already have a list of graphs: `graphs`. Now, we can create a vector of paths where we want to save them:

paths <- paste0("population_", names(graphs), ".png")

So we want to save each element of `graphs` into the file named by the corresponding element of `paths`. The function we will use is `ggsave()`. To apply it to all of our elements, instead of using `map()`, we will use `walk()` because we are not trying to create a new object.

The problem is that we have 2 lists to deal with. `map()` and `walk()` only handle one list. But `map2()` and `walk2()` handle 2 lists, and `pmap()` and `pwalk()` handle any number of lists.

Here is how `walk2()` works (it is the same for `map2()`):

walk2(.x, .y, .f, ...)

- `.x`, `.y`: vectors of the same length. A vector of length 1 will be recycled.
- `.f`: a function, formula, or atomic vector
- `...`: additional arguments passed to `.f`
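The recycling rule is easiest to see with `map2_dbl()`, which follows the same rule as `walk2()` but returns its results. This is a small sketch with made-up numbers.

```r
library(purrr)

# .y has length 1, so the same value is reused for each element of .x
scaled <- map2_dbl(c(1, 2, 3), 10, ~ .x * .y)

# Lengths 3 and 2 cannot be matched up, so this call raises an error
mismatch <- try(map2_dbl(c(1, 2, 3), c(10, 20), ~ .x * .y), silent = TRUE)
```

`scaled` is `10 20 30`, while `mismatch` holds the error: only length-1 inputs are recycled.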

Give it a try: use `walk2()` to save the elements of `graphs` into the elements of `paths` using `ggsave()`.

Don't hesitate to look up the help file for `ggsave()` with `?ggsave` if you don't remember how to use it!

Answer

walk2(paths, graphs, ggsave)
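The order of the two inputs matters: `ggsave()` takes the file name first and the plot second, which is why `paths` comes before `graphs`. The sketch below shows the same pattern without any plotting, using base R's `writeLines()`, whose first two arguments are the text and then the file. The `reports` list and the file names are made up for illustration.

```r
library(purrr)

# Stand-ins for the graphs: short made-up text reports, one per population
reports <- list(A = "report for A", B = "report for B")

# One output path per element, written to a temporary directory
paths <- file.path(tempdir(), paste0("population_", names(reports), ".txt"))

# writeLines(text, con) takes the content first and the file second,
# so the content goes in .x and the paths in .y
result <- walk2(reports, paths, writeLines)
```

`walk2()` returns its first input invisibly (here `reports`), so it can sit in the middle of a pipeline without changing the data that flows through it.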

## Summary of the map and walk functions family

We use a different `map` (or `walk`, if we want side effects) function depending on:

- How many lists we are using in the input

number of arguments in input | purrr function |
---|---|
1 | `map` or `walk` |
2 | `map2` or `walk2` |
more | `pmap` or `pwalk` |

- The class of the output we want

class we want for the output | purrr function |
---|---|
nothing* | `walk` |
list* | `map` |
double | `map_dbl` |
integer | `map_int` |
character | `map_chr` |
logical | `map_lgl` |
data frame (by row-binding) | `map_dfr` |
data frame (by column-binding) | `map_dfc` |

Results are returned predictably and consistently, which is not the case with `sapply()`.

*As Jenny Bryan said nicely: "`walk()` can be thought of as `map_nothing()`; `map()` can be thought of as `map_list()`".

- How we want to select the input

selecting input based on | purrr function |
---|---|
condition | `map_if` |
location | `map_at` |
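The earlier remark that results are returned predictably, unlike with `sapply()`, can be illustrated with a short sketch. The inputs are made up; the point is that `sapply()` returns a vector, a matrix, or a list depending on what the function gives back, while `map_dbl()` always returns a double vector or stops with an error.

```r
library(purrr)

inputs <- list(1:3, 4:6)

# sapply() chooses its return type from whatever it gets back
s_vector <- sapply(inputs, mean)  # numeric vector
s_matrix <- sapply(inputs, range) # 2 x 2 matrix
s_empty  <- sapply(list(), mean)  # empty list, not numeric(0)

# map_dbl() returns a double vector in every case, or raises an error
m_vector <- map_dbl(inputs, mean)
m_empty  <- map_dbl(list(), mean) # numeric(0)
```

The empty-input case is the one that tends to cause trouble in scripts: code that expects a numeric vector keeps working with `m_empty`, but not necessarily with `s_empty`.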

## Conclusion

These are some of the most important `purrr` functions. But there are many others and I encourage you to explore them by yourself.

Great resources for this are:

- The iteration chapter of Hadley Wickham's book R for data science
- The purrr cheatsheet
- The purrr CRAN manual
- The vignettes and help files for the many purrr functions

Have fun!!!