11  Functions

11.1 Introduction

So far we have used many built-in functions available in base R such as print, class, sum, etc. In addition, we have used multiple functions from libraries such as dplyr, ggplot2, etc. Learning these functions is vital to performing data analysis in R. Another skill of immense value to a data scientist is writing custom functions.

In simple terms, a function is a name assigned to a block of code; to execute that code, we call the name. There are several advantages to writing functions. Functions allow us to reduce and reuse code. Once we have written a function, it can be called multiple times within the program. Further, a function defined in one program can be called from another program. Because of this versatility, it is important to learn how to write custom functions. The code snippet below defines a new function called test_func. This function has two cat statements: the first prints Hello and the second prints World! The function is called using test_func().

test_func <- function(){
  cat("Hello ")
  cat("World!")
}
test_func()
Hello World!

A function can take some variables as input; these are called arguments. For example, the code below shows a new function called sumSquare, which takes two arguments and prints the square of the sum of the two values given as arguments. Another function, getDomain, takes an email address as an argument and prints the domain name.

sumSquare <- function(a,b){
  c = a**2 + b**2 + 2*a*b
  print(c)
}
sumSquare(2,4)
[1] 36
getDomain <- function(emailID){
  domain <- strsplit(emailID,"@")[[1]][2]
  print(domain)
}
getDomain("manish@bioinfo.guru")
[1] "bioinfo.guru"

The argument(s) for a function need not be just numbers or text; they can be of any data type. The code below shows two versions of the evenSquareOddCube function. The first one takes a single number as input while the second one takes a vector. Note that the ifelse function is vectorized, which is why we can use a vector when checking the condition.

evenSquareOddCube <- function(a){
  if(a%%2 == 0){
    print(a**2)
  }
  else{
    print(a**3)
  }
}
evenSquareOddCube(4)
[1] 16
evenSquareOddCube <- function(a){
  ifelse(a %% 2 == 0, a^2, a^3)
}

evenSquareOddCube(c(1:5))
[1]   1   4  27  16 125
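To see why the first version needs a single number, the sketch below passes a vector to an if-based function. The names scalarVersion and vectorVersion are hypothetical and used only for this illustration. How R reacts depends on the version (an error from R 4.2.0 onwards, a warning in older versions), so the call is wrapped in tryCatch to handle both cases.

```r
# Hypothetical name: an if/else version similar to the first function above
scalarVersion <- function(a){
  if(a %% 2 == 0){
    a**2
  }
  else{
    a**3
  }
}

# if() expects a single TRUE/FALSE; a vector condition is rejected
# (an error in recent R versions, a warning in older ones)
result <- tryCatch(
  scalarVersion(1:5),
  error = function(e) "if() could not handle a vector",
  warning = function(w) "if() could not handle a vector"
)
print(result)

# Hypothetical name: ifelse() checks the condition element by element
vectorVersion <- function(a){
  ifelse(a %% 2 == 0, a^2, a^3)
}
print(vectorVersion(1:5))
```
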

11.2 Returning value from a function

Any variable inside a function has local scope; that is, the values of these variables are not accessible outside that function. When a value from a function needs to be passed on to the main program (after the call to that function is over), the return keyword is used. When calling a function that returns a value, we assign the result of the function call to a variable that can hold the returned value.
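As a minimal sketch of local scope, the example below defines a variable inside a function and then checks whether it is visible after the call. The names scopeDemo and inside_val are hypothetical, and the check assumes no variable called inside_val has been defined elsewhere in the session.

```r
# Hypothetical example: 'inside_val' exists only while scopeDemo() runs
scopeDemo <- function(){
  inside_val <- 10
  return(inside_val * 2)
}

# Assign the returned value to a variable so it can be used later
result <- scopeDemo()
print(result)

# The local variable is not visible in the calling environment
print(exists("inside_val"))
```
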

The code below defines a new function Fibonacci that takes a number as an argument. It calculates the Fibonacci series starting from 1 and generates as many numbers as specified by the argument. It returns the series as a vector. We can use this vector for further calculations, e.g., to compute the sum of all the numbers in the vector.

Fibonacci <- function(total_nums){
  f_nums <- c()
  x <- 0
  y <- 1
  for(ctr in c(1:total_nums)){
    z <- x + y
    f_nums <- append(f_nums,z)
    x <- y
    y <- z
  }
  return(f_nums)
}
nums <- Fibonacci(10)
print(nums)
 [1]  1  2  3  5  8 13 21 34 55 89
print(sum(nums))
[1] 231

From the code above, you’ll be able to appreciate the value of writing custom functions. Here, we have assigned multiple lines of code to a name (Fibonacci in this case). We can call this function as many times as we want with different arguments to get different sequences of numbers.

It is important to note here that any function in R, by default, returns the last computed value within the function even when there is no return statement. For instance, in the code below, calling the test1 function returns the value of y.

test1 <- function(){
  x = 5
  y = x**2
}
print(test1())
[1] 25
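The sketch below compares an implicit and an explicit return side by side; the function names are hypothetical. It also illustrates a related detail: when the last expression is an assignment, as in test1, the value is returned invisibly, which is why the call above is wrapped in print.

```r
# Hypothetical names: two functions that differ only in how they return
implicitSquare <- function(x){
  x**2            # last evaluated expression becomes the return value
}
explicitSquare <- function(x){
  return(x**2)    # same value, returned explicitly
}
print(implicitSquare(5))
print(explicitSquare(5))
print(identical(implicitSquare(5), explicitSquare(5)))

# Hypothetical name: the last expression here is an assignment,
# so the value is returned but not auto-printed at the console
assignLast <- function(){
  y <- 5**2
}
print(withVisible(assignLast())$visible)
```
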

Quiz

Write a function to generate prime numbers. The function should take a number (n) as an argument and return the first n prime numbers.

Solution
prime_nums <- function(n){
  p_nums <- c(2)   # 2 is the first prime number
  x <- 3           # next candidate to test
  while (length(p_nums) < n) {
    FLAG <- TRUE
    # try every possible divisor from 2 to x-1
    for(y in 2:(x-1)){
      if(x %% y == 0){
        FLAG <- FALSE
        break
      }
    }
    if(FLAG){
      p_nums <- append(p_nums,x)
    }
    x <- x + 1
  }
  return(p_nums)
}
  
print(prime_nums(5))
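An alternative sketch of the same quiz is given below. It is not the chapter's solution: it splits the work into a hypothetical helper is_prime and a hypothetical wrapper first_primes, and it only tests divisors up to the square root of the candidate, which is sufficient for trial division.

```r
# Hypothetical helper: TRUE if x is prime, checked by trial division
is_prime <- function(x){
  if(x < 2){
    return(FALSE)
  }
  if(x < 4){
    return(TRUE)     # 2 and 3 are prime
  }
  # a divisor, if any, must appear at or below sqrt(x)
  for(y in 2:floor(sqrt(x))){
    if(x %% y == 0){
      return(FALSE)
    }
  }
  return(TRUE)
}

# Hypothetical wrapper: collect primes until n of them are found
first_primes <- function(n){
  p_nums <- c()
  x <- 2
  while(length(p_nums) < n){
    if(is_prime(x)){
      p_nums <- append(p_nums, x)
    }
    x <- x + 1
  }
  return(p_nums)
}
print(first_primes(5))
```

Keeping the primality check in its own function means it can be reused and tested separately from the loop that collects the results.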

11.3 Calling a function on dataframes

Since most of the time we are working with dataframes, it is important to understand how to call a function using values from a dataframe. The mutate function from the dplyr library makes it easy to call functions using column(s) as arguments. In the code below, we’ll first create a dataframe with three rows and three columns. Then we’ll add a new column by calling the evenSquareOddCube function on the third column. Note that we need to use the vectorized version of our function since the argument (a column) is a vector.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
df1 <- data.frame(matrix(1:9,nrow = 3))
colnames(df1) <- c("Col1","Col2","Col3")
df1
  Col1 Col2 Col3
1    1    4    7
2    2    5    8
3    3    6    9

Apply a function to a column and add a new column

df1 %>%
  mutate(Col4 = evenSquareOddCube(Col3))
  Col1 Col2 Col3 Col4
1    1    4    7  343
2    2    5    8   64
3    3    6    9  729
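If only a non-vectorized function is available, one possible workaround is to apply it to each element of the column with sapply. The sketch below uses base R only; scalarESOC and df_demo are hypothetical names introduced for this illustration.

```r
# Hypothetical scalar function: works on one number at a time
scalarESOC <- function(a){
  if(a %% 2 == 0){
    return(a**2)
  }
  return(a**3)
}

df_demo <- data.frame(Col3 = c(7, 8, 9))

# sapply() calls the function once per element and collects the results
df_demo$Col4 <- sapply(df_demo$Col3, scalarESOC)
print(df_demo)
```
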

Apply a function to multiple columns using the across function. The columns can be mutated in-place, i.e., the original values in the column(s) are replaced with new values. The code below shows two scenarios where we call the evenSquareOddCube function on columns 2 and 3. In the first case the column values are changed in-place, while in the second case two new columns are added to the dataframe. Notice that evenSquareOddCube is called with a dot (.) as an argument; the dot stands for the column currently being processed.

df1 %>%
  mutate(across(c("Col2","Col3"), ~ evenSquareOddCube(.)))
  Col1 Col2 Col3
1    1   16  343
2    2  125   64
3    3   36  729

The .names keyword argument for the across function is used to specify the names for the new columns. Here, we’ll create new column names by taking the original column name and adding a suffix “_ESOC” to it.

df1 %>%
  mutate(across(c("Col2","Col3"), ~ evenSquareOddCube(.), .names = "{col}_ESOC"))
  Col1 Col2 Col3 Col2_ESOC Col3_ESOC
1    1    4    7        16       343
2    2    5    8       125        64
3    3    6    9        36       729
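For comparison, the two scenarios above can also be sketched in base R with lapply. This is an alternative illustration rather than a replacement for across; the variable names cols, df_inplace and df_new are hypothetical.

```r
# Vectorized function from earlier in the chapter
evenSquareOddCube <- function(a){
  ifelse(a %% 2 == 0, a^2, a^3)
}
df1 <- data.frame(matrix(1:9, nrow = 3))
colnames(df1) <- c("Col1","Col2","Col3")

cols <- c("Col2","Col3")

# In-place: overwrite the selected columns
df_inplace <- df1
df_inplace[cols] <- lapply(df_inplace[cols], evenSquareOddCube)
print(df_inplace)

# New columns: same values stored under suffixed names
df_new <- df1
df_new[paste0(cols, "_ESOC")] <- lapply(df_new[cols], evenSquareOddCube)
print(df_new)
```
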

Quiz

Write a function that takes a dataframe of numbers as an argument and returns a new dataframe having “Even” and “Odd” as values corresponding to the numbers in the input dataframe. Sample input and output dataframes are given below.

  Col1 Col2 Col3
1    1    4    7
2    2    5    8
3    3    6    9
  Col1 Col2 Col3
1  Odd Even  Odd
2 Even  Odd Even
3  Odd Even  Odd
Solution
checkEvenOdd <- function(a){
  return(ifelse(a %% 2 == 0, "Even", "Odd"))
}
df1 <- data.frame(matrix(1:9,nrow = 3))
colnames(df1) <- c("Col1","Col2","Col3")
df1 %>%
  mutate(across(c("Col1","Col2","Col3"), ~ checkEvenOdd(.)))
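Since the quiz asks for a function that takes the dataframe itself as its argument, a variant along those lines is sketched below in base R. The names labelEvenOdd and df_in are hypothetical, and the sketch assumes every column of the input is numeric.

```r
# Hypothetical wrapper: takes a dataframe, returns a dataframe of labels
labelEvenOdd <- function(df){
  # df[] keeps the dataframe structure while replacing every column
  df[] <- lapply(df, function(col) ifelse(col %% 2 == 0, "Even", "Odd"))
  return(df)
}

df_in <- data.frame(matrix(1:9, nrow = 3))
colnames(df_in) <- c("Col1","Col2","Col3")
print(labelEvenOdd(df_in))
```
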