So far we have used many built-in functions available in base R, such as print, class, and sum. In addition, we have used several functions from libraries such as dplyr and ggplot2. Learning these functions is vital to performing data analysis in R. Another skill of immense value for a data scientist is writing custom functions.
In simple terms, a function is a name assigned to a block of code; to execute this code, we call that name. There are several advantages to writing functions. Functions allow us to reduce and reuse code. Once we have written a function, it can be called multiple times within the program. Further, a function defined in one program can be called from another program. Because of this versatility, it is important to learn how to write custom functions. The code snippet below defines a new function called test_func. This function has two cat statements: the first prints Hello and the second prints World! The function is called using test_func().
test_func <- function(){
  cat("Hello ")
  cat("World!")
}
test_func()
Hello World!
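As a rough illustration of calling a function defined in another program, the sketch below writes a small script to a temporary file and loads it with source(). The function name greet and the variable script_path are hypothetical and chosen only for this example; in practice the script would be an existing .R file.

```r
# Write a hypothetical helper script to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "greet <- function(){",
  "  cat(\"Hello \")",
  "  cat(\"World!\\n\")",
  "}"
), script_path)

# source() runs the script, which defines greet() in the current session.
source(script_path)
greet()
greet()  # the same function can be called as many times as needed
```

Keeping reusable functions in a separate file and loading them with source() is one simple way to share code between scripts.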
A function can take some variables as input; these are called arguments. For example, the code below shows a new function called sumSquare, which takes two arguments and prints the square of the sum of the two values. Another function, getDomain, takes an email ID as an argument and prints the domain name.
sumSquare <- function(a,b){
  c = a**2 + b**2 + 2*a*b
  print(c)
}
sumSquare(2,4)
[1] 36
getDomain <- function(emailID){
  domain <- strsplit(emailID,"@")[[1]][2]
  print(domain)
}
getDomain("manish@bioinfo.guru")
[1] "bioinfo.guru"
The argument(s) for a function need not be just numbers or text; they can be of any data type. The code below shows two versions of the evenSquareOddCube function. The first one takes a single number as input, while the second one takes a vector. Note that the ifelse function is vectorized, which is why we can use a vector when checking the condition.
evenSquareOddCube <- function(a){
  if(a%%2 == 0){
    print(a**2)
  }else{
    print(a**3)
  }
}
evenSquareOddCube(4)
[1] 16
evenSquareOddCube <- function(a){
  ifelse(a %% 2 == 0, a^2, a^3)
}
evenSquareOddCube(c(1:5))
[1]   1   4  27  16 125
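To see why the vectorized version is needed, the sketch below compares if and ifelse on the same vector. The variable names values and outcome are hypothetical. Recent R versions raise an error when if receives a condition of length greater than one, while older versions only warn and use the first element, so tryCatch is used here to record either case.

```r
values <- c(1:5)

# if() expects a single TRUE/FALSE; tryCatch() records what happens with five.
outcome <- tryCatch({
  if(values %% 2 == 0) "even" else "odd"
}, error = function(e) "error", warning = function(w) "warning")
print(outcome)

# ifelse() checks the condition element by element.
print(ifelse(values %% 2 == 0, "even", "odd"))
```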
Any variable inside a function has local scope; that is, the values of these variables are not accessible outside that function. When a value from a function needs to be passed on to the main program (after the call to that function is over), the return keyword is used. When calling a function that returns a value, we assign the result of the call to a variable that holds the returned value.
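A minimal sketch of local scope and return is shown here. The names squareIt, local_square, and squared are hypothetical and used only for illustration.

```r
# squareIt is a hypothetical example function.
squareIt <- function(a){
  local_square <- a^2   # exists only while the function is running
  return(local_square)
}

squared <- squareIt(4)   # store the returned value in a variable
print(squared)

# The local variable is not visible outside the function.
print(exists("local_square"))
```

The last line prints FALSE as long as no variable named local_square has been created outside the function.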
The code below defines a new function, Fibonacci, that takes a number as an argument. It calculates the Fibonacci series starting from 1 and generates as many numbers as the argument specifies. It returns the series as a vector. We can use this vector for further calculations, e.g., to compute the sum of all the numbers in the vector.
Fibonacci <- function(total_nums){
  f_nums <- c()
  x <- 0
  y <- 1
  for(ctr in c(1:total_nums)){
    z <- x + y
    f_nums <- append(f_nums,z)
    x <- y
    y <- z
  }
  return(f_nums)
}
nums <- Fibonacci(10)
print(nums)
[1]  1  2  3  5  8 13 21 34 55 89
print(sum(nums))
[1] 231
From the code above, you’ll be able to appreciate the value of writing custom functions. Here, we have assigned multiple lines of code to a name, Fibonacci in this case. We can call this function as many times as we want with different arguments to get different sequences of numbers.
It is important to note here that any function in R, by default, returns the last computed value within the function even when there is no return statement. For instance, in the code below, calling the test1 function returns the value of y.
test1 <- function(){
  x = 5
  y = x**2
}
print(test1())
[1] 25
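The two ways of returning a value can be combined in one function. In the sketch below, describeNumber is a hypothetical function that uses an explicit return to exit early and otherwise relies on the last evaluated expression.

```r
# describeNumber is a hypothetical example function.
describeNumber <- function(a){
  if(a < 0){
    return("negative")   # return() exits the function immediately
  }
  "non-negative"          # last evaluated expression, returned implicitly
}
print(describeNumber(-3))
print(describeNumber(7))
```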
Quiz
Write a function to generate prime numbers. The function should take a number (n) as an argument and return the first n prime numbers.
prime_nums <- function(n){
  x <- 3
  p_nums <- c()
  p_nums <- append(p_nums,x-1)   # start the list with 2, the first prime
  FLAG = TRUE
  while (length(p_nums)<n) {
    # Check every candidate divisor from 2 to x-1.
    for(y in 2:(x-1)){
      if(x %% y == 0){
        FLAG <- FALSE
        break
      }else{
        FLAG <- TRUE
      }
    }
    if(FLAG){
      p_nums <- append(p_nums,x)
    }
    x <- x + 1
  }
  return(p_nums)
}
print(prime_nums(5))
Since most of the time we are working with dataframes, it is important to understand how to call a function using values from a dataframe. The mutate function from the dplyr library makes it easy to call functions using column(s) as arguments. In the code below, we’ll first create a dataframe with three rows and three columns. Then we’ll add a new column by calling the evenSquareOddCube function on the third column. Note that we need to use the vectorized version of our function since the argument (a column) is a vector.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
df1 <- data.frame(matrix(1:9,nrow = 3)) #(replicate(3,1:5))
colnames(df1) <- c("Col1","Col2","Col3")
df1
  Col1 Col2 Col3
1    1    4    7
2    2    5    8
3    3    6    9
Apply a function to a column and add a new column
df1 %>%
  mutate(Col4 = evenSquareOddCube(Col3))
  Col1 Col2 Col3 Col4
1    1    4    7  343
2    2    5    8   64
3    3    6    9  729
Apply a function to multiple columns using the across function. The columns can be mutated in place, i.e., the original values in the column(s) are replaced with new values. The code below shows two scenarios where we call the evenSquareOddCube function on columns 2 and 3. In the first case the column values are changed in place, while in the second case two new columns are added to the dataframe. Notice that evenSquareOddCube is called with a dot (.) as its argument; inside the ~ formula, the dot stands for the column currently being processed.
df1 %>%
  mutate(across(c("Col2","Col3"), ~ evenSquareOddCube(.)))
  Col1 Col2 Col3
1    1   16  343
2    2  125   64
3    3   36  729
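For comparison, the same in-place transformation can be sketched in base R with lapply, which applies a function to each selected column. This is only an illustrative alternative, not a replacement for the dplyr approach; df2 and cols are hypothetical names, and the function and dataframe are repeated so that the block runs on its own.

```r
# Vectorized function, repeated here so the block runs on its own.
evenSquareOddCube <- function(a){
  ifelse(a %% 2 == 0, a^2, a^3)
}
df1 <- data.frame(matrix(1:9, nrow = 3))
colnames(df1) <- c("Col1","Col2","Col3")

# df2 is a hypothetical copy so that df1 stays unchanged.
df2 <- df1
cols <- c("Col2","Col3")
df2[cols] <- lapply(df2[cols], evenSquareOddCube)
print(df2)
```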
The .names keyword argument for the across function is used to specify the names for the new columns. Here, we’ll create new column names by taking the original column name and adding a suffix “_ESOC” to it.
df1 %>%
  mutate(across(c("Col2","Col3"), ~ evenSquareOddCube(.), .names = "{col}_ESOC"))
  Col1 Col2 Col3 Col2_ESOC Col3_ESOC
1    1    4    7        16       343
2    2    5    8       125        64
3    3    6    9        36       729
Quiz
Write a function that takes a dataframe of numbers as an argument and returns a new dataframe having “Even” and “Odd” as values corresponding to the numbers in the input dataframe. Sample input and output dataframes are given below.
  Col1 Col2 Col3
1    1    4    7
2    2    5    8
3    3    6    9

  Col1 Col2 Col3
1  Odd Even  Odd
2 Even  Odd Even
3  Odd Even  Odd
checkEvenOdd <- function(a){
  return(ifelse(a %% 2 == 0, "Even", "Odd"))
}
df1 <- data.frame(matrix(1:9,nrow = 3)) #(replicate(3,1:5))
colnames(df1) <- c("Col1","Col2","Col3")
df1 %>%
  mutate(across(c("Col1","Col2","Col3"), ~ checkEvenOdd(.)))
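Since the quiz asks for a function that takes the dataframe itself as an argument, one possible variant is sketched below in base R. The name labelEvenOdd is hypothetical, and the sketch assumes that every column of the input is numeric.

```r
# labelEvenOdd is a hypothetical name; all columns are assumed to be numeric.
labelEvenOdd <- function(df){
  # df[] keeps the dataframe structure while every column is replaced.
  df[] <- lapply(df, function(col) ifelse(col %% 2 == 0, "Even", "Odd"))
  return(df)
}
df1 <- data.frame(matrix(1:9, nrow = 3))
colnames(df1) <- c("Col1","Col2","Col3")
print(labelEvenOdd(df1))
```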