4  R Basics

4.1 Introduction

In programming, there are certain standard concepts that one needs to know irrespective of the domain in which one is working. You can think of these concepts as the alphabet of a human language. We need a good understanding of the basics before applying our coding skills to data analysis. This chapter introduces some of the fundamental concepts in R, which we’ll be using in the rest of this book.

4.2 Packages

A package in R is a collection of functions (more about them later) and sample datasets. When we install R, a specific set of packages gets installed. These include basic functions for input, output, file handling, etc. In fact, the package with these basic functions is called base. Another frequently used package that is part of the default installation is utils. By default, packages are installed in the library folder within the R installation path. To get a list of installed packages, run library().

We can always install additional packages using install.packages("<package name>"). Once a package is installed, load it into the R environment using library(<package name>).
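The steps above can be sketched as a short check that a package is available before it is loaded. This is only an illustration: the package name stats is an assumption chosen because it ships with every R installation, so nothing is downloaded.

```r
# Illustrative package name; "stats" ships with R, so nothing is installed here.
pkg <- "stats"

# installed.packages() returns a matrix with one row per installed package.
is_installed <- pkg %in% rownames(installed.packages())

# requireNamespace() returns TRUE if the package can be loaded, without attaching it.
can_load <- requireNamespace(pkg, quietly = TRUE)

# library() attaches the package; character.only = TRUE lets us pass the name as a string.
if (is_installed && can_load) {
  library(pkg, character.only = TRUE)
}
```

For a package that is not yet installed, install.packages(pkg) would be called first.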

4.3 Printing

The print function, as expected, prints its argument on the console. Its main argument is the object to print. Note that print is one of the ‘default’ functions in R, i.e., when a variable or an object is executed without any function, the print function is called with that object as an argument. This results in the value of the variable or object being printed (as is) on the standard output (screen). When printing text, the logical keyword argument quote (default is TRUE) indicates whether to print with or without quotes.

print("hi")
[1] "hi"
"hello"
[1] "hello"
print("Hello World!", quote = FALSE)
[1] Hello World!

To print multiple items, cat (concatenate and print) should be used. Note that print adds a newline after printing while cat doesn’t. So, to print on the next line with cat, explicit use of the newline character ("\n") is required. There are a couple of other differences between the default behaviour of print and cat: unlike print, the cat function doesn’t prefix the output with an element index such as [1], and there are no quotes around character strings.

x <- 5
print("Hello World")
[1] "Hello World"
print(x)
[1] 5
cat("Hello", "World", "\n")
Hello World 
cat(x)
5

The sep keyword argument for the cat function specifies the separator to use when printing different elements. The default separator is a blank space character.

nums <- 1:10
cat(nums,"\n")
1 2 3 4 5 6 7 8 9 10 
cat(nums, sep = ",")
1,2,3,4,5,6,7,8,9,10
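As a further small sketch of the sep argument, the separator can also be a newline, and the text produced by cat can be collected with capture.output for inspection. The variable names here are illustrative only.

```r
nums <- 1:5

cat(nums, sep = "\n")   # each element on its own line
cat("\n")               # cat does not add a final newline by itself

# capture.output() collects what cat would print, one string per output line.
joined <- capture.output(cat("a", "b", "c", sep = "-"))
print(joined)           # [1] "a-b-c"
```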

4.4 Variables

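A variable can be assigned a value using an assignment operator. R offers five different operators for this task: <-, <<-, ->, ->>, and =. The code below shows the syntax for assigning values to five variables (x1 to x5) using the different assignment operators. <- is the preferred assignment operator. Note that <<- and ->> assign in an enclosing environment (or the global environment) rather than the current one, so they are normally used only inside functions.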

x1 <- 5   #using <-  
x2 <<- 6  #using <<-
7 -> x3   #using ->
8 ->> x4  #using ->> 
x5 = 9    #using = 
cat(x1,x2,x3,x4,x5)
5 6 7 8 9
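At the top level all five operators behave alike; the difference between <- and <<- only shows up inside a function. The following is a minimal sketch with hypothetical function names, not a recommended pattern.

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1    # creates a new local variable; the outer one is untouched
  counter
}

increment_outer <- function() {
  counter <<- counter + 1   # modifies the variable found in the enclosing environment
  counter
}

local_result <- increment_local()   # 1
after_local  <- counter             # still 0
outer_result <- increment_outer()   # 1
after_outer  <- counter             # now 1
```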

There are certain rules when it comes to naming a variable in R.

  • A variable name can be a combination of alphanumeric characters.
  • The variable name must start with a letter, or with a period that is not followed by a digit.
  • Two special characters, underscore (_) and period (.), are allowed in variable names; a name cannot start with an underscore.
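One way to try out these rules is with the base function make.names, which rewrites a string into a syntactically valid name. In the sketch below, is_valid_name is a hypothetical helper (not part of R) that treats a name as valid when make.names leaves it unchanged; reserved words such as if are also reported as invalid by this check.

```r
# Hypothetical helper: a name is valid if make.names() does not need to change it.
is_valid_name <- function(name) make.names(name) == name

candidates <- c("x1", "my_var", "my.var", ".hidden", "1x", "_x", ".2x", "my var")
validity <- vapply(candidates, is_valid_name, logical(1))
print(validity)   # the first four are TRUE, the last four are FALSE
```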

4.5 Data type

Each variable has some characteristics associated with it, such as its name, value, data type, and memory location. We have discussed the name and value above. Now let’s talk about the data type. For a variable, its data type indicates the kind of data that the variable stores. A variable can store a number (as shown in the code above) or it can hold some text. These different types of data have specific names in programming languages. For example, when we say x = 5, R implicitly assigns this variable a numeric data type. To check the data type of a variable, the class function can be used. Note that the typeof, mode, and storage.mode functions also return the data type of a variable and are helpful when working with advanced data types such as matrices and arrays.

x = 5 
y = "hello" 
class(x) 
[1] "numeric"
class(y)
[1] "character"
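To illustrate how class and typeof can differ, the sketch below applies both to a few simple values. The list and its element names are illustrative only.

```r
# A few illustrative values; the L suffix creates an integer instead of a double.
values <- list(a = 5, b = 5L, c = "hello", d = TRUE)

classes <- vapply(values, class, character(1))
types   <- vapply(values, typeof, character(1))

print(classes)   # numeric, integer, character, logical
print(types)     # double,  integer, character, logical
```

Note that a plain number such as 5 has class numeric but is stored internally as a double.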

Quiz

What would be the output of the following code?

x = "y" 
print(class(x))
Answer: the code prints [1] "character", because x holds the string "y" rather than a variable named y.

4.6 Operators

4.7 Arithmetic operators

Table 1: Arithmetic operators

Symbol   Meaning            Example
+        Addition           2 + 2 = 4
-        Subtraction        5 - 2 = 3
*        Multiplication     2 * 5 = 10
/        Division           8 / 2 = 4
%%       Modulus            8 %% 4 = 0
%/%      Floor division     5 %/% 2 = 2
**       Exponent           2 ** 3 = 8
^        Exponent           2 ^ 3 = 8

print(8%%4)  # Remainder after division
[1] 0
print(5%/%2) # Quotient after division 
[1] 2
print(2**3)  # Raise to power
[1] 8
print(2^3)   # Raise to power
[1] 8
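The relationship between /, %/% and %% can be seen in a short sketch, including negative operands, where %/% rounds towards minus infinity and the remainder takes the sign of the divisor. Variable names are illustrative.

```r
exact <- 7 / 2          # 3.5
q     <- 7 %/% 2        # 3
r     <- 7 %% 2         # 1

neg_q <- (-7) %/% 2     # -4, rounded down rather than towards zero
neg_r <- (-7) %% 2      # 1

# Quotient and remainder always recombine to the original number.
recombined     <- 2 * q + r           # 7
neg_recombined <- 2 * neg_q + neg_r   # -7
```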

4.8 Comparison operators

Symbol   Meaning                     Example
>        Greater than                3 > 2 is TRUE
<        Less than                   3 < 2 is FALSE
==       Equal to                    2 == 3 is FALSE
!=       Not equal to                2 != 3 is TRUE
>=       Greater than or equal to    4 >= 2 is TRUE
<=       Less than or equal to       5 <= 2 is FALSE
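Comparison operators work element by element on vectors and return logical values. The sketch below also shows a common pitfall with == on decimal numbers; using all.equal for a tolerance-based comparison is one possible approach, offered here as a suggestion.

```r
x <- c(1, 5, 10)
above_four <- x > 4                   # FALSE TRUE TRUE

# Decimal fractions are stored approximately, so exact equality can fail.
exactly_equal <- (0.1 + 0.2) == 0.3                       # FALSE
nearly_equal  <- isTRUE(all.equal(0.1 + 0.2, 0.3))        # TRUE
```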

4.9 Logical operators

Symbol   Name   Meaning
&        AND    TRUE if both operands are TRUE
|        OR     TRUE if at least one of the operands is TRUE
!        NOT    TRUE if the operand is FALSE
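A brief sketch of the logical operators applied to vectors follows. It also uses &&, the single-value form of AND that R provides for conditions; variable names are illustrative.

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

both    <- a & b    # TRUE FALSE FALSE
either  <- a | b    # TRUE TRUE FALSE
negated <- !a       # FALSE FALSE TRUE

# && works on single values and stops as soon as the result is known.
x <- 7
in_range <- x > 5 && x < 10   # TRUE
```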

4.10 Special operators in R

R has certain operators that are uncommon in other programming languages, e.g., the pipe operator %>% (provided by the magrittr package and used to forward a value to the next function) and %in% (used to perform matching). These operators are used for specialized tasks. We’ll discuss their uses in the relevant context, once we have covered the basics.
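As a small preview, %in% needs no extra package and checks whether each value on its left occurs in the vector on its right. The data below is made up for illustration.

```r
fruits <- c("apple", "banana", "cherry")

has_banana <- "banana" %in% fruits              # TRUE
matches    <- c("apple", "kiwi") %in% fruits    # TRUE FALSE
```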

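The operators :: and ::: are used to access variables in a namespace: the package name goes on the left of the operator and the variable name on the right, each given as a name or a string. :: accesses exported variables, while ::: also reaches internal ones. Similarly, the $ and @ operators are used to access components of certain data types, e.g., $ for named elements of a list or data frame and @ for slots of S4 objects.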

base::print   # refers to the print function in the base package
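A minimal sketch of :: and $ in use is given below. The list and data frame are made-up examples; stats::median is a function from the stats package that ships with R.

```r
# Call median from the stats package explicitly via its namespace.
m <- stats::median(c(3, 1, 2))    # 2

# $ extracts a named element from a list ...
person <- list(name = "Asha", age = 30)
person_name <- person$name        # "Asha"

# ... or a column from a data frame.
df <- data.frame(id = 1:3, score = c(90, 85, 70))
scores <- df$score                # 90 85 70
```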

4.11 User input

The readline function is used to get input from the user in an interactive manner. The prompt keyword argument can be used to display some message to the user.

name <- readline(prompt = "Enter your name: ")
print(name)

By default, input from the console is assigned the class character. If you would like numeric input from the user, you need to explicitly coerce the input to the numeric type using as.numeric (see below for details).
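The sketch below shows one way to read a number from the user. Since readline only waits for input in an interactive session, a fixed placeholder value ("42", an assumption made purely for illustration) is used when the code is run as a script.

```r
# Ask the user in an interactive session; otherwise fall back to a placeholder.
raw_input <- if (interactive()) readline(prompt = "Enter a number: ") else "42"

input_class <- class(raw_input)   # "character", whatever was typed

# Coerce the text to a number before doing arithmetic with it.
number  <- as.numeric(raw_input)
doubled <- number * 2
```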

4.12 is.<data type>

To check whether an object is of a particular data type or not, we can use the is.<type> function, which returns a logical value (TRUE or FALSE). For example, to check whether an object is numeric, use is.numeric; similarly, to check whether an object is of class character, use is.character.

x <- 4
y <- "4"
is.numeric(x) 
[1] TRUE
is.numeric(y) 
[1] FALSE
is.logical(y)
[1] FALSE
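A few more is.<type> checks are sketched below, collected in a named vector whose names are illustrative only. Some results may be surprising at first, such as integers counting as numeric.

```r
checks <- c(
  integer_is_numeric = is.numeric(5L),      # TRUE: integers count as numeric
  double_is_integer  = is.integer(5),       # FALSE: 5 is stored as a double
  text_is_character  = is.character("5"),   # TRUE
  na_is_logical      = is.logical(NA),      # TRUE: a plain NA is a logical value
  print_is_function  = is.function(print)   # TRUE
)
print(checks)
```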

4.13 as.<data type>

There are occasions when we need to change the data type of an object to another data type. This can be achieved using the as.<type> function. The process of converting data types in R is called coercion. For instance, to convert a numeric data type to character, the as.character function is used. Note that we cannot convert any data type to any other data type; there are certain rules that govern the process of coercion.

x <- 4
cat(x, class(x),"\n")
4 numeric 
y <- as.character(x)
cat(y, class(y))
4 character

The code below returns NA with a warning, rather than a number, since a string containing letters cannot be coerced into a number.

x <- "hello"
as.numeric(x)
[1] NA
Warning message:
NAs introduced by coercion
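To make the coercion rules more concrete, here is a short sketch of conversions that succeed and one that yields NA. suppressWarnings is used only to keep the output quiet; the variable names are illustrative.

```r
ok        <- as.numeric("3.14")                       # 3.14
bad       <- suppressWarnings(as.numeric("hello"))    # NA, normally with a warning
truncated <- as.integer(3.9)                          # 3, the fractional part is dropped
flag      <- as.logical("TRUE")                       # TRUE
one       <- as.numeric(TRUE)                         # 1
```

Checking the result with is.na is a simple way to detect input that could not be converted.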

Similarly, a vector can be coerced into a data frame using as.data.frame.

a <- 1:5
b <- as.data.frame(a)
print(a)
[1] 1 2 3 4 5
print(b)
  a
1 1
2 2
3 3
4 4
5 5

The use of as.<type> to convert a data type is called explicit coercion. There are certain functions in R that do coercion implicitly, if required. We’ll look at such functions in the subsequent chapters.
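As a brief preview of implicit coercion, the sketch below uses c and +, both of which convert their arguments to a common type without being asked. The variable names are illustrative.

```r
# c() converts all elements to the most general type present.
mixed       <- c(1, "a", TRUE)     # "1" "a" "TRUE"
mixed_class <- class(mixed)        # "character"

numbers <- c(1.5, TRUE, FALSE)     # 1.5 1.0 0.0

# Arithmetic treats TRUE as 1 and FALSE as 0.
total <- TRUE + TRUE               # 2
```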

Quiz

What will be the output of the following code?

y <- as.character(4)
print(is.logical(is.numeric(y)))
Answer: the code prints [1] TRUE. is.numeric(y) returns FALSE because y is a character string, and FALSE is itself a logical value, so is.logical returns TRUE.