Now that we have R installed, let’s start scratching the surface by learning about its basic functionality. In this tutorial, you will be learning:
How to use R as a calculator
How to create objects
A couple of different data types in R
What a function is and how to use one
How to convert one data type to another
How to create a vector
How to load a package
Before we start writing any code, it’s important to understand some basic aspects of how R’s code works:
R is case-sensitive. a has a different meaning than A within R, so you’ll want to use consistent capitalization throughout.
We execute commands through the use of functions, which are typically written as a name followed by parentheses. For example, mean() is a function that tells R to calculate the mean of a set of numbers.
You can add a comment to a line of code by placing a pound sign (#) before the start of the comment; this is useful for documenting specific lines of code. For example, in 2+3 # This adds two and three, the command 2+3 will be executed by R, but what follows the pound sign (# This adds two and three) will be ignored.
By default, R will use your “home directory” as the “working directory.” (For me, that might be C:\Users\Rodrigo in Windows and /Users/rodrigo in macOS.)
To make it easier to organize your files, I strongly recommend creating a new “project” for each assignment. You can create a project by going to File, New Project. When you open that project, it will set the working directory to the folder you created for that specific assignment or exercise.
You can manually set your working directory in RStudio by using the Files tab in the bottom-right pane, going to the desired folder, clicking on the More icon at the top of the Files tab, and selecting Set As Working Directory.
Working directories are important because they will keep your files in a predictable place. For example, if you have set your working directory (or created a project), you’ll know where to look for a file when we create it using R.
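If you would rather check the working directory from code than through the RStudio menus, base R provides getwd() and setwd(). The sketch below is only an illustration; the folder path in the commented-out line is a made-up placeholder, not a path from this tutorial.

```r
# Print the folder R is currently treating as the working directory
getwd()

# setwd() changes the working directory. The path below is a hypothetical
# placeholder, so the line is commented out; swap in a real folder first.
# setwd("path/to/your/assignment-folder")
```

If you work inside an RStudio project, you usually won’t need setwd() at all, because opening the project sets the working directory for you.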
R is an extremely powerful calculator. To perform a mathematical operation, just type the expression into your code chunk (or straight into the console).
In R, you’ll use + and - for addition and subtraction. For multiplication and division, you’ll use * and /, respectively. You’ll use ^ for exponents and parentheses, ( and ), to control the order of operations. (Remember Please Excuse My Dear Aunt Sally?)
For example, this is how we would compute two plus three:
2 + 3
## [1] 5
The first box above is the code that we would enter into an R code chunk. For your convenience, there’s some syntax highlighting, to help you separate numbers from operators.
The second box, which is in all white, is the output (result) that R will give us after running our code.
This will be the pattern for all of the tutorials going forward.
If we wanted to calculate the result of four plus two, times eight, we would write:
(4+2)*8
## [1] 48
Note: It does not matter if you include spaces between the numbers and operators. For example, 4+2 is exactly the same as 4 + 2. Use whatever is most readable for you.
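To see why the parentheses matter, it may help to run the same numbers with and without them. This is a small illustrative sketch that uses only the operators described above.

```r
4 + 2 * 8      # multiplication happens first: 4 + 16
## [1] 20
(4 + 2) * 8    # parentheses force the addition to happen first: 6 * 8
## [1] 48
2 ^ 3          # two raised to the third power
## [1] 8
```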
One nice thing about working with R is that it allows us to store information in objects that we can refer back to later on.
We can do this with the following syntax: object <- operation.
What this tells R is: perform an operation and assign the output to a named object. (The left arrow, <-, is called the assignment operator.)
Let’s show this off by assigning the result of the operation 4+2 to an object called a. Then, we can just call a whenever we want the result of that operation.
a <- 4 + 2
a
## [1] 6
If we want to multiply the result of that operation by 8, we can just do:
a <- 4 + 2
a*8
## [1] 48
If we wanted to subtract my favorite number (rodrigos_fave) from yours (your_fave), here’s how we’d do that:
rodrigos_fave <- 36
your_fave <- 7 # Replace 7 with your favorite number
your_fave - rodrigos_fave
## [1] -29
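One detail worth sketching here: an object stores the result of an operation at the moment of assignment, not the operation itself. The object names b and total below are hypothetical and used only for illustration.

```r
b <- 4 + 2        # b now holds the value 6
total <- b * 8    # total holds 48, computed from b's current value
b <- 10           # overwriting b afterwards does not change total
total
## [1] 48
```

If you wanted total to reflect the new value of b, you would need to run the total <- b * 8 line again.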
R has a handful of different data types. We’ll cover these types as they come up, but we’ll start with two very important ones. The first type is numeric (num), and it refers to real or decimal numbers. The second is character (chr), and it covers text (strings). It’s important to understand that if an object is stored as a string, you cannot perform a mathematical operation directly on it.
Here is the previous example again, this time with a character value. Notice how the number stored in the object your_fave is wrapped in quotation marks ("). The quotation marks make it a text value (i.e., character data type).
rodrigos_fave <- 36
your_fave <- "7"
your_fave - rodrigos_fave
## Error in your_fave - rodrigos_fave: non-numeric argument to binary operator
Unsurprisingly, R will give us an error because we are trying to perform a mathematical operation (subtraction) using a non-numeric object.
However, we can often translate between data types (e.g., from numeric to character and vice versa). We just need to make sure we do so explicitly before running an operation. Before we convert between types, let’s take a step back and learn about “functions.”
One of the great things about R is that it gives us a multitude of functions we can use to perform myriad operations. You can think of a function as a wrapper (a single command) that executes a series of commands that allow you to do things to your data.
Some of these functions come with R, others can be installed via optional packages (more on that shortly). You can also create your own functions at any point to help you complete repetitive tasks. (We won’t create functions in this tutorial, though.)
For now, there are two parts of a function that you need to remember. The first is the function name (what we use to call the function), and the second is its arguments (what options the function should use, what objects it should be applied on/with, etc.).
Functions are expressed by the function name followed by a parenthetical that includes the arguments you want to supply it with (i.e., function_name(arguments)).
For example, we can check the data type of an object by using the str() function. The str() function requires us to specify one argument: the name of the object we want to check.
Let’s do that for the two objects we defined:
rodrigos_fave <- 36
str(rodrigos_fave)
## num 36
your_fave <- "7"
str(your_fave)
## chr "7"
Our code chunk gave us two lines of output, one for each str() call we ran. The first line tells us rodrigos_fave is num (numeric). The second tells us your_fave is chr (character).
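If you only want the type and not the value, base R’s class() function is another option, and is.numeric() and is.character() answer the question with TRUE or FALSE. These functions are not used elsewhere in this tutorial; this is just a sketch of an alternative way to check.

```r
rodrigos_fave <- 36
your_fave <- "7"

class(rodrigos_fave)
## [1] "numeric"
class(your_fave)
## [1] "character"

is.numeric(your_fave)     # is your_fave stored as a number?
## [1] FALSE
is.character(your_fave)   # is your_fave stored as text?
## [1] TRUE
```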
RStudio has a Help tab, usually in the bottom-right pane. After you click on it, you can look up any loaded function by clicking the search box with the magnifying glass on it. Just type the name of the function, and the help page will describe it, list all the arguments it accepts, and provide some examples.
R’s help system is extremely useful, though it can be a bit daunting at first because of all the information it provides you. You’ll also find yourself Googling a lot and ending up on websites like Stack Overflow and Quora, in addition to different developer blogs.
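The same help pages can also be requested from the console. As a brief sketch, either of the following lines should bring up the documentation for mean() (in RStudio, it opens in the Help tab):

```r
?mean          # shorthand for opening a function's help page
help("mean")   # the same request, written as a regular function call
```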
We can move between the numeric and character types with the as.numeric() and as.character() functions.
Here’s how we’d turn the value of your_fave from chr to num:
your_fave <- "7"
as.numeric(your_fave)
## [1] 7
Notice how the quotation marks are now gone, suggesting it is now being treated as a number.
We can confirm the conversion succeeded by wrapping the as.numeric() function within the str() function to effectively perform two operations in one sequence:
your_fave <- "7"
str(as.numeric(your_fave))
## num 7
We could have done this in two lines by assigning the result of as.numeric(your_fave) to an object (e.g., first_operation) and then supplying that object to the str() function (i.e., str(first_operation)).
It is important to note here that our original your_fave object remains of chr type. That is because we never re-assigned the result of the operation back onto the original object.
If we want to permanently change it back into a num object, we’ll need to recreate the object. In this block of code, we’ll assign the result of the as.numeric() operation back into our original object, your_fave. We’ll then double-check it by using the str() function.
your_fave <- "7"
your_fave <- as.numeric(your_fave)
str(your_fave)
## num 7
We could also store the conversion in a new object altogether, like your_fave_num. That way, you can always easily go back to the unchanged object (your_fave).
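Putting those pieces together, here is a sketch that stores the converted value in a new object and then uses it in the subtraction that failed earlier. It also shows the opposite conversion with as.character(), and what happens when the text is not a number at all (R returns NA, a missing value, along with a warning). The string "seven" is just a made-up example value.

```r
rodrigos_fave <- 36
your_fave <- "7"

# Store the numeric version in a new object; your_fave itself stays chr
your_fave_num <- as.numeric(your_fave)
your_fave_num - rodrigos_fave
## [1] -29

# Going the other way: numeric to character
as.character(rodrigos_fave)
## [1] "36"

# Text that does not look like a number cannot be converted.
# R returns NA and prints a warning about the coercion.
as.numeric("seven")
## [1] NA
```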
R also has different data structures for its objects. We’ll cover different structures throughout the course as the need arises. One very important data structure to know about now is the vector.
Vectors allow us to store multiple values of the same data type into a single object. (We can’t mix numbers and text within a single vector. If there’s a single chr element in the vector, R will automatically make all the elements chr.)
We can create a vector using the c() (concatenate) function. With c(), each argument will be a different element that we’re adding to that object. (Each argument is separated by a comma.)
For example, here is how we would create a vector made up of the following five numbers: 1, 5, 7, 22, and 5.
c(1, 5, 7, 22, 5)
## [1] 1 5 7 22 5
Vectors are useful for a number of different operations. To illustrate, I’ll start by storing that vector into an object (rodrigos_vector):
rodrigos_vector <- c(1, 5, 7, 22, 5)
str(rodrigos_vector)
## num [1:5] 1 5 7 22 5
First, take note that this is a numeric vector, as shown by the num. Second, we can see that there are five elements in our vector ([1:5]). This is helpful because we can pick out specific elements within a vector by subscripting. (More on that later.) But this should explain the [1] in some of the earlier output: we were actually getting a single-element vector as the result of the mathematical operations we ran earlier on.
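Returning to the point that a vector holds a single data type, here is a small sketch of what happens when numbers and text are combined. The object name mixed_vector is hypothetical.

```r
# One chr element is enough to turn every element into chr
mixed_vector <- c(1, "five", 7)
str(mixed_vector)
##  chr [1:3] "1" "five" "7"
```

Notice the quotation marks around 1 and 7 in the output: they are now stored as text, so arithmetic on this vector would fail just as it did with "7" earlier.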
Now, let’s divide rodrigos_vector by 2:
rodrigos_vector/2
## [1] 0.5 2.5 3.5 11.0 2.5
As we can see in our output, each element in rodrigos_vector was divided by two.
Even more useful than that is the fact that I can pair a vector with functions like mean() and max() to take the mean of that set of numbers and identify the highest number within it.
For example, here’s how we can take the mean of rodrigos_vector:
rodrigos_vector <- c(1, 5, 7, 22, 5)
mean(rodrigos_vector)
## [1] 8
Our output gives us a single number (specifically, a vector comprised of a single element) that represents the mean of the original rodrigos_vector.
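The max() function mentioned above works the same way. As a brief sketch, here it is alongside base R’s min() and length(), which are not covered in this tutorial but follow the same pattern of taking a vector as their argument:

```r
rodrigos_vector <- c(1, 5, 7, 22, 5)

max(rodrigos_vector)      # largest value
## [1] 22
min(rodrigos_vector)      # smallest value
## [1] 1
length(rodrigos_vector)   # number of elements
## [1] 5
```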
One of the great things about R is that it is modular and has a huge community supporting it. What this means is that anyone can add functionality to R and share that functionality with the rest of the world. We call those modules packages, which will give us access to new functions.
We are going to make extensive use of a small set of packages throughout this book. To use a package, we first need to install it. To install a package, click on Tools and Install Packages. (Unless I tell you otherwise, keep CRAN selected for the repository, don’t change the install directory, and make sure “install dependencies” is checked.)
You can also install packages by using the install.packages()
function in your console. I don’t recommend adding that code to your R Notebook, though, because it will result in your reinstalling the package every time you knit the document.
One key package that we will be using frequently is called tidyverse, which is actually a metapackage that contains a number of very helpful packages, such as readr (which helps R intelligently read CSV files), dplyr (which helps us organize and slice up data), and ggplot2 (which helps us create some data visualizations).
Try installing the tidyverse package now. Again, we’ll be using it a lot in this book.
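If you would rather use the console than the menu, the install looks like the commented-out line in the sketch below (it is commented out so that it does not re-run by accident). The second part is one possible way to check whether the package is already on your machine; the object name tidyverse_installed is hypothetical.

```r
# Run this once, in the console, to install the package from CRAN:
# install.packages("tidyverse")

# Check whether tidyverse is among the installed packages (TRUE or FALSE)
tidyverse_installed <- "tidyverse" %in% rownames(installed.packages())
tidyverse_installed
```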
Installing a package is just the first step, though. You’ll always need to load a package prior to using one of its functions — and that includes when you restart RStudio, if you don’t save your workspace.
You can load a package by using the library() function, supplying the name of the package you want to load as an argument (i.e., putting it within the parentheses).
For example, we can now load the tidyverse package (and, consequently, readr) by doing the following:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Oftentimes, nothing will appear to have happened when you execute the code to load a library. That’s because we haven’t used it for anything yet. Sometimes, it will print a message, as is the case here. This message is fine: it’s just telling us which tidyverse packages are being loaded (e.g., ggplot2, tibble, etc.) and that there are a couple of small naming conflicts. We don’t need to worry about these for now.
You only need to install a package once in R, but you’ll need to load it with every R Notebook you produce. You’ll need to insert the package-loading code (library()) before executing any of the package’s functions. Thus, it’s common practice to include the code to load packages at the start of an R Notebook.
Going forward, I may refer to a function as follows: readr::read_csv().
What this means is that we’ll use the read_csv() function that is part of the readr package.
Put another way, without loading readr (by itself or through tidyverse), you won’t be able to call read_csv() by that name alone.
This notation is also important because, sometimes, two different packages will use the same name for functions that do different things. (After all, anyone can create a package and do so without knowing the function names that others have used.) So, if I were to use the read_csv() function and I had only loaded the readr package, R would reason that I want to use the function from readr. However, if I had also loaded a package called another_csv_reader that also had a function named read_csv(), R would simply call the function from the most recently loaded package.
If I want to explicitly tell R which package to take the function from, I would use the package_name::function() notation in my code.
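As a small sketch of that notation, the example below uses the stats package, which ships with R and is loaded by default, so it runs without installing anything. Assuming no other loaded package defines its own median() function, calling it with and without the package prefix gives the same result.

```r
rodrigos_vector <- c(1, 5, 7, 22, 5)

median(rodrigos_vector)          # R looks the function up by name
## [1] 5
stats::median(rodrigos_vector)   # we name the package explicitly
## [1] 5
```

The prefixed form is longer, but it removes any doubt about which package’s function is being called.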