✔ It's working!
Lab 1: Getting Started
Preamble
The purpose of this lab is to ensure that you have a productive working environment for reproducible scientific computing.
After completing this lab, you should be able to render this Quarto document! You will know that you are done when you click on the “Render” button in RStudio and see seven checkmarks in the output.
Why are we here?
The practice of science is suffering through a reproducibility crisis. The scientific community is realizing that many “findings”—even many that are published in reputable journals after careful peer-review—do not stand up under the scrutiny of replication. The reasons behind this are complex and wide-ranging.
Whatever your interests are, we want your work to be rigorous and replicable. This means that the data you analyze, the models you fit, and the tables and figures you produce need to be reproducible.
Quarto is a new, open-source, cross-platform document authoring system produced by Posit (née RStudio) that combines the data processing capabilities of R, Python, or Julia with the venerable authoring technologies Markdown, Pandoc, and LaTeX. Figure 1 illustrates the Quarto processing pipeline.
If you are already familiar with R Markdown, Quarto is like R Markdown 2.0, and is made by the same people.
In this class, you will learn how to import, wrangle, and visualize data using R within Quarto. You can use Quarto to produce production-quality reports in HTML and PDF formats for use in your companion SDS class(es). Figure 2 shows several other coding languages (including Python) feeding into a Quarto document.
Lab instructions
Installing New Software
All of your day-to-day work in this class will be done inside a program called RStudio, which is an application that makes it easier to use the R programming language. To use RStudio, you’ll need to download and install both R and RStudio on your computer. Follow the instructions below to install them, and then verify that your installation is set up correctly.
Step 1: Install R
R is a programming language and computing environment specialized for statistical analysis and data manipulation. It’s commonly used for performing statistical tests, creating data visualizations, and writing data analysis reports. Despite focusing on statistics, it’s a full-fledged programming language, and relatively easy to learn.
Installing R for Mac Computers
To figure out which version of R you need for your computer, click on the small Apple icon in the top-left corner of your screen and choose “About This Mac” from the menu.
Look at the “Processor” (or “Chip”) line in the pop-up, and take note of whether your computer uses an Intel processor or an Apple M-series processor (e.g., M1 or M2).
Then, go to https://cloud.r-project.org/bin/macosx/
- If your computer uses an Intel processor, click on the file titled R-4.3.2-x86_64.pkg
- If your computer uses an Apple M processor, click on the file titled R-4.3.2-arm64.pkg
This will download the R installer to your computer’s download folder. Go to your download folder, and double-click on the R-4.3.2 installer you just downloaded. Follow the prompts on the screen to finish installing R. You can safely accept all the default settings without changing anything.
Installing R for Windows Computers
Go to https://cloud.r-project.org/bin/windows/base/, and click the link titled Download R-4.3.2 for Windows.
This will download the R installer to your computer’s download folder. Go to your download folder, and double-click on the R-4.3.2 installer you just downloaded. Follow the prompts on the screen to finish installing R. You can safely accept all the default settings without changing anything.
Installing R for Linux Computers
If you are using a Linux-based operating system, use your system’s package manager to install R. For example, here are the instructions for installing R on Ubuntu.
Installing R for Chromebooks
R cannot be installed on Chromebooks, so you’ll need to use the Smith RStudio Server to participate in this course.
If you want to see more visually detailed instructions for how to install R, you can watch this video tutorial if you have a Mac and this video tutorial if you’re using a Windows PC.
Note that some of the software version numbers might be different (e.g., R 4.3.2 instead of 4.1), but the steps you should follow will be exactly the same.
Step 2: Install RStudio
RStudio is an integrated development environment for reproducible scientific computing that caters to the R programming language. You will use it extensively in all of your SDS classes.
Instructions
- Download the latest, free version of RStudio Desktop. Be sure to get the version that is appropriate for your operating system.
- Install RStudio Desktop by launching the installer after it downloads. You can accept all the defaults during installation.
If you are using a Mac computer, make sure to drag the RStudio icon into your Applications folder!
If you want to see more visually detailed instructions for how to install RStudio, you can watch this video tutorial if you have a Mac and this video tutorial if you’re using a Windows PC.
Note that some of the software version numbers might be different, but the steps you should follow will be exactly the same.
Verification
First, open the RStudio program by clicking on the RStudio icon.
If you are using a Mac, you can find the RStudio icon by either:
- Opening a new Finder window
- Navigating to the Applications folder (which should be in the list of locations along the left pane)
- Scrolling down until you find the RStudio icon
Or, you can search for the word RStudio in the Spotlight search bar
If you are using a Windows PC, you can find the RStudio icon by:
- Clicking the Windows Start menu icon in the lower-left corner of the screen
- Searching for the word RStudio in the search bar
When RStudio opens, you should see a window like this:
If you’re using a Mac, you may see a pop-up window that looks like this when you open RStudio:
You can safely choose “Not Now” (we will not use the “git” command in this class), but the pop-up will continue to appear in the future.
You can also choose “Install” (which will stop this pop-up message from appearing in the future) but the installation process may take several minutes.
The RStudio application window is divided into four “panes”, which are labeled and color-coded in the diagram below
- The Console pane
- The Editor pane (not shown)
- The Environment pane
- The “Miscellaneous” pane
Question
Copy and paste the code below into the console pane, and press the Enter key to run the code. If your installation of R is working correctly, you should see the R version printed out, and a message indicating that your version of R is up to date (like the message seen below).
r_version_info <- R.Version()

# Compare the full version number (e.g., 4.3.2) instead of checking the
# major and minor parts separately, so 4.3.0 and later releases (including 5.x) pass
if (getRversion() >= "4.3.0") {
  msg <- "✔ Your version of R is up to date!"
} else {
  msg <- "✖ Your version of R is out of date; you should update to version 4.3 now."
}

cat(r_version_info$version.string, msg, sep = "\n")
R version 4.4.0 (2024-04-24)
✔ Your version of R is up to date!
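Version numbers such as 4.3.2 have several parts, so comparing them as text, or as a single decimal number, can give the wrong answer. The sketch below uses only base R's package_version() and getRversion() to show the difference; the 4.3.0 threshold is an assumption taken from the version this lab asks for.

```r
# As text, R compares character by character, so "1" < "9" decides the result
as_text <- "4.10.0" > "4.9.0"

# As version objects, R compares each numeric part in turn (4 = 4, then 10 > 9)
as_version <- package_version("4.10.0") > package_version("4.9.0")

c(as_text = as_text, as_version = as_version)

# getRversion() returns the running R version as a version object,
# so it can be compared directly against a version string
r_is_recent <- getRversion() >= "4.3.0"
r_is_recent
```

The same approach works for packages: packageVersion("some_package") also returns a version object that can be compared against a string like "2.0.0".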
Install necessary packages
Like many modern programming languages, R is modular, meaning that it relies on packages to provide additional functionality. There are thousands of R packages hosted on CRAN, and many more hosted on GitHub. In this course, we will focus on a handful of popular, well-crafted, useful packages.
In RStudio, the Packages tab displays a searchable list of packages that are installed on your computer.
You should get comfortable checking which packages you have installed, and installing new packages. You only have to install a package once.
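The same checks are available from the console. This sketch uses base R's built-in stats package as a stand-in so that it runs on any installation; replace "stats" with the name of any package you care about.

```r
# Install once per computer; left commented out so it does not re-run
# install.packages("rstudio.prefs")

# Check whether a package is installed, without loading it
"stats" %in% rownames(installed.packages())

# requireNamespace() is a quicker check that returns TRUE or FALSE
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats

# Look up the installed version of a package
packageVersion("stats")

# Load (attach) an installed package in each new R session where you need it
library(stats)
```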
Install rstudio.prefs and customize your RStudio
RStudio has hundreds of options that can be configured so that each user can customize its behavior to their preferences. Unfortunately, several of its configuration options have default settings that make it more difficult to conduct reproducible data analyses. These settings can be changed using the Global Options GUI, but hunting down each setting and changing them one at a time is slow and tedious.
Instead, we’ll use the rstudio.prefs package, which will allow us to change all the settings at once by executing a few R commands in the console. But first, we need to install the rstudio.prefs package!
Instructions
1. Install the rstudio.prefs package. There are two main ways to install packages. You can use either, but you only need to install a package once.
   - In RStudio, you can install a package by clicking on the “Install” button in the Packages tab. Then type rstudio.prefs and click Install.
   - Alternatively, you can install a package using the install.packages() function. In this case, you would type install.packages("rstudio.prefs") into the R console, and press the Enter key to run the code.
2. Copy and paste the R code below into the console, and press the Enter key to run the code. This code will change several of RStudio’s default configuration options to make the program more user-friendly.
   Before changing your settings, R will print out your pending changes and ask if you want to continue. You can indicate “Yes” by typing y into the console and pressing the Enter key.

library(rstudio.prefs)
use_rstudio_prefs(
  save_workspace = "never",
  load_workspace = FALSE,
  restore_last_project = FALSE,
  restore_source_documents = FALSE,
  check_for_updates = FALSE,
  color_preview = FALSE,
  rmd_viewer_type = "pane",
  rmd_chunk_output_inline = FALSE
)

If you get an error message in your console saying Error in library(rstudio.prefs) : there is no package called ‘rstudio.prefs’, return to Step 1 in this list and make sure you have finished installing the rstudio.prefs package.
Verification
Question
Copy and paste the R code below into the console and press the Enter key to run the code. If you have configured RStudio correctly by following the instructions above, you should see a message that says “✔ RStudio is correctly configured!”.
rstudio_config <- jsonlite::fromJSON(paste0(rstudio.prefs::rstudio_config_path(),
                                            "/rstudio-prefs.json")
                                     )
options_set <- c(
  rstudio_config$save_workspace == "never",
  rstudio_config$load_workspace == FALSE,
  rstudio_config$restore_last_project == FALSE,
  rstudio_config$restore_source_documents == FALSE,
  rstudio_config$check_for_updates == FALSE,
  rstudio_config$color_preview == FALSE,
  rstudio_config$rmd_viewer_type == "pane",
  rstudio_config$rmd_chunk_output_inline == FALSE
)

if (all(options_set)) {
  cli::cli_alert_success("RStudio is correctly configured!")
} else {
  cli::cli_alert_danger("RStudio settings have not been correctly configured.")
}
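The check above compares each setting with == and combines the results with all(). One subtlety: if a setting is absent from the JSON file, rstudio_config$name is NULL, the comparison returns a zero-length result, and c() silently drops it, so all() can still report success. The sketch below uses a hypothetical in-memory list in place of the parsed rstudio-prefs.json file (and only three of the settings) to show a stricter comparison with identical(), which treats a missing setting as not configured.

```r
# Hypothetical stand-in for jsonlite::fromJSON() output; restore_last_project is missing
rstudio_config <- list(save_workspace = "never", load_workspace = FALSE)

# The values expected for a subset of the settings
expected <- list(
  save_workspace = "never",
  load_workspace = FALSE,
  restore_last_project = FALSE
)

# `==` version: NULL == FALSE gives logical(0), which c() drops
loose_check <- c(
  rstudio_config$save_workspace == "never",
  rstudio_config$load_workspace == FALSE,
  rstudio_config$restore_last_project == FALSE
)

# identical() version: a missing (NULL) setting is never identical to the expected value
strict_check <- vapply(
  names(expected),
  function(name) identical(rstudio_config[[name]], expected[[name]]),
  logical(1)
)

all(loose_check)   # TRUE, even though a setting is missing
all(strict_check)  # FALSE; printing strict_check shows which setting failed
```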
Install tidyverse and usethis
The tidyverse is a meta-package: installing it installs a collection of commonly-used packages (such as ggplot2, dplyr, and tidyr) in a single step. The tidyverse is developed by Posit (formerly RStudio) and has become a popular way to use R (Wickham et al. 2019). In this course, we will be using the tidyverse extensively.
usethis is a package that helps set up projects and other packages. We’ll only be using it in this lab to verify our setup.
Instructions
- Install the tidyverse and usethis packages by running the following code in your console (not in your Quarto document).
install.packages("tidyverse")
install.packages("usethis")
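If you would rather not reinstall packages you already have, a small helper can check first. The sketch below is a hypothetical function (install_missing() is not part of any package); it relies on base R's requireNamespace(), which returns FALSE instead of raising an error when a package is missing, and on install.packages() accepting a vector of package names.

```r
# Hypothetical helper: install only the packages in `pkgs` that are not installed yet
install_missing <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) for packages that are not installed
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!is_installed]
  # install.packages() accepts a character vector, so one call covers them all
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  # Return the names that needed installing, without printing them
  invisible(to_install)
}
```

For example, install_missing(c("tidyverse", "usethis")) would skip whichever of the two is already installed and install the rest.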
Verification
Question
Copy and paste the R code below into the console and press the Enter key to run the code. If appropriate versions of the tidyverse and usethis packages are properly installed, you should see two messages: “✔ tidyverse is installed and relatively up-to-date” and “✔ usethis is installed.”
has_tidyverse <- suppressPackageStartupMessages(require(tidyverse))
has_usethis <- suppressPackageStartupMessages(require(usethis))

if (has_tidyverse) {
  # packageVersion() returns a version object, so it compares correctly with "2.0.0"
  tidyverse_version <- packageVersion("tidyverse")
  if (tidyverse_version >= "2.0.0") {
    cli::cli_alert_success("tidyverse is installed and relatively up-to-date.")
  } else {
    cli::cli_alert_danger("tidyverse is installed but it is not up-to-date. Please update your packages.")
  }
} else {
  cli::cli_alert_danger("tidyverse could not be loaded.")
}

# Use cli here too: usethis's own ui_* functions are unavailable if usethis failed to load
if (has_usethis) {
  cli::cli_alert_success("usethis is installed")
} else {
  cli::cli_alert_danger("usethis could not be loaded")
}
Create a project environment for SDS 100
Now that you have R and RStudio installed, we are going to set up your working environment to maximize your productivity.
The work that you do in RStudio should be organized into Projects. Working within projects allows you to switch contexts safely, and keep your work organized. We recommend that you have at least two projects:
- one project for this class (SDS 100)
- one project for the companion SDS class you are taking
You can switch between Projects in RStudio at any time using the Projects dropdown menu in the upper-right corner of your screen.
Instructions
- Create a new project in RStudio (File -> New Project… -> New Directory -> New Project) named “SDS 100” inside your Documents folder. (On Windows this may be called My Documents.)
Warning: We strongly recommend that you not place your Project in any of the following places:
- Your Downloads folder
- Your Desktop
- A temporary (i.e., “temp”) folder
We also recommend that you not place your Project in a cloud-based storage location (e.g., your OneDrive folder or your iCloud folder, etc.)
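If you are unsure where your project ended up, a rough check can help. The sketch below is a hypothetical helper (risky_location() is not part of any package) that flags paths containing the kinds of folders the warning above lists. The exact folder names it looks for, including "clouddocs" for iCloud Drive on a Mac, are assumptions, so treat a FALSE result as a hint rather than a guarantee.

```r
# Hypothetical helper: TRUE if `path` sits inside a folder the lab advises against
risky_location <- function(path = getwd()) {
  # Use forward slashes on every platform, then split into folder names
  path <- normalizePath(path, winslash = "/", mustWork = FALSE)
  folders <- tolower(strsplit(path, "/", fixed = TRUE)[[1]])
  # Exact matches for local folders that are often cleaned out or synced
  local_risky <- folders %in% c("downloads", "desktop", "temp", "tmp")
  # Cloud-synced folders often have longer names, e.g. "OneDrive - Some Org"
  cloud_risky <- grepl("onedrive|icloud|clouddocs", folders)
  any(local_risky | cloud_risky)
}
```

With your project open, running risky_location() with no arguments checks the project folder, because RStudio sets the working directory to the project folder when a project opens.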
Verification
Question
Copy and paste the code below into your R console, and press the Enter key to run the code. If your project is set up correctly, you should see a message saying “Your project is in a good place” (like the one shown in the image below).
project <- tryCatch(
  usethis:::proj_path(),
  error = function(e) {
    # No active project: usethis signals an error that inherits from "usethis_error"
    if (inherits(e, "usethis_error")) {
      FALSE
    } else {
      stop(e) # rethrow any other error (e.g., usethis is not installed)
    }
  }
)

if (!isFALSE(project)) {
  cli::cli_alert_success("Project found at: {.path {project}}. Your project is in a good place.")
} else {
  cli::cli_alert_danger(
    "No project environment detected. Make sure you:\n
    1. Create and open an R project for SDS 100.
    2. Move your Quarto file into your project folder before rendering it
    "
  )
}
Your First Quarto Document
Quarto is a software program that can be used inside RStudio and that allows you to write narrative text and R code together within the same document. A Quarto document is a Markdown document that has an R console built right into it. This allows you to create a “final product” from all your data analysis work that contains all your R code and its output (like tables and figures) and all your written explanations, so you can easily share with other people exactly what you did, and what your results mean.
Now, it’s time for you to open and render your very first Quarto document! You’ll be using Quarto documents to complete your lab work throughout this course, so we’ll start getting used to the workflow of rendering and turning in your final products.
Instructions
- Download this Quarto file.
  - If the file name begins with an underscore (_) after you download it, make sure to remove the underscore from the file name.
- Move lab_01_setup.qmd from where it was downloaded (likely ~/Downloads or similar) to the folder you created when you made your project (~/Documents/SDS 100).
  - There are lots of different ways to do this (e.g., using the Finder, Windows Explorer, or the command line). Perhaps the easiest way is to open the lab_01_setup.qmd file, go to File -> Save As…, and navigate to your RStudio project directory.
- Open lab_01_setup.qmd in RStudio.
- Click on the Render button near the top middle of the editor pane.
- Inspect the output in the Viewer pane. How many checkmarks do you see?
Finding your rendered document
Open a new window in your computer’s file explorer program (Explorer if you’re on Windows, or Finder if you’re on a Mac), and navigate to the folder where you saved your SDS 100 R Project.
In this folder, you should see a file named lab_01_setup.html. This is the output from the Quarto document you just rendered. If you double-click this file, it should open in your web browser, and you should see the same thing you saw in the RStudio Viewer pane.
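You can also do this from the R console. The sketch below is a hypothetical helper (open_rendered() is not part of any package) built on base R's file.exists() and utils::browseURL(); it assumes your working directory is the SDS 100 project folder, which is the case when the project is open in RStudio.

```r
# Hypothetical helper: open the rendered HTML report if it exists.
# Relative paths are resolved against the working directory (your project folder).
open_rendered <- function(html_file = "lab_01_setup.html") {
  if (!file.exists(html_file)) {
    message("Could not find ", html_file, "; render lab_01_setup.qmd first.")
    return(invisible(FALSE))
  }
  # Pass an absolute path so the default web browser can locate the file
  utils::browseURL(normalizePath(html_file))
  invisible(TRUE)
}
```

Calling open_rendered() after rendering should open the same page you see in the Viewer pane; if it reports that the file is missing, check that the .qmd file is inside your project folder.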
Submitting this lab
Step 1: Compare rendered solutions
Each week, you will compare your solutions to ours. Does your file have the same outputs (checkmarks/messages)? It is expected that the exercises in the solutions file look a bit different than they do in the template.
Step 2: Complete the Moodle Quiz
Complete the Moodle quiz for this lab (Weekly Quiz 1). You can complete the quiz any time before 9:25am on the following Tuesday.
Optional reading
To prepare for next week’s lab on working in R, we recommend reading Chapter 2: R basics and workflows.