Lab 1: Getting Started

Preamble

The purpose of this lab is to ensure that you have a productive working environment for reproducible scientific computing.

After completing this lab, you should be able to render this Quarto document! You will know that you are done when you click on the “Render” button in RStudio, and then you see seven checkmarks in the output.

✔ It's working!

Why are we here?

The practice of science is suffering through a reproducibility crisis. The scientific community is realizing that many “findings”—even many that are published in reputable journals after careful peer-review—do not stand up under the scrutiny of replication. The reasons behind this are complex and wide-ranging.

Whatever your interests are, we want your work to be rigorous and replicable. This means that the data you analyze, the models you fit, and the tables and figures you produce all need to be reproducible.

Quarto is a new, open-source, cross-platform, document authoring software produced by Posit (née RStudio) that combines the data processing capabilities of R, Python, or Julia with the venerable authoring technologies Markdown, Pandoc, and \(\LaTeX\). Figure 1 illustrates the Quarto processing pipeline.

Figure 1: A Quarto document (with the .qmd file extension) can use either the knitr package to render R code, or Jupyter to render Python code into a plain Markdown document (.md). That Markdown document can then be rendered by Pandoc into a variety of output formats.
Note

If you are already familiar with R Markdown, Quarto is like R Markdown 2.0, and is made by the same people.

In this class, you will learn how to import, wrangle, and visualize data using R within Quarto. You can use Quarto to produce production-quality reports in HTML and PDF formats for use in your companion SDS class(es). Figure 2 shows several other coding languages (including Python) going into a Quarto document.

Figure 2: R is just one of the computing languages that can be used in a Quarto document.

Lab instructions

Installing New Software

All of your day-to-day work in this class will be done inside a program called RStudio, which is an application that makes it easier to use the R programming language. To use this program, you’ll need to download and install both R and RStudio on your computer. Follow the instructions below to install them, and then verify that your installation is set up correctly.

Step 1: Install R

R is a programming language and computing environment specialized for statistical analysis and data manipulation. It’s commonly used for performing statistical tests, creating data visualizations, and writing data analysis reports. Despite focusing on statistics, it’s a full-fledged programming language, and relatively easy to learn.

Installing R for Mac Computers

To figure out which version of R you need for your computer, click on the small apple icon in the top-left corner of your screen and choose “About this Mac” from the menu.

Look at the “Processor” line in the pop-up (labeled “Chip” on Macs with an Apple M processor), and take note of whether your computer uses an Intel processor or an Apple M processor.

Example of an Apple M processor

Example of an Intel processor

Then, go to https://cloud.r-project.org/bin/macosx/

  • If your computer uses an Intel processor, click on the file titled R-4.3.2-x86_64.pkg
  • If your computer uses an Apple M processor, click on the file titled R-4.3.2-arm64.pkg

This will download the R installer to your computer’s download folder. Go to your download folder, and double-click on the R-4.3.2 installer you just downloaded. Follow the prompts on the screen to finish installing R. You can safely accept all the default settings without changing anything.

Installing R for Windows Computers

Go to https://cloud.r-project.org/bin/windows/base/ and click the link titled Download R-4.3.2 for Windows.

This will download the R installer to your computer’s download folder. Go to your download folder, and double-click on the R-4.3.2 installer you just downloaded. Follow the prompts on the screen to finish installing R. You can safely accept all the default settings without changing anything.

Installing R for Linux Computers

If you are using a Linux-based operating system, use your system’s package manager to install R. For example, here are the instructions for installing R on Ubuntu.

Installing R for Chromebooks

R cannot be installed on Chromebooks, so you’ll need to use the Smith RStudio Server to participate in this course.

Note

If you want to see more visually detailed instructions for how to install R, you can watch this video tutorial if you have a Mac and this video tutorial if you’re using a Windows PC.

Note that some of the software version numbers might differ (e.g., R 4.3.2 instead of 4.1), but the steps you should follow will be exactly the same.

Step 2: Install RStudio

RStudio is an integrated development environment for reproducible scientific computing that caters to the R programming language. You will use it extensively in all of your SDS classes.

Instructions

  1. Download the latest, free version of RStudio Desktop. Be sure to get the version that is appropriate for your operating system.
  2. Install RStudio Desktop by launching the installer after it downloads. You can accept all the defaults during installation.
  • If you are using a Mac computer, make sure to drag the RStudio icon into your applications folder!

Note

If you want to see more visually detailed instructions for how to install RStudio, you can watch this video tutorial if you have a Mac and this video tutorial if you’re using a Windows PC.

Note that some of the software version numbers might differ, but the steps you should follow will be exactly the same.

Verification

First, open the RStudio program by clicking on the RStudio icon.

  • If you are using a Mac, you can find the RStudio icon by either:

    1. Opening a new Finder window
    2. Navigating to the Applications folder (which should be in the list of locations along the left pane)
    3. Scrolling down until you find the RStudio icon

    Or, you can search for the word RStudio in the Spotlight search bar

  • If you are using a Windows PC, you can find the RStudio icon by:

    1. Clicking the Windows Start menu icon in the lower-left corner of the screen
    2. Searching for the word RStudio in the search bar

When RStudio opens, you should see a window like this:

Note

If you’re using a Mac, you may see a pop-up window that looks like this when you open RStudio:

You can safely choose “Not Now” (we will not use the “git” command in this class) but the pop-up will continue to appear in the future.

You can also choose “Install” (which will stop this pop-up message from appearing in the future) but the installation process may take several minutes.

The RStudio application window is divided into four “panes”, which are labeled and color-coded in the diagram below:

  1. The Console pane
  2. The Editor pane (not shown)
  3. The Environment pane
  4. The “Miscellaneous” pane

Question

Copy and paste the code below into the console pane, and press the Enter key to run the code. If your installation of R is working correctly, you should see the R version printed out, and a message indicating that your version of R is up to date (like the message seen below).

r_version_info <- R.Version()
# getRversion() returns a version object, so the comparison below checks the
# major and minor versions together (e.g., it treats 4.10 as newer than 4.3)
if (getRversion() >= "4.3.0") {
  msg <- "✔ Your version of R is up to date!"
} else {
  msg <- "✖ Your version of R is out of date. You should update to version 4.3 or later now."
}
cat(r_version_info$version.string, msg, sep="\n")
R version 4.4.0 (2024-04-24)
✔ Your version of R is up to date!
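
Version numbers are not ordinary decimals, which is why a check like the one above is safest when it relies on R's version objects. The lines below are a minimal sketch using base R's `package_version()` and `getRversion()`; the version strings are made up for illustration.

```r
# Treated as a decimal, "4.10" becomes 4.1, which is smaller than 4.3,
# even though a version 4.10 would be newer than 4.3.
as.numeric("4.10") > as.numeric("4.3")

# package_version() compares each component (major, minor, patch) separately.
package_version("4.10.0") > package_version("4.3.0")

# getRversion() returns the running R version as the same kind of object,
# so it can be compared directly against a version string.
r_is_recent <- getRversion() >= "4.3.0"
r_is_recent
```

The first comparison returns FALSE and the second returns TRUE; the last line depends on the version of R you have installed.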

Install necessary packages

Like many modern programming languages, R is modular, meaning that it relies on packages to provide additional functionality. There are thousands of R packages hosted on CRAN, and many more hosted on GitHub. In this course, we will focus on a handful of popular, well-crafted, useful packages.

In RStudio, the Packages tab displays a searchable list of packages that are installed on your computer.

You should get comfortable checking which packages you have installed, and installing new packages. You only have to install a package once.
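
Besides the Packages tab, you can also check for a package from the console. The sketch below assumes a small helper named `is_installed()`, which is hypothetical (it is not part of R or of this lab's files); it relies only on base R's `requireNamespace()`.

```r
# Hypothetical helper: TRUE if a package is installed, FALSE otherwise.
is_installed <- function(pkg) {
  # requireNamespace() looks for the package without attaching it
  requireNamespace(pkg, quietly = TRUE)
}

# The stats package ships with every R installation
is_installed("stats")

# Report whether a package still needs to be installed
# ("rstudio.prefs" is used here only as an example name)
if (!is_installed("rstudio.prefs")) {
  message("rstudio.prefs is not installed yet.")
}
```

Because a package only has to be installed once, a check like this can help you avoid reinstalling something you already have.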

Install rstudio.prefs and customize your RStudio

RStudio has hundreds of options that can be configured so that each user can customize its behavior to their preferences. Unfortunately, several of its configuration options have default settings that make it more difficult to conduct reproducible data analyses. These settings can be changed using the Global Options GUI, but hunting down each setting and changing them one at a time is slow and tedious.

Instead, we’ll use the rstudio.prefs package, which will allow us to change all the settings at once by executing a few R commands in the console. But first, we need to install the rstudio.prefs package!

Instructions

  1. Install the rstudio.prefs package. There are two main ways to install packages. You can use either, but you only need to install a package once.

    • In RStudio, you can install a package by clicking on the “Install” button in the Packages tab. Then type rstudio.prefs and click Install.
    • Alternatively, you can install a package using the install.packages() function. In this case, you would type install.packages("rstudio.prefs") into the R console, and press the Enter key to run the code.
  2. Copy and paste the R code below into the console, and press the Enter key to run the code. This code will change several of RStudio’s default configuration options to make the program more user-friendly.

    Before changing your settings, R will print out your pending changes, and ask if you want to continue. You can indicate “Yes” by typing a y into the console, and pressing the Enter key.

    library(rstudio.prefs)
    
    use_rstudio_prefs(
      save_workspace = "never",
      load_workspace = FALSE,
      restore_last_project = FALSE,
      restore_source_documents = FALSE,
      check_for_updates = FALSE,
      color_preview = FALSE,
      rmd_viewer_type = "pane",
      rmd_chunk_output_inline = FALSE
    )

    If you get an error message in your console saying Error in library(rstudio.prefs) : there is no package called ‘rstudio.prefs’, return to Step 1 in this list and make sure you have finished installing the rstudio.prefs package.

Verification

Question

Copy and paste the R code below into the console and press the Enter key to run the code. If you have configured RStudio correctly by following the instructions above, you should see a message that says “✔ RStudio is correctly configured!”.

rstudio_config <- jsonlite::fromJSON(paste0(rstudio.prefs::rstudio_config_path(),
                                            "/rstudio-prefs.json")
                                     )
options_set <- c(
  rstudio_config$save_workspace == "never",
  rstudio_config$load_workspace == FALSE,
  rstudio_config$restore_last_project == FALSE,
  rstudio_config$restore_source_documents == FALSE,
  rstudio_config$check_for_updates == FALSE,
  rstudio_config$color_preview == FALSE,
  rstudio_config$rmd_viewer_type == "pane",
  rstudio_config$rmd_chunk_output_inline == FALSE
  )

if (all(options_set)) {
  cli::cli_alert_success("RStudio is correctly configured!")
} else {
  cli::cli_alert_danger("RStudio settings have not been correctly configured.")
}

Install tidyverse and usethis

The tidyverse is a meta-package that installs a set of other commonly used packages (nine “core” packages are loaded with it as of version 2.0.0). The tidyverse is developed by Posit (formerly RStudio) and has become a popular way to use R (Wickham et al. 2019). In this course, we will be using the tidyverse extensively.

usethis is a package that helps set up projects and other packages. We’ll only be using it in this lab to verify our setup.

Instructions

  1. Install the tidyverse and usethis packages by running the following code in your console (not in your Quarto document).
install.packages("tidyverse")
install.packages("usethis")

Verification

Question

Copy and paste the R code below into the console and press the Enter key to run the code. If an appropriate version of the tidyverse package is properly installed, you should see two messages: “✔ tidyverse is installed and relatively up-to-date” and “✔ usethis is installed.”

has_tidyverse <- suppressPackageStartupMessages(require(tidyverse))
has_usethis <- suppressPackageStartupMessages(require(usethis))

if (has_tidyverse) {
  # packageVersion() returns a version object, so the comparison below
  # is made component by component rather than as text
  tidyverse_version <- packageVersion("tidyverse")

  if (tidyverse_version >= "2.0.0") {
    cli::cli_alert_success("tidyverse is installed and relatively up-to-date.")
  } else {
    cli::cli_alert_danger("tidyverse is installed but it is not up-to-date. Please update your packages.")
  }
} else {
  cli::cli_alert_danger("tidyverse could not be loaded.")
}
if (has_usethis) {
  cli::cli_alert_success("usethis is installed")
} else {
  # usethis is not loaded in this branch, so its ui_oops() cannot be used here
  cli::cli_alert_danger("usethis could not be loaded")
}

Create a project environment for SDS 100

Now that you have R and RStudio installed, we are going to set up your working environment to maximize your productivity.

The work that you do in RStudio should be organized into Projects. Working within projects allows you to switch contexts safely, and keep your work organized. We recommend that you have at least two projects:

  • one project for this class (SDS 100)
  • one project for the companion SDS class you are taking

You can switch between Projects in RStudio at any time using the Projects dropdown menu in the upper-right corner of your screen.

Instructions

  1. Create a new project in RStudio named “SDS 100” inside your Documents folder. (On Windows this may be called My Documents).

Figure 3: Click “Project:” in the top right of RStudio.

Figure 4: Select “New Project” from the dropdown. Click “New Directory” in the pop-up window that opens.

Figure 5: Click “New Project.” A window will appear with two fields. In the first, “Directory name:”, put SDS 100. For the second, click the button to the right that says “Browse.”

Figure 6: In the pop-up window that appears, navigate to your Documents folder, then select it and click “Open.” This will fill out the second field in the pop-up window. Ensure that both fields are correct, then click “Create Project.”

Warning

We strongly recommend that you not place your Project in any of the following places:

  • Your Downloads folder
  • Your Desktop
  • A temporary (i.e., “temp”) folder

We also recommend that you not place your Project in a cloud-based storage location (e.g., your OneDrive folder or your iCloud folder).
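
If you are unsure where a folder lives, you can test its path from the console. This is a rough sketch, not an official check: the helper `is_risky_location()` and its list of folder names are assumptions made for illustration, and it does not cover cloud-storage folders, whose names vary from computer to computer.

```r
# Hypothetical helper: flags paths that pass through a folder the lab
# warns against. The folder names are assumptions; adjust them as needed.
is_risky_location <- function(path) {
  risky <- c("downloads", "desktop", "temp", "tmp")
  # Split the path into folder names, handling both / and \ separators
  parts <- strsplit(path, "[/\\\\]")[[1]]
  any(tolower(parts) %in% risky)
}

is_risky_location("~/Downloads/SDS 100")   # a Downloads folder is flagged
is_risky_location("~/Documents/SDS 100")   # a Documents folder is not

# Check the folder R is currently working in
is_risky_location(getwd())
```

The helper only matches whole folder names, so a folder called “Templates” would not be mistaken for a “temp” folder.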

Verification

Question

Copy and paste the code below into your R console, and press the Enter key to run the code. If your project is set up correctly, you should see a message saying “Your project is in a good place” (like the one shown in the image below).

project <- tryCatch(
  usethis::proj_path(),
  error = function(e) {
    # inherits() checks every class of the error, not just the first one
    if (inherits(e, "usethis_error")) {
      FALSE
    } else {
      stop(e) # rethrow package not found error!
    }
  }
)

if (!isFALSE(project)) {
  cli::cli_alert_success("Project found at: {.path {project}}. Your project is in a good place.")
} else {
  cli::cli_alert_danger(
    "No project environment detected. Make sure you:\n
     1. Create and open an R project for SDS 100.
     2. Move your Quarto file into your project folder before rendering it
    "
  )
}

Your First Quarto Document

Quarto is a software program that can be used inside RStudio and allows you to write narrative text and R code together within the same document. A Quarto document is a Markdown document that has an R console built right into it. This allows you to create a “final product” from all your data analysis work that contains all your R code and its output (like tables and figures) and all your written explanations, so you can easily share with other people exactly what you did, and what your results mean.

Now, it’s time for you to open and render your very first Quarto document! You’ll be using Quarto documents to complete your lab work throughout this course, so we’ll start getting used to the workflow of rendering and turning in your final products.

Instructions

  1. Download this Quarto file
    • If this file has an underscore (_) at the beginning of the file name after you download it, make sure to remove it from the file name.
  2. Move lab_01_setup.qmd from where it was downloaded (likely ~/Downloads or similar) to the folder you created when you made your project (~/Documents/SDS 100).
    • There are lots of different ways to do this (e.g., using the Finder, Windows Explorer, or command line, etc.). Perhaps the easiest way is to open the lab_01_setup.qmd file, go to File -> Save As…, and navigate to your RStudio project directory.
  3. Open lab_01_setup.qmd in RStudio.
  4. Click on the Render button near the top middle of the editor pane.
  5. Inspect the output in the Viewer pane. How many checkmarks do you see?

Finding your rendered document

Open a new window in your computer’s file explorer program (Explorer if you’re on Windows, or Finder if you’re on a Mac), and navigate to the folder where you saved your SDS 100 R Project.

In this folder you should see a file named lab_01_setup.html. This is the output from the Quarto document you just rendered. If you double-click this file, it should open in your web browser, and you should see the same thing you saw in the RStudio Viewer pane.
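
You can also look for the rendered file from the R console. The sketch below assumes that your SDS 100 project is open (so that R's working directory is the project folder) and that the HTML file was written next to the .qmd file, as described above.

```r
# Name of the Quarto file from this lab
qmd_file <- "lab_01_setup.qmd"

# Swap the .qmd extension for .html to get the rendered file's name
html_file <- paste0(tools::file_path_sans_ext(qmd_file), ".html")
html_file

# TRUE only if the rendered file exists in the current working directory
file.exists(html_file)

# Open the file in your default web browser, but only if it exists
if (file.exists(html_file)) {
  browseURL(html_file)
}
```

If `file.exists()` returns FALSE, check that the .qmd file was moved into the project folder before you rendered it.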

Submitting this lab

Step 1: Compare rendered solutions

Each week, you will compare your solutions to ours. Does your file have the same outputs (checkmarks/messages)? It is expected that the exercises in the solutions file look a bit different than they do in the template.

Step 2: Complete the Moodle Quiz

Complete the Moodle quiz for this lab (Weekly Quiz 1). You can complete the quiz any time before 9:25am on the following Tuesday.

Optional reading

To prepare for next week’s lab on working in R, we recommend reading Chapter 2: R basics and workflows.

References

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.