Lab 10: Referencing Figures and Tables

Preamble

The purpose of this lab is to learn how to integrate figures and tables into the narrative of the document using references.

After completing this lab, you should be able to create figures from static images or data graphics that you create and reference them by number automatically in the text. Similarly, you should be able to create tables from data or by hand and reference them automatically.

You will know that you are done when you have created two tables and two figures with captions and referenced them in the text.

Why are we here?

When you write a scientific paper, you must refer to every figure and table by number in the text. You will often include multiple tables, and you sometimes have to move them around while revising. Hard-coding table and figure numbers into the text is an exercise in frustration: if you change the order of the tables, you have to go back and update every reference to them.

Instead, we use cross-references so that figures and tables are numbered automatically in the text. Once you learn how to reference tables and figures automatically, you will never go back to referencing them manually again!

Preliminary work

Because Quarto is so new, this material is not really covered anywhere other than the section of the Quarto documentation on references. Quarto greatly improves on the figure and table referencing schemes in R Markdown, which were cobbled together.

In R Markdown:

  • it can be a struggle to get figure and table references to work seamlessly in both HTML and PDF outputs, and
  • the underlying system for referencing figures is different from the underlying system for referencing tables.

Quarto smooths over both of these rough edges: figure and table references work the same way, and work with all outputs.

Lab instructions

Start by creating a new .qmd document and working in the source editor (uncheck the “Use visual markdown editor” option). This document is only for practicing this lab and should be separate from your project rotation documents.

In this lab, we will use two functions from the knitr package, which you already have installed. Let’s begin by loading the tidyverse and knitr packages.

Question

Use library() to load the tidyverse and knitr packages.

Table from scratch

There is a syntax for making tables by hand in Markdown. However, since you are using R anyway, and R has powerful functions for manipulating data, it’s almost always a better option to create tables by constructing a data frame in R that holds the data you want to display in the table.

For example, suppose that you want to eat one piece of fruit every weekday. However, you want to randomize which fruit you eat each day.

To do this, we first create a tibble called fruits with two columns: the weekdays, and the fruits in alphabetical order.

Question

Use the tibble() function to create the fruits data frame. Display the fruits data frame.

fruits <- tibble(
  weekday = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
  week1 = c("apple", "banana", "grapes", "orange", "pear")
)
fruits
# A tibble: 5 × 2
  weekday   week1 
  <chr>     <chr> 
1 Monday    apple 
2 Tuesday   banana
3 Wednesday grapes
4 Thursday  orange
5 Friday    pear  

Now that we have this data structure in R, we can modify it. In this case, we want to randomly select the order of fruit for subsequent weeks. We can do this using the mutate() function you learned about in Lab 4 and the sample() function.

Question

Use the mutate() function to add a third column to the fruits data frame that contains a random sample() of the week1 column. Call the resulting column week2.
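One possible answer is sketched below. It recreates fruits so the block runs on its own, and it assumes you want the same shuffle every time you render: set.seed() fixes R's random number generator, and the seed value 42 is an arbitrary choice, not part of the lab. Called with a single vector, sample() returns a random permutation of that vector, so each fruit still appears exactly once in week2.

```r
library(dplyr)  # tibble(), mutate(), and the starwars data (all loaded by tidyverse too)

# Recreate the fruits data frame from above so this block runs on its own
fruits <- tibble(
  weekday = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
  week1 = c("apple", "banana", "grapes", "orange", "pear")
)

# Fix the random seed (arbitrary value) so the shuffle is the same on every render
set.seed(42)

# sample() with one argument returns a random permutation of week1
fruits <- fruits |>
  mutate(week2 = sample(week1))
fruits
```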

Displaying the table

Next, we want to display the contents of the fruits data frame as a table. We do this using the kable() function from the knitr package. Render the document to see the difference between just printing fruits and using the kable() function.

Use the kable() function to display the fruits data frame as a table.

weekday    week1   week2
Monday     apple   apple
Tuesday    banana  grapes
Wednesday  grapes  pear
Thursday   orange  banana
Friday     pear    orange

Referencing a table

To reference a table, we need three parts:

  • The chunk that creates the table (uses the kable() function) has to have a label, and the label has to start with tbl-, e.g. tbl-blah.
  • The chunk that creates the table has to have a caption created by the tbl-cap chunk option.
  • The text needs to refer to @tbl-blah. The link below that says “Table 1” comes from using the @ notation in that sentence.

For example, the chunk below creates a table labeled tbl-example, so the text can refer to it by writing @tbl-example:

#| label: tbl-example
#| tbl-cap: "This is a table we are referencing in the text."
kable(mtcars)

To see this in action, consider Table 1, which shows the fruits.

kable(fruits)
Table 1: These are the first two weeks for our fruits.

weekday    week1   week2
Monday     apple   apple
Tuesday    banana  grapes
Wednesday  grapes  pear
Thursday   orange  banana
Friday     pear    orange

Question

Repeat the previous exercise to add fruits for weeks 3 and 4. Refer to the resulting table in the text using a cross-reference. Render your document to see the results.
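A sketch of one way to do this. The label tbl-fruits-4 is a hypothetical name (any label starting with tbl- works), and the caption simply reuses the one from Table 2. The #| lines are chunk options: they only take effect inside an executable {r} chunk of a .qmd file, and plain R treats them as comments. With this chunk in place, writing @tbl-fruits-4 in your text produces a numbered link to the table. The library() calls and the rebuilt fruits data frame are only there so the block runs on its own.

```r
#| label: tbl-fruits-4
#| tbl-cap: "Fruits through week 4."
library(dplyr)
library(knitr)

# Rebuild the two-week version of fruits so this block runs on its own
fruits <- tibble(
  weekday = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
  week1 = c("apple", "banana", "grapes", "orange", "pear")
) |>
  mutate(week2 = sample(week1))

# Add two more independently shuffled weeks
fruits <- fruits |>
  mutate(
    week3 = sample(week1),
    week4 = sample(week1)
  )

# kable() turns the data frame into a table that Quarto can number and caption
kable(fruits)
```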

Table 2: Fruits through week 4.

weekday    week1   week2   week3   week4
Monday     apple   apple   pear    grapes
Tuesday    banana  grapes  grapes  banana
Wednesday  grapes  pear    orange  orange
Thursday   orange  banana  banana  apple
Friday     pear    orange  apple   pear

See Table 2 for a sample solution.

Table from data

The kable() function can help you create a table from any data frame.

In this section, we’ll illustrate two different techniques for producing commonly used univariate summaries. This will build on material from Lab 5.

Data summaries from scratch

If you want total control of your data summary, build it from scratch using the dplyr functions you learned about in Lab 4 and Lab 5. This may be time-consuming, but it gives you full control.

See Table 3 for an example of a summary table constructed from scratch.

starwars |>
  group_by(species) |>
  summarize(
    n = n(), 
    mean_height = mean(height),
    sd_height = sd(height)
  ) |>
  arrange(desc(n)) |>
  head() |>
  kable()
Table 3: Example

species   n   mean_height  sd_height
Human     35  NA           NA
Droid     6   NA           NA
NA        4   175.0000     12.355835
Gungan    3   208.6667     14.189198
Kaminoan  2   221.0000     11.313709
Mirialan  2   168.0000     2.828427

Question

Use the na.rm argument to remove missing values before computing the summaries shown in Table 3. See help(mean).

Question

Use the digits argument to the kable() function to round the numbers in Table 3 to one decimal place.
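A sketch that addresses both questions at once. Passing na.rm = TRUE to mean() and sd() drops missing heights before each group's statistics are computed, and digits = 1 asks kable() to round the numeric columns to one decimal place. The label tbl-starwars-height and the caption text are hypothetical.

```r
#| label: tbl-starwars-height
#| tbl-cap: "Height by species, with missing values removed."
library(dplyr)  # also provides the starwars data
library(knitr)

height_summary <- starwars |>
  group_by(species) |>
  summarize(
    n = n(),
    # na.rm = TRUE drops missing heights before computing each statistic
    mean_height = mean(height, na.rm = TRUE),
    sd_height = sd(height, na.rm = TRUE)
  ) |>
  arrange(desc(n)) |>
  head()

# digits = 1 rounds the numeric columns to one decimal place
kable(height_summary, digits = 1)
```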

Data summaries from another package

There are a number of packages that provide functions for displaying data summaries.

For example, Table 4 shows the distribution of height and mass as displayed by the skim() function from the skimr package. Note that we include the skimr_include_summary: false chunk option in order to suppress some extraneous output.

```{r}
#| label: tbl-skimr
#| tbl-cap: "Example"
#| skimr_include_summary: false
starwars |>
  skimr::skim(height, mass) |>
  rename(`Missing Obs.` = n_missing)
```

Variable type: numeric

Table 4: Example

skim_variable  complete_rate  mean    sd      p0  p25    p50  p75    p100  hist   Missing Obs.
height         0.93           174.60  34.77   66  167.0  180  191.0  264   ▂▁▇▅▁  6
mass           0.68           97.31   169.46  15  55.6   79   84.5   1358  ▇▁▁▁▁  28

Question

Choose one of the approaches above to create a summary data table of the penguins data frame in the palmerpenguins package you learned about in Lab 3 and reference it in the text.
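A sketch using the from-scratch approach, assuming the palmerpenguins package is installed. It summarizes the body_mass_g column by species; the label tbl-penguins and the caption are hypothetical, and you would cite the table in your text as @tbl-penguins. The data frame is referenced as palmerpenguins::penguins so it cannot be confused with any other object named penguins.

```r
#| label: tbl-penguins
#| tbl-cap: "Body mass (g) of penguins by species."
library(dplyr)
library(knitr)

penguin_summary <- palmerpenguins::penguins |>
  group_by(species) |>
  summarize(
    n = n(),
    # A few penguins have no recorded mass, so drop those before summarizing
    mean_mass_g = mean(body_mass_g, na.rm = TRUE),
    sd_mass_g = sd(body_mass_g, na.rm = TRUE)
  )

kable(penguin_summary, digits = 1)
```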

Figure from data

In Quarto, figures are embedded and referenced in a similar manner as tables.

You already know how to create figures from data in a Quarto document, having learned how to construct them in Lab 3 and how to polish them in Lab 6. To reference a figure in a Quarto document, we need the same things that we needed to reference a table:

  • The chunk that creates the figure has to have a label, and the label has to start with fig-, e.g. fig-blah.
  • The chunk that creates the figure has to have a caption created by the fig-cap chunk option.
  • The text needs to refer to @fig-blah.

For example, Figure 1 shows a scatterplot.

ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

Figure 1: A scatterplot
Question

Create a data graphic and reference it in the text.
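One possible answer, using dplyr's starwars data as a stand-in; swap in your own data and aesthetics. The label fig-starwars-height-mass and the caption are hypothetical. Note that the label starts with fig- rather than tbl-, and the caption uses fig-cap. Writing @fig-starwars-height-mass in the text then produces a numbered link to the figure.

```r
#| label: fig-starwars-height-mass
#| fig-cap: "Height and mass of Star Wars characters."
library(dplyr)    # provides the starwars data
library(ggplot2)

# Keep only characters with both variables recorded, so ggplot2
# does not warn about rows it had to remove
plot_data <- starwars |>
  filter(!is.na(height), !is.na(mass))

height_mass_plot <- ggplot(plot_data, aes(x = height, y = mass)) +
  geom_point()
height_mass_plot
```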

Figure from Internet

You can also create figures that come from the Internet.

In Lab 8, you learned how to embed an image in a Quarto document using Markdown. Such images are embedded using this syntax:

![](https://pbs.twimg.com/profile_images/1048189234904010753/RUL5NyvY_400x400.jpg)

This works great, but it doesn’t create a figure that can be referenced automatically. To create a figure that we can reference, we use the include_graphics() function from the knitr package. Its only required argument is a path to the image you want to embed. The path can be either a URL or a path to an image file on your computer.

You will still need to set the label and fig-cap chunk options in order to cross-reference this image.

Our final example shows the SDS logo in Figure 2.

include_graphics("https://pbs.twimg.com/profile_images/1048189234904010753/RUL5NyvY_400x400.jpg")
Question

Find an image on the Internet, embed it in your Quarto document using include_graphics(), and reference it in the text.
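A sketch of the chunk this question asks for, reusing the SDS logo URL from the example above; replace it with the URL of the image you found. The label fig-sds-logo and the caption are hypothetical. Without the label and fig-cap options, the image would still appear, but @fig-sds-logo would have nothing to point to.

```r
#| label: fig-sds-logo
#| fig-cap: "The SDS logo."
library(knitr)

# include_graphics() takes a path, which can be a URL or a local file
logo <- include_graphics(
  "https://pbs.twimg.com/profile_images/1048189234904010753/RUL5NyvY_400x400.jpg"
)
logo
```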

Submitting this Lab

For this lab submission, you will practice creating summary tables for one of the datasets in your group’s rotation and then referencing those tables in your .qmd file.

Step 1: Final Project Prep, Rotation 3 - Summary Statistics and Tables

In this first step, you will build on a previous .qmd file.

Step 1.1: Get Rotation Document

Before beginning this lab submission, you will need to get the rotation document from your fellow group mate. If you are unsure whose document you are inheriting, consult your Lab 8 submission for your group’s rotation schedule.

Step 1.2: Two views introducing your data

In this step, you are creating two views to get to know this new-to-you dataset.

  • First, create a list of the variable names (you can use the colnames() function). Based on these names, what kinds of data (categorical, ordinal, or numerical) do you expect to see?
  • Second, create a glimpse of the data. Does the glimpse match your expectations from the variable names? Note in prose two things that you notice about the data.
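A sketch of both views, with dplyr's starwars data standing in for your rotation dataset; replace starwars with the name of your own data frame.

```r
library(dplyr)  # provides glimpse() and the starwars data

# View 1: just the variable names
colnames(starwars)

# View 2: one line per variable, showing its type (e.g. <chr>, <int>, <dbl>)
# followed by its first few values
glimpse(starwars)
```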

Step 1.3: Table of Summary Statistics

Now that you have a general sense of this data, create tables of summary statistics for four of the variables. Your tables should include at least one numeric variable and at least one categorical variable.

In prose, write at least two sentences about the data based on these statistics.
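A sketch of one numeric and one categorical summary, again with starwars standing in for your data. For a numeric variable, the sketch reports missingness, center, and spread; for a categorical variable, a frequency table built with count() is a common choice. To make these tables referenceable, each one would go in its own chunk with its own tbl- label and tbl-cap; the labels and captions are up to you.

```r
library(dplyr)  # provides summarize(), count(), and the starwars data
library(knitr)

# Numeric variable: missingness, center, and spread of height
height_summary <- starwars |>
  summarize(
    n_missing = sum(is.na(height)),
    mean = mean(height, na.rm = TRUE),
    sd = sd(height, na.rm = TRUE),
    median = median(height, na.rm = TRUE)
  )
kable(height_summary, digits = 1)

# Categorical variable: number of characters with each eye color,
# most common first (count() keeps missing values as their own row)
eye_color_counts <- starwars |>
  count(eye_color, sort = TRUE)
kable(eye_color_counts)
```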

Step 1.4: Add cross-references

Now that you have both the tables and text about the tables, in this step you will link them together using cross-references. In your descriptions, use the @tbl- cross-reference syntax introduced in this lab to reference the tables explicitly. (You can check that your references are working properly by changing the order of your tables: the table numbers in your text should update automatically.)

Step 1.5: Submit this document on Moodle AND rotate this document

Once you are happy with your tables and descriptions (including the cross-references!), render your document. Check that everything renders as you expect and then:

  • Submit both the .qmd file and the .html file for this rotation document on Moodle under “Project Rotation 3” AND
  • Send the Quarto document (and any additional files needed for rendering it, such as any local .csv files) to the next person in your group’s rotation.

Step 2: Complete the reading

Lab 11 has a set of pre-readings to complete to facilitate a fruitful discussion surrounding ethical codes for data science. To lighten the workload, we will divide up the readings among your final project group as follows:

Everyone reads the following two (very short) readings:

The following 3 readings should be divided among your group. Each person will do 1 of the following readings, and present the key points during class.

Step 3: Complete the Moodle Quiz

Complete the Moodle quiz for this lab.

References

Drum, Kevin. 2013. “It’s the Austerity, Stupid: How We Were Sold an Economy-Killing Lie.” Mother Jones. https://www.motherjones.com/politics/2013/09/austerity-reinhart-rogoff-stimulus-debt-ceiling/.
Elliott, Alan C, S Lynne Stokes, and Jing Cao. 2018. “Teaching Ethics in a Statistics Curriculum with a Cross-Cultural Emphasis.” The American Statistician 72 (4): 359–67. https://doi.org/10.1080/00031305.2017.1307140.
“Fact Sheet: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence.” 2023. The White House. https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/.