Lab 5: Summarizing Data

Question

Load the mosaicData R package in your Quarto document, and then visit the documentation page for the SaratogaHouses data set.
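
As a minimal sketch, assuming the mosaicData package has already been installed on your computer, loading the package and opening the help page from inside R might look like this:

```r
# Assumes mosaicData was installed beforehand,
# e.g. with install.packages("mosaicData")
library(mosaicData)

# Open the documentation page for the data set
# (equivalent to typing ?SaratogaHouses in the console)
help("SaratogaHouses", package = "mosaicData")
```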

Question

Import the Saratoga Houses data set into your environment using the command SaratogaHouses <- mosaicData::SaratogaHouses in your Quarto document. Use nrow() to compute the number of rows in SaratogaHouses.
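
Put together, the two steps could be sketched as follows. The name num_rows is only an illustrative choice, not something the lab requires.

```r
# Copy the data set out of the mosaicData package into the environment
SaratogaHouses <- mosaicData::SaratogaHouses

# Count the rows (one row per home); num_rows is a hypothetical name
num_rows <- nrow(SaratogaHouses)
num_rows
```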

[1] 1728

Question

How old was a “typical” home in Saratoga County during 2006? To find out, use summarize() to compute the median age of all the homes in the SaratogaHouses data set. Be sure to think about what “ingredients” you need to supply for your code to work!
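
One possible sketch is shown here, assuming the dplyr package is installed and that you are running R 4.1 or later (needed for the native pipe |>). The three “ingredients” are the data set, the summary function, and a name for the result. The object name age_summary is hypothetical.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# Ingredients: the data set, the summary function median(),
# the variable age, and a name for the summary (median_age)
age_summary <- SaratogaHouses |>
  summarize(median_age = median(age))

age_summary
```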

  median_age
1         19

Question

What was the average amount of living space of a home for sale in Saratoga County during 2006? Write your own code “from scratch” to compute this summary.

Be sure to think about:

  1. What variable in the data set measures home size?
  2. What function in R calculates the average?
  3. What are the units in the output?
  4. What would be a good name for your summary?
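
Working through those four points, one possible sketch is the following. It assumes dplyr is installed and R 4.1 or later. According to the data set's documentation, livingArea is recorded in square feet, so the average is in square feet as well. The object name size_summary is hypothetical.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# livingArea measures home size (square feet);
# mean() calculates the average
size_summary <- SaratogaHouses |>
  summarize(avg_size = mean(livingArea))

size_summary
```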

  avg_size
1 1754.976

Question

Does the price of a home depend on the type of heating system it has?

To find out, insert a new code chunk into your Quarto document, and copy the code template below into this new code chunk.

Then, modify the “blanks” in this template to compute the median price of a home, given its type of heating system.

Be sure to look back at the variables in the data set to figure out which one can tell you what type of heating system a home uses.
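
The lab's template is not reproduced here, so the following is only a sketch of what a filled-in version might look like, assuming dplyr is installed and R 4.1 or later. The grouping variable is heating and the summarized variable is price. The object name price_by_heating is hypothetical.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# Split the homes into groups by heating system,
# then compute the median price within each group
price_by_heating <- SaratogaHouses |>
  group_by(heating) |>
  summarize(median_price = median(price))

price_by_heating
```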

# A tibble: 3 × 2
  heating         median_price
  <fct>                  <dbl>
1 hot air               200000
2 hot water/steam       199700
3 electric              149000

Question

How does the average size of a home depend on the number of bedrooms in the home? Write your own code “from scratch” to compute this summary. Be sure to think about

  1. Which variable should be the “grouping” variable (i.e., which variable you should look at to figure out which group a home belongs to) and
  2. Which variable should be the summarized variable?
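
With bedrooms as the grouping variable and livingArea as the summarized variable, one possible sketch is shown here. It assumes dplyr is installed and R 4.1 or later. The object name size_by_bedrooms is hypothetical.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# Group homes by their number of bedrooms,
# then average the living area within each group
size_by_bedrooms <- SaratogaHouses |>
  group_by(bedrooms) |>
  summarize(avg_size = mean(livingArea))

size_by_bedrooms
```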

# A tibble: 7 × 2
  bedrooms avg_size
     <int>    <dbl>
1        1     885.
2        2    1202.
3        3    1628.
4        4    2273.
5        5    2476.
6        6    3060.
7        7    2521.

Question

Which type of heating fuel is used the most frequently? We can use the n() function to find out!

Investigate by inserting a new code chunk into your Quarto document and copying the code template below into this new code chunk.

Then, modify the “blanks” in this code chunk to count the number of homes that use electric, gas, or oil as their heating fuel.

Be sure to look back in the documentation for the SaratogaHouses data set to figure out which variable measures what kind of fuel a home uses, so you know which variable to group the data by!
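
The lab's template is not reproduced here, so this is only a sketch of what a filled-in version might look like, assuming dplyr is installed and R 4.1 or later. Note that n() takes no arguments: it simply counts the rows in each group. The object name homes_by_fuel is hypothetical.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# Group homes by heating fuel, then count the rows in each group
homes_by_fuel <- SaratogaHouses |>
  group_by(fuel) |>
  summarize(num_homes = n())

homes_by_fuel
```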

# A tibble: 3 × 2
  fuel     num_homes
  <fct>        <int>
1 gas           1197
2 electric       315
3 oil            216

Question

Write your own code “from scratch” to calculate how many homes in the data set do and do not have a waterfront on their property.
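
One way to approach this is sketched here: the same group-then-count pattern, with waterfront as the grouping variable. It assumes dplyr is installed and R 4.1 or later, and the object name homes_by_waterfront is hypothetical.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# Group homes by whether they have a waterfront, then count each group
homes_by_waterfront <- SaratogaHouses |>
  group_by(waterfront) |>
  summarize(num_homes = n())

homes_by_waterfront
```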

# A tibble: 2 × 2
  waterfront num_homes
  <fct>          <int>
1 Yes               15
2 No              1713

Question

Insert a new code chunk into your Quarto document, and copy the code below into that code chunk. Press the green “play” button to run this code, and then explain in words what this code is doing.

This code reaches into the palmerpenguins package, makes a copy of the penguins data set, and stores this copy of these data as an object in the environment named penguins.
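
The code itself is not reproduced here. Judging from the description above, it was presumably a single assignment along these lines (a sketch, assuming the palmerpenguins package is installed):

```r
# package::object reaches into the palmerpenguins package without
# attaching it; <- stores the copy in the environment as penguins
penguins <- palmerpenguins::penguins
```

This follows the same pattern as the SaratogaHouses <- mosaicData::SaratogaHouses command used earlier in the lab.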

Question

Use the is.na() function and the penguins data set to show only the penguins with a missing body mass.
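
A minimal sketch, assuming dplyr and palmerpenguins are installed and R 4.1 or later: is.na() returns TRUE for each missing value, and filter() keeps only the rows where that condition is TRUE. The object name missing_mass is hypothetical.

```r
library(dplyr)

penguins <- palmerpenguins::penguins

# Keep only the rows where body_mass_g is missing (NA)
missing_mass <- penguins |>
  filter(is.na(body_mass_g))

missing_mass
```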

# A tibble: 2 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen             NA            NA                NA          NA
2 Gentoo  Biscoe                NA            NA                NA          NA
# ℹ 2 more variables: sex <fct>, year <int>

Question

Compute the average flipper length for Adelie, Gentoo and Chinstrap penguins, and the number of penguins of each species. Discard any observations with missing flipper lengths.
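
One possible sketch, assuming dplyr and palmerpenguins are installed and R 4.1 or later: first discard the rows with a missing flipper length using !is.na(), then group by species and compute both summaries in a single summarize() call. The object name flipper_summary is hypothetical.

```r
library(dplyr)

penguins <- palmerpenguins::penguins

flipper_summary <- penguins |>
  # Drop penguins whose flipper length is missing
  filter(!is.na(flipper_length_mm)) |>
  group_by(species) |>
  summarize(
    avg_flipper = mean(flipper_length_mm),  # average flipper length (mm)
    sample_size = n()                       # penguins left in each group
  )

flipper_summary
```

Filtering first, rather than passing na.rm = TRUE to mean(), means that n() counts only the penguins that actually contributed to each average.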

# A tibble: 3 × 3
  species   avg_flipper sample_size
  <fct>           <dbl>       <int>
1 Adelie           190.         151
2 Chinstrap        196.          68
3 Gentoo           217.         123