Lab 5: Summarizing Data
Question
Load the mosaicData R package in your Quarto document, and then visit the documentation page for the SaratogaHouses data set.
Question
Import the Saratoga Houses data set into your environment using the command SaratogaHouses <- mosaicData::SaratogaHouses in your Quarto document. Use nrow() to compute the number of rows in SaratogaHouses.
[1] 1728
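The two setup steps above can be sketched as one Quarto code chunk. This assumes mosaicData is already installed (for example with install.packages("mosaicData")). The help lookup is left as a comment because it only opens a documentation page in an interactive session.

```r
# Load the package that ships the SaratogaHouses data (assumes it is installed)
library(mosaicData)

# In an interactive session, the next line opens the documentation page:
# ?SaratogaHouses

# Copy the data set into the environment, then count its rows
SaratogaHouses <- mosaicData::SaratogaHouses
nrow(SaratogaHouses)
```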
Question
How old was a “typical” home in Saratoga County during 2006? To find out, use summarize() to compute the median age of all the homes in the SaratogaHouses data set. Be sure to think about what “ingredients” you need to supply for your code to work!
median_age
1 19
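One possible solution is sketched below, assuming summarize() comes from the dplyr package (the lab does not show which packages its chunks load). The “ingredients” are the data set, the variable to summarize (age, in years), the summary function (median()), and a name for the new column. The object name age_summary is illustrative, not part of the lab.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# age_summary is an illustrative name, not one used by the lab
age_summary <- SaratogaHouses %>%
  summarize(median_age = median(age))  # median of the age column (years)

age_summary
```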
Question
What was the average amount of living space of a home for sale in Saratoga County during 2006? Write your own code “from scratch” to compute this summary.
Be sure to think about:
- What variable in the data set measures home size?
- What function in R calculates the average?
- What are the units in the output?
- What would be a good name for your summary?
avg_size
1 1754.976
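According to the SaratogaHouses documentation, livingArea records living space in square feet, so this average is in square feet. A minimal dplyr sketch follows; size_summary is an illustrative name.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# size_summary is an illustrative name; livingArea is measured in square feet
size_summary <- SaratogaHouses %>%
  summarize(avg_size = mean(livingArea))

size_summary
```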
Question
Does the price of a home depend on the type of heating system it has?
To find out, insert a new code chunk into your Quarto document, and copy the code template below into this new code chunk.
Then, modify the “blanks” in this template to compute the median price of a home, given its type of heating system.
Be sure to look back at the variables in the data set to figure out which one can tell you what type of heating system a home uses.
# A tibble: 3 × 2
heating median_price
<fct> <dbl>
1 hot air 200000
2 hot water/steam 199700
3 electric 149000
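The code template referred to above is not reproduced in this copy of the lab. A completed version plausibly follows the dplyr group_by() then summarize() pattern sketched here, where heating records the heating system type and price is the sale price in US dollars. The name heating_prices is illustrative.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# heating_prices is an illustrative name
heating_prices <- SaratogaHouses %>%
  group_by(heating) %>%                    # one group per heating system type
  summarize(median_price = median(price))  # median sale price within each group

heating_prices
```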
Question
How does the average size of a home depend on the number of bedrooms in the home? Write your own code “from scratch” to compute this summary. Be sure to think about:
- Which variable should be the “grouping” variable (i.e., which variable tells you which group a home belongs to)?
- Which variable should be the summarized variable?
# A tibble: 7 × 2
bedrooms avg_size
<int> <dbl>
1 1 885.
2 2 1202.
3 3 1628.
4 4 2273.
5 5 2476.
6 6 3060.
7 7 2521.
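Here bedrooms is the grouping variable and livingArea (square feet) is the summarized variable. A sketch assuming dplyr; size_by_bedrooms is an illustrative name.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# size_by_bedrooms is an illustrative name
size_by_bedrooms <- SaratogaHouses %>%
  group_by(bedrooms) %>%                 # one group per bedroom count
  summarize(avg_size = mean(livingArea)) # mean living area (sq ft) per group

size_by_bedrooms
```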
Question
Which type of heating fuel is used most frequently? We can use the n() function to find out!
Investigate by inserting a new code chunk into your Quarto document, and copy the code template below into this new code chunk.
Then, modify the “blanks” in this code chunk to count the number of homes that use electric, gas, or oil as their heating fuel.
Be sure to look back at the documentation for the SaratogaHouses data set to figure out which variable records what kind of fuel a home uses, so you know which variable to group the data by!
# A tibble: 3 × 2
fuel num_homes
<fct> <int>
1 gas 1197
2 electric 315
3 oil 216
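A sketch of a completed count, assuming dplyr: group by fuel, then call n() inside summarize() to count the rows in each group. The template itself is not shown here, and fuel_counts is an illustrative name.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# fuel_counts is an illustrative name
fuel_counts <- SaratogaHouses %>%
  group_by(fuel) %>%             # one group per fuel type
  summarize(num_homes = n())     # n() counts the rows in the current group

fuel_counts
```

dplyr also provides count(fuel) as a shorthand for the same group-and-count, although it names the count column n rather than num_homes by default.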
Question
Write your own code “from scratch” to calculate how many homes in the data set do and do not have a waterfront on their property.
# A tibble: 2 × 2
waterfront num_homes
<fct> <int>
1 Yes 15
2 No 1713
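The waterfront count uses the same group-and-count pattern, now grouping by the waterfront variable. A dplyr sketch; waterfront_counts is an illustrative name.

```r
library(dplyr)

SaratogaHouses <- mosaicData::SaratogaHouses

# waterfront_counts is an illustrative name
waterfront_counts <- SaratogaHouses %>%
  group_by(waterfront) %>%       # "Yes" or "No"
  summarize(num_homes = n())     # number of homes in each group

waterfront_counts
```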
Question
Insert a new code chunk into your Quarto document, and copy the code below into that code chunk. Press the green “play” button to run this code, and then explain in words what this code is doing.
This code reaches into the palmerpenguins package, makes a copy of the penguins data set, and stores this copy of these data as an object in the environment named penguins.
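The chunk this question refers to is not reproduced here. Based on the explanation above, it presumably amounts to a single assignment like the sketch below (an assumption), which requires the palmerpenguins package to be installed.

```r
# Reach into palmerpenguins with ::, copy its penguins data set,
# and store the copy in the environment under the name penguins
penguins <- palmerpenguins::penguins

dim(penguins)  # number of rows and columns in the copy
```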
Question
Use the is.na() function and the penguins data set to display only the penguins with a missing body mass.
# A tibble: 2 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen NA NA NA NA
2 Gentoo Biscoe NA NA NA NA
# ℹ 2 more variables: sex <fct>, year <int>
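filter() keeps the rows for which its condition is TRUE, and is.na(body_mass_g) is TRUE exactly when body mass is missing. A sketch assuming dplyr and palmerpenguins are installed; missing_mass is an illustrative name.

```r
library(dplyr)

penguins <- palmerpenguins::penguins

# Keep only the rows where body_mass_g is NA (missing);
# missing_mass is an illustrative name
missing_mass <- penguins %>%
  filter(is.na(body_mass_g))

missing_mass
```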
Question
Compute the average flipper length for Adelie, Gentoo and Chinstrap penguins, and the number of penguins of each species. Discard any observations with missing flipper lengths.
# A tibble: 3 × 3
species avg_flipper sample_size
<fct> <dbl> <int>
1 Adelie 190. 151
2 Chinstrap 196. 68
3 Gentoo 217. 123
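This final summary combines the earlier pieces: drop missing flipper lengths with filter() and is.na(), group by species, then compute both a mean and a count in one summarize() call. The sketch assumes dplyr and palmerpenguins; flipper_summary is an illustrative name.

```r
library(dplyr)

penguins <- palmerpenguins::penguins

# flipper_summary is an illustrative name
flipper_summary <- penguins %>%
  filter(!is.na(flipper_length_mm)) %>%     # discard missing flipper lengths
  group_by(species) %>%                     # one group per species
  summarize(
    avg_flipper = mean(flipper_length_mm),  # mean flipper length (mm)
    sample_size = n()                       # penguins remaining in each group
  )

flipper_summary
```

Filtering before grouping matters here: if you instead used mean(flipper_length_mm, na.rm = TRUE) without filtering, the average would be the same, but n() would still count the rows with a missing flipper length.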