# A tibble: 2,000 × 3
   married race  edu
   <fct>   <fct> <fct>
 1 no      white college
 2 no      white hs or lower
 3 no      white hs or lower
 4 no      white hs or lower
 5 no      white hs or lower
 6 yes     other hs or lower
 7 no      white hs or lower
 8 no      other hs or lower
 9 no      asian hs or lower
10 yes     white hs or lower
# ℹ 1,990 more rows
Lab 4: Introduction to Data Wrangling
Question
Use the code below to:
- Load the tidyverse package.
- Import the acs12 data from the openintro package, and assign it to an object called acs_df in your environment.
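A minimal sketch of these two setup steps, assuming the tidyverse and openintro packages are already installed. The as_tibble() call and the glimpse() check are additions for illustration, not part of the lab's own code.

```r
# Load the tidyverse (dplyr, tibble, and related packages)
library(tidyverse)
# Load the package that ships the acs12 data set
library(openintro)

# Store the data in an object called acs_df.
# as_tibble() is an assumption added so the data prints as a tibble.
acs_df <- as_tibble(acs12)

# Quick check of the column names and types
glimpse(acs_df)
```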
Question
Using the above code as an example, use the select() function to keep the variables married, race, and edu. Use the template code below as a guide to help you select the columns.
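One possible way to write this selection is sketched below. It assumes acs_df was created from acs12 as described earlier; demographics_df is a hypothetical name chosen only for this illustration.

```r
library(tidyverse)
library(openintro)

# Recreate acs_df so the example runs on its own
acs_df <- as_tibble(acs12)

# select() keeps the named columns and leaves every row in place.
# demographics_df is a hypothetical object name.
demographics_df <- acs_df %>%
  select(married, race, edu)

demographics_df
```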
Question
Select the following variables: citizen, time_to_work, and lang.
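As a sketch, the same selection can also be written without the pipe, since select() takes the data frame as its first argument. The name commute_lang_df is hypothetical.

```r
library(tidyverse)
library(openintro)

# Recreate acs_df so the example runs on its own
acs_df <- as_tibble(acs12)

# The data frame goes first, followed by the columns to keep.
# commute_lang_df is a hypothetical object name.
commute_lang_df <- select(acs_df, citizen, time_to_work, lang)

commute_lang_df
```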
# A tibble: 2,000 × 3
   citizen time_to_work lang
   <fct>          <int> <fct>
 1 yes               NA english
 2 yes               NA english
 3 yes               NA english
 4 yes               NA other
 5 yes               NA other
 6 yes               15 other
 7 yes               NA english
 8 yes               NA english
 9 yes               NA other
10 yes               40 english
# ℹ 1,990 more rows
Question
Use the template below to keep only observations where time_to_work is greater than or equal to 35 minutes and income is strictly greater than fifty thousand dollars ($50,000).
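A hedged sketch of this filter follows. Separating conditions with a comma inside filter() combines them with a logical AND, and rows where a condition evaluates to NA are dropped. The name long_commute_df is hypothetical.

```r
library(tidyverse)
library(openintro)

# Recreate acs_df so the example runs on its own
acs_df <- as_tibble(acs12)

# Keep rows that satisfy both conditions:
#   time_to_work >= 35  (greater than or equal to 35 minutes)
#   income > 50000      (strictly greater than $50,000)
# long_commute_df is a hypothetical object name.
long_commute_df <- acs_df %>%
  filter(time_to_work >= 35, income > 50000)

long_commute_df
```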
# A tibble: 68 × 13
   income employment hrs_work race    age gender citizen time_to_work lang
    <int> <fct>         <int> <fct> <int> <fct>  <fct>          <int> <fct>
 1  70000 employed         50 black    50 male   yes               40 english
 2  85000 employed         50 white    33 male   yes               65 english
 3 100000 employed         40 white    27 male   yes               45 english
 4  55000 employed         60 white    66 male   yes               50 english
 5  58000 employed         50 white    59 male   yes               45 english
 6  60000 employed         40 other    48 male   yes               45 english
 7  90000 employed         50 asian    30 male   yes               45 other
 8 100000 employed         40 white    40 male   yes              142 other
 9 360000 employed         50 white    52 male   yes               40 english
10  89000 employed         55 white    43 male   yes               40 english
# ℹ 58 more rows
# ℹ 4 more variables: married <fct>, edu <fct>, disability <fct>,
# birth_qrtr <fct>
Question
Use the mutate() command to generate a new variable in the subset_df data frame that measures age in months instead of years.
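The sketch below shows one way to add the new column. The code that creates subset_df does not appear in this section, so it is rebuilt here under an assumption drawn from the printed output: female respondents only, with the columns income, age, gender, hrs_work, and time_to_work.

```r
library(tidyverse)
library(openintro)

# Recreate acs_df so the example runs on its own
acs_df <- as_tibble(acs12)

# ASSUMPTION: subset_df is reconstructed from what the printed output suggests
# (female respondents, five columns); the lab may define it differently.
subset_df <- acs_df %>%
  filter(gender == "female") %>%
  select(income, age, gender, hrs_work, time_to_work)

# mutate() adds a column; 12 months per year converts years to months
subset_df <- subset_df %>%
  mutate(age_months = age * 12)

subset_df
```

Assigning the result back to subset_df keeps age_months available for later steps.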
# A tibble: 969 × 6
   income   age gender hrs_work time_to_work age_months
    <int> <int> <fct>     <int>        <int>      <dbl>
 1  60000    68 female       40           NA        816
 2     NA    12 female       NA           NA        144
 3      0    77 female       NA           NA        924
 4   1700    35 female       40           15        420
 5     NA     8 female       NA           NA         96
 6   8600    69 female       23            5        828
 7   4000    67 female        8           10        804
 8  19000    36 female       35           15        432
 9     NA    12 female       NA           NA        144
10   1200    18 female       12           NA        216
# ℹ 959 more rows
Question
Use the template below to arrange the subset_df data frame by the hrs_work variable.
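A minimal sketch of the sort, again assuming subset_df holds female respondents with the five columns seen in the output plus age_months. arrange() sorts in ascending order by default and places missing values last. The name by_hours_df is hypothetical.

```r
library(tidyverse)
library(openintro)

# Recreate acs_df so the example runs on its own
acs_df <- as_tibble(acs12)

# ASSUMPTION: subset_df is reconstructed from what the printed output suggests
subset_df <- acs_df %>%
  filter(gender == "female") %>%
  select(income, age, gender, hrs_work, time_to_work) %>%
  mutate(age_months = age * 12)

# Sort rows from fewest to most hours worked; NA values go to the end.
# by_hours_df is a hypothetical object name.
by_hours_df <- subset_df %>%
  arrange(hrs_work)

by_hours_df
```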
# A tibble: 969 × 6
   income   age gender hrs_work time_to_work age_months
    <int> <int> <fct>     <int>        <int>      <dbl>
 1   1800    18 female        4           NA        216
 2    180    65 female        4           NA        780
 3   1300    21 female        5            5        252
 4    680    19 female        5            5        228
 5     50    30 female        5           20        360
 6    850    19 female        6           10        228
 7   1300    78 female        6            2        936
 8      0    66 female        6           15        792
 9   1000    64 female        6           10        768
10   4000    67 female        8           10        804
# ℹ 959 more rows
Question
Generate code that will arrange the subset_df data frame by time_to_work in descending order.
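One way to sort in descending order is to wrap the column in desc(), as sketched here. subset_df is rebuilt under the same assumption as before (female respondents, five columns plus age_months), and by_commute_df is a hypothetical name.

```r
library(tidyverse)
library(openintro)

# Recreate acs_df so the example runs on its own
acs_df <- as_tibble(acs12)

# ASSUMPTION: subset_df is reconstructed from what the printed output suggests
subset_df <- acs_df %>%
  filter(gender == "female") %>%
  select(income, age, gender, hrs_work, time_to_work) %>%
  mutate(age_months = age * 12)

# desc() reverses the sort so the longest commutes come first.
# by_commute_df is a hypothetical object name.
by_commute_df <- subset_df %>%
  arrange(desc(time_to_work))

by_commute_df
```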
# A tibble: 969 × 6
   income   age gender hrs_work time_to_work age_months
    <int> <int> <fct>     <int>        <int>      <dbl>
 1  38000    51 female       35          163        612
 2  40000    53 female       37          157        636
 3  56000    42 female       50          145        504
 4  18000    32 female       40          128        384
 5  65000    38 female       40           90        456
 6  36000    41 female       36           90        492
 7  60000    54 female       40           90        648
 8  62000    48 female       40           80        576
 9      0    37 female       30           75        444
10 150000    62 female       40           75        744
# ℹ 959 more rows
Question
Using the template below, select the variables hrs_work, income, and married, and keep observations where income is less than $30,000.
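A sketch that chains the two steps with the pipe: select() runs first, and filter() then works on the reduced data. Rows with a missing income are dropped by filter(). The name low_income_df is hypothetical.

```r
library(tidyverse)
library(openintro)

# Recreate acs_df so the example runs on its own
acs_df <- as_tibble(acs12)

# Step 1: keep three columns. Step 2: keep rows with income below $30,000.
# low_income_df is a hypothetical object name.
low_income_df <- acs_df %>%
  select(hrs_work, income, married) %>%
  filter(income < 30000)

low_income_df
```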
# A tibble: 1,171 × 3
   hrs_work income married
      <int>  <int> <fct>
 1       NA      0 no
 2       NA      0 no
 3       NA      0 no
 4       40   1700 yes
 5       23   8600 no
 6       NA      0 yes
 7       NA      0 no
 8        8   4000 yes
 9       35  19000 yes
10       25   3400 no
# ℹ 1,161 more rows
Question
Use the pipe operator to generate a variable that measures income in units of $10,000, and then arrange the data by hrs_work. Then select these two variables.
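The three steps can be sketched as a single pipeline. The column name income_10000 is taken from the printed output; income_scaled_df is a hypothetical object name.

```r
library(tidyverse)
library(openintro)

# Recreate acs_df so the example runs on its own
acs_df <- as_tibble(acs12)

# income_scaled_df is a hypothetical object name.
income_scaled_df <- acs_df %>%
  # Dividing by 10,000 expresses income in units of $10,000
  mutate(income_10000 = income / 10000) %>%
  # Sort by hours worked (ascending, NA values last)
  arrange(hrs_work) %>%
  # Keep only the sort column and the new column
  select(hrs_work, income_10000)

income_scaled_df
```

Each %>% passes the result of one step to the next, so the pipeline reads top to bottom in the order the question lists the steps.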
# A tibble: 2,000 × 2
   hrs_work income_10000
      <int>        <dbl>
 1        1        0.03
 2        1        0.075
 3        2        0.011
 4        2        0.03
 5        4        0.01
 6        4        0
 7        4        0.18
 8        4        0.018
 9        5        0.12
10        5        0.13
# ℹ 1,990 more rows