Lab 6: Polishing figures

Question

Use the data wrangling skills you learned in Lab 4 to restrict the data set to:

  1. only those majors whose major_category is either Computers & Mathematics or Engineering, and
  2. only those majors with at least 5,000 total graduates.

To avoid overwriting the original college_recent_grads data, assign the resulting data frame to a new object called college_majors.

Hint: the filter() function is your friend!
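
As a reminder of how filter() combines conditions, here is a minimal sketch on a made-up data frame. The pets data, its columns, and the threshold are hypothetical and are not part of the lab data.

```r
library(dplyr)

# Toy data with hypothetical values (not the lab data)
pets <- data.frame(
  species = c("cat", "dog", "fish", "dog", "bird"),
  weight  = c(4, 30, 0.1, 8, 0.3)
)

# Conditions separated by commas are combined with a logical AND:
# keep rows whose species is in the given set and whose weight is at least 5.
big_pets <- pets %>%
  filter(species %in% c("cat", "dog"), weight >= 5)

big_pets
```

Assigning the result to a new name (big_pets) leaves the original pets data frame unchanged.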

Check: When you’re done, your new data frame should have 22 rows.

# A tibble: 22 × 21
    rank major_code major          major_category  total sample_size   men women
   <int>      <int> <chr>          <chr>           <int>       <int> <int> <int>
 1     5       2405 Chemical Engi… Engineering     32260         289 21239 11021
 2     9       2414 Mechanical En… Engineering     91227        1029 80320 10907
 3    10       2408 Electrical En… Engineering     81527         631 65511 16016
 4    11       2407 Computer Engi… Engineering     41542         399 33258  8284
 5    12       2401 Aerospace Eng… Engineering     15058         147 12953  2105
 6    13       2404 Biomedical En… Engineering     14955          79  8407  6548
 7    16       2402 Biological En… Engineering      8925          55  6062  2863
 8    17       2412 Industrial An… Engineering     18968         183 12453  6515
 9    18       2400 General Engin… Engineering     61152         425 45683 15469
10    21       2102 Computer Scie… Computers & M… 128319        1196 99743 28576
# ℹ 12 more rows
# ℹ 13 more variables: sharewomen <dbl>, employed <int>,
#   employed_fulltime <int>, employed_parttime <int>,
#   employed_fulltime_yearround <int>, unemployed <int>,
#   unemployment_rate <dbl>, p25th <dbl>, median <dbl>, p75th <dbl>,
#   college_jobs <int>, non_college_jobs <int>, low_wage_jobs <int>

Question

Insert a new code chunk, and copy the code template below into this code chunk. Then, fill in the three blanks to create a bar plot showing the total number of recent graduates in each of these 22 majors. Importantly, you should:

  • represent the majors along the x-axis of the plot, and the number of graduates along the y-axis of the plot.
  • draw a column to represent the number of graduates in each major using the geom_col() function.

Don’t worry about colors, labels, or anything else just yet.

The code for Exercise 2 won’t print any results. To check your work for Exercise 2, make sure that your results for Exercise 3 match the Exercise 3 solutions below. If they don’t, revisit your code in Exercise 2.
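
For orientation, the general shape of a geom_col() bar plot looks roughly like the sketch below. It uses a made-up sales data frame with hypothetical columns, so it is an illustration of the pattern rather than the lab's template.

```r
library(ggplot2)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  units = c(120, 300, 45)
)

# Categories go on the x-axis and the bar heights on the y-axis.
# Assigning the plot to an object builds it without displaying it.
fruit_plot <- ggplot(data = sales, mapping = aes(x = fruit, y = units)) +
  geom_col()
```

Because the plot is stored in fruit_plot, nothing appears until the object is printed.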

Question

Display the bar plot you just created in Exercise 2 by inserting a new code chunk, and calling the print() function on your ggplot object.

Question

Insert a new code chunk, and copy the code template below into this code chunk. Modify your bar plot by filling in the blank in the code template below. Add a layer-specific aesthetic mapping that will fill in each column in the bar plot with a different color, based on the proportion of graduates in each major that are women.
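
A layer-specific aesthetic is one passed to a geom rather than to ggplot(). The sketch below shows the idea on made-up data; the organic_share column is hypothetical.

```r
library(ggplot2)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  units = c(120, 300, 45),
  organic_share = c(0.10, 0.55, 0.80)
)

# x and y are mapped for the whole plot;
# fill is mapped only inside the geom_col() layer.
fruit_plot <- ggplot(sales, aes(x = fruit, y = units)) +
  geom_col(mapping = aes(fill = organic_share))
```

Since organic_share is numeric, ggplot2 uses a continuous color gradient for the fill.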

Question

Generalize the example above to modify the x aesthetic so that the bars for each major appear in order according to the total number of recent graduates.

Make sure to use the <- assignment operator to modify your ggplot object permanently!
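
One possible way to reorder bars is base R's reorder() function; whether the lab's own example uses reorder() or another helper is an assumption here. A minimal sketch on made-up data:

```r
library(ggplot2)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  units = c(120, 300, 45)
)

fruit_plot <- ggplot(sales, aes(x = fruit, y = units)) +
  geom_col()

# Adding aes() to an existing plot replaces the matching plot-level mapping.
# reorder(fruit, units) orders the fruit levels by their units values.
# The <- assignment saves the change back into the plot object.
fruit_plot <- fruit_plot + aes(x = reorder(fruit, units))
```

Without the assignment, the reordered plot would be displayed once but the stored object would keep its original mapping.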

Tip

If you make a mistake, and your plot isn’t coming out right, re-run all the code chunks prior to this exercise to start again with a clean slate.

Question

Insert a new code chunk, and copy the code template below into this code chunk. Fill in the blanks in the code template below to add a title, subtitle, and caption to your plot.

Make sure that the text for your plot’s title, subtitle and caption is quoted!

The text of your title, subtitle, and caption does not need to match the solutions exactly, but your plot must include all three components.
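
The labs() pattern is sketched below on made-up data; the label text is purely illustrative.

```r
library(ggplot2)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  units = c(120, 300, 45)
)

# Each piece of label text is a quoted character string.
fruit_plot <- ggplot(sales, aes(x = fruit, y = units)) +
  geom_col() +
  labs(
    title    = "Fruit sold last week",
    subtitle = "Three most popular fruits",
    caption  = "Source: made-up example data"
  )
```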

Question

Update your bar plot to use the viridis color palette by adding the scale_fill_viridis_c() function (just like you added the labs() function to the plot).

Use the name argument of the scale_fill_viridis_c() function to change the name of the color legend to something more informative (e.g., “Percent Women” or “% Women”).

Make sure to use the <- assignment operator to modify your ggplot object permanently!
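
A scale is added to a plot with + in the same way as labs(). A minimal sketch on made-up data, where the organic_share column and the legend title are hypothetical:

```r
library(ggplot2)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  units = c(120, 300, 45),
  organic_share = c(0.10, 0.55, 0.80)
)

fruit_plot <- ggplot(sales, aes(x = fruit, y = units)) +
  geom_col(aes(fill = organic_share))

# Swap the default gradient for the viridis palette and rename the legend.
fruit_plot <- fruit_plot +
  scale_fill_viridis_c(name = "Share organic")
```

The _c suffix is the continuous version of the viridis scale, which matches a numeric fill variable.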

Question

Remove your plot’s x-axis label by adding the scale_x_discrete() function to your plot. Set the name argument to NULL to suppress the axis label.

Question

Force R to use standard (non-scientific) notation for the numbers along the y-axis of the plot by adding the scale_y_continuous() function to your plot. Use the labels argument within the scale_y_continuous() function to format the numbers with the scales::label_comma() function.

Also, use the name argument within the scale_y_continuous() function to improve the title for the y-axis, just like you did with the title for the color scale in Question 7.
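
To see what label_comma() does, note that it returns a function that formats numbers as text. The sketch below applies it directly and then inside a y-axis scale, using made-up data and a hypothetical axis title.

```r
library(ggplot2)

# label_comma() builds a formatting function; calling that function
# on numbers returns character labels with thousands separators.
comma <- scales::label_comma()
comma(1250000)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  units = c(120000, 3000000, 45000)
)

fruit_plot <- ggplot(sales, aes(x = fruit, y = units)) +
  geom_col() +
  scale_y_continuous(name = "Units sold", labels = scales::label_comma())
```

Note that the labels argument receives the result of calling label_comma(), which is itself a function.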

Question

Update your plot’s x-axis by using the scale_x_discrete() function once again. This time, also pass the scales::label_wrap() function to the labels argument in scale_x_discrete(), and make sure to wrap the major field names to 25 characters.
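
The name and labels arguments can be set in the same scale_x_discrete() call. A minimal sketch on made-up data, with a wrap width of 10 chosen only for illustration:

```r
library(ggplot2)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  fruit = c("granny smith apple", "cavendish banana", "morello cherry"),
  units = c(120, 300, 45)
)

# name = NULL drops the axis title;
# label_wrap(10) breaks long category labels onto multiple lines.
fruit_plot <- ggplot(sales, aes(x = fruit, y = units)) +
  geom_col() +
  scale_x_discrete(name = NULL, labels = scales::label_wrap(10))
```

A plot can hold only one scale per aesthetic, so adding a second scale_x_discrete() replaces the first; any settings you want to keep need to be repeated in the new call.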

Question

Use the coord_flip() function to flip the coordinate axes.
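
Flipping the axes is a single added layer, as in this sketch on made-up data:

```r
library(ggplot2)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  units = c(120, 300, 45)
)

# The mappings stay the same (fruit on x, units on y);
# coord_flip() only swaps how the two axes are drawn.
fruit_plot <- ggplot(sales, aes(x = fruit, y = units)) +
  geom_col() +
  coord_flip()
```

After the flip, the x scale is drawn vertically, so functions such as scale_x_discrete() still control the category axis.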

Question

Use the facet_wrap() function to create facets for each major_category. Be sure to set the scales and ncol arguments as specified above.
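
The general shape of a facet_wrap() call is sketched below on made-up data. The kind column and the values scales = "free_y" and ncol = 1 are illustrative assumptions, not the settings the lab asks for.

```r
library(ggplot2)

# Toy data with hypothetical values (not the lab data)
sales <- data.frame(
  item  = c("apple", "banana", "carrot", "leek"),
  kind  = c("fruit", "fruit", "vegetable", "vegetable"),
  units = c(120, 300, 80, 45)
)

# vars(kind) makes one panel per value of kind.
# scales controls whether panels share axes; ncol sets the number of columns.
item_plot <- ggplot(sales, aes(x = item, y = units)) +
  geom_col() +
  facet_wrap(vars(kind), scales = "free_y", ncol = 1)
```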