Lab 7: Importing Data

Question

Use the library() function to load the tidyverse, janitor, and googlesheets4 packages.
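As a minimal sketch, and assuming all three packages are already installed (for example with install.packages()), the loading step might look like this:

```r
# Load the packages used in this lab; each must already be installed.
library(tidyverse)      # readr, dplyr, tibble, and related packages
library(janitor)        # provides clean_names()
library(googlesheets4)  # provides read_sheet()
```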

Question

Create a new object in R called file_location and assign it the URL of the pollster-ratings CSV.

Confirm that this worked by typing file_location so that R displays the URL.

[1] "https://raw.githubusercontent.com/fivethirtyeight/data/master/pollster-ratings/pollster-ratings-combined.csv"
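
One way to do this, sketched with the URL shown in the output above:

```r
# Store the URL as a character string.
file_location <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/pollster-ratings/pollster-ratings-combined.csv"

# Typing the object's name on its own line prints its value.
file_location
```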

Question

Use read_csv() to load the pollster-ratings data into R. Save the resulting object as pollster_raw.
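
The sketch below shows the general shape of a read_csv() call. To keep it runnable offline, it reads a made-up two-row stand-in (toy_csv, a hypothetical name) rather than the real file; in the lab you would pass file_location instead. It assumes readr 2.0 or later, which accepts literal data wrapped in I().

```r
library(readr)

# Made-up two-row stand-in for the real file, so the example runs offline.
toy_csv <- I("pollster,POLLSCORE\nPollster A,-1.5\nPollster B,-1.1\n")

# read_csv() accepts a local path, a URL, or literal data wrapped in I().
# In the lab you would pass file_location instead of toy_csv.
toy_raw <- read_csv(toy_csv, show_col_types = FALSE)
toy_raw
```

Note that read_csv() keeps the column names exactly as they appear in the file, including capitalization.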

Question

Use glimpse() to check the number of rows and columns of pollster_raw, and list the names of the variables.

Rows: 539
Columns: 13
$ pollster                            <chr> "The New York Times/Siena College"…
$ pollster_rating_id                  <dbl> 448, 3, 195, 391, 215, 183, 323, 7…
$ aapor_roper                         <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRU…
$ inactive                            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE,…
$ numeric_grade                       <dbl> 3.0, 3.0, 3.0, 2.9, 2.9, 2.9, 2.9,…
$ rank                                <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,…
$ POLLSCORE                           <dbl> -1.5, -1.1, -1.0, -1.1, -0.9, -0.9…
$ wtd_avg_transparency                <dbl> 8.7, 9.2, 10.0, 8.8, 9.9, 9.1, 8.6…
$ number_polls_pollster_total         <dbl> 122, 97, 21, 624, 149, 211, 123, 1…
$ percent_partisan_work               <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,…
$ error_ppm                           <dbl> -1.0, -1.0, -1.0, -0.5, -0.7, -0.7…
$ bias_ppm                            <dbl> -1.9, -1.2, -1.1, -1.7, -1.0, -1.0…
$ number_polls_pollster_time_weighted <dbl> 113.5, 25.1, 12.4, 292.1, 85.4, 94…
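
If you want the same information in separate pieces, dim() and names() complement glimpse(). The sketch below uses a made-up two-row tibble (toy_raw, a hypothetical stand-in for pollster_raw) so it runs without downloading anything.

```r
library(dplyr)

# Made-up stand-in for pollster_raw.
toy_raw <- tibble(
  pollster  = c("Pollster A", "Pollster B"),
  POLLSCORE = c(-1.5, -1.1)
)

glimpse(toy_raw)  # one line per column: name, type, first few values
dim(toy_raw)      # number of rows, then number of columns
names(toy_raw)    # column names only
```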

Question

Use clean_names() to sanitize the variable names in pollster_raw. Save the resulting object as pollster_clean.
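
To see what clean_names() does before applying it to the real data, here is a small sketch on a made-up data frame (toy_raw and its column names are hypothetical):

```r
library(janitor)

# Made-up data frame with one spaced name and one all-caps name.
toy_raw <- data.frame(
  "Pollster Name" = c("Pollster A", "Pollster B"),
  "POLLSCORE"     = c(-1.5, -1.1),
  check.names = FALSE  # keep the messy names exactly as typed
)

# clean_names() returns a copy of the data with snake_case column names.
toy_clean <- clean_names(toy_raw)
names(toy_clean)
```

Because clean_names() returns a new data frame rather than modifying its input, the result must be assigned to an object to be kept.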

Question

Use glimpse() to list the names of the variables in pollster_clean. What do you notice about how the variable names are stored?

Rows: 539
Columns: 13
$ pollster                            <chr> "The New York Times/Siena College"…
$ pollster_rating_id                  <dbl> 448, 3, 195, 391, 215, 183, 323, 7…
$ aapor_roper                         <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRU…
$ inactive                            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE,…
$ numeric_grade                       <dbl> 3.0, 3.0, 3.0, 2.9, 2.9, 2.9, 2.9,…
$ rank                                <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,…
$ pollscore                           <dbl> -1.5, -1.1, -1.0, -1.1, -0.9, -0.9…
$ wtd_avg_transparency                <dbl> 8.7, 9.2, 10.0, 8.8, 9.9, 9.1, 8.6…
$ number_polls_pollster_total         <dbl> 122, 97, 21, 624, 149, 211, 123, 1…
$ percent_partisan_work               <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,…
$ error_ppm                           <dbl> -1.0, -1.0, -1.0, -0.5, -0.7, -0.7…
$ bias_ppm                            <dbl> -1.9, -1.2, -1.1, -1.7, -1.0, -1.0…
$ number_polls_pollster_time_weighted <dbl> 113.5, 25.1, 12.4, 292.1, 85.4, 94…

Question

Import the raw avengers dataset and then clean up the column names. Call the imported data av_data.
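
Importing and cleaning can be chained into a single pipeline. The sketch below uses made-up literal data (toy_csv, with hypothetical headers and rows) in place of the real avengers file, and the native pipe |> assumes R 4.1 or later. In the lab you would replace toy_csv with the location of the avengers CSV and store the result as av_data.

```r
library(readr)
library(janitor)

# Made-up stand-in with messy headers; not the real avengers data.
toy_csv <- I("Name/Alias,Appearances\nHero One,10\nHero Two,20\n")

# Read the data, then pass the result straight to clean_names().
toy_av <- read_csv(toy_csv, show_col_types = FALSE) |>
  clean_names()

names(toy_av)
```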

Question

How many rows and columns are in the data?

Rows: 173
Columns: 21
$ url                         <chr> "http://marvel.wikia.com/Henry_Pym_(Earth-…
$ name_alias                  <chr> "Henry Jonathan \"Hank\" Pym", "Janet van …
$ appearances                 <dbl> 1269, 1165, 3068, 2089, 2402, 612, 3458, 1…
$ current                     <chr> "YES", "YES", "YES", "YES", "YES", "YES", …
$ gender                      <chr> "MALE", "FEMALE", "MALE", "MALE", "MALE", …
$ probationary_introl         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ full_reserve_avengers_intro <chr> "Sep-63", "Sep-63", "Sep-63", "Sep-63", "S…
$ year                        <dbl> 1963, 1963, 1963, 1963, 1963, 1963, 1964, …
$ years_since_joining         <dbl> 52, 52, 52, 52, 52, 52, 51, 50, 50, 50, 50…
$ honorary                    <chr> "Full", "Full", "Full", "Full", "Full", "H…
$ death1                      <chr> "YES", "YES", "YES", "YES", "YES", "NO", "…
$ return1                     <chr> "NO", "YES", "YES", "YES", "YES", NA, "YES…
$ death2                      <chr> NA, NA, NA, NA, "YES", NA, NA, "YES", NA, …
$ return2                     <chr> NA, NA, NA, NA, "NO", NA, NA, "YES", NA, N…
$ death3                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return3                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ death4                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return4                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ death5                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return5                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ notes                       <chr> "Merged with Ultron in Rage of Ultron Vol.…

Question

Navigate to the Google Sheet and save the URL as gapminder_loc. Confirm that this worked by typing gapminder_loc to display the URL.

[1] "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077"

Question

Earlier in this part of the lab, we stored the location of the Google Sheet as the object gapminder_loc. Pass this object to read_sheet() to read in the first sheet (tab) of the Google Sheet, clean the variable names using clean_names(), and store the resulting data frame as the object gm_data.
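
A possible shape for this step is sketched below as a small helper. The helper name read_first_sheet is hypothetical, and the call to gs4_deauth() assumes the sheet is publicly readable, so no Google login is requested; if the sheet is private, drop that line and authenticate instead. The final call is commented out because it needs a network connection.

```r
library(googlesheets4)
library(janitor)

# Hypothetical helper: read the first tab of a Google Sheet and clean its names.
read_first_sheet <- function(sheet_url) {
  gs4_deauth()  # assumption: the sheet is public, so skip authentication
  read_sheet(sheet_url, sheet = 1) |>  # sheet = 1 selects the first tab
    clean_names()
}

# In the lab (requires internet access):
# gm_data <- read_first_sheet(gapminder_loc)
```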

Question

Above, we discussed the difference between read_sheet() and read_csv(). Copy your code from the previous answer into a new code chunk below. Then replace read_sheet() with read_csv() and gm_data with gm_data2, and run the code (note that this command may take a few moments to run).

What happens when you use read_csv() instead of read_sheet() on a Google Sheet? Hint: Consider the number of rows and columns. Is this what you expect?

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
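
The warning above points to problems(). As a small offline sketch of how that function is used, the made-up file below contains one value that cannot be parsed as a number (the file contents and the object name bad are hypothetical):

```r
library(readr)

# Write a tiny made-up CSV with one non-numeric entry in column y.
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2", "3,oops"), tmp)

# Force both columns to be read as doubles so "oops" triggers a parsing issue.
bad <- read_csv(tmp, col_types = "dd")

# problems() returns one row per parsing issue, with its location and details.
problems(bad)
```

Running problems(gm_data2) in the same way should help you see what read_csv() actually received from the Google Sheet URL.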

Question

Download the cirrhosis dataset to your local machine and read it in using the read_csv() function. Then clean the column names.
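
Reading a local file works the same way as reading from a URL, except that you pass a file path. The sketch below wraps the two steps in a hypothetical helper, read_local_csv, and demonstrates it on a made-up file in a temporary folder. The file name cirrhosis.csv in the comment is an assumption; use whatever name and folder your download actually has.

```r
library(readr)
library(janitor)

# Hypothetical helper: read a local CSV and clean its column names.
read_local_csv <- function(path) {
  stopifnot(file.exists(path))  # fail early if the path is wrong
  clean_names(read_csv(path, show_col_types = FALSE))
}

# Offline demo with a made-up file. In the lab, pass the path to your download,
# e.g. read_local_csv("cirrhosis.csv") if the file is in your working directory.
demo_path <- file.path(tempdir(), "demo.csv")
writeLines(c("N_Days,Status", "400,D", "4500,C"), demo_path)

demo_data <- read_local_csv(demo_path)
names(demo_data)
```

If R reports that the file does not exist, check getwd() to see which folder relative paths are resolved against.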