Lab 7: Importing Data
Question
Use the library() function to load the tidyverse, janitor, and googlesheets4 packages.
Question
Create a new object in R called file_location and assign the URL to the pollster-ratings CSV to it. Confirm that this worked by typing file_location so that R displays the URL.
[1] "https://raw.githubusercontent.com/fivethirtyeight/data/master/pollster-ratings/pollster-ratings-combined.csv"
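The first two steps might look like the sketch below. It assumes the three packages are already installed, and it uses the URL shown in this lab's console output.

```r
# Load the packages used throughout this lab (assumes they are installed).
library(tidyverse)
library(janitor)
library(googlesheets4)

# Store the CSV's URL as a character string.
file_location <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/pollster-ratings/pollster-ratings-combined.csv"

# Typing the object's name prints its value.
file_location
```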
Question
Use read_csv() to load the pollster-ratings data into R. Save the resulting object as pollster_raw.
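As a rough sketch, read_csv() can be pointed at a URL in the same way as a local path. This assumes an internet connection and that the file is still hosted at the address shown in this lab.

```r
library(readr)

file_location <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/pollster-ratings/pollster-ratings-combined.csv"

# read_csv() downloads the file and parses it into a tibble.
pollster_raw <- read_csv(file_location)
```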
Question
Use glimpse() to check the number of rows and columns of pollster_raw, and list the names of the variables.
Rows: 539
Columns: 13
$ pollster <chr> "The New York Times/Siena College"…
$ pollster_rating_id <dbl> 448, 3, 195, 391, 215, 183, 323, 7…
$ aapor_roper <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRU…
$ inactive <lgl> FALSE, FALSE, FALSE, FALSE, FALSE,…
$ numeric_grade <dbl> 3.0, 3.0, 3.0, 2.9, 2.9, 2.9, 2.9,…
$ rank <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,…
$ POLLSCORE <dbl> -1.5, -1.1, -1.0, -1.1, -0.9, -0.9…
$ wtd_avg_transparency <dbl> 8.7, 9.2, 10.0, 8.8, 9.9, 9.1, 8.6…
$ number_polls_pollster_total <dbl> 122, 97, 21, 624, 149, 211, 123, 1…
$ percent_partisan_work <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,…
$ error_ppm <dbl> -1.0, -1.0, -1.0, -0.5, -0.7, -0.7…
$ bias_ppm <dbl> -1.9, -1.2, -1.1, -1.7, -1.0, -1.0…
$ number_polls_pollster_time_weighted <dbl> 113.5, 25.1, 12.4, 292.1, 85.4, 94…
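To see how this kind of listing is produced without downloading anything, here is a small sketch on a toy tibble. The object toy_raw and its values are hypothetical stand-ins for pollster_raw.

```r
library(dplyr)

# Toy stand-in for pollster_raw (hypothetical values, two columns only).
toy_raw <- tibble(
  pollster  = c("Pollster A", "Pollster B", "Pollster C"),
  POLLSCORE = c(-1.5, -1.1, -1.0)
)

# Prints the row count, the column count, and each column's name, type, and first values.
glimpse(toy_raw)

# Returns just the variable names.
names(toy_raw)
```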
Question
Use clean_names() to sanitize the variable names in pollster_raw. Save the resulting object as pollster_clean.
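The effect of clean_names() is easiest to see on a small example. The data frame below is a hypothetical toy, not the lab data; its column names are deliberately inconsistent.

```r
library(janitor)

# Toy data frame with inconsistently styled names (hypothetical values).
toy_raw <- data.frame(
  pollster        = c("Pollster A", "Pollster B"),
  POLLSCORE       = c(-1.5, -1.1),
  `Numeric Grade` = c(3.0, 2.9),
  check.names     = FALSE  # keep the space in "Numeric Grade"
)

# clean_names() returns a copy of the data frame with snake_case names.
toy_clean <- clean_names(toy_raw)
names(toy_clean)
```

The same call on pollster_raw would be pollster_clean <- clean_names(pollster_raw).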
Question
Use glimpse() to display a listing of the names of the variables. What do you notice about how the variable names are stored?
Rows: 539
Columns: 13
$ pollster <chr> "The New York Times/Siena College"…
$ pollster_rating_id <dbl> 448, 3, 195, 391, 215, 183, 323, 7…
$ aapor_roper <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRU…
$ inactive <lgl> FALSE, FALSE, FALSE, FALSE, FALSE,…
$ numeric_grade <dbl> 3.0, 3.0, 3.0, 2.9, 2.9, 2.9, 2.9,…
$ rank <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,…
$ pollscore <dbl> -1.5, -1.1, -1.0, -1.1, -0.9, -0.9…
$ wtd_avg_transparency <dbl> 8.7, 9.2, 10.0, 8.8, 9.9, 9.1, 8.6…
$ number_polls_pollster_total <dbl> 122, 97, 21, 624, 149, 211, 123, 1…
$ percent_partisan_work <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,…
$ error_ppm <dbl> -1.0, -1.0, -1.0, -0.5, -0.7, -0.7…
$ bias_ppm <dbl> -1.9, -1.2, -1.1, -1.7, -1.0, -1.0…
$ number_polls_pollster_time_weighted <dbl> 113.5, 25.1, 12.4, 292.1, 85.4, 94…
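One way to check the two listings against each other is to compare the name vectors directly. The sketch below uses two names copied from the glimpse() output in this lab. The vector names raw_names and cleaned_names are made up for illustration.

```r
# Two of the names from each glimpse() listing in this lab.
raw_names     <- c("pollster", "POLLSCORE")
cleaned_names <- c("pollster", "pollscore")

# Keep only the raw names that differ from their cleaned counterparts.
changed <- raw_names[raw_names != cleaned_names]
changed
```

With the full objects, setdiff(names(pollster_raw), names(pollster_clean)) gives the same kind of comparison.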
Question
Import the raw avengers dataset and then clean up the column names. Call the imported data av_data.
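Importing and cleaning can be chained into one pipeline. This part of the lab does not state where the file lives, so the sketch below reads literal CSV text instead. The rows are hypothetical, and the headers are assumed to resemble those of the raw file.

```r
library(readr)
library(janitor)

# Toy stand-in for the raw file; I() tells read_csv() the text is literal data.
toy_csv <- I("Name/Alias,Appearances,Current?\nHero A,1269,YES\nHero B,1165,YES\n")

# Import and clean in one pipeline (the native pipe |> needs R 4.1 or later).
av_toy <- read_csv(toy_csv, show_col_types = FALSE) |>
  clean_names()

names(av_toy)
```

With the real dataset, the path or URL of the avengers file would take the place of toy_csv.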
Question
How many rows and columns are in the data?
Rows: 173
Columns: 21
$ url <chr> "http://marvel.wikia.com/Henry_Pym_(Earth-…
$ name_alias <chr> "Henry Jonathan \"Hank\" Pym", "Janet van …
$ appearances <dbl> 1269, 1165, 3068, 2089, 2402, 612, 3458, 1…
$ current <chr> "YES", "YES", "YES", "YES", "YES", "YES", …
$ gender <chr> "MALE", "FEMALE", "MALE", "MALE", "MALE", …
$ probationary_introl <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ full_reserve_avengers_intro <chr> "Sep-63", "Sep-63", "Sep-63", "Sep-63", "S…
$ year <dbl> 1963, 1963, 1963, 1963, 1963, 1963, 1964, …
$ years_since_joining <dbl> 52, 52, 52, 52, 52, 52, 51, 50, 50, 50, 50…
$ honorary <chr> "Full", "Full", "Full", "Full", "Full", "H…
$ death1 <chr> "YES", "YES", "YES", "YES", "YES", "NO", "…
$ return1 <chr> "NO", "YES", "YES", "YES", "YES", NA, "YES…
$ death2 <chr> NA, NA, NA, NA, "YES", NA, NA, "YES", NA, …
$ return2 <chr> NA, NA, NA, NA, "NO", NA, NA, "YES", NA, N…
$ death3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ death4 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return4 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ death5 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return5 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ notes <chr> "Merged with Ultron in Rage of Ultron Vol.…
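Besides glimpse(), base R can report the dimensions directly. A minimal sketch on a hypothetical toy data frame:

```r
# Toy data frame standing in for av_data (hypothetical values).
av_toy <- data.frame(
  name_alias  = c("Hero A", "Hero B", "Hero C"),
  appearances = c(1269, 1165, 3068)
)

dim(av_toy)   # rows first, then columns
nrow(av_toy)  # number of rows only
ncol(av_toy)  # number of columns only
```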
Question
Navigate to the Google Sheet and save the URL as gapminder_loc. Confirm that this worked by typing gapminder_loc to display the URL.
[1] "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077"
Question
Earlier in this part of the lab, we stored the location of the Google Sheet as the object gapminder_loc. Use this variable with read_sheet() to read in the first page of this Google Sheet, clean the variable names using clean_names(), and store the resulting data frame as the object gm_data.
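A sketch of this step follows. It assumes an internet connection and that the sheet is shared publicly, in which case gs4_deauth() lets googlesheets4 skip the Google login. If the sheet is private, you would authenticate instead.

```r
library(googlesheets4)
library(janitor)

gapminder_loc <- "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077"

# Assumption: the sheet is publicly readable, so no credentials are needed.
gs4_deauth()

# sheet = 1 selects the first worksheet by position.
gm_data <- read_sheet(gapminder_loc, sheet = 1) |>
  clean_names()
```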
Question
Above, we discussed the difference between read_sheet() and read_csv(). Copy your code from the previous answer into a new code chunk below. Then, replace read_sheet() with read_csv() and gm_data with gm_data2, and run the code (note this command may take a few moments to run).
What happens when you use read_csv() instead of read_sheet() on a Google Sheet? Hint: Consider the number of rows and columns. Is this what you expect?
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
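One way to investigate a warning like this is to inspect the dimensions and call problems() on the object that read_csv() returned. The sketch below works offline by feeding read_csv() some hypothetical HTML-like text. It assumes that a Google Sheets edit link returns a web page rather than CSV text.

```r
library(readr)

# Hypothetical HTML-like text standing in for what a web page link returns.
page_text <- I("<!DOCTYPE html>\n<html><head><title>x, y</title></head>\n<body>a, b, c</body></html>\n")

# read_csv() still returns a tibble, but its shape reflects the markup, not a table.
not_a_table <- read_csv(page_text, show_col_types = FALSE)

dim(not_a_table)       # compare with the dimensions you expected
problems(not_a_table)  # lists the rows that did not parse cleanly
```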
Question
Download and read in the cirrhosis dataset from your local machine using the read_csv() function. Then clean the column names.
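Reading a local file works the same way as reading from a URL, with a file path in place of the address. The sketch below writes a tiny stand-in CSV to a temporary file so that it runs anywhere. The column names and the path "data/cirrhosis.csv" are hypothetical, not taken from the real dataset.

```r
library(readr)
library(janitor)

# Stand-in for the downloaded file: write a tiny CSV to a temporary location.
# With the real download, use its path instead, e.g. "data/cirrhosis.csv" (hypothetical).
cirrhosis_path <- tempfile(fileext = ".csv")
writeLines(c("Patient ID,N_Days", "1,400", "2,4500"), cirrhosis_path)

# Read the local file, then clean the column names.
cirrhosis_toy <- read_csv(cirrhosis_path, show_col_types = FALSE) |>
  clean_names()

names(cirrhosis_toy)
```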