- Bitterness
Lab 7: Importing Data
Question
Use the library()
function to load the tidyverse
, janitor
, and googlesheets4
packages.
Question
Spend a minute reviewing the README file in the coffee quality rating GitHub repository. Which of the following is not an aspect the coffees were rated on?
- Body
- Cup Cleanliness
- Bitterness
- Moisture
Question
Insert a new code chunk in your Quarto document, and copy the code template below into this code chunk.
Then, copy the “raw” URL for the robusta_ratings_raw.csv from GitHub, and paste this URL inside the call to read_csv()
. Don’t forget to put quotation marks around the URL!
Use the <-
assignment operator to store the output from the read_csv()
function in an object named robusta_raw
.
Finally, confirm that the data was imported correctly by printing the robusta_raw
object.
# A tibble: 28 × 51
...1 quality_score view_certificate_1 view_certificate_2
<dbl> <dbl> <lgl> <lgl>
1 0 83.8 NA NA
2 0 83.5 NA NA
3 0 83.2 NA NA
4 0 83 NA NA
5 0 83 NA NA
6 0 82.8 NA NA
7 0 82.6 NA NA
8 0 82.5 NA NA
9 0 82.5 NA NA
10 0 82.4 NA NA
# ℹ 18 more rows
# ℹ 47 more variables: `Cupping Protocol and Descriptors` <lgl>,
# `View Green Analysis Details` <lgl>, `Request a Sample` <lgl>,
# Species <chr>, Owner...9 <chr>, `Country of Origin` <chr>,
# `Farm Name` <chr>, `Lot Number` <chr>, Mill <chr>, `ICO Number` <chr>,
# Company <chr>, Altitude <chr>, Region <chr>, Producer <chr>,
# `Number of Bags` <dbl>, `Bag Weight` <chr>, `In-Country Partner` <chr>, …
Question
Use glimpse()
to get a quick overview of the robusta_raw
data. Then, use the output from glimpse()
to answer the following questions.
- How many variables are measured in these data?
- How many variables have names with
...
in them? - How many variables have names with spaces in them?
Rows: 28
Columns: 51
$ ...1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ quality_score <dbl> 83.75, 83.50, 83.25, 83.00, 83.00, …
$ view_certificate_1 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ view_certificate_2 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ `Cupping Protocol and Descriptors` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ `View Green Analysis Details` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ `Request a Sample` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Species <chr> "Robusta", "Robusta", "Robusta", "R…
$ Owner...9 <chr> "Ankole coffee producers coop", "Ni…
$ `Country of Origin` <chr> "Uganda", "India", "India", "Uganda…
$ `Farm Name` <chr> "Kyangundu cooperative society", "S…
$ `Lot Number` <chr> NA, "25", NA, NA, NA, NA, NA, "7", …
$ Mill <chr> "Ankole coffee producers", "Sethura…
$ `ICO Number` <chr> "0", "14/1148/2017/21", "0000", "0"…
$ Company <chr> "Ankole Coffee Producers Coop", "Ka…
$ Altitude <chr> "1488", "3170", "1000m", "1212", "1…
$ Region <chr> "Sheema South Western", "Chikmagalu…
$ Producer <chr> "Ankole coffee producers coop", "Ni…
$ `Number of Bags` <dbl> 300, 320, 300, 320, 1, 200, 320, 32…
$ `Bag Weight` <chr> "60 kg", "60 kg", "2 kg", "60 kg", …
$ `In-Country Partner` <chr> "Uganda Coffee Development Authorit…
$ `Harvest Year` <dbl> 2013, 2017, 2015, 2013, 2013, 2012,…
$ `Grading Date` <chr> "June 26th, 2014", "October 31st, 2…
$ Owner...24 <chr> "Ankole coffee producers coop", "Ni…
$ Variety <chr> NA, NA, NA, NA, NA, NA, NA, NA, "Ot…
$ Status <chr> "Completed", "Completed", "Complete…
$ `Processing Method` <chr> NA, "Washed / Wet", NA, NA, NA, NA,…
$ NA...28 <chr> "Sample", "Sample", "Sample", "Samp…
$ `Fragrance / Aroma` <dbl> 7.83, 8.00, 7.92, 8.00, 8.33, 8.00,…
$ Flavor <dbl> 8.08, 7.75, 7.83, 7.92, 7.83, 7.92,…
$ Aftertaste <dbl> 7.75, 7.92, 7.92, 7.92, 7.83, 7.67,…
$ `Salt / Acid` <dbl> 7.92, 8.00, 8.00, 7.75, 7.75, 8.00,…
$ `Bitter / Sweet` <dbl> 8.00, 8.00, 7.83, 7.75, 7.58, 7.75,…
$ Mouthfeel <dbl> 8.25, 7.92, 7.83, 7.83, 8.25, 7.75,…
$ NA...35 <chr> "Sample", "Sample", "Sample", "Samp…
$ `Uniform Cup` <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, …
$ `Clean Cup` <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, …
$ Balance <dbl> 7.92, 7.92, 7.92, 7.75, 7.75, 7.92,…
$ `Cupper Points` <dbl> 8.00, 8.00, 8.00, 8.08, 7.67, 7.75,…
$ `Total Cup Points` <chr> "Sample 83.75", "Sample 83.50", "…
$ NA...41 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Moisture <chr> "12 %", "0 %", "0 %", "12 %", "12 %…
$ `Category One Defects` <chr> "0 full defects", "0 full defects",…
$ Quakers <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Color <chr> "Green", NA, "Green", "Green", "Gre…
$ `Category Two Defects` <chr> "2 full defects", "2 full defects",…
$ NA...47 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Expiration <chr> "June 26th, 2015", "October 31st, 2…
$ `Certification Body` <chr> "Uganda Coffee Development Authorit…
$ `Certification Address` <chr> "Coffee House, Plot 35, Jinja Road,…
$ `Certification Contact` <chr> "Clare Rwakatogoro - 256-41-256940/…
- 51
- 7
- 25
Question
Insert a new code chunk in your Quarto document, and copy the code template below into this code chunk.
Fill in the blank inside the clean_names()
function with the robusta_raw
data frame object. Then, use the <-
assignment operator to store the output from the clean_names()
function in a new data frame object named robusta_cleaned
.
Finally, run the glimpse()
function to inspect the newly created robusta_cleaned
data frame. How have the variable names changed after using the clean_names()
function?
Rows: 28
Columns: 51
$ x1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ quality_score <dbl> 83.75, 83.50, 83.25, 83.00, 83.00, 82…
$ view_certificate_1 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ view_certificate_2 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cupping_protocol_and_descriptors <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ view_green_analysis_details <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ request_a_sample <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ species <chr> "Robusta", "Robusta", "Robusta", "Rob…
$ owner_9 <chr> "Ankole coffee producers coop", "Nish…
$ country_of_origin <chr> "Uganda", "India", "India", "Uganda",…
$ farm_name <chr> "Kyangundu cooperative society", "Set…
$ lot_number <chr> NA, "25", NA, NA, NA, NA, NA, "7", "R…
$ mill <chr> "Ankole coffee producers", "Sethurama…
$ ico_number <chr> "0", "14/1148/2017/21", "0000", "0", …
$ company <chr> "Ankole Coffee Producers Coop", "Kaap…
$ altitude <chr> "1488", "3170", "1000m", "1212", "120…
$ region <chr> "Sheema South Western", "Chikmagalur …
$ producer <chr> "Ankole coffee producers coop", "Nish…
$ number_of_bags <dbl> 300, 320, 300, 320, 1, 200, 320, 320,…
$ bag_weight <chr> "60 kg", "60 kg", "2 kg", "60 kg", "6…
$ in_country_partner <chr> "Uganda Coffee Development Authority"…
$ harvest_year <dbl> 2013, 2017, 2015, 2013, 2013, 2012, 2…
$ grading_date <chr> "June 26th, 2014", "October 31st, 201…
$ owner_24 <chr> "Ankole coffee producers coop", "Nish…
$ variety <chr> NA, NA, NA, NA, NA, NA, NA, NA, "Othe…
$ status <chr> "Completed", "Completed", "Completed"…
$ processing_method <chr> NA, "Washed / Wet", NA, NA, NA, NA, N…
$ na_28 <chr> "Sample", "Sample", "Sample", "Sample…
$ fragrance_aroma <dbl> 7.83, 8.00, 7.92, 8.00, 8.33, 8.00, 7…
$ flavor <dbl> 8.08, 7.75, 7.83, 7.92, 7.83, 7.92, 7…
$ aftertaste <dbl> 7.75, 7.92, 7.92, 7.92, 7.83, 7.67, 7…
$ salt_acid <dbl> 7.92, 8.00, 8.00, 7.75, 7.75, 8.00, 7…
$ bitter_sweet <dbl> 8.00, 8.00, 7.83, 7.75, 7.58, 7.75, 8…
$ mouthfeel <dbl> 8.25, 7.92, 7.83, 7.83, 8.25, 7.75, 7…
$ na_35 <chr> "Sample", "Sample", "Sample", "Sample…
$ uniform_cup <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10…
$ clean_cup <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10…
$ balance <dbl> 7.92, 7.92, 7.92, 7.75, 7.75, 7.92, 7…
$ cupper_points <dbl> 8.00, 8.00, 8.00, 8.08, 7.67, 7.75, 7…
$ total_cup_points <chr> "Sample 83.75", "Sample 83.50", "Sa…
$ na_41 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ moisture <chr> "12 %", "0 %", "0 %", "12 %", "12 %",…
$ category_one_defects <chr> "0 full defects", "0 full defects", "…
$ quakers <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ color <chr> "Green", NA, "Green", "Green", "Green…
$ category_two_defects <chr> "2 full defects", "2 full defects", "…
$ na_47 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ expiration <chr> "June 26th, 2015", "October 31st, 201…
$ certification_body <chr> "Uganda Coffee Development Authority"…
$ certification_address <chr> "Coffee House, Plot 35, Jinja Road, P…
$ certification_contact <chr> "Clare Rwakatogoro - 256-41-256940/23…
All the letters have been convered to lower case, all the spaces have replaced with underscores, and the leading ...
have been turned into x’s!
Question
Investigate the online repository for the avengers data from FiveThirtyEight, which is all about Marvel comic book characters.
Then, import the avengers.csv file, and then clean up the variable names. Finally, use the glimpse function to answer the following question: how many variable’s in this data set contain numeric data?
Rows: 173
Columns: 21
$ url <chr> "http://marvel.wikia.com/Henry_Pym_(Earth-…
$ name_alias <chr> "Henry Jonathan \"Hank\" Pym", "Janet van …
$ appearances <dbl> 1269, 1165, 3068, 2089, 2402, 612, 3458, 1…
$ current <chr> "YES", "YES", "YES", "YES", "YES", "YES", …
$ gender <chr> "MALE", "FEMALE", "MALE", "MALE", "MALE", …
$ probationary_introl <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ full_reserve_avengers_intro <chr> "Sep-63", "Sep-63", "Sep-63", "Sep-63", "S…
$ year <dbl> 1963, 1963, 1963, 1963, 1963, 1963, 1964, …
$ years_since_joining <dbl> 52, 52, 52, 52, 52, 52, 51, 50, 50, 50, 50…
$ honorary <chr> "Full", "Full", "Full", "Full", "Full", "H…
$ death1 <chr> "YES", "YES", "YES", "YES", "YES", "NO", "…
$ return1 <chr> "NO", "YES", "YES", "YES", "YES", NA, "YES…
$ death2 <chr> NA, NA, NA, NA, "YES", NA, NA, "YES", NA, …
$ return2 <chr> NA, NA, NA, NA, "NO", NA, NA, "YES", NA, N…
$ death3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ death4 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return4 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ death5 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ return5 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ notes <chr> "Merged with Ultron in Rage of Ultron Vol.…
3 variables contain numeric data (appearances
, year
, and years_since_joining
).
Question
Insert a new code chunk into your Quarto document, and copy the code template below into this chunk.
Then, navigate to the Gapminder Google Sheet, and copy the URL from the address bar. Paste this the URL inside the call to the read_sheet()
function, with quotation marks around the URL. Next, use the assignment operator to store the output from the read_sheet()
function in an object named gapminder
. Finally, confirm that the data was imported correctly by printing the gapminder
object.
# A tibble: 624 × 6
country continent year lifeExp pop gdpPercap
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Algeria Africa 1952 43.1 9279525 2449.
2 Algeria Africa 1957 45.7 10270856 3014.
3 Algeria Africa 1962 48.3 11000948 2551.
4 Algeria Africa 1967 51.4 12760499 3247.
5 Algeria Africa 1972 54.5 14760787 4183.
6 Algeria Africa 1977 58.0 17152804 4910.
7 Algeria Africa 1982 61.4 20033753 5745.
8 Algeria Africa 1987 65.8 23254956 5681.
9 Algeria Africa 1992 67.7 26298373 5023.
10 Algeria Africa 1997 69.2 29072015 4797.
# ℹ 614 more rows
Question
Above, we discussed the difference between read_sheet()
and read_csv()
. Copy your code from the previous answer into a new code chunk below. Then, replace read_sheet()
with read_csv()
, and run the code again (this command may take a several moments to finish running).
What happens when you use read_csv()
instead of read_sheet()
on a Google Sheet URL? Hint: Consider the number of rows and columns. Is this what you expect?
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
# A tibble: 11 × 4,078
<!DOCTYPE html><html la…¹ .gb_A .gb_B{-webkit-tap-hi…² .gb_E{fill:#5f6368}.…³
<chr> <chr> <chr> <chr>
1 "/*" <NA> <NA> <NA>
2 "Copyright The Closure L… <NA> <NA> <NA>
3 "SPDX-License-Identifier… <NA> <NA> <NA>
4 "*/" <NA> <NA> <NA>
5 "function a(d){var f" "g" "h;if(((f=document)==… l;((l=(k=window).DOCS…
6 "for(var e;b.length&&(e=… <NA> <NA> <NA>
7 "// Google Inc." <NA> <NA> <NA>
8 "//# sourceMappingURL=fo… <NA> <NA> <NA>
9 "</script><script nonce=… "Rob… "Helvetica" Arial
10 "g&&(a+=15);var f=d.getE… "m=f… "k=f?null:d.getElemen… n=l?l.offsetHeight:0;…
11 "e.style.display=\"block… "e=g… <NA> <NA>
# ℹ abbreviated names:
# ¹`<!DOCTYPE html><html lang="en-US"><head><script nonce="fhy_159A9h6oa_L07WYlUg">var DOCS_timing={}; DOCS_timing['pls']=new Date().getTime();</script><meta property="og:title" content="gapminder"><meta property="og:type" content="article"><meta property="og:site_name" content="Google Docs"><meta property="og:url" content="https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit?gid=780868077&usp=embed_facebook"><meta property="og:image" content="https://lh7-us.googleusercontent.com/docs/AHkbwyKlA5QbblnI18BecNvdWzoAfwMyGHSz9WApv1JE3fYcedBKxPsQFkIvPkA1i6Y0mXS2wTUms0HQPhQU8YWTyJ0cMlK4LtRJIC49SfymkAcbDc69oq4=w1200-h630-p"><meta property="og:image:width" content="1200"><meta property="og:image:height" content="630"><meta name="google" content="notranslate"><meta http-equiv="X-UA-Compatible" content="IE=edge;"><meta name="referrer" content="origin"><title>gapminder - Google Sheets</title><link rel="shortcut icon" href="//ssl.gstatic.com/docs/spreadsheets/spreadsheets_2023q4.ico"><link rel="chrome-webstore-item" href="https://chrome.google.com/webstore/detail/apdfllckaahabafndbhieahigkjlhalf"><style nonce="sNlH5gLJp8yEVHWq6xR84w">.gb_x{display:-webkit-box;display:-webkit-flex;display:-moz-box;display:-ms-flexbox;display:-webkit-box;display:-moz-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-box-pack:end;-webkit-justify-content:flex-end;-moz-box-pack:end;-ms-flex-pack:end;-webkit-justify-content:flex-end;justify-content:flex-end;width:100%}.gb_y{-moz-box-sizing:border-box;-webkit-box-sizing:border-box;box-sizing:border-box;height:48px;padding:4px}.gb_z`,
# ²`.gb_B{-webkit-tap-highlight-color:transparent;-moz-box-sizing:border-box;-webkit-box-sizing:border-box;box-sizing:border-box;float:right;outline:none}.gb_C{-moz-box-sizing:border-box;-webkit-box-sizing:border-box;box-sizing:border-box;display:inline-block;height:48px;outline:none;padding:4px}.gb_D`,
# ³`.gb_E{fill:#5f6368}.gb_F .gb_E`
# ℹ 4,074 more variables: `.gb_F .gb_D{fill:rgba(255` <chr>, `255...6` <chr>,
# `255...7` <chr>, `.87)}.gb_H .gb_A` <chr>,
# `.gb_I .gb_A{cursor:pointer;padding:8px}.gb_J{display:none}a.gb_z{-webkit-box-align:center;-webkit-align-items:center;-moz-box-align:center;-ms-flex-align:center;-webkit-align-items:center;align-items:center;background:#1a73e8;border:1px solid transparent;-webkit-border-radius:4px;-moz-border-radius:4px;border-radius:4px;color:#fff;cursor:pointer;display:-webkit-box;display:-webkit-flex;display:-moz-box;display:-ms-flexbox;display:-webkit-box;display:-moz-box;display:-webkit-flex;display:-ms-flexbox;display:flex;font-family:Google Sans` <chr>, …
No, this result doesn’t look at all like what I expected. The number of rows and columns should be 624 and 6 for the gapminder data. In addition to having the wrong dimensions, the data appear to be a bunch of mangled text.
Question
Extract the data file named product review text.csv from the zip file you downloaded, and move it into your SDS 100 project folder.
If you’ve never extracted files from a zip file before, here are some instructions you might find helpful:
Since no R code is needed for this question, you won’t see any output below this question in your rendered HTML file. You’ll know you completed this exercise correctly if you see the product review text.csv
file in your SDS 100 project folder, along with all your lab files.
Question
Use the locale()
function (which comes from the readr
package) to create an object representing a Chinese locale that uses the GB18030 encoding, just like the example below:
Then, try using the read_csv()
function to import the product review text.csv file and store the data in an object named JD_product_reviews
again. But this time, pass the locale object you created into the read_csv()
function using the locale
argument.
Check that the data were imported properly by printing out the JD_product_reviews
object; the data should look like the screenshot above.
# A tibble: 67,846 × 2
充电有点烫 rel
<chr> <chr>
1 用了三天来评价 irr
2 上网还可以 rel
3 就是用一会手机发烫 rel
4 不知道是怎么回事 irr
5 可以可以不错不错 irr
6 最便宜的苹果了 rel
7 拿在手里刚刚好 rel
8 比较好看的苹果机型作为备用机还是很不错的 rel
9 挺好的 irr
10 留个纪念也好 irr
# ℹ 67,836 more rows