Data collection & analysis using Google Calendar

Published: November 6, 2024

Instructor resources accompanying Albert Y. Kim and Johanna Hardin’s Journal of Statistics and Data Science Education paper:
“Playing the whole game”: A data collection and analysis exercise with Google Calendar.

Abstract: We provide a computational exercise suitable for early introduction in an undergraduate statistics or data science course that allows students to ‘play the whole game’ of data science: performing both data collection and data analysis. While many teaching resources exist for data analysis, such resources are not as abundant for data collection given the inherent difficulty of the task. Our proposed exercise centers around student use of Google Calendar to collect data with the goal of answering the question ‘How do I spend my time?’ On the one hand, the exercise involves answering a question with near universal appeal, but on the other hand, the data collection mechanism is not beyond the reach of a typical undergraduate student. A further benefit of the exercise is that it provides an opportunity for discussions on ethical questions and considerations that data providers and data analysts face in today’s age of large-scale internet-based data collection.

Learning Goals

The learning goals of this assignment relate to:

  • Data Collection
    1. Experience creating measurable data observations (e.g., how is “one day” measured, or what defines “studying”).
    2. Address data collection constraints due to limits in technological capacity and human behavior.
  • Data Ethics
    1. Practice the ethical and legal responsibilities of those collecting, storing, and analyzing data.
    2. Decide limits for personal privacy.
    3. Deliberate on the trade-offs between research results and privacy.
  • “Playing the whole game”
    1. Tie together data collection, analysis, ethics, and communication components.
    2. Iterate between and within the components of the “whole game.”

Resources

Examples

Both Albert’s and Johanna’s versions of the assignment assume basic familiarity with R and RStudio, as well as with the ggplot2 and dplyr packages.

  1. Previews of the assignment as given to students in
  2. A .zip file of all files necessary for both assignments: calendar.zip
  3. Both assignments give students some variation of the starter code below:
library(tidyverse)
library(lubridate)
library(ical)

calendar_data <- "calendar_filename.ics" %>% 
  # Use ical package to import into R
  ical_parse_df() %>% 
  # Convert to "tibble" data frame format
  as_tibble() %>% 
  # Use lubridate package to wrangle dates and times
  mutate(
    start_datetime = with_tz(start, tzone = "America/New_York"),
    end_datetime = with_tz(end, tzone = "America/New_York"),
    duration = end_datetime - start_datetime,
    date = floor_date(start_datetime, unit = "day")
  ) %>%
  # Convert calendar entry to all lowercase and rename:
  mutate(activity = tolower(summary)) %>%
  # Compute total duration of time for each day & activity:
  group_by(date, activity) %>%
  # Drop the grouping afterwards so later steps see an ungrouped data frame
  summarize(duration = sum(duration), .groups = "drop")
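
One detail instructors may want to flag: `end_datetime - start_datetime` produces a difftime whose units (seconds, minutes, hours, ...) R picks automatically, so the units of `duration` can differ from one student's calendar to the next. The following is a minimal sketch, not part of either original assignment, of fixing the units to hours. The `toy_events` data and the `duration_hours` column are made up for illustration; only the `summary`, `start`, and `end` column names are taken from the starter code above.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for the output of ical_parse_df();
# the column names (summary, start, end) mirror the starter code.
toy_events <- tibble(
  summary = c("Studying", "studying", "Sleep"),
  start = ymd_hm(c("2024-11-04 14:00", "2024-11-04 19:30", "2024-11-05 03:00"), tz = "UTC"),
  end   = ymd_hm(c("2024-11-04 15:30", "2024-11-04 20:00", "2024-11-05 11:00"), tz = "UTC")
)

toy_summary <- toy_events %>%
  mutate(
    start_datetime = with_tz(start, tzone = "America/New_York"),
    end_datetime = with_tz(end, tzone = "America/New_York"),
    # Force hours instead of letting difftime choose units automatically
    duration_hours = as.numeric(difftime(end_datetime, start_datetime, units = "hours")),
    # Calendar date of the event's start, in the local time zone
    date = as_date(start_datetime),
    activity = tolower(summary)
  ) %>%
  group_by(date, activity) %>%
  summarize(duration_hours = sum(duration_hours), .groups = "drop")

toy_summary
```

In this toy data the "Sleep" entry starts at 10pm local time on November 4 and ends the next morning, yet all eight hours are attributed to November 4 because the date is taken from the event's start. This is one concrete way to open the "how is 'one day' measured" discussion from the learning goals.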

Screencast demo

Here is a 6m56s YouTube screencast demonstrating how to:

  1. Log time data in Google Calendar
  2. Export the calendar data to .ics file format
  3. Import the .ics in R using the ical package


Examples from the community

Prof. Katharine Correia at Amherst College extended our original idea by having her students create data diaries during the COVID-19 quarantine period of Spring 2020. Some examples of her students’ amazing work can be found here.