Project

Overview

The goals for this project are to implement the ideas from the course, to work collaboratively (and asynchronously), and to practice iterative document development. The final deliverable is a proposed new dataset for use in a specific SDS 100 lab, accompanied by a report that can be reproduced on another person's computer. The data you propose may be used in a future semester of SDS 100.

This is your chance to reorient the curriculum towards your interests or the interests of Smithies in general. There are many criteria you could use to explore and select data. Below are a few possible approaches (though there are many more ways to think about this!):

  • For Lab 3: Basic Data Graphics, you might select a dataset that has a variable with a “nice” or “not nice” distribution
  • For Lab 4: Data Wrangling, you might choose a large dataset that illustrates filtering and selecting
  • You might select a dataset from a discipline not yet represented in the class
  • You might look for a dataset with problematic ethical considerations or poor documentation

The data that your group chooses will be the result of a structured, multi-week, asynchronous brainstorming activity. The project is roughly 80% individual work and 20% group work: you will complete each step of the project individually and then collaborate to create the final report. The final report will include a basic overview of the data, including summary tables and visualizations, as well as citations and a discussion of ethical considerations with the data.

Logistics

This project will be done in groups of 3 (talk to your instructor if class numbers require a different group size). Each group member will complete steps of the project individually, but your work depends on the progress of your groupmates. The project is divided into “rotations”: steps in which you hand off your progress to another group member to build on and, in turn, receive work from another group member.

Over the course of 5 rotations, each document passes through every group member, so you will work on some documents more than once. Each group member completes each part of the project individually until the final submission. After all the rotations, by the Lab 13 submission, your group will have 3 documents, each on a separate dataset.

In the final step, you will agree upon which dataset you are recommending as a group and turn in a single polished report that includes the work done during the rotations.

The work you will complete for the final project will rely on work done in the labs throughout the semester. In the first half of the semester, you will be tasked with identifying potential data sources, and in the second half, you will receive weekly assignments with more details for each rotation of the project.

Group of 3 Rotation Pattern

Lab      Rotation     Data Option 1   Data Option 2   Data Option 3
Lab 8    Rotation 1   Student 1       Student 2       Student 3
Lab 9    Rotation 2   Student 2       Student 3       Student 1
Lab 10   Rotation 3   Student 3       Student 1       Student 2
Lab 11   Rotation 4   Student 1       Student 2       Student 3
Lab 12   Rotation 5   Student 2       Student 3       Student 1

Deadlines

This project has several parts, but most of them are individual and are submitted as part of the labs.

  • Individual rotations are due on Tuesdays. Specific instructions for each rotation can be found in the labs.

    • Lab 8: Initial Data Proposal
    • Lab 9: Variable Selection & Description
    • Lab 10: Summary Statistics & Tables
    • Lab 11: Data Visualization
    • Lab 12: Ethics & Citations
  • Final group submission: Only the last part of the project is done as a group, and it is the only part that is submitted as a group submission. It is due at 11:59pm on the first day of finals.

  • Peer/self-evaluation: a reflection on your own and your group members’ contributions to the project. It is due at 11:59pm on the first day of finals.

Final Report Instructions

As a group, you will recommend one of your group's datasets to the SDS faculty for inclusion in future iterations of SDS 100. Your final recommendation will be a written report that revises the information from the five rotation steps into a coherent narrative about your recommended data. This version should be polished, include section headings, and include proper citations. Your report should include:

  • An introduction to the data
  • A discussion of the data’s impact and what this dataset specifically adds to a particular SDS 100 lab
  • Details about the dataset, including a glimpse of the data and descriptions of the variables
  • At least two summary tables of the data
  • At least two visualizations of the data, one of which must be at least bivariate
  • Limitations and ethical considerations of the data

A few logistical notes:

  • File paths must be set up so that the report renders on any computer that has the appropriate files (for example, by using relative rather than absolute paths)
  • Your report should read as a narrative instead of a listing of detached answers and include transitions between the sections
  • Make sure that the name of everyone in your group appears in the author: field in the YAML header!
  • Include embed-resources: true in your YAML header.
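A minimal sketch of a YAML header that satisfies the last two notes, assuming the report is a Quarto document rendered to HTML. The title and author names are hypothetical placeholders, and here embed-resources is placed under the HTML format options:

```yaml
---
title: "Dataset Recommendation for SDS 100"   # hypothetical title
author:
  - "Student One"     # hypothetical names: list every group member
  - "Student Two"
  - "Student Three"
format:
  html:
    embed-resources: true   # bundle images, CSS, and scripts into the single HTML file
---
```

With embed-resources: true, the rendered HTML file is self-contained, so it can be opened on another computer without its supporting files folder.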