Lab 9: Ethics

Preamble

The purpose of this lab is to discuss the potential large-scale impacts and harms of data science and statistics.

After completing this lab, you should be able to identify examples of algorithmic bias, differential harm, and issues concerning data privacy. This lab is different from the previous ones because we will spend half an hour in a structured group discussion, followed by time to work on the final project rotation step for this week.

Why are we here?

So far this course has focused on the mechanics of using RStudio and building Quarto documents. These skills are critical for engaging with the computational aspects of both statistics and data science. But we, the SDS faculty, do not regard these computational skills as the only introductory skills that one needs to begin a statistics or data science project. The other necessary skills include the ability to identify ethical considerations and an awareness of practices that support more ethical and just work.

Recently the examination of ethics and ethical issues has become a central topic of discussion and area of research in both statistics and data science. In preparation for this lab, you have read several articles about ethical issues and considerations within data science and statistics. We want to stress that these materials were selected to be the beginning of your journey considering and examining ethics within statistics and data science; these articles alone are not sufficient for a complete education in ethics within statistics and data science.

Preliminary work

Leading up to this week, you engaged with the following materials about algorithmic bias and data privacy:

Lab instructions

The first part of class today will be a structured discussion. After an introduction to the activity, your group will discuss the following three questions for 15–20 minutes.

For each question, write at least three ideas, questions, or discussion points that came up during your conversation:

Question

What surprised you most about these readings?

Question

What frustrated you most about these readings?

Question

Statistics and data science are often regarded as fields free from bias. Do you agree that this is the case? Why or why not?

Reporting back

Then we will come together as a class for a short conversation before transitioning to individual work on the final project. In preparation for the full class discussion, there is a Google slide deck linked on Moodle, and each group will have a slide to document their conversation. Using the slides you created, we’ll spend another 10–15 minutes reporting back to the class about what we discussed in our groups.

One minute essay

Finally, spend one minute writing down an answer to the following question:

Question

What came up during the wrap-up session that was new to you? In other words, what points were discussed in the wrap-up that were not part of your group's discussion?

Submitting this Lab

For this lab submission, you will be selecting variables from one of the datasets in your group’s rotation as well as filtering on that dataset.

Step 1: Final Project Prep, Rotation 2 - Selecting and Filtering the Data

Step 1.1: Get Rotation Document

Before beginning this lab submission, you will need to get the rotation document from your group mate. If you are unsure whose document you are inheriting, consult your Lab 8 submission for your group’s rotation schedule.

Step 1.2: Select Variables

After reading the introduction of the data (rotation 1), select at least three variables that you think are the most critical to this dataset.

Once you have made your selections, use select() to keep only those variables.
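As a minimal sketch of this step, the example below uses R's built-in mtcars data as a stand-in for your group's dataset. The three variables (mpg, cyl, wt) and the object name cars_selected are hypothetical choices for illustration only; substitute the variables you identified from the rotation 1 introduction.

```r
library(dplyr)

# ASSUMPTION: `mtcars` stands in for your group's dataset, and
# mpg, cyl, and wt stand in for the variables you judged most critical.
cars_selected <- mtcars |>
  select(mpg, cyl, wt)

# Check which columns remain after selecting
names(cars_selected)
```

Saving the result to a new object leaves the original data untouched, which makes it easier to revisit your choices later.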

Step 1.3: Filter Your Data

Once you have made your variable selections, determine whether there are any classes of observations that you wish to exclude. If so, use filter() to create a subset of your data.
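One way this might look, again using the built-in mtcars data as a stand-in: the sketch below excludes one class of observations (cars with 8 cylinders). The dataset, the variables, and the exclusion rule are all assumptions made for illustration; your own condition should come from what you know about your group's data.

```r
library(dplyr)

# ASSUMPTION: `mtcars` and these three variables are placeholders
# for your group's dataset and your own selections.
cars_subset <- mtcars |>
  select(mpg, cyl, wt) |>
  filter(cyl != 8)  # keep only rows where cyl is not 8

# Number of rows remaining after the filter
nrow(cars_subset)
```

Note that filter() keeps the rows where the condition is TRUE, so the condition describes what you want to retain rather than what you want to drop.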

Step 1.4: Provide a glimpse of the resulting data

Provide a glimpse() of the resulting data object. How many rows and columns are in your result?
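A short sketch of this step, assuming the same hypothetical stand-in data (mtcars, selected and filtered) rather than your actual dataset: glimpse() prints the row and column counts along with each variable's type and first few values, and dim() returns the same two counts as a vector.

```r
library(dplyr)

# ASSUMPTION: placeholder data standing in for your selected and filtered result
cars_subset <- mtcars |>
  select(mpg, cyl, wt) |>
  filter(cyl != 8)

# Overview: number of rows and columns, plus each variable's type
glimpse(cars_subset)

# Row and column counts as a length-2 vector: rows first, then columns
dim(cars_subset)
```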

Step 1.5: Describe the Data

Describe the selected and filtered data in detail. You should include a list of the variables with details about each one. Your details should include the type of each variable and a summary of the information it contains.
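To gather material for your description, something like the following sketch may help. It reports each variable's type and a basic summary, using base R's class() and summary() on the same hypothetical stand-in data (mtcars); the output is a starting point for your written description, not a replacement for it.

```r
library(dplyr)

# ASSUMPTION: placeholder data standing in for your selected and filtered result
cars_subset <- mtcars |>
  select(mpg, cyl, wt) |>
  filter(cyl != 8)

# Type of each variable, returned as a named character vector
var_types <- sapply(cars_subset, class)
var_types

# Minimum, quartiles, mean, and maximum for each numeric variable
summary(cars_subset)
```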

Step 1.6: Submit this document on Moodle AND Rotate this document

Render your document. Check that everything renders as you expect and then:

  • Submit both the .qmd file and the .html file for this rotation document on Moodle under “Project Rotation 2” AND
  • Send the Quarto document (and any additional files needed for rendering it, such as any local .csv files) to the next person in your group’s rotation.

Step 2: Complete the Reflection on Moodle

Write a reflection on the readings and discussion. Save it locally, and then paste it in the Moodle question. Please write at least 150 words. Submissions that are on topic and at least 150 words will receive credit. This counts towards your Weekly Lab Quizzes token.

Questions to consider:

  • What surprised or frustrated you about the readings?

  • Do you have recommendations for how to mitigate potential bias in the practice of statistics and data science?

  • What questions might you ask of yourself, the people you work with, or your data to encourage responsible use of data?

  • How has your perspective changed through the course of the readings and discussion?

Optional reading

To prepare for next week’s lab on referencing images and figures in your Quarto documents, we recommend reading the Cross References page on the Quarto documentation website.