Lab 11: Ethical Codes for Data Science
Preamble
The purpose of this lab is to discuss best practices for working with data science and statistics through an ethical lens.
After completing this lab, you should be able to identify practices and procedures for engaging in more ethical computational work, and to compare and contrast different methods for working with data through an ethical lens, as described by one of the statements of ethical guidelines. This lab is different from the previous ones: we will spend half an hour in a structured group discussion, followed by time to work on the final project rotation step for this week.
Why are we here?
Last week, we discussed the issues that can arise from computational work when one “focuses on just the data science.” Today, we will discuss different ethical codes that various professional societies have proposed. The goal of today is to make progress in defining practical steps to engage in more ethical data science and statistical work.
Preliminary work
Leading up to this week, you engaged with the following materials on reproducibility and ethical codes:
- Data Values and Principles Manifesto
- Data Science Oath from the National Academies of Sciences, Engineering, and Medicine
You also read one of the following:
- Teaching Ethics in a Statistics Curriculum with a Cross-Cultural Emphasis (Elliott, Stokes, and Cao 2018). This reading is available as a PDF on Moodle.
- Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence (“Fact Sheet: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence” 2023)
- It’s the Austerity, Stupid: How We Were Sold an Economy-Killing Lie (Drum 2013)
Lab instructions
Today will be a structured discussion during the first part of class. Unless you need to reference a reading, please close your laptop and focus on the discussion. There will be time afterwards for your next project rotation.
Question
Give your groupmates a summary of the article you read. What did it cover? What were your main takeaways?
Next, your group will discuss the following three questions for 15–20 minutes.
Question
What surprised you most about these readings?
Question
What frustrated you most about these readings?
Question
What suggestions do you have for processes and procedures to avoid, mitigate, or lessen bias and differential impact from data science on the world?
Reporting back
Then we will come together as a class for a short conversation before transitioning to individual work on the final project.
In your group, pick one person to share out for each question and agree on what will be said for each question. You may find it helpful to jot down notes.
We’ll spend another 10–15 minutes reporting back to the class about what we discussed in our groups.
One minute essay
Finally, spend one minute writing down an answer to the following question:
Question
What came up during the wrap-up session that was new to you? In other words, what points were raised in the wrap-up that were not part of your group's discussion?
Submitting this lab
For this lab submission, you will be practicing creating visualizations for one of the datasets in your group’s rotation and then referencing those figures in your .qmd file.
Step 1: Final Project Prep, Rotation 4: Visualizing the Data
Step 1.1: Get Rotation Document
Before beginning this lab submission, you will need to get the rotation document from your groupmate. If you are unsure whose document you are inheriting, consult your Lab 8 submission for your group’s rotation schedule.
Step 1.2: Create visualizations
Create (at least) three visualizations for the rotation data. Across all three of your submitted visualizations, you should:
- Use categorical variables and numerical variables. You may have one plot that uses just categorical data, another that uses just numerical variables, and one that uses a combination of the two.
- Have at least one bivariate plot and at least one trivariate plot.
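One way to sanity-check your plan against the requirements above is a small script. This is only a minimal sketch in Python and is not part of the lab: the function name check_plot_plan, the way each plot is described, and the example variable names are all hypothetical.

```python
# Hypothetical helper (not part of the lab): checks a plan of plots against
# the Step 1.2 requirements. Each plot is described as a dict mapping a
# variable name to "categorical" or "numerical".

def check_plot_plan(plots):
    """Return a list of unmet requirements (empty if the plan meets them all)."""
    problems = []
    if len(plots) < 3:
        problems.append("need at least three visualizations")

    # Variable types used anywhere across the submitted plots.
    kinds = {kind for plot in plots for kind in plot.values()}
    if "categorical" not in kinds:
        problems.append("no plot uses a categorical variable")
    if "numerical" not in kinds:
        problems.append("no plot uses a numerical variable")

    # Bivariate = exactly two variables; trivariate = exactly three.
    sizes = {len(plot) for plot in plots}
    if 2 not in sizes:
        problems.append("need at least one bivariate plot")
    if 3 not in sizes:
        problems.append("need at least one trivariate plot")
    return problems


# Made-up example plan; swap in variable names from your rotation data.
example_plan = [
    {"region": "categorical", "type": "categorical"},
    {"height": "numerical", "weight": "numerical"},
    {"height": "numerical", "weight": "numerical", "region": "categorical"},
]
print(check_plot_plan(example_plan))  # prints []
```

An empty list means the plan covers every requirement; otherwise each string names something still missing.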
Step 1.3: Describe your visualizations
For each plot, write 2–3 sentences describing it. Your descriptions should discuss the variables used and what the plot shows about the data.
Step 1.4: Add cross-references
Now that you have both the visualizations and their descriptions, link them together using cross-references. In your descriptions, use the Markdown formatting covered in Lab 10 to explicitly reference the figures. You can check that your references are working properly by changing the order of your figures: the figure numbers in your text should update to match.
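As a complement to the reordering check, the sketch below scans the text of a .qmd file for figures that are labeled but never referenced, and for references that point at no label. It assumes the common Quarto convention in which a code cell option of the form "#| label: fig-name" defines a figure and "@fig-name" in the prose refers to it; if your Lab 10 setup labels figures differently (for example, on Markdown images), the patterns would need adjusting. The function name and the sample text are hypothetical.

```python
import re

# Assumed convention: "#| label: fig-name" defines a figure in a code cell,
# and "@fig-name" in the prose refers to it.
LABEL_PATTERN = re.compile(r"^#\|\s*label:\s*(fig-[\w-]+)\s*$", re.MULTILINE)
REFERENCE_PATTERN = re.compile(r"@(fig-[\w-]+)")


def check_cross_references(qmd_text):
    """Return (labels never referenced, references with no matching label)."""
    labels = set(LABEL_PATTERN.findall(qmd_text))
    references = set(REFERENCE_PATTERN.findall(qmd_text))
    return sorted(labels - references), sorted(references - labels)


# Made-up excerpt of a document; the code cell fences are omitted for brevity.
sample = """
#| label: fig-height-hist
#| fig-cap: "Distribution of height"

#| label: fig-height-weight
#| fig-cap: "Height against weight"

@fig-height-hist shows the distribution of height.
@fig-scatter compares height and weight.
"""
unreferenced, undefined = check_cross_references(sample)
print(unreferenced)  # ['fig-height-weight']
print(undefined)     # ['fig-scatter']
```

To run it on your own document, read the file into a string first, for example with pathlib.Path("your-file.qmd").read_text(), where the file name is a placeholder.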
Step 1.5: Submit this document on Moodle AND Rotate this document
Once you are happy with your visualizations and descriptions (including the cross-references!), render your document. Check that everything renders as you expect and then:
- Submit both the .qmd file and the .html file for this rotation document on Moodle under “Project Rotation 4”, AND
- Send the Quarto document (and any additional files needed for rendering it, such as any local .csv files) to the next person in your group’s rotation.
Step 2: Complete the Reflection on Moodle
Write a reflection on the readings and discussion. Save it locally, and then paste it in the Moodle question. Please write at least 150 words. Submissions that are on topic and at least 150 words will receive credit. This counts towards your Weekly Lab Quizzes token.
Questions to consider:
- What surprised or frustrated you about the readings?
- Do you have recommendations for how to mitigate potential bias in the practice of statistics and data science?
- What questions might you ask of yourself, the people you work with, or your data to encourage responsible use of data?
- How has your perspective changed through the course of the readings and discussion?
Optional reading
To prepare for next week’s lab on citations and BibTeX, we encourage reading the following: