Ethics patches in a box

OkCupid, what should I do now?

Author

blinded

Published

November 6, 2024

Overview

Description of course/setting

  • Actual: senior capstone in statistical & data sciences
  • Generic:
    • introductory data science course
    • any course that covers web scraping

NAS ethical areas

  • Ethical precepts for data science and codes of conduct
  • Privacy and confidentiality
  • Responsible conduct of research
  • Ability to identify “junk” science

Questions/goals addressed

  • If I scrape data about other people and want to publish it, what is my responsibility to protect the privacy of the people whose information I scraped?
  • How do my ethical responsibilities differ from my legal responsibilities?
  • Are there technical solutions that could allow me to publish the data while protecting people’s privacy?
  • Are my ethical responsibilities affected by the value of the data?

Bloom's taxonomy

  • Application: Students apply their knowledge of data science ethics in a real-world context.

Generalizability

  • any publishing of personal data
  • any attempt at web scraping

Lesson plan for instructors

Student preparation required

  • no technical knowledge required
  • before class, read:
    • Kim and Escobedo-Land (2015)
    • Poulsen (2014)
    • Hackett (2016)
  • (optional) other reference material:
    • Kirkegaard and Bjerrekær (2016)

Instructions for students

  • Your goal in this class activity is to help Prof. Kim draft a letter to the editor of the Journal of Statistics Education that addresses any ethical issues you identify in Kim and Escobedo-Land (2015). Perhaps you think Prof. Kim should fully retract the article. Perhaps you think a partial retraction that removes or alters some part of the data is appropriate. Perhaps you think no action is necessary.
  • You may want to consult the Internet for relevant facts during the group discussion.
  • Have at least one person take notes during your discussion, preferably on a whiteboard that all group members can see.
  • There is no right or wrong answer. However, your choices should be based on informed logical reasoning.

Activity description

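  1. (10 minutes): Give a brief overview of the three uses of OkCupid data covered in the readings: McKinlay's scraping of profiles to find matches (Poulsen 2014), Kim and Escobedo-Land's teaching data set (Kim and Escobedo-Land 2015), and Kirkegaard and Bjerrekær's public release of user data (Hackett 2016). Focus on Prof. Kim's responsibilities as a published author. Students should have completed the assigned reading before class.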
  2. (20 minutes): Break the class into four groups. Each group will discuss one of the following topics:
    1. Other users of OkCupid data
      • Discuss the use of OkCupid data by McKinlay and by Kirkegaard. Were those uses ethical? How did OkCupid respond? How does their work bear on Prof. Kim's situation?
    2. Anonymization and differential privacy
    3. Terms of Service
    4. Value of Data
      • What is the pedagogical value of the data set? How should these benefits be weighed against the potential for harm?
  3. (15 minutes): Have each group draft a one-paragraph statement on its topic, addressed to the editor of the Journal of Statistics Education, that all group members agree with.
  4. (15 minutes): Have each group add its paragraph to a shared Google Doc, and display the document to the whole class. Have each group present and discuss its contribution.
  5. (5 minutes): Wrap up. Tie up loose threads and try to connect key ideas to other topics in the class, recent events on campus, or current news.

Deliverables

  • one paragraph from each group
  • (optional) Offer students extra credit for editing the Google Doc into a formal letter to the editor of JSE suitable for publication. Send the letter to Prof. Kim!

References

Hackett, Robert. 2016. “Researchers Caused an Uproar by Publishing Data from 70,000 OkCupid Users.” Fortune. https://fortune.com/2016/05/18/okcupid-data-research/.
Kim, Albert Y, and Adriana Escobedo-Land. 2015. “OKCupid Data for Introductory Statistics and Data Science Courses.” Journal of Statistics Education 23 (2). https://amstat.tandfonline.com/doi/abs/10.1080/10691898.2015.11889737.
Kirkegaard, Emil OW, and Julius D Bjerrekær. 2016. “The OKCupid Dataset: A Very Large Public Dataset of Dating Site Users.” Open Differential Psychology 46.
Poulsen, Kevin. 2014. “How a Math Genius Hacked OKCupid to Find True Love.” Wired. https://www.wired.com/2014/01/how-to-hack-okcupid/.
Zimmer, Michael. 2010. “‘But the Data Is Already Public’: On the Ethics of Research in Facebook.” Ethics and Information Technology 12 (4): 313–25. https://link.springer.com/article/10.1007/s10676-010-9227-5.