About the course

SDS 410 (Capstone in Statistical & Data Sciences) is modeled as a data science consulting firm, wherein the instructor is the managing partner and the students are the junior partners. Students work collaboratively in teams of 3–5 on a project sponsored by a private company, research lab, or non-profit organization. All projects are centered around using data to solve real-world problems.

The ideal project:

  • is challenging but not impossible
  • would take a small team of students several months – not several weeks or years
  • has some public/open-source exposure (for the students’ portfolios)
  • makes the world a better place

Each section consists of 25–30 students, all of whom are senior majors in statistical & data sciences. They have foundational training in statistics, computer programming, data visualization, data wrangling, communication, and ethics, as outlined in our curriculum and in pursuit of our learning goals. They are ready to tackle projects involving statistical modeling, data visualization, web scraping, web or mobile app development, database development, etc. The end result might be a web API, a statistical model, a publishable research paper, an internal white paper, a web application, or whatever else seems appropriate.

You can learn more about the course through the slides for this talk:

What’s in it for you?

These students are highly motivated and capable. They will go on to top graduate schools and companies. Students from previous classes are working full-time at Google, MasterCard, Citigroup, and MassMutual. Others are attending doctoral and master’s programs in statistics, computer science, and data science at Harvard University, Columbia University, the University of Michigan, the University of North Carolina, the University of Pittsburgh, and the University of California-Riverside. A sponsored project will provide you with:

  • dedicated attention to your project
  • recruiting (of underrepresented genders – all applicants to Smith College must identify as women)
  • goodwill and PR
  • a substantive attempt to solve your data problem! Of course we can’t promise a solution, but we will make a worthy attempt.

What we’ll need from you (and when)

The next iteration of the course starts Friday, January 26, 2024 and ends Wednesday, May 1, 2024. If you are interested in sponsoring a project, we’ll need:

By October 15: Contact Shiya Cao with

  • an idea for a project that involves some non-trivial data (that you have or know how to get)
  • a general description of your data (what format, roughly how many rows and columns, etc.)
  • a commitment from someone in your organization to serve as a contact person to interact with our students. This commitment might be on the order of one hour per week for 13 weeks, but could be more or less depending on your needs and availability. Contact could come via email, Slack, Zoom, etc.

By November 1:

  • a one-paragraph description of your problem
  • a more specific description of your data
  • a three-minute video in which you introduce yourself, your organization, and describe your project to the students
  • a completed intake form

Who’s done this before?

Please see our project archive for a comprehensive list of previous projects.

NFL logo

Michael Lopez

“The Smith SDS students consistently demonstrated passion, intellect, and creativity that led us to think of enhanced and data-driven ways of improving our product. In particular, the students possessed outstanding data visualization skills, an appropriate understanding of uncertainty, and a willingness to work independently that consistently left us impressed.”

Met Council logo

Mauricio León

“I have to say that this group has done an outstanding work that surpassed my expectations. They were able to work around the difficulties of not having all the resources they had at the beginning due to the pandemic. They found creative solutions to problems involving datasets that were too large to handle in a conventional way, and worked with tools that are familiar to our team at the Met Council. The team maintained a positive attitude during the development of the work, asked reasonable questions, and tried to clarify the tasks along the way. As a result, they produced something that is of significant value for the development of our greenhouse gas mitigation research at the Metropolitan Council.”

EBSCO logo

Kayleigh Hinckley

“The students from SDS 410 were great to work with. They were able to quickly understand the large, complex problem we presented, researched and assessed multiple avenues for solving it, wrangled data from several sources, and ultimately produced an algorithm with results that are meaningful and actionable. The work from these students was the foundation our business needed to propel forward in a competitive industry. We look forward to working with Smith students again!”

OpenElections logo

Derek Willis

“The Smith College students dove right into a complicated and messy set of tasks. They asked good questions and weren’t shy about making suggestions that improved the outcome. We were looking for a way to turn what was a repetitive manual task into one that could be at least partially automated, and they succeeded at producing code that trades data entry and other time-consuming methods for a better, faster approach. Finally, they were professional in their communication and maintained a positive attitude throughout the project.”

NBA logo

NBA team analyst

“It was a pleasure to collaborate with the SDS 410 students. We were impressed by the students’ technical capabilities and their ability to work independently. Their work was both well documented and clearly presented and overall we were very happy with the results they achieved.”

Wisconsin Project on Nuclear Arms Control

Meghan Crimmins

“The Smith College students we worked with were professional, timely, and were able to explain complex ideas in digestible ways. They communicated well throughout the project and were committed to finding a solution to our data problem. Their use of natural language processing to extract and structure previously unstructured data has helped inform our research on proliferation networks, and the macro-level visualizations they delivered have given us new ways to present our research to potential clients and the public. We look forward to working with Smith SDS students again in the future!”


Sabrina Imam

“Their work has helped us figure out the entity definitions of a large portion of the database. We found the students to be very enthusiastic, smart, and professional in their approach. They were very quick to understand our database, and were always on time for our weekly calls despite the 10-hour time difference. We especially appreciate the students’ creative thinking and their use of some new and innovative tools to create a column-to-column connectivity diagram and a network graph that was outside the project scope.”


Ryann Grochowski Jones

“The SDS 410 students were communicative and professional. They worked independently but weren’t afraid to ask questions if something was unclear. I’m eager to work with Smith students again in future semesters.”