The Lasting Legacy of Redlining

Introduction to the Data

The data used here is one of FiveThirtyEight’s data sources for their article “The Lasting Legacy of Redlining.”¹ The data set contains 2020 total population estimates by race/ethnicity (based on the 2020 census) for the combined zones of each redlining grade (ranging from A (“Best”) to D (“Hazardous”)²) on the Home Owners’ Loan Corporation’s (HOLC) maps drawn in 1935-40. The maps were provided by the Mapping Inequality project.

Code

library(tidyverse)
redlining_data <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/redlining/metro-grades.csv")

The data set also includes population estimates for the area surrounding each metropolitan area’s HOLC map, as well as location quotients (LQs) for each racial/ethnic group and HOLC grade. An LQ is judged by how close it is to 1, which indicates that a group is represented in proportion to its share of the population.³ You can find more information on the data set, variables, and calculations in the GitHub link listed above.

Below we give an overview of the data set’s contents:

Code

glimpse(redlining_data)

Rows: 551
Columns: 28
$ metro_area          <chr> "Akron, OH", "Akron, OH", "Akron, OH", "Akron, OH"…
$ holc_grade          <chr> "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", …
$ white_pop           <dbl> 24702, 41531, 73105, 6179, 16989, 26644, 56878, 16…
$ black_pop           <dbl> 8624, 16499, 22847, 6921, 1818, 7094, 16795, 19581…
$ hisp_pop            <dbl> 956, 2208, 3149, 567, 1317, 4334, 10357, 6688, 367…
$ asian_pop           <dbl> 688, 3367, 6291, 455, 1998, 2509, 6355, 2191, 21, …
$ other_pop           <dbl> 1993, 4211, 7302, 1022, 1182, 4650, 11153, 4364, 8…
$ total_pop           <dbl> 36963, 67816, 112694, 15144, 23303, 45230, 101538,…
$ pct_white           <dbl> 66.83, 61.24, 64.87, 40.80, 72.91, 58.91, 56.02, 3…
$ pct_black           <dbl> 23.33, 24.33, 20.27, 45.70, 7.80, 15.68, 16.54, 39…
$ pct_hisp            <dbl> 2.59, 3.26, 2.79, 3.75, 5.65, 9.58, 10.20, 13.48, …
$ pct_asian           <dbl> 1.86, 4.96, 5.58, 3.00, 8.57, 5.55, 6.26, 4.42, 1.…
$ pct_other           <dbl> 5.39, 6.21, 6.48, 6.75, 5.07, 10.28, 10.98, 8.79, …
$ lq_white            <dbl> 0.94, 0.86, 0.91, 0.57, 1.09, 0.88, 0.84, 0.51, 1.…
$ lq_black            <dbl> 1.41, 1.47, 1.23, 2.76, 0.66, 1.33, 1.40, 3.35, 0.…
$ lq_hisp             <dbl> 1.00, 1.26, 1.08, 1.45, 0.77, 1.30, 1.39, 1.83, 0.…
$ lq_asian            <dbl> 0.46, 1.23, 1.38, 0.74, 1.21, 0.78, 0.88, 0.62, 0.…
$ lq_other            <dbl> 0.97, 1.11, 1.16, 1.21, 0.72, 1.47, 1.57, 1.26, 1.…
$ surr_area_white_pop <dbl> 304399, 304399, 304399, 304399, 387016, 387016, 38…
$ surr_area_black_pop <dbl> 70692, 70692, 70692, 70692, 68371, 68371, 68371, 6…
$ surr_area_hisp_pop  <dbl> 11037, 11037, 11037, 11037, 42699, 42699, 42699, 4…
$ surr_area_asian_pop <dbl> 17295, 17295, 17295, 17295, 41112, 41112, 41112, 4…
$ surr_area_other_pop <dbl> 23839, 23839, 23839, 23839, 40596, 40596, 40596, 4…
$ surr_area_pct_white <dbl> 71.24, 71.24, 71.24, 71.24, 66.75, 66.75, 66.75, 6…
$ surr_area_pct_black <dbl> 16.55, 16.55, 16.55, 16.55, 11.79, 11.79, 11.79, 1…
$ surr_area_pct_hisp  <dbl> 2.58, 2.58, 2.58, 2.58, 7.36, 7.36, 7.36, 7.36, 26…
$ surr_area_pct_asian <dbl> 4.05, 4.05, 4.05, 4.05, 7.09, 7.09, 7.09, 7.09, 3.…
$ surr_area_pct_other <dbl> 5.58, 5.58, 5.58, 5.58, 7.00, 7.00, 7.00, 7.00, 4.…

Each observation corresponds to one HOLC grade within one metropolitan area: it combines all zones in that area that share the grade and reports their total population estimates, the surrounding-area population estimates, the percentages for both, and the LQs. For example, the first observation describes all locations within Akron, OH that have an “A” HOLC grade. Reading the columns from left to right, the first row tells us that:

The metropolitan area is Akron, Ohio.
The grade of the combined HOLC zones within Akron, OH is A.
The non-Hispanic white population is 24702.
The non-Hispanic Black population is 8624.
The Hispanic/Latino population is 956.
The Asian population is 688.
The population of those in any other race/ethnicity group is 1993.
The total population in these HOLC zones is 36963.
The estimated percentage of non-Hispanic white residents in HOLC zones with an A grade in Akron, OH out of the total population is 66.83%.
The estimated percentage of non-Hispanic Black residents in HOLC zones with an A grade in Akron, OH out of the total population is 23.33%.
The estimated percentage of Hispanic/Latino residents in HOLC zones with an A grade in Akron, OH out of the total population is 2.59%.
The estimated percentage of Asian residents in HOLC zones with an A grade in Akron, OH out of the total population is 1.86%.
The estimated percentage of residents with another race/ethnicity in HOLC zones with an A grade in Akron, OH out of the total population is 5.39%.
The non-Hispanic white location quotient (LQ) is 0.94.
The non-Hispanic Black location quotient (LQ) is 1.41.
The Hispanic/Latino location quotient (LQ) is 1.
The Asian location quotient (LQ) is 0.46.
The other race/ethnicity location quotient (LQ) is 0.97.
The estimated non-Hispanic white population within the surrounding area of the HOLC zones is 304399.
The estimated non-Hispanic Black population within the surrounding area of the HOLC zones is 70692.
The estimated Hispanic/Latino population within the surrounding area of the HOLC zones is 11037.
The estimated Asian population within the surrounding area of the HOLC zones is 17295.
The estimated other race/ethnicity population within the surrounding area of the HOLC zones is 23839.
The estimated percentage of non-Hispanic white residents in the surrounding area around the HOLC zones out of the total surrounding area population is 71.24%.
The estimated percentage of non-Hispanic Black residents in the surrounding area around the HOLC zones out of the total surrounding area population is 16.55%.
The estimated percentage of Hispanic/Latino residents in the surrounding area around the HOLC zones out of the total surrounding area population is 2.58%.
The estimated percentage of Asian residents in the surrounding area around the HOLC zones out of the total surrounding area population is 4.05%.
The estimated percentage of other race/ethnicity residents in the surrounding area around the HOLC zones out of the total surrounding area population is 5.58%.

It is important to note that the population estimates are rounded to whole numbers and the percentages are rounded to two decimal places.
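As a quick sanity check, the sketch below recomputes the grade A percentages and location quotients for Akron, OH from the counts listed above. It assumes that each percentage is the group's count divided by the zone total, and that each LQ is the group's share in the HOLC zones divided by its share in the surrounding area. That LQ formula is our reading of the data rather than something stated in this report, and the object names are illustrative.

```r
# Counts for the first row (Akron, OH, grade A), copied from the glimpse() output.
zone_pop <- c(white = 24702, black = 8624, hisp = 956, asian = 688, other = 1993)
surr_pop <- c(white = 304399, black = 70692, hisp = 11037, asian = 17295, other = 23839)

# Share of each group, in percent, inside the zones and in the surrounding area.
zone_pct <- 100 * zone_pop / sum(zone_pop)
surr_pct <- 100 * surr_pop / sum(surr_pop)

# Assumed LQ definition: zone share divided by surrounding-area share.
lq <- zone_pct / surr_pct

print(round(zone_pct, 2))
print(round(lq, 2))
```

Under these assumptions, the rounded results match the pct_* and lq_* values reported for this row.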

Data Wrangling

In this section, we narrow the data down to the specific variables we work with. Each of these variables is described below, along with what we use it for.

Code

redlining_data |> select(metro_area, holc_grade, white_pop, pct_white, pct_black, pct_hisp, pct_asian, pct_other)

# A tibble: 551 × 8
   metro_area        holc_grade white_pop pct_white pct_black pct_hisp pct_asian
   <chr>             <chr>          <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
 1 Akron, OH         A              24702      66.8     23.3      2.59      1.86
 2 Akron, OH         B              41531      61.2     24.3      3.26      4.96
 3 Akron, OH         C              73105      64.9     20.3      2.79      5.58
 4 Akron, OH         D               6179      40.8     45.7      3.75      3   
 5 Albany-Schenecta… A              16989      72.9      7.8      5.65      8.57
 6 Albany-Schenecta… B              26644      58.9     15.7      9.58      5.55
 7 Albany-Schenecta… C              56878      56.0     16.5     10.2       6.26
 8 Albany-Schenecta… D              16806      33.9     39.4     13.5       4.42
 9 Allentown-Bethle… A               1076      66.6      4.38    22.7       1.3 
10 Allentown-Bethle… B              16774      58.2      6.81    27.6       2.54
# ℹ 541 more rows
# ℹ 1 more variable: pct_other <dbl>

Final list and description of the variables:

metro_area (categorical variable): the metropolitan area in which the zones are located
holc_grade (ordinal categorical variable): the HOLC grade (A to D) given to the combined zones
white_pop (numeric variable): the non-Hispanic white population in those zones
pct_white (numeric variable): the percentage of residents in those zones who are non-Hispanic white
pct_black (numeric variable): the percentage of residents in those zones who are non-Hispanic Black
pct_hisp (numeric variable): the percentage of residents in those zones who are Hispanic/Latino
pct_asian (numeric variable): the percentage of residents in those zones who are Asian
pct_other (numeric variable): the percentage of residents in those zones who belong to any other race/ethnicity group
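
Because holc_grade is ordinal, it can help to store it as an ordered factor so that comparisons and sorting follow the A-to-D ranking explicitly. A minimal base R sketch, using a short illustrative vector rather than the full column (the values and object names here are made up for the example):

```r
# A few grades in arbitrary order (illustrative values, not taken from the data).
grades <- c("C", "A", "D", "B", "A")

# Declare the levels explicitly, from best ("A") to worst ("D").
grades_ord <- factor(grades, levels = c("A", "B", "C", "D"), ordered = TRUE)

# Ordered factors support comparisons: which entries are graded worse than "B"?
worse_than_b <- grades_ord > "B"
print(worse_than_b)

# as.integer() returns the rank of each grade (A = 1, ..., D = 4).
print(as.integer(grades_ord))
```

In the data set itself, the same idea could be applied with mutate(holc_grade = factor(holc_grade, levels = c("A", "B", "C", "D"), ordered = TRUE)).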

This data set lets us examine how redlining divided people of different races/ethnicities into different areas.

Summary Statistics

Code

library(knitr)

With a general sense of the data in hand, we created Table 1, a table of summary statistics for four variables: “holc_grade”, “white_pop”, “total_pop”, and “pct_white”. The categorical variable “holc_grade” was used to group the numeric variables “white_pop”, “total_pop”, and “pct_white”. Based on these statistics, areas with a higher HOLC grade tend to have a higher percentage of white residents: the mean of “pct_white” falls from 73.78 for grade A to 39.39 for grade D. White residents also make up the largest part of the total population in areas with higher HOLC grades, especially grades A and B.

Code

redlining_data |>
  group_by(holc_grade) |>
  skimr::skim("white_pop", "total_pop", "pct_white") |>
  rename(`Missing Obs.` = n_missing)

Variable type: numeric

skim_variable	holc_grade	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
white_pop	A	1	11641.85	21984.19	188.00	1679.50	4566.00	14625.25	205702.00	▇▁▁▁▁
white_pop	B	1	30901.47	74403.34	787.00	4855.00	10171.00	24992.25	752223.00	▇▁▁▁▁
white_pop	C	1	49202.88	126446.11	158.00	7395.00	15605.00	39031.00	1164087.00	▇▁▁▁▁
white_pop	D	1	23757.23	84629.09	183.00	2797.00	6204.50	15791.25	914030.00	▇▁▁▁▁
total_pop	A	1	17312.38	34613.85	228.00	2913.75	6048.00	19347.75	313867.00	▇▁▁▁▁
total_pop	B	1	63200.23	183446.18	1848.00	8010.00	19707.50	42165.00	1903251.00	▇▁▁▁▁
total_pop	C	1	136444.43	470333.41	351.00	14620.00	32594.00	92564.00	4558038.00	▇▁▁▁▁
total_pop	D	1	78458.62	304296.54	842.00	9139.00	15899.50	41134.00	3217775.00	▇▁▁▁▁
pct_white	A	1	73.78	15.21	11.29	68.53	77.54	83.04	94.12	▁▁▁▆▇
pct_white	B	1	59.94	17.89	6.63	50.03	62.86	72.12	90.97	▁▂▃▇▅
pct_white	C	1	48.65	19.48	6.99	33.69	48.43	63.34	87.65	▃▇▇▇▅
pct_white	D	1	39.39	20.88	3.77	22.56	39.85	53.08	86.17	▆▆▇▃▃

Table 1: Summary statistics of the data set. The numeric variables white_pop, total_pop, and pct_white are grouped by the categorical variable holc_grade.
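
Table 1 also allows a rough cross-check. Within each grade, white_pop and total_pop are averaged over the same rows, so the ratio of their means equals the ratio of their sums, that is, the white share of everyone living in zones of that grade. The sketch below computes this pooled share from the (rounded) means in Table 1; the object names are ours. The pooled figure differs from the mean of pct_white because it weights large metropolitan areas more heavily.

```r
# Group means copied from Table 1 (grades A to D).
mean_white_pop <- c(A = 11641.85, B = 30901.47, C = 49202.88, D = 23757.23)
mean_total_pop <- c(A = 17312.38, B = 63200.23, C = 136444.43, D = 78458.62)
mean_pct_white <- c(A = 73.78, B = 59.94, C = 48.65, D = 39.39)

# Pooled white share per grade, in percent (ratio of means = ratio of sums here).
pooled_pct_white <- 100 * mean_white_pop / mean_total_pop

# Side-by-side comparison of the two summaries.
comparison <- data.frame(
  holc_grade = names(mean_pct_white),
  mean_pct_white = mean_pct_white,
  pooled_pct_white = round(pooled_pct_white, 2)
)
print(comparison)
```

Both summaries decline steadily from grade A to grade D.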

Data Visualizations

Our goal is to understand the correlation between the grade assigned by the Home Owners’ Loan Corporation (HOLC grade) and population, particularly the white population, which is the most populous and historically dominant group in the United States. Because of the long history of racial discrimination in the United States, we want to explore whether the idea of white privilege has an effect on HOLC grades. We will explore this correlation through four graphs using variables such as white population, percentage of white population, and their corresponding HOLC grades.

Figure 1 is a bar graph that takes in one categorical explanatory variable (HOLC grade) and outputs a numeric variable (White population).

Code

population_by_holc_grade <- redlining_data |>
  group_by(holc_grade) |>
  summarize(
    White = sum(white_pop),
    Black = sum(black_pop),
    Hispanic = sum(hisp_pop),
    Asian = sum(asian_pop),
    Other = sum(other_pop)
  )

Code

ggplot(data = population_by_holc_grade, aes(x = holc_grade, y = White)) +
  geom_col() +
  labs(
    x = "HOLC grade",
    y = "White population",
    title = "Total white population in areas with different HOLC grades"
  ) +
  scale_y_continuous(labels = scales::label_comma())

Figure 1: White population in areas of each HOLC grade.

Figure 2 builds on Figure 1 by adding a second categorical explanatory variable (race) alongside HOLC grade. Instead of showing only the white population, this bar graph shows the population of every racial/ethnic group in areas with different HOLC grades. We can see from Figure 2 that white residents make up the largest group in areas with higher HOLC grades (A and B).

Code

population_by_holc_grade_clean <- population_by_holc_grade |>
  pivot_longer(
    cols = c(White:Other),
    names_to = "race",
    values_to = "count"
  )

Code

ggplot(data = population_by_holc_grade_clean, aes(x = holc_grade, y = count, fill = race)) +
  geom_col(position = "dodge") +
  labs(
    x = "Grade assigned by the Home Owners' Loan Corporation (HOLC grade)",
    y = "Total population",
    fill = "Race",
    title = "Total population by race in areas with different HOLC grades"
  ) +
  scale_y_continuous(labels = scales::label_comma())

Figure 2: Population of White, Black, Hispanic, Asian, and other racial/ethnic groups in areas of each HOLC grade.

However, we cannot draw conclusions based solely on the size of the white population, which is the largest population in the U.S. We also need to consider the percentage of the white population in each HOLC zone. Figure 3 is a histogram of one numeric variable (percentage of white population): it counts the number of HOLC zones that fall into each range of white population percentage.

Code

ggplot(data = redlining_data, aes(x = pct_white)) +
  geom_histogram(binwidth = 12.5, boundary = 50, color = "white") +
  labs(
    y = "Number of HOLC zones",
    x = "Percentage of white population"
  )

Figure 3: Number of HOLC zones by percentage of white population.
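
To make the binning concrete: with binwidth = 12.5 and boundary = 50, the bin edges fall at 0, 12.5, 25, and so on up to 100, giving eight bins across the possible range. The base R sketch below reproduces that binning with cut() for the eight pct_white values shown in the tibble preview earlier (as printed there, so rounded for display). It assumes right-closed intervals, which we believe is the geom_histogram() default.

```r
# pct_white for Akron and Albany, grades A to D, as printed in the tibble preview.
pct_white <- c(66.8, 61.2, 64.9, 40.8, 72.9, 58.9, 56.0, 33.9)

# Bin edges implied by binwidth = 12.5 and boundary = 50.
breaks <- seq(0, 100, by = 12.5)

# Assign each value to a right-closed bin and count the values per bin.
bins <- cut(pct_white, breaks = breaks, right = TRUE)
bin_counts <- table(bins)
print(bin_counts)
```

Each count corresponds to the height of one bar in a histogram drawn from these eight values.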

Finally, we extend Figure 3 by adding the categorical explanatory variable (HOLC grade). Figure 4 counts the number of HOLC zones in each range of white population percentage, broken down by HOLC grade (A-D).

Code

ggplot(data = redlining_data, aes(x = pct_white, fill = holc_grade)) +
  geom_histogram(binwidth = 12.5, boundary = 50, color = "white") +
  labs(
    y = "Number of HOLC zones",
    x = "Percentage of white population",
    fill = "HOLC grade"
  )

Figure 4: Number of HOLC zones by percentage of white population, broken down by HOLC grade.

We can see from Figure 4 that HOLC zones with a higher percentage of white residents tend to have higher HOLC grades. In other words, there is a correlation between the percentage of white population and HOLC grade.
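
One way to put a number on this relationship is a rank correlation, since HOLC grade is ordinal. The sketch below is an illustration rather than part of the analysis above: it computes Spearman's rho with base R's cor() on the eight rows shown in the tibble preview (values as printed there), coding the grades A to D as 1 to 4. With this coding, a negative rho means that the white percentage falls as the grade gets worse.

```r
# Akron and Albany rows from the tibble preview (pct_white as printed).
preview <- data.frame(
  holc_grade = rep(c("A", "B", "C", "D"), times = 2),
  pct_white = c(66.8, 61.2, 64.9, 40.8, 72.9, 58.9, 56.0, 33.9)
)

# Code the ordinal grade as a rank: A = 1 (best) to D = 4 (worst).
grade_rank <- match(preview$holc_grade, c("A", "B", "C", "D"))

# Spearman's rank correlation between grade rank and white percentage.
rho <- cor(grade_rank, preview$pct_white, method = "spearman")
print(rho)
```

Eight rows are far too few to support a conclusion; the same call could be run on all 551 rows of redlining_data.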

Ethics

This data is about redlining. Redlining is the practice of refusing a loan or insurance to someone because they live in an area deemed to be a poor financial risk. It is a form of racial discrimination and a big problem in America. To fix problems like this, it is important to have data that shows the numbers behind the problem. When looking at data about racial discrimination, it is important not to skew the data in a way that makes it look as though no racial discrimination is at play (Lloyd 2016). As a group, we thought it was important to look into redlining because of the long history of racial discrimination in the United States, and we wanted to explore whether the idea of white privilege has an effect on HOLC grades.

Tie the data to the Final Project’s goal

Throughout the project, this data has helped us understand how redlining works. In particular, we examined the racial makeup of each zone and how it relates to several variables in the data set. One critical factor we looked into was the grade given to each area. This grade was an essential part of our project: it informed many of our visualizations and gives critical insight into how redlining works.

Our original goal was to understand the correlation between the grade assigned by the Home Owners’ Loan Corporation (HOLC grade) and population, particularly the white population, which is the most populous and historically dominant group in the United States. We sought to explore this correlation through four graphs using variables such as white population, percentage of white population, and their corresponding HOLC grades. This does meet the final goal. We think this fits best with Lab #10, when we talked about data science ethics, as well as Lab #11, which built upon Lab #10, when we talked about codes of data science ethics. It is important that a data analysis is correct, but if the analysis is discriminatory, it should be disregarded. Ethics is extremely important in data science. It is important not only to weigh the ethical topics we discussed in Labs #10 and #11, but also to shed light on data that illuminates issues of discrimination, which we found the redlining data did.

Potential Questions

Does this data offer a fair and just statistical picture of redlining?

Are there any gaps in the data that we think are important to make note of?

Are there biases that we are going in with that could change the way that we interact with the data?

Where might further research take us? Are there certain racial groups that we could focus in on instead of looking at a more general data picture?

References

Lloyd, James M. 2016. “Fighting Redlining and Gentrification in Washington, DC: The Adams-Morgan Organization and Tenant Right to Purchase.” Journal of Urban History 42 (6): 1091–1109. https://doi.org/10.1177/0096144214566975.

Footnotes

¹ https://projects.fivethirtyeight.com/redlining/
² You can read more about the redlining zone classifications here.
³ More information on location quotients.