Automating NYC

and (en)coding inequality?


Maybe you've heard that New York City has been using algorithms to improve our lives.

Like preventing fires by predicting and inspecting high-risk buildings.

Or reworking high school admissions to more effectively match students.

Or maybe you've heard that they're making people's lives worse.

Like targeting Black and Latinx communities with stop and frisk.

Or setting bail and sentencing people using an algorithm whose results were biased against Black people.



These are all examples of automated decision systems (ADSs) — processes that rely on computerized components to make or influence a decision.

We want to empower New Yorkers to advocate for ADSs that work to undo unjust systems instead of encoding inequality.

1


Why Do ADSs Matter?

Just ask Porfirio Mejia.*

Since 2012, Porfirio's been running 128 P&L Deli Grocery, a bodega in Washington Heights.

Locals know they can count on him for groceries, even when times are tough. Just as important, the bodega serves as a space for building community — a place where neighbors go to hang out, watch baseball, and smoke cigars.

Like many of his neighbors, Porfirio is a Dominican immigrant. He works with an anti-hunger advocacy group and knows the importance of the Supplemental Nutrition Assistance Program (SNAP), also known as food stamps. More than half of his customers bought their groceries with SNAP benefits from the US Department of Agriculture (USDA).

But when their benefits hadn't come in yet, he let patrons take groceries home trusting that they would come back later to settle their bills, like an informal IOU system. This IOU system allowed Porfirio to help members of his community feel secure about their food even if they felt insecure about their finances.

*Original reporting by The Intercept and The New Food Economy

In 2018, a computer program almost put his bodega out of business.

Porfirio got a notice from the USDA that his establishment was disqualified from accepting SNAP on suspicion of food stamp fraud. Earlier that year, the City, coordinating with the USDA, started using a computer program to find cases of trading cash for food stamps.

While the computer program hasn't been made public, it's likely that Porfirio's IOU system triggered the fraud suspicion.

The automated decision system (ADS) looked like this

if single item purchase > $100
and multiple such purchases
then flag for fraud

1. Sales Data

Sales data gets tracked on food stamp cards like electronic benefit transfer (EBT) cards and sent to the USDA.

2. Fraud Detection

A computer program uses an algorithm and sales data to flag potential fraud. The algorithm used by the program may have noticed large one-time purchases made at P&L Deli and mistakenly assumed it was a case of 'cash for food stamps' fraud.

3. Fraud Notice

Then, USDA staff sent a fraud notice to Porfirio.
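The USDA's actual program isn't public, so the rule above is an educated guess. As a purely illustrative sketch, here's what applying a flagging rule like that to transaction data might look like; the field names, the $100 threshold, and the two-purchase cutoff are all assumptions.

```python
# Hypothetical sketch of a rule-based fraud flag like the one described above.
# The real USDA program is not public; field names and thresholds are assumptions.

def flag_for_fraud(transactions, amount_threshold=100, min_count=2):
    """Flag a store if it has multiple single purchases above the threshold."""
    large_purchases = [t for t in transactions if t["amount"] > amount_threshold]
    return len(large_purchases) >= min_count

# Example EBT transactions at one store: an IOU payoff looks just like "fraud" here.
store_transactions = [
    {"amount": 35.50},
    {"amount": 120.00},   # a customer settling last week's IOU
    {"amount": 142.75},   # another settled IOU
    {"amount": 18.20},
]
print(flag_for_fraud(store_transactions))  # True
```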



Porfirio is struggling to keep P&L Deli going.

His sales have dropped by 30%.

The USDA told Porfirio to show them itemized receipts to prove his innocence, but Porfirio's registers were only able to print total sales figures. The letters his customers sent in as proof were deemed insufficient evidence.

Porfirio has called the USDA, but no one has been able to reverse the decision made by the algorithm.

Using a flawed algorithm, this automated decision system almost destroyed Porfirio's livelihood and hurt people who really needed food.

Why should we worry about this?

P&L Deli wasn't alone: the majority of businesses impacted by this new system have been in low-income neighborhoods like Porfirio's.

ADSs seem to keep expanding into more areas of our lives. From policing to school assignments, the decisions made by ADSs can have far-reaching consequences for all of us.

Governments tell us ADSs are being used to increase efficiency and improve service delivery. But, as Porfirio's experience shows, they can have unintended harmful consequences.

But completely human systems also mess up, right?

Definitely. Just look back to the history of housing segregation and redlining.

Even though human-centric systems also make mistakes, ADSs are unique in the risk they pose. At the same time, if ADSs are created thoughtfully, they can be powerful tools to improve our lives and society.

  • ADSs can work at a more rapid pace and at a larger scale than human decision processes. This could accelerate injustices.
  • When they make mistakes, the ADS might not have human checks to override or course-correct them.
  • ADSs can be used to increase policing and criminalization of marginalized communities.
  • Implementing ADSs is often used to justify collecting a lot more data about people, breaching personal privacy.

  • The scale and speed of ADSs could be used to bring benefits to the communities that most need them when government resources and time are limited.
  • ADSs could build in more transparency than human systems so we know exactly how a decision is being made.
  • Since ADSs are using a lot more data, they might be able to see patterns that we can't see on our own, helping us uncover our own biases.

ADSs might be able to help us in powerful ways, but they can also make problems like racism, discrimination, surveillance, and inequality much worse, much faster.

2


What are Algorithms?

They're just instructions.

Like in Porfirio's case, all ADSs use computer programs that are made up of algorithms — detailed sets of automated, computerized instructions. These algorithms work together to complete a task.

Just think about how you buy an avocado. What factors do you consider?

I want one that's affordable.

Hopefully it's organic, but less than $1.50.

I'm going to make some guacamole to eat tonight,
so I want the ripest one I can find.

Hover over the avocados to find the best one.

All of these calculations you're doing in your head are an algorithm.

You’ve taken some variables (the price, how ripe it is, and whether it’s organic), given those variables weights (buying a cheap, ripe avocado is more important than buying an organic one), analyzed a set of data points (the pile of avocados), and reached a decision (which avocado to buy).
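Here's that avocado decision written out as a tiny piece of code. The weights and scoring rule below are invented to mirror the preferences above; they're an illustration, not a formula anyone actually uses.

```python
# A made-up avocado-picking algorithm: variables, weights, data points, decision.
WEIGHTS = {"ripeness": 3, "cheap": 2, "organic": 1}  # ripeness matters most

def score(avocado):
    return (WEIGHTS["ripeness"] * avocado["ripeness"]        # 0 = rock hard, 1 = perfectly ripe
            + WEIGHTS["cheap"] * (avocado["price"] < 1.50)    # is it affordable?
            + WEIGHTS["organic"] * avocado["organic"])        # nice to have

pile = [                                                      # the data points
    {"ripeness": 0.2, "price": 1.00, "organic": True},
    {"ripeness": 0.9, "price": 1.25, "organic": False},
    {"ripeness": 0.6, "price": 2.00, "organic": True},
]
best = max(pile, key=score)                                   # the decision
print(best)
```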

What kind of algorithms do ADSs use?

It depends. They can sometimes be a bit more complicated than the ones we use in our everyday decisions. Some ADSs involve multiple algorithms, hundreds of thousands of data points, and variables that are given different weights.

But really, they all use the same building blocks for the instructions — variables, weights, and data points.

In some cases, people have control over what steps the algorithm takes. People like you decide which variables matter most when buying avocados or predicting whether a store is committing food stamp fraud.

In other cases, data is given to a computer, the computer finds patterns in the data, and then decides which variables are important based on the patterns it finds.


When a computer finds patterns on its own, rather than being told which variables are important, the set of instructions it uses is called a machine learning algorithm. They are often less transparent because we don't always know why the computer places more or less importance on different variables.
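As a quick illustration of the difference, here's a sketch that hands invented data to a standard machine learning library and lets the model decide which variables matter; we never tell it. (Decision trees are on the more explainable end of machine learning; many models are far harder to inspect.)

```python
# Sketch: a machine learning algorithm finds the important variables on its own.
# The data below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row: [price, ripeness, organic (0/1)]; label: 1 = bought it, 0 = left it
X = [[1.00, 0.2, 1], [1.25, 0.9, 0], [2.00, 0.6, 1], [1.40, 0.8, 1], [0.90, 0.1, 0]]
y = [0, 1, 0, 1, 0]

model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(model.feature_importances_)       # the model's own view of which variables matter
print(model.predict([[1.10, 0.7, 0]]))  # prediction for a new avocado
```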

Can I play with an algorithm?

We created this algorithm to help an imaginary fire department predict which buildings are at high risk for fire.

All of the boxes represent real buildings. Hover over them to see building characteristics. Click on different combinations of variables in the bubbles below to add or remove them from the algorithm.

Once the predicted fire risk passes a threshold of 30%, the building will appear with a flame to alert the fire department to inspect it!

*Created using data from the NYC Open Data Portal.
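If you're wondering what a toy version of this fire-risk algorithm could look like in code, here's a sketch. It mirrors the mechanics described above (weighted variables plus a 30% threshold), but the variables and weights are invented, not the ones behind the real widget.

```python
# Toy fire-risk score: weighted building variables plus a 30% alert threshold.
# Weights and variables are invented for illustration only.
WEIGHTS = {"height": 0.04, "building_age": 0.001, "has_business": 0.10, "residential_units": 0.002}
BASE_RISK = 0.05
THRESHOLD = 0.30

def fire_risk(building):
    risk = BASE_RISK + sum(WEIGHTS[var] * building[var] for var in WEIGHTS)
    return min(risk, 1.0)

building = {"height": 6, "building_age": 95, "has_business": 1, "residential_units": 40}
risk = fire_risk(building)
print(f"predicted risk: {risk:.0%}; inspect: {risk > THRESHOLD}")
```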

Hmm! Doesn't look like building age changed much. It has a low weight in the algorithm.

Interesting! Property value has a negative correlation. This means that the more valuable a property is, the less likely it is to catch fire.

Whoa! Height has a big impact. Looks like that variable has a high weight in the algorithm.

Seems like buildings in the Bronx have a higher risk than buildings in other boroughs. This could be because of other variables we didn't include in the dataset. Maybe buildings in the Bronx have a certain architectural style that is riskier. This is called omitted variable bias.

Bigger buildings are more likely to catch on fire. That makes sense!

More people means greater risk. Fun fact! We didn't have an exact count of the number of people living in the buildings. So we used the number of residential units as a proxy variable.

Businesses have a higher risk of fire. There could be a variable we're missing that is correlated to businesses. Maybe a lot of these businesses are restaurants where kitchen fires can happen.

Algorithms are always influenced by humans because we decide what data to use and, often, which variables are important.

3


What Makes ADSs Good or Bad?

A good ADS addresses unjust systems

Every ADS reflects a set of values, whether they’re stated or not. We think that in an unequal and unjust world, the ADSs we build should strive for equity. If they don’t, they risk worsening the unjust systems that exist.

What’s an unjust system?

An unjust system is a set of political, cultural, and economic conditions that perpetuate inequality.

For example, think of access to credit and financial resources.

Poor communities and communities of color often have limited access to credit. This is the legacy of policies like redlining that prevented people who lived in non-White and immigrant neighborhoods from getting mortgage loans. The racial and class prejudices of bankers have led to discrimination against groups that they stereotyped as less reliable. If you look around, you’ll notice that, even today, the kinds of lenders in communities of color are often high-fee check-cashing services rather than banks.

All of these conditions create a system that reinforces poverty and prevents people from building wealth.

How can ADSs address these systems?

We can start with the ADS’s purpose — what it’s intended to accomplish.

At a minimum, an ADS’s purpose should include actively understanding the unjust systems it’s operating within. At best, ADSs will actively work to undo the injustice.

In Porfirio’s case, the USDA’s ADS failed to account for his customers’ lack of access to credit and the need for an informal IOU system. If the USDA had understood the unjust system it was operating within, it could have designed its ADS to distinguish between real fraud and Porfirio’s method for extending credit.

Even better, if the USDA sought to undo injustice, it could design the ADS to target the conditions that lead to poor credit access. The ADS might identify areas to increase SNAP benefits, make them more flexible, or use even bolder solutions.

Isn’t increasing efficiency a good enough purpose?

Nope.

Most government ADSs are intended to increase efficiency. For example, the USDA’s ADS was meant to efficiently catch SNAP (food stamp) fraud. A computer is faster than a human at looking through thousands of pages of financial records.

But efficiency alone is not a good purpose. Efficiency speeds up processes, but if that process is already creating inequality, speeding it up makes things worse.

ADSs are a unique opportunity to acknowledge and address the assumptions and existing decision-making processes that might perpetuate an unjust system.

We have a good purpose, now what?

If the purpose of the ADS is good because it understands, or better yet undoes, unjust systems, we need to make sure that we design the ADS to accomplish this purpose.

There are five design decisions we should think about:
impact, bias, explainability, automation, and flexibility.


Impact

Who is affected by the ADS and how?

Who the ADS affects and how it affects them helps us evaluate the potential harm.

Scale

How many people might be affected by the ADS?

Scope

How deep of an impact will the decision have on affected people? There's deep impact when a decision affects something important. For example, knowing which building might catch on fire is a matter of life and death.

Shallow impact is when a decision affects something inessential to life or when the impact is not very severe.

Vulnerability

How vulnerable are the groups that the ADS impacts? If an ADS impacts a group that has historically faced discrimination and removes resources from that group, the ADS is probably going to worsen inequality.


Bias

What inputs in the ADS create problems?

Bias in the data or algorithm might worsen unjust systems.

Data

Data is never objective. We embed our human bias when we collect, store, and use data. All data is biased; it may contain errors, reflect existing inequalities in the world, or both.

When data is biased, it might tell us to do something that's not purely objective. It might tell us to do something based on the existing biases of the world that it's capturing.

For example, if we use arrest data by neighborhood to distribute the police force, we may send police to the same neighborhoods that they have historically policed, which leads to even more collection of arrest data in those same neighborhoods. This might then be used to justify sending even more police to that neighborhood. This creates a vicious cycle.

These biases can arise because all datasets, to some extent, are incomplete. It is impossible to capture the world absolutely and perfectly. The best datasets try to be as complete as they need to be for the purpose and acknowledge their bias.

Algorithmic Model

The type of algorithm an ADS uses and the way it handles the variables and data points matter. Choosing the best algorithm for the problem depends on the ADS's purpose. There are hundreds of different algorithms that each calculate outputs slightly differently. There will always be trade-offs in deciding what model to use, and we must think carefully about each case and the potential harms.

If humans are determining specific components of the algorithm, this can also create bias. For many algorithms, humans will be determining the outcome of interest and the variables. These decisions are subjective because they are based on our own values and world views. For example, if you don't know that geographic area is highly linked to race and socioeconomic status, you might accidentally bias the algorithm by including this variable.


Explainability

How easy is it for humans to understand how the ADS works?

When an ADS is explainable, it's easier to pinpoint where problems arise.

Transparency

Before we can even begin to understand how an ADS works, we need to be able to find information about it. To be completely transparent, an ADS's algorithms and data (as long as privacy concerns are met) should be made publicly available with documentation explaining them.

Ease of Understanding

Some algorithms are very easy to understand, like instructions for choosing an avocado. We know exactly which variables are going into the model and how they're weighted.

The avocado algorithm is a decision tree. Super explainable.
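As a rough illustration, a decision tree for picking avocados is just a short chain of yes-or-no questions, so you can read off exactly why it reached its answer. (This sketch is ours, simplified from the example earlier.)

```python
# A decision tree is a chain of yes/no questions, so every decision can be traced.
def pick_avocado(price, ripeness, organic):
    if price >= 1.50:
        return "skip: too expensive"
    if ripeness < 0.7:
        return "skip: not ripe enough for tonight's guacamole"
    if organic:
        return "buy: ripe, affordable, and organic"
    return "buy: ripe and affordable"

print(pick_avocado(price=1.25, ripeness=0.9, organic=False))  # "buy: ripe and affordable"
```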

Others can feel like a black box. We give the data to the computer and it gives us a decision, but we don't really know how it reached that decision.

A neural network algorithm is not that explainable.


Automation

How much of the decision is made by the algorithm?

If the decision is made mostly automatically, it could worsen unjust systems without humans even knowing.

Low Automation

The ADS doesn't make a decision. It simply analyzes the data, which is then used to inform a human decision.

Imagine a dashboard that tells you what percentage of reported potholes have been filled this month without suggesting a specific next action.

Medium Automation

The algorithm provides an advisory decision, but a human still looks over the decision and decides whether to implement it.

Imagine a scoring system that prioritizes buildings to inspect for fire risk.

High Automation

An algorithm makes a decision and the people using it simply implement that decision.

Imagine a traffic light system that sends you a ticket when it detects that you ran a red light.


Flexibility

Can the ADS be changed easily if people have feedback?

If there is a problem with the ADS, we want to be able to fix it.

Feedback Mechanisms

The best feedback is from people directly affected by the ADS or people who use the ADS on a regular basis.

Feedback could be in the form of an appeals mechanism, a complaint system, or something else.

Ability and Access to Update

While feedback is important, what a government can do with the feedback is limited based on whether or not they have the access and skills to update the algorithm.

This depends in part on who owns the ADS. Is it the government or a private company? Does the government have the in-house talent to update the algorithm?

Let's look at Porfirio's story through this framework.

Click or tap on the cards to open them

Impact

Who does the ADS affect and how?

Scale

Large scale. The USDA algorithm affects any business that accepts food stamps. In New York City, 1.6 million low-income people rely on SNAP.

Scope

Deep. The loss of revenue for Porfirio was significant. Moreover, the community lost a reliable food source.

Vulnerability

In general, the USDA algorithm impacts vulnerable communities because it affects communities that rely on alternative financial systems, which are usually lower-income communities and communities of color.

Bias

What inputs in the ADS create problems?

Data

Because Porfirio's bodega is in a low-income neighborhood, more of his customers are likely to pay with food stamps. While the data is technically accurate, it is geographically skewed, so bodegas like Porfirio's receive more scrutiny while bodegas in rich neighborhoods receive less.

Algorithmic Model

The algorithmic model the USDA used to identify 'cash for food stamps' fraud treated large one-time purchases as a variable indicating fraud. The model was designed in such a way that it couldn't distinguish between true fraud and Porfirio's IOU system. It simply flagged certain transactions as fraudulent, so there was bias in the model against those kinds of transactions.

Explainability

How easy is it for humans to understand how the ADS works?

The USDA's ADS is not transparent. We don't know exactly how this ADS or its algorithms work. Who looks at the results of the algorithm that is flagging cases of supposed fraud? What are the checks and balances around the decision-making process?

If we knew more about the process and components of the ADS, we would be able to see if there were design decisions to mitigate mistakes like miscategorizing Porfirio's bodega.

Automation

How much of the decision is made by the algorithm?

High

As far as we know, the USDA algorithm automatically flags transactions it categorizes as "fraud" and sends out a letter to a business owner notifying them that their ability to accept SNAP will be suspended unless they prove otherwise.

It is unclear whether a USDA employee reviews the flagged transactions before such a letter is sent.

Flexibility

Can the ADS be changed easily if people have feedback?

Feedback System

The USDA's ADS is not very flexible because there is not a good feedback system from people who are directly affected by the ADS.

Ability and Access to Update

Although business owners have the ability to submit feedback if they believe they've been wrongly flagged, this feedback has to be in a very specific form. Moreover, there is no mechanism for businesses or others to tell the USDA how it can improve its system.

Each of these design decisions interacts with the others.

For example, the more explainable an ADS is (explainability), the easier it is to give feedback and update it (flexibility). This framework isn't cut and dried!

There are also elements we hope interact but don’t always. A government might state a good purpose, but might not make design decisions that reach this goal. The ADS must always be intentionally designed to achieve a good purpose.

When we ask good questions about the purpose and the design decisions and how they interact, we can create ADSs that acknowledge or undo unjust systems.

We need to think about bigger systems when we develop the purpose and make design decisions.

4


Examples

Learn about some real ADSs being used in NYC.

New York Public Library uses an algorithm to stock books.

A library needs books, and librarians need to pick them. But how should they do that? Libraries have limited resources and they can't really test out new books by buying and returning them. How can librarians tell which books will be popular with their patrons? What else should they consider when they're picking titles for their library branch?

Libraries use ESP to guess what we want to read.

No, not extra-sensory perception.

ESP is an algorithm that uses library and non-library data to recommend books it thinks will be popular. Here's how the system works:

1. Data

ESP is fed circulation data (like which books have been checked out) and sales data from a book warehouse called Baker & Taylor. It uses book reviews, but only in aggregate (total number of reviews, not whether they were good or bad).

2. Predict

The ESP algorithm looks for historical patterns in the data and makes suggestions for what books to stock at which library branches. For example, it might predict that the young adult book Twilight will be popular at a specific branch because lots of teenagers at that branch checked out Dracula and Harry Potter.

3. Pick

Then, it's up to the librarian to choose whether or not to follow the algorithm's advice.

So, which factors matter when ESP rates books?

We don't actually know. ESP is a neural network algorithm, which is hard to understand. It's like a black box. We know which datasets ESP uses, but the patterns it's using to make recommendations are fuzzy.

To be fair, we don't always know exactly what librarians are thinking when they decide to stock particular books either. But if they're connected with the community they serve, we might trust them and draw from their experience and knowledge.

But we get better book picks, right?

Not so fast.

As a neural network, ESP works best when there is a lot of data. For established book genres and topics like knitting, it tends to pick the winners well. But it has a harder time guessing which debut fiction (an author’s first book) will be a hit with readers.

Even if we have a bunch of data from Amazon or other book sellers, the data might not be representative of many NYC communities, particularly minorities and low-income individuals. These communities are some of the largest users of library services, and they might also purchase fewer books. This means that the recommendations ESP produces might not reflect what patrons want and may not be easy to change unless we get a whole lot more data.

Also, it depends on what you mean by "better book." What if your goal is to bring in people who aren't currently using the library? Existing library patron data wouldn't allow the algorithm to figure out preferences for people who don't already check out books.

Don't librarians still have final say?

Yup! When the librarian knows a lot about their local patrons and is committed to undoing unjust systems, preserving librarians' power to make decisions helps address these possible problems with ESP. Still, librarians' choices may be swayed by what seems like a concrete, numerical score based on mathematical equations.

ESP augments a librarian's choices by highlighting gaps in a librarian's knowledge and experience. On the flip side, it could sway librarians away from trusting their own instincts on what their branch needs.

Impact

Who does the ADS affect and how?

Scale

Large scale. A lot of people use libraries.

Scope

Superficial. Books are great, but probably not as serious as other social services. It's not about life or death.

Vulnerability

This depends on how you look at it. People who need to use libraries might be more vulnerable because they don't have other ways to access knowledge.

Authors whose books don't have high sales, or who don't have a lot of data about their work, are also more likely to be vulnerable (new authors, underrepresented authors, underrepresented topics).

Bias

What inputs in the ADS create problems?

Data

ESP uses a lot of data, incorporating both library and private book sale data, but it is currently unable to take into account other forms of data, like whether book reviews were good or bad.

The data doesn't account for human bias or equity issues. Sales data may be skewed toward higher-income individuals who can buy more books. Libraries should think about the books that their unique populations need and value.

Algorithmic Model

ESP uses a neural network. While a human doesn't make a decision about the variables that matter, using such an opaque model might mean that it's harder to know what kinds of biases in the data are driving outcomes.

Explainability

How easy is it for humans to understand how the ADS works?

This ADS could be more transparent. The company that makes ESP has documentation online explaining it broadly, but the specifics of how it works and how it is used by NYPL are not publicly available.

ESP is a neural network, which is not very explainable. We can't tell which variables the algorithm is using or how important they end up being to the eventual results.

If we knew more about the process and components of the ADS, we might be able to mitigate mistakes like incorrect predictions about which books will be popular.

Automation

How much of the decision is made by the algorithm?

Medium

ESP is advisory. It gives a score, but the librarians can still use their own knowledge or priorities to decide how much the score guides their decisions on whether to buy a book, how many copies to buy, or where to place the book.

Flexibility

Can the ADS be changed easily if people have feedback?

Feedback System

Librarians have been able to give feedback to the company that runs ESP.

Ability and Access to Update

ESP is owned by a private company that has to make any updates.

Machine learning risk, with human safeguards

At its best, ESP makes useful suggestions that librarians can incorporate into their book stocking decisions. Because it depends on sales data and current circulation data, there is a risk that the results don't consider populations that are excluded from these spaces. Librarians still get final say, which is a good way to address these concerns.

Read more about ESP from the company that designed it.

NYCHA manages public housing placements with automation.

The New York City Housing Authority (NYCHA) has over 175,000 public housing units across the City, but fewer than 1% are vacated each year, and the waitlist is currently over 250,000 families long. When a unit becomes vacant, how does NYCHA sort through the long waitlist of applicants to fill their vacant units as quickly as possible?

TSAP, ASAP.

The Tenant Selection and Assignment Plan (TSAP) is an ADS that does what it says in the name: select and assign tenants to apartments that open up.

It’s built to do this as soon as possible, to make sure units don’t go unfilled for too long in a city where many people need shelter. In fact, the process begins even before units are vacated, by predicting which units will go vacant in the next six months. This helps NYCHA make sure it has staff resources and a ready waitlist of eligible and certified New Yorkers to fill apartments when they open up.

Here's a very simplified description of the process.

1. Apply & Assign

New people apply to get put onto the NYCHA waitlist. People already in public housing units also apply to transfer to other units when there are openings.

These two groups of applicants are assigned “priority codes” based on their income, needs, and other priorities for the City (like having experienced domestic abuse, hate crimes, or involuntary displacement; or being a homeless veteran; or aging out of foster care) and they are put on a general waiting list.

2. Predict

Every two weeks, TSAP uses a computer program to predict the types of units that are most likely to be vacated in the next six months.

The algorithm first looks at how many eligible and certified transfer applicants can fill those units. Then, it figures out how many new applicants from the general waiting list might be eligible and should get interviewed to be certified.

3. Verify

NYCHA staff reach out to the list of new applicants to interview them. These interviews verify that applicants actually meet the requirements for the apartment that is predicted to become vacant (making sure the applicant's family size will fit in the number of bedrooms; if the unit is specifically for elderly people, making sure the applicant is old enough; etc.). If they pass, they’ll be “certified” and placed in the pool to fill units.

4. Match

Once a vacant unit actually opens up, the computer program looks for people who are certified and eligible. It prioritizes transfers with high-needs, then transfers with less urgent needs and new applicants.

Sometimes this matching process can get tricky because there might be only one unit, but multiple certified and eligible applicants with different high needs. TSAP's matching algorithm has to figure out how to sort competing priority codes.

What kind of algorithm does TSAP use?

Actually, each step of TSAP uses a different algorithm with different levels of automation.

For example, the step for assigning priority codes involves a pretty simple decision tree that follows a series of yes or no questions about the person’s needs.

The step for predicting vacancies is more like a linear regression. The algorithm looks at past data and past vacancies and determines how important each variable is in ‘causing’ a vacancy. It then uses this formula to predict future vacancies on new data.
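Here's a minimal sketch of what that kind of regression step could look like. The variables, the historical numbers, and the library choice are all our assumptions for illustration; NYCHA's actual model isn't published in this form.

```python
# Sketch: fit a linear regression on past data, then predict future vacancies.
# Variables and numbers are invented for illustration only.
from sklearn.linear_model import LinearRegression

# Each row describes a unit type: [bedrooms, average tenant age, years since renovation]
X_past = [[1, 68, 20], [2, 45, 5], [3, 38, 12], [1, 72, 25], [2, 50, 8]]
y_past = [14, 6, 9, 16, 7]           # vacancies seen in past six-month windows

model = LinearRegression().fit(X_past, y_past)
print(model.coef_)                   # how important each variable is to the prediction
print(model.predict([[2, 60, 15]]))  # predicted vacancies for a unit type next period
```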

What if there are still more people waitlisted than there are apartments?

Unfortunately, TSAP doesn’t solve the underlying problem that there are more people on the waitlist than there are available apartments. TSAP can only help sort people who are on the waitlist into vacant units; it can't make more vacant units.

But this data, or the length of the waitlist itself, could be used to advocate for more public housing units.

Impact

Who does the ADS affect and how?

Scale

Large scale. There are hundreds of thousands of people on the waiting list who want to be assigned a new apartment.

Scope

Deep. TSAP matches people who need affordable public housing to those very units.

Vulnerability

Vulnerable population. New and existing applicants for public housing have limited access to affordable housing.

Bias

What inputs in the ADS create problems?

Data

The data TSAP uses is based on both historical data and new data input by new applicants, transfer requests, or case workers. The algorithm makes a prediction about how many units will be vacated based on how many units of a certain type were vacated in the past. However, there might be other factors at play in the present day that the algorithm can’t account for. For example, maybe that building was renovated and fewer people want to leave now. This is called omitted variable bias.

Algorithmic Model

There are multiple algorithms in this ADS. It seems like NYCHA is trying to use the best algorithm for each decision type.

Explainability

How easy is it for humans to understand how the ADS works?

TSAP is pretty transparent. NYCHA has documentation online on how TSAP works, but the description is 57 pages long!

The actual algorithms (decision trees and linear regressions) used by TSAP are pretty explainable.

The ADS as a whole can be confusing though.

Certain parts are easier to understand than others. For example, the priority codes and the qualifications for each are clearly laid out, but how vacant units are predicted and how that affects eligibility interviews is more opaque.

Automation

How much of the decision is made by the algorithm?

High

TSAP is mostly automated. For example, vacancies and suggestions for eligibility interviews are predicted using a formula. The process for matching people to vacant apartments is also automatic.

Flexibility

Can the ADS be changed easily if people have feedback?

Feedback System

TSAP can be updated through the annual plan process, which includes public hearings and a public comment period. The plan gets submitted to the US Department of Housing and Urban Development for approval.

Ability and Access to Update

It’s very updateable. There’s a NYCHA team that works solely on TSAP implementation.

High priority needs get housing, but still not enough units.

TSAP does its job: it matches people to vacant units. However, it's doing its job with a limited supply of housing. Making the matching process easier will definitely help some families, but it still leaves a lot of families on the waitlist. Is there a way to have this ADS help get at the underlying unjust system of not having enough affordable housing?

Read more about TSAP in the official documentation from 2016

DOE wants to match students with the best high school for them.

Every year, more than 80,000 New York City eighth graders transition to high schools. There are nearly 700 high school programs in more than 400 schools across the City, all with different locations, eligibility requirements, and numbers of seats. How does the DOE make sure that students go to the best high school for them?

Matchmaker, matchmaker, make me a match.

The Department of Education (DOE) uses an algorithm that goes by many names: centralized clearinghouse, two-sided deferred acceptance matching, and applicant-proposing acceptance algorithm. All of these names refer to the same algorithm that just matches students to high schools.

Here's how it works:

1. Students Submit

A student will "apply" to schools online, through a school counselor, or at a Family Welcome Center. In their application, they rank up to 12 schools in order of preference.

2. Schools Submit

At the same time each year, schools submit their information, like how many seats they have and what criteria they use to rank or prioritize students who want to apply to their school.

For example, some schools called "screened schools" might say that they give priority to students with the highest 8th grade GPA first. Some schools called "zoned schools" might say that they give priority to students who live in their neighborhoods.

3. Match

Students' first choices are tentatively matched to schools that want them.

Unmatched students are paired with their next choice.

4. Repeat

The process stops when there are no available seats remaining in schools on students’ lists of preferences.

Students who are still unmatched get to resubmit a list of ranked schools from a list of schools that still have seats for a second round of the matching process.
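For the curious, here's a compact sketch of applicant-proposing deferred acceptance, the general mechanism behind the match. It's heavily simplified (made-up names, one seat per school, no priority groups or second round), so it shows how the loop works rather than the DOE's actual implementation.

```python
# Simplified applicant-proposing deferred acceptance (one seat per school).
# Names and preferences are invented; the real system handles many seats,
# priority groups, and up to 12 ranked choices per student.

def deferred_acceptance(student_prefs, school_prefs):
    next_choice = {s: 0 for s in student_prefs}   # which school each student tries next
    held = {}                                     # school -> student tentatively accepted
    free = list(student_prefs)                    # students still looking for a seat
    while free:
        student = free.pop(0)
        prefs = student_prefs[student]
        if next_choice[student] >= len(prefs):
            continue                              # out of choices: stays unmatched
        school = prefs[next_choice[student]]
        next_choice[student] += 1
        ranking = school_prefs[school]
        current = held.get(school)
        if current is None:
            held[school] = student                # open seat: tentatively accept
        elif ranking.index(student) < ranking.index(current):
            held[school] = student                # school prefers the new student
            free.append(current)                  # bumped student tries their next choice
        else:
            free.append(student)                  # rejected: try the next choice
    return {student: school for school, student in held.items()}

students = {"Ana": ["Arts HS", "Science HS"],
            "Bo": ["Science HS", "Arts HS"],
            "Cam": ["Science HS", "Arts HS"]}
schools = {"Arts HS": ["Cam", "Ana", "Bo"],
           "Science HS": ["Ana", "Bo", "Cam"]}
print(deferred_acceptance(students, schools))  # Bo ends up unmatched, like round two
```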

Didn’t get matched?

Unfortunately the algorithm doesn’t match everyone. About 3,000 students find themselves without a school after the first round and have to go through a second round of matching.

It used to be worse. Before the DOE started using this matching algorithm, about 30,000 students weren’t getting matched with any school. In 2005, the DOE enlisted economists to help redesign it. They dropped the number of matchless students in part by having them rank twelve schools instead of just five.

So this is better, right?

Sort of.

The new algorithm helps to more efficiently “assign as many students as possible into programs that they rank highly, given constraints due to limited seats and the high schools' own admission priorities,” as the New York Independent Budget Office reported in 2016.

But the matching algorithm doesn’t challenge the underlying unjust system that maintains school segregation and the unequal distribution of educational resources. It doesn’t challenge the specific admissions priorities and preferences of schools. These priorities and preferences can perpetuate school segregation.

Also, the matching process doesn’t apply to more competitive or highly resourced schools, like charter schools or the eight specialized high schools where students qualify by taking the Specialized High School Admissions Test (SHSAT). And it definitely doesn’t apply to the one audition school in New York City (LaGuardia High School). There might be good reasons for excluding some of these schools from the system, but redesigning the ADS could have been an opportunity to think critically about these reasons.

Imagine if the ADS was designed to make schools more integrated!

Some advocates are doing just that.

They're thinking about ways that we could design the admissions priorities that schools use to rank students. They believe that if we design the priorities with the express purpose of integrating schools, we will create an algorithm that is better at undoing an unjust system.

How would you change school priorities with this purpose in mind?

Impact

Who does the ADS affect and how?

Scale

Medium. Every year more than 80,000 students are affected by this algorithm.

Scope

Deep. DOE's matching algorithm decides where students go to high school, which can have an impact on their lives well beyond high school.

Vulnerability

This ADS interacts with all students, including those from more vulnerable backgrounds. However, the ADS is more difficult to navigate for students from families with fewer resources to spend on understanding it. School priorities also tend to disadvantage students with fewer resources.

Bias

What inputs in the ADS create problems?

Data

This algorithm uses data from students and schools ranking their preferences. These datasets can be very biased. Student rankings are based on their own perceptions of how competitive they are in the process, their parents' knowledge about which schools are best, and their desire to stay with their friends. School rankings are based on their admissions priorities, which simply track what the schools want to focus on. If they want to focus on good grades, they can do that. If they want to focus on students in the neighborhood, they can do that.

Algorithmic Model

The algorithm maximizes top mutual matches for students and schools. This algorithm is used in other matching situations, like matching medical students to residencies.

Explainability

How easy is it for humans to understand how the ADS works?

The matching algorithm is explainable. It follows a clear series of steps and is based on the preferences of students and schools.

The New York Times has covered the matching algorithm and how it works, making this ADS pretty transparent.

Also, the DOE has made the process transparent to families through the High School Directory, a document that guides students through the high school application and matching process.

Automation

How much of the decision is made by the algorithm?

High

Students are automatically matched to their schools through the matching algorithm. The results are offer letters, not recommendations that are checked by another party.

Flexibility

Can the ADS be changed easily if people have feedback?

Feedback System

Students can appeal their matches to staff in DOE, but only for certain reasons like travel, housing, or medical hardships.

Ability and Access to Update

The algorithm is maintained by the DOE Office of Student Enrollment. Hypothetically, staff from that office could update the algorithm.

More students are matched, but unequal education continues.

DOE's matching algorithm sorts through a lot of preferences to mutually match schools with students. However, it doesn’t challenge school admissions priorities that disadvantage certain groups or keep schools segregated.

You can read more about DOE's algorithm in the High School Directory.

5


Get Active

We've seen how automated decision systems can increase discrimination and inequality.

But this isn't inevitable.

We can decide when the City should or shouldn't automate decisions. We can ask the tough questions to make sure that government systems don't leave out those of us who have historically been, or are currently being, ignored. We can bring together our diverse experiences to make our voices stronger.

Build Power

Connect with other members of this advocacy community.

Advocate

Contact your local elected officials to keep pushing the work of the ADS Task Force.

Learn More

Dig deeper into the articles, papers, and books we used to make the website.

6


About Us

We're four data and design nerds determined to make government services the best they can be. This website is our master's thesis for the Harvard Kennedy School.


Aki

Project Management
+
Data Wrangler

       

Aki Younge is a futurist thinker and racial justice advocate who believes in empowering communities to use data and technology to dismantle systems of oppression.


Deepra

Content Development
+
Research

       

Deepra Yusuf is a native New Yorker, former government analyst, and design thinker committed to finding ways governments and technology can form ethical and symbiotic relationships.


Elyse

Design
+
Engineering

           

Elyse Voegeli is a data analyst and designer committed to making tech accountable, accessible, and delightful.


Jon

Design
+
Content Development

   

Jon Truong works at the intersection of digital tools and social change in urban settings.

Special thanks to our advisor, Julie Wilson, the Jain Family Institute for their support, Emily Chu for her technical mentorship, and Momin Malik for his expertise. Most of our icons are custom, but thanks to Font Awesome and the Feather Icons library for the others. All other visuals are our own. And finally, shout out to all the amazing people who user tested for us! You know who you are.

Check us out on Github!     Feedback or questions? Send us an email!