Computers, Privacy & the Constitution

Privacy and Discrimination: An Intertwined Violation

-- JosephHan - 01 Mar 2024

The mass collection of personal data and the surveillance of people by tech companies are not new phenomena. However, as technology continues to improve, algorithms that analyze human data have become more common. This saves tons of time and money for companies while still achieving desirable results. Examples include predictive policing, job application screening, and loan and financing approvals.

Wouldn't it be a good idea to provide at least some evidence that the results actually achieved are "desirable"? Does reducing employment count as a desirable result? What are "tons of time"? Could we describe this as "increasing the share capital takes from production at the expense of labor"?

However, these “objective” models of human beings may be more dangerous than human bias, because their bias is more likely to go undetected and is less actively corrected.

Wouldn't it be expected for the reader to object that neither of these claims is self-evidently true? That it is both easier to monitor automated processes and to modify software to respond to biased results more easily than it is to make the same measurements and corrections to social processes?

Human relationships are complex, and data can reveal far more than it first appears to. The use of personal data to assess human subjects algorithmically results in unlawful discrimination, regardless of any efforts made to counteract those effects.

Data is All Connected

Human data is interconnected and always reveals more than what appears on the surface. Overt discrimination is unlikely when human data is collected; a company is unwilling to make race an explicit component of its model. Yet many “impersonal” data points can make race a large component regardless of the intent of the person designing the model. Take ZIP codes as an example; using ZIP codes to assess human subjects may not seem like an issue on its face. But once one realizes that ZIP codes are highly reflective of race and income, it is clear that their use will lead to harmful effects on marginalized communities. If a high school student is denied a loan because his ZIP code is “high risk,” that denial can foreclose an education and diminish socioeconomic mobility. Names are another example. It is impossible to submit a job or housing application without a name, yet if the model assessing the quality of candidates uses their names, it gains considerable insight into gender and race. Even data that appears facially neutral is tied to other human data in ways that produce discrimination.
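The proxy effect described above can be made concrete with a small sketch. The data, feature names, and numbers below are entirely hypothetical; the point is only that a model trained without any protected attribute can still produce unequal outcomes when a "neutral" feature, here a ZIP-code cluster flag, is strongly correlated with group membership.

```python
# Minimal synthetic sketch (not real data): a model trained WITHOUT a protected
# attribute can still reproduce disparities when a "neutral" feature such as a
# ZIP-code cluster is strongly correlated with that attribute.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: group membership (the protected attribute) drives which
# ZIP-code cluster an applicant lives in, and the historical approvals were biased.
group = rng.integers(0, 2, n)                                # 0 or 1; never given to the model
zip_flag = np.where(rng.random(n) < 0.8, group, 1 - group)   # ZIP cluster: a strong proxy for group
income = rng.normal(50 + 10 * (group == 0), 15, n)           # income also correlated with group
approved = (income + 20 * (group == 0) + rng.normal(0, 10, n)) > 55  # biased historical labels

# Train only on the "neutral" features: ZIP cluster and income.
X = np.column_stack([zip_flag, income])
model = LogisticRegression(max_iter=1000).fit(X, approved)
pred = model.predict(X)

for g in (0, 1):
    print(f"group {g}: predicted approval rate = {pred[group == g].mean():.2f}")
# The gap between the two rates persists even though `group` was never an
# input feature; the ZIP proxy carries the information in anyway.
```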

Why should we say that "the use of ZIP codes is harmful" without asking how the information about residence location is used? The same information that can be used to deny poor people services can be used to provide additional services to places that need them more. That, after all, is the policy proposition involved in the census. Surely it would make sense to talk not about how ZIP codes are used, but about how tract-level census data is used?

The Feedback Loop Problem

Feedback loops in models will reinforce existing societal biases. The poster child for this phenomenon is predictive policing. The most common method of predictive policing is place-based; it uses pre-existing crime data to find areas and times where crimes are more likely to happen.

Like most models, predictive policing algorithms need input data to get off the ground, especially when they are AI models. That initial data is already biased by the patrol tendencies of the human officers who came before. It is not a new idea that police have used race when deciding which areas to patrol. But the bias does not stop there. Areas found to have high incidences of crime will be policed further, and increased policing will inevitably find more instances of crime. This is the feedback loop that algorithms can create. Yet police departments will use the “objectivity” of the algorithm to deny any discriminatory effect. They can maintain a facade of neutrality by pointing to computer outputs as justification for actions that would have been far harder to justify without a model that confirms their biases.
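One way to see the loop, rather than merely assert it, is a toy simulation. Everything below is an illustrative assumption rather than a description of any deployed system: two areas have identical true crime, the area with more recorded crime is flagged as the "hot spot" and receives extra patrols, and extra patrols record more crime.

```python
# Toy simulation (illustrative numbers only) of a place-based predictive-policing
# feedback loop: the area with the most *recorded* crime is flagged as the "hot
# spot" and receives extra patrols, extra patrols record more crime, and the gap
# in recorded crime grows even though the true crime rates are identical.
import numpy as np

true_crime = np.array([100.0, 100.0])   # identical underlying crime in both areas
recorded = np.array([60.0, 40.0])       # history already skewed by past patrol choices
base_patrol, extra_patrol = 1.0, 1.0    # hypothetical patrol units
detection = 0.3                         # fraction of crime recorded per patrol unit

for year in range(1, 11):
    hot_spot = int(np.argmax(recorded))           # model flags the "high-crime" area
    patrols = np.full(2, base_patrol)
    patrols[hot_spot] += extra_patrol             # extra patrols follow the flag
    recorded += true_crime * patrols * detection  # more patrols -> more recorded crime
    print(f"year {year}: recorded crime = {recorded.astype(int)}, hot spot = area {hot_spot}")
# Area 0 is flagged every year, so its recorded total pulls further ahead of
# area 1 -- the model's own output keeps confirming its initial bias.
```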

Similar effects exist in other human selection methods. Programs that screen candidates for job opportunities use previously successful candidates as input for their models. That training data is composed of candidates selected by people, and the long history of discrimination in employment is reflected in it. While it may seem that an objective algorithm would rid the process of previous human bias, in reality the algorithm only perpetuates it. There is a reason that proactive diversity efforts were adopted to combat intrinsic biases; yet the notion of having a machine make those same racial assessments scares the public, even though that may be the necessary corrective. Feedback loops are an inherent trait of human selection models.
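A second sketch, again on synthetic data with hypothetical features, shows where the bias enters in hiring models: the training labels are past human decisions, so the model's learned weights simply encode whatever pattern, including discrimination, those decisions followed.

```python
# Minimal sketch (synthetic data, hypothetical features) of label bias in hiring
# models: the training labels are past human decisions, so whatever pattern those
# decisions followed -- including discrimination -- is what the model learns to repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
skill = rng.normal(0, 1, n)        # equally distributed across both groups
proxy = rng.integers(0, 2, n)      # e.g., a resume feature correlated with gender
# Past human screeners rewarded skill but also penalized the proxy group:
hired_before = (skill - 0.8 * proxy + rng.normal(0, 0.5, n)) > 0

model = LogisticRegression(max_iter=1000).fit(np.column_stack([skill, proxy]), hired_before)
print("learned weights [skill, proxy]:", model.coef_.round(2))
# The proxy weight comes out strongly negative: the "objective" screener has
# memorized the old discrimination and will apply it to every new applicant.
```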

There is No Transparency

Black Box

The increasing use of AI results in "black-box" algorithms that cannot be fully understood, even by the engineers who create them. Machine learning abstracts personal information into a model that produces an output; a neural network breaks the input down into components, weighs and combines those components through layers of calculation, and then produces an output shaped by whatever the model was designed to accomplish. In this system the intermediate components are usually never examined; the users care only about inputs and outputs. This is a boon both for the parties using these programs and for the ones selling them: because the intermediate steps are unknown, each can shield itself from liability by claiming it did not know the algorithm had discriminatory effects.
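The structure described here can be shown in a few lines. The weights below are random stand-ins rather than a trained model; the point is only that the hidden-layer values, the "intermediate components," are visible as numbers but carry no human-readable meaning.

```python
# A stripped-down sketch of the structure described above: an input vector is
# pushed through weighted layers to a score. The hidden-layer values are the
# "intermediate components" -- perfectly visible as numbers, but with no
# human-readable meaning attached. (Weights here are random stand-ins for
# whatever a trained model would contain.)
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)   # input (4 features) -> 8 hidden units
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)   # hidden units -> single score

def score_applicant(features: np.ndarray) -> float:
    hidden = np.maximum(0, W1 @ features + b1)   # ReLU: the opaque intermediate step
    z = (W2 @ hidden + b2).item()                # single raw score
    return 1 / (1 + np.exp(-z))                  # sigmoid -> value in [0, 1]

applicant = np.array([0.2, 1.0, 0.5, 0.0])       # hypothetical encoded features
print("score:", round(score_applicant(applicant), 3))
# Everyone involved sees the input features and the final score; the eight
# hidden values in between are what the "black box" label actually refers to.
```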

What Happened to Me?

Another compounding issue that arises from “black-box” algorithms is the lack of explanation. Those being evaluated cannot know why a particular algorithm selected or rejected them. In fact, the companies commissioning the evaluation cannot be sure either, but the algorithm reduces their workload tremendously while still supplying more than enough candidates for whatever purpose it serves. And revealing the algorithm may work against the interests of the party deploying it, because people will find ways to “game the system” for their own optimal results.
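The "gaming" concern is easiest to see with a deliberately simple, hypothetical scoring rule: once an applicant knows the weights and the threshold, the cheapest way to flip the decision can be computed directly.

```python
# Sketch of the "gaming" worry (hypothetical linear scoring rule): once the
# weights and threshold are known, anyone can compute exactly how much to
# change a feature to flip the decision, without changing anything the
# feature was supposed to measure.
import numpy as np

weights = np.array([0.6, 0.3, 0.1])   # hypothetical normalized features
threshold = 0.75

applicant = np.array([0.9, 0.5, 0.2])
score = weights @ applicant
print("score:", round(score, 3), "-> approved" if score >= threshold else "-> rejected")

# Knowing the rule, raise the cheapest-to-change feature just enough to pass:
needed = (threshold - score) / weights[2]
applicant[2] += needed + 0.01
new_score = weights @ applicant
print("after gaming:", round(new_score, 3), "-> approved" if new_score >= threshold else "-> rejected")
```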

Conclusion

Human data reveals far more than what appears on the surface. Data points are interconnected, and it is impossible to prevent “neutral” data points from influencing decisions that should not take protected classes into account. The problem is only exacerbated by the immense amount of data that companies collect on people, whether they are customers, users, or even non-users. One may think a computer is less biased than a human making assessments of other humans, but those computer algorithms are built by people with biases, many of them subconscious. Legislating greater protections for people and their data would have many beneficial effects beyond greater privacy itself and the Fourth Amendment concerns that accompany it.

I think the best route is to reduce dependence on generalities. Most of the statements you make are undocumented, so the reader cannot pay closer attention to the details than you do, which reduces the value to any reader who wants to pay closer attention to the technical or policy groundwork than you have space or analytical attention for. Many of your apparent conclusions are either true or false in particular cases depending on factual context. "Black boxes" and "algorithmic transparency" are metaphors about software, not actual statements about how licensing of code, contractual or regulatory provisions concerning processing, or actual computer programs work. That's really not satisfactory: we don't use metaphors about metallurgy or aerodynamics to make air safety policy: we use actual engineering. Perhaps the best way to experience this in your own learning is to pick a single example of the phenomena you want to write about and dive all the way into it, to actually understand the technical and policy material all the way down to the details. You don't have to write at that level because that's the level to which you learned. But what you say will then be well-documented, and you will have the actual complexities before you, rather than the jarring of phrases.
