JosephHanFirstPaper 4 - 12 May 2024 - Main.JosephHan
META TOPICPARENT | name="FirstPaper" |
Machine Learning and Discrimination: A Hidden Crime
-- JosephHan - 12 May 2024
The mass collection and surveillance of personal data by tech companies is not a new phenomenon. As technology continues to improve, however, algorithms that analyze human data have become far more common. Corporations have a strong incentive to adopt such tools: they reduce labor costs and thereby increase profits. Examples include job application screening, loan and financing approvals, and rental applications. But human relationships are complex, and data can reveal far more than it first appears to. As more businesses incorporate these tools, it is important to assess the full effects of those decisions. Deep-learning AI models that assess human candidates carry a high risk of unlawful discrimination in violation of the equal protection clause of the 14th Amendment.
Race is Easily Inferred
Although unlawful discrimination could occur with respect to any characteristic protected under the 14th Amendment, this analysis focuses on race, both because race is easily inferred from other data and because of the importance of protecting people from racial discrimination.
Zip codes can serve as a proxy for race. They are an effective representation of race and ethnicity information, particularly for white, black, and Latinx groups, as sketched below.
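As a rough illustration, the sketch below uses invented census figures and applicant records to show how a single join against public zip-code demographics attaches an estimated racial profile to applicants whose files contain no race field at all. The zip codes, percentages, and applicant IDs are all hypothetical.

```python
import pandas as pd

# Hypothetical census table: racial composition of each zip code.
census = pd.DataFrame({
    "zip": ["10027", "10463", "11211"],
    "pct_white": [0.30, 0.45, 0.62],
    "pct_black": [0.40, 0.20, 0.05],
    "pct_latinx": [0.25, 0.30, 0.20],
})

# Hypothetical applicant records with no explicit race field.
applicants = pd.DataFrame({
    "applicant_id": [1, 2, 3],
    "zip": ["10027", "11211", "10463"],
})

# One join attaches an estimated racial profile to every applicant.
enriched = applicants.merge(census, on="zip", how="left")
print(enriched)
```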
Similarly, names are effective predictors of ethnicity. Studies have shown that models that examine the sequencing of letters in a name can achieve very high accuracy, and census data on names can also serve as an accurate predictor.
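A hedged sketch of the kind of letter-sequence model described above, using scikit-learn and a handful of invented name-label pairs; a real study would train on large labeled datasets such as voter files or census surname tables, not the toy examples shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy name/label pairs; the labels here are invented for illustration only.
names = ["kowalski", "nguyen", "hernandez", "smith", "tanaka", "garcia"]
labels = ["white", "asian", "latinx", "white", "asian", "latinx"]

# Character n-grams capture the "sequencing of letters" in a name.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(names, labels)

print(model.predict(["rodriguez"]))  # a guess made from letter patterns alone
```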
Data may appear neutral on its face, yet it easily serves as a proxy for race. Race and social statistics are highly correlated, so a model can have a disparate discriminatory impact even when explicitly racial data is left out.
Neural Networks and Machine Learning
A growing issue in the realm of algorithmic human selection is the increasing use of machine learning. To understand the harm that is occurring, it is essential to understand the technology being implemented.
A human user has two points of contact with any model: the input and the output. When fed inputs, such as human candidates' data, the model uses its previous "experience" in solving similar problems to analyze the input data and create an output, which in this example would be the candidates selected.
Computations are completed through a neural network, which is designed to simulate the functioning of neurons in a human brain. Neural networks extract "features" from the input data, such as an applicant's credit history, previous employment, income, name, gender, and race. Although some features (such as gender and race) can be filtered out before the data is fed to the model, data points that are neutral on their face (such as zip codes) are often included. The model then assigns weights to these extracted features in a hidden layer; the points at which these intermediate calculations are done are called nodes. Using these intermediate calculations, the model arrives at an output: in this example, a rejection or an approval. This process can be visualized below.
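The arithmetic inside a single pass through such a network is simple enough to sketch directly. The toy example below, with invented feature values and random weights standing in for learned ones, shows how every hidden node weights every input, the zip code included, before the result is squashed into an approve/reject score.

```python
import numpy as np

# One applicant's extracted features, invented and pre-scaled for simplicity:
# [credit history, income, years employed, zip code].
x = np.array([0.7, 0.4, 0.2, 0.9])

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 5))   # weights from the 4 inputs to 5 hidden nodes
W_out = rng.normal(size=(5, 1))      # weights from the hidden nodes to the output

hidden = np.maximum(0, x @ W_hidden)           # each node weights every input, zip code included
score = 1 / (1 + np.exp(-(hidden @ W_out)))    # squash into an approval score between 0 and 1
print("approve" if score.item() > 0.5 else "reject")
```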
The weights and calculations are created through a process called "training," during which the model analyzes the data it is given. A training set consists of applicants and their data paired with a correct answer, such as whether each applicant should be approved or denied.
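A minimal sketch of that training step, assuming scikit-learn and an invented set of labeled applicants: the fitting routine adjusts the network's weights until its outputs match the "correct answers" in the training set.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
# Hypothetical training set: rows are [credit, income, employment, zip code],
# paired with an approve/deny label that (quietly) depends on the zip column.
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 3] > 0).astype(int)

# "Training": the fitting routine adjusts the hidden-layer weights to match the labels.
model = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

new_applicant = rng.normal(size=(1, 4))
print(model.predict(new_applicant))  # 1 = approve, 0 = deny
```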
Deep Learning Hides the Operation of Neural Networks
Deep learning has become the new trend in AI development; ChatGPT, for example, is a large language model trained with deep learning. Deep learning is a subset of machine learning in which the neural network has at least two layers between the input and output layers.
There are two key distinctions from simpler machine learning: the model often learns "on its own" with little human intervention, and deep learning tends to use many more intermediate layers in the neural network. Both features make it more difficult to deduce exactly which input features the model is considering and to what extent.
Intermediate layers complicate the calculations, and because a deep network has many more nodes, any individual node has a smaller effect on the final output. The zip code data point may be a small factor in many nodes spread across the network. As a result, a detailed look into the intricacies of the neural network may not reveal the influence of zip codes, even though their combined effect on the final output is large.
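One way to make that diffuse influence visible is permutation importance: shuffle a single input column and measure how much the model's accuracy drops. The sketch below, on invented data, shows that even when no individual weight looks decisive, permuting the zip-code column substantially degrades performance.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))                   # [credit, income, employment, zip code]
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)   # the outcome quietly depends on zip

model = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=3000, random_state=0)
model.fit(X, y)

# Scanning the individual weights in model.coefs_ does not show how much the zip
# column matters, but shuffling that column and re-scoring the model does.
result = permutation_importance(model, X, y, n_repeats=20, random_state=0)
for name, drop in zip(["credit", "income", "employment", "zip"], result.importances_mean):
    print(f"{name}: accuracy drop {drop:.3f}")
```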
It is also difficult to assess the biases of deep learning models because there is no requirement that a human give feedback to the model during training. Human oversight is necessary to correct biases, yet a "feature" of this new technology is that this step is not required. Compared to simpler machine learning, the technological improvements of deep learning increase the potential for discriminatory effects by impeding the ability to diagnose models and removing the necessity of human oversight.
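The kind of oversight the text describes as missing can be as simple as an after-the-fact audit. The sketch below, using invented decisions and group labels (which might themselves be inferred from zip codes, as above), compares approval rates across groups and computes a "four-fifths rule" style adverse-impact ratio.

```python
import pandas as pd

# Invented model decisions with group labels inferred, for example, from zip codes.
decisions = pd.DataFrame({
    "inferred_group": ["black", "black", "black", "white", "white", "white", "white"],
    "approved":       [0,       0,       1,       1,       1,       1,       0],
})

rates = decisions.groupby("inferred_group")["approved"].mean()
ratio = rates.min() / rates.max()               # adverse-impact ("four-fifths rule") ratio
print(rates)
print(f"adverse impact ratio: {ratio:.2f}")     # values below 0.8 are a common red flag
```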
Conclusion
The increasing use of deep learning models raises serious concerns about their unintended effects. Although a model may use data that seems unbiased, the interconnected nature of our existing data is undeniable. Deep learning models are deeply complicated, and there is little need or incentive to examine the intricacies of how they create their outputs. In models that evaluate human applicants, this can result in discrimination based on protected classes, namely race. Strict regulation of these models is necessary to prevent this technology from infringing on the constitutional rights of citizens.
Previous Draft: Privacy and Discrimination: An Intertwined Violation
-- JosephHan - 01 Mar 2024

I think the best route is to reduce dependence on generalities. Most of the statements you make are undocumented, so the reader cannot pay closer attention to the details than you do, which reduces the value for any reader who wants to engage the technical or policy groundwork more closely than you have space or analytical attention for. Many of your apparent conclusions are either true or false in particular cases depending on factual context. "Black boxes" and "algorithmic transparency" are metaphors about software, not actual statements about how licensing of code, contractual or regulatory provisions concerning processing, or actual computer programs work. That's really not satisfactory: we don't use metaphors about metallurgy or aerodynamics to make air safety policy; we use actual engineering. Perhaps the best way to experience this in your own learning is to pick a single example of the phenomena you want to write about and dive all the way into it, to actually understand the technical and policy material all the way down to the details. You don't have to write at that level, but what you say will then be well-documented, and you will have the actual complexities before you, rather than the jarring of phrases.
META FILEATTACHMENT | attachment="Neural-Networks-Architecture.png" attr="h" comment="" date="1715545448" name="Neural-Networks-Architecture.png" path="Neural-Networks-Architecture.png" size="27310" stream="Neural-Networks-Architecture.png" user="Main.JosephHan" version="1" |
META FILEATTACHMENT | attachment="nn-ar.jpg" attr="" comment="" date="1715545742" name="nn-ar.jpg" path="nn-ar.jpg" size="14828" stream="nn-ar.jpg" user="Main.JosephHan" version="1" |