Computers, Privacy & the Constitution

JosephHanFirstPaper 5 - 14 May 2024 - Main.JosephHan
 

Machine Learning and Discrimination: A Hidden Crime

-- JosephHan - 12 May 2024
 

Conclusion

The increasing use of deep learning models raises serious concerns about their unintended effects. Although such a model may use data that seems unbiased, the interconnected nature of our existing data is undeniable. Deep learning models are deeply complicated, and there is little need or desire on the part of their operators to examine how they actually produce their outputs. In models that evaluate human applicants, this can result in discrimination based on protected classes, namely race. Strict regulation of these models is necessary to prevent this technology from infringing on the constitutional rights of citizens.

Previous Draft: Privacy and Discrimination: An Intertwined Violation

-- JosephHan - 01 Mar 2024

The mass collection of personal data and surveillance of people by tech companies is not a new phenomenon. However, as technology continues to improve and grow, algorithms analyzing human data have become more common. This saves tons of time and money for companies, while still achieving desirable results. Examples of this include predictive policing, job application screening, and loan/financing approvals.

Wouldn't it be a good idea to provide at least some evidence that the results actually achieved are "desirable"? Does reducing employment count as a desirable result? What are "tons of time"? Could we describe this as "increasing the share capital takes from production at the expense of labor"?

However, these “objective” human models may be more dangerous than human bias because they are more likely to go undetected and there is less active correction for bias.

Wouldn't it be expected for the reader to object that neither of these claims is self-evidently true? That it is both easier to monitor automated processes and to modify software to respond to biased results more easily than it is to make the same measurements and corrections to social processes?

Human relationships are complex and data can reveal far more than it first appears to. The use of personal data when analyzing human subjects algorithmically results in unlawful discrimination, regardless of any efforts made to counteract those effects.

Data is All Connected

Human data is interconnected and always reveals more than what appears on the surface. Overt discrimination is unlikely when human data is collected; a company is unwilling to make race an explicit component. However, many “impersonal” data points can make race a large component regardless of the intent of the person designing a model. Take zip codes as an example: the use of zip codes in assessing human subjects may not seem like an issue on its face. But once one realizes that zip codes are highly reflective of race and income, it is clear that their use will lead to harmful effects on marginalized communities. If a high school student is denied a loan because his zip code is “high risk,” that could deny him an education and diminish his socioeconomic mobility. Names are another example. It is impossible to submit a job or housing application without a name, yet if the model assessing the quality of the candidates uses their names, it gains many insights into gender and race. Even data that appears facially neutral will have ties to other human data that results in discrimination.

Why should we say that "the use of ZIP codes is harmful" without asking how the information about residence location is used? The same information that can be used to deny poor people services can be used to provide additional services to places that need them more. That, after all, is the policy proposition involved in the census. Surely it would make sense to talk not about how ZIP codes are used, but about how tract-level census data is used?

The Feedback Loop Problem

Feedback loops in models will reinforce existing societal biases. The poster child for this phenomenon is predictive policing. The most common method of predictive policing is place-based; it uses pre-existing crime data to find areas and times where crimes are more likely to happen.

Like most models, especially AI models, the algorithm needed input data to get off the ground. That initial data, however, was already biased by the policing tendencies of human officers; it is not a new idea that police have considered race when deciding which areas to patrol. And that is not where the bias stops. Areas found to have high incidences of crime will be policed further, and increased policing will inevitably find more instances of crime. This is the feedback loop that algorithms can create. Yet police departments will invoke the “objectivity” of the algorithm to deny any discriminatory effects. In fact, they can maintain a facade of neutrality by using the computer's outputs as justification for their actions, actions that would have been harder to justify without a model that supports their biases.
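A minimal simulation makes the loop concrete. The numbers below are invented for illustration and do not describe any real department: two districts share the same underlying crime rate, but District A starts with more recorded incidents because it was patrolled more heavily in the past, and a hot-spot rule then concentrates patrols wherever recorded crime is highest.

<verbatim>
# Hypothetical sketch of a place-based predictive-policing feedback loop.
# Both districts have the SAME true crime rate; District A merely starts with
# more *recorded* incidents because it was patrolled more heavily before.

TRUE_RATE = 0.05         # identical underlying crime rate (assumption)
TOTAL_PATROLS = 200      # patrol-hours allocated each round
HOT_SPOT_SHARE = 0.7     # share of patrols sent to the "high crime" district

recorded = {"A": 120.0, "B": 80.0}   # historically biased starting data

for round_no in range(1, 11):
    hot = max(recorded, key=recorded.get)            # model flags the hot spot
    patrols = {d: TOTAL_PATROLS * (HOT_SPOT_SHARE if d == hot
                                   else 1 - HOT_SPOT_SHARE) for d in recorded}
    for d in recorded:                               # more patrols, more records
        recorded[d] += patrols[d] * TRUE_RATE
    share_a = recorded["A"] / sum(recorded.values())
    print(f"round {round_no:2d}: District A share of recorded crime = {share_a:.1%}")
</verbatim>

Under these assumed numbers, District A's share of recorded crime climbs from 60% toward the 70% patrol share even though nothing about the underlying crime rate differs; the model's output simply ratifies the bias already present in its input data.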

Similar effects exist in other human selection methods. Programs that screen candidates for job opportunities will use previously successful candidates as input for their models. That training data will be composed of candidates selected by people, and the long history of discrimination in employment is apparent. While it may seem that an objective algorithm would get rid of previous human bias, in reality the algorithm only perpetuates it. There is a reason that proactive diversity efforts were made in order to combat intrinsic biases. Yet the notion of having a machine make those same racial assessments scares the public, even though that may be the necessary solution. Feedback loops are an inherent trait of human selection models.

There is No Transparency

Black Box

The increasing use of AI results in "black-box" algorithms that cannot be fully understood, even by the engineers who create them. Machine learning abstracts personal information into a model that produces an output; that output is informed by a neural network, which breaks the input down into components, performs various calculations to weigh those components, and then produces an output based on what the model was designed to accomplish. In this system, the components in the intermediate step are usually unknown; users care only about the inputs and outputs. This is a boon both for the parties using these programs and for the ones selling them: because the intermediates are unknown, they can shield themselves from liability by claiming they did not know their algorithm had discriminatory effects.

What Happened to Me?

Another compounding issue that arises from “black-box” algorithms is the lack of guidance. Those who are being evaluated cannot be sure of the reason they were selected or rejected by a given algorithm. In fact, the companies commissioning the evaluation cannot be sure either, but the algorithm decreases their work tremendously while still providing more than enough candidates for whatever purpose it is meant to serve. Revealing the algorithm may work against the interests of the person deploying it; people will find a way to “game the system” for their own optimal results.

Conclusion

Human data reveals far more than what appears on the surface. Data points are interconnected, and it is impossible to prevent “neutral” data points from influencing decisions that should not take protected classes into account. This problem is only exacerbated by the immense amount of data that companies collect on people, whether they are customers, users, or even non-users. Even though one may think a computer is less biased than a human making assessments of other humans, those computer algorithms have been built by people with biases, many of them subconscious. Legislating greater protections for people and their data would have many beneficial effects beyond greater privacy and the 4th Amendment concerns that accompany it.

I think the best route is to reduce dependence on generalities. Most of the statements you make are undocumented, so the reader cannot pay closer attention to the details than you do, which reduces the value to any reader who wants to pay closer attention to the technical or policy groundwork than you have space or analytical attention for. Many of your apparent conclusions are either true or false in particular cases depending on factual context. "Black boxes" and "algorithmic transparency" are metaphors about software, not actual statements about how licensing of code, contractual or regulatory provisions concerning processing; or actual computer programs work. That's really not satisfactory: we don't use metaphors about metallurgy or aerodynamics to make air safety policy: we use actual engineering. Perhaps the best way to experience this in your own learning is to pick a single example of the phenomena you want to write about and dive all the way into it, to actually understand the technical and policy material all the way down to the details. You don't have to write at that level because that's the level to which you learned. But what you say will then be well-documented, and you will have the actual complexities before you, rather than the jarring of phrases.

 
META FILEATTACHMENT attachment="Neural-Networks-Architecture.png" attr="h" comment="" date="1715545448" name="Neural-Networks-Architecture.png" path="Neural-Networks-Architecture.png" size="27310" stream="Neural-Networks-Architecture.png" user="Main.JosephHan" version="1"
META FILEATTACHMENT attachment="nn-ar.jpg" attr="" comment="" date="1715545742" name="nn-ar.jpg" path="nn-ar.jpg" size="14828" stream="nn-ar.jpg" user="Main.JosephHan" version="1"

JosephHanFirstPaper 4 - 12 May 2024 - Main.JosephHan
 

Machine Learning and Discrimination: A Hidden Crime

-- JosephHan - 12 May 2024

The mass collection of personal data and surveillance of people by tech companies is not a new phenomenon. However, as technology continues to improve, algorithms that analyze human data have become more common. Corporations have a large incentive to adopt such tools: they can decrease labor costs and thus increase profits. Examples include job application screening, loan and financing approvals, and rental applications. Human relationships are complex, however, and data can reveal far more than it first appears to. As more businesses incorporate these tools, it is important to assess the full effects of those decisions. AI models trained with deep learning to assess human candidates carry a high risk of unlawful discrimination in violation of the Equal Protection Clause of the 14th Amendment.

Race is Easily Inferred

Although unlawful discrimination could occur with respect to any protected characteristic, this paper focuses on race because it is especially easy to infer from other data and because of the importance of protecting people from racial discrimination.

Zip codes can serve as a proxy for race: they are an effective representation of race and ethnicity, particularly for white, Black, and Latinx groups.

Similarly, names are effective at predicting ethnicity. Studies have shown that models that analyze the sequence of letters in a name can achieve very high accuracy, and census data on names can also be an accurate predictor.

Data may appear neutral, yet it easily serves as a proxy for race. Race and social statistics are highly correlated, so a model can have a disparate discriminatory impact even when actual racial data is left out.
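A short sketch shows how little is needed to recover race from two “neutral” fields. The percentages below are invented for illustration rather than drawn from census tables, but the mechanism mirrors real techniques such as Bayesian Improved Surname Geocoding, which combine the racial composition of a geography with the racial distribution of a surname.

<verbatim>
# Hedged illustration of proxy inference: the compositions below are invented,
# but the naive-Bayes-style combination mirrors methods such as BISG.

# P(race | zip code) -- hypothetical neighborhood compositions
ZIP_COMPOSITION = {
    "10027": {"white": 0.20, "black": 0.55, "latinx": 0.25},
    "10583": {"white": 0.80, "black": 0.05, "latinx": 0.15},
}

# P(race | surname) -- hypothetical surname distributions
SURNAME_COMPOSITION = {
    "WASHINGTON": {"white": 0.10, "black": 0.85, "latinx": 0.05},
    "MILLER":     {"white": 0.85, "black": 0.10, "latinx": 0.05},
}

def infer_race(zip_code: str, surname: str) -> dict:
    """Combine two 'facially neutral' fields into a race estimate."""
    zip_p = ZIP_COMPOSITION[zip_code]
    name_p = SURNAME_COMPOSITION[surname.upper()]
    joint = {race: zip_p[race] * name_p[race] for race in zip_p}
    total = sum(joint.values())
    return {race: round(p / total, 3) for race, p in joint.items()}

print(infer_race("10027", "Washington"))   # concentrated on one group
print(infer_race("10583", "Miller"))       # concentrated on another
</verbatim>

No race field is ever collected, yet under these assumed numbers the estimate for each applicant is concentrated almost entirely on a single group; a model fed zip codes and names is, in effect, fed race.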

Neural Networks and Machine Learning

A growing issue in the realm of algorithmic human selection is the increasing use of machine learning. In order to understand the harm that is occurring, it is essential to understand the technology being implemented.

A human user has two points of contact with any model: the input and the output. When an AI model is fed inputs, such as a pool of human candidates, it uses its previous “experience” solving similar problems to analyze the input data and produce an output, which in this example would be the candidates selected.

Computations are completed through a neural network, which is designed to simulate the functioning of neurons in a human brain. Neural networks extract “features” from the input data, such as an applicant's credit history, previous employment, income, name, gender, and race. Although some of these (such as gender and race) can be filtered out before the data is fed to the model, data points that are neutral on their face (such as zip codes) are often included. The model then assigns various weights to these extracted features in a hidden layer; the points at which these intermediate calculations are done are called nodes. Using these intermediate calculations, the model arrives at an output: in this example, a rejection or an approval. This process is visualized below.

[Figure: nn-ar.jpg, a simple neural network: input layer, hidden layer of weighted nodes, output layer]

The weights and calculations are created through a process called “training,” during which the model analyzes the data it is given. A training set would consist of applicants and their data paired with the correct answer, such as whether each applicant should be approved or denied.
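The sketch below is a toy version of the model just described, with invented applicants and labels: three input features per applicant, one hidden layer of nodes, and a single output node trained to reproduce past approve/deny decisions. It is an illustration of the mechanics, not any vendor's actual system.

<verbatim>
import numpy as np

# Toy neural network: [scaled income, scaled credit score, zip-10027 indicator]
# in, probability of approval out. The labels stand in for past human decisions;
# if those decisions tracked the zip indicator, the learned weights will too.

rng = np.random.default_rng(0)

X = np.array([
    [0.9, 0.8, 0.0], [0.8, 0.9, 0.0], [0.7, 0.7, 0.0], [0.6, 0.8, 0.0],
    [0.9, 0.8, 1.0], [0.8, 0.9, 1.0], [0.7, 0.7, 1.0], [0.6, 0.8, 1.0],
])
# Hypothetical past decisions: identical finances, but zip-10027 applicants denied.
y = np.array([[1.0], [1.0], [1.0], [1.0], [0.0], [0.0], [0.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(0, 0.5, (3, 4)), np.zeros(4)   # input -> 4 hidden nodes
W2, b2 = rng.normal(0, 0.5, (4, 1)), np.zeros(1)   # hidden nodes -> output

lr = 1.0
for step in range(5000):                 # "training": adjust weights to match y
    H = sigmoid(X @ W1 + b1)             # hidden-layer activations
    p = sigmoid(H @ W2 + b2)             # predicted probability of approval
    d_out = (p - y) / len(X)             # cross-entropy gradient at the output
    dW2, db2 = H.T @ d_out, d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * H * (1 - H)
    dW1, db1 = X.T @ d_hid, d_hid.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# Two applicants with identical finances, differing only in zip code:
test = np.array([[0.8, 0.8, 0.0], [0.8, 0.8, 1.0]])
print(sigmoid(sigmoid(test @ W1 + b1) @ W2 + b2).round(3))
</verbatim>

With these invented labels, the trained model approves the first test applicant and denies the second even though their finances are identical; the only thing it learned to weigh differently is the zip-code indicator it absorbed from past decisions.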

Deep Learning Hides the Operation of Neural Networks

Deep learning has become the new trend in AI development; ChatGPT, for example, is built on a large language model trained with deep learning. Deep learning is a subset of machine learning in which there are at least two hidden layers between the input and output layers.

There are two key distinctions from traditional machine learning: the model often learns “on its own” with little human intervention, and deep learning tends to use many more intermediate layers in the neural network. Both features make it more difficult to deduce exactly which input features the model is considering and to what extent.

Intermediate layers complicate the calculations, and because a deep network has many more nodes, any individual node has a smaller effect on the final output. A zip code can be a small factor at many nodes within the neural network, which means that a detailed look at the network's intricacies may not reveal any obvious zip-code effect even though zip codes have a large impact on the final output.
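A deliberately simplified sketch (linear layers and invented weights, chosen so the arithmetic is easy to follow) illustrates the point: no single weight tied to the zip-code input is large, yet flipping that one input moves the final score substantially. Comparing outputs for inputs that differ only in the suspected proxy is the kind of probe that works even when the internals are inscrutable.

<verbatim>
import numpy as np

# Invented weights, linear layers for clarity: the zip input is a *small*
# factor at every one of 50 hidden nodes, but the contributions add up.

rng = np.random.default_rng(1)
n_hidden = 50

W1 = rng.normal(0, 0.05, (3, n_hidden))   # [income, credit, zip] -> hidden nodes
W1[2, :] = 0.1                            # zip adds only 0.1 at each node
W2 = np.full((n_hidden, 1), -0.2)         # each node nudges the output slightly

def score(applicant):
    return (applicant @ W1 @ W2).item()   # simplified "network" score

zip0 = np.array([0.8, 0.8, 0.0])          # identical finances ...
zip1 = np.array([0.8, 0.8, 1.0])          # ... differing only in zip code

print("largest single zip-related weight:", abs(W1[2]).max())             # 0.1
print("score with zip indicator = 0:", round(score(zip0), 2))
print("score with zip indicator = 1:", round(score(zip1), 2))
print("shift caused by zip alone:", round(score(zip1) - score(zip0), 2))  # -1.0
</verbatim>

Audits that read weights node by node would see nothing alarming here; audits that hold every input constant except the suspected proxy and compare the outputs would.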

It is also difficult to assess the biases of deep learning models because there is no requirement that a human give feedback to the model during training. Human oversight is necessary to correct biases, yet a “feature” of this new technology is that this step is not required. The technological improvements of deep learning over traditional machine learning thus increase the potential for discriminatory effects by impeding the ability to diagnose models and removing the necessity of human oversight.

Conclusion

The increasing use of deep learning models raises serious concerns about their unintended effects. Although such a model may use data that seems unbiased, the interconnected nature of our existing data is undeniable. Deep learning models are deeply complicated, and there is little need or desire on the part of their operators to examine how they actually produce their outputs. In models that evaluate human applicants, this can result in discrimination based on protected classes, namely race. Strict regulation of these models is necessary to prevent this technology from infringing on the constitutional rights of citizens.
 


Revision r5 - 14 May 2024 - 03:21:58 - JosephHan
Revision r4 - 12 May 2024 - 20:30:10 - JosephHan
Revision r3 - 12 May 2024 - 04:44:47 - JosephHan
Revision r2 - 21 Apr 2024 - 13:20:27 - EbenMoglen
Revision r1 - 01 Mar 2024 - 05:25:51 - JosephHan