
AI is a Bias Enhancing Machine

Watch on YouTube or below:

Description

PayPal: https://paypal.me/Guard13007
Blog: https://blog.tangentfox.com/
My Patreon (probably doesn’t work): https://www.patreon.com/guard13007

Links are kept outside of my YouTube descriptions after YouTube threatened me over linking to the New York Times:

  1. I lost where I got the claim that someone in the global south can work a couple of days and make only $2. That’s why I said 100 people for the cost of 1 person in the USA. I wanted to give a more specific source for the claim that “testament” appears more frequently in LLM output, but could not find a good single source. The claim is based on how English came to those regions through Christian missionaries during mass colonialism, leaving the Bible as the most prominent English text in their past and skewing how English is used in those places. Now that modern colonialism treats them as disposable workers, a la gig workers, datasets contain a higher frequency of words and phrasing common to the global south.
  2. Nature: AI generates covertly racist decisions about people based on their dialect
  3. YouTube: You are a better writer than AI. (Yes, you.) by josh (with parenthesis)
  4. YouTube: Has AI Killed Poetry? by Roughest Drafts and AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably (it is important to note that this only compares a general audience’s knowledge, and has too few participants to be a good quality study)
  5. GPT-4 Passes the Bar Exam notably includes the phrase “Using a percentile chart from a recent exam administration (which is generally available online)” to obscure the fact that they were comparing to repeat test-takers - in other words, people who failed the MBE before - and that these results were barely a passing grade.
  6. Bloomberg: Humans Are Biased. Generative AI Is Even Worse
  7. Unmasking AI by Joy Buolamwini (I also used a clip of this TED talk by Joy.)
  8. Oxford: Artificial Intelligence: A Very Short Introduction by Margaret A. Boden
  9. YouTube: AI Is Not Designed for You by No Boilerplate
  10. NASA Technical Memorandum: Neural Networks for Calibration Tomography and NIH: Quantification and Segmentation of Brain Tissues from MR Images: A Probabilistic Neural Network Approach (just referencing AI from the 1990’s as an example of how long it’s been around)
  11. I didn’t give specific citations for heuristics or their use in control systems because their use is so basic and well established that specific links didn’t seem necessary.
  12. PEW Research Center: 60% of Americans Would Be Uncomfortable With Provider Relying on AI in Their Own Health Care
  13. arXiv: Problems and shortcuts in deep learning for screening mammography
  14. https://community.the-hospitalist.org/sections/fundermentals?page=2 (this link might not load, in which case click here, scroll down, and click next page (and then do it again))
  15. Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification (a university slideshow presenting findings from several studies); also (De)Constructing Bias on Skin Lesion Datasets is a neat example of spurious correlations; also this
  16. When AI flags the ruler, not the tumor – and other arguments for abolishing the black box
  17. BOINC: donate computing power to scientific research
  18. When I say that model being fed all information about a person includes inherently sexist and racist information, I’m referring to the fact that racist and sexist policies affect demographic information, and thus race and sex are included as a factor that influences outcomes. Any data set that contains information correlated with race or sex has racist or sexist connotations because of how people have been treated in the past - even if you pretend bigotry doesn’t still exist.
  19. The rest of what I’m talking about here is based on my general knowledge and how it applies to enhancements claimed by a few articles. I wanted a good overview of what is popularly presented as the best advancements in AI healthcare, so that I could present my perspective on them. As a result, I don’t have specific sources for specific claims, just four articles I glanced at while forming my own statements: Science News Today 10 Best Examples of AI in Healthcare, Inferscience 10 Artificial Intelligence Examples in Medicine Transforming Patient Care, Docus 10 Examples of AI in Healthcare: Diagnostics to Treatment, Philips 10 real-world examples of AI in healthcare. These lists are propaganda.
  20. YouTube: The Myth of Mental Illness by Sisyphus 55

The following are articles and studies I looked at while working on this video, but did not use for the video:

Since I spent so much of this video talking about medicine, it’s worth noting that Dr. Rohin Francis posted a short about AI in medicine around the time this was published, pointing out that diagnosis is a very small part of the job of being a doctor, so proposals to automate diagnosis don’t accomplish very much.

Script

I have a note about why AI can’t write good fiction. The core of it is that AI is a “most popular” text machine: by design, LLMs can only output the most trite text possible. This is an unshakable problem with AI, but it also sucks at writing because of endless forced positivity and never knowing when to stop. There’s a lot more to this, but that’s not the point I’m here for today. I’m here to talk about bias.

Why do LLMs constantly use the word “testament”? Because their training data was created as cheaply as possible. You can hire 100 people in a third world country for the same price as a single person in the USA, and not everyone speaks English the same way. “Testament” is a more common word in the places many AI training sets came from. While that bias is mostly harmless, it becomes much more harmful when the datasets encode racism. As an example, African American English is a dialect of English spoken by, you guessed it, black people in the USA. LLMs consider black people lesser than white people because of centuries of racism embedded in the majority of English text produced, which is the majority of text LLMs are trained on. (Cut because I forgot the order I was writing this in: this also applies to image generators, which struggle to show a black CEO or a white man as a housekeeper, and primarily show white men in general.)
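
To make the “most popular text machine” idea concrete, here is a toy sketch of my own (not how any production LLM is actually implemented - real models are neural networks over tokens, and the corpus here is invented): a next-word picker that always returns the most frequent continuation it saw in training. Whatever phrasing dominates the training text dominates the output.

```python
# Toy illustration only: a "most popular next word" picker.
# Real LLMs are neural networks over tokens, not word-count tables,
# but the bias mechanism is the same: whatever phrasing dominates
# the training text dominates the output.
from collections import Counter, defaultdict

corpus = (
    "a testament to hard work . "
    "a testament to their skill . "
    "a tribute to their skill . "
    "a testament to resilience ."
).split()

# Count which word follows each word in the training text.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def most_popular_next(word):
    """Return the single most frequent continuation seen in training."""
    return next_word_counts[word].most_common(1)[0][0]

print(most_popular_next("a"))  # "testament" - it outnumbers "tribute" 3 to 1
```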

You are an average, normal person in most ways. You are probably not a writer. AI writing can superficially appear better than yours when you have less experience. LLMs are very good at seeming credible and authoritative, but that is only an appearance, and they don’t learn. You are a better writer than AI. And you should watch this video, because it is very, very good.

One thing I keep coming back to is not to just take the title of a study or what someone is saying about it as truth. This is another very good video about a misguided study attempting to prove LLMs can out-poetry humans. Every time a study claims AI supremacy in a topic that requires creativity, the study is woefully flawed. In the case of this poetry study, they compared an average person’s responses to human poetry and AI-written poetry. LLMs are good at saying the most common thing, and this tricks non-experts easily. Part of why I highly recommend watching this video is because experts are interviewed to show exactly how and why the results of the study don’t mean what they appear to at first glance.

This is kind of the point though. In the view of profit-seeking, all that matters is an average person paying for it. This is why CEOs think they can replace people with AI. Anywhere you can be tricked makes them more money, because paying people is expensive. Also, if you remember the headlines about AI surpassing humans on the bar exam, I just wanted to highlight that they used AI-generated responses to questions, and only compared AI against new law students (insert 2). LLMs can demonstrate more knowledge on a topic than the average person, but this isn’t (general) intelligence.

Insert 2:

they were comparing against law students who’d previously failed the MBE.

It’s all about popularity. A model must discriminate. That is the function of an LLM. A larger sample will always be treated preferentially to a smaller sample. No matter how much data you feed it, the majority is always favored, and the division separating minorities is always increased. No clearer case of this exists than that of image generators. The example here is again professions, comparing images generated of a person with a job title. Not a single image of a housekeeper showed a white man, and black CEOs were nearly as unfathomable.
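
Here is a minimal sketch of that “majority is always favored” point (a toy of my own with made-up numbers, not a real training pipeline): if a model is judged on overall accuracy, the cheapest way to score well on imbalanced data is to side with the larger group.

```python
# Toy illustration: with imbalanced data, the accuracy-maximizing constant
# "model" is simply the majority vote, so the minority is never predicted.
training_labels = ["majority"] * 95 + ["minority"] * 5

def best_constant_guess(labels):
    # The single guess that maximizes accuracy on this data is whichever
    # label appears most often.
    return max(set(labels), key=labels.count)

guess = best_constant_guess(training_labels)
accuracy = training_labels.count(guess) / len(training_labels)

print(guess)     # "majority"
print(accuracy)  # 0.95 - looks great on paper, wrong for everyone in the minority
```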

The worst of this happens when someone is a member of multiple minorities. The most horrific example I know of is self-driving systems running over black women because they don’t see them. I learned of that example from Unmasking AI by Dr. Joy - I don’t know how to pronounc- Joy! Her book is the most thorough book on AI bias I’ve read, and I haven’t even finished it. And since we’re mentioning books, the 2018 edition of Oxford’s A Very Short Introduction book on AI is the best historical overview of AI I know of.

Sometimes bias can be positive. I think censorship is always bad, even though infohazards - (==on-screen==: information that it is inherently harmful to know about) - do actually exist. But humans are nothing if not inconsistent, and this inconsistency makes censoring LLMs difficult. Another video I enjoy, which looks upon AI much more favorably and is much shorter, talks about how we don’t call successful AI “AI”, and how AI fails at complex tasks because it cannot have the training data to support niche things.

I think it’s important to acknowledge that AI has existed in various forms for a long time. It’s only recent increases in available computation and a focus on brute-force mass-data approaches that brought it into popular attention. That, and a forced effort by investors to prop up a technology that isn’t as good as they want it to be. Here, you can see papers about using neural nets, the core of AI, in tomography (==On-screen==: This is the same technology as CT scans.) and magnetic resonance imaging, from the 1990s. One of the simplest forms of AI is the heuristic, which is implemented in cruise control on cars, autopilot in aircraft, and rocket control systems.
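
As a sketch of what I mean by a heuristic (a toy proportional rule of my own, with made-up gains and physics, not any real vehicle’s implementation), here is the kind of hand-written control logic that long predates today’s data-hungry models:

```python
# Minimal sketch of a heuristic controller, roughly the idea behind basic
# cruise control: no learning and no dataset, just a hand-written rule
# ("push the throttle in proportion to how far we are below the set speed").
# The gain and the drag term are invented numbers for illustration.

def cruise_control_step(current_speed, target_speed, gain=0.5):
    """Return a throttle adjustment proportional to the speed error."""
    error = target_speed - current_speed
    return gain * error

speed = 80.0    # km/h
target = 100.0  # km/h
for _ in range(10):
    throttle = cruise_control_step(speed, target)
    speed += throttle - 0.1 * (speed / 100.0)  # toy physics: throttle minus drag
    print(round(speed, 1))  # climbs toward the target and levels off
```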

But I want to focus on medicine to close this video, because medicine is the area of AI with the most potential benefits. (Don’t hate the AI; hate the profit-driven companies who want to use AI in ways destructive to humanity.) I like this image from the Pew Research Center, because every statement represented here is correct, but each is also missing details. AI will make medicine better; AI will make medicine worse; AI is inherently unbiased; and AI reflects any human bias, or bias present in the data used to train it. People viewing AI medicine favorably either ignore or are unaware of just how bad bias in medicine already is. People against AI in medicine are unaware of the parts of medicine that lack bias. For those purposes, the use of AI has already improved and saved lives, globally.

Now, I need you to listen very carefully to what I’m saying with these categories. This is my opinion. My opinion is informed by understanding how the technology works and by learning about its successes and failures (like identifying cancer, something AI has done extremely well at, and extremely badly at, depending on what data was fed to it), but no one person can fully understand all of this, and I certainly have not spent as much time learning the specifics as others have. The advantage of my perspective is that it is very broad. I like to learn about anything and everything I can, so I easily make connections that may not be recognized by someone with a very narrow focus. The disadvantage is that my perspective is necessarily shallower than an expert’s on any one detail. I can give you an overview that is mostly correct, but it is not perfect, and things change.

It is important to understand that this broad perspective will miss successes and failures in all aspects of these lists. Hell, that’s the reason I even added a third list: there are some aspects of AI’s use in medicine that I know so little about that I cannot confidently say whether they will be positive or negative.

It’s very important you understand that while my opinion is strongly informed, I certainly am not an expert on medicine, nor is it possible to have read all information about these topics. The advantage of my perspective is that I like to learn anything about everything, which leads to making connections that can be missed by someone with a narrow focus. The disadvantage is that my view is necessarily shallower on details. My overview will be mostly correct, but things change, and I will make mistakes.

I’m going to highlight imaging first, because the most obvious failures of AI applied to medicine come from imaging, but imaging also has the most obvious potential, and its failures are already well understood. The failures I want to highlight are in cancer screening, where images from an older machine were always flagged as cancer because the only data kept from those older machines was scans that found cancer (insert 1); and skin cancer screening, where a model was accidentally trained to recognize rulers instead of skin cancer, because only the images of skin cancer included a ruler used to measure the size of the tumor. I didn’t hear about a racial bias in skin cancer screening, but considering how biased medicine already is with respect to race, I’m sure that bias exists as well. This is why I highlight internal imaging as an improvement. We all look the same internally. Scans that do not show skin are just data about a person who is healthy or unhealthy. There is little potential for bias to cause problems, and what bias does exist, we are taking steps to eliminate.

Insert 1:

-where the model would treat different types of mammogram scanner as more likely to have cancer- (on-screen: more info in notes linked to from description)

Editing Tangent here. The actual study was about removing biases from a model. The model would treat one type of mammogram scanner as more likely to have cancer. It wasn’t about which images were kept, as I’d thought, just about which machine was used. They were able to remove most of the errors in the model, but it still showed a bias towards one type of scanner - which was actually the newer scanner, not the older one.
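
As a sketch of how a spurious correlation like the ruler problem arises (toy data I made up, not the actual study’s data or model - the feature names are hypothetical), a classifier will happily key on whatever separates the labels most cleanly, even if it is medically meaningless:

```python
# Toy illustration of a spurious correlation, in the spirit of the
# "ruler vs. tumor" problem. Features are [ruler_present, lesion_size_mm],
# and every malignant example in this invented training set happens to
# include a ruler, so the model gets perfect training accuracy by looking
# only at the ruler.
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 9], [1, 12], [1, 7], [0, 2], [0, 3], [0, 11]]
y_train = ["malignant", "malignant", "malignant", "benign", "benign", "benign"]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A large lesion photographed without a ruler gets waved through, while a
# tiny one photographed next to a ruler gets flagged.
print(model.predict([[0, 14], [1, 2]]))  # ['benign' 'malignant']
```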

Drug discovery starts with analyzing molecules to determine what structures can have a desirable effect on human biology. Again, we are all the same on the inside, so using AI to find molecular structures that may be useful works for everyone, and it is only the first step in a long process of testing to make sure that a drug actually does what we want. This speeds up the process. An AI can propose candidate drugs to test faster than a human can, and because we also have models for simulating interactions, the candidate drugs can be tested effectively in simulation before being created for real. If you’ve ever heard of BOINC - software for donating computing resources to scientific research that’s existed since 2002 - a significant portion of that effort was, and is, used for this purpose.

Surgery is really difficult because, unlike the anatomical models used in education, most flesh looks almost the same as any other flesh, and there is a lot of internal variation in human bodies. AI models can help recognize subtle differences in internal structures without the bias risks present in population information, because they should only be using visual information from inside the body. If you haven’t noticed, the primary difference between the things I highlight as positive or negative here is whether or not we’re focused only on internals. Bias in medicine comes mostly from demographics - aspects of a person that are quite literally skin-deep.

Likewise, pathology is all about identifying things going on inside the body, usually to do with microbes causing infection. This is an area where models can again outperform or significantly reduce the workload of a human.

Monitoring in-patients is the aspect I’m least confident in, but I’m still willing to say it will most likely be positive. This is because most of monitoring a patient is looking at vital signs and blood tests, and these things don’t have an inherent bias attached to them. The reason I’m least confident in this usage is that it is possible a patient’s demographic information will also be fed to the model, and that will lead to mistreating minorities of any kind. This isn’t limited to race and sex, by the way, but also includes short people, very tall people, skinny or fat people - anyone with an unusual aspect of their body in any way the AI can access.

On the opposite end of the scale, I’m most confident in the idea of being able to monitor equipment and predict failures or needed maintenance, because machines analyzing machines presents the smallest risk area of developing biases.

Everything I’ve put down as a way AI can hurt medicine has the possibility of being useful, but I put it in this list because right now, most of these usages are harmful. Predictive disease detection is, as far as I can tell, used exclusively to refer to feeding all information about a person into a model and determining risks from that. This is fundamentally flawed because that information includes a huge amount of racist and sexist information. It is unlikely a model produced this way can discriminate between factors that actually influence risk and factors that are influenced by mistreatment.
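
A toy sketch of why simply dropping the race or sex column doesn’t fix this (all names and numbers here are invented for illustration): any correlated field, like a zip code shaped by segregation and unequal treatment, carries the same information straight back in.

```python
# Toy illustration of the proxy problem. Each record is
# (zip_code_group, recorded_outcome), where the outcomes in group "B" were
# shaped by historical mistreatment rather than by biology. A "model" built
# on zip code alone reproduces that pattern without ever seeing race or sex.
records = [
    ("A", 0), ("A", 0), ("A", 1),
    ("B", 1), ("B", 1), ("B", 1),
]

risk_by_zip = {}
for zip_group in sorted({zip_code for zip_code, _ in records}):
    outcomes = [outcome for zip_code, outcome in records if zip_code == zip_group]
    risk_by_zip[zip_group] = sum(outcomes) / len(outcomes)

# The model cannot tell whether group "B" is truly at higher risk or was
# simply treated worse; it just learns the proxy.
print(risk_by_zip)  # {'A': 0.33..., 'B': 1.0}
```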

“Virtual assistants” is too broad a category to call out in specific detail, but the problem with replacing a person talking to a patient with a machine is that it’s emotionally degrading and will reinforce existing biases. This is the single most difficult bias to root out, because it is fundamentally based on a language model, and language usage is the hardest place to remove bias from. Dialect alone is incredibly complex, and makes a big difference.

Personalized treatment plans are just like predictive disease detection, in that they’re based on feeding mass data that includes huge biases into a model. The easiest details to point out are how black people and women have their pain symptoms systematically downplayed, to the point where some doctors still genuinely believe women don’t experience pain. Without that bias being eliminated, a personalized treatment plan will only truly be personalized for white men.

Mental health support has had the most prominent failures in the news, but those were with models not intended for mental health assistance. Models that have been developed and tested for mental health support do seem promising, but mental health is such a complex topic that I doubt they will be very effective. In part, that’s because it can feel demeaning to be told to talk to a model instead of a person, especially if they’re charging you for it - and I am confident that pricing for access to such models will be similar to paying a person, despite the fact that it is much, much cheaper even when you factor in creation costs and energy usage correctly. There’s also the fact that mental illness as a concept has a huge social component that is almost completely ignored.

Administration assistance is a euphemism for reducing the workload of the power structure that limits access to healthcare. This will ignore systemic problems. No - it will create a new kind of systemic problem. Likewise, patient management means deciding arbitrarily who gets what treatment, rather than basing it on evidence.

Again, these things are not absolutes. Things I’ve listed as harmful can be done positively, and things I’ve presented as positive can have bias introduced to them. The things I’ve separated out to reflect my uncertainty are things that can easily include lots of bias or very little bias depending on implementation. In my opinion, they are the least predictable. That’s why I put them there.

I guess I got a little off-topic by focusing so much on medicine there, but all of this is a part of how AI is a Bias Enhancing Machine.