
Su Lin Blodgett on Creating Just Language Technologies

In this episode, Microsoft Senior Researcher Su Lin Blodgett explores whether you can use AI to measure discrimination, why AI can never be de-biased, and how AI shows us that categories like gender and race are not as clear cut as we think they are.


Su Lin is a senior researcher in the Fairness, Accountability, Transparency, and Ethics in AI (FATE) group at Microsoft Research Montréal. She is broadly interested in examining the social and ethical implications of natural language processing technologies, and develops approaches for anticipating, measuring, and mitigating harms arising from language technologies, focusing on the complexities of language and language technologies in their social contexts, and on supporting NLP practitioners in their ethical work. She has also worked on using NLP approaches to examine language variation and change (computational sociolinguistics), for example developing models to identify language variation on social media. She was previously a postdoctoral researcher at MSR Montréal. She completed her Ph.D. in computer science at the University of Massachusetts Amherst, working in the Statistical Social Language Analysis Lab under the guidance of Brendan O'Connor, where she was also supported by the NSF Graduate Research Fellowship. She received her B.A. in mathematics from Wellesley College. She interned at Microsoft Research New York in summer 2019, where she had the fortune of working with Solon Barocas, Hal Daumé III, and Hanna Wallach.



READING LIST:


SL Blodgett, S Barocas, H Daumé III, H Wallach, "Language (Technology) is Power: A Critical Survey of 'Bias' in NLP"


SL Blodgett, L Green, B O'Connor, "Demographic Dialectal Variation in Social Media: A Case Study of African-American English"



SL Blodgett, G Lopez, A Olteanu, R Sim, H Wallach, "Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets"


SL Blodgett, J Wei, B O’Connor


A Field, SL Blodgett, Z Waseem, Y Tsvetkov, "A Survey of Race, Racism, and Anti-Racism in NLP"


Image Credits: Teresa Berndtsson / Better Images of AI / Letter Word Text Taxonomy / CC-BY 4.0


TRANSCRIPT:

KERRY MACKERETH:

Hi! We're Eleanor and Kerry. We're the hosts of The Good Robot podcast, and join us as we ask the experts: what is good technology? Is it even possible? And what does feminism have to bring to this conversation? If you wanna learn more about today's topic, head over to our website, where we've got a full transcript of the episode and a specially curated reading list with work by, or picked by, our experts. But until then, sit back, relax, and enjoy the episode.


ELEANOR DRAGE:

In this episode we talk to Su Lin Blodgett, a researcher at Microsoft Research in Montreal, about whether you can use AI to measure discrimination, why AI can never be de-biased, and how AI shows us that categories like gender and race are not as clear cut as we think they are. I hope you enjoy the show.


KERRY MACKERETH:

Brilliant. So thank you so much for joining us here today. So just to kick us off, could you tell us a bit about who you are? What do you do? And what's brought you to thinking about gender, race and technology?

SU LIN BLODGETT:

Yeah, thanks. Thanks so much for having me. My name is Su Lin Blodgett. So I'm a researcher at Microsoft Research in Montreal, where I work on the ethical and social implications of language technologies, so I am working towards more equitable language technologies. So I'm really interested in how we anticipate and uncover harms arising from these technologies, how we can involve more people in developing them, and how we can support practitioners developing these technologies in their ethical work, in their ethical efforts.

ELEANOR DRAGE:

Fantastic. Thank you. So we're The Good Robot. And we ask everybody, what is good technology? Is it even possible? And how can feminism and anti racism help us get there?

SU LIN BLODGETT:

Oh, yeah. What a big and important question. Does anybody answer that question without saying, what a big question first?

ELEANOR DRAGE:

Only one or two!

SU LIN BLODGETT:

So I don't think there's a single way to characterise good technology. But I do think there are some things that need to happen to make technology good. So first, and most importantly, I think technology has to be developed consensually and democratically. So I think this means like with people's full knowledge and full consent, with full participation in design and development and deployment. So I think this goes beyond just doing user studies where we have technology and we ask whether people like it, right? This ideally means fully sharing power, where people are fully able to decide whether technology should exist at all, and if so, what problem it should be solving and what a good solution looks like. I think it's really hard to practise because it requires technologists to step back and de-centre ourselves and our ways of thinking about the world. I think good technology also involves unpacking our assumptions. So we usually have a lot of assumptions when we develop technology, and good technology involves taking a good hard look at them. So for example, even the assumption that the technology is beneficial at all, right? So good technology involves looking at the benefits that we're claiming, and checking if these benefits are actually realised, and if so, how they're distributed across people, right? Do some people get the benefit of technology more than other people? Or even our assumptions about what a good solution looks like. So for example, I think we tend to default assume that inclusion is a good thing, for example, like if I have a translation system, that including more languages is inherently always a good thing. And often it is - machine translation is hugely beneficial for so many people. But there are risks that can come along with inclusion, right. So for example, making languages available for translation might also make them and their speakers available for increased surveillance, right, or might mean making a community's cultural and linguistic resources available to the wider world in ways that they might not want. Or, depending on who develops the data, it might mean that somebody who's not from the community profits from these cultural resources or something. So even things that we kind of default assume are good, like inclusion, are surprisingly fraught. I mean, maybe unsurprisingly, given the state of the world, right. And so I think good technology involves kind of taking a look at our assumptions, figuring out what they are, and making them explicit and kind of unpacking them. And these are things that feminism and anti racism can help us with, right, like all this work - feminist, anti racist, decolonial, disability, so on and so forth - it helps us understand the social and historical context we're developing this technology in, right, like, what's the water that we're swimming in? Who has power? Who doesn't have that power? Who is developing technology, and what the likely outcomes are? That kind of thing.

KERRY MACKERETH:

That's really fascinating. And I really love the way that you say we really need to start almost completely from the bottom and question, you know, is this even a good thing we're working towards? Is inclusion always the value that we should be supporting and pursuing with technological innovation? I also liked how you talked about feminism and anti racism providing tools for us to be able to ask those kinds of critical questions. But something that people get quite excited about is the possibility of technologies helping us with feminist and anti racist aims. So something I wanted to ask you is, do you think it's possible to quantitatively measure forms of stereotyping?

SU LIN BLODGETT:

Yeah, this is a great question. I'd like to think so. So language is, you know, clearly one of the most important ways we transmit stereotypes, both in the way that we name social groups and also the ideas that we transmit about them, right. So, for example, if I'm talking about gender and I say 'both genders' or 'opposite genders', right, I'm already conveying ideas about gender categories and their relationship to each other. So I'd like to think that we can automatically pick up these patterns in language, either in, like, the training data that our systems are trained on, or in the language that the system is generating. So I think it'd be very nice to be able to do this. I think doing so is really hard. So there are so many ways that stereotypes - ideas about people and what they do or what they can do - there are so many ways that these emerge in language, and they can be really hard to pick up in an automatic way. So I have an example. This is an example my friend gave me when he was playing around with one of the large language models where you kind of put in a prompt and automatically generate some kind of continuation. So the prompt of the model was very general, it was “a Black woman opened the door”. And the language model generated this output that had a description of the woman and then a whole dialogue between the woman and some other person. And the description says something like, “her hair is a mess. Her clothes are unkempt, her eyes are red and swollen”, and in the dialogue she comes off as really rude or even aggressive. So she's asked about the whereabouts of some person and she refuses to give any information. And she says “how the fuck should I know, get out of here”. So like, there's a lot of ideas about Black women conveyed here, right? Like she's physically a mess. And she's aggressive and angry. And she's profane. And these are all ideas that are routinely conveyed about Black women. And these are things that I think we should not want language models to output by default, anytime you mention a Black woman. But it's also really hard to capture the stereotyping in just a few words that you could go look for in an automatic way in your data, because these harmful ideas aren't just emerging from just a few words, necessarily, but they come out over the course of this, like, fairly extended dialogue, right? And your sense that this person is aggressive or rude or whatever emerges from the kind of complicated things about dialogue that humans are good at picking out, right, but there's not a single word that says this woman is aggressive. So I'd like to think that in principle this is possible to do in an automatic way. But I think if we want to do that, we're going to have to be firstly really clear what we're looking for, right? So in this case, perhaps stereotypes about Black women being aggressive - we're going to have to, like, name explicitly what kinds of problematic ideas we're looking for. And then we would need approaches for picking up on how these might emerge more subtly in language, perhaps, than kind of, like, you know, particular individual words. That's really hard.
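To make the difficulty concrete, here is a deliberately naive sketch of the word-list style of detection Su Lin describes. It is a minimal illustration under invented assumptions - the cue lexicons and the example text are placeholders, not drawn from any real system - showing how a keyword check can flag individual words while missing the discourse-level stereotyping that only emerges across an extended dialogue.

```python
# A deliberately naive lexicon-based check. The word lists are invented
# placeholders, not from any real detection system.
AGGRESSION_CUES = {"aggressive", "angry", "hostile"}
PROFANITY_CUES = {"fuck", "damn"}

def flag_stereotype_cues(text: str) -> dict:
    """Flag individual words that appear in the (hypothetical) cue lexicons."""
    tokens = {t.strip(".,!?\"'").lower() for t in text.split()}
    return {
        "aggression_terms": sorted(tokens & AGGRESSION_CUES),
        "profanity_terms": sorted(tokens & PROFANITY_CUES),
    }

generated = ("Her hair is a mess, her clothes are unkempt. "
             "'How the fuck should I know? Get out of here.'")
print(flag_stereotype_cues(generated))
# {'aggression_terms': [], 'profanity_terms': ['fuck']}
# The profanity is caught, but nothing here captures that the dialogue as a
# whole portrays the character as rude or aggressive - that reading emerges
# from discourse-level cues that a single-word check cannot see.
```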

ELEANOR DRAGE:

Yeah, it's really interesting. I think that, you know, coming from a humanities background, I think, you know, we have the assumption that you can't, you can't quantitatively measure this kind of thing. But it gets you to ask some really profound questions, actually, and probably some really difficult questions. And that may be a really useful thing for technologists to do.

SU LIN BLODGETT:

One thing that occurs to me, right, is that these stereotypes are transmitted from person to person, like, you know, over the course of our lifetimes, when people pick these ideas up, right. So there must be clear patterns in the language for people to pick them up and learn them, and then transmit them again afresh. That's how these ideas persist, right. And so for me, the question is, is that a signal that we could pick up on automatically? I'd like to think there sort of is, because how is it that humans kind of learn these ideas and transmit them so effectively?

ELEANOR DRAGE:

Yeah and it may tell us something interesting about how that process has happened over time. I think what I was gonna say is, can you tell us what language models are and what language technologies are, and where we might encounter them?

SU LIN BLODGETT:

Yeah, for sure. So language technologies is an umbrella term that I use - okay, this is really gonna sound ridiculous, but I promise I'll give examples - for technologies that have to do with human language. So nowadays, these are typically driven by AI and machine learning. These include technologies for speech and for text. There's also work on sign language as a modality, but I think it's far more limited, less common. So my background, my current work, is focused on text rather than speech. And language models - you mentioned also language models specifically - are a particular kind of language technology; they are models that have been trained to, well, it depends on how you frame the problem, but they're generally trained to learn probability distributions over language. And so when we think of language technologies, there are some applications that we probably encounter relatively routinely and we might actually actively seek them out. So here I'm thinking of speech to text, right, on your phone. I'm thinking of predicted replies when you're texting or emailing, right, like a suggested next word, or something that suggests some kind of reply, or machine translation or automated captioning systems, right. There are also some applications, I think, that we might not be so thrilled to encounter, we might encounter them a little more involuntarily. Here I'm thinking of customer service chatbots, right, or automated phone trees, where you're using it a bit involuntarily, but at least you're aware that you're using it. And then, I think most concerningly, there are applications that we might encounter where we're not even aware that they're in use at all, or we get no say in how they're used or what the outcome is. So here I'm thinking of automated resume screening. So a company might be using a system that automatically scans resumes or CVs to decide who to interview, but this is all behind the scenes. So you as an applicant, you don't know how this is being used or how it affects your chances. Or this one is great: in 2019, there were some articles about US Citizenship and Immigration Services. So this is the federal agency in the US that decides which immigrants to admit to the country. In their internal manual, officers were instructed to sift through non-English social media posts of refugees and to use free online machine translation services like Google Translate. Or there are increasingly companies selling student monitoring tools, like in the US, where these will be bought by entire school districts. And the tools themselves are looking at any text written by students at the school, like in G Suite or Office 365. So nominally, these are to protect students from harming themselves or others. But in practice, this is large scale surveillance that kids and parents might not even be able to opt out of. Right, so. So although language technologies take many, many forms, they're increasingly capable, or at least they're being seen and marketed as increasingly capable. And while they can potentially be really beneficial, they're also increasingly pervasive in ways that we as a society might not be super thrilled about.
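As a rough illustration of what it means for a language model to be trained on probability distributions over language, here is a minimal sketch assuming the open-source Hugging Face transformers library and the small, publicly available GPT-2 model - an assumption for illustration only, not the specific systems discussed in the episode. It shows the two sides mentioned above: scoring how probable the model finds a piece of text, and sampling a continuation from a prompt, which is where stereotyped outputs like the one described earlier can surface.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library and the
# small public GPT-2 model (illustrative assumptions, not the systems above).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def total_log_prob(text: str) -> float:
    """Roughly how probable the model finds this text (higher = more probable)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)              # mean next-token cross-entropy
    return -out.loss.item() * (ids.shape[1] - 1)  # approximate total log-probability

# Scoring: the model assigns every string some probability.
print(total_log_prob("The cat sat on the mat."))
print(total_log_prob("Mat the on sat cat the."))   # usually scores much lower

# Generation: sampling a continuation from the same learned distribution.
prompt = tokenizer("A Black woman opened the door", return_tensors="pt").input_ids
output = model.generate(prompt, max_new_tokens=30, do_sample=True, top_p=0.9,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```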

KERRY MACKERETH:

That is really, really important work that you're doing, kind of illuminating the ways that these technologies are getting rolled out, in maybe some expected, but also a lot of unexpected and unknown, ways. Something I want to ask you about on that note was the political choices that are made around language and which speakers get prioritised in the production of these technologies. Because we know that language is really, really deeply political. And so how do you see those political commitments operating?

SU LIN BLODGETT:

Yeah, what a good question. Yeah. So when we're studying language or language technologies, we're always making these kinds of political commitments, I think, either implicitly or explicitly. So for me, I think it's really important to think about the fact that nothing we do when we develop or study language technologies, or, I mean, I guess, any kind of technology in general, is value neutral, right? So we're always making choices, either implicitly or explicitly. And these choices live in social contexts. So the question, I guess, for me is, what kinds of commitments are we making when we make these choices? So I can give some examples. So one example that I've been really interested in thinking about recently is, for example, if you are deciding how to evaluate your language technology, you might decide to test it on grammaticality: is the speech or text produced by your system grammatical? Well, what's interesting about this is that what is grammatical is historically contested, right? So there are many speakers whose language is often considered ungrammatical, right? So in the US, this includes speakers of African American English - this is a variety also known as African American Language - speakers of Appalachian English, right, and so forth. And many people still see one of the primary functions of school, right, as teaching, like, the grammatical or standard variety of the language. Right, so if you test for grammaticality, you have to define what grammaticality is. And then you're making a claim about whose language counts as good language, right? Or which speakers count as good speakers. If you are deciding to include minoritized language varieties, right - let's say you want to make sure your system performs just as well in African American English as in mainstream English, which is a good thing, right? - even those choices can be fraught, right? So for the purposes of this test, you have to decide what counts as African American English or not, because you need a pile of language that's African American English and a pile of language that is not, so you can test for performance differences between the two. But what counts as African American English, or which speakers count, is also a fraught choice for historical reasons. And there are oftentimes, I think, where people will disagree on what language technology should do. And people really have different perspectives. And one thing I think we as technologists have to do is really think about how to navigate that kind of dissensus, right, and which perspectives count, or which perspectives to weight more heavily or prioritise. So for example, suppose you're developing a hate speech detection system, right? The goal is you take in a piece of text and the output says whether it's hate speech or not hate speech. Say you encounter the phrase “all lives matter”, right. In the US, there are people who will say this is not hate speech. There are also people who will say it is, because it's used to downplay or shut down anti-racist speech, right? So as a designer of a language technology, you've got to make a decision: does it count as hate speech or not? And so you're unavoidably, right, deciding whose perspective counts, right? Or whose perspective you're upweighting here. And it's tough, because a lot of times we're accustomed to thinking about languages as objective and neutral objects, right? Like, like something is in French or isn't in French, like the boundaries are really clear, right? Or, you know, what is hate speech or not hate speech?
We think it should be obvious. But the fact is that these things are not always obvious, like the boundaries between language varieties are really porous. People will often disagree on what a system should do or not do. So I think it's hard but necessary to think about these kinds of decisions that we're making, as we're kind of developing the language technologies, even when we're doing really good things like testing them for kind of problematic behaviour, bias or harm.

ELEANOR DRAGE:

It's so nice to have these really fantastic examples of how really, really complicated debates have to be boiled down to binary answers - yes or no - when it comes to programming. And I am so glad that people like you are doing that hard work, because those debates are so difficult to grapple with even without having to transform them into choices that can be engineered. I next wanted to ask you about tech workers who really want to get involved in ethical work, or think ethically about these kinds of choices, but aren't really sure how to. It's a big question, so it depends what kind of ethical work, I guess, we're talking about. But what's your take on that?

SU LIN BLODGETT:

Yeah, I guess I've been thinking a lot about how researchers can support practitioners, right? Because I think as much as we'd like to live in this sort of ideal world where the power to develop technologies is meaningfully shared, and tech workers can really decide what they want to work on, and we have all the time in the world to design and to assess potential harms and this kind of thing, I think it's not the world that we have. And so I do actually think the research community has a lot of work to do in thinking about the barriers and the constraints and the incentives that practitioners experience. It doesn't really directly answer your question as to what tech workers should do, but I do think there's a lot of work researchers can do to kind of complement the work the practitioners are doing and support them in their work. So I think it's important to understand realistically, right, what practitioners are working with, what they're able or not able to do, so we can actually develop measurement and mitigation approaches that are useful, right. I do also think that - and there's a lot of very thoughtful work in different places talking about this - we should lower the barrier to doing ethical work as much as possible. So making it as easy as possible for people to document data and models, to anticipate potential problems, right, to test them. It might even mean making these sorts of things available to the public, right, like making audits available to the public and making it apparent when these technologies are being deployed, so that people can sort of community audit them, because sometimes nothing puts pressure on tech companies to do the right thing like public pressure. This means making measurement approaches and mitigation approaches available, like benchmark datasets, or techniques for kind of anticipating or measuring issues. And researchers need, if you're developing these kinds of approaches, to be as clear as possible about the possibilities and limitations of these things. So if you're putting a benchmark into the world and you say, you can use it to measure this problem, be clear about what it actually tells you, what it can help you diagnose, and what it can't, right? So people can really use them in a meaningful way and know what they're doing with it. I also think researchers can do a lot of good by changing incentives in the research community, right? Because I think these kinds of things, like the datasets and the models developed by researchers, get used or adapted by industrial practitioners. And so I think there's a lot of good that we can do by changing how we do things, and I also firmly believe that nothing will change unless we put incentive structures in place to make things change, right. So we ought to have incentives, say, to publish thoughtfully created datasets that don't have demeaning swimsuit pictures of women in them, for example, right, or to be required to talk about ethical concerns in the things that we write, right, to make these things available as a topic of discourse in the research community, right, and for the public. Yeah, but what a good question. It's hard.

KERRY MACKERETH:

Absolutely. And I love the way that you've said, you know, what can we as researchers do to kind of really support people in this ethical work? And how can we help make that bar lower? Because I think it is, you know, a collective project that requires everyone in this wider AI ecosystem to be involved if we really want to make meaningful changes around the production of ethical tech. And this question, and your answer, is, you know, one of great personal relevance to Eleanor's and my work, because we work with a big tech company to think about how tech practitioners on the ground are engaging with questions in AI ethics. And one of the things we look at a lot is bias. And you've done some fantastic work with your co-authors, and you've shown that computer scientists' motivations for trying to eradicate bias from AI, which is a major AI ethics precept at the moment, are often quite vague and inconsistent. So why is that? And what is the current state of the de-biasing landscape?

SU LIN BLODGETT:

Yeah, good question. So to be clear, I want to start by saying I don't want to diminish all this work on bias that's emerged, right? I think it's laid a lot of really important groundwork for what we know about how systems can go wrong, right, uncovering all the ways that systems can go wrong. So I really don't want to diminish all the work that people have put into identifying bias. That said, I think often when we're using terms like bias to describe some kind of undesirable system behaviour, we're often not thinking critically about what we mean by bias, right; we're not explicitly justifying what we think is harmful or for whom. And I think when you have a vague or inconsistent notion of what bias means, this might lead to a lot of other problems. So it might mean that we collectively, right, as a field, or a research community, or the public, might have a hard time explicitly debating what we think is concerning or harmful, because you're not actually making that available as a topic for, like, explicit discussion, right? It might mean that it's really hard to assess whether any given de-biasing approach is appropriate or useful, right, because you haven't clearly identified the problem it's supposed to fix. As a side note, I'm really not a fan of the word de-bias, either, because it suggests that you can fully remove bias, which is not possible. It might mean that when you have multiple, like, gender bias or racial bias approaches sitting side by side, it's actually hard to know whether they're picking up on the same concern, right, and whether you should be comparing them or not. I think it means it's hard to give practitioners any guidance for what to look for, right. So, like, 'Be on the lookout for bias' is not very concrete advice! I think it's much more clear if you say, be on the lookout for, you know, occupational gender stereotyping, right? This is what it might look like - and then you can kind of develop tests for this kind of thing, or whatever, whereas it's much harder when you start from the place of bias. So that was what we were concerned about in this paper that you mentioned. And in our paper we were also encouraging work in this area to engage really explicitly with what we think bias is, right, ideally by engaging with literature outside NLP, right, and with people affected by these technologies. To your question about the landscape, I think the landscape is moving, I think slowly, but it is moving in this direction. We're not yet, I think, at the point where everyone developing language technology is thinking critically about bias. But thanks to all this work on bias, we do have really ample evidence, I mean, more than enough evidence, of how language technologies can go wrong, enough so that there's really no longer an excuse for not thinking about it, right. So I think, you know, it just no longer holds water as a reason not to have worried about these concerns. And I do think the work on bias itself is gradually starting to engage more critically, right? I think there's an increasing amount of work that's taking a look at existing bias measurement approaches and saying, okay, hang on, is this actually what we want? Is it especially useful? What's the underlying concern it's picking up on? So I think there's a lot of work, there's a lot of work still to do. But today, at least, I'm cautiously optimistic about a lot of the work that I've seen kind of emerging recently.

ELEANOR DRAGE:

Yeah, amazing. For listeners who haven't come across examples of how tech workers try and 'remove bias', can you give us a couple of examples of how that works, what they might do to try and remove bias?

SU LIN BLODGETT:

Yeah, yeah. Um, so this can happen at a lot of different stages of the NLP pipeline. So you can imagine that a typical kind of pipeline for developing language technology, right, will often involve gathering, monitoring and documenting the training data, and then you train your model, right, and then you evaluate it, in a cycle, because ideally you're kind of iterating. And so often people try to intervene, people have tried to intervene, at different places in this pipeline. So sometimes people will try to mitigate bias by changing things in the training data, right? For example, if your training data skews heavily towards - if it mentions men far more often than women, maybe you'll try to, like, add new sentences that talk about women, right? Or something like that. Or sometimes you will intervene in the model, right? So you might change the objective function for the model to try to get it to learn something different, or you might try to poke around at what the model has learned. So for example, in a lot of modern machine learning NLP technologies, you might have words that live in a vector space, where you have a continuous representation of each word, so every word, right, is like a 300-dimensional vector, right? And so a lot of people will look at the geometric representation and say, okay, like, are all the occupations that are stereotypically feminine clustered together in the space? And maybe try to manipulate how these words are represented, so they're not all clustered together in this way where I find the geometry undesirable. Sometimes people will intervene more at the moment of output, right? So people might have lists of things that the technology, like, shouldn't output, or otherwise kind of try to restrict what technologies will do. So there are a lot of places where people will kind of try to intervene in what a language technology might do or not do.
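To make a couple of these interventions concrete, here is a minimal sketch under toy assumptions: the gendered swap list is a small invented example of the data-augmentation idea, and the 'embeddings' are random vectors standing in for a real model's learned word representations, used only to show the kind of geometry check described above, not any production de-biasing method.

```python
import numpy as np

# (1) Counterfactual data augmentation with a tiny, invented swap list.
GENDER_SWAPS = {"he": "she", "she": "he", "his": "her", "him": "her",
                "father": "mother", "mother": "father"}

def augment(sentence: str) -> str:
    """Return a counterfactual copy of the sentence with gendered words swapped."""
    return " ".join(GENDER_SWAPS.get(word.lower(), word) for word in sentence.split())

print(augment("he asked his father for advice"))
# -> "she asked her mother for advice"
# ("her" is deliberately left out of the swap list: it maps back to either
#  "his" or "him", so even this simple trick has fiddly edge cases.)

# (2) A toy version of the embedding-geometry check: random vectors stand in
# for a real model's learned representations.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=300) for w in ["he", "she", "nurse", "engineer"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

gender_direction = emb["he"] - emb["she"]
for occupation in ["nurse", "engineer"]:
    # With real embeddings, a consistently signed score here is often read as
    # occupational words clustering along a "gender direction" in the space.
    print(occupation, round(cosine(emb[occupation], gender_direction), 3))
```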

ELEANOR DRAGE:

Yeah, thanks. I guess it's really good to see this vast array of work, because when we were interviewing engineers at this big tech company, and we were asking them, you know, what is de-biasing? How do you do it? - people were responding, quite understandably, talking about bias as a mathematical concept, because that's what they were used to, that was their frame of reference, rather than how Kerry and I, you know, got used to it through AI ethics. So you can see how, if you go into trying to de-bias things with a different definition of what bias is, you might come up with some extremely different kinds of work. So, we know that gender bias is often the default kind of bias when we talk about de-biasing. People think about gender bias immediately. And that reflects how, in the world - in universities, for example, where Kerry and I work - there's the Gender Studies department, and it is the umbrella for everything else - well, for the other axes of oppression like race and disability and ageism; that's where that work happens. So why do we think this is? And really, what's the problem with engineers thinking that gender - specifically binary gender - is easier to tackle than, for example, race or disability?

SU LIN BLODGETT:

Yeah, this is such an interesting question, and also one that I would love, in return, to pick your brains on, because you are, you know, scholars of gender and technology. Yeah, I guess my first thought is, when we are trying to, say, measure for bias, right, related to different social groups, when you're creating your measurement you're usually relying on some pieces of language being proxies for social groups, right. And I think sometimes these proxy relationships seem more straightforward than other times. So I think binary gender in English is seen as straightforward, because then maybe all you have to do is collect gendered words like pronouns, or relationship terms like mother or father, right, and then you see how your system behaves with these words. And for these words, the relationship between the words and the social categories women and men seems really straightforward. Like, mothers are always women, fathers are always men, like, easy. And so your method now assumes that you have words with this kind of straightforward relationship, and then you're done, right? And then you say this can be extended to race, and then you're very happy. And for example, coming back to data augmentation, right, you might augment your training data so that every time you see sentences with he or him, you create a sentence with she or her, and you say, this is my solution for creating better training data. And the way you augment your training data seems very straightforward. I think this breaks down really quickly when you try tackling other social categories, because the relationships between, like, words and those social groups can suddenly appear fuzzier, right - like, all of a sudden, it's more complicated. For example, there aren't pronouns in English that have to do with race or disability. So now we're in the business of, like, picking names that you think are good proxies for race, right? Or language that describes disability, right, because you want to see how systems behave with this kind of language. And then doing something like data augmentation starts to feel really fraught, right? Like, do I just replace every instance of white in my dataset with another racial group? Or every name that's prototypically white with some other name, right? What does it even mean to have a good augmented dataset, right? People think of these as counterfactuals, but, like, what does it mean to have counterfactuals in this sense? Especially because, like, people in the world, right, are not treated equitably, right, don't exist in social structures, in social hierarchies, in the same ways. And so you're thinking a lot about how you would create this kind of change. And then I think - I'm guessing, but I'm imagining - that what people realise as they do this is that actually all these social categories are fuzzier and more complicated than they thought, like categories like race are socially constructed and contested, and the relationships between language and social categories are always shifting, and that even categories like gender, shockingly, are also socially constructed. And then they don't know what to do. So it doesn't feel like a satisfying answer, and I'd really love to know what you think. But I think the upshot is that we have a lot of methods developed for binary gender that make a lot of assumptions about the relationship between languages and categories.
And these assumptions possibly don't even hold for gender. But they break down really, really quickly when you try to port them elsewhere. That makes for a really interesting state of play with bias mitigation approaches.

KERRY MACKERETH:

That's so fascinating. And that's something, definitely, I think Eleanor and I can really resonate with, because one issue we've had is when we tried to, for example, measure gender in representations of scientists in film: we have methods that we've developed around gendered pronouns and around character descriptions that work, but then as soon as we tried to measure, you know, other kinds of protected characteristics based on that methodology, it just stops working, it really, really doesn't work. And so, you know, everything you're saying totally resonates with our experience as well. Um, you're such an incredible wealth of knowledge, and I feel like we could keep on talking forever. But sadly, we've already come to the end of this episode. I feel like you've answered so many questions and also raised so many more. I just want to say a huge thank you from both of us, it's really been such a pleasure.

SU LIN BLODGETT:

Thank you so much it was such a pleasure.


ELEANOR DRAGE:

This episode was made possible thanks to our previous funder, Christina Gaw, and our current funder Mercator Stiftung, a private and independent foundation promoting science, education and international understanding. It was written and produced by Dr Eleanor Drage and Dr Kerry Mackereth, and edited by Laura Samulionyte.



