
Pedro Oliveira on Voice Recognition Technologies and Border Control

Updated: Mar 17, 2023

In this episode we talk to Pedro Oliveira, a researcher and sound artist based at the Akademie der Künste in Berlin. Pedro does amazing work investigating border control technologies that listen to asylum seekers and claim to be able to discern where they come from based on the way they speak. We discuss why these kinds of technologies rely on the assumption that there is an authentic way that a migrant from a particular place should sound. Our quest to unravel vocal authenticity takes us through frequency, timbre, and 1960s synthesisers from East Berlin.

Pedro is a researcher, sound artist, and educator committed to an anticolonial study of listening within the violence of border work. He has previously worked as a postdoctoral fellow at the Helsinki Collegium for Advanced Studies; as a lecturer in Musicology and Media Studies at the Humboldt-Universität zu Berlin; and as a teaching and research associate in Media and Cultural Studies at the Heinrich-Heine-Universität Düsseldorf. Artistic residencies include the Max Planck Institute for Empirical Aesthetics in Frankfurt, and IASPIS and EMS, both in Stockholm. He is a founding member of the research platform Decolonising Design; he holds a PhD from the Universität der Künste Berlin, and an MA from the Hochschule für Künste Bremen.


Reading List:



Transcript:


KERRY MCINERNEY:

Hi! I’m Dr Kerry Mackereth. Dr Eleanor Drage and I are the hosts of The Good Robot podcast. Join us as we ask the experts: what is good technology? Is it even possible? And how can feminism help us work towards it? If you want to learn more about today's topic, head over to our website, www.thegoodrobot.co.uk, where we've got a full transcript of the episode and a specially curated reading list by every guest. We love hearing from listeners, so feel free to tweet or email us, and we’d also really appreciate you leaving us a review on the podcast app. But until then, sit back, relax, and enjoy the episode!

ELEANOR DRAGE:

In this episode we talk to Pedro Oliveira, a researcher and sound artist based at the Akademie der Künste in Berlin. Pedro does amazing work investigating border control technologies that listen to asylum seekers and claim to be able to discern where they come from based on the way they speak. We discuss why these kinds of technologies rely on the assumption that there is an authentic way that a migrant from a particular place should sound. Our quest to unravel vocal authenticity takes us through frequency, timbre, and 1960s synthesisers from East Berlin. We hope you enjoy the show.

KERRY MCINERNEY:

Brilliant. So thank you so much for joining us here today. Just to kick us off, could you tell us a bit about who you are and what brings you to working with technologies that make and interpret sound from decolonial and feminist perspectives?


PEDRO OLIVEIRA:

Hi, my name is Pedro Oliveira, I'm originally from Brazil, but I live and work in Germany. I came to this topic, I mean, really, like, I’ll quickly say that I wrote my PhD thesis about the relationship between sound and police violence in Brazil. But I wrote it here in Germany, and because I wrote it from a so-called decolonizing perspective, I often got asked, or, you know, it was often implied in the kinds of questions that I got, that these kinds of things only happen over there, like across the ocean, back, you know, in the Americas, and so on. And I got a little bit annoyed with that. So I started looking at, you know, the consequences of coloniality that were happening within the borders of Germany. And that led me to study the technologies that have been deployed in the border and asylum systems of the country from the same perspective.

ELEANOR DRAGE:

And I'm so glad you did, because it's been hugely influential to my research looking into technologies of the border, and the idea that the border is encroaching on us, so that everybody existing within the country, even without going to the border, is still forced to reckon with bordering technologies. So even though you look a lot at bad technology, or things we might want to think of as really unfriendly, what is good technology? Is it even possible? And how do decolonial or feminist ideas help us work towards it?


PEDRO OLIVEIRA:

First of all, thanks for the influential thing. It's always really interesting to hear how the work resonates. It's a tough question, I think, because I don't think any technology is inherently good or bad. My work appears both as academic and artistic, depending on the commissions that I get, usually. But I think, as an academic, as a researcher, my first impulse is to analyse the relations and the networks and the web of connections and histories that enable and allow for certain technologies to exist. On the other hand, historically, we have seen ways in which technologies have been purposefully misused in order to advance, you know, anti-racist, anti-fascist, feminist and decolonizing agendas. So to ascribe a certain value to technology in itself would be a different way of looking at it, which I'm not exactly in agreement with, because I think it's by going into the nitty gritty of how technologies come to exist that we figure out the ways in which they have been used, and the ways that they want to be used. And that also opens up the possibility of, you know, completely repurposing what they mean, and what they can mean, for different ends. It's almost a cliché, for instance, in media theory to say that most technology that we use in everyday appliances comes from military research, and so on, which is, you know, true, but at the same time, that does not immediately ascribe a certain moral judgement of good and bad to them; rather, it helps us understand the conditions in which certain technologies emerge. And so, you know, I'm kind of going around the question, but just to say that I think it is a matter of historicising technology in order to understand how it emerges, rather than just applying, you know, a measure of good and bad. I don't know if that even answers it, but that's how I would approach it, in a way.

ELEANOR DRAGE:

It definitely makes sense.

KERRY MCINERNEY:

Yeah, no, that's really interesting. And I think it's an interesting tension as well that emerges between the different people we have on the podcast: people who kind of say, well, fundamentally, the histories of these technologies mean that they can't be redeemed, we must refuse them, just completely reject them. Whereas we have other guests, people like Maya Indira Ganesh, who are interested in how we hack or jam or play with technologies in ways that give them different kinds of joyful and liberatory purposes. And so for our lovely listeners, we definitely recommend you check her episode out. I actually want to talk about one of your projects, where you look at something called dialect recognition software, which is used by the German Federal Office for Migration and Refugees to validate migration claims. I have to put my cards on the table here: my previous research was on immigration detention, and so I'm extremely concerned and sceptical about the use of these kinds of technologies to try and determine truth, especially because within the asylum process, truth is often portrayed as this really singular thing, which doesn't really map onto our experiences of the world and how complex those experiences really are. So could you tell us a bit about your work on this, what the Federal Office is doing, why you think these technologies don't work, and also why they're very dangerous for how we think about the asylum process?

PEDRO OLIVEIRA:

Yeah, I think a good way to start talking about it is exactly what you just said, this idea of the production of a truth. Right, and, you know, relating back to what I was saying in the previous question, I think a lot of trust is put on technologies to deliver something that is, you know, more accurate, or more objective, about something that is inherently chaotic, and, dare I say, even disorganised by nature, which is, you know, human beings and the web of relationships that constitute the world. So technology often comes with this narrative that seeks to put order into the world, right, because it's based on, I don't know, mathematics, or, you know, calculus and so on. And this is just one of many paradigms that could have been used for technological development; it's just the one that was chosen in the labs, you know, in Bell Labs, in IBM, in all these places. So that production of truth is of course always in the interest of advancing a certain agenda. So again, it's not the technology in itself that is good or bad, but rather the agenda by which it is applied, which in this case seeks to advance the enforcement of, you know, the borders of a nation state, which has economic and also racial configurations in order to exist, right. So what secures the nation state, especially fortress Europe, is a strongly racist and, you know, economic agenda, if you want to really summarise it. So the dialect recognition software comes as a way of validating those premises. In a way, it's not new in itself. Dialect testing has been used in border control and asylum systems since the 90s. But what was used then were linguists and so-called non-linguist native experts. And because these were basically human beings doing the job, it came under heavy criticism from linguists, from lawyers, because it is inherently bound to fail, right, like, from, you know, biases, prejudices - people are often racist, and, you know, also have their own agendas in that sense. So the replacement of that with a technological solution, with this kind of almost deus ex machina solution, you know, that comes to the rescue of the inherent bias of human beings, comes as a way, again, to validate the production of a certain truth. What is not said is, one, that these technologies are also very finicky at what they do, because they rely on annotated databases, they rely on algorithms that have been developed by companies, again, to advance certain agendas, you know, for commercial and also ideological purposes; but also the fact that these technologies are actually very new, and they don't work as they claim they do. And I am very sceptical that they can ever work in the way that they claim they do. But what the Federal Office does is to establish a certain leeway for what counts as truth. So a certain boundary, a very, very porous boundary, of what can count as truth. And that is enough for them to validate the claim that, you know, the software does what it's supposed to do.
Just to be really, really quick about how it works: before, when you had linguists doing this work, they would use recordings from the hearing of an asylum seeker, in which they come in front of the officers, the caseworkers, and have to tell the story of their journey, etc. And this was recorded and sent to third-party companies for the evaluation to be made. What the German Federal Office does now is that they schedule a specific meeting for the asylum seeker to come into the Federal Office and go into a room in which the software is set up - it's usually a telephone device - and they have to describe a picture. And this picture is usually a random picture. They don't disclose what this picture is, but, you know, I've heard from accounts that it's really like a whatever picture. And based on two minutes of a recording, the software gives a list of distributed probabilities of someone speaking specific languages. And then it's in the hands of the caseworker to decide whether or not that result can count as evidence for the decision on the asylum application. So even though they automate this process with software, it is still in the hands of the caseworker to decide the weight of this test in the evaluation of an asylum claim. So basically, what I usually say in my work is that it's making the process faster and slicker, but not for the asylum seeker. Instead it's making it slicker to take the weight off the Federal Office's shoulders, so that they can say, well, the computer is more objective, the computer makes it faster, so they can process more people. And of course, deport more people.
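To picture the kind of result Pedro describes: the output is not a yes/no verdict but a distribution of probabilities over candidate languages or dialects, which the caseworker then has to weigh as evidence. The sketch below is purely hypothetical - the labels and numbers are invented for illustration and do not come from the Federal Office's software.

```python
# Purely hypothetical illustration of a "distributed probabilities" result:
# the dialect labels and numbers are invented and do not come from the
# Federal Office's software. The point is the shape of the output - a ranked
# distribution that a caseworker must then decide how much weight to give.
hypothetical_result = {
    "Levantine Arabic": 0.62,
    "Gulf Arabic": 0.21,
    "Egyptian Arabic": 0.11,
    "Other": 0.06,
}

for dialect, probability in sorted(hypothetical_result.items(),
                                   key=lambda item: item[1], reverse=True):
    print(f"{dialect}: {probability:.0%}")
```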

ELEANOR DRAGE:

And this is what technology has been used for, right, to outsource xenophobia and racism in particular ways. What is kind of astounding about what you're saying is that the technology can be working fine - as in, it can be functional, it is not producing errors - but still not work, as in it doesn't do what you think it does. And what's amazing about you and other sound engineers, or other people who work with sound, is that, you know, people like me who know nothing about sound can say that's not right, that's funky, and not good, but I don't know anything about frequency and timbre, and you can show the absurdity of searching for an authentic place of origin or an authentic migrant somewhere in the breakdown of this voice. Right. So why are they looking - why is this Federal Office in Germany looking for authenticity in all the wrong places?

PEDRO OLIVEIRA:

I mean, just to look for authenticity is already, you know, again, bound to fail, right? Because who defines what authenticity is? And this is also, again, bound by the constraints of, you know, this fiction that is the nation state and the fiction that is borders. So, I mean, what you said earlier makes a lot of sense. You know, it's not that it's producing errors, because the Federal Office defines what counts as an error, right - because it's not giving, like, a yes or no, it's giving probabilities. And, you know, these probabilities are as subjective as the claim of authenticity itself. So, from a cultural perspective, this claim for authenticity is already wrong, right? What counts as an authentic inhabitant of a certain country, what counts as a native person from a certain country? You know, in the history of the African continent, for instance, we see how finicky this can be, because the borders were defined somewhere else - they were defined in Berlin, right? And when it comes to the reason why this software exists in the first place, which was because of the migration movements coming from the Levant region, and from the Gulf, etc., it also plays out on geopolitical issues: what counts as an authentic person from Syria, what counts as an authentic person from, you know, even from Palestine, which is not even recognised as a state by many of the countries, you know, who are actually doing this evaluation. So, that search for authenticity - they are looking for it, actually, in the body. And that's what's horrible about it, because, I mean, I'm not the only one saying this, but a lot of these machine learning and artificial intelligence technologies are basically Victorian science and racial science updated, right? So they are actually looking at how language resides in the physical characteristics of the body. So that's why one of the claims that I'm also making with my work is that even though they say they're measuring phonetics - and this is stated in writing by the Federal Office, that they are measuring phonetics - there is nothing in their process that validates that claim. And when you look at the methods that are used for speech and language recognition, like state-of-the-art research in computer science and machine learning, you have two methods to do that test. One is what's called the phonotactic approach, which measures phonetics, and the other is the acoustic one. For a phonotactic approach to be used, you have to have a textual transcription of the recording, which is not present in any of the documents that the Federal Office has disclosed so far. And it's also not present in the test result file that caseworkers and lawyers usually get. So everything tells me that they're using the acoustic approach. And what is this acoustic approach? It is actually measuring the resonant frequencies of the vocal tract. So it's considering the vocal tract to be a filter, and it's measuring the peaks using what they call mel-frequency cepstral coefficients - I'm not going to get into that, because I also have a limited understanding of what that is, but I'm, you know, looking more and more into it. But they are measuring these resonant peaks.
And their claim is that people that come from the same country and speak the same dialect - not even the same language, but the same specific dialect within the language - will have a physical constitution of their vocal tract similar to a large corpus of speakers. So it's basically measuring authenticity in the body. And if that doesn't scream racial science, I don't know what does, right. So that's one of the claims that they're making, because these resonant peaks are actually the measurements that are used, you know, in computer music, or in other applications of the same processes - it's used to play around with timbre, what in German is also called… So that's one of my hypotheses, that they are mistaking one thing for the other and making truth claims about timbre, which is also very elusive. And, you know, there's a wealth of scholarship that investigates how timbre is elusive and how there is nothing it can say about identity, because timbre is subjective - timbre is in the ear of the listener, as Nina Eidsheim would say.
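For readers who want to see roughly what "measuring resonant peaks with mel-frequency cepstral coefficients" looks like in practice, here is a minimal, generic sketch using the open-source librosa library. It is not the Federal Office's software; the file name, sample rate, and number of coefficients are assumptions chosen only for illustration.

```python
# Minimal, generic sketch of the "acoustic" approach described above:
# mel-frequency cepstral coefficients (MFCCs) computed from a short recording.
# This uses the open-source librosa library and is NOT the Federal Office's
# software; "speech_sample.wav" and the parameter values are placeholders.
import librosa
import numpy as np

audio, sample_rate = librosa.load("speech_sample.wav", sr=16000)  # mono, 16 kHz

# 13 coefficients per ~25 ms frame is a common convention in speech processing.
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13)

# Averaging over time collapses the recording into one fixed-length vector -
# the kind of representation a classifier could compare against a corpus of
# speakers, which is where the claim about the vocal tract comes in.
feature_vector = np.mean(mfccs, axis=1)
print(feature_vector.shape)  # (13,)
```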


KERRY MCINERNEY:

That's just so fascinating. And it's really terrifying, but also important, I think, to hear about this from the vocal perspective. Something that I researched was the way that scarring on people's bodies and different kinds of evidence of physical wounds were seen as a more objective kind of truth by the Home Office, and that was seen as a way of trying to find truth in the body. But yet, even then - this is the UK Home Office context I'm talking about - they would still find ways of kind of twisting that narrative around, saying, oh, well, this isn't really the truth. And so, you know, I thought that there was this impossible double bind of this obsession with authenticity, this obsession with a singular idea of the truth, and also a complete unwillingness to ever acknowledge, you know, the reality that this might be the truth, but it comes with a different kind of political outcome or imperative, and so you're just not willing to accept it. And so, you know, it's really, again, horrible but fascinating to hear about that transposed onto the voice, as a particular pseudoscientific kind of locus or outcome of the application of these technologies.

PEDRO OLIVEIRA:

Yeah. And also, what they are saying is that there is only one way of defining a migrant, right, which, you know, facilitates their work and also facilitates their claims to enact racist policies. So really they want to congeal this notion of a migrant, of what a migrant is. And, you know, what pains me to think about is what a colleague and good friend, Shahram Khosravi, says: that it is marked by the scar, as you said, but he says it's marked by the journey - you have to prove that you survived even though you’re not supposed to have. And this is what they say, that the marks of the journey are the proof that you shouldn’t have survived but you did, and this is what validates the “truth” for these federal offices.


KERRY MCINERNEY:

Yeah, absolutely. And for our lovely listeners as well, we'll put that on our reading list. It also reminds me, I think, of Nayak's work on the idea of the worthy victim and how US anti-trafficking law in particular is set up around this idea of worthy victimhood: you have to show not only, you know, that you suffered these particular things and survived these particular things, but also that you're this kind of virtuous victim, who is then deserving of protection, as opposed to an unvirtuous one.

ELEANOR DRAGE:

I would love to ask you both things about this as well. But the next thing I really want to make sure that we get time to talk about is the lo-fi or non-digital techniques that you use to investigate AI systems - it’s quite kind of archaic. I really like these attempts to look behind the logic of AI systems, slow them down, actually listen to what they are claiming to do. So can you give us some examples of how you do this? What kinds of lo-fi or non-digital techniques do you use?

PEDRO OLIVEIRA:

I mean, it's almost like something that I keep repeating, because I'm seeing that it's kind of becoming my thing, repeating this over and over. But, um, the thing is, I think - and that might even clash a little bit with my first answer about, like, you know, what technologies do - I don't think I can talk about the things that I'm talking about using the same methods. Because, again, it's the conditions, it's the historicity of these techniques that interests me, because I think that, you know, different pathways might be found when you look at what's behind them. And again, paraphrasing Denise Ferreira da Silva, a Brazilian philosopher who says the question is not ontological, it is methodological - it is not, like, what this is, but how it comes up with answers about what this is, right? So I'm interested in that. And for that, I feel that I have to look at the claims being made with different techniques and with different methodologies. So for me, this becomes a lo-fi thing - I say lo-fi because I'm not using machine learning, I'm not even using programming to create work, basically. And when I say that I'm slowing them down, it's because a lot of these calculations, a lot of these truth claims, are made in the frequency domain, right, in the mathematical domain, the domain of the Fourier transform, the domain of, you know, all these mathematical processes, spectral analysis, and so on. And this happens in a fraction of a second; this happens in the computer domain, right, in the calculation domain. And I'm interested in taking the same metaphors, the same things, and applying them to sound, to what I call the listening domain, which would be the time domain. And, you know, it's not only because the metaphor is interesting, but I also think that by exposing other ways in which the same concept can work, we might find different answers to the same questions, you know. So the lo-fi process becomes also an exercise in aesthetics for me, because, as I said, a lot of my work also comes out in artistic manners. And I also think that sometimes listening is able to touch things that, you know, discourse cannot. So I think theory has a limit - there is only so much you can do theorising about things; there are certain moments when you have to feel what's there in order to get a glimpse of, you know, what it's doing. So an example of what I do - I mean, it's also bound to my personal tastes, right - an example of what I do is that I use a lot of synthesis. You know, I play a lot with synthesisers in my work. And I try - and this is really a methodological self-constraint in my work - I try to use as many analogue processes as possible, and to think of electricity as a way of creating those connections in a different way than a computer would do. Because, like, electricity is also a little bit wild, and, you know, it's also a little bit unpredictable sometimes. So analogue synthesis is a good way of modelling, if you will, these processes in a different way. And it is not only the technique by itself - and I think that's important to say, because when I say I'm doing a lo-fi slowing down, it's not only from a technical standpoint, but I'm also looking at the long history of it. So when I say I'm slowing it down, I'm also looking at each process of the software and finding its long history.
So a lot of my work also taps into the long history of listening in Germany, to establish where these things come from. You know, this dialect recognition software doesn't come out of nowhere and just emerge as this, you know, magical solution. It has a long history, in an entire nation trying to make sense of otherness through listening. And this is, for me, the slowing-down part: that I'm looking at each part of the process of the software and saying, well, this comes from somewhere. And this somewhere is still more or less the same, but just enacted in a different way. The technical part is how I demonstrate it - I mean, it's not even demonstrating, but how I rehearse it, how I play it out, you know, how I use it as a kind of playground, a rehearsal space, for not only establishing these connections, but also thinking about other pathways, other directions that this could have taken. So that's the lo-fi and slowing-down thing that I usually claim is part of my process. It's also sort of tongue in cheek, you know, this lo-fi thing, because I also think there's a tendency to advance artistic work with the most state-of-the-art technique, which I think is fine, I'm not making any moral judgments about that. But I think there's a lot to be gained by looking at it from a really tongue-in-cheek, lo-fi, do-it-yourself, you know, perspective as well. And that's what makes sense for my own process and my own thinking.
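The two "domains" Pedro contrasts can be sketched in a few lines: the time domain is the signal as it unfolds and can be listened to, while the Fourier transform turns the same signal into magnitudes per frequency, the domain in which spectral analysis happens almost instantly. The example below is only an illustration, assuming NumPy and a made-up test tone rather than any real recording.

```python
# Illustration of the time domain versus the frequency domain, using NumPy.
# The signal is a made-up test tone, not any real recording.
import numpy as np

sample_rate = 8000                        # samples per second
t = np.arange(sample_rate) / sample_rate  # one second in the time ("listening") domain
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

# The Fourier transform moves the same signal into the frequency domain,
# where it becomes magnitudes per frequency - the domain in which the
# software's spectral calculations happen in a fraction of a second.
spectrum = np.abs(np.fft.rfft(signal))
frequencies = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

loudest = sorted(frequencies[np.argsort(spectrum)[-2:]])
print(loudest)  # roughly [220.0, 440.0] Hz
```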

KERRY MCINERNEY:

Absolutely. And I'd love to hear a particular example of that, because we know that the dialect recognition software uses a mel filter to vectorize input voices, and you found this extraordinary old synthesiser at the Akademie der Künste in Berlin that has a mel filter. And that got you thinking about synthesisers and dialect recognition together. And so we'd love to hear a bit more about this particular synthesiser and your thoughts on the dual use of technologies like this: this filter in particular as a racist and colonial technology when mobilised by the German Federal Office today, and also then as the synthesiser sound of 1960s East Germany?

PEDRO OLIVEIRA:

Yeah, I mean, this is work in progress, so everything that I will say is bound to change, you know, even by the time this is published. But I think the first thing that I would like to say is that there's nothing inherently racist in the filter, right - the filter itself is not racist, because I also think that, you know, we cannot impute human agency to technologies; they're always in relation to their use. It's always how racism emerges as an assemblage between the technology itself and its use in certain things. So filtering is the first step in the dialect recognition: it's what determines the vector space in which the test will take place, roughly speaking. And the state-of-the-art technique for that is this mel filter, which comes from this psychoacoustic phenomenon, or, you know, scale, that was developed in the 1940s and that seeks to approximate a measurement device, or a set of metrics, to the ways in which the human ear supposedly functions. I haven't yet found the exact reason why this is the most used technique in speech and dialect recognition. But funnily enough, this kind of filter, this kind of configuration - or filter topology, to use a more technical term - does not appear in musical instruments, at least that I know of, until I bumped into the synthesiser in the studios of the Akademie der Künste in Berlin. And that was really random, because I was not looking for that; it was just something the director of the music studio showed me, because it's a prototype, a one-of-a-kind synthesiser that was built in East Germany, and it has a very particular sound and so on. And when I looked at it, it was written that it had a mel filter, and I was like, wait, I have never seen this before. So, you know, coming back to what I was saying earlier, this connection, the speculative connection with the long history of listening in Germany, is what I'm trying to do with the work that I'm currently developing at the Akademie der Künste, in which I'm looking at the history of why this filter was put into that synthesiser, why that specific design, and, more importantly, why it does not appear anywhere else but in speech and dialect recognition software. So this gap is what interests me, because there's nothing inherently good or bad about that filter; it is not inherently more accurate or more precise than any other type of filter design, not for synthesisers nor for speech and dialect recognition. But it appears there, you know, and it's again another connection with Germany as a country, which is so far the only country using this software, and which kind of developed it as part of their asylum process. So, um, you know, creating - I'm describing it as a juxtaposition of these histories, and that's kind of what I'm trying to do with the work that I'm currently developing. And I can, you know, send you later some recordings that I've been doing with the synthesiser, and then you can put them in the recording, so, you know, people get a glimpse of how it sounds.
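For the curious, the mel scale itself is just a conversion formula: one widely used variant maps frequency in hertz to mels so that equal steps in mels roughly correspond to equal steps in perceived pitch, compressing the high end of the spectrum. The sketch below shows that generic formula only; it says nothing about the specific filter circuit in the East German synthesiser Pedro describes.

```python
# One commonly used mel-scale formula (several variants exist). It compresses
# high frequencies so that equal steps in mels roughly match equal steps in
# perceived pitch. Generic illustration only - unrelated to the specific
# filter circuit in the East German synthesiser.
import math

def hz_to_mel(frequency_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

for hz in (100, 500, 1000, 4000, 8000):
    print(f"{hz:>5} Hz -> {hz_to_mel(hz):7.1f} mel")
```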

ELEANOR DRAGE:

So thank you so much for joining us. These were terrific answers and an amazing insight into the work you do, which I will continue to follow really closely. For listeners, a lot of the sounds we're talking about, a lot of Pedro’s work, will be featured in the reading list. We'll create links, so you can go and check out his work yourselves. But from us, thank you so much for being on the show. It was brilliant.

PEDRO OLIVEIRA:

Thank you so much. I really appreciate the invitation and the opportunity to talk a little bit about my work. It's always a pleasure, and, you know, for me, it goes beyond me as an artist. I think these are important things that need to be talked about and need to be, you know, put forward and fought against.

ELEANOR DRAGE:

This episode was made possible thanks to our previous funder, Christina Gaw, and our current funder Mercator Stiftung, a private and independent foundation promoting science, education and international understanding. It was written and produced by Dr Eleanor Drage and Dr Kerry Mackereth, and edited by Laura Samulionyte.

