Can AI help us save endangered languages? - What in the World podcast, BBC World Service

BBC World Service

14m 1s | 2,346 words | ~12 min read

AI audio transcription

This transcript was generated from the video's audio because no usable YouTube caption track was available.

[0:00]kan K and I'm then going to say to it, can you translate this phrase and identify the language? So it's believed that we're going to lose about half of the world's languages by the end of this century without further intervention or protection. Around the world, languages are disappearing. And when they go, it's not just the words you lose. You can lose stories, songs, history, ways of seeing the world. And around half of the world's languages are now endangered, but some researchers think that AI could help preserve them. But at the same time, AI is also being accused of creating more language inequality because it is only trained on a handful of dominant languages. So, can AI stop languages from going extinct or will it just contribute to their demise? I'm Hannah and this is What in the World from the BBC World Service.

[0:54]We have got Sophia Smith Galer here in the studio with me, who's a journalist and an author, and you have written a book about endangered languages, because this is a real problem now if around half of the world's languages are at risk of going extinct. Is it becoming more of a problem, or are languages disappearing more quickly? Yes, linguists agree that this seems to be happening at an unprecedented rate. In my work, I look at it as linguicide, this term that was coined in the 80s by a linguist who wanted to describe the many threats that languages experience and the outcome, which is erasure and loss. So it's believed that we're going to lose about half of the world's languages by the end of this century without further intervention or protection. And what that means for those languages is that they will no longer be spoken day-to-day as they are now. Which languages in particular are at risk and what are the big threats to them, apart from, as I've mentioned, AI? They honestly are all around the world.

[1:56]If you look at Europe alone, there are many in Italy, for example; you'll find many in Russia. There are an extraordinary number in the United States, North America in general, South America. I mean, they are everywhere. If you look at a map, endangered languages are everywhere.

[2:16]Linguists have learned through their research that two of the most common variables that you'll see in communities that have lost languages are the building of roads and the rising of socio-economic status. So what tends to happen is as soon as a community becomes more connected with the wider world, as soon as they're able to access high quality education, healthcare, what every community around the world deserves, a language hierarchy can be positioned where some languages are conceived of as more prestigious, more likely to get you a good job, good opportunities in life.

[2:51]Other languages get relegated as less important, less valuable. So, how many speakers of a language do you have for that language to be classified as endangered? What makes an endangered language? This is really cool, because you've just made an assumption that any of us would: you sort of think, if a language is down to not many people anymore, that must be a sign it's endangered. I mean, there's one language that I looked at in the book, which is Karuk, and that's a language in Northern California today. And at its highest point, linguists believe it only ever had 1,500 speakers. But in that part of the world, a language with what may seem to you and I a small number of speakers was perfectly sustainable for a really, really, really long period of time. In today's globalized world, of course, these speaker numbers suggest that the language may be more under threat. Certainly in the Karuk language's case it's experienced extreme linguicide from the United States and from colonialism that's taken place in the United States. But it's not really the number of speakers. That can be a tell. What linguists will look at, and they use a big chart to help them go through this: one of the first danger signs is, are parents passing this on to their children? It's as soon as parents aren't passing it to their children anymore. It's as soon as, hang on a second, is this language anywhere in public life, or in the education system? It's questions like that that begin to tell linguists whether a language is under threat or not. And what is at stake here? What happens to a community when a language disappears? One of the most shocking and commonly cited examples that linguists will use when they think about this is findings from British Columbia, where they looked at indigenous communities that had been able to maintain their heritage indigenous language and communities that hadn't.

[4:57]And they identified that youth suicide rates were a lot higher in the places where they hadn't been able to maintain the language. Now, causation, correlation: that doesn't suddenly mean that a language is what protects you, but it could be that with further research we better learn the answer. Could it be, for example, that the ability to maintain a language shows a self-continuity of what has been important and integral to a community that offers them resilience down the line? In my work, where I look really closely at the effects of language loss on both people and a community, in families, for example, where you have grandparents with very different linguistic portfolios to their children, you have, in one family, people who cannot necessarily communicate on a level where everything may be understood and they can understand each other. These languages connect us to our cultural history, our identity, the knowledge bases that our ancestors have built and preserved, commonly in oral tradition over years and years and years, that prepare us and help us adapt for the future as well. They really aren't just historic artifacts. So people are increasingly turning to AI and using it as a way to preserve endangered languages. What are some of the projects that you've been seeing? I went to Ghana and I interviewed language activists there who were building a translation tool for a smartphone that used AI and served Ghanaian languages that are otherwise completely absent on the big tech translation offerings. And I remember when I spoke to this particular developer and he was talking through all the different languages he was getting on there, it's immensely challenging. Think about it, you've not only got to handle text, but ideally you want speech and audio recognition, so you want to be able to speak into it and listen to it back.
That's incredibly important for anywhere where there may be lower literacy levels, where people don't want to text and type and read, they want to just speak. And it's so hard building up the amount of training data that is required for that. And Sophia, we've been hearing from someone who is working on a project like this. This is Ivory, who's studying in the US. One of my favorite languages I've worked on is this language called Nüshu. So Nüshu originated from ancient China and it peaked during the 1600s. It's the world's only known language that was created by women for women. So my project helps preserve this language by teaching AI models how to translate Nüshu from Chinese. And so with a small sample of data, we're able to teach this model essentially to learn this language. And preserving endangered languages like Nüshu is really important because they represent a key part of our culture. It has a lot of meaning in society, so that is why I do the work I do. So Ivory's grandma spoke Nüshu and Sophia, your grandma also spoke an endangered language, Emilian, right? And I want to test how good these AI translation capabilities are on chatbots when it comes to endangered languages. So you've got your phone there? So I used a different large language model the other day, and I can't remember what expression I put in, but I put it in just out of curiosity. And it actually accused me of writing in a conlang, or constructed language. An example of that would be, say, Valyrian in Game of Thrones. So it had guessed that the very old European language, now endangered, that my nonna spoke was made up. So now I'm going to type in, in quotation marks, I'm going to put nama K and I'm then going to say to it, can you translate this phrase and identify the language?
And I've deliberately written it how I guess my mom would write it, because Emilian, and this is the Piacentino variety of Emilian, doesn't have a standardized writing system. What's it said? The phrase "Namaka" doesn't clearly match a standard phrase in a widely recognized language as written. So I can't confidently identify or translate it. Very disconcerting, and it would certainly be disconcerting to the language activists and linguists that I've spoken to. Well, we know this is something that AI does, it makes these mistakes, it hallucinates. So that's one of the risks presumably when it comes to using AI to look at endangered languages. Of course, so one example that I am familiar with is regarding the Manx language. The Manx language is a very cool language and it's the language of the Isle of Man. It's one of the languages of the Celtic nations. One of the most famous translation tools in the world has put out a translation tool for Manx and it was generating completely inaccurate translations. They translated the word for Manx in the language, which is Gaelg, as English. They translated it as English, which is sort of mind-bogglingly bad to do that. So a speaker, an expert of these languages, would have been able to point out, hang on, you shouldn't be using this because we know that there are some errors that may not have yet been corrected, et cetera. We absolutely know that these errors exist and in order for them not to happen again, these massive platforms need to take some responsibility and put some money into things if they actually want to create proper ethical tools. So what are some of the risks when it comes to using AI to preserve endangered languages? Some of the most potent risks are data security and data rights: whose data is being used, has consent been given, can data be protected if new information is created with a tool that employs AI?
Are you actually building something that a speaker community has genuinely told you they want? Have you assessed need? Have you actually looked at cost benefit and taken it really seriously? If you haven't, then you could be making a tool that further disenfranchises people who, if they have an endangered language in their community, have already probably been disenfranchised in some other way. So you are going to exacerbate that, and my personal biggest worry about AI use, like with a lot of technological use, is that it inherits the inequality of the analog world. And obviously in a utopian world, we would only use technology to try and eradicate that inequality. And I'm not saying that there aren't tools that can't do that, I really hope there are. But we've got to be so careful. Do you feel hopeful about the future of language preservation or do you think we're already too late? You can have all the will of a grassroots community and activism in the world, but if you don't have institutional support and power, then a lot of these efforts, I'm never going to say that they're futile because they're always going to preserve much more than would be preserved otherwise. But all of this work can't happen on goodwill and volunteering alone. There has to be institutional support. In many countries that will come as recognition in constitutions that these languages exist and they mean something and are worth something. We need a mix of both. You can't have institutional support really without grassroots community interest, but equally you can't have lots of very able volunteers trying their best and assume that a language will continue to be spoken and protected and defended without institutions stepping in. Sophia, thank you so much. Thank you. Now, we know lots of you listen to our podcast or watch us to try to improve your English.
So if you've got ideas for stories that you would like to be hearing about, things that you think we should be talking about, please do get in touch. You can leave us a comment below. We're on Instagram @BBCWhatintheWorld and we're on WhatsApp. And if you haven't already, you can watch and listen to our episode on how English is taking over the Internet. And I want to read out some comments that you guys left us on that. So this is from Juan, who wrote, no doubt, English is a bridge to connect countries, cultures, et cetera, and open opportunities in different fields. But I really enjoy learning other languages because they help me to understand human cultures more deeply. And Pete said, we need to protect all of our languages. Language diversity is as important to human society as biodiversity. I'm a native English speaker and I've learned three other languages. Mind your languages, don't allow English to downgrade your culture. So keep those comments coming, but for now, I'm Hannah, this is What in the World from the BBC World Service, and we'll see you next time.
