Interview with Oreen Yousuf: Researching African Scripts from a Distance

SEI Team, 2025

Even as a relative newcomer to the world of script encoding, Oreen Yousuf has already made a significant impact with his research on African scripts. He first arrived on the scene with an exhaustive report on the status of over fifty African scripts, submitted to the Unicode Consortium in 2023.

His work has considerably advanced our understanding of the landscape of African scripts, which is rich with new inventions and novel approaches to script education.

In this interview, Oreen discusses the practical challenges of collecting data from non-traditional media and doing fieldwork at a distance for this high-activity region.


  1. Tell us a little about yourself and the work you do for Unicode and SEI.

My name is Oreen Yousuf. I’m a PhD student in natural language processing at Uppsala University in Sweden, but I’m originally from the U.S. I’m currently conducting research on African scripts for the Script Encoding Initiative, focusing especially on West Africa. In recent months, though, there’s been a lot of activity in the east and southeast of Africa that’s drawn my attention.

I started working with SEI in December 2022. I remember reaching out to Debbie Anderson about halfway through my master’s. I had ended up in a Wikipedia rabbit hole, reading about various languages. That’s when I stumbled upon an article on a script from Malawi in Southeast Africa. I saw one picture at the top of the article and thought, “Oh, I’ve never seen this before.”

I wanted to play around with it, so I searched online for a keyboard. I have a Windows PC, but there wasn’t an option to add that script. I thought, “How do I access this script I’m seeing?” I couldn’t find anything online. That led me down a rabbit hole of figuring out how customized keyboards work and what an unencoded script is. From there, it was nonstop—I stayed up until 5 a.m. researching how scripts, keyboards, fonts, and encoding all connect.

Headshot of Oreen Yousuf
  1. Can you dive into the work you’re doing for the Script Encoding Initiative?

My focus is what I’d call online fieldwork—this is where my patience and skills are best applied. When I first started, two big scripts really stuck with me, which I tackled first. One was the Mwangwego script from Malawi. There’s actually quite a bit of material available online—you can even find videos of the creator giving short lectures on how the script works in English, which is helpful since I don’t speak any Malawian languages.

Nolence Mwangwego teaching a classroom the Mwangwego script at a chalkboard
Nolence Mwangwego teaching a classroom the Mwangwego script (Source: Mwangwego Script Proposal)

The second one I worked on was a script called Masaba from Mali in West Africa, which Unicode recently published on the Document Registry. This one was particularly challenging because there was only one primary source on the script in Western literature. Later, I found out that there were Malian sources held at the Malian Academy of Languages, but outside of Mali, the only reference was a French article from 1987 by a French academic named Gérard Gautier.

Initially, I couldn’t find anything beyond that article. In fact, the Wikipedia page for the Bambara language had just one line about the Masaba script, mentioning this single article and noting it was unknown whether the script was still in use or had died out. That ambiguity drove me to dig deeper to determine if the script was still alive.

It was incredibly difficult. I don’t speak French, which is still widely used in Mali, nor do I speak any Malian languages. So I tried searching in French and Bambara using Google, but still, nothing came up. After months of searching, I was about to give up when I found a lead through Masakhane, a grassroots organization focused on natural language processing for African languages.

During a project meeting with Masakhane, one of the members mentioned he was from Mali and spoke English. That caught my attention immediately. I reached out after the meeting, and it turned out he was the Secretary-General of the Malian Academy of Languages. We’ve been working closely ever since, which has been crucial because he’s multilingual and deeply connected to local communities.

When I first mentioned Masaba to him, he hadn’t heard of it. But then, purely by chance, his neighbor happened to be from the region where the Masaba script originated. They started chatting one day, and it turned out his neighbor knew about the script. That was a breakthrough! It was so serendipitous. From there, we managed to connect with the community that still uses Masaba.

He was also instrumental in introducing the concept of Unicode to the community. Even among engineers and tech-savvy individuals, explaining what Unicode is can be challenging—it’s something we often take for granted. But when you’re introducing it to people who have never heard of it, you need a lot of patience.

A handdrawn map of Mali denoting each region and neighboring countries in the Masaba script
Map of Mali denoting each region and neighboring countries in the Masaba script (Source: Masaba Script Proposal)
  1. What are some of the challenges you face doing online fieldwork?

With African scripts, persistence is essential. You have to reach out to people repeatedly, even if you get no response initially. Before finding my key contact in Mali, I reached out to numerous academics at universities across the country. Sometimes, the email links were outdated, or I’d never hear back. In countries with less reliable infrastructure, communication gaps due to power outages are common. For instance, my contacts in Malawi often face weeks-long interruptions, which slows the process significantly.

Even if you have a straightforward question, it can take months to get an answer due to infrastructure challenges, language barriers, and the simple fact that people have other priorities. You just have to keep trying, even when it feels discouraging.

To find actual sources, it’s a mixed bag. Sometimes it’s through email, but more often, it’s through WhatsApp. If I contact one person who doesn’t know what I’m talking about but refers me to someone else, it’s usually through WhatsApp because it’s easiest for them—and now it’s become easiest for me as well. Facebook is also still heavily used.

A Masaba script group sending screenshots over WhatsApp of text typed in the Keyman keyboard
  1. What keeps you motivated to work on these scripts?

My parents are originally from Ethiopia, and growing up, I always heard stories about language policies there. It’s very complicated, but at one point, linguistic rights were extremely limited. Everything—school, business, government, and international trade—was conducted in one dominant language. My parents knew that language, but it wasn’t their native tongue. So, I grew up hearing about how they couldn’t go to school in their language and had to learn another language for everything.

That’s just language—even when the alphabet is the same. But with scripts, it’s different because scripts add another layer of othering. It’s culturally sensitive for these communities, especially in regions like West Africa, which have seen numerous scripts emerge over the past 200 years.

Map of Invented African Scripts
Invented African scripts (Source: Kelly 2011)

Take the Bamum script from Cameroon, for example. One notable aspect is that it’s written left to right. That isn’t rare in itself, but the king who invented it in the late 1800s chose that direction to differentiate his kingdom from a neighboring empire that wrote right to left. It was a conscious decision to distinguish his culture from the dominant one.

Traditional Bamum text
Traditional Bamum text from Schmitt 1963 (Source: Bamum Extension Proposal)

Today, Latin, Arabic, and Cyrillic scripts are extremely dominant. Minority languages, even if they had their own scripts in the past, often adopt these dominant ones. So if a community invents a script, you can see why it’s so important to them. Of course, that’s not the only reason to include a script in Unicode. You still need to show evidence of its use in manuscripts, pamphlets, or other documents to meet the requirements.

Finishing work someone started in 1966 is a great honor for me. You wouldn’t believe how happy people are when you tell them, “I want to put your letters on computers.”

  1. What would you like observers to understand about encoding African scripts?

Right now, I’m tracking maybe two dozen different communities at various levels of script maturity, in terms of how widely these scripts have spread and been accepted in their communities. Even among script communities at the same level, I see very few similarities in how they got there.

For example, Adlam from West Africa is extremely successful. It was created in the late 20th century. In contrast, the Masaba script is from 1930, almost a hundred years ago, but it has never spread beyond a cluster of villages in western Mali, an extremely rural part of the country.

Screenshot of Noto Sans Adlam
Noto Sans Adlam, a free font available on Google Fonts

The people who use Adlam are in 27 countries. Fula, the language Adlam transcribes, has 40 million speakers. Conversely, the Masaba script is used in three to four villages of a few hundred people in total. You could boil it down to that and call one more successful than the other.

But at the same time, there are a lot of factors you have to consider about the types of use you might see. With Adlam, at the turn of the 21st century, people were spread across multiple continents, sending mail, running business ventures, and traveling across state lines. With Masaba, the villages are twenty minutes apart from each other. Does that make their script less valid? My point is that the success story of one script should not dictate how you look at another script’s story.

