A Refresh for SEI - Script Encoding Initiative

Welcome to the new website and blog of the Script Encoding Initiative! This project has been long in the works, aimed at making it easier to follow along with our work and learn about the complex worlds of scripts and digitization. We’ll do our best to post regular updates here on our current projects, share features of our collaborators, and host guest posts diving into tricky aspects of this field.

If you’re stumbling onto our page for the first time, the first thing you might want to know is:

What is the Script Encoding Initiative (SEI)?

SEI is a project that was started in the Linguistics department of the University of California, Berkeley in 2002.

Debbie ANderson and Rick McGowan with a macbook displaying global writing systems — Photo of Debbie Anderson and Rick McGowan for San Jose Mercury News, 2003

Over the past two decades, SEI has worked on getting historic and minority writing systems into the Unicode Standard by preparing rigorous script proposals that are trusted and easy to accept. These proposals, in turn, have helped expand digital inclusion and cultural preservation.

While Unicode is most popularly known for publishing emoji for us to use on our phones, it has a much larger mandate to support the vast breadth of the world’s writing systems. For text to be processed by digital devices in a consistent way (no tofu or mojibake or gobbledy gook), there need to be consistent back-end representations of characters – sequences of 0s and 1s that a computer can process, matching to the symbols that are meaningful to humans.

Since 1991, Unicode has been steadily assigning unique sequences of 0s and 1s to each unit of, aspirationally, every writing system in the world. It is impossible to overstate how labor-intensive and careful this work needs to be. Alphabetic systems are fairly simple, but even consider a special cursive flourish of a Latin letter – how can someone unfamiliar with the script determine whether that special flourish is a different way of drawing a known character or another character entirely? You might be able to imagine the great degree of work that goes into identifying the right character repertoire for a writing system, describing how the characters all join together (again, fairly simple for alphabetic systems, but what about complex scripts?), finding the right names for them, and more.

Unicode started with the biggest, most important (politically and commercially) scripts. That left a big vacuum for the hundreds of smaller scripts in the world. The very first version of the SEI website puts words to the situation well:

Because proposals for the encoding of minority and historical scripts often entail significant research, and their user communities have little economic or political voice, script proposals have not been submitted to the Unicode Technical Committee (UTC) in any regular manner.

It has been estimated that at the current slow pace of encoding, many scripts will still be unencoded in ten years. This means that effectively, many linguistic minorities and scholarly communities could be permanently left behind in the information age

More than ten years have passed since that message, and many scripts are still unencoded. But, SEI has made a significant impact in improving the situation. Since its founding in 2002, about 120 scripts were added by SEI research associates to the Unicode Standard. Today, that amounts to over two-thirds of the Standard!

Why does this matter? Well, fonts and keyboards and software applications that are built in accordance with the Unicode Standard gain access to a well-established global communications paradigm. Virtually all digital devices today use Unicode, and vendors prioritize updating their software every year to align with the latest Unicode updates. Text that is Unicode-compliant is stable, meaning you can rest assured that your digital and digitized documents will be saved for the future. Unicode inclusion is undeniably critical to a script’s long-term preservation and vitality.

SEI is completely unique in its role as an intermediary between script communities, spanning from scholars to everyday users. For a long time, it was the only academic member of the Unicode Consortium–the only entity that did not answer to a mandate to serve the bottom line with its software internationalization work. This position is valuable in allowing us to take on projects for smaller, but historic or culturally important, communities.

What’s new with the website?

Much like the Standard itself, the design of the SEI website has evolved quite a bit over its lifetime.

Wayback machine snapshot of SEI website from 2004

An early snapshot of the SEI website from October 2004

A slight update in October 2005

Wayback machine snapshot of SEI website from 2008

A new logo added in 2008

Wayback machine snapshot of SEI website from 2016

The first major re-design in 2015

Wayback machine snapshot of SEI website from 2024

The most recent iteration of the site, moved to Github from 2022-24

In previous iterations, though formatting had changed, the content remained largely the same, with minor updates as formerly unencoded scripts crossed the threshold into Unicode.

In this latest re-design, we had two major goals. Firstly, we wanted to make our existing content easier to find. Per our view, the most important page on our site is the Unencoded Scripts page. There are many other resources to learn about scripts, such as ScriptSource, Endangered Alphabets, Wikipedia, and Ethnologue. However, SEI’s primary value-add is our expert assessment of the script’s status with respect to Unicode encoding.

To those ends, we focused on highlighting and adding commentary to our own scripts list, synthesizing information on a script’s real-world status alongside its progress in Unicode to produce a simple rubric to glean a script’s “Unicode readiness.” We’ll share more about the process of developing this rubric in a future post. But for now, it’s worth noting that this information is readily accessible–searchable, sortable, and locatable on a map on our Scripts to Encode page. We’ve re-organized our site map to make this, and other key information, more readily available to visitors.

Screenshot of interactive map from Scripts to Encode page — Map preview of unencoded scripts

Secondly, we wanted to make room for new educational materials: press features we’d done, presentations and academic publications we’d authored, and of course blog posts we wanted to publish. Our designer, Juliette Flécheux, and web developer, Noémie Garnier, did a fantastic job finding ways to host this multimedia content (and everything else on the site!) in creative and accessible ways. We’re thrilled with the outcome, and hope that as you explore the site, you will be too!

What’s next?

There’s lots to look forward to in the coming months. Stay tuned for posts on our fieldwork fellowships, features of SEI research associates, explainers on Unicode proposal-writing and fonts, and more.

The best ways to keep up with our work are to subscribe directly to the blog via RSS, follow us on Instagram, or use the link at the bottom of the page to sign up for our monthly newsletter.

More soon, and thank you as always for following along!