The Script Encoding Initiative
The Script Encoding Initiative (SEI) is a research project housed in the Department of Linguistics at the University of California, Berkeley. Founded in 2002 by Dr. Deborah Anderson, SEI was created to support the inclusion of scripts in the Unicode Standard, the global system that offers guidelines for the consistent representation of text on digital devices.
At the time of SEI’s founding, many scripts used for historical, religious, and contemporary writing were not yet supported in Unicode. This meant that they could not be reliably typed, displayed, or shared in digital formats. SEI was established to help fill that gap.
SEI works closely with linguists, community representatives, software engineers, and international standards bodies to prepare the detailed proposals required to add scripts and characters to Unicode. To date, SEI has contributed to the successful encoding of over 120 scripts, including Egyptian Hieroglyphs, N’Ko, Unified Canadian Aboriginal Syllabics, Tangut, and Hanifi Rohingya. Its work spans historic and modern eras, as well as a wide range of geographic regions.
Although SEI collaborates with the Unicode Consortium and the ISO/IEC 10646 standards process, it operates independently. Its primary role is to support the technical, linguistic, and historical research needed to prepare formal proposals, often in close partnership with communities whose scripts are not yet digitally supported.
Over time, SEI’s work has expanded beyond proposal development to encompass the broader landscape of script digitization. The initiative now collaborates on font and keyboard design, produces research reports on the status of scripts, and curates teaching materials for the classroom. SEI also maintains a blog and shares scholarship, through publications and presentations, on the historical and political dimensions of Unicode and the digital encoding of the world’s writing systems.
Over 150 scripts remain outside the Unicode Standard, with more being created or revived today. SEI plays a key role in evaluating which scripts are ready for inclusion, while also examining the broader infrastructures—technical, social, and political—that shape how writing systems take form in the digital world.
Frequently Asked Questions
A single script can serve multiple languages—for example, the Latin script is used by English, German, and Vietnamese—while some languages, like Hindi-Urdu, may use multiple scripts, such as Devanagari, Arabic, or Latin.
Unicode encodes scripts, not languages. Translating between script and language involves a range of algorithms and contextual data, much of which is handled by other technologies overseen by the Unicode Consortium, such as the Common Locale Data Repository (CLDR) and International Components for Unicode (ICU).
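As a rough illustration of the script/language distinction above, the following minimal Python sketch uses the standard library's `unicodedata` module. Python does not expose the Unicode `Script` property directly, so the helper `script_hint` (a hypothetical name introduced here) takes the first word of each character's Unicode name as an approximation. This is a heuristic for demonstration only, not how CLDR or ICU determine scripts, and the sample words are our own choices.

```python
import unicodedata


def script_hint(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name.

    Assumption: this is only a heuristic. The real Script property lives in
    the Unicode Character Database and is not exposed by unicodedata.
    """
    return unicodedata.name(ch).split()[0]


# Illustrative sample words (chosen for this sketch, not taken from SEI).
samples = {
    "English": "the",
    "German": "Straße",
    "Vietnamese": "tiếng",
    "Hindi": "हिन्दी",
    "Urdu": "اردو",
}

for language, word in samples.items():
    # Normalize to NFC so accented Latin letters are single code points.
    word = unicodedata.normalize("NFC", word)
    hints = sorted({script_hint(ch) for ch in word})
    print(f"{language}: {hints}")
```

Under these assumptions, the three Latin-script languages all report the same script hint, while Hindi and Urdu report different ones. Nothing in the characters themselves records which language a word belongs to.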
Scholars and users vary in their understanding of the terms “script” and “writing system” (see, for example, Miton & Moran (2021) and the glossaries by Google Fonts and Microsoft). A writing system sometimes refers to a set of symbols, while a script refers to the specific set associated with a spoken language. Given these diverging understandings, SEI tends to use the two terms broadly and interchangeably.
Unicode and ISO/IEC 10646 are closely aligned standards that define how text is represented digitally. The Unicode Standard is maintained by the Unicode Consortium, a nonprofit organization based in Mountain View, CA, while ISO/IEC 10646 is an international standard maintained by the International Organization for Standardization (ISO). The two standards share the same character repertoire and code points.
SEI works independently but engages with both organizations by preparing proposals that are reviewed by the Unicode Technical Committee (UTC) and ISO working groups involved in character encoding.
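To make the idea of a shared repertoire of code points more concrete, here is a small Python sketch that looks up the code point and character name for one character from each of three scripts mentioned earlier (N’Ko, Unified Canadian Aboriginal Syllabics, and Egyptian Hieroglyphs). The helper name `describe` and the choice of sample characters are assumptions made for this illustration. The names returned depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(code_point: int) -> tuple[str, str]:
    """Return the conventional U+XXXX label and the character name."""
    label = f"U+{code_point:04X}"
    name = unicodedata.name(chr(code_point))
    return label, name


# One sample code point per script (illustrative choices).
for cp in (0x07CA, 0x1401, 0x13000):
    label, name = describe(cp)
    # The UTF-8 bytes are one possible encoding form of the same code point.
    utf8_hex = chr(cp).encode("utf-8").hex()
    print(label, name, utf8_hex)
```

The code point and name are what the two standards hold in common. How the character is stored as bytes (UTF-8 here) is a separate layer.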
SEI is funded entirely by external grants and private donations. These funds go primarily towards staff salaries, contracts for research projects and proposals, and membership dues to standards bodies. We are extremely grateful to the organizations and individuals that have helped sustain this project for over twenty years!
SEI has two dedicated staff members and a wide network of volunteers and collaborators who work part-time on specific projects.
SEI publishes updates in quarterly liaison reports presented to the Unicode Technical Committee and posts regularly on the blog. The Scripts to Encode page also lists active SEI projects.
We’re always happy to hear from people interested in contributing to existing projects, and to learn of your ongoing work. If you have relevant expertise or information about an active project, please reach out through our contact form. We also welcome expressions of interest in inactive or future projects, which help us plan ahead. Because most of our work is grant-funded, we typically identify which proposals we’ll pursue two to four years in advance, but knowing who is interested can be helpful if opportunities or flexibility arise.
Send us a note through the contact form!