A Newcomer’s Impressions on the World of Script Encoding
Before I joined the Script Encoding Initiative, my understanding of Unicode and script encoding was limited to its basic role in digital devices and the personal struggles I had encountered typing bidirectional text in my Arabic classes. I was not expecting how involved the field was — its breadth spanning geopolitical factors, complex histories, industry politics, and above all, interactions between people.
I joined the team as SEI’s new program manager in April 2025 after obtaining a B.A. in Language Science at the University of California, Irvine. Though my background in linguistics gave me a general understanding of the language-related technical terminology being used day-to-day, the field of script encoding is niche and multidisciplinary enough that I found myself approaching it with fresh eyes. Familiar concepts (e.g. “grammar”) were being used in new ways (e.g. “script grammar”), and I was suddenly interacting with terms (e.g. “character”, “glyph”) at a level of detail and complexity that I previously did not think possible.
Right away, I was intrigued by the intermediary position that SEI seems to occupy in the world of script standards. We not only act as a liaison between the Unicode Consortium and script communities, but also corral experts with vastly different technical skills to collaborate on projects — bridging creative and technical fields from type design to text processing.
After nearly a year with SEI, I find that I am still learning the ins and outs of this field. This post presents a snapshot of the observations I’ve made thus far.
First Impressions
Shortly after being hired, I attended my first Script Encoding Working Group (SEW) meeting on May 9th, 2025. These monthly meetings represent the first arena where new proposals are deliberated upon by Unicode standards-makers. The SEW meetings can last around 6 hours, and involve 10-15 experts over Zoom discussing each proposal to determine what kind of feedback needs to go back to the authors, or if the script or character can be recommended to the Unicode Technical Committee for future encoding.
This was simultaneously my first exposure to the technical deliberation process of the working group as well as an eventful day for SEI, with several SEI-related items on the agenda:
- A review of the Rejang code block [L2/25-162]
- An introduction to the Minim Dag Noore script [L2/25-136]
- A memo about our newly-posted Script Readiness Rubric
The Rejang script, a variant of the indigenous Surat Ulu script from the Indonesian island of Sumatra, was already encoded in the Unicode Standard. To encode the rest of the script variants (which can be thought of as something like “dialects” of a writing system), SEI had been working with a handful of experts: Anshuman Pandey, Febri Muhammad Nasrullah, and Ariq Syauqi. During the course of this broader work, Ariq, one of our research fellows, had come across several inconsistencies in the existing Rejang code block, which he presented to SEW for the first time during this meeting.
The issues ranged from the misnaming of the block (named “Rejang” instead of the more-inclusive “Surat Ulu”), to potential conflations between characters, to an error in the representative glyph. The last point was quickly accepted, while the others warranted further discussion. From this dialogue, I learned which kinds of changes fall under the working group’s jurisdiction, and that some of the details Ariq had highlighted would instead be handled by technologies further down the encoding pipeline (like fonts).
In the next hour, we switched regions to West Africa. Oreen Yousuf, one of SEI’s current contributors, presented his first proposal for the Minim Dag Noore script. Invented only in 2006, this complex script is used for the Mooré language across Burkina Faso, Ivory Coast, Mali, and Senegal. This was a difficult script to present for the first time. With so many structural details to explain (the script’s bidirectionality, extensive character repertoire, joining behavior, and more), it took some time for SEW to get to the first question that often matters for encoding new scripts: is the script ready to be solidified in the Unicode Standard?
To wrap up the day, Anushah Hossain presented SEI’s Script Readiness Rubric to the working group, which is used in several features of SEI’s newly-launched website. Since SEI’s resources are meant to complement SEW’s own guidance, some participants recommended dedicating a full group discussion to the rubric. The memo walked through SEI’s Scripts to Encode page, an interactive list of scripts that remain to be encoded in the Unicode Standard, which can be sorted by encoding readiness according to this new rubric. The memo also described our Tips for Proposals page, a set of general criteria compiled by SEI that can be used as guiding principles when determining whether a script is viable for Unicode encoding. We received useful wording updates to the page, and were happy to see that these new materials were broadly welcomed by the group.
Unspoken Conventions
In the months since attending that first SEW meeting, I’ve noticed a few patterns amongst the group that may affect how well a review session goes. I’ve often seen members of the working group double down on rules, both implicit and explicit, about Unicode encoding or the proposal process that they view as common knowledge but that in reality are neither well known outside of Unicode’s internal committees nor clearly communicated to proposal authors.
For example, SEW has sometimes stated a preference for a preliminary proposal — a shorter, abridged version of a full-sized script proposal — to introduce a new script to the working group before they decide to proceed with the details of its encoding. Because SEW meetings have a tight schedule, there is a lot of ground to cover. Preliminary proposals offer a quicker overview of a newly-proposed script that can be more easily discussed in the 20-30 minute time slot often allocated to each agenda item (though agenda items can frequently run overtime).
However, the only mention of this preference for a preliminary proposal in the SEW guidelines is that “experience has shown that it is often helpful to discuss preliminary proposals before submitting a detailed proposal.” Because of this lack of clarity, I have at times witnessed proposal authors receive harsh critiques for failing to follow convention.
In a similar vein, because standards-makers are faced with complex technical details to sort out, I noticed that SEW tries to streamline the encoding process by evaluating how to make newly-proposed scripts fit into a set of familiar encoding models that have already been used for previous scripts. This is because only a limited number of shaping engines[1] are available to render scripts after encoding, so SEW takes this future step into consideration during discussions. However, this process is not well known to the public. So, when much of SEW’s decision-making gets caught up in comparing a proposed script to existing encoded examples, proposal authors are sometimes left in limbo because the discussion is not something they can easily chime in on.
Though it is reasonable to streamline the encoding process based on conventions that are naturally established over time, SEW’s tendency not to clearly articulate these requirements in an accessible forum results in a large gap between the expectations of the standards-makers and the ability of proposal authors to meet them.[2]
Where SEI Steps In
Since Unicode refrains from stating hard and fast rules for the script review process, this is where SEI has an important role: to draw from our past experience and provide guidance about what the most effective proposal strategy might be. Generally, we engage in the following ways:
- Reviewing, reformatting, and revising proposals before they are even submitted to SEW
People who are “good at writing proposals” tend to be received better by SEW since the proposals more closely fit the format that the working group has intuitively become comfortable with and accepted as the de facto norm. SEI will edit, review, and provide feedback to proposal authors who are unfamiliar with SEW’s norms or who need a higher level of copy editing before submission, restructuring the information in a way that meets the precedents and expectations of the group.
- Note-taking and translating action items to proposal authors
This has become one of my main responsibilities during SEW meetings. Taking notes first started as a way for me to make sense of the discussion (jot it down now, comprehend it later), but I have found that, like me, our script collaborators often struggle to follow all the details of the conversation. SEW meetings are designed for the working group members to discuss and debate with each other. Since the details can be rapid-fire and get lost in the chaos, SEI’s presence is critical for recording all of this information in a digestible way for the proposal author to reference later. SEW provides meeting minutes as well, but they are often full of acronyms and technical jargon that would benefit from elaboration.
Thus, SEI acts as something like a translator between the standards-makers and the proposal authors, particularly newcomers or those who don’t speak English as a first language. While encountering esoteric details is a common experience for any non-specialist entering a specialist field, it shows that SEI is needed to distill Unicode’s abstract language so that authors know what to expect and have someone to whom they can direct questions.
- Serving as a connection to SEW for our proposal authors
SEW has previously stated that SEI’s involvement (or that of any other organization) in a script project carries no official “stamp of approval” to sway the group’s decision-making. However, our proximity to the working group and the level of vetting we invest in script proposals inevitably positions us to advocate more strongly for the proposals we’ve become familiar with, which can result in a more sympathetic response, even if reviewers may not wholly agree with a proposal.
Proposal work is long and arduous, so our institutional support gives minority script communities an “in” into the world of script encoding that they may have found difficult to navigate — or even know about — on their own. In many cases, we’ve heard that SEI was the primary accessible connection to Unicode for script communities.
Future Steps
Given these challenges, I’ve been thinking of new resources we could put together to smooth the process further. It would be nice for SEI to lay out a timeline or series of steps so that proposal authors generally know what to expect from the encoding process (pre-SEW, during SEW, and post-SEW). Once a proposal passes the first objective, what typically happens next? Though SEI often answers these kinds of questions in direct communications, putting together a publicly-accessible resource might give prospective proposal authors, and even user communities, a better sense of how long encoding will take and what to expect next. If the process is laid out clearly, encoding can feel less like a monumental task because it would be more evident which “step” of the process a proposal is in, even if that process can take years and years.[3]
On Unicode’s side, future steps remain relatively uncertain. Discussions continue to circulate within both SEW and the Unicode Technical Committee about refining encoding criteria or changing the review process, but until a specific idea is proposed in writing, there will be no concrete changes.
In the meantime, Unicode’s script proposal guidelines could be updated or expanded upon bit by bit. Recently, SEW created a submission form that clarifies and improves the components needed for a script proposal, a notable step in making these processes more transparent. However, how the form connects to the bigger picture of the overall encoding process, and to the expectations that come with it, is still relatively unclear.
SEI navigates an interesting and tricky space between standards-makers, contributors, implementers, and users. Whether this entails decoding subtle communication nuances, digesting technical details, or clarifying how complex processes fit into the bigger picture, we want to make the script encoding process more transparent and intuitive for script communities. In observing the different steps, it feels even clearer to me why SEI is needed in the world of script encoding.
1. Shaping engines are responsible for correctly displaying Unicode text using glyphs from the font file. There are several main shaping engines (which may vary according to each operating system), creating many moving targets for encoding models to try and meet.
2. Part of the hesitation rightly arises from the desire to handle each submission on a case-by-case basis, because there are often one-off factors that need to be considered.
3. We are currently working with UC Berkeley PhD student Julian Vargo on estimating how long it takes a script to move through each of these encoding processes.