The Making of a Unicode Release: A Peek Inside Unicode 17.0

Anushah Hossain, 2025

Last Tuesday marked the release of the latest edition of the Unicode Standard, 17.0. It included four new scripts—Beria Erfe, Sidetic, Tai Yo, and Tolong Siki—and a total of nearly five thousand new characters.1 Since its first release in 1991 with just 23 scripts, the Unicode Standard has steadily grown to cover 172 scripts and almost 160,000 characters and symbols—on average about six new scripts per release.2

While these figures sound exciting, it can be hard to understand what exactly this all means—why it’s impressive, and what changes for everyday users. This post offers a look behind the scenes, with case studies of two scripts from Unicode 17.0 that SEI helped bring into the standard.


What’s in a Unicode Release?

Each release bundles together newly approved scripts and characters, code charts showing their representative shapes, and data files that specify properties of each character (such as whether it is a letter, a number, or a combining mark). Releases also include updated annexes and technical reports that explain other important details about text processing.
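As a rough illustration of what those property data files contain, the sketch below uses Python's standard unicodedata module, which ships with a copy of the Unicode Character Database, to look up the General_Category and name of a few long-established characters. The helper name describe is our own and purely illustrative.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point, General_Category, and name of one character."""
    code_point = f"U+{ord(ch):04X}"
    category = unicodedata.category(ch)       # e.g. "Lu" = uppercase letter
    name = unicodedata.name(ch, "<unnamed>")  # fallback if no name is known
    return f"{code_point} {category} {name}"

# A letter, a decimal digit, and a combining mark.
for ch in ("A", "5", "\u0301"):
    print(describe(ch))
```

The two-letter codes are General_Category values: "Lu" for an uppercase letter, "Nd" for a decimal digit, and "Mn" for a nonspacing combining mark.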

The Unicode Technical Committee (UTC) manages each new version of the Unicode Standard, including decisions on which new scripts to encode. New scripts begin as proposals, usually prepared by scholars, type designers, community representatives, or projects like SEI. These proposals are first reviewed by UTC’s Script Encoding Working Group (SEW). SEW evaluates the evidence, repertoire, and technical details before recommending proposals to the UTC. UTC provisionally assigns code points and passes new additions on to other working groups that help produce the standard.

All the details undergo months of alpha and beta review in which outside parties provide feedback. These drafts are often simultaneously considered by the ISO/IEC 10646 committee and working group, where feedback is solicited from national body representatives and experts from around the world. This lengthy process, often moving in fits and starts, is intentionally slow so that as much information as possible can be gathered before details are set in stone in the published Unicode and ISO standards.

people seated around a table in a classroom, looking at laptops and a projector screen
Deliberations during the ISO/IEC SC2/WG2 gathering in Prague in 2024

Does a published Unicode encoding mean that all 172 scripts are instantly available on your computer or phone? Not exactly. Unicode only defines the characters—it doesn’t compel vendors to provide “full stack language enablement.”

Most companies update their systems to keep pace with each new Unicode release, which means the raw code points will be recognized, but the effort may stop there. If a user community isn’t especially visible or profitable, or the script is particularly difficult to implement, companies may hesitate to make the upfront investment in fonts, keyboards, or proper rendering support. Those steps are what truly bring a script to life for its community, and they often happen only after sustained public pressure.3
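To make "the raw code points will be recognized" a little more concrete, here is a minimal sketch, assuming a Python runtime, that asks whether that runtime's bundled Unicode data knows a given code point. The function name is_assigned is our own. The answer depends on which Unicode version the runtime was built with, so a character from a very recent release may be reported as unassigned on an older installation.

```python
import unicodedata

def is_assigned(code_point: int) -> bool:
    """True if this runtime's Unicode data knows the code point.

    "Cn" is the General_Category reported for unassigned code points
    (and noncharacters).
    """
    return unicodedata.category(chr(code_point)) != "Cn"

print("Unicode data version:", unicodedata.unidata_version)
print(is_assigned(0x0041))  # LATIN CAPITAL LETTER A
print(is_assigned(0x0378))  # a code point left unassigned in the Greek block
```

A True result only means the code point is known to the runtime. It says nothing about whether a font, keyboard, or rendering support exists for the script.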


Script Journeys in Unicode 17.0: Sidetic and Tolong Siki

SEI was involved in helping multiple scripts along in Unicode 17.0. We partially funded and encouraged work on Beria Erfe, a modern script used in Chad and Sudan, whose proposal was ultimately co-authored by several community representatives and scholars. SEI’s Technical Director, Anshuman Pandey, wrote the proposals for Sidetic and Tolong Siki, with extensive feedback from scholars and community members. The proposal for Tai Yo, a historic script used in Vietnam and Laos, was authored independently by researchers Viet Khoi Nguyen, Cong Danh Sam, and Frank van de Kasteelen.

All of these projects were years in the making. Take the stories of Tolong Siki and Sidetic, for example.

a tablet with many lines of sidetic text
Athenodoros memorial (S9). Courtesy of Zinko and Zinko, cited in L2/21-111.
two coins with figures and a line of text
Sidetic letters on a coin, struck c. 360–330 BCE. Courtesy of Lars Rutten, cited in L2/22-235.

Sidetic is a historic script used for an extinct Indo-European Anatolian language, dating from the 5th to 2nd centuries BCE. As the last of the Anatolian alphabets to be encoded, following Carian, Lycian, and Lydian, its inclusion completes a long-standing puzzle in documenting the writing systems of the region. The script consists of 29 letters, many influenced by Greek, but has not been fully deciphered, so the phonetic values of some characters remain uncertain.

For years, scholars thought there wasn’t enough evidence to encode Sidetic. It was even listed on Unicode’s “Not the Roadmap” page under “Things rumored to be scripts.” That changed as scholarship advanced, especially with the catalog of characters prepared by Johannes Nollé, which provided a stable reference system using numbered names (“N01,” “N02,” etc.).4

Building on this, Anshuman prepared proposals starting in 2019, with multiple revisions through 2023.5 The script was provisionally assigned code points by the Unicode Technical Committee in January 2023, but further debate followed. At an ISO meeting in June 2024, one member pushed for phonetic names for characters where possible. Scholars responded, confirming that Nollé’s numbering system was the accepted scholarly practice, and so those names were retained (though name annotations, which can change, were added).

tables of sidetic characters, showing 29 letters. character names on the right, listed only as numbers (e.g. n01)
Sidetic code chart from final proposal by Anshuman Pandey (L2/23-019)
tables of sidetic characters highlighted in yellow, showing 26 letters. on the right, characters with numbered names (e.g. N01) and annotations underneath (e.g. "vowel a").
Sidetic code chart published in Unicode 17.0 with name annotations and revised character list

Even late in the process, changes occurred. In April 2025, we learned that some scholars disagreed on the naming of some characters in the Unicode code chart. These were characters that went beyond Nollé’s catalog. Rather than cementing values with uneven acceptance, the solution was to leave the three contentious characters out until later.

These episodes illustrate how much iteration can happen before a script makes it into the standard: Sidetic was formally accepted for Unicode 17.0 in November 2024, but the details were still being hammered out right up to release. Historic scripts such as this one are valuable to encode so that scholars can publish materials in them. But they can be difficult to describe appropriately, since the description often has to rely on just a few sources. The task is to provide enough information to allow encoding without over-defining values that could change with another discovery.

As an aside, the lack of sources presents challenges not only for encoding but also for typography. Sidetic has been part of the Missing Scripts program at the Atelier National de Recherche Typographique in Nancy, France, where type design students create some of the very first fonts for scripts on their way into Unicode. One of those students, Fangzheng Li, recently traveled to Turkey to photograph Sidetic inscriptions in situ, producing higher-quality images and even a 3D model of the tablets. Fangzheng shared, “Moving from grainy archival scans to the tangible reality of the inscriptions themselves was a crucial step, allowing me to translate not just the shapes of the letters, but the logic and rhythm of the hands that carved them.” Such work makes impressive strides towards producing typographic reconstructions of historic scripts with integrity.

close up of sidetic inscription showing strokes on blue-grey rock
Raw image of stone inscription (courtesy Fangzheng Li)
Many points and lines in purple modeling a stone inscription from the side
Point cloud generation of stone inscription (courtesy Fangzheng Li)
snapshots of computer reproduction of stone inscription
Polygon mesh of stone inscription (courtesy Fangzheng Li)

If Sidetic represents the challenge of encoding a fragmentary ancient script, Tolong Siki shows the other side of the spectrum: a modern script invention (often called a “neography”) for a living language. Created in 1988 by Dr. Narayan Oraon in India’s Jharkhand state, Tolong Siki was designed to give the Kurukh language its own distinct script.

The script first entered Unicode discussions in 2010, when Anshuman submitted an initial proposal. Over the next decade, he kept in close contact with the script’s creator and user community, monitoring how it developed and spread. In his follow-up proposal in 2022, he reflected:

I have monitored the usage and development of Tolong Siki over the past twelve years. The script’s creator, Narayan Oraon, has provided updates through the years. I have been contacted regarding the status of Tolong Siki in Unicode by numerous members of the user community, among whom Ashwin Kumar Kispotta has provided me with substantial document [sic] of the script.

A recurring challenge for encoding neographies is knowing whether they represent a passing experiment or a durable system that a community will adopt. Neographies often undergo significant design changes in their early years, which can be problematic given the permanence of Unicode encoding. In the case of Tolong Siki, Anshuman’s long record of observation helped demonstrate that the script had achieved stability, acceptance, publication, institutional support, and official recognition, all of which were detailed in his updated proposal.

Excerpt from follow-up proposal for Tolong Siki (L2/23-024) justifying encoding

The evidence was convincing. Tolong Siki was provisionally assigned code points at the January 2023 UTC meeting and formally accepted for Unicode 17.0 in November 2024. During that span, ISO representatives from India expressed hesitation about the script in June 2024, wanting to consult further with state authorities, but by the following year they were comfortable proceeding with its encoding. 

Even so, proving the readiness of a neography for encoding is never straightforward. Proposals are judged on a case-by-case basis, often with incomplete information. Unicode reviewers tend to be cautious, wary of whether an invention will endure and careful not to inflame inter-group tensions by privileging one script over another.

The case of Kurukh Banna, another script for the same language invented around the same time, illustrates this difficulty. Kurukh Banna is reportedly in widespread use in Odisha, another Indian state, but lacks the same level of official recognition. From afar, it is difficult to assess how Kurukh Banna and Tolong Siki fit into the broader ‘script marketplace,’ where they also compete alongside established scripts such as Devanagari, Odia, and Bangla. The contrast between the two suggests how difficult it is to draw clear lines: scripts with seemingly similar profiles may meet different outcomes in the review process.

For SEI, these kinds of cases call for further research, something we plan to do in our upcoming projects on the politics of neographies.

cover pages from children's books with Tolong Siki text
Excerpt from Tolong Siki proposal (L2/23-024)
cover pages from children's books with Kurukh Banna text
Excerpt from Kurukh Banna proposal (L2/24-101)

For now, we hope these walk-throughs illustrate some of the effort and discretion that go into preparing new scripts for Unicode inclusion. Unicode 17.0 may look like a set of numbers and code charts, but behind it are years of negotiation, deliberation, and collaboration. Each script’s journey shows how encoding is as much about people and politics as it is about technology.

  1. 4,803 to be exact, 4,298 of which extend the CJK unified ideographs blocks ↩︎
  2. See the scripts added in each release on Unicode’s “Supported Scripts” page. ↩︎
  3. Read about the challenges of getting full stack support in The Atlantic’s story about Adlam and Rest of World’s story about Nastaʿlīq, for example. ↩︎
  4. Nollé, Johannes. 2001. Side im Altertum: Geschichte und Zeugnisse. Band II, Griechische und lateinische Inschriften (5-16) – Papyri-Inschriften in sidetischer Schrift und Sprache. Inschriften griechischer Städte aus Kleinasien, 44. Bonn: Rudolf Habelt Verlag. ↩︎
  5. The initial Sidetic document was followed up by a preliminary proposal (L2/21-111) in 2021, a revised proposal in 2022 (L2/22-235), and the final proposal in 2023 (L2/23-019). ↩︎