Tulu Munisi

A transliteration-based keyboard enabling digital use of the Tulu language.

Chaitanya Vats


Problem Statement

Despite being Unicode-encoded, Tulu lacks phonetic digital input tools, making everyday typing inconsistent, exclusionary, and unreliable for its speakers.


Duration

4 Months

Role

Solo Designer

Interaction Design

UX

Linguistics

HCI

Design Outline

The project seeks to address these gaps by building a fully custom, web-based transliteration keyboard system shaped by script-specific logic, dialectal variation, and user needs. This system includes:

A transliteration engine capable of phonetic-to-Unicode mapping;

A shaping-aware input pipeline congruent with Indic orthographic rules;

A multi-dialect predictive text engine capable of learning variants rather than standardising them away;

A Unicode-compliant font that renders the script reliably across browsers.

Together, these interventions aim to restore agency to Tulu speakers by giving them a consistent, self-determined, and decolonised means of writing their language digitally.


Project Timeline

Contextual Inquiry

Research

Ideation

Design

Prototyping

Development

Feedback

Loom Walkthrough

Context

Tulu Nadu

What

Tulu is a Dravidian language spoken primarily in coastal Karnataka and northern Kerala, with a rich oral tradition and a deeply rooted cultural identity.
  • Historically, the language was written using the Tigalari script, a South Indian Brahmic script used across the region for centuries, especially in manuscripts related to religion, literature, and administration.

  • Over time, the dominance of Kannada and Malayalam scripts in education and print gradually displaced Tigalari from everyday use.

  • While widely used in everyday life, performance, and ritual, Tulu has long faced challenges in written representation.

Who

Tulu is spoken by approximately two to three million people, primarily within communities native to the coastal regions of Karnataka and northern Kerala. The language is used across diverse social and cultural groups, including Tuluva Hindus, Jains, Christians, and Muslims.

Within these communities, the language varies in vocabulary, pronunciation, and influence from neighboring languages, resulting in multiple dialectal forms shaped by history, occupation, caste, and regional interaction.

Where

Tulu is predominantly spoken in the coastal districts of Dakshina Kannada and Udupi in Karnataka, and in the Kasaragod region of northern Kerala—an area collectively known as Tulu Nadu.

Within this region, the language exists in several dialectal forms influenced by geography and community identity, including Brahmin Tulu, Jain Tulu, Harijan Tulu, and variants shaped by contact with languages such as Kannada, Malayalam, and Beary. These variations reflect the linguistic richness of Tulu but also pose challenges for standardized digital input systems.

Today, Tulu exists in an unusual linguistic position: vibrant in speech, yet fragmented in writing. The revival of Tulu-Tigalari has gained momentum in recent years, driven by scholars, practitioners, and communities seeking to reconnect with their literary heritage.

However, digital support for the script remains limited. Many characters are newly encoded in Unicode, fonts are inconsistent, and input methods are still emerging.

Tulu Tigalari

Script of the Tulu Language

Tigalari is a Southern Brahmic script that was used to write the Tulu, Kannada, and Sanskrit languages. It was primarily used for writing Vedic texts in Sanskrit, and evolved from the Grantha script.

Today, the majority of Tulu speakers are not literate in the Tigalari script. Consequently, Kannada, Malayalam, and English are used to write the language digitally.

Research

Primary and Secondary

Stakeholder Interviews

Six voices on a script's digital future

Conversations with engineers, journalists, poets, and practitioners surfaced a linguistic landscape shaped by deep expertise, generational tension, and a shared anxiety about Tulu's survival in digital spaces.

01/06

Pioneering Software Engineer

Skeptic

Challenged the very foundations — arguing Unicode is structurally inadequate for Indic scripts, requiring what he described as "eleven dimensions" of complexity. Dismissed English-based keyboards, dialect-sensitive models, and younger speakers' language competency in equal measure.

"The only major Tulu corpus is publicly searchable but its source code stays closed — trapped by licensing and credit disputes."

02/06

Tulu Software Engineer

Critical

Confirmed the closed nature of existing lexical resources and gave concrete technical guidance: grammar sources, data-scraping constraints, and the hard limits of current Roman–Tulu fonts. Emphasised that meaningful progress requires a fully Unicode-native Tigalari font and open, expandable language data.

"Open data isn't just a preference — it's a prerequisite for any tool that claims to serve the community."

03/06

Tulu Filmmaker

Advocate

Having used Tulu fonts in a feature film, brought a practitioner's view of real transliteration workflows. Mapped current pain points and confirmed a broader cultural shift: Roman-script Tulu has become the de facto register for digital and informal expression.

Cultural production is already happening in Roman Tulu. The tool needs to meet speakers where they are.

04/06

Senior Tulu Engineer

Pragmatist

Advised against prioritising a Tigalari keyboard given that most speakers today cannot read the script. Yet acknowledged a real and growing counter-trend: younger users increasingly want to learn Tigalari, and Roman script dominates online Tulu communication.

The script question is not resolved — it's generational. Design must hold space for both realities.

05/06

Tulu Poet

Cautious

Grounded the conversation in lived use: online transliteration and translation tools regularly distort meaning, especially for non-standard dialects. Meaning is fragile, and tools that flatten dialect variation cause real harm to the language's expressive range.

Reinforced the need for transparent, dialect-aware models over confident but opaque automation.

06/06

Retired Tulu Journalist

Advocate

Offered the sharpest contrast — enthusiastic about digital efforts and confident that new tools can serve the language well. Validated key decisions around vowel representation and welcomed attempts to encode Tulu sounds normatively and accurately across scripts.

A reminder that not all experts resist change; some have been waiting for this work.

User Persona

Designing for the fluent but unscripted speaker

She speaks Tulu every day but has never typed it. Aishwarya represents the majority: a generation navigating language without the tools to express it digitally.

Aishwarya

Primary Persona

Age

22

Location

Mangalore, Karnataka

Occupation

College Student

Language Fluency

Tulu

English

Kannada

"She already speaks the language. She just needs a tool that speaks it back — in the scripts she grew up with, and the ones she's still discovering."


Background

Aishwarya grew up speaking Tulu at home with her family and friends but was educated primarily in English and Kannada. While she is fluent in spoken Tulu, she has never formally learned to read or write the traditional Tulu script. Like many in her generation, she communicates digitally through messaging apps and social media, often switching between English, Kannada, and Tulu.

Goals

  • Preserve the language in everyday digital life

  • Communicate naturally in Tulu through messaging and social media

  • Type quickly without learning a new layout or script

Pain Points

  • No reliable keyboard for typing Tulu directly

  • Inconsistent transliteration in Latin characters

  • Keyboard switching breaks conversation flow

  • Predictive text has no knowledge of Tulu words

Observed Behaviours

  • Types Tulu in Latin transliteration

  • Active on messaging apps and social media

  • Uses improvised or shortened spellings

  • Switches between English & Kannada keyboards

  • Code-switches mid-sentence

Core Need

  • A reliable way to type Tulu directly

  • Fast, predictable, and consistent output

  • Near-zero learning curve — feels like any other keyboard

Secondary Research

Theoretical Framework

The project is grounded in critical design lenses to move beyond simple translation toward true inclusion:

Postcolonial Computing

Dismantling power structures that render certain languages "backend" and others "frontend".

Shadow Infrastructures

Recognizing the informal workarounds, such as hacked fonts and closed-source dictionaries, that Tulu speakers use to survive digitally.

Design Justice

Ensuring marginalized communities control and benefit from the design process through participatory methods.

The Digital Language Divide

Digital infrastructures such as keyboards and operating systems often enforce a standard of English supremacy. For Tulu speakers, this results in a layered accessibility problem:

Dialectal Exclusion

Generic predictive text models fail to recognize the linguistic diversity of the community, such as Brahmin, Jain, and Beary-influenced forms of Tulu.

Fragmented Writing

While vibrant in speech, Tulu has been displaced in writing by Kannada and Malayalam scripts, leaving the traditional Tigalari script with limited digital support.

Infrastructural Bricolage

Tulu youth often force the Roman (English) keyboard to speak Tulu, leading to lost phonemes and a frustrating user experience.

Infrastructural Friction

Users must negotiate with autocorrect systems that view native Tulu words as "errors"


Revised Definitions

Designing a

Transliteration System

Detailed Problem Statement

Although the script has recently been encoded in the Unicode Standard, the surrounding digital ecosystem remains incomplete: mainstream keyboards, input methods, predictive text engines, and fonts do not yet support the script in a usable, everyday format.

As a result, Tulu speakers continue to rely on Latin transliteration, Kannada, or improvised hybrid typing systems that vary widely across dialects and social contexts.

This creates a layered accessibility problem: users cannot type Tulu consistently across devices; the script renders unpredictably due to incomplete shaping support; and some sounds and glyphs are excluded altogether.

Generic predictive text models fail to recognise the linguistic diversity of the community, including Brahmin Tulu, Jain Tulu, Harijan Tulu, Beary-influenced forms, and rural vs. coastal variations. These failures disproportionately affect those whose dialects fall outside the informal “standard,” reinforcing inequalities within an already minoritised linguistic landscape.

Design Outline

The project seeks to address these gaps by building a fully custom web-based keyboard system, shaped by script-specific logic, dialectal variation, and user needs. This system includes:

A transliteration engine capable of phonetic-to-Unicode mapping;

A shaping-aware input pipeline congruent with Indic orthographic rules;

A multi-dialect predictive text engine capable of learning variants rather than standardising them away;

A Unicode-compliant font that renders the script reliably across browsers.

Together, these interventions aim to restore agency to Tulu speakers by giving them a consistent, self-determined, and decolonised means of writing their language digitally.


The Design Phase

Phonetic Inventory

Tulu has many unique sounds that have no corresponding glyphs in the scripts currently used to write the language.

A study of Tulu's unique phonological structure is essential to developing the phonetic inventory, so that the language's distinctive sounds are not excluded.

The phonological study of a language involves looking at data (phonetic transcriptions of the speech of native speakers) and trying to deduce what the underlying phonemes are and what the sound inventory of the language is.

Starting out, it was useful to examine the transliteration schemes practiced by the dominant language of the region, Kannada. In doing so, we identified cases where transliteration produced ambiguous output, i.e. instances where multiple sounds were mapped onto the same character, excluding unique dialectal sounds and resulting in misrepresentation.
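
As a rough sketch of the problem (the sound symbols and the Kannada mapping below are illustrative placeholders, not the project's actual tables), a many-to-one transliteration table makes the reverse mapping ambiguous:

```python
# Two distinct Tulu vowel qualities that a Kannada-based scheme
# writes with the same character (placeholder example).
tulu_to_kannada = {
    "e": "ಎ",  # open-mid front vowel
    "ɛ": "ಎ",  # near-open front vowel, distinctive in several dialects
}

def transliterate(sounds):
    return "".join(tulu_to_kannada[s] for s in sounds)

# Both inputs collapse to the same written form...
assert transliterate(["e"]) == transliterate(["ɛ"])

# ...so the reverse mapping is ambiguous: one character, two candidate sounds.
kannada_to_tulu = {}
for sound, char in tulu_to_kannada.items():
    kannada_to_tulu.setdefault(char, []).append(sound)
print(kannada_to_tulu["ಎ"])  # ['e', 'ɛ'] -- the distinction is unrecoverable
```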

The Design Phase

Keyboard Layout


Key Characteristics of a Good Input Method

Control

The user should be in full control. The input method should not "dictate" or guess what the user wants to type, a criticism aimed at probabilistic keyboards.

Privacy

The tool must be private and not "spy on you." The article states that being free and open-source is a prerequisite for ensuring this.

Availability & Ownership

A good IME should work offline, be available on all your devices (PC, phone), and not be a proprietary tool that can be withdrawn by its vendor (like Google's abandoned desktop IMEs).

Correctness

It must generate correct and standard Unicode. It should not, for example, substitute a visually similar character like the number '0' for the Malayalam anuswaram 'ം'.

Well-maintained

The software should have active maintainers, a public bug tracker, and regular updates to fix issues and adapt to new operating systems.

Easy to Learn

While ease of learning is important, the author stresses that users should expect to put in a reasonable effort to master a lifelong skill, just as they did when learning to write by hand.

Documentation

It must have clear, up-to-date documentation for users.

Key Layout


Key Size

Apple’s design guidelines recommend that all clickable interface elements be at least 6.85 × 6.85 millimeters, because anything smaller yields poor click accuracy (Microsoft and Nokia likewise recommend a minimum hit area of approximately 7 × 7 millimeters). Keys below this threshold predictably produce misspellings.

Fitts' Law

T = a + b · log₂(1 + D/W), where T is movement time, a and b are device-dependent constants, D is the distance to the target, and W is the target's width.

Tulu has several sounds absent from English, including retroflex consonants (ṭ, ḍ, ṇ) and long vowel markers, which Latin-script typists approximate using digraphs or diacritics. If these sequences land in the far or mid zones, every Tulu message incurs a disproportionate finger-travel cost.
Fitts' Law frames this as a design constraint: the most frequent Tulu phonemes must map to the home and near zones, even if that departs from standard QWERTY conventions for English users.
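
A quick sketch of how this plays out numerically (the a and b constants and the distances below are illustrative, not measured values):

```python
import math

def fitts_time(distance_mm, width_mm, a=0.05, b=0.1):
    """Movement time in seconds per Fitts' Law: T = a + b * log2(1 + D/W).
    The constants a and b are device-dependent; these values are illustrative."""
    return a + b * math.log2(1 + distance_mm / width_mm)

KEY_WIDTH_MM = 7.0  # roughly the recommended minimum hit target

# A frequent phoneme on a home-zone key vs. the same phoneme exiled far away:
home = fitts_time(distance_mm=10, width_mm=KEY_WIDTH_MM)
far = fitts_time(distance_mm=60, width_mm=KEY_WIDTH_MM)

print(f"home zone: {home:.3f}s  far zone: {far:.3f}s")
# The per-keystroke gap compounds over a message, which is why frequent
# Tulu phonemes (retroflexes, long-vowel digraphs) belong in the near zones.
```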

Genetic algorithms and multi-objective Pareto optimization can then help search for a near-optimal keyboard layout.
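
A minimal sketch of that idea, assuming made-up phoneme frequencies and per-slot reach costs (a real run would use corpus-derived frequencies, bigram travel distances, and Pareto selection over multiple objectives):

```python
import random

# Hypothetical relative frequencies of a few Tulu phonemes (placeholders).
FREQ = {"a": 10, "u": 7, "ɛ": 6, "ṭ": 5, "ḍ": 4, "ṇ": 3}
# Per-slot reach cost: low = home zone, high = far zone (placeholders).
SLOT_COST = [1, 1, 2, 2, 3, 3]

def cost(layout):
    # Weighted finger-travel cost: frequent phonemes should sit on cheap slots.
    return sum(FREQ[p] * SLOT_COST[i] for i, p in enumerate(layout))

def mutate(layout):
    # Swap two slot assignments.
    i, j = random.sample(range(len(layout)), 2)
    child = list(layout)
    child[i], child[j] = child[j], child[i]
    return child

def evolve(generations=300, pop_size=20, seed=0):
    random.seed(seed)
    phonemes = list(FREQ)
    pop = [random.sample(phonemes, len(phonemes)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]  # elitist truncation selection
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return min(pop, key=cost)

best = evolve()
print(best, cost(best))  # the best layout puts "a" and "u" on the cheapest slots
```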

Key Findings (Post Testing)

Phonetic Clusters

Designing keyboards for scripts with many glyphs benefits from phonetic and phonology-aware layouts (map by place of articulation / phoneme clusters) rather than just visual/orthographic clustering — this helps learnability for novices.

Real Estate Trade Off

For soft keyboards, direct mapping (one tap → target glyph) is generally faster than approaches that require extra keystrokes (dead key) to produce diacritics, but direct mapping demands more screen real estate or layer switching. Bi et al. found K5-Direct faster than dead-key approaches for many languages.

Latency Vs Usability

Long-press (press & hold) is commonly used to expose diacritics on mobile soft keyboards and is a practical compromise — but discoverability and delay tuning matter a lot; novice users often find press-and-hold less intuitive. Empirical/UX evidence and community reports show long-press latency and discoverability strongly impact subjective flow.

IME + Predictive Text

For Indic languages, transliteration + predictive candidate bars (IMEs that convert Roman input to script) are powerful: they reduce keystrokes by letting the IME resolve phoneme sequences into glyphs and can hide complexity from users. Surveys and literature on machine transliteration / IMEs show this is a high-value approach for many users.

In a Nutshell:

  • Long-press is a usable tool but not a silver bullet: it helps on small screens, but it can break typing flow if used as the only method or if the long-press delay is poorly tuned.
  • Predictive transliteration and a small diacritic layer are strong complementary strategies.
  • Phonetic clustering of glyphs is a unique interaction benefit that Indic languages enjoy.
  • The trade-off between the number of keys and available screen real estate needs to be balanced.
  1. Diacritical Shift Layer

A diacritical shift layer assigns an additional “shift state” specifically for characters with diacritics. Instead of crowding the base keyboard with multiple versions of a letter, users press a dedicated modifier (like Shift, Alt, or a custom key) to access vowel signs, aspiration marks, nasalization, or script-specific diacritics.

This approach keeps the primary layer clean and familiar while still providing fast access to necessary variations. It works particularly well for Indic scripts where consonants or vowels have multiple forms but remain structurally related.

  2. Long Press Modal

Long-press interactions borrow from mobile keyboard UX. Holding a key reveals a small pop-up menu of related characters: variants, diacritics, or conjuncts. The user slides or taps to choose the desired symbol.

This approach feels natural for touch interfaces and minimizes visual clutter. It’s particularly helpful when characters share a base shape or phonetic root, for example, all forms of “ka,” or all vowel signs derived from the same core vowel. It makes the layout compact but still expressive.

  3. Dead Keys

Dead keys are keys that don’t produce a character immediately when pressed but instead modify the next character typed. For example, pressing a “diacritic dead key” followed by a base consonant can generate a combined glyph.

This is common in European keyboards (e.g., accent keys) and useful for scripts requiring combining marks, viramas, or other modifiers. Dead keys reduce the total number of keys needed and also allow for flexible combinations—not just predefined glyphs—making them ideal for densely shaped scripts like Tulu-Tigalari.
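
A minimal sketch of the dead-key mechanic, using a hypothetical "^" retroflex dead key and placeholder mappings rather than the project's actual tables:

```python
# Placeholder mappings: a hypothetical "^" retroflex dead key.
DEAD_KEY = "^"
RETROFLEX = {"t": "ṭ", "d": "ḍ", "n": "ṇ", "l": "ḷ"}

def process(keystrokes):
    out, pending = [], False
    for key in keystrokes:
        if key == DEAD_KEY:
            pending = True  # emit nothing yet; modify the next key instead
        elif pending:
            out.append(RETROFLEX.get(key, key))  # combine, or fall through
            pending = False
        else:
            out.append(key)
    return "".join(out)

print(process("^tulu"))  # ṭulu
print(process("anna"))   # anna: plain double consonants stay typable
```

Because the dead key is a separate keystroke, gemination ("nn", "tt") never conflicts with diacritic entry.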

  4. Complete Phonetic Mapping

A complete phonetic mapping aligns each Tulu-Tigalari sound directly with a Latin letter or combination in a predictable way. Users type the sound they speak, and the system converts it into the correct script form.

This design supports intuitive typing for beginners, removes the need to memorize visual glyph positions, and adapts well to predictive text engines. It also helps unify dialectal variations by mapping multiple phonetic inputs to the same script output when appropriate. Complete phonetic mapping is especially powerful when paired with transliteration logic and Unicode shaping rules.
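
A minimal sketch of the core matching logic, with placeholder output tokens standing in for real Tulu-Tigalari code points; greedy longest-match ensures digraphs like "kh" or "aa" are never split:

```python
# Output strings are placeholder tokens; the real engine would emit
# Tulu-Tigalari code points instead.
MAP = {
    "kh": "[KHA]", "k": "[KA]",
    "aa": "[AA]", "a": "[A]",
    "tt": "[TTA]", "t": "[TA]",
    "n": "[NA]",
}
MAX_KEY = max(map(len, MAP))

def transliterate(text):
    out, i = [], 0
    while i < len(text):
        # Try the longest chunk first so "kh" never splits into k + h.
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            chunk = text[i : i + size]
            if chunk in MAP:
                out.append(MAP[chunk])
                i += size
                break
        else:
            out.append(text[i])  # pass unmapped input through unchanged
            i += 1
    return "".join(out)

print(transliterate("khaata"))  # [KHA][AA][TA][A]
```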


The Design Phase

Transliteration Schemes


Transliteration schemes dictate how glyphs of one language are represented in another script.

IPA (International Phonetic Alphabet) and IAST (International Alphabet of Sanskrit Transliteration) are two of the most relevant systems used to represent Indian languages in the Latin script.

However, before creating a transliteration scheme, it was important to gain an understanding of existing transliteration systems.

For this, I looked at Unicode proposals published by Vaishnavi Murthy and the transliteration scheme developed for the Tulu Lexicon project, which spanned several decades and comprises six volumes.

The Design Phase

Dialect Sensitive

Predictive Text Engine


The initial text engine, built from an internet-scraped corpus, contained code-mixed language from several dialects of Tulu. This approach was not helpful, as the resulting suggestions were not meaningful to any single speaker.

Need for a custom engine

A custom predictive text engine becomes essential for Tulu because generic language models cannot account for the multi-dialectal variation, inconsistent orthographies, and shifting transliteration practices that shape everyday usage.

Unlike standardized languages with fixed spellings, Tulu exhibits wide variation across coastal, hinterland, Brahmin, and border dialects: each with its own pronunciation patterns, suffixes, and preferred word forms. A generic model trained on limited or mixed data tends to “correct” these natural variations into dominant patterns, effectively flattening dialects and erasing linguistic diversity.

A custom engine allows you to define the linguistic rules yourself: mapping multiple dialectal inputs to their correct Tulu-Tigalari forms, supporting alternate spellings, preserving community-specific vocabulary, and integrating ritual, agricultural, and colloquial registers that mainstream corpora never capture. It can be trained on curated datasets rather than noisy or biased corpora, ensuring that prediction reflects authentic speech patterns rather than assumptions drawn from other languages.

Most importantly, a custom predictive text engine enables cultural and linguistic sovereignty, ensuring that the technology adapts to Tulu, not the other way around. It supports a living, diverse language rather than enforcing a rigid standard imposed by external systems.

Heavy Trie Data Structure

This entry illustrates how the heavy trie data structure accommodates the linguistic complexity of Tulu by storing far more than just surface-level spellings. Instead of a simple word list, each node holds multiple layers of information: variant forms, dialect tags, phonetic representations, and regional metadata.

In this example, the variant “antɛ” is linked to its underlying lemma “aandɛ”, marking the Brahmin dialect and a general regional usage, while also capturing a Latin phonetic form “antɛ”.


By structuring the data this way, the trie directly addresses issues that generic predictive text models struggle with:

Dialect diversity: Different communities pronounce or spell the same word differently. Storing dialect and region prevents the engine from “normalizing” everything into one dominant form.

Variant lemma mapping: Users may type any variant, but the engine can still map it to a canonical lemma for prediction or display, preserving both flexibility and consistency.

Phonetic alignment: The latin_phonetic value allows the system to bridge user input with the appropriate Tulu-Tigalari output, crucial for transliteration-based typing.

Sparse frequency data: Even when frequency is unknown, the trie can still prioritize words through structural cues, variant clusters, and context.

Overall, this enriched entry design ensures that prediction remains dialect-aware, tolerant of spelling variation, and grounded in authentic spoken Tulu, rather than forcing the language into rigid standardization.
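
A minimal sketch of such an enriched trie, using the "antɛ" / "aandɛ" example above (the field names are illustrative; the project's actual schema may differ):

```python
# Each terminal node stores lemma, dialect, region, and phonetic metadata
# rather than a bare end-of-word flag.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.entries = []  # metadata payloads terminating at this node

class HeavyTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, variant, **meta):
        node = self.root
        for ch in variant:
            node = node.children.setdefault(ch, TrieNode())
        node.entries.append({"variant": variant, **meta})

    def complete(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        found, stack = [], [node]
        while stack:  # depth-first walk of the subtree under the prefix
            cur = stack.pop()
            found.extend(cur.entries)
            stack.extend(cur.children.values())
        return found

trie = HeavyTrie()
trie.insert("antɛ", lemma="aandɛ", dialect="Brahmin",
            region="general", latin_phonetic="antɛ")
trie.insert("aandɛ", lemma="aandɛ", dialect="common",
            region="coastal", latin_phonetic="aande")

# Typing either variant surfaces the shared canonical lemma:
print([e["lemma"] for e in trie.complete("an")])    # ['aandɛ']
print([e["dialect"] for e in trie.complete("aan")])  # ['common']
```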

The Design Phase

Unicode Optimised

Tigalari Font


What Unicode Actually Solves

Before Unicode, scripts around the world existed as ad-hoc encodings: 8-bit code pages, local hacks, proprietary glyph maps, font-specific logic. The same binary value could represent entirely different characters depending on the font. A document created in one system often appeared as nonsense on another. For minority languages, this instability made digital writing unreliable and unsafe for preservation.

Unicode shifted the paradigm from “fonts define meaning” to “the script has a single global definition.” Each character receives a permanent code point, and its identity is independent of any font. Fonts become visual interpretations rather than encodings themselves.
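
A tiny demonstration of this stability, using Kannada ka since Tulu-Tigalari (encoded only in Unicode 16.0) may be absent from older Unicode databases; the character's code point and UTF-8 bytes are identical on every platform, regardless of font:

```python
import unicodedata

ka = "ಕ"  # KANNADA LETTER KA: its identity is fixed by Unicode, not by a font
print(f"U+{ord(ka):04X}")    # U+0C95
print(unicodedata.name(ka))  # KANNADA LETTER KA
print(ka.encode("utf-8"))    # b'\xe0\xb2\x95' on every platform
```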

For languages like Tulu–Tigalari, Unicode establishes:

  • A global, cross-platform guarantee that the script will display correctly

  • A public definition of every letter, sign, conjunct, and diacritic

  • A future-proof infrastructure that will outlive software trends

This stability is crucial for languages that are rebuilding their digital presence after centuries of exclusion.

The Gap Between Unicode Encoding and Real Usage

Once a script is encoded, the next challenge is: what font will render it?

Unicode only defines code points; it does not supply actual glyphs. If browsers, operating systems, and apps lack a font for a newly encoded script, users see:

Blank (Tofu) Boxes

Misaligned Vowel Signs

Fallback Latin Characters

Broken Conjuncts


This is the situation currently faced by Tulu–Tigalari. The script is encoded in Unicode 16.0, but no OS ships a system font yet, and extremely few third-party fonts fully support shaping behaviour.

Thus, you technically can encode text, but you cannot reliably read it. This is where the idea of a Unicode Standard Font becomes essential.

But what is a Unicode Standard Font?

A Unicode Standard Font is a font designed to fully implement:
  • Every encoded character in the script

  • The shaping rules required for proper display

  • Stylistic fidelity to the historical script

  • Complete coverage of conjuncts, ligatures, and contextual forms

  • Consistent baseline, metrics, and readability across digital environments

Practically, this means the font must work in:
  • Browsers

  • Mobile keyboards

  • Word processors

  • PDF generators

  • Social media interfaces

  • Operating systems

A script without such a font is technically “supported by Unicode,” but functionally inaccessible.

Why Unicode Fonts Are Complex for Indic Scripts

Indic scripts are not simple left-to-right glyph mappings. They require:

Shaping Engines

  • Reordering vowel signs

  • Suppressing inherent vowels

  • Forming half-forms

  • Attaching diacritics

  • Creating conjunct ligatures

Conjunct Explosion

Scripts like Tulu–Tigalari have dozens of consonant clusters.

A good Unicode font must handle:

  • Full conjunct ligatures

  • Stacked consonant forms

  • Vertical/horizontal ligature rules

  • Post-base and pre-base forms

Contextual Substitutions

The form a glyph takes can change with its context. A good Unicode font must account for:

  • Placement rules

  • Script-specific posture and calligraphic logic

  • Surrounding characters

  • Vowel signs

Prototyping


The first version of the keyboard prototype contains consonant diacritics mapped onto new keys on the default QWERTY layout. To avoid disturbing users' muscle memory, the accented characters are positioned beside their unaccented variants.

For the accented vowels, an algorithm detects the presence of double vowels and replaces such instances with accented vowels. This mimics the natural understanding of speakers and acts as a soft educational tool for teaching the pronunciation of such characters.
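
A minimal sketch of that double-vowel rule (the vowel mapping below is illustrative):

```python
import re

# Illustrative mapping: a doubled vowel becomes its long (macron) form.
LONG_VOWELS = {"aa": "ā", "ii": "ī", "uu": "ū", "ee": "ē", "oo": "ō"}

def apply_long_vowels(text):
    # All patterns are two characters, so a single alternation pass suffices.
    return re.sub("|".join(LONG_VOWELS), lambda m: LONG_VOWELS[m.group()], text)

print(apply_long_vowels("baale"))  # bāle
print(apply_long_vowels("kori"))   # kori: single vowels are untouched
```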

Change of Software

A move away from Keyman Developer toward a fully programmed web keyboard becomes necessary when the project demands flexibility, control, and long-term sustainability.

Keyman is excellent for rapid prototyping, but it operates as a closed environment with its own rules for compiling layouts, shaping behavior, and distributing keyboards. As the Tulu-Tigalari system grows more complex—with custom Unicode shaping, phonetic mapping, predictive text, dynamic transliteration, and multiple interaction modes—Keyman’s constraint-driven model becomes limiting.
A web-native keyboard, built in JavaScript, allows complete control over input logic, conjunct formation, UI states, layout switching, and integration with machine-learning prediction engines. It also ensures that the keyboard works across platforms without installation barriers, making it accessible to users with varying devices and operating systems. Most importantly, a programmed web keyboard becomes an open, extensible system—one that can evolve organically with community needs, script updates, and new research in digital linguistics.

Usability Test

Testing scenario

User performed a natural typing task: writing a WhatsApp conversation describing how to cook rice.
Hardware: iPhone.

Goal

Observe linguistic accuracy, typing flow, prediction utility, and overall typing experience.

Summary of Findings

1. Character Input Conflicts

The double-tap system for generating retroflex and accented consonants (e.g., ṭ, ḍ, ṇ) caused conflicts with genuine double letters.

Users were unable to type unaccented double consonants such as tt or nn, as the system automatically converted them to accented forms.

The design thus limited orthographic flexibility in contexts where Tulu phonology or emphatic spellings require gemination. This also poses problems for translanguaging, as English includes double consonants.

2. Non-functional accented keys

Certain diacritic keys — particularly ḷ and ṇ — failed to output on mobile devices, even though they rendered correctly when typed through the Apple Latin keyboard and the desktop prototype.

This indicates an implementation inconsistency between the Keyman touch layout and context rules: the touch layer overrides rule-based transformations, preventing output substitution.

3. Absence of gesture or glide typing

Keyman’s architecture does not support gesture input, producing a break in habitual motor patterns and increased cognitive load for letter-by-letter typing.

Participants instinctively attempted glide typing (swipe input) for speed, revealing an expectation shaped by mainstream keyboard behaviors.

User Behavior & Observations

Participants defaulted to English spellings (e.g., anna maadu rice) whenever the target Tulu diacritic was inaccessible.

They valued visible representation of native phonemes but resisted systems that break familiar typing habits.

Design Implications

Replace the double-tap rule with a long-press or multitap mechanism to allow coexistence of gemination and diacritic entry.

Adding unique keys to the keyboard without compromising usability is an option, but needs to be judiciously curated.

Migrate to an open-source Android base (e.g., AnySoftKeyboard or FlorisBoard) to integrate glide typing and full control of predictive logic.

Ensure touch layout explicitly outputs precomposed Unicode characters rather than relying on context rules.
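
To illustrate why precomposed output matters: "ṇ" can be stored as one code point or as "n" plus a combining mark; the two render alike but compare unequal unless normalized:

```python
import unicodedata

precomposed = "\u1E47"  # ṇ as a single code point (U+1E47)
combining = "n\u0323"   # n + COMBINING DOT BELOW (U+0323)

# They render identically but are different code point sequences...
assert precomposed != combining
# ...until normalized to NFC, which composes the pair into U+1E47.
assert unicodedata.normalize("NFC", combining) == precomposed

print([f"U+{ord(c):04X}" for c in precomposed])  # ['U+1E47']
print([f"U+{ord(c):04X}" for c in combining])    # ['U+006E', 'U+0323']
```

Emitting the precomposed form directly avoids depending on context rules or downstream normalization that the Keyman touch layer was shown to skip.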

Final Prototype

Tulu Transliteration Keyboard

Thanks for Scrolling

This is the end of this project

More Projects

(I'm always up to something ;))

Contact

chaitanyavatscv@gmail.com

+91 9522203221
