
Tulu Munisi
A transliteration-based keyboard enabling digital use of the Tulu language.
Problem Statement
Interaction Design
UX
Linguistic
HCI
Design Outline
The project seeks to address these gaps by building a fully custom web-based transliteration keyboard system.
Loom Walkthrough
Context
Tulu Nadu
What
Tulu is a Dravidian language spoken primarily in coastal Karnataka and northern Kerala, with a rich oral tradition and a deeply rooted cultural identity.
Historically, the language was written using the Tigalari script, a South Indian Brahmic script used across the region for centuries, especially in manuscripts related to religion, literature, and administration.
Over time, the dominance of Kannada and Malayalam scripts in education and print gradually displaced Tigalari from everyday use.
While widely used in everyday life, performance, and ritual, Tulu has long faced challenges in written representation.
Who
Tulu is spoken by approximately two to three million people, primarily within communities native to the coastal regions of Karnataka and northern Kerala. The language is used across diverse social and cultural groups, including Tuluva Hindus, Jains, Christians, and Muslims.
Within these communities, the language varies in vocabulary, pronunciation, and influence from neighboring languages, resulting in multiple dialectal forms shaped by history, occupation, caste, and regional interaction.
Where
Tulu is predominantly spoken in the coastal districts of Dakshina Kannada and Udupi in Karnataka, and in the Kasaragod region of northern Kerala—an area collectively known as Tulu Nadu.
Within this region, the language exists in several dialectal forms influenced by geography and community identity, including Brahmin Tulu, Jain Tulu, Harijan Tulu, and variants shaped by contact with languages such as Kannada, Malayalam, and Beary. These variations reflect the linguistic richness of Tulu but also pose challenges for standardized digital input systems.
Today, Tulu exists in an unusual linguistic position: vibrant in speech, yet fragmented in writing. The revival of Tulu-Tigalari has gained momentum in recent years, driven by scholars, practitioners, and communities seeking to reconnect with their literary heritage.
However, digital support for the script remains limited. Many characters are newly encoded in Unicode, fonts are inconsistent, and input methods are still emerging.

Tulu Tigalari
Script of the Tulu Language
Tigalari is a Southern Brahmic script that was used to write Tulu, Kannada, and Sanskrit. It evolved from the Grantha script and was primarily used for writing Vedic texts in Sanskrit.
Today, the majority of Tulu speakers are not literate in the Tigalari script; consequently, Kannada, Malayalam, and English are used to write the language digitally.
Research
Primary and Secondary
Stakeholder Interviews
Six voices on a script's digital future
Conversations with engineers, journalists, poets, and practitioners surfaced a linguistic landscape shaped by deep expertise, generational tension, and a shared anxiety about Tulu's survival in digital spaces.
01/06
Pioneering Software Engineer
Skeptic
Challenged the very foundations — arguing Unicode is structurally inadequate for Indic scripts, requiring what he described as "eleven dimensions" of complexity. Dismissed English-based keyboards, dialect-sensitive models, and younger speakers' language competency in equal measure.
"The only major Tulu corpus is publicly searchable but its source code stays closed — trapped by licensing and credit disputes."
02/06
Tulu Software Engineer
Critical
Confirmed the closed nature of existing lexical resources and gave concrete technical guidance: grammar sources, data-scraping constraints, and the hard limits of current Roman–Tulu fonts. Emphasised that meaningful progress requires a fully Unicode-native Tigalari font and open, expandable language data.
"Open data isn't just a preference — it's a prerequisite for any tool that claims to serve the community."
03/06
Tulu Filmmaker
Advocate
Brought a practitioner's view of real transliteration workflows, having used Tulu fonts in a feature film. Mapped current pain points and confirmed a broader cultural shift: Roman-script Tulu has become the de facto register for digital and informal expression.
Cultural production is already happening in Roman Tulu. The tool needs to meet speakers where they are.
04/06
Senior Tulu Engineer
Pragmatist
Advised against prioritising a Tigalari keyboard given that most speakers today cannot read the script. Yet acknowledged a real and growing counter-trend: younger users increasingly want to learn Tigalari, and Roman script dominates online Tulu communication.
The script question is not resolved — it's generational. Design must hold space for both realities.
05/06
Tulu Poet
Cautious
Grounded the conversation in lived use: online transliteration and translation tools regularly distort meaning, especially for non-standard dialects. Meaning is fragile, and tools that flatten dialect variation cause real harm to the language's expressive range.
Reinforced the need for transparent, dialect-aware models over confident but opaque automation.
06/06
Retired Tulu Journalist
Advocate
Offered the sharpest contrast — enthusiastic about digital efforts and confident that new tools can serve the language well. Validated key decisions around vowel representation and welcomed attempts to encode Tulu sounds normatively and accurately across scripts.
A reminder that not all experts resist change; some have been waiting for this work.
User Persona
Designing for the fluent but unscripted speaker
She speaks Tulu every day but has never typed it. Aishwarya represents the majority: a generation navigating language without the tools to express it digitally.

Aishwarya
Primary Persona
Age
22
Location
Mangalore, Karnataka
Occupation
College Student
Language Fluency
Tulu
English
Kannada
Background
Aishwarya grew up speaking Tulu at home with her family and friends but was educated primarily in English and Kannada. While she is fluent in spoken Tulu, she has never formally learned to read or write the traditional Tulu script. Like many in her generation, she communicates digitally through messaging apps and social media, often switching between English, Kannada, and Tulu.
Goals
Preserve the language in everyday digital life
Communicate naturally in Tulu through messaging and social media
Type quickly without learning a new layout or script
Pain Points
No reliable keyboard for typing Tulu directly
Inconsistent transliteration in Latin characters
Keyboard switching breaks conversation flow
Predictive text has no knowledge of Tulu words
Observed Behaviours
Types Tulu in Latin transliteration
Active on messaging apps and social media
Uses improvised or shortened spellings
Switches between English & Kannada keyboards
Code-switches mid-sentence
Core Need
A reliable keyboard for typing Tulu directly
Fast, predictable, and consistent output
Near-zero learning curve — feels like any other keyboard
Secondary Research
Theoretical Framework
The project is grounded in critical design lenses to move beyond simple translation toward true inclusion:
Postcolonial Computing
Dismantling power structures that render certain languages "backend" and others "frontend".
Shadow Infrastructures
Recognizing the informal workarounds, such as hacked fonts and closed-source dictionaries, that Tulu speakers use to survive digitally.
Design Justice
Ensuring marginalized communities control and benefit from the design process through participatory methods.
The Digital Language Divide
Digital infrastructures such as keyboards and operating systems often enforce a standard of English supremacy. For Tulu speakers, this results in a layered accessibility problem, detailed in the problem statement below.
Revised Definitions
Designing a Transliteration System
Detailed Problem Statement
Although the script has recently been encoded in the Unicode Standard, the surrounding digital ecosystem remains incomplete: mainstream keyboards, input methods, predictive text engines, and fonts do not yet support the script in a usable, everyday format.
As a result, Tulu speakers continue to rely on Latin transliteration, Kannada, or improvised hybrid typing systems that vary widely across dialects and social contexts.
This creates a layered accessibility problem: users cannot type Tulu consistently across devices; the script renders unpredictably due to incomplete shaping support; and some sounds and glyphs are excluded entirely.
Generic predictive text models fail to recognise the linguistic diversity of the community, including Brahmin Tulu, Jain Tulu, Harijan Tulu, Beary-influenced forms, and rural vs. coastal variations. These failures disproportionately affect those whose dialects fall outside the informal “standard,” reinforcing inequalities within an already minoritised linguistic landscape.
Design Outline
The project seeks to address these gaps by building a fully custom web-based keyboard system, shaped by script-specific logic, dialectal variation, and user needs. This system includes:
A transliteration engine capable of phonetic-to-Unicode mapping (a minimal sketch follows below),
A shaping-aware input pipeline congruent with Indic orthographic rules,
A multi-dialect predictive text engine capable of learning variants rather than standardising them away, and
A Unicode-compliant font that renders the script reliably across browsers.
Together, these interventions form the primary output: a web-based keyboard system that restores agency to Tulu speakers by giving them a consistent, self-determined, and decolonised means of writing their language digitally.
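To ground the first component, here is a minimal sketch of the transliteration core, assuming a greedy longest-match pass over a Roman-to-Tigalari phoneme table. The code points shown are illustrative placeholders within the Unicode 16.0 Tulu-Tigalari block (U+11380..U+113FF), not confirmed character assignments.

```typescript
// A minimal sketch of the transliteration core: greedy longest-match
// over a Roman-to-Tigalari phoneme table. The code points below are
// illustrative placeholders inside the Unicode 16.0 Tulu-Tigalari block
// (U+11380..U+113FF), not the confirmed character assignments.
const PHONEME_MAP: Record<string, string> = {
  aa: "\u{11381}", // hypothetical: long vowel aa
  a: "\u{11380}",  // hypothetical: short vowel a
  k: "\u{11390}",  // hypothetical: consonant ka
  nn: "\u{113A1}", // hypothetical: retroflex nna
  n: "\u{113A0}",  // hypothetical: dental na
};

const MAX_LEN = Math.max(...Object.keys(PHONEME_MAP).map((k) => k.length));

function transliterate(roman: string): string {
  let out = "";
  let i = 0;
  while (i < roman.length) {
    let matched = false;
    // Try the longest possible phoneme sequence first,
    // so "aa" wins over "a" and "nn" wins over "n".
    for (let len = Math.min(MAX_LEN, roman.length - i); len > 0; len--) {
      const chunk = roman.slice(i, i + len);
      if (chunk in PHONEME_MAP) {
        out += PHONEME_MAP[chunk];
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out += roman[i]; // pass unmapped characters through unchanged
      i += 1;
    }
  }
  return out;
}
```

Greedy longest-match is what lets doubled input such as "aa" resolve to a single long vowel before the single-letter mapping fires.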
The Design Phase
Phonetic Inventory
Tulu has many unique sounds that have no corresponding glyphs in the scripts currently used to write the language.




The phonological study of a language involves looking at data (phonetic transcriptions of the speech of native speakers) and trying to deduce what the underlying phonemes are and what the sound inventory of the language is.
Starting out, it was useful to examine the transliteration schemes practised for the region's dominant language, Kannada. This surfaced cases where transliteration produced ambiguity: multiple sounds mapped onto the same character, and unique dialectal sounds were excluded outright, resulting in misrepresentation.
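As a simplified illustration of this failure mode (the inventory below is hypothetical and far smaller than the real analysis), many-to-one collisions in a scheme can even be detected mechanically:

```typescript
// Simplified illustration of transliteration ambiguity: distinct Tulu
// phonemes (IPA symbols) collapsing onto one Roman spelling under a
// Kannada-style scheme. The exact inventory here is illustrative.
const kannadaStyleScheme: Record<string, string> = {
  e: "e",   // /e/
  "ɛ": "e", // open /ɛ/, present in Tulu, loses its identity
  u: "u",   // /u/
  "ɯ": "u", // the unrounded vowel also collapses onto "u"
};

// Group phonemes by their Roman output and keep only the collisions.
function findCollisions(scheme: Record<string, string>): Map<string, string[]> {
  const byOutput = new Map<string, string[]>();
  for (const [phoneme, roman] of Object.entries(scheme)) {
    byOutput.set(roman, [...(byOutput.get(roman) ?? []), phoneme]);
  }
  return new Map([...byOutput].filter(([, list]) => list.length > 1));
}

console.log(findCollisions(kannadaStyleScheme));
// Map { "e" => ["e", "ɛ"], "u" => ["u", "ɯ"] }
```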
The Design Phase
Keyboard Layout
Key Characteristics of a Good Input Method
Control
The user should be in full control. The input method should not "dictate" or guess what the user wants to type, a criticism aimed at probabilistic keyboards.
Privacy
The tool must be private and not "spy on you." The article states that being free and open-source is a prerequisite for ensuring this.
Availability & Ownership
A good IME should work offline, be available on all your devices (PC, phone), and not be a proprietary tool that can be withdrawn by its vendor (like Google's abandoned desktop IMEs).
Correctness
It must generate correct and standard Unicode. It should not, for example, substitute a visually similar character like the number '0' for the Malayalam anuswaram 'ം'.
Well-maintained
The software should have active maintainers, a public bug tracker, and regular updates to fix issues and adapt to new operating systems.
Easy to Learn
While ease of learning is important, the author stresses that users should expect to put in a reasonable effort to master a lifelong skill, just as they did when learning to write by hand.
Documentation
It must have clear, up-to-date documentation for users.
Key Size
Apple’s own design guidelines recommend that all clickable interface elements be at least 6.85 × 6.85 millimeters, because anything below that yields very poor click accuracy (Microsoft and Nokia also recommend a minimum hit area of approximately 7 × 7 millimeters). Keys smaller than this predictably result in misspellings.
Fitts' Law
T = a + b · log₂(1 + D/W)
Tulu has several sounds absent from English, including retroflex consonants (ṭ, ḍ, ṇ) and long vowel markers, which Latin-script typists approximate using digraphs or diacritics. If these sequences land in the far or mid zones of the keyboard, every Tulu message incurs a disproportionate finger-travel cost.
Fitts' Law frames this as a design constraint: the most frequent Tulu phonemes must map to the home and near zones, even if that departs from standard QWERTY conventions for English users.
Genetic algorithms combined with multi-objective Pareto optimization help search for a mathematically optimal keyboard layout; a sketch of the underlying cost function follows.
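A minimal sketch of that cost function, assuming hypothetical Fitts' law constants (a and b must be fitted empirically) and a corpus-derived bigram-frequency table:

```typescript
// Score a candidate layout by expected finger-travel time under Fitts' law.
// Constants a and b are hypothetical; in practice they are fitted empirically.
interface Key {
  x: number;     // key centre, mm
  y: number;
  width: number; // target width W, mm
}

// T = a + b * log2(1 + D / W)
function fittsTime(from: Key, to: Key, a = 0.08, b = 0.12): number {
  const d = Math.hypot(to.x - from.x, to.y - from.y);
  return a + b * Math.log2(1 + d / to.width);
}

// Expected typing cost: travel time weighted by how often each phoneme
// pair occurs, so frequent Tulu sequences should land close together.
function layoutCost(
  layout: Map<string, Key>,
  bigramFreq: Map<string, number>, // e.g. "an" -> 0.031 (corpus-derived)
): number {
  let cost = 0;
  for (const [bigram, freq] of bigramFreq) {
    const from = layout.get(bigram[0]);
    const to = layout.get(bigram[1]);
    if (from && to) cost += freq * fittsTime(from, to);
  }
  return cost; // one objective among several in the Pareto search
}
```

This cost is one objective; the Pareto front trades it off against others, such as similarity to QWERTY for learnability.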
Key Findings (Post Testing)
Phonetic Clusters
Designing keyboards for scripts with many glyphs benefits from phonetic and phonology-aware layouts (map by place of articulation / phoneme clusters) rather than just visual/orthographic clustering — this helps learnability for novices.
Real Estate Trade Off
For soft keyboards, direct mapping (one tap → target glyph) is generally faster than approaches that require extra keystrokes (dead key) to produce diacritics, but direct mapping demands more screen real estate or layer switching. Bi et al. found K5-Direct faster than dead-key approaches for many languages.
Latency Vs Usability
Long-press (press & hold) is commonly used to expose diacritics on mobile soft keyboards and is a practical compromise — but discoverability and delay tuning matter a lot; novice users often find press-and-hold less intuitive. Empirical/UX evidence and community reports show long-press latency and discoverability strongly impact subjective flow.
IME + Predictive Text
For Indic languages, transliteration + predictive candidate bars (IMEs that convert Roman input to script) are powerful: they reduce keystrokes by letting the IME resolve phoneme sequences into glyphs and can hide complexity from users. Surveys and literature on machine transliteration / IMEs show this is a high-value approach for many users.
In a Nutshell:
Long-press is a usable tool but not a silver bullet: it helps on small screens, but it can break typing flow if used as the only method or if the long-press delay is poorly tuned.
Predictive transliteration and a small diacritic layer are strong complementary strategies.
Phonetic clustering of glyphs is an interaction benefit unique to Indic languages.
The trade-off between the number of keys on the keyboard and screen real estate needs to be balanced.
The Design Phase
Transliteration Schemes
Transliteration schemes dictate how the characters of one script are represented in another.
IPA (the International Phonetic Alphabet) and IAST (the International Alphabet of Sanskrit Transliteration) are two of the most relevant systems used to represent Indian languages in the Latin script.
However, before creating a transliteration scheme, it was important to gain an understanding of existing transliteration systems.

For this, I looked at Unicode proposals published by Vaishnavi Murthy and the transliteration scheme developed for the Tulu Lexicon project, which spanned several decades and comprises six volumes.
The Design Phase
Dialect Sensitive
Predictive Text Engine
The initial text engine, built from an internet-scraped corpus, contained code-mixed language from several dialects of Tulu. This approach proved unhelpful: the resulting suggestions were not meaningful to any single speaker.
Need for a custom engine
A custom predictive text engine becomes essential for Tulu because generic language models cannot account for the multi-dialectal variation, inconsistent orthographies, and shifting transliteration practices that shape everyday usage.
Unlike standardized languages with fixed spellings, Tulu exhibits wide variation across coastal, hinterland, Brahmin, and border dialects, each with its own pronunciation patterns, suffixes, and preferred word forms. A generic model trained on limited or mixed data tends to “correct” these natural variations into dominant patterns, effectively flattening dialects and erasing linguistic diversity.
A custom engine allows you to define the linguistic rules yourself: mapping multiple dialectal inputs to their correct Tulu-Tigalari forms, supporting alternate spellings, preserving community-specific vocabulary, and integrating ritual, agricultural, and colloquial registers that mainstream corpora never capture. It can be trained on curated datasets rather than noisy or biased corpora, ensuring that prediction reflects authentic speech patterns rather than assumptions drawn from other languages.
Most importantly, a custom predictive text engine enables cultural and linguistic sovereignty, ensuring that the technology adapts to Tulu, not the other way around. It supports a living, diverse language rather than enforcing a rigid standard imposed by external systems.
Heavy Trie Data Structure
This entry illustrates how the heavy trie data structure accommodates the linguistic complexity of Tulu by storing far more than just surface-level spellings. Instead of a simple word list, each node holds multiple layers of information: variant forms, dialect tags, phonetic representations, and regional metadata.
In this example, the variant “antɛ” is linked to its underlying lemma “aandɛ”, marking the Brahmin dialect and a general regional usage, while also capturing a Latin phonetic form “antɛ”.
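A sketch of that entry and node structure, assuming the fields described here (the latin_phonetic field mirrors the value referenced below; type and class names are illustrative):

```typescript
// Sketch of a "heavy" trie entry and node, assuming the fields described
// in the text. Type and class names are illustrative.
interface HeavyEntry {
  lemma: string;          // canonical form, e.g. "aandɛ"
  variant: string;        // surface form the user types, e.g. "antɛ"
  dialect: string[];      // e.g. ["brahmin"]
  region: string[];       // e.g. ["general"]
  latin_phonetic: string; // bridges Roman input and Tigalari output
  frequency?: number;     // optional: corpus counts are often sparse
}

class TrieNode {
  children = new Map<string, TrieNode>();
  entries: HeavyEntry[] = []; // one spelling may carry several dialect variants
}

class HeavyTrie {
  root = new TrieNode();

  insert(key: string, entry: HeavyEntry): void {
    let node = this.root;
    for (const ch of key) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch)!;
    }
    node.entries.push(entry);
  }
}

// The example entry from the text, expressed in this structure:
const trie = new HeavyTrie();
trie.insert("antɛ", {
  lemma: "aandɛ",
  variant: "antɛ",
  dialect: ["brahmin"],
  region: ["general"],
  latin_phonetic: "antɛ",
});
```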
By structuring the data this way, the trie directly addresses issues that generic predictive text models struggle with:
Dialect diversity: Different communities pronounce or spell the same word differently. Storing dialect and region prevents the engine from “normalizing” everything into one dominant form.
Variant lemma mapping: Users may type any variant, but the engine can still map it to a canonical lemma for prediction or display, preserving both flexibility and consistency.
Phonetic alignment: The latin_phonetic value allows the system to bridge user input with the appropriate Tulu-Tigalari output, crucial for transliteration-based typing.
Sparse frequency data: Even when frequency is unknown, the trie can still prioritize words through structural cues, variant clusters, and context.
Overall, this enriched entry design ensures that prediction remains dialect-aware, tolerant of spelling variation, and grounded in authentic spoken Tulu, rather than forcing the language into rigid standardization.
The Design Phase
Unicode Optimised
Tigalari Font
What Unicode Actually Solves
Before Unicode, scripts around the world existed as ad-hoc encodings: 8-bit code pages, local hacks, proprietary glyph maps, font-specific logic. The same binary value could represent entirely different characters depending on the font. A document created in one system often appeared as nonsense on another. For minority languages, this instability made digital writing unreliable and unsafe for preservation.
Unicode shifted the paradigm from “fonts define meaning” to “the script has a single global definition.” Each character receives a permanent code point; its identity is independent of any font. Fonts become visual interpretations rather than encodings themselves.
For languages like Tulu–Tigalari, Unicode establishes:
A global, cross-platform guarantee that the script will display correctly
A public definition of every letter, sign, conjunct, and diacritic
A future-proof infrastructure that will outlive software trends
This stability is crucial for languages that are rebuilding their digital presence after centuries of exclusion.
The Gap Between Unicode Encoding and Real Usage
Once a script is encoded, the next challenge is: what font will render it?
Unicode only defines code points; it does not supply actual glyphs. If browsers, operating systems, and apps lack a font for a newly encoded script, users see blank “tofu” boxes or fallback glyphs instead of readable text.
This is the situation currently faced by Tulu–Tigalari. The script is encoded in Unicode 16.0, but no OS ships a system font yet, and extremely few third-party fonts fully support shaping behaviour.
Thus, text can technically be encoded, but it cannot reliably be read. This is where the idea of a Unicode Standard Font becomes essential.
But what is a Unicode Standard Font?
A Unicode Standard Font is a font designed to fully implement:
Every encoded character in the script
The shaping rules required for proper display
Stylistic fidelity to the historical script
Complete coverage of conjuncts, ligatures, and contextual forms
Consistent baseline, metrics, and readability across digital environments
Practically, this means the font must work in:
Browsers
Mobile keyboards
Word processors
PDF generators
Social media interfaces
Operating systems
A script without such a font is technically “supported by Unicode,” but functionally inaccessible.
Why Unicode Fonts Are Complex for Indic Scripts
Indic scripts are not simple left-to-right glyph mappings. They require:
Shaping Engines
Reordering vowel signs (see the sketch after this list)
Suppressing inherent vowels
Forming half-forms
Attaching diacritics
Creating conjunct ligatures
Conjunct Explosion
Scripts like Tulu–Tigalari have dozens of consonant clusters.
A good Unicode font must handle:
Full conjunct ligatures
Stacked consonant forms
Vertical/horizontal ligature rules
Post-base and pre-base forms
Contextual Substitutions
Glyph shapes also change depending on their context. A good Unicode font must account for:
Placement rules
Script-specific posture and calligraphic logic
Surrounding characters
Vowel signs
Prototyping
The first version of the keyboard prototype maps the diacritic forms of consonants onto new keys within the default QWERTY layout. To avoid disturbing users' muscle memory, each accented character is positioned beside its unaccented variant.
For the accented vowels, an algorithm detects doubled vowel input and replaces such instances with the corresponding accented vowel. This mimics speakers' natural intuitions and acts as a soft educational tool for teaching the pronunciation of these characters.
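A minimal sketch of that double-vowel rule, assuming Latin diacritic targets (the vowel set shown is illustrative):

```typescript
// Replace doubled Roman vowels with their accented long-vowel forms,
// mirroring how speakers already intuit vowel length. The vowel set
// shown is illustrative of the prototype, not exhaustive.
const DOUBLE_VOWEL_MAP: Record<string, string> = {
  aa: "ā",
  ii: "ī",
  uu: "ū",
  ee: "ē",
  oo: "ō",
};

function applyDoubleVowelRule(input: string): string {
  return input.replace(/aa|ii|uu|ee|oo/g, (m) => DOUBLE_VOWEL_MAP[m]);
}

applyDoubleVowelRule("aata"); // -> "āta"
```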
Change of Software
A move away from Keyman Developer toward a fully programmed web keyboard becomes necessary when the project demands flexibility, control, and long-term sustainability.
Keyman is excellent for rapid prototyping, but it operates as a closed environment with its own rules for compiling layouts, shaping behavior, and distributing keyboards. As the Tulu-Tigalari system grows more complex—with custom Unicode shaping, phonetic mapping, predictive text, dynamic transliteration, and multiple interaction modes—Keyman’s constraint-driven model becomes limiting.
A web-native keyboard, built in JavaScript, allows complete control over input logic, conjunct formation, UI states, layout switching, and integration with machine-learning prediction engines. It also ensures that the keyboard works across platforms without installation barriers, making it accessible to users with varying devices and operating systems. Most importantly, a programmed web keyboard becomes an open, extensible system—one that can evolve organically with community needs, script updates, and new research in digital linguistics.
Usability Test
Testing scenario
User performed a natural typing task: writing a WhatsApp conversation describing how to cook rice.
Hardware: iPhone.
Goal
Observe linguistic accuracy, typing flow, prediction utility, and overall typing experience.
Summary of Findings
1. Character Input Conflicts
The double-tap system for generating retroflex and accented consonants (e.g., ṭ, ḍ, ṇ) caused conflicts with genuine double letters.
Users were unable to type unaccented double consonants such as tt or nn, as the system automatically converted them to accented forms.
The design thus limited orthographic flexibility in contexts where Tulu phonology or emphatic spellings require gemination. This also poses problems for translanguaging, as English spelling includes double consonants.
2. Non-functional accented keys
Certain diacritic keys — particularly ḷ and ṇ — failed to output on mobile devices, even though they rendered correctly when typed through the Apple Latin keyboard and the desktop prototype.
This indicates an implementation inconsistency between the Keyman touch layout and context rules: the touch layer overrides rule-based transformations, preventing output substitution.
3. Absence of gesture or glide typing
Keyman’s architecture does not support gesture input, producing a break in habitual motor patterns and increased cognitive load for letter-by-letter typing.
Participants instinctively attempted glide typing (swipe input) for speed, revealing an expectation shaped by mainstream keyboard behaviors.
User Behavior & Observations
Participants defaulted to English spellings (e.g., anna maadu rice) whenever the target Tulu diacritic was inaccessible.
They valued visible representation of native phonemes but resisted systems that break familiar typing habits.
Design Implications
Replace the double-tap rule with a long-press or multitap mechanism to allow coexistence of gemination and diacritic entry (a sketch follows this list).
Adding unique keys to the keyboard without compromising usability is an option, but needs to be judiciously curated.
Migrate to an open-source Android base (e.g., AnySoftKeyboard or FlorisBoard) to integrate glide typing and full control of predictive logic.
Ensure touch layout explicitly outputs precomposed Unicode characters rather than relying on context rules.
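A sketch of the long-press option, assuming a web keyboard built on PointerEvents; the 350 ms delay and the variant table are placeholder values:

```typescript
// Long-press diacritic layer: a quick tap emits the base letter (so "tt"
// stays "tt"), while holding past LONG_PRESS_MS emits the accented form.
// The delay and variant table are placeholder values for illustration.
const LONG_PRESS_MS = 350;

const DIACRITIC_VARIANTS: Record<string, string> = {
  t: "ṭ",
  d: "ḍ",
  n: "ṇ",
  l: "ḷ",
};

function attachLongPress(
  key: HTMLElement,
  base: string,
  emit: (ch: string) => void,
): void {
  let timer: number | undefined;
  let longFired = false;

  key.addEventListener("pointerdown", () => {
    longFired = false;
    timer = window.setTimeout(() => {
      longFired = true;
      emit(DIACRITIC_VARIANTS[base] ?? base); // held: accented variant
    }, LONG_PRESS_MS);
  });

  key.addEventListener("pointerup", () => {
    window.clearTimeout(timer);
    if (!longFired) emit(base); // quick tap: plain letter, gemination intact
  });
}
```

Tuning the delay matters: too short and deliberate taps misfire as long presses, too long and the diacritic layer feels sluggish, echoing the latency findings above.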
Final Prototype























