Tulu Munisi

A transliteration-based keyboard enabling digital use of the Tulu language.

Docbook

Github Project

Chaitanya Vats


Problem Statement

Despite being Unicode-encoded, Tulu lacks phonetic digital input tools, making everyday typing inconsistent, exclusionary, and unreliable for its speakers.

Duration

4 Months

Role

Solo Designer

Interaction Design

UX

Linguistic

HCI

Design Outline

The project seeks to address these gaps by building a fully custom, web-based transliteration keyboard system shaped by script-specific logic, dialectal variation, and user needs. This system includes:

A transliteration engine capable of phonetic-to-Unicode mapping.

A shaping-aware input pipeline congruent with Indic orthographic rules.

A multi-dialect predictive text engine capable of learning variants rather than standardising them away.

A Unicode-compliant font that renders the script reliably across browsers.

Together, these interventions aim to restore agency to Tulu speakers by giving them a consistent, self-determined, and decolonised means of writing their language digitally.

Project Timeline

Contextual Inquiry

Research

Ideation

Design

Prototyping

Development

Feedback

Loom Walkthrough

Context

Tulu

What

Tulu is a Dravidian language spoken primarily in coastal Karnataka and northern Kerala, with a rich oral tradition and a deeply rooted cultural identity. While widely used in everyday life, performance, and ritual, Tulu has long faced challenges in written representation.

Historically, the language was written using the Tigalari script, a South Indian Brahmic script used across the region for centuries, especially in manuscripts related to religion, literature, and administration. Over time, the dominance of Kannada and Malayalam scripts in education and print gradually displaced Tigalari from everyday use.

Who

Tulu is spoken by approximately two to three million people, primarily within communities native to the coastal regions of Karnataka and northern Kerala. The language is used across diverse social and cultural groups, including Tuluva Hindus, Jains, Christians, and Muslims.

Within these communities, the language varies in vocabulary, pronunciation, and influence from neighboring languages, resulting in multiple dialectal forms shaped by history, occupation, caste, and regional interaction.

Where

Tulu is predominantly spoken in the coastal districts of Dakshina Kannada and Udupi in Karnataka, and in the Kasaragod region of northern Kerala—an area collectively known as Tulu Nadu.

Within this region, the language exists in several dialectal forms influenced by geography and community identity, including Brahmin Tulu, Jain Tulu, Harijan Tulu, and variants shaped by contact with languages such as Kannada, Malayalam, and Beary. These variations reflect the linguistic richness of Tulu but also pose challenges for standardized digital input systems.

Today, Tulu exists in an unusual linguistic position: vibrant in speech, yet fragmented in writing. The revival of Tulu-Tigalari has gained momentum in recent years, driven by scholars, practitioners, and communities seeking to reconnect with their literary heritage. However, digital support for the script remains limited. Many characters are newly encoded in Unicode, fonts are inconsistent, and input methods are still emerging.

Tulu Tigalari

Script of the Tulu Language

Tigalari is a Southern Brahmic script that was used to write Tulu, Kannada, and Sanskrit, primarily for Vedic texts in Sanskrit. It evolved from the Grantha script.

Today, the majority of Tulu speakers are not literate in the Tigalari script. Consequently, Kannada, Malayalam, and English are used to write the language digitally.

Research

Primary and Secondary Research

Stakeholder Interviews

Six voices on a script's digital future

Conversations with engineers, journalists, poets, and practitioners surfaced a linguistic landscape shaped by deep expertise, generational tension, and a shared anxiety about Tulu's survival in digital spaces.

01/06

Pioneering Software Engineer

Skeptic

Challenged the very foundations — arguing Unicode is structurally inadequate for Indic scripts, requiring what he described as "eleven dimensions" of complexity. Dismissed English-based keyboards, dialect-sensitive models, and younger speakers' language competency in equal measure.

"The only major Tulu corpus is publicly searchable but its source code stays closed — trapped by licensing and credit disputes."

02/06

Retired Tulu Journalist

Advocate

Offered the sharpest contrast — enthusiastic about digital efforts and confident that new tools can serve the language well. Validated key decisions around vowel representation and welcomed attempts to encode Tulu sounds normatively and accurately across scripts.

A reminder that not all experts resist change; some have been waiting for this work.

03/06

Tulu Poet

Cautious

Grounded the conversation in lived use: online transliteration and translation tools regularly distort meaning, especially for non-standard dialects. Meaning is fragile, and tools that flatten dialect variation cause real harm to the language's expressive range.

Reinforced the need for transparent, dialect-aware models over confident but opaque automation.

04/06

Senior Tulu Engineer

Pragmatist

Advised against prioritising a Tigalari keyboard given that most speakers today cannot read the script. Yet acknowledged a real and growing counter-trend: younger users increasingly want to learn Tigalari, and Roman script dominates online Tulu communication.

The script question is not resolved — it's generational. Design must hold space for both realities.

05/06

Tulu Filmmaker

Advocate

Having used Tulu fonts in a feature film, brought a practitioner's view of real transliteration workflows. Mapped current pain points and confirmed a broader cultural shift: Roman-script Tulu has become the de facto register for digital and informal expression.

Cultural production is already happening in Roman Tulu. The tool needs to meet speakers where they are.

06/06

Tulu Software Engineer

Critical

Confirmed the closed nature of existing lexical resources and gave concrete technical guidance: grammar sources, data-scraping constraints, and the hard limits of current Roman–Tulu fonts. Emphasised that meaningful progress requires a fully Unicode-native Tigalari font and open, expandable language data.

"Open data isn't just a preference — it's a prerequisite for any tool that claims to serve the community."

Persona Map

Designing for the fluent but unscripted speaker

She speaks Tulu every day but has never typed it. Aishwarya represents the majority: a generation navigating language without the tools to express it digitally.

Aishwarya

Primary Persona

Age

22

Occupation

College Student

Location

Mangalore, Karnataka

Language Fluency

Tulu

English

Kannada

Background

Aishwarya grew up speaking Tulu at home with her family and friends but was educated primarily in English and Kannada. While she is fluent in spoken Tulu, she has never formally learned to read or write the traditional Tulu script. Like many in her generation, she communicates digitally through messaging apps and social media, often switching between English, Kannada, and Tulu.

Goals

Preserve the language in everyday digital life

Communicate naturally in Tulu through messaging and social media

Type quickly without learning a new layout or script

Pain Points

No reliable keyboard for typing Tulu directly

Inconsistent transliteration in Latin characters

Keyboard switching breaks conversation flow

Predictive text has no knowledge of Tulu words

Core Need

A reliable keyboard for typing Tulu directly

Fast, predictable, and consistent output

Near-zero learning curve — feels like any other keyboard

Observed Behaviours

Types Tulu in Latin transliteration

Active on messaging apps and social media

Uses improvised or shortened spellings

Switches between English & Kannada keyboards

Code-switches mid-sentence

"She already speaks the language. She just needs a tool that speaks it back — in the scripts she grew up with, and the ones she's still discovering."

Secondary Research

Theoretical Framework

The project is grounded in critical design lenses to move beyond simple translation toward true inclusion:

Postcolonial Computing

Dismantling power structures that render certain languages "backend" and others "frontend".

Shadow Infrastructures

Recognizing the informal workarounds, such as hacked fonts and closed-source dictionaries, that Tulu speakers use to survive digitally.

Design Justice

Ensuring marginalized communities control and benefit from the design process through participatory methods.

The Digital Language Divide

Digital infrastructures such as keyboards and operating systems often enforce a standard of English supremacy. For Tulu speakers, this results in a layered accessibility problem:

Infrastructural Friction

Users must negotiate with autocorrect systems that view native Tulu words as "errors"

Infrastructural Bricolage

Tulu youth often force the Roman (English) keyboard to speak Tulu, leading to lost phonemes and a frustrating user experience.

Fragmented Writing

While vibrant in speech, Tulu has been displaced in writing by Kannada and Malayalam scripts, leaving the traditional Tigalari script with limited digital support.

Dialectal Exclusion

Generic predictive text models fail to recognize the linguistic diversity of the community, such as Brahmin, Jain, and Beary-influenced forms of Tulu.

Revised Definitions

Designing a

Transliteration System

Problem Statement

Although the script has recently been encoded in the Unicode Standard, the surrounding digital ecosystem remains incomplete: mainstream keyboards, input methods, predictive text engines, and fonts do not yet support the script in a usable, everyday format.

As a result, Tulu speakers continue to rely on Latin transliteration, Kannada, or improvised hybrid typing systems that vary widely across dialects and social contexts.

This creates a layered accessibility problem: users cannot type Tulu consistently across devices; the script renders unpredictably due to incomplete shaping support; and some sounds and glyphs are excluded altogether.

Generic predictive text models fail to recognise the linguistic diversity of the community, including Brahmin Tulu, Jain Tulu, Harijan Tulu, Beary-influenced forms, and rural vs. coastal variations. These failures disproportionately affect those whose dialects fall outside the informal “standard,” reinforcing inequalities within an already minoritised linguistic landscape.


The primary output is a web-based keyboard system designed to restore agency to Tulu speakers by providing a consistent, self-determined way to write their language digitally.

The Design Phase

Phonetic Inventory

Tulu has many unique sounds, which do not have glyphs to represent them in languages used to write the language.


A study of the unique phonological structure of Tulu is essential in developing the phonetic inventory, so as not to exclude the unique sounds of the language.

The phonological study of a language involves looking at data (phonetic transcriptions of the speech of native speakers) and trying to deduce what the underlying phonemes are and what the sound inventory of the language is.

Link to Spreadsheet

Starting out, it was useful to examine the transliteration schemes used for the dominant language of the region, Kannada. In doing so, I was able to identify cases where transliteration produced ambiguous output, i.e. instances where multiple sounds were mapped to the same character, excluding unique dialectal sounds and resulting in misrepresentation.
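The ambiguity described above can be sketched with a greedy longest-match transliterator: the longest Roman sequence wins, so "aa" becomes one long vowel rather than two short ones. The mapping table below is a tiny illustrative sample using Kannada code points, not the project's actual Tulu scheme, and it omits inherent-vowel and vowel-sign handling.

```typescript
// Illustrative longest-match transliterator. The mapping is a tiny
// hypothetical sample (Kannada independent letters), not a full scheme.
const MAP: Record<string, string> = {
  "kh": "ಖ",
  "k":  "ಕ",
  "aa": "ಆ",
  "a":  "ಅ",
};

// Greedy longest-match: try the longest key first so "aa" wins over "a" + "a".
function transliterate(input: string): string {
  const maxLen = Math.max(...Object.keys(MAP).map(k => k.length));
  let out = "";
  let i = 0;
  while (i < input.length) {
    let matched = false;
    for (let len = Math.min(maxLen, input.length - i); len > 0; len--) {
      const chunk = input.slice(i, i + len);
      if (chunk in MAP) {
        out += MAP[chunk];
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) { out += input[i]; i++; } // pass through unmapped characters
  }
  return out;
}
```

With a naive character-by-character mapping, "aa" would produce two short vowels; the longest-match pass resolves it to a single long vowel, which is exactly the ambiguity the scheme comparison surfaced.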

The Design Phase

Keyboard Layout


Key Characteristics of a Good Input Method

Control: The user should be in full control. The input method should not "dictate" or guess what the user wants to type, a criticism aimed at probabilistic keyboards.

Privacy: The tool must be private and not "spy on you." The article states that being free and open-source is a prerequisite for ensuring this.

Availability & Ownership: A good IME should work offline, be available on all your devices (PC, phone), and not be a proprietary tool that can be withdrawn by its vendor (like Google's abandoned desktop IMEs)

Correctness: It must generate correct and standard Unicode. It should not, for example, substitute a visually similar character like the number '0' for the Malayalam anuswaram 'ം'.

Well-maintained: The software should have active maintainers, a public bug tracker, and regular updates to fix issues and adapt to new operating systems.

Documentation: It must have clear, up-to-date documentation for users.

Easy to Learn: While ease of learning is important, the author stresses that users should expect to put in a reasonable effort to master a lifelong skill, just as they did when learning to write by hand.


Key Layout

Apple’s own design guidelines recommend that all clickable interface elements be at least 6.85 × 6.85 millimetres, because anything below that yields very poor click accuracy. (Microsoft and Nokia also recommend a minimum hit area of approximately 7 × 7 millimetres.) Predictably, keys below this size result in misspellings.

Genetic algorithms with multi-objective Pareto optimization can help arrive at a mathematically optimal keyboard layout.

Key Findings (Post Testing)

Designing keyboards for scripts with many glyphs benefits from phonetic and phonology-aware layouts (mapping by place of articulation or phoneme clusters) rather than purely visual/orthographic clustering; this helps learnability for novices. See work on Indic phonetic keyboards (AKSHAR, Keylekh, and related IIT Bombay research).

For soft keyboards, direct mapping (one tap → target glyph) is generally faster than approaches that require extra keystrokes (dead keys) to produce diacritics, but direct mapping demands more screen real estate or layer switching. Bi et al. found K5-Direct faster than dead-key approaches for many languages.

Long-press (press & hold) is commonly used to expose diacritics on mobile soft keyboards and is a practical compromise — but discoverability and delay tuning matter a lot; novice users often find press-and-hold less intuitive. Empirical/UX evidence and community reports show long-press latency and discoverability strongly impact subjective flow.

For Indic languages, transliteration + predictive candidate bars (IMEs that convert Roman input to script) are powerful: they reduce keystrokes by letting the IME resolve phoneme sequences into glyphs and can hide complexity from users. Surveys and literature on machine transliteration / IMEs show this is a high-value approach for many users.

Net: long-press is a usable tool but not a silver bullet. It helps on small screens, but it can break typing flow if used as the only method or if the long-press delay is poorly tuned; predictive transliteration and a small diacritic layer are strong complementary strategies.
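The long-press trade-off above can be sketched as a small, framework-free decision function. The 500 ms default and the sample key definition are assumptions for illustration only; as the findings note, the delay would need tuning in practice.

```typescript
// Sketch of a long-press diacritic layer. A short tap emits the base
// glyph; holding past the threshold exposes the key's diacritic variants.
interface KeyDef {
  base: string;      // glyph emitted on a normal tap
  variants: string[]; // diacritic variants shown on long-press
}

// thresholdMs = 500 is a common default, not a tuned value.
function resolvePress(key: KeyDef, heldMs: number, thresholdMs = 500): string[] {
  return heldMs >= thresholdMs ? key.variants : [key.base];
}
```

A real keyboard would drive this from pointerdown/pointerup timestamps; keeping the decision logic separate from the DOM makes the threshold easy to tune and test.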

The Design Phase

Transliteration Schemes


Transliteration schemes dictate how glyphs of one language are represented in another script.

IPA (the International Phonetic Alphabet) and IAST (the International Alphabet of Sanskrit Transliteration) are two of the most relevant systems used to depict Indian languages in the Latin script.

However, before creating a transliteration scheme, it was important to gain an understanding of existing transliteration systems.

For this, I looked at Unicode proposals published by Vaishnavi Murthy and the transliteration scheme developed for the Tulu Lexicon project, which spanned several decades and comprises six volumes.

The Design Phase

Dialect Sensitive

Predictive Text Engine

A predictive text engine must learn dialectal variants (Brahmin, Jain, Harijan, and Beary-influenced forms of Tulu) rather than standardising them away.
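One way to sketch "learning variants rather than standardising them away" is to keep word frequencies per dialect and rank a user's own dialect first. The dialect labels and words below are hypothetical placeholders, not real corpus data.

```typescript
// Sketch of a dialect-aware predictor: frequencies are kept per dialect
// instead of being merged into one "standard" list, so variants survive.
type Dialect = string;

class DialectPredictor {
  private counts = new Map<Dialect, Map<string, number>>();

  // Learn a word as typed by a user of a given dialect.
  observe(dialect: Dialect, word: string): void {
    if (!this.counts.has(dialect)) this.counts.set(dialect, new Map());
    const table = this.counts.get(dialect)!;
    table.set(word, (table.get(word) ?? 0) + 1);
  }

  // Suggest completions for a prefix, weighting the user's own dialect
  // above usage observed in other dialects.
  suggest(dialect: Dialect, prefix: string, limit = 3): string[] {
    const score = (word: string): number => {
      let s = 0;
      for (const [d, table] of this.counts) {
        const c = table.get(word) ?? 0;
        s += d === dialect ? c * 10 : c; // own-dialect weight is arbitrary
      }
      return s;
    };
    const candidates = new Set<string>();
    for (const table of this.counts.values())
      for (const word of table.keys())
        if (word.startsWith(prefix)) candidates.add(word);
    return [...candidates].sort((a, b) => score(b) - score(a)).slice(0, limit);
  }
}
```

Because every dialect's table is retained, a variant that would be drowned out in a merged frequency list still ranks first for speakers of its own dialect.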

The Design Phase

Unicode Optimised

Tigalari Font

A Unicode-compliant Tigalari font is needed to render the script reliably across browsers and devices.

Prototype 1

The first version of the keyboard prototype maps consonant diacritics onto new keys on the default QWERTY layout. To avoid disturbing users' muscle memory, the accented characters are positioned beside their unaccented variants.

For the accented vowels, an algorithm detects doubled vowels and replaces such instances with the accented form. This mimics speakers' natural understanding of the sounds and acts as a soft educational tool for teaching the pronunciation of these characters.
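The double-vowel rule described above can be sketched as a simple per-keystroke replacement; the vowel pairs shown are illustrative Latin examples, not the final mapping.

```typescript
// Sketch of the double-vowel rule: typing the same vowel twice collapses
// the pair into its long (accented) form. Pairs here are illustrative.
const LONG_VOWELS: Record<string, string> = {
  "aa": "ā",
  "ii": "ī",
  "uu": "ū",
  "ee": "ē",
  "oo": "ō",
};

// Called with the text so far after each keystroke; replaces a trailing
// doubled vowel with its accented form, leaving other input untouched.
function applyDoubleVowelRule(text: string): string {
  const tail = text.slice(-2);
  const long = LONG_VOWELS[tail];
  return long !== undefined ? text.slice(0, -2) + long : text;
}
```

Running this on every keystroke means the replacement happens the moment the second vowel lands, which is what lets it double as a pronunciation cue.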
