The KIWrite Handbook: Another Section

K-IWrite version 0.1

4. Transliteration rules

K-IWrite works by simple transliteration. However, the transliteration rules are open-ended: you can modify existing rules and create new ones. Each rule tells K-IWrite how to substitute one set of characters in the input text with another set in the output. K-IWrite does not understand any human language; it is governed entirely by the transliteration rules that you specify. So, if you make a mistake while defining the rules, you can expect the error to propagate into your documents. K-IWrite cannot and does not try to correct such errors.

4.1 How to define a transliteration map file

In order to convert the English text into your language, IWrite32 needs to know:

Which character to replace and by what
How to handle multiple consonant characters without vowels in between (yuktakshar)
How to apply a vowel to a consonant/a collection of consonants

The transliteration file contains many lines, each describing how to handle one group of characters. Each line has the following format:

<English characters> <wsp> <stand alone characters> <wsp> <applied as> <wsp> <type> <wsp> <sides>

Description of each component in the format:

<English Characters>	The characters that you will type and want to get converted into your language. You can use any character in this field except white space (space and tab characters). This field is case-sensitive.
<wsp>	Linear white space (space character or tab)
<stand alone characters>	The output used when the <English characters> are typed without any adjacent consonant or vowel. Example : aasmaan You can use any character in this field except white space (space and tab characters). This field is case-sensitive.
<applied as>	When the <English characters> are typed together with (before or after) another set of <English characters>. Example : aasmaan You can use any character in this field except white space (space and tab characters). This field is case-sensitive.
<type>	Fixed values : Consonant or Vowel or StandAlone. Consonant and Vowel are self-explanatory. StandAlone means no vowel or consonant can be applied to this sound. If a vowel is found after a standalone sound, the vowel's <stand alone characters> will be echoed after the StandAlone sound (rather than the vowel being applied which is the case for consonants). This field is not case-sensitive.
<sides>	Fixed values : PreFix, PostFix, Both, Around, AroundPost and None. This value determines how the sound (vowel/consonant) is applied to a preceding consonant. PreFix means this sound goes before the preceding consonant (e.g. likhna). PostFix means this sound goes after the preceding consonant (e.g. meraa). Both means this sound consists of two parts which wrap the preceding consonant (I know of examples in Bengali only). Around means that although the letter appears before some other consonant in English script, it should actually be applied after the succeeding consonant(s) in the converted text (e.g. nirbhul). AroundPost means that the consonant can act both as described in Around and as a PostFix (e.g. nirbhul and chakra). None means this sound is never applied to any consonant. This field is not case-sensitive.
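
To make the field layout concrete, here is a minimal Python sketch of how one map line could be split into its five fields. It is not K-IWrite's actual parser. The names Rule and parse_rule are hypothetical, and the sketch assumes that every line carries exactly five fields separated by runs of spaces or tabs, as in the sample maps in this section.

```python
from dataclasses import dataclass

# Allowed values, lower-cased because <type> and <sides> are not case-sensitive.
VALID_TYPES = {"consonant", "vowel", "standalone"}
VALID_SIDES = {"prefix", "postfix", "both", "around", "aroundpost", "none"}


@dataclass
class Rule:
    """One line of a map file (hypothetical structure, for illustration)."""
    english: str      # <English characters>, case-sensitive
    stand_alone: str  # <stand alone characters>, still in escaped form
    applied_as: str   # <applied as>, still in escaped form
    kind: str         # <type>, lower-cased
    sides: str        # <sides>, lower-cased


def parse_rule(line: str) -> Rule:
    # str.split() with no argument splits on any run of white space,
    # which covers the linear white space (<wsp>) between fields.
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    english, stand_alone, applied_as, kind, sides = fields
    if english == "-":
        raise ValueError("<English characters> cannot be empty ('-')")
    if kind.lower() not in VALID_TYPES:
        raise ValueError(f"unknown <type>: {kind!r}")
    if sides.lower() not in VALID_SIDES:
        raise ValueError(f"unknown <sides>: {sides!r}")
    return Rule(english, stand_alone, applied_as, kind.lower(), sides.lower())
```

For example, parse_rule("aa B \\xA1 Vowel PostFix") would yield a Rule whose kind is "vowel" and whose sides is "postfix". The pattern fields are left in their escaped form here; decoding them is a separate step.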

A note on <English characters>, <stand alone characters> and <applied as>

Although you can use any character(s) while defining these three sets, remember that the backslash (\), plus (+) and hyphen (-) are special characters. They have special meanings in the map files, as described below. So, if you want to include any of them in your pattern, prepend a backslash to them, i.e. use double backslash (\\), \+ and \- respectively.

These three sets of characters are the most important part of the transliteration map. They decide what is converted into what. So for each sound in your language, you have to find the proper character(s) in your font (use Character Map in Accessories) and note down their ASCII or extended ASCII codes.

These characters can be plain ASCII characters or a combination of ASCII and extended ASCII. Plain ASCII characters are written as is. Extended ASCII characters can be written as hex or decimal or octal numbers. The prefix shows the number used : \x for hex, \d for decimal and \o for octal numbers. Extended ASCII characters MUST be separated from each other and from ASCII characters by a '+' character.

If any of these sets has no characters, use a '-' character. Please remember that the <English characters> set cannot have '-' (nothing) as the data (although \- is a valid pattern).
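
The escaping rules above can be sketched in Python as follows. This is only an illustration of the rules as stated here, not K-IWrite's own decoder. The function names are hypothetical, and the sketch assumes that each extended ASCII value maps to the character with the same code point (so \xA1 becomes the character with code 0xA1).

```python
# Numeric prefixes and their bases: \x hex, \d decimal, \o octal.
BASES = {"\\x": 16, "\\d": 10, "\\o": 8}


def split_on_plus(pattern: str) -> list:
    """Split a pattern on unescaped '+' characters."""
    tokens, current, i = [], "", 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            current += pattern[i:i + 2]  # keep the escape pair together
            i += 2
        elif ch == "+":
            tokens.append(current)
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    tokens.append(current)
    return tokens


def decode_token(token: str) -> str:
    """Decode one '+'-separated token into its output characters."""
    if token[:2] in BASES:
        return chr(int(token[2:], BASES[token[:2]]))
    out, i = "", 0
    while i < len(token):
        if token[i] == "\\":
            # Only \\, \+ and \- are valid escapes in a plain ASCII token.
            if i + 1 >= len(token) or token[i + 1] not in "\\+-":
                raise ValueError(f"bad escape in {token!r}")
            out += token[i + 1]
            i += 2
        else:
            out += token[i]
            i += 1
    return out


def decode_pattern(pattern: str) -> str:
    if pattern == "-":  # a lone hyphen means "no characters"
        return ""
    return "".join(decode_token(t) for t in split_on_plus(pattern))
```

With these definitions, decode_pattern("\\d209+\\xCB") gives the two characters with codes 209 and 0xCB, decode_pattern("-") gives the empty string, and decode_pattern("\\-") gives a literal hyphen.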

With this information, let's look at some sample transliteration maps:

aa B \xA1 Vowel PostFix

au A - Vowel None

i C \xA2 Vowel PreFix

r l \d209+\xCB Consonant AroundPost

tr \x9c - Consonant None

So, if the input text contains "aar", the output will be "Bl", which, when viewed in your font, will display the corresponding glyphs. Similarly, if the input text contains "raa", the output will be "l\xA1" (where \xA1 is the character having extended ASCII code 0xA1).
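
The two conversions above can be reproduced with a small Python sketch. It is a deliberately simplified model, not K-IWrite's algorithm: it assumes greedy longest-match on the <English characters>, handles only PreFix and PostFix vowels after a single consonant, and ignores Both, Around, AroundPost and yuktakshar handling. The names RULES, tokenize and transliterate are hypothetical, and the rule values are the sample maps with their escapes already decoded.

```python
# Sample rules: english -> (stand_alone, applied_as, type, sides).
RULES = {
    "aa": ("B", "\xa1", "vowel", "postfix"),
    "au": ("A", "", "vowel", "none"),
    "i": ("C", "\xa2", "vowel", "prefix"),
    "r": ("l", "\xd1\xcb", "consonant", "aroundpost"),  # \d209 is 0xD1
    "tr": ("\x9c", "", "consonant", "none"),
}


def tokenize(text: str) -> list:
    """Greedy longest-match split (an assumption of this sketch)."""
    keys = sorted(RULES, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                tokens.append(key)
                i += len(key)
                break
        else:
            tokens.append(text[i])  # unmapped character, passed through
            i += 1
    return tokens


def transliterate(text: str) -> str:
    out = []              # output pieces, in display order
    prev_consonant = None  # index in `out` of the consonant just emitted
    for tok in tokenize(text):
        rule = RULES.get(tok)
        if rule is None:
            out.append(tok)
            prev_consonant = None
            continue
        stand_alone, applied_as, kind, sides = rule
        if kind == "consonant":
            out.append(stand_alone)
            prev_consonant = len(out) - 1
            continue
        if kind == "vowel" and prev_consonant is not None and sides == "postfix":
            out.append(applied_as)                  # goes after the consonant
        elif kind == "vowel" and prev_consonant is not None and sides == "prefix":
            out.insert(prev_consonant, applied_as)  # goes before the consonant
        else:
            out.append(stand_alone)                 # nothing to apply to
        prev_consonant = None
    return "".join(out)


print(transliterate("aar"))        # Bl
print(repr(transliterate("raa")))  # 'l¡'  (¡ is the character 0xA1)
```

In "aar" the vowel has no preceding consonant, so its stand-alone form is used. In "raa" the vowel follows a consonant and is a PostFix, so its applied form is appended after the consonant.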

You may want to take a look at the map file that accompanies K-IWrite. The map file is for Bengali. For other languages, you may want to download IWrite32 and look at all the map files in the archive.

If you think you are still confused, drop me a line stating your problem and I will try to help you out.
