This is a transcript of a message I originally sent to the Unicode public mailing list on Monday, 18th November, 2002 at 23:42 UTC.

The contents of this message do not necessarily reflect my current opinions on the subject.

Subject: Extending the semantics of ZWJ and ZWNJ for Indic scripts


The function of ZWJ and ZWNJ in Indic scripts is to alter the shaping of a preceding consonant+Virama, but in some Indic scripts (Bengali, Oriya and possibly Traditional Malayalam) a device to control the shaping of a following Virama+consonant may be desired. Examples of this are found when the first consonant is the letter Ra.

For example, the sequence Ra+Virama+Ya may be rendered as reph+Ya or as Ra+Ya.secondaryForm. The secondary form may or may not ligate with the preceding character.

 

Bengali and Oriya examples:

The first form, Ya+Reph, is the more common one and should be rendered by default, so Ra+Ya.secondaryForm needs to be marked in some way as an exception.

For the purposes of this discussion, I will use Bengali examples, as that is what I know best.

In Bengali, the secondary form of Ya is called jophola (AKA zofola, jofola, japhala, & yaphala).

Jophola is often used to transcribe sounds foreign to Bengali. For example, to write the English word 'rat', one could use the sequence Ra+Virama+Ya+VowelSignA+TTa and expect it to be rendered as Ra+Jophola+VowelSignA+TTa. The question is, how would a rendering device know which form of Ra+Virama+Ya was intended (Ya+reph or Ra+Jophola)?
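To make the ambiguity concrete, here is a minimal Python sketch that spells out the 'rat' sequence as code points. It assumes that "VowelSignA" above means U+09BE BENGALI VOWEL SIGN AA; the helper name `names` is only illustrative. Nothing in the encoded text itself says whether Ya+reph or Ra+Jophola is wanted.

```python
import unicodedata

# Bengali code points for the 'rat' example.
RA = "\u09B0"       # BENGALI LETTER RA
VIRAMA = "\u09CD"   # BENGALI SIGN VIRAMA (hasant)
YA = "\u09AF"       # BENGALI LETTER YA
AA_SIGN = "\u09BE"  # BENGALI VOWEL SIGN AA (assumed to be the "VowelSignA" of the text)
TTA = "\u099F"      # BENGALI LETTER TTA

# Ra+Virama+Ya+VowelSignA+TTa, with no joiner characters at all.
rat = RA + VIRAMA + YA + AA_SIGN + TTA

def names(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]

for ch, name in zip(rat, names(rat)):
    print(f"U+{ord(ch):04X} {name}")
```

The first three code points are exactly Ra+Virama+Ya, so a renderer that sees only this string has no basis for choosing between the two forms.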

My first thought was to put a ZWNJ after the Ra to indicate that it is not to form a Reph, e.g. Ra+ZWNJ+Virama+Ya = Ra+Jophola.
Then I remembered that in some font designs, secondary forms such as jophola can form a conjunct ligature with the preceding consonant. I think that a ZWNJ would imply that Ra and Ya should not ligate.

My second idea is to use the primary semantic of ZWJ as used in non-Indic scripts. For example, in the sequence Ra+Virama+ZWJ+Ya, the ZWJ would imply that Virama and Ya should combine to make the secondary form.

This rule would only apply if the 'base' consonant were the letter Ra.
(This rule would not apply to Devanagari because: 1. it is not an affected script, and 2. for Devanagari, Ra+Virama+ZWJ = eyelash_Ra.)
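The second idea could be sketched roughly as follows in Python. This models only the rule proposed in this message, for Bengali, and is not a description of what any standard or rendering engine actually does; the function name `ra_ya_form` and its return labels are hypothetical.

```python
# Code points used by the proposed rule (Bengali only).
RA = "\u09B0"      # BENGALI LETTER RA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
YA = "\u09AF"      # BENGALI LETTER YA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

def ra_ya_form(text):
    """Classify the start of text under the proposed rule.

    Returns "Ra+Jophola" for Ra+Virama+ZWJ+Ya, "Ya+Reph" for plain
    Ra+Virama+Ya, and None when the base consonant is not Ra or the
    sequence does not match either pattern.
    """
    if text.startswith(RA + VIRAMA + ZWJ + YA):
        # ZWJ asks Virama and Ya to combine into the secondary form.
        return "Ra+Jophola"
    if text.startswith(RA + VIRAMA + YA):
        # No joiner: fall back to the common default.
        return "Ya+Reph"
    return None

print(ra_ya_form(RA + VIRAMA + YA))        # default form
print(ra_ya_form(RA + VIRAMA + ZWJ + YA))  # marked exception
```

Because the check is anchored on Ra, sequences with any other base consonant are left untouched by the rule.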

Any thoughts?

Andy