Setting Type in Marshallese

Marshallese is the language of the Marshall Islands in the central Pacific Ocean. The modern version of the written language uses the Latin character set plus sixteen combinations of Latin characters with diacritical marks. To properly set type in Marshallese, the following issues should be addressed, in the order listed:

Character Definition

The sixteen specialty characters comprise eight with a macron above them and eight with a diacritical mark below them whose exact identity is not settled. We have been unable to get a definitive answer as to which mark is "correct" for the "below" mark. However, in common practice the cedilla is used, and this appears to be the recommended practice. The remainder of this discussion assumes use of the cedilla. Caveat: It is possible that some entity with authority on the subject may change this preference, which may result in documents needing to be re-encoded with a different scheme.

Character Encoding

For consistency and usability, it is desirable, where possible, to use Unicode as the basis for encoding the characters.

Of the sixteen special Marshallese characters, six do not have precomposed Unicode code points assigned, and it appears that they never will: N-macron, n-macron, M-cedilla, m-cedilla, O-cedilla, and o-cedilla.

To avoid creating every conceivable combination of Latin character and diacritical mark, the intent of the Unicode standard is to have the base character and diacritical mark keyed separately, as two characters. The rendering software should then combine the two characters together.
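As a minimal sketch of this two-character model, the following Python snippet builds n-macron and m-cedilla as a base letter followed by a combining mark and lists the code points involved. Python and its standard unicodedata module are used purely for illustration; the names n_macron, m_cedilla, and describe are assumptions of this example and do not come from any Marshallese tooling.

```python
import unicodedata

# Each character is keyed as a base letter plus a combining mark.
n_macron = "n" + "\u0304"   # n + COMBINING MACRON
m_cedilla = "m" + "\u0327"  # m + COMBINING CEDILLA

def describe(text):
    """Return (code point, decimal, name) for every code point in text."""
    return [(f"U+{ord(ch):04X}", ord(ch), unicodedata.name(ch)) for ch in text]

# Print one line per code point: hexadecimal form, decimal form, Unicode name.
for sequence in (n_macron, m_cedilla):
    for code_point, decimal, name in describe(sequence):
        print(code_point, decimal, name)
```

Each string holds two code points, even though rendering software that supports combining marks should display a single accented letter.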

The table below summarizes the Unicode code points for the correct combined characters, and also for the character combinations that produce the same effect. The Decimal equivalent of the Unicode code point is included as well, as it is useful for keyboard entry in some applications (for example, holding the Alt-key down in Microsoft Word while keying the Decimal value will enter the character):

Character  Combined Character   Character Combination
           Unicode[1]  Decimal  Character          Unicode Keyboard/Decimal
A-macron   U+0100      256      A                  U+0041  A
                                combining macron   U+0304  772
a-macron   U+0101      257      a                  U+0061  a
                                combining macron   U+0304  772
N-macron   none        none     N                  U+004E  N
                                combining macron   U+0304  772
n-macron   none        none     n                  U+006E  n
                                combining macron   U+0304  772
O-macron   U+014C      332      O                  U+004F  O
                                combining macron   U+0304  772
o-macron   U+014D      333      o                  U+006F  o
                                combining macron   U+0304  772
U-macron   U+016A      362      U                  U+0055  U
                                combining macron   U+0304  772
u-macron   U+016B      363      u                  U+0075  u
                                combining macron   U+0304  772
L-cedilla  U+013B      315      L                  U+004C  L
                                combining cedilla  U+0327  807
l-cedilla  U+013C      316      l                  U+006C  l
                                combining cedilla  U+0327  807
M-cedilla  none        none     M                  U+004D  M
                                combining cedilla  U+0327  807
m-cedilla  none        none     m                  U+006D  m
                                combining cedilla  U+0327  807
N-cedilla  U+0145      325      N                  U+004E  N
                                combining cedilla  U+0327  807
n-cedilla  U+0146      326      n                  U+006E  n
                                combining cedilla  U+0327  807
O-cedilla  none        none     O                  U+004F  O
                                combining cedilla  U+0327  807
o-cedilla  none        none     o                  U+006F  o
                                combining cedilla  U+0327  807

Notes:
[1] All predefined character combination Unicode code-points are in the Latin Extended-A range.
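
One way to cross-check the table is to ask a Unicode library which sequences have a precomposed equivalent. The sketch below assumes Python's standard unicodedata module and relies on NFC normalization, which replaces a base-plus-mark sequence with a single code point only when Unicode defines one. The names SEQUENCES, precomposed, and missing are illustrative.

```python
import unicodedata

MACRON = "\u0304"   # COMBINING MACRON
CEDILLA = "\u0327"  # COMBINING CEDILLA

# The sixteen Marshallese specialty characters as base + combining mark.
SEQUENCES = [b + MACRON for b in "AaNnOoUu"] + [b + CEDILLA for b in "LlMmNnOo"]

def precomposed(sequence):
    """Return the single precomposed character for sequence, or None if there is none."""
    composed = unicodedata.normalize("NFC", sequence)
    return composed if len(composed) == 1 else None

# Sequences that stay as two code points have no precomposed form.
missing = [seq for seq in SEQUENCES if precomposed(seq) is None]

for seq in SEQUENCES:
    single = precomposed(seq)
    label = f"{seq[0]} + U+{ord(seq[1]):04X}"
    if single is None:
        print(label, "-> no precomposed code point")
    else:
        print(label, f"-> U+{ord(single):04X} (decimal {ord(single)})")
```

Run against the table above, the sequences left uncomposed should be the six rows without a combined code point.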

For the six characters without precomposed code points, if your software or font does not handle combining diacriticals properly, there are at least two workarounds, both deprecated:

  • Create a font that composes the two glyphs (for example, the "n" and the combining macron) into one glyph, and use a code point in a private-use range in the Unicode standard to encode that glyph.
  • Use a different character that approximates the desired character, and that does have a Unicode code point. So, for example, use an n-tilde in place of an n-macron.

The following table summarizes these options. Please note that the private-use Unicode code points listed are purely arbitrary:

Character   Private-Use Combined Character   Reasonable Substitute for the
            Substitute (Deprecated)          Character Itself (Deprecated)
            Unicode   Decimal                Character   Unicode   Decimal
N-macron    U+F000    61440                  N-tilde     U+00D1    209
n-macron    U+F001    61441                  n-tilde     U+00F1    241
M-cedilla   U+F002    61442                  M-dot       U+1E42    7746
m-cedilla   U+F003    61443                  m-dot       U+1E43    7747
O-cedilla   U+F004    61444                  O-dot       U+1ECC    7884
o-cedilla   U+F005    61445                  o-dot       U+1ECD    7885
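
To illustrate the first (deprecated) workaround, here is a hedged Python sketch that converts the six combining sequences to the private-use code points from the table above and back again. The mapping simply copies the arbitrary values listed in the table; the function names to_private_use and from_private_use are hypothetical, and a real font would have to map the same private-use code points to matching glyphs.

```python
# Arbitrary private-use assignments copied from the table above.
PRIVATE_USE = {
    "N\u0304": "\uF000",  # N-macron
    "n\u0304": "\uF001",  # n-macron
    "M\u0327": "\uF002",  # M-cedilla
    "m\u0327": "\uF003",  # m-cedilla
    "O\u0327": "\uF004",  # O-cedilla
    "o\u0327": "\uF005",  # o-cedilla
}

def to_private_use(text):
    """Replace each base + combining-mark sequence with its private-use code point."""
    for sequence, private in PRIVATE_USE.items():
        text = text.replace(sequence, private)
    return text

def from_private_use(text):
    """Restore the standard base + combining-mark sequences."""
    for sequence, private in PRIVATE_USE.items():
        text = text.replace(private, sequence)
    return text

sample = "m\u0327 and N\u0304"  # sample string, not a Marshallese phrase
encoded = to_private_use(sample)
print([f"U+{ord(ch):04X}" for ch in encoded])
print(from_private_use(encoded) == sample)
```

Keeping the reverse mapping available matters because private-use text is meaningless to any system that does not share the same font and table.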

The following table summarizes the Unicode code points for the various combining diacriticals that are acceptable:

Combining Diacritical                          Unicode  Decimal
combining macron                               U+0304   772
combining cedilla (currently recommended)      U+0327   807
combining dot below (currently deprecated)     U+0323   803
combining comma below (currently deprecated)   U+0326   806
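
Should the preferred "below" mark ever change, existing text would need to be re-encoded. The following Python sketch shows one possible approach under stated assumptions: it decomposes the text with the standard unicodedata module, swaps the combining cedilla for another mark only after the Marshallese base letters L, M, N, and O, and recomposes the result. The function name swap_below_mark and its parameters are hypothetical, and the NFD/NFC round trip normalizes the whole text, which may be undesirable for some documents.

```python
import unicodedata

CEDILLA = "\u0327"    # COMBINING CEDILLA
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW

def swap_below_mark(text, old=CEDILLA, new=DOT_BELOW, bases="LlMmNnOo"):
    """Replace the mark `old` with `new` after the given base letters only."""
    # Decompose first so precomposed letters such as U+013C expose their mark.
    decomposed = unicodedata.normalize("NFD", text)
    for base in bases:
        decomposed = decomposed.replace(base + old, base + new)
    # Recompose so letters that have a precomposed form use it again.
    return unicodedata.normalize("NFC", decomposed)

converted = swap_below_mark("\u013c m\u0327")  # l-cedilla, m + combining cedilla
print([f"U+{ord(ch):04X}" for ch in converted])
```

Restricting the swap to the listed base letters leaves unrelated characters, such as a c-cedilla in borrowed text, untouched.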

The combining diacriticals can be (or at least should be) usable even for characters that have Unicode code points for the combined characters. For example, even though A-macron has the Unicode code point U+0100 assigned, it could also be entered by first keying an uppercase A, then keying a combining macron. This may simplify keyboard entry for some users, because only two special characters need to be maintained and remembered, rather than sixteen.
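
One consequence of allowing both entry methods is that the same visible text can be stored as different code point sequences. As a small sketch, assuming Python's standard unicodedata module, the comparison below normalizes both strings to NFC before testing equality; the helper name same_text is illustrative.

```python
import unicodedata

def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

single_code_point = "\u0100"  # A-macron entered as one code point
keyed_separately = "A\u0304"  # uppercase A followed by a combining macron

print(single_code_point == keyed_separately)           # raw comparison
print(same_text(single_code_point, keyed_separately))  # normalized comparison
```

Software that searches, sorts, or compares Marshallese text would likely need a step of this kind so that the two entry methods are treated alike.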

Fonts

To use the encoding described above, your font must:

  • contain glyphs mapped to the Unicode code points described
  • implement the "mark" feature for the relevant base and mark glyphs, so that the combining diacriticals are positioned correctly
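
The first requirement can be checked mechanically. The Python sketch below assumes you already have the set of code points a font maps to glyphs, obtained from some font inspection tool (with the third-party fontTools library, for example, the keys of TTFont(path).getBestCmap() would serve). The names REQUIRED and missing_code_points are hypothetical. The sketch does not examine the "mark" feature, which lives in the font's GPOS table and needs separate inspection.

```python
# Code points a font needs for the encoding described above.
BASE_LETTERS = [ord(c) for c in "AaNnOoUuLlMm"]
PRECOMPOSED = [0x0100, 0x0101, 0x014C, 0x014D, 0x016A, 0x016B,
               0x013B, 0x013C, 0x0145, 0x0146]
COMBINING_MARKS = [0x0304, 0x0327]  # combining macron, combining cedilla
REQUIRED = set(BASE_LETTERS + PRECOMPOSED + COMBINING_MARKS)

def missing_code_points(mapped_code_points):
    """Return required code points the font does not map, as U+XXXX strings."""
    missing = REQUIRED - set(mapped_code_points)
    return [f"U+{cp:04X}" for cp in sorted(missing)]

# Example: a hypothetical font covering U+0020 through U+017F only.
print(missing_code_points(range(0x20, 0x180)))
```

A font that passes this check can still fail to position the marks properly, so the result is a necessary condition rather than a sufficient one.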

Even though the OpenType font format supports the concept of combining diacriticals discussed above, searches as of April 2004 reveal no commercial fonts that have implemented the necessary features. Furthermore, it is only with great difficulty that such features can be added to commercial fonts (see OpenType fonts for discussion).

Applications

Even after appropriate fonts are licensed or created, there is an amazing dearth of software applications that support the combining diacriticals. Adobe InDesign 2.0 does not, and we are told that Adobe InDesign CS does not either. Microsoft Word 2003 appears to be the only widely available package to do so, and its implementation appears to be a bit buggy.

Convenient Keyboard Entry

We believe that convenient keyboard entry can be achieved in most applications and environments without significant effort. However, until the fonts and applications are more widely available, we will probably not address this issue.

Conclusion

Using Unicode for setting type in Marshallese is not ready for "prime time" yet. It is probably best to continue using the current practice of arbitrarily-encoded precomposed glyphs instead. However, this situation is likely to change in the near future, as fonts and applications adopt more of the OpenType feature set.


This article was written by Victor Mote, April 20, 2004.