Nora Stevens Heath: Japanese-English Translation

Avoiding Mojibake
by John de Hoog (Wataru Tenga)

Even when using a Japanese-capable e-mail program, it is possible to inadvertently send mail to a mailing list or other recipients that shows up as unreadable gibberish, so-called mojibake.

Here are some basic considerations for avoiding this problem.

Choose the correct encoding

Ordinarily, Unicode would seem to be the safest choice. Unfortunately, a few popular programs, notably Eudora, do not support this encoding scheme. Use it only if you are sure your recipient can handle it, and when you need to mix different character types (e.g., Japanese and Korean) in the same message.

Shift-JIS is to be avoided. For Japanese, using ISO-2022-JP will ensure that anyone with a Japanese-enabled system and e-mail program will be able to read your message. (In some mail programs this may be called simply "JIS" or "JIS7".)

Trying to send Japanese characters in a message whose encoding is set to ASCII (plain text) or to a Western encoding like ISO-8859-1 is an invitation to serious mojibake.
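The difference between these encodings can be checked directly. The following is a minimal Python sketch, not part of the original article: the sample text and the helper name can_encode are assumptions chosen for illustration. It shows that Japanese text cannot be represented in ASCII or ISO-8859-1 at all, while ISO-2022-JP represents it using only 7-bit bytes.

```python
# Sample text is an illustrative assumption, not taken from the article.
text = "日本語のメールです。"


def can_encode(s, charset):
    """Return True if every character of s is representable in charset."""
    try:
        s.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# ISO-2022-JP switches character sets with escape sequences,
# so every byte it produces stays below 0x80 (7-bit safe).
jis_bytes = text.encode("iso2022_jp")
is_seven_bit = all(b < 0x80 for b in jis_bytes)

for charset in ("ascii", "iso-8859-1", "iso2022_jp", "utf-8"):
    print(charset, can_encode(text, charset))
```

A real mail program does not fail as cleanly as this sketch does. It may send the raw bytes under the wrong label, which is where the mojibake comes from.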

Common pitfalls

The Honyaku mailing list, with its heavy traffic in multilingual messages passed among translators, is a good place to observe some of the common pitfalls that lead to mojibake.

The most common problem by far is caused by sending not from a mail program but through the Web interface. See below on how to avoid mojibake in this case.

Another cause of mojibake arises when posters use a mail program that does not support Japanese and try to get around this by pasting in Japanese text from a different program. The pasted-in text is likely to be Shift-JIS, whereas the message header will identify the message as ASCII or as ISO-8859-1. In most cases this mismatch produces mojibake on the receiving end.
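The mismatch described here can be reproduced in a few lines. This is a hedged Python sketch with an assumed sample string: it encodes Japanese text as Shift-JIS, then decodes the bytes as ISO-8859-1, which is what a recipient's mailer would do if it trusted the header.

```python
# Illustrative sample text (an assumption, not from the article).
original = "こんにちは"

# The pasted-in text arrives as Shift-JIS bytes...
pasted_bytes = original.encode("shift_jis")

# ...but the header says ISO-8859-1, so the receiving program
# interprets each byte as a Western character.
garbled = pasted_bytes.decode("iso-8859-1")
print(repr(garbled))

# Because ISO-8859-1 maps every byte to exactly one character,
# the damage is reversible if the true encoding is known.
recovered = garbled.encode("iso-8859-1").decode("shift_jis")
print(recovered)
```

Recovery works in this sketch only because nothing else altered the bytes in transit. A gateway that strips the eighth bit or substitutes characters would make the loss permanent.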

Even in a mailer with full Japanese support, something similar can happen. The poster responds to a message that did not have any Japanese text, but inserts Japanese in the response. Or, the poster starts a new message in a default Western encoding, but adds Japanese in the process of editing. Some mailers will not adjust the encoding header accordingly. So it is important to check the encoding before sending a message.
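For readers who send mail from scripts rather than from a desktop mailer, the same check can be made explicit. The sketch below uses Python's standard email package. The message body is an assumed example, and the approach is one possible way to guarantee that the header and the body agree, not a description of how any particular mailer works.

```python
from email.message import EmailMessage

# Illustrative body text (an assumption, not from the article).
body = "お世話になっております。"

msg = EmailMessage()
# Declaring the charset here makes the library encode the body and
# write the matching Content-Type header in the same step.
msg.set_content(body, charset="iso-2022-jp")

declared = msg.get_content_charset()
transfer = str(msg["Content-Transfer-Encoding"])
print(declared, transfer)

# Reading the body back through the declared charset confirms
# that the header and the content agree.
round_trip = msg.get_content().rstrip("\n")
print(round_trip == body)
```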

One more pitfall has to do with Unicode. Programs like Outlook and Outlook Express will default to Unicode (UTF-8) encoding when they detect characters that cannot be sent in the 7-bit schemes. Rather than accepting this choice, it is safer to go back and find the offending character, remove it, and set the encoding to ISO-2022-JP (JIS).
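Finding the offending character by eye can be tedious. The following Python sketch shows one way to locate it. The function name find_unencodable and the sample text are hypothetical, introduced only for this illustration.

```python
def find_unencodable(text, charset="iso2022_jp"):
    """Return (index, character) pairs that charset cannot represent."""
    offenders = []
    for index, ch in enumerate(text):
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            offenders.append((index, ch))
    return offenders


# Illustrative sample: Japanese followed by Korean, which
# ISO-2022-JP cannot carry.
sample = "こんにちは 한국어"
for index, ch in find_unencodable(sample):
    print(index, ch)
```

If the list comes back empty, the message can be sent as ISO-2022-JP without forcing a switch to UTF-8.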

A good e-mail program like TuruKame Mail will warn you whenever you try to send a message containing characters that do not match the encoding in your headers. With other programs you must check manually, and the procedure differs from program to program. Take time to learn how to check and adjust the encoding in your particular program.

Posting via a Web interface

This last problem does not have to do with e-mail programs per se, but relates to the mailing lists hosted on Yahoo Groups.

It is possible to post messages to Honyaku and other such lists directly from the Yahoo Groups Web site. Unfortunately, the default is set to English (ISO-8859-1), which is unacceptable for Japanese.

Right under the text entry box where you enter your message text is a place to designate the language.

A common mistake is to assume that since the message is in English, the choice above should also be "English". But in fact, what this question is asking about is those pesky encoding schemes. The choice of "English" actually sets the encoding to ISO-8859-1, which is no good for Japanese.

So when posting from the Web site, if your message (or signature) has any Japanese at all in it, be sure to choose "Japanese" as the posting language.

Additional complications can arise, however, depending on your browser's default encoding settings. If they are set to Auto-detect, Japanese (JIS), or Japanese (EUC), there should be no problem. But if Unicode (UTF-8) is set as the default, even selecting Japanese as the posting language may not have the desired result. The safest approach is to avoid posting through the Web altogether, using a Japanese e-mail program like those introduced here instead.

Other resources

If you've received a bake'd piece of mail, this page may be able to help decode it. It offers a MIME header decoder, a broken JIS mail recovery service, and a Unicode decoder.
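MIME-encoded headers can also be decoded locally. The sketch below uses Python's standard email.header module with an assumed sample subject line. It illustrates the general technique and does not describe how the page mentioned above works.

```python
from email.header import Header, decode_header, make_header

# Illustrative subject line (an assumption, not from the article).
subject = "文字化けのテスト"

# A mailer wraps non-ASCII header text in an "encoded word"
# of the form =?charset?encoding?data?=
encoded = Header(subject, "iso-2022-jp").encode()
print(encoded)

# decode_header splits the header into (bytes, charset) parts;
# make_header reassembles them into readable text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```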


Copyright © Nora Stevens Heath. All Rights Reserved.