*BSD News Article 9558

Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA5982 ; Sat, 02 Jan 93 13:02:10 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!caen!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
Message-ID: <1993Jan5.093059.29631@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University  (Ogden, UT)
References: <2565@titccy.cc.titech.ac.jp> <1992Dec28.064029.24421@fcom.cc.utah.edu> <2616@titccy.cc.titech.ac.jp>
Date: Tue, 5 Jan 93 09:30:59 GMT
Lines: 106

In article <2616@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <1992Dec28.064029.24421@fcom.cc.utah.edu>
>	terry@cs.weber.edu (A Wizard of Earth C) writes:
>
>>|> True. But, it should be noted that they don't fit even in 16 bits.
>>
>>Work is already under way to adapt Unicode to 32 bits.  I would be interested
>>in any similar work you know of in progress for XPG4/JIS.
>
>Anyway, a program written for 16 bit Unicode can not be usable with 32 bit
>Unicode.

Not true.  Attributes attached to files, users, compound data within files,
and so on allow easy identification of version changes.
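
To illustrate how tagging lets one program read both forms: the sketch below
records the code-unit width in a small header, so a 16-bit reader can detect
32-bit data instead of silently misreading it.  The header layout (the magic
string "UCTG" plus one width byte) is a hypothetical format invented for this
example; it is not part of Unicode 1.0 or any BSD tool.

```python
import struct

# Hypothetical tag format invented for this sketch (not part of Unicode 1.0
# or any BSD tool): a 4-byte magic string, then one byte giving the
# code-unit width in bytes (2 = 16-bit Unicode, 4 = a 32-bit form).
MAGIC = b"UCTG"
UNIT_FORMATS = {2: ">H", 4: ">I"}  # big-endian unsigned 16- and 32-bit


def tag_text(code_points, width):
    """Serialize code points behind a header recording the code-unit width."""
    fmt = UNIT_FORMATS[width]
    body = b"".join(struct.pack(fmt, cp) for cp in code_points)
    return MAGIC + bytes([width]) + body


def read_tagged(data):
    """Decode tagged data, choosing 16- or 32-bit units from the header."""
    if len(data) < 5 or data[:4] != MAGIC:
        raise ValueError("untagged data")
    width = data[4]
    fmt = UNIT_FORMATS.get(width)
    if fmt is None:
        raise ValueError(f"unknown code-unit width {width}")
    body = data[5:]
    if len(body) % width:
        raise ValueError("truncated code unit")
    # Unpack one fixed-width code unit at a time.
    return [struct.unpack(fmt, body[i:i + width])[0]
            for i in range(0, len(body), width)]
```

A reader written for 16-bit data can check the width byte and refuse (or
convert) width-4 files, which is the kind of version identification meant
above.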

>>I am *not*
>>interested in proposing or attempting to provide yet another standard, if
>>that is what you believe is necessary.
>
>Then, why you are interested in using yet non-existent standard?

Unicode has been codified.  It exists:

	The Unicode Standard
	Worldwide Character Encoding
	Version 1.0, Volume 1
	_The Unicode Consortium_
	Addison-Wesley Publishing Company, Inc.
	ISBN 0-201-56788-1

	The Unicode Standard
	Worldwide Character Encoding
	Version 1.0, Volume 2
	_The Unicode Consortium_
	Addison-Wesley Publishing Company, Inc.
	ISBN 0-201-60845-6

>BTW, can you explain what XPG4 is?

XPG4 is the internationalization mechanism following XPG3, the SVR4.2 standard
for internationalization; it is XPG3 with East Asian language support.
Standards documents are currently available, but a reference implementation
(to the best of my knowledge) is not.  I can look up and post the
publication information if you are truly interested.


>>Again, I want to stress that we are about the identification and adoption
>>of an existing standard rather than the specification and ratification of
>>a new one.
>
>Then, the only standard available now for internationalization is ISO 2022.
>
>It can, at least, differentiate Chinese and Japanese character.
>
>Do you want to use it?

ISO 2022 places the unacceptable burden of Runic encoding for monolingual
environments (post-localization).  While it is an "OK" standard for
internationalization (multinationalization, really, since it deals with the
concept of multilingual documents directly), the penalties of a change in
apparent environment for the purely localized user are unacceptable, since
the purely localized user is the majority case.
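
The burden comes from ISO 2022 being stateful: an escape sequence changes how
every following byte is read, so even a Japanese-only user's software must
scan from the start of a buffer just to count characters or find the Nth one.
The sketch below is an illustration written for this discussion, not code
from any BSD library.  It counts characters under the ISO-2022-JP profile
(the four escape sequences listed are the designations that profile allows)
and contrasts this with a fixed 16-bit encoding, where the count is a single
division.

```python
ESC = 0x1B

# The four designations ISO-2022-JP allows, mapped to bytes per character.
DESIGNATIONS = {
    b"\x1b(B": 1,  # ASCII
    b"\x1b(J": 1,  # JIS X 0201 Roman
    b"\x1b$@": 2,  # JIS C 6226-1978
    b"\x1b$B": 2,  # JIS X 0208-1983
}


def count_chars_iso2022jp(data):
    """Count characters by scanning from the start and tracking shift state."""
    width = 1  # an ISO-2022-JP stream begins in ASCII
    i = count = 0
    while i < len(data):
        if data[i] == ESC:
            seq = bytes(data[i:i + 3])
            if seq not in DESIGNATIONS:
                raise ValueError(f"unsupported escape at offset {i}")
            width = DESIGNATIONS[seq]
            i += 3  # escapes switch state but are not characters
            continue
        if i + width > len(data):
            raise ValueError("truncated double-byte character")
        count += 1
        i += width
    return count


def count_chars_fixed16(data):
    """With fixed 16-bit units the count needs no scan at all."""
    if len(data) % 2:
        raise ValueError("odd byte count")
    return len(data) // 2
```

Random access has the same problem: the byte offset of character N in
ISO-2022-JP depends on every escape before it, while in a fixed 16-bit
encoding it is simply 2N.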

The differentiation of Chinese and Japanese characters is the job of the
input mechanism, which would, in any case, be required to change between
the job of inputting Chinese and the job of inputting Japanese.  This switch
is an acceptable tagging mechanism for multilingual use.  Tagging for
monolingual use can be done on a per-system or per-user basis, since it
is unlikely that all localization databases for each language (message
catalogs, etc) will be kept around.  As an English-speaker only
(hypothetically speaking, since I can get along in Japanese, German, Latin,
and Spanish and have a very little Greek, Gaelic, French and Swahili on the
side), I am unlikely to localize my machine to anything other than English,
and the only thing served by carrying around localization sets for 20 other
languages is my disk drive vendor.

The primary use for an internationalization mechanism will be localization;
anything on top of that (and yes, we can build multilingual applications
on top of that with little effort) is gravy.

>>We may invoke "tricks" to reduce storage requirements or to
>>retrofit existing input mechanisms, but we are not attempting a new standard.
>
>Again, existing input mechanism for Japanese is so large that even very
>complex translation does not affect its performance.

All the more reason to not quibble about storage mechanisms and get down to
the job of coding a reference implementation.  Unicode is a storage mechanism;
it does *not* dictate the display font any more than a plain text file
dictates the default PostScript font that will be used when you type "lpr".
Both input and display are functions of the localization (or
multinationalization) mechanisms we choose to employ on top of the storage
mechanism.
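
To make the storage/display split concrete: the same stored code point can be
drawn with a Japanese or a Chinese font depending only on the locale layered
on top of it.  This is a minimal sketch, assuming a hypothetical per-user
locale setting and placeholder font names.  U+76F4 is used because it is a
commonly cited unified Han character whose preferred glyph differs between
Chinese and Japanese typography.

```python
# Hypothetical locale-to-font table; the font names are placeholders,
# not real font files shipped with any system.
FONT_FOR_LOCALE = {
    "ja_JP": "placeholder-japanese-font",
    "zh_CN": "placeholder-chinese-font",
    "en_US": "placeholder-latin-font",
}
DEFAULT_LOCALE = "en_US"


def render_request(text, locale):
    """Pair the stored code points with a font chosen by the user's locale.

    The stored text is identical for every locale; only the font differs.
    """
    font = FONT_FOR_LOCALE.get(locale, FONT_FOR_LOCALE[DEFAULT_LOCALE])
    return font, [ord(ch) for ch in text]
```

The stored data never records which font to use; swapping the locale table
changes the display without touching a single stored character.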


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------