*BSD News Article 8758

Path: sserve!manuel.anu.edu.au!munnari.oz.au!hp9000.csc.cuhk.hk!saimiri.primate.wisc.edu!sdd.hp.com!swrinde!cs.utexas.edu!uwm.edu!spool.mu.edu!sol.ctr.columbia.edu!ira.uka.de!math.fu-berlin.de!unidui!du9ds3!veit
From: veit@du9ds3 (Holger Veit)
Newsgroups: comp.unix.bsd
Subject: Re: [386BSD] What about localisation?
Date: 9 Dec 92 07:46:27 GMT
Organization: Uni-Duisburg FB9 Datenverarbeitung
Lines: 85
Message-ID: <veit.723887187@du9ds3>
References: <1992Dec7.182103.1799@rdrel.relcom.msk.su> <1992Dec8.214215.24804@fcom.cc.utah.edu>
Reply-To: veit@du9ds3.fb9dv.uni-duisburg.de
NNTP-Posting-Host: du9ds3.fb9dv.uni-duisburg.de

In <1992Dec8.214215.24804@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:

>In article <1992Dec7.182103.1799@rdrel.relcom.msk.su> sir@rdrel.relcom.msk.su (Sergey I.Ryzhkov) writes:
>>Gentelmens!
>In any case, I would prefer any Unicode standard, however badly implemented,
>to XPG3, which would fail to deal with anything but Western Europe and
>North and South America, in my opinion.

I would also advocate Unicode or, in its more general form, ISO 10646
(DIS 1.2). Unicode is basically the first ("basic multilingual") plane of
ISO 10646, and standardization is in progress. ISO 10646 may be extended to
64K planes of 64K characters each (a real investment in the future,
including all common and uncommon intergalactic languages :-)).
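
To illustrate the plane idea, here is a minimal sketch. It assumes a 32-bit
code built from a 16-bit plane number and a 16-bit position within the
plane, as described above; the final standard may lay the bits out
differently, and the function names are made up for this example.

```python
# Sketch only: assumes a 32-bit code = 16-bit plane number + 16-bit position
# within the plane, following the description in the text.
PLANE_BITS = 16
CELL_MASK = (1 << PLANE_BITS) - 1

def split_code(code):
    """Return (plane, cell) for a 32-bit character code."""
    return code >> PLANE_BITS, code & CELL_MASK

def is_unicode(code):
    """True if the code lies in plane 0, i.e. is a plain 16-bit value."""
    return split_code(code)[0] == 0

print(split_code(0x0416))       # Cyrillic capital ZHE, in plane 0
print(split_code(0x00010041))   # a hypothetical character in plane 1
```

Under this assumption a 16-bit Unicode value is simply a 32-bit code whose
plane number is zero, so widening or narrowing is a shift and a mask.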

>Another thing which requires consideration is a set of standardized
>messages translated into all supported languages through whatever
>localization mechanism we will use for messages in the shell, programs,
>and etc. for perror and family.  This will tend to go a long way towards
>usability in an international forum -- and probably constitutes our best
>bet for high return on the effort we invest, guaranteeing at least base
>functionality in supported languages.

This would require a large "resource data base".

>I think that this eventually assumes an X environment and a full Unicode
>"fixed" font; this is ~250K for a 5x8, <1M for a 10x20 (non-default).
>Does such a font already exist?

5x8 is probably too small for several Asian languages; 10x20 looks feasible.
I have looked for such a monster myself, without much success.
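
For a rough idea of the storage involved, the following sketch computes the
size of a fixed-cell monochrome bitmap font. The glyph count and the two
storage layouts (bit-packed, or each row padded to whole bytes) are
assumptions of the example, not properties of any existing font.

```python
def font_bytes(glyphs, width, height, row_padded=False):
    """Size in bytes of a fixed-cell, 1 bit per pixel bitmap font."""
    if row_padded:
        # each pixel row is rounded up to a whole number of bytes
        bytes_per_glyph = ((width + 7) // 8) * height
    else:
        # all pixels of a glyph are packed, then rounded up to whole bytes
        bytes_per_glyph = (width * height + 7) // 8
    return glyphs * bytes_per_glyph

FULL_PLANE = 1 << 16  # assumption: one glyph for every 16-bit code

for w, h in ((5, 8), (10, 20)):
    packed = font_bytes(FULL_PLANE, w, h)
    padded = font_bytes(FULL_PLANE, w, h, row_padded=True)
    print(f"{w}x{h}: packed {packed // 1024}K, row-padded {padded // 1024}K")
```

With a glyph for every one of the 64K codes this gives 320K (packed) for 5x8
and 1600K (packed) for 10x20, so the figures quoted above would fit a font
that covers only part of the plane. That reading is a guess; the thread does
not say how many glyphs were counted.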

>The other fundamental assumption is multibyte data stream to the tty, and
>appropriate localization by the tty itself.  This is an easy mod for
>xterm, but requires spanning sets within a given non-PC-ASCII driver
>for (for instance) a downloaded Cyrillic font in a VGA/EGA card.  This
>would be, fundamentally, a 16-bit to 8-bit "mapchan".

The console driver should, if it is not purely ASCII, understand and display
all available characters. There should be no asymmetry between modes; the
"text mode" should behave like a single, full-screen "xterm" with a fixed
font only.
It is not very difficult to make codrv accept and emit 16-bit characters. The
real problems start with the restricted VGA scheme if we remain in text
mode. There are only two font slots available (some SVGAs support eight), so
characters may have to be remapped constantly (BTW: an 80x25 screen can
display more distinct characters than the font slots hold). So we need to
run the console in graphics mode. A really bad problem we may encounter is
writing direction. Some languages are written right-to-left; this requires
"change direction" prefixes, and mixed texts are difficult to handle under
all circumstances.
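
The font slot limit can be sketched as a small allocator that maps 16-bit
character codes to hardware glyph slots. This is not codrv code; the class
and function names are hypothetical, and the figure of 512 slots assumes two
VGA fonts of 256 glyphs each. A real driver would also have to upload the
glyph bitmap whenever a slot is reassigned.

```python
class SlotsExhausted(Exception):
    """More distinct characters on screen than hardware glyph slots."""

class GlyphSlots:
    """Maps 16-bit character codes to a limited set of glyph slots."""

    def __init__(self, nslots=512):      # assumption: 2 fonts x 256 glyphs
        self.free = list(range(nslots - 1, -1, -1))
        self.slot_of = {}                # char code -> slot number
        self.refs = {}                   # char code -> cells showing it

    def acquire(self, code):
        """A screen cell starts showing `code`; return its slot."""
        if code not in self.slot_of:
            if not self.free:
                raise SlotsExhausted(hex(code))
            self.slot_of[code] = self.free.pop()
            self.refs[code] = 0
        self.refs[code] += 1
        return self.slot_of[code]

    def release(self, code):
        """A screen cell stops showing `code`; free the slot if unused."""
        self.refs[code] -= 1
        if self.refs[code] == 0:
            self.free.append(self.slot_of.pop(code))
            del self.refs[code]

def fits_on_screen(text, nslots=512):
    """True if every cell of `text` can hold a slot at the same time."""
    slots = GlyphSlots(nslots)
    try:
        for ch in text:
            slots.acquire(ord(ch))
    except SlotsExhausted:
        return False
    return True

CELLS = 80 * 25
ascii_screen = "hello world " * (CELLS // 12)
cjk_screen = "".join(chr(0x4E00 + i) for i in range(CELLS))  # all distinct
print(fits_on_screen(ascii_screen))
print(fits_on_screen(cjk_screen))
```

The first screen uses only a handful of distinct characters and fits; the
second needs 2000 distinct glyphs at once and cannot be shown with 512
slots, however cleverly they are remapped. In graphics mode the limit
disappears, because each cell is drawn from the full font.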

>I suggest we attack it in this order:

>1)	Pick a standard for encoding (I vote Unicode).
>2)	Pick a standard for storage (I vote character set attributed files
>	to avoid stream encoding while maintaining the benefits of 8-bit
>	storage for most languages).
>3)	Create an X environment capable of supporting all languages by
>	default (Again, I vote Unicode).
>4)	Build some tools for running two character sets simultaneously
>	(requires combination of Anglicized/Localized encoding and
>	data entry mechanisms).
>5)	Provide basic error message and prompting translations (requires
>	fluently bilingual volunteers).
>6)	Perform a code integration (probably at the 0.2 level, although
>	this may drag on until 0.3).

>Anything else that needs to be handled?

I could agree with this. Volunteers :-) ?

>Unlike most OS products (with a possible exception for NT, which is Unicode
>aware), we have a chance to do this right before the product is too
>mature to let us do things "the right way".  We should take the opportunity
>while it still presents itself.


>					Terry Lambert

Holger
-- 
|  |   / Dr. Holger Veit         | INTERNET: veit@du9ds3.fb9dv.uni-duisburg.de
|__|  /  University of Duisburg  | "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
|  | /   Dept. of Electr. Eng.   |   Sorry, the above really good fortune has
|  |/    Inst. f. Dataprocessing |      been CENSORED because of obscenity"