Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!hp9000.csc.cuhk.hk!saimiri.primate.wisc.edu!sdd.hp.com!spool.mu.edu!agate!netsys!pagesat!spssig.spss.com!news.oc.com!eff!news.byu.edu!ux1!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: [386BSD] What about localisation?
Message-ID: <1992Dec8.214215.24804@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <1992Dec7.182103.1799@rdrel.relcom.msk.su>
Date: Tue, 8 Dec 92 21:42:15 GMT
Lines: 95

In article <1992Dec7.182103.1799@rdrel.relcom.msk.su> sir@rdrel.relcom.msk.su (Sergey I.Ryzhkov) writes:
>Gentelmens!
>
>Are anybody attempted to make true POSIX-style localisation of
>386BSD for some language? I find only "C-language frames" for "setlocale"
>ans "strcoll" function in c-lib sources and noothing about localisation
>in docs. I plans to do the complete implementation of this feature
>for Russian language but, it seems to me that somebody in the world
>must do something analogous (for some other language, may be)
>I could not find something about except of description of function call
>in POSIX which is do not complete as my minds (may be I do not have all the
>discription) and I well be glad to get the an example of their
>implementation or to contact with some person who can explain me how
>they must be implemented.

Internationalization has been discussed in some detail; the languages I'm
interested in are Spanish and German (handled by ISO Latin-1) and Greek and
Japanese (definitely requiring better localization than that provided by
XPG3). The last discussion on it ended with pretty much unanimous agreement
that Unicode was the way to go, but with no agreement on the storage
mechanism or on in-program versus in-system overhead.
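
To make the storage disagreement concrete, here is a minimal sketch in C. It
compares a fixed 16-bit representation with a Plan 9 style multibyte encoding
(the scheme later standardized as UTF-8). The function names rune_encode,
rune_storage and fixed16_storage are hypothetical and chosen only for this
illustration; they are not part of the 386BSD C library. The sketch also
assumes every 16-bit value is a plain code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one 16-bit code point into 1 to 3 bytes
 * using the Plan 9 style multibyte layout.  Returns the byte count.
 * Assumption: every 16-bit value is treated as a plain code point.
 */
size_t rune_encode(uint16_t cp, unsigned char out[3])
{
    assert(out != NULL);

    if (cp < 0x80) {                    /* 7-bit ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* up to 11 bits: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* remaining 16-bit values: three bytes */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Bytes needed to store n code points in the multibyte form. */
size_t rune_storage(const uint16_t *text, size_t n)
{
    unsigned char scratch[3];
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += rune_encode(text[i], scratch);
    return total;
}

/* Bytes needed to store n code points as fixed 16-bit units. */
size_t fixed16_storage(size_t n)
{
    return 2 * n;
}
```

Under these assumptions, ASCII text costs one byte per character in the
multibyte form against two in the fixed form, Cyrillic or Greek text costs two
bytes either way, and Japanese text costs three bytes against two.
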
Hence the discussion of Runes (multibyte encoding), which pretty much ensures
that non-ASCII characters will take multiple bytes, and that 16-bit Unicode
characters will take 8-24 bits to encode (a poor trade-off for anyone using a
majority of non-ASCII characters, and rather centered on internationalization
for Western Europe/North and South America).

The last message was:

>
> From: keld@login.dkuug.dk (Keld Jørn Simonsen)
> Subject: Re: multibyte character representations and Unicode
> Message-ID: <keld.722285494@login.dkuug.dk>
>
> The Plan 9 encoding of ISO 10646 is planned for inclusion
> in POSIX .2b standard, and is thus on a standards track.

In any case, I would prefer any Unicode standard, however badly implemented,
to XPG3, which would fail to deal with anything but Western Europe and North
and South America, in my opinion.

Another thing that requires consideration is a set of standardized messages,
translated into all supported languages through whatever localization
mechanism we end up using, for messages in the shell and in programs, and for
perror and family. This will go a long way towards usability in an
international forum; it probably constitutes our best bet for a high return on
the effort we invest, guaranteeing at least base functionality in supported
languages.

I think that this eventually assumes an X environment and a full Unicode
"fixed" font; this is ~250K for a 5x8, <1M for a 10x20 (non-default). Does
such a font already exist?

The other fundamental assumption is a multibyte data stream to the tty, and
appropriate localization by the tty itself. This is an easy mod for xterm,
but requires spanning sets within a given non-PC-ASCII driver for, for
instance, a downloaded Cyrillic font in a VGA/EGA card. This would be,
fundamentally, a 16-bit to 8-bit "mapchan".

I suggest we attack it in this order:

1)	Pick a standard for encoding (I vote Unicode).
2)	Pick a standard for storage (I vote character-set-attributed files
	to avoid stream encoding while maintaining the benefits of 8-bit
	storage for most languages).
3)	Create an X environment capable of supporting all languages by
	default (again, I vote Unicode).
4)	Build some tools for running two character sets simultaneously
	(requires a combination of Anglicized/localized encoding and data
	entry mechanisms).
5)	Provide basic error message and prompting translations (requires
	fluently bilingual volunteers).
6)	Perform a code integration (probably at the 0.2 level, although
	this may drag on until 0.3).

Anything else that needs to be handled?

Unlike most OS products (with a possible exception for NT, which is Unicode
aware), we have a chance to do this right before the product is too mature to
let us do things "the right way". We should take the opportunity while it
still presents itself.


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------