*BSD News Article 9638

Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA6152 ; Tue, 05 Jan 93 13:08:47 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!uunet!cs.utexas.edu!wupost!gumby!destroyer!gatech!news.byu.edu!ux1!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Ohta enpitsu inke desu
Message-ID: <1993Jan7.222423.899@fcom.cc.utah.edu>
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Sender: news@fcom.cc.utah.edu
Organization: Weber State University  (Ogden, UT)
References: <2628@titccy.cc.titech.ac.jp> <1993Jan7.045612.13244@fcom.cc.utah.edu> <2637@titccy.cc.titech.ac.jp>
Date: Thu, 7 Jan 93 22:24:23 GMT
Lines: 313

In article <2637@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <1993Jan7.045612.13244@fcom.cc.utah.edu>
>	terry@cs.weber.edu (A Wizard of Earth C) writes:
>
>>Before I proceed, I will [ once again ] remove the "dumb Americans" from my
>>original topic line.
>
>I changed the subject to reflect the content better.

I changed the subject as you did, to be insulting rather than useful.  My,
aren't we resolving these issues wonderfully.  I expect you to change the
topic again, but hopefully to something germane to the content of the
posting rather than another insult.

>>>>>>This I don't understand.  The maximum translation table from one 16
>>>>>>bit value to another is 16k.
>>>>>WHAAAAT? It's 128KB, not 16k.
>>It is still a translation of one 16 bit value to another.  In is *not* an
>>*arbitrary* translation we are talking about, since the spanning sets will
>>be known.
>You wrote MAXIMUM.

I was referring to a spanning set less than the full 16 bit set; to anyone
who took these sentences in context (you, obviously, did not, so as to
manufacture a situation in which you could disagree with me), it is
obvious that by "maximum" I was referring to "maximum translation table
spanning set", not "maximum translation table for the full 16 bit set".

Keep this up, and I will be openly insulting.

>>>>>>This means 2 16k tables for translation into/out of
>>>>>>Unicode for Input/Output devices,
>>Sorry; I misspoke (mistyped?) here.  
>
>You are dumb.

I can speak just fine, thank you, not that a handicap of that nature, were
I to have it, would make me any less of a person.

Nice of you to remove the context here, as well.  The statement "you are
dumb" is, of course, a brilliant rebuttal of my point, which follows:

>>I meant to refer to any arbitrary 8-bit
>>set for which a localization set is available (example: and ISO 8859-x set).
>
>Do you know what HASHING is? If not, read Knuth. 

Hashing involves a loss of information (read Knuth yourself).  I was not
suggesting that information be destroyed in the mapping process (as you
apparently wish would happen, since it would invalidate my argument).  The
translation would be from 16-bit Unicode through a 16-to-8-bit spanning
table to a specific 8-bit ISO character set.  This is *not* hashing.
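
A minimal sketch of this kind of table-driven translation, written in
Python for brevity.  It assumes ISO 8859-5 as the example 8-bit set and
uses Python's standard "iso8859_5" codec only to populate the tables; the
names build_tables, to_8bit and to_unicode are hypothetical.  Every entry
in the spanning set maps to exactly one byte and back, which is the
difference from hashing.

```python
def build_tables(codec):
    """Build byte->code point and code point->byte tables for an 8-bit set."""
    byte_to_cp = {}
    for b in range(256):
        try:
            byte_to_cp[b] = ord(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            # Byte value is unassigned in this set; leave it out of the table.
            continue
    cp_to_byte = {cp: b for b, cp in byte_to_cp.items()}
    return byte_to_cp, cp_to_byte


BYTE_TO_CP, CP_TO_BYTE = build_tables("iso8859_5")  # example set (assumption)


def to_8bit(text):
    """Translate Unicode text to the 8-bit set; KeyError if outside the span."""
    return bytes(CP_TO_BYTE[ord(ch)] for ch in text)


def to_unicode(data):
    """Translate 8-bit data back to Unicode through the inverse table."""
    return "".join(chr(BYTE_TO_CP[b]) for b in data)


sample = "Привет"
encoded = to_8bit(sample)
assert to_unicode(encoded) == sample        # the round trip loses nothing
assert len(CP_TO_BYTE) == len(BYTE_TO_CP)   # one-to-one, unlike a hash
```

A character outside the spanning set (a Han ideograph, say) raises
KeyError in this sketch rather than being folded onto some other byte.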

>>Obviously, by this response, you meant "cat two files to a third file" rather
>>than what you stated,
>
>You don't have to create a third file, as the output might be piped.
>
>>what you stated, which would have resulted in the files going to the
>>screen.  Display device attribution based on supported character
>
>While you may not know UNIX at all, "cat" has nothing to do with display.
>Instead, some device drivers and terminal emulators might.

EXCUSE ME, BUT YOUR ORIGINAL STATEMENT WAS:
] How can you "cat" two files with different file attributes?

To which I replied.

] By localizing the display output mechanism.

Since you did not suggest that the output would be other than the default
for cat, I made the mistake of taking your words to mean what they meant.
You then intentionally misinterpreted my reply:

] Wow! Apparently he thinks "cat" is a command to display content of
] files. No wonder he think file attributes good.

DO YOU DENY THIS?

From that derived the quoted (">") section just above.

Anyone with half a brain knows that cat can be used to display files, that
the default output of cat is to fd 1 (stdout), and that by the phrasing
`you "cat" two files` you implied with "you" that I would do it
personally rather than as part of a script.  Further, stdout in an
interactive environment is attached to a device driver for a tty or a pty
-- a display device.  You, of all people, are exactly qualified to know
this.

>>Obviously what you are asking is "how do I make two monolingual/bilingual/
>>multilingual files of different language attribution into a single bilingual/
>>multilingual file using cat" -- not the question as you have phrased it, nor
>>as I have answered it, but in the context of the discussion, clearly the
>>intended tack.
>
>"How to "cat" files with different attributes" is the classic question
>to piss off attribute-lovers, which all UNIX lovers know.

It didn't piss me off; I answered it in good faith, and provided a workable
solution that you could even call "cat" if you wanted.

Yes, it introduced a failure in the case where multiple output streams
are combined to produce its input; but it worked in all other cases.  We
can name it "cat" instead of "combine" if we choose to say that this is a
case where the behaviour is undefined.  This is exactly analogous to
the ANSI C standard changing expected behaviour to undefined behaviour
for things like memcpy() to overlapping areas, or similar changes to
the action of system calls under Posix.  I do not see you claiming
that ANSI C is not C or that a Posix compliant UNIX is not UNIX.  My
redefinition of "cat" stands as a potential solution to your attribute
problems.
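
The "combine" behaviour discussed in this thread can be sketched roughly
as follows.  This is a toy model in Python, not an implementation: the
compound document format is still TBD, so the representation used here
(a list of (language, data) runs) and the names Doc, runs and combine are
assumptions made purely for illustration.

```python
from collections import namedtuple

# Hypothetical model: a file is its bytes plus a language attribute.
# lang is None for a compound (multilingual) document, whose body is a
# list of (lang, data) runs instead of raw bytes.
Doc = namedtuple("Doc", ["lang", "body"])


def runs(doc):
    """Return a document's content as a list of (lang, data) runs."""
    if doc.lang is None:
        return list(doc.body)
    return [(doc.lang, doc.body)]


def combine(*docs):
    """Concatenate documents, keeping or widening the language attribution."""
    langs = {d.lang for d in docs}
    if len(langs) == 1 and None not in langs:
        # All sources share one attribution: output is a plain attributed file.
        return Doc(docs[0].lang, b"".join(d.body for d in docs))
    # Mixed attribution: fall back to the compound form.
    merged = []
    for d in docs:
        merged.extend(runs(d))
    return Doc(None, merged)


same = combine(Doc("ja", b"one "), Doc("ja", b"two"))
mixed = combine(Doc("ja", b"one "), Doc("zh", b"two"))
assert same.lang == "ja"     # still a monolingual, attributed file
assert mixed.lang is None    # became a compound document
```

The sketch does not cover the undefined case, where the attribution is
lost because the input arrives as already-merged output streams.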

If there is not a default attribution of files and a default attribution
of all files below a mount point where the mount goes remote via NFS to
an older system, how do you propose to deal with use of non-international
files on an internationalized system?  You may cop out for 7 bit US ASCII,
but whatever your answer, it damn well doesn't hold for existing files
from 8-bit clean internationalizations in Western Europe, Russia, and
elsewhere where small glyph-set character sets are currently in use --
or would you have us all update all our systems and all our software
simultaneously?

The reverse case of a non-internationalized system mounting an exported
file system from an internationalized system applies here as well.  How
do you propose to solve this problem with a character set containing
nonintersecting (non-unified) national character sets?

Obviously, you will make a snide comment about me, rather than answering the
questions, unless you choose to take the tack that there are bound to
be incompatibilities with existing software (the same answer you berate
me for here).

>Of course, there are several other reasons why not to use file attributes,
>which yuu don't know. But, I'm tired.
>
>>Rather than pretending I don't know what you are getting at,
>
>Then, don't post anymore.

I should have pretended that I didn't know you were attempting to disguise
the '"cat" of attributed files' problem and let you work up to it over
the period of a week?  Seems like sour grapes on your part.

>>The answer is "you don't use 'cat'".  The "cat" command does not deal with
>
>OK, say it in comp.unix.misc and see what happens.

I will, provided I don't delete the context from this (as you did) and
state that the "cat" command can be replaced with the "combine" command,
and that the "combine" command can be renamed to "cat" as long as you don't
construct wildly pessimistic code, an example of which I pointed out.  I am
well aware of drawbacks in suggestions I make, and, unlike you, I not only
admit them, but point them out.  Did it occur to you that criticism is
necessary only to draw attention to a flaw, and if the originator of
the flaw admits it, perhaps they are looking for suggestions rather
than someone parroting their own words back to them?

>>What this means is that all files which are multilingual in nature require
>>a compound document architecture.
>
>No thank you. I do want to grep my multilingual files.

Grep for "macro".  It's a Latin word used in many, many western languages;
tell me: how will this match "macro" in Latin, German, and English when
the character sets are not unified?  Is your "grep" going to unify
internally?  How does your suggestion (which does not have a standard
codified for it) resolve this issue?
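
A toy model of the problem, in Python.  It assumes a purely hypothetical
encoding in which every character code carries its language, modelled
here as a (language, letter) pair; this is not any codified standard, and
the helper names tag, match_raw and match_unified are invented for the
illustration.

```python
def tag(lang, word):
    """Model non-unified text: each letter is a distinct code per language."""
    return [(lang, ch) for ch in word]


def match_raw(pattern, text):
    """Exact match on the codes as stored, with no unification."""
    n = len(pattern)
    return any(text[i:i + n] == pattern for i in range(len(text) - n + 1))


def match_unified(pattern, text):
    """Match after dropping the language tags, i.e. unifying internally."""
    def strip(seq):
        return [ch for _, ch in seq]
    return match_raw(strip(pattern), strip(text))


pattern = tag("la", "macro")          # pattern typed in a Latin context
english = tag("en", "a macro call")   # text stored as English

assert not match_raw(pattern, english)   # codes differ, so no match
assert match_unified(pattern, english)   # matches only once unified
```

In this model a search tool either unifies internally, as match_unified
does, or must be told every language in which the pattern might occur.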

>>What this means is that a utility to combine documents (let's call it
>>"combine") must have the ability to either generate language attributed
>>files (if the source files are all of a single language attribution) or
>>our default compound document format (TBD).
>
>You are making simple problem unsolvable.

You are taking information from my posting out of context to make it seem
as if this were the case.  This is not a technique of rational discourse;
it is a sales job.  Salesman.

>>The correct approach is to note that since Unicode does not provide a
>>mechanism directly for language attribution, and that file attribution
>>is only a partial soloution,
>
>So, the correct aproach is not to use Unicode as it is.

No, the correct approach is to use a full solution; one potential full
solution is in my previous posting (following the comma in the
"mysteriously" truncated half line above).

>>What this means is that a utility to combine documents (let's call it
>>"combine")
>
>Wow!

"Wow" indeed, as you "cleverly" omit the fact that "combine" may be renamed
to "cat" if we omit a single contrived and pessimistic use from the set
of defined behaviours for "cat"... as I stated in my previous posting.

>>Does this answer your "cat" question sufficiently?  
>
>Conglaturations! You are now prepared to accept the second question.
>
>Under internationalized environment, we often create a file with Japanese
>name. At the same time,
>
>	1) we might have a file having Chinese name in the same directory.
>	2) we might have a file having Chinese name in the different directory.
>	3) the Japanses file's full pathname might contain Chinese at its
>	   intermediate directory name.
>
>Could you design a replacement of "ls" for such a situation?

Yes, no problem, since the name space information is not considered to
be multilingual text in common usage, but rather it is considered to be
name space information.

Each file name is already tagged, in the inode, with the language used
within the file (for monolingual documents).  For documents which are
*not* monolingual, the file name must have been entered in the context of
a particular language-dependent input mechanism for the file to exist
within the file system name space at all.  Thus the language tagging of
the file name itself is also derivable at creation time.

This is only untrue if you are proposing the maintenance of multiple name
spaces, one per language used on the machine.  That is at odds with your
stated intent of minimizing the currently loaded font sets (a natural
requirement of your expansion of the combined font size), and is an
unworkable solution in both Unicode and your suggested environment.  It
also has the ramification of mapping files into other than their creation
name space at creation time or, to save the space taken up by directories,
on first reference within a particular name space.  Unless you have
personally solved the machine translation problem, there are attributes
which do not move from language to language in the file system name space
itself, such as file names denoting ownership ("bobs.file") or the contents
of the file ("QuartlySales.Q3.1991").  Thus nothing is added in doing this
which is unresolvable, unless it is unresolvable in your suggested
mechanism as well.

The only possible argument is collating sequence, and we both know your
proposed solution breaks down in languages with multiple possible
collation sequences (e.g. German dictionary vs. phone book order).  It
requires an exception.  There is no reason not to make the exception the
rule, and provide routines for alpha sort and locale-specific tables
for all languages, instead of just the exceptions.  This solution is
one that has been proposed for a Unicode-based environment as well.
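
The German case can be sketched with two small tables and one sort
routine.  This is a simplified Python illustration: the folding rules
follow the usual description of the two DIN 5007 orderings (umlauts
sorted as the base vowel in dictionaries, as vowel + "e" in phone books),
and the names DICTIONARY, PHONE_BOOK, sort_key and collate are
hypothetical.

```python
# Folding tables for two German orderings (simplified, after DIN 5007).
DICTIONARY = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}
PHONE_BOOK = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def sort_key(word, table):
    """Fold a word through a locale-specific table to get its sort key."""
    return "".join(table.get(ch, ch) for ch in word.lower())


def collate(words, table):
    """One generic sort routine; only the table differs per ordering."""
    return sorted(words, key=lambda w: sort_key(w, table))


names = ["Mufti", "Müller", "Mucke"]

# Same code point sequence, two different correct orders:
assert collate(names, DICTIONARY) == ["Mucke", "Mufti", "Müller"]
assert collate(names, PHONE_BOOK) == ["Mucke", "Müller", "Mufti"]
```

Neither order can be derived from the character codes alone, so the
table has to be selected by locale (and purpose) in any case.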

>Then, the third:
>
>>Attribution of output and clever construction of out output device drivers
>>would even allow us to switch fonts as dictated by the compound document
>>architecture controls embedded in the file and/or the attribution of the
>>file descriptor (the absence of such attribution being an indicator of a
>>compund document).
>
>Given the above situation for "ls", I'm afraid that "argv" to any command
>be the compound document. Am I correct? Is it still have a type "char"?
>Do you think the entire OS still UNIX?

In order:
No.

No, unless you don't mean "byte" by "char".

Yes, if POSIX and ANSI C haven't "unUNIXed UNIX" by their specification of
previously nonexistent exception cases; no, if you mean SVID -- but
then again, your proposal (or any multibyte proposal) fails this test,
as does 386BSD itself, the OS to which we will be applying the work.

>>The problem seemed to
>>be that there was not a means around the problem from your point of view.
>
>Just include language information in character code, and the problem
>disappears.

Unfortunately (or fortunately, since it means I am not culpable and do
not owe you nor anyone else an explanation on the matter), I am not a
member of the responsible standards committee, or I might have done what
you suggest.  If you could suggest a standard that did what you are
suggesting, allowed X11 to operate on 16 bit fonts (since X is our
only possible common user interface at this time and most servers do
not support 32 bit fonts), and allowed language specific compaction by
character set choice as an optimization for monolingual documents (or
did not disallow it!), then I would adopt your approach.  Even a draft
standard which was under serious consideration by a standards committee
would be acceptable.

One cannot build a palace of bricks when one has only straw; but with
straw, one may build bricks.

Unicode is straw.

The work on 386BSD is widely distributed, and it is not possible to use
an approach which has not been formally documented when the developers
are so widely separated geographically.  It is not possible to use a
"standard" where a reference takes the form of "ask Ohta; it's his
standard" (if you had a car accident, we would all be screwed).  It is
useless to use a standard which has no hope of becoming codified by
a respected standards committee... thus it must be a draft standard
under consideration or an actual standard.


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------