Received: by minnie.vk1xwt.ampr.org with NNTP id AA6152 ; Tue, 05 Jan 93 13:08:47 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!uunet!cs.utexas.edu!wupost!gumby!destroyer!gatech!news.byu.edu!ux1!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Ohta enpitsu inke desu
Message-ID: <1993Jan7.222423.899@fcom.cc.utah.edu>
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <2628@titccy.cc.titech.ac.jp> <1993Jan7.045612.13244@fcom.cc.utah.edu> <2637@titccy.cc.titech.ac.jp>
Date: Thu, 7 Jan 93 22:24:23 GMT
Lines: 313

In article <2637@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <1993Jan7.045612.13244@fcom.cc.utah.edu>
> terry@cs.weber.edu (A Wizard of Earth C) writes:
>
>>Before I proceed, I will [ once again ] remove the "dumb Americans" from my
>>original topic line.
>
>I changed the subject to reflect the content better.

I changed the subject as you did, to be insulting rather than useful. My, aren't we resolving these issues wonderfully. I expect you to change the topic again, but hopefully to something germane to the content of the posting rather than another insult.

>>>>>>This I don't understand. The maximum translation table from one 16
>>>>>>bit value to another is 16k.
>>>>>WHAAAAT? It's 128KB, not 16k.
>>It is still a translation of one 16 bit value to another. In is *not* an
>>*arbitrary* translation we are talking about, since the spanning sets will
>>be known.
>You wrote MAXIMUM.

I was referring to a spanning set less than the full 16 bit set; to anyone who took these sentences in context (you, obviously, did not so as to manufacture a situation for you to be able to disagree with me), it is obvious that by "maximum" I was referring to "maximum translation table spanning set", not "maximum translation table for the full 16 bit set".
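
For reference, the two figures in dispute can be checked with simple arithmetic. The sketch below is only an illustration, in Python: a flat table with one slot per 16-bit input value and a 16-bit output per slot comes to 128 KiB, while a table restricted to a smaller spanning set scales with the size of that set. The 8192-entry spanning set is a hypothetical figure, picked only because it happens to yield 16 KiB; nothing above states the actual size of any spanning set, and the helper name `table_bytes` is likewise made up for this sketch.

```python
# Size arithmetic for flat code-translation tables.
# The 8192-entry spanning set is an illustrative assumption, not a figure
# taken from the discussion.

FULL_16BIT_CODES = 1 << 16  # 65536 possible 16-bit values


def table_bytes(entries: int, bytes_per_entry: int) -> int:
    """Size in bytes of a flat lookup table with one slot per input code."""
    return entries * bytes_per_entry


# Full table: every 16-bit value maps to another 16-bit (2-byte) value.
full_table = table_bytes(FULL_16BIT_CODES, 2)

# Hypothetical spanning set of 8192 code points, 2 bytes per entry.
spanning_table = table_bytes(8192, 2)

print(full_table // 1024, "KiB for the full 16-bit table")
print(spanning_table // 1024, "KiB for the assumed 8192-entry spanning set")
```

Under these assumptions the full table is 131072 bytes and the restricted one is 16384 bytes, so both numbers can be right depending on which set the table is taken to span.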

Keep this up, and I will be openly insulting.

>>>>>>This means 2 16k tables for translation into/out of
>>>>>>Unicode for Input/Output devices,
>>Sorry; I misspoke (mistyped?) here.
>
>You are dumb.

I can speak just fine, thank you, not that a handicap of that nature, were I to have it, would make me any less of a person. Nice of you to remove the context here, as well. The statement "you are dumb" is, of course, a brilliant rebuttal of my point, which follows:

>>I meant to refer to any arbitrary 8-bit
>>set for which a localization set is available (example: and ISO 8859-x set).
>
>Do you know what HASHING is? If not, read Knuth.

Hashing involves a loss of information (read Knuth yourself). I was not suggesting that information be destroyed in the mapping process (as you apparently wish would happen, since it would invalidate my argument). The translation would be from 16-bit Unicode through a 16-to-8-bit spanning table to a specific 8-bit ISO character set. This is *not* hashing.

>>Obviously, by this response, you meant "cat two files to a third file" rather
>>than what you stated,
>
>You don't have to create a third file, as the output might be piped.
>
>>what you stated, which would have resulted in the files going to the
>>screen. Display device attribution based on supported character
>
>While you may not know UNIX at all, "cat" has nothing to do with display.
>Instead, some device drivers and terminal emulators might.

EXCUSE ME, BUT YOUR ORIGINAL STATEMENT WAS:

] How can you "cat" two files with different file attributes?

To which I replied:

] By localizing the display output mechanism.

Since you did not suggest that the output would be other than the default for cat, I made the mistake of taking your words to mean what they meant. Which you intentionally misinterpreted:

] Wow! Apparently he thinks "cat" is a command to display content of
] files. No wonder he think file attributes good.

DO YOU DENY THIS?

From that derived the quoted (">") section just above. Anyone with half a brain knows that cat can be used to display files, that the default output of cat is to fd 1 (stdout), and that by the phrasing `you "cat" two files` you implied with "you" that I would do it personally rather than as part of a script. Further, stdout in an interactive environment is attached to a device driver for a tty or a pty -- a display device. You, of all people, are exactly qualified to know this.

>>Obviously what you are asking is "how do I make two monolingual/bilingual/
>>multilingual files of different language attribution into a single bilingual/
>>multilingual file using cat" -- not the question as you have phrased it, nor
>>as I have answered it, but in the context of the discussion, clearly the
>>intended tack.
>
>"How to "cat" files with different attributes" is the classic question
>to piss off attribute-lovers, which all UNIX lovers know.

It didn't piss me off; I answered it in good faith, and provided a workable solution that you could even call "cat" if you wanted. Yes, it introduced a case where multiple output streams combined to produce its input failed; but it worked in all other cases. We can name it "cat" instead of "combine" if we choose to say that this is a case where the behaviour is undefined.

This is exactly analogous to the ANSI C standard changing expected behaviour to undefined behaviour for things like memcpy() to overlapping areas, or similar changes to the action of system calls under Posix. I do not see you claiming that ANSI C is not C or that a Posix compliant UNIX is not UNIX. My redefinition of "cat" stands as a potential solution to your attribute problems.

If there is not a default attribution of files and a default attribution of all files below a mount point where the mount goes remote via NFS to an older system, how do you propose to deal with use of non-international files on an internationalized system?
You may cop out for 7 bit US ASCII, but whatever your answer, it damn well doesn't hold for existing files from 8-bit clean internationalizations in Western Europe, Russia, and elsewhere where small glyph-set character sets are currently in use -- or would you have us all update all our systems and all our software simultaneously? The reverse case of a non-internationalized system mounting an exported file system from an internationalized system applies here as well. How do you propose to solve this problem with a character set containing nonintersecting (non-unified) national character sets?

Obviously, you will make a snide comment about me, rather than answering the questions, unless you choose to take the tack that there are bound to be incompatibilities with existing software (the same answer you berate me for here).

>Of course, there are several other reasons why not to use file attributes,
>which yuu don't know. But, I'm tired.
>
>>Rather than pretending I don't know what you are getting at,
>
>Then, don't post anymore.

I should have pretended that I didn't know you were attempting to disguise the '"cat" of attributed files' problem and let you work up to it over the period of a week? Seems like sour grapes on your part.

>>The answer is "you don't use 'cat'". The "cat" command does not deal with
>
>OK, say it in comp.unix.misc and see what happens.

If I don't delete the context from this (as you did) and state that the "cat" command can be replaced with the "combine" command, and that the "combine" command can be renamed to "cat" as long as you don't construct wildly pessimistic code, an example of which I pointed out -- I am well aware of drawbacks in suggestions I make, and, unlike you, I not only admit them, but point them out. Did it occur to you that criticism is necessary only to draw attention to a flaw, and if the originator of the flaw admits it, perhaps they are looking for suggestions rather than someone parroting their own words back to them?

>>What this means is that all files which are multilingual in nature require
>>a compound document architecture.
>
>No thank you. I do want to grep my multilingual files.

Grep for "macro". It's a Latin word used in many, many western languages; tell me: how will this match "macro" in Latin, German, and English when the character sets are not unified? Is your "grep" going to unify internally? How does your suggestion (which does not have a standard codified for it) resolve this issue?

>>What this means is that a utility to combine documents (let's call it
>>"combine") must have the ability to either generate language attributed
>>files (if the source files are all of a single language attribution) or
>>our default compound document format (TBD).
>
>You are making simple problem unsolvable.

You are taking information from my posting out of context to make it seem as if this were the case. This is not a technique in rational discourse, it is a sales job. Salesman.

>>The correct approach is to note that since Unicode does not provide a
>>mechanism directly for language attribution, and that file attribution
>>is only a partial soloution,
>
>So, the correct aproach is not to use Unicode as it is.

No, the correct approach is to use a full solution; one potential full solution is in my previous posting (following the comma in the "mysteriously" truncated half line above).

>>What this means is that a utility to combine documents (let's call it
>>"combine")
>
>Wow!

"Wow" indeed, as you "cleverly" omit the fact that "combine" may be renamed to "cat" if we omit a single contrived and pessimistic use from the set of defined behaviours for "cat"... as I stated in my previous posting.

>>Does this answer your "cat" question sufficiently?
>
>Conglaturations! You are now prepared to accept the second question.
>
>Under internationalized environment, we often create a file with Japanese
>name. At the same time,
>
>   1) we might have a file having Chinese name in the same directory.
>   2) we might have a file having Chinese name in the different directory.
>   3) the Japanses file's full pathname might contain Chinese at its
>      intermediate directory name.
>
>Could you design a replacement of "ls" for such a situation?

Yes, no problem, since the name space information is not considered to be multilingual text in common usage, but rather it is considered to be name space information. Each name in a file is already tagged in the inode as to the nature of the language to be used within the inode (for monolingual documents). For documents which are *not* monolingual, the file name must have been entered in the context of a particular language-dependent input mechanism for the file to exist within the file system name space at all. Thus the language tagging of the file name itself is also derivable at creation time.

This is only untrue if you are proposing the maintenance of multiple name spaces, one per language used on the machine. This is both at odds with your stated intent of minimizing the currently loaded font sets (a natural requirement of your expansion of the combined font size) and an unworkable solution in both Unicode and your suggested environment. This also has the ramification of mapping files into other than their creation name space at creation time, or to save space taken up by directories, on first reference within a particular name space.

Unless you have personally solved the machine translation problem, there are attributes which do not move from language to language in the file system name space itself, such as file names denoting ownership ("bobs.file") or the contents of the file ("QuartlySales.Q3.1991"). Thus there is nothing added in doing this which is unresolvable, unless it is also unresolvable in your suggested mechanism as well.

The only possible argument is collating sequence, and we both know your proposed solution breaks down in languages with multiple possible collation sequences (ie: German dictionary vs. phone book order). It requires an exception. There is no reason not to make the exception the rule, and provide routines for alpha sort and locale-specific tables for all languages, instead of just the exceptions. This solution is one that has been proposed for a Unicode-based environment as well.

>Then, the third:
>
>>Attribution of output and clever construction of out output device drivers
>>would even allow us to switch fonts as dictated by the compound document
>>architecture controls embedded in the file and/or the attribution of the
>>file descriptor (the absence of such attribution being an indicator of a
>>compund document).
>
>Given the above situation for "ls", I'm afraid that "argv" to any command
>be the compound document. Am I correct? Is it still have a type "char"?
>Do you think the entire OS still UNIX?

In order: No. No, unless you don't mean "byte" by "char". Yes, if POSIX and ANSI C haven't "unUNIXed UNIX" by their specification of previously nonexistent exception cases; no, if you mean SVID -- but then again, your proposal (or any multibyte proposal) fails this test, as does 386BSD itself, the OS to which we will be applying the work.

>>The problem seemed to
>>be that there was not a means around the problem from your point of view.
>
>Just include language information in character code, and the problem
>disappears.

Unfortunately (or fortunately, since it means I am not culpable and do not owe you nor anyone else an explanation on the matter), I am not a member of the responsible standards committee, or I might have done what you suggest. If you could suggest a standard that did what you are suggesting, allowed X11 to operate on 16 bit fonts (since X is our only possible common user interface at this time and most servers do not support 32 bit fonts), and allowed language specific compaction by character set choice as an optimization for monolingual documents (or did not disallow it!), then I would adopt your approach.
Even a draft standard which was under serious consideration by a standards committee would be acceptable.

One can not build a palace of bricks when one has only straw; but with straw, one may build bricks. Unicode is straw.

The work on 386BSD is widely distributed, and it is not possible to use an approach which has not been formally documented when the developers are so widely separated geographically. It is not possible to use a "standard" where a reference takes the form of "ask Ohta; it's his standard" (if you had a car accident, we would all be screwed). It is useless to use a standard which has no hope of becoming codified by a respected standards committee... thus it must be a draft standard under consideration or an actual standard.


                                        Terry Lambert
                                        terry@icarus.weber.edu
                                        terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
--
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------