Received: by minnie.vk1xwt.ampr.org with NNTP id AA5613 ; Fri, 01 Jan 93 01:50:52 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!umn.edu!csus.edu!netcom.com!hasty
From: hasty@netcom.com (Amancio Hasty Jr)
Subject: Re: S3 question - Amancio, are you there?
Message-ID: <1992Dec28.054342.13142@netcom.com>
Organization: Netcom Online Communications Services (408-241-9760 login: guest)
References: <VIXIE.92Dec26034105@cognition.pa.dec.com> <1992Dec27.081525.29228@netcom.com> <Bzy9wD.9Ez@pix.com>
Date: Mon, 28 Dec 1992 05:43:42 GMT
Lines: 237

In article <Bzy9wD.9Ez@pix.com> stripes@pix.com (Josh Osborne) writes:
>In article <1992Dec27.081525.29228@netcom.com> hasty@netcom.com (Amancio Hasty Jr) writes:
>>In article <VIXIE.92Dec26034105@cognition.pa.dec.com> vixie@pa.dec.com (Paul A Vixie) writes:
>[...]
>>>I see that the two greatest bit-bangers of the average computer are available
>>>as VESA cards: display, and disk. I'm still formulating my disk controller
>>>questions and perhaps I'll ask them in a future post. Right now I'm trying
>>>to solve the S3 mystery.
>
>One problem with VESA LB and disk drives, (I think) VESA LB doesn't allow
>bus mastering cards. For SCSI (at least) this could be quite useful. Of
>corse with current tech disk drives you need 3 fast disks running at once
>to use all the ISA bus. Or you need (say, IDE) controlers with cache on them,
>but it would be better to have a auto-sizing disk cache in main memory (like
>SunOS, or Linux), because it would be (a) faster, and (b) useable as core if
>thats more useful then disk cache, (c) you know if it is flushed to disk
>or not.
>
>>>At work I have a EISA/SVGA/34020 board. It is very fast when run under
>>>Windows 3.1; however, Microsoft had access to the 34020 specs and I don't,
>>>so I can't figure out how to port the X server to it and noone in this
>>>newsgroup seems to have done that either.
>>>It's too bad -- a 34020 with
>>>a minimal BITBLT interpreter downloaded into it would make for a lightening
>>>fast X11 server with the 34020 as almost a co-processor. However, I'm
>>>fairly sure that the 34020's days are numbered given something called "S3"
>>>and the "GUI Accelerator" that seem to be taking the market by storm.
>
>The 34020 docs are available from TI, I have a set somewhere. The cross
>compiler is quite expensiave, and the old version makes poor code. Someone
>got a old gcc to work (more or less) with it. The 34020 is fairly quick,
>I would like to see a 34020 running X on it :-) (I know it would be faster
>to do most of the X stuff on the [34]86 and let the TI bang bits).
>
>The GUI accel's are doing better then the 34020 cards because they are cheap,
>however I think you can build a 34020 card as cheap as a S3, but nobody has.
>
>>>I know that SVGA is more or less a hack on the IBM VGA spec to allow more
>>>pixels; what I don't know is what an "SVGA S3" is. I have gathered from
>>>context in posts on this newsgroup that it is some kind of graphics
>>>accelerator chipset and that there are several different revisions of
>>>it and that different board manufacturers have had different results.
>>>Yet, VGA is fundamentally a frame buffer that has some hardware assist
>>>for certain operations. Where does S3 fit in? Is it another IO port, or
>>>just more opcodes to the existing VGA IO port? Or just a faster implementation
>>>of the VGA spec?
>
>This is answered well below, but I thought I would point out that:
> * VGA only allows 64K of the video memory to be mapped into the PC's addr space
>   at once
> * Most SVGAs allow 128K at once, normally 2 64K windows.
> * Some more useful, but more disgusting ways of viewing video memory are also
>   available.
> * A small number of SVGA chipsets can map all of the video memory into the
>   PC, but I don't know if the video cards can do it. The 386BSD kernal will
>   need to be wacked to make it work anyway.
> * The S3 adds a bunch of IO addrs on top of a normal looking SVGA chipset.
>
>>>There are two reasons I need to know this. First, if the VGA really is "just
>>>a frame buffer", then given a fast CPU and VESA it should be trivial to get
>>>the MIT CFB server running and have it run near the theoretical maximum
>>>(though at some potentially unneccessary cost in main CPU cycles). If on
>>>the other hand VGA is like EGA in that you can only map certain parts into
>>>memory at a time and it's generally cheaper to send high-level commands and
>>>let the graphics hardware figure out how to achieve them, then I see a
>>>problem.
>
>In genneral you can only map part of the video memory at a time.
>
>>>What problem? Well, DEC did this really neat thing called the "Dragon" chip
>>>set back on their MicroVAX II/GPX. It was really really fast -- if you wrote
>>>your application in FORTRAN on VMS. On the other hand if you ran under X11,
>>>things ran doggishly slow and the visual results were often less than perfect.
>>>This is because the _only_ way to talk to a Dragon is in high-level op-codes,
>>>and the model X11 lived in was incompatible with the one the Dragon used --
>>>so achieving one X11 operation often took several, or hundreds, of Dragon
>>>operations. Since the Dragon's speed came from its economy of scale, the
>>>speed was less than amazing.
>
>I don't know much about the dragon (is that the hardware made out of N
>vipers?), but the S3, Mach8, Mach32, and even the 8514/a (or whatever it is)
>have accel for short line segments which I think match up quite well with
>the MI code in DDX's use of "spans" (not 100% short lines have limited length,
>spans do not), so even when the exact graphics command X wants is not supported
>by the hardware, this is (and should be faster then just pushing bits onto
>a dumb buffer, except for really small spans).
>
>[...]
>>>So here comes S3. Is it the salvation to all the world's woes? That depends.
>>>Given VESA, one can access the VGA's "array" at memory speed (barring refresh
>>>stalls -- that whole thing isn't dual-ported, is it?). Is that enough? Or,
>>>if not, is it the S3 that gives one the extra performance and/or op-codes that
>>>make X11 sing? And, if that last is true, why isn't an S3 on EISA or even ISA
>>>"fast enough" ?
>
>I *think* (someone *please* correct me if I am wrong!) most of the numbers
>(even the 70k+ ones) were with ISA S3 cards (they may have been in a EISA
>system 'tho).

I have a 486/50MHz system (ISA and VESA Local Bus), and the 83k xstone
posting is for the Actix GraphicEngine32 (S3 86C801, 1MB DRAM, ISA card).

>
>[...now the Hasty-miester speekith...]
>>The image write, read and fill operations' performance was increased by
>>using vga banking.We experienced a 10x performance improvement when
>>we switched to vga banking. In the 8514/a architecture, all data transfer
>>between the cpu and co-processor is done via the data transfer register.
>>Also, we have to transfer the images a line at time inside a loop.
>>If there is one area in which the S3 architecture suffers this is it!
>>Ideally, I would like to see the chip do dma transfers from memory
>>to the card and have it calculate the offsets into its memory and
>>the logical converse - have the chip transfer a block of memory
>>to consecutive region in the hosts memory.
>
>How about XCopyPlane (in XOR mode)? I don't have a S3 card (yet), but thats
>the single most important thing for my application...

I will let you know how fast it is :-) And I would like to know what your
application is.

>
>[...]
>>The 801/805 and 928 architectures are capable of mapping their entire video
>>memory to the host's address space. Currently, we only map 64k bytes at a
>>time. This limitation is mostly imposed to us by the kernel!
>
>Can the video cards do this?

Yes, the 801/805 are capable of mapping up to 2MB of memory.

> I assume the problem w/ the kernal is allocating
>physicly contigous RAM?

Yes, this is the main problem.

> The best way to do this is add a new flag to the
>memmory allocator. The simplest way is to have the device probe allocate the
>VM you need during boot when most allocations will be contigous, confirm that
>is _is_ contigous and go on...

Thanks, we are looking into it right now...

>
>>Further performance improvements were achieved by compiling the server
>>with gcc-2.3.1. Some of the x11perf results were nearly twice as fast!
>>Overall performance improvement, using xbench, proved to be around %15.
>
>Did you remember to use -m 486 (to produce code that runs fast on the 486,
>but still runs on the 386), or just have it do 386 code?

Yes, we use the -m486 flag. In fact, this was one of the main motivations
for compiling the server with gcc-2.3.1. I am using gcc-2.3.2 on my machine
and will soon upgrade to gcc-2.3.3 :-)

>
>[...]
>>Slowly, the server is evolving from its pure 8514/a architecture to the
>>S3 architecture. The next major jump will be when 16 bit or 24 bit
>>color gets implemented :-)
>
>I thought the next big jump would be when you can map in 1+M of video memory
>and use it...

In terms of adding functionality which is not available today, I think
we should start working on 16/24 bit color. At any rate, this is my choice :-)

Most of the graphics operations don't address the video memory directly.
Image write/read/fill are the only X operations which access the video
memory directly.
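
To make the 64k mapping limit a bit more concrete, here is a rough sketch of
what a direct access has to do when only one 64k window is mapped. This is not
taken from the server source: the names pixel_to_bank and bytes_until_bank_end
are made up for illustration, and it assumes 8 bits per pixel and a single 64k
window. It only does the address arithmetic and leaves out the actual bank
select register write, which is chipset specific.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE 0x10000u  /* one 64k window, as discussed above */

/* Hypothetical helper type, not from the actual server source. */
struct bank_addr {
    uint32_t bank;    /* which 64k bank has to be selected */
    uint32_t offset;  /* byte offset inside the mapped window */
};

/* Translate a pixel position into (bank, offset); assumes 8 bits per pixel. */
static struct bank_addr pixel_to_bank(uint32_t x, uint32_t y, uint32_t pitch)
{
    struct bank_addr a;
    uint32_t linear;

    assert(pitch > 0);
    linear = y * pitch + x;        /* byte offset into video memory */
    a.bank = linear / BANK_SIZE;
    a.offset = linear % BANK_SIZE;
    return a;
}

/* How many bytes of a run of 'len' bytes fit before the window ends. */
static uint32_t bytes_until_bank_end(struct bank_addr a, uint32_t len)
{
    uint32_t room = BANK_SIZE - a.offset;
    return len < room ? len : room;
}
```

With a pitch of 1024 bytes, one bank covers 64 scanlines, so an image write
or fill has to switch banks every 64 lines and split any run that straddles a
bank boundary. Mapping the whole video memory would make this bookkeeping
unnecessary.
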

As an example of an operation that never touches video memory directly, here
is sample code which moves characters from the card's memory to the desired
location in the display:

    WaitQueue(7);   /* wait for room for the 7 register writes below */
    /* source: the glyph's cell in the off-screen font cache, 32 per row */
    outpw(CUR_X, (short)(ibm8514FC_X+(((int)chars[i])%32)*FC_MAX_WIDTH));
    outpw(CUR_Y, (short)(ibm8514FC_Y+(((int)chars[i])/32)*FC_MAX_HEIGHT));
    /* destination on the screen */
    outpw(DESTX_DIASTP, (short)(x + pci->metrics.leftSideBearing));
    outpw(DESTY_AXSTP, (short)(y - pci->metrics.ascent));
    /* width and height of the blit, each minus one */
    outpw(MAJ_AXIS_PCNT, (short)(GLYPHWIDTHPIXELS(pci)-1));
    outpw(MULTIFUNC_CNTL, MIN_AXIS_PCNT | (short)(GLYPHHEIGHTPIXELS(pci)-1));
    /* start the blit */
    outpw(CMD, CMD_BITBLT | INC_X | INC_Y | DRAW | PLANAR | WRTDATA);

>
>[...]
>>Next, is how does the S3 architecture fair agains other accelerated cards?
>>
>>The January issue of Byte magazine voted the Actix's GraphicEngine32 (801)
>>as one of the best overall graphic accelarated cards for window applications.
>>At least on Byte's tests the 801 was faster than the ATI Ultra Pro (mach 32).
>>And, I really doubt that the tests were executed at low clock frequencies.
>>However, the article did not state the dot clock frequency which the tests
>>were executed at. The other faster cards were based on the 34020 and cost
>>more than $1400.
>
>People have had the S3 for long enough to make good use of it, the Mach32 may
>be too new for good drivers to be available yet.

I am assuming that Byte used ATI's drivers in their benchmark.

> If people decide that the
>34020 cards don't need to emulate SVGA/EGA/CGA/Herc in hardware the price
>should drop by more then $1000, if they insist on doing that the price may
>drop by about $1000. This would be the best card for X, because the 34020
>is fully programable and can be made more X orientated then windows orientated
>Also, the 340xx has super great control over the display (size/shape/res/
>borders). The 34020 can even use the VRAM serial write regs...
>
>[...]
>>On the topic of local bus IDE cards:
>>
>>It takes about 6 and 50 seconds to recompile the kernel with gcc-1.39.
>>With an ISA IDE card, it takes about 7.5 minutes :-)
>>
>>How much does it cost? $89.
>
>What does "6 and 50 seconds" mean? Most IDE local bus cards mainly add lots
>of cache. We can do better by adding more RAM to the main system and using
>it wisely...

6 minutes and 50 seconds vs 7.5 minutes to compile the kernel.
Orchid claims an 8MB/sec data transfer rate, and I am not going to get into
a long philosophical argument here about what is a good benchmark for disk
controllers :-)

The VESA Local Bus IDE controller is a non-caching controller, and please
don't forget it costs $89!

>
>[...]
>--
>           stripes@pix.com              "Security for Unix is like
>      Josh_Osborne@Real_World,The          Multitasking for MS-DOS"
>      "The dyslexic porgramer"                  - Kevin Lockwood
>We all agree on the necessity of compromise. We just can't agree on
>when it's necessary to compromise.                     - Larry Wall

--
Amancio Hasty           | Home: (415) 495-3046 | ftp-site depository of all my work:
e-mail hasty@netcom.com | sunvis.rtpnc.epa.gov:/pub/386bsd/incoming