Received: by minnie.vk1xwt.ampr.org with NNTP id AA5576 ; Fri, 01 Jan 93 01:49:33 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!umn.edu!csus.edu!netcom.com!hasty
From: hasty@netcom.com (Amancio Hasty Jr)
Subject: Re: S3 question - Amancio, are you there?
Message-ID: <1992Dec27.081525.29228@netcom.com>
Organization: Netcom Online Communications Services (408-241-9760 login: guest)
References: <VIXIE.92Dec26034105@cognition.pa.dec.com>
Date: Sun, 27 Dec 1992 08:15:25 GMT
Lines: 287

In article <VIXIE.92Dec26034105@cognition.pa.dec.com> vixie@pa.dec.com (Paul A Vixie) writes:
>I could have addressed this directly to Amancio, but I am betting that a
>lot of other folks would like to know the answer. I have been away from
>the PC UNIX world for a while now (years, really) but I am presently
>taking a look at different PC configurations for possible BSD/386 or 386BSD
>use. I've already determined that Localbus ("VESA") is more cost effective
>than EISA (in terms of useful-bit-made-faster per dollar-spent) and that
>a VESA/ISA system is probably what I want unless the price difference of a
>VESA/EISA system is within epsilon of what I have in my new-computer fund.
>
>I see that the two greatest bit-bangers of the average computer are available
>as VESA cards: display, and disk. I'm still formulating my disk controller
>questions and perhaps I'll ask them in a future post. Right now I'm trying
>to solve the S3 mystery.
>
>At work I have a EISA/SVGA/34020 board. It is very fast when run under
>Windows 3.1; however, Microsoft had access to the 34020 specs and I don't,
>so I can't figure out how to port the X server to it and noone in this
>newsgroup seems to have done that either. It's too bad -- a 34020 with
>a minimal BITBLT interpreter downloaded into it would make for a lightening
>fast X11 server with the 34020 as almost a co-processor.
>However, I'm fairly sure that the 34020's days are numbered given something
>called "S3" and the "GUI Accelerator" that seem to be taking the market by
>storm.
>
>I know that SVGA is more or less a hack on the IBM VGA spec to allow more
>pixels; what I don't know is what an "SVGA S3" is. I have gathered from
>context in posts on this newsgroup that it is some kind of graphics
>accelerator chipset and that there are several different revisions of
>it and that different board manufacturers have had different results.
>Yet, VGA is fundamentally a frame buffer that has some hardware assist
>for certain operations. Where does S3 fit in? Is it another IO port, or
>just more opcodes to the existing VGA IO port? Or just a faster implementation
>of the VGA spec?
>
>There are two reasons I need to know this. First, if the VGA really is "just
>a frame buffer", then given a fast CPU and VESA it should be trivial to get
>the MIT CFB server running and have it run near the theoretical maximum
>(though at some potentially unneccessary cost in main CPU cycles). If on
>the other hand VGA is like EGA in that you can only map certain parts into
>memory at a time and it's generally cheaper to send high-level commands and
>let the graphics hardware figure out how to achieve them, then I see a
>problem.
>
>What problem? Well, DEC did this really neat thing called the "Dragon" chip
>set back on their MicroVAX II/GPX. It was really really fast -- if you wrote
>your application in FORTRAN on VMS. On the other hand if you ran under X11,
>things ran doggishly slow and the visual results were often less than perfect.
>This is because the _only_ way to talk to a Dragon is in high-level op-codes,
>and the model X11 lived in was incompatible with the one the Dragon used --
>so achieving one X11 operation often took several, or hundreds, of Dragon
>operations. Since the Dragon's speed came from its economy of scale, the
>speed was less than amazing.
>
>That seems to be what kills EGA (and non-SVGA VGA) performance on PC's. You
>can either send lots of not-exactly-what-you-wanted high level operations
>down the "wire" or you can write to memory over a very slow bus. Either way
>things are very very slow.
>
>So here comes S3. Is it the salvation to all the world's woes? That depends.
>Given VESA, one can access the VGA's "array" at memory speed (barring refresh
>stalls -- that whole thing isn't dual-ported, is it?). Is that enough? Or,
>if not, is it the S3 that gives one the extra performance and/or op-codes that
>make X11 sing? And, if that last is true, why isn't an S3 on EISA or even ISA
>"fast enough" ?
>
>I know that Amancio's numbers indicate that the problem _is solved_, one way
>or another. But before I consider plunking money down to buy one of these
>boxes, I would very much like to know _how_ it was solved. And, I would like
>to know the answer to the perennial question: "which VESA S3 card is fastest,
>and why?"
>
>Thanks in advance...
>--
>Paul Vixie, DEC Network Systems Lab
>Palo Alto, California, USA      "Don't be a rebel, or a conformist;
><vixie@pa.dec.com> decwrl!vixie they're the same thing, anyway. Find
><paul@vix.com> vixie!paul       your own path, and stay on it." -me

I chose the S3 chipset because:

o it offers a relatively low cost
  Recently, S3 cards have been listed for less than $200.
  I expect the following price breakdown:
     911/924 to cost around $170
     801 (ISA) / 805 (local bus or EISA) $200 and $250.
     928 (??) to be more than $300 but less than $350.

o it is a high performance graphics engine
  on a 486/50 256k cache with 8MB:
     XS3 924 around 48k xstones
     XS3 801 around 83k xstones
     XS3 928 greater than 120k xstones.

o there is a clear path for functional as well as performance growth
  (I knew from day 1, when I first got my Diamond Stealth, that there
  was going to be an S3 928. And, I no longer have a Diamond Stealth!)

o the documentation is publicly available

The S3 basic architecture consists of an 8514/a core, a vga core, and
memory management. The 8514/a side of the S3 chipset is not fully
compatible with the 8514/a standard. However, the 8514/a instruction
opcodes used in Kevin Martin's X server are nearly identical to the
8514/a instructions provided by the S3 chipset. The 8514/a side of
XS3-0.1 differs from Kevin's server in the way that pixels are encoded
for stipple image transfers. The change to XS3 to incorporate the
difference was minor, but difficult at the time because we did not know
the difference.

The initialization of XS3 is done just like an svga, with minor changes
to incorporate the added features available in the S3 chipset. Most svga
chipsets provide their added functionality slightly differently, hence
the special initialization code for S3.

Kevin's server was chosen because of its simplicity and because it is a
great match for the S3 chipset. Part of the confusion in the early phase
of searching for a server for the S3 chipset is that S3 corporation does
not really advertise the high degree of 8514/a compatibility that the S3
chipset has. In fact, when I first started I had no clue that S3 had such
a degree of compatibility with the 8514/a!

Much of the speed that we see today with the S3 chipset is due to the
built-in hardware graphics functions provided by the chipset. One example
is line drawing: the server uses the Bresenham line drawing algorithm
implemented in the S3 chipset. However, dashed lines are currently
implemented in software. Another example is rectangle fills, which are
all done in hardware. Additionally, the S3 chipset has a command queue
that is up to 8 commands deep in the 911/924 class and 16 commands deep
in the 801/805. Fortunately, the cost of setting up the S3 graphic
operations has not proven to be a great performance drawback.
Obviously, the less that we have to do the better off we are, but this
engineering issue must be taken in the context of how much it will cost
to provide a minimal graphic set-up operation scheme.

The fast text speed is due to a font cache which stores the fonts in the
card's memory. We blt the characters from the card's memory to the
location where we are writing the character. This functionality is the
same as in the 8514/a server.

The performance of the image write, read and fill operations was
increased by using vga banking. We experienced a 10x performance
improvement when we switched to vga banking. In the 8514/a architecture,
all data transfer between the cpu and the co-processor is done via the
data transfer register. Also, we have to transfer the images a line at a
time inside a loop. If there is one area in which the S3 architecture
suffers, this is it! Ideally, I would like to see the chip do dma
transfers from memory to the card and have it calculate the offsets into
its memory, and the logical converse: have the chip transfer a block of
its memory to a consecutive region in the host's memory.

The stipple operations were improved by using tighter logic and doing
16 bit transfers as opposed to 8 bit data transfers via the data transfer
register.

The server also now enjoys a hardware supported cursor. Some X
applications such as acm (air combat simulator) run without the cursor
ever flickering. In effect, it is rare to see the cursor flicker, period.
The 8514/a architecture does not have hardware cursor support.

The 801/805 and 928 architectures are capable of mapping their entire
video memory into the host's address space. Currently, we only map 64k
bytes at a time. This limitation is mostly imposed on us by the kernel!

Further performance improvements were achieved by compiling the server
with gcc-2.3.1. Some of the x11perf results were nearly twice as fast!
The overall performance improvement, using xbench, proved to be around
15%.
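
To make the 64k mapping limit concrete, here is a minimal sketch of the
arithmetic involved in banked access. It is not taken from the XS3
source. It assumes a linear frame buffer at 1024x768 with 8 bits per
pixel (so 1024 bytes per scan line), and the helper names (locate,
banks_needed, split_by_bank) are made up for illustration. How a bank is
actually selected on the card is not shown.

```python
WINDOW_SIZE = 64 * 1024   # size of the mapped window (64k bytes at a time)
PITCH = 1024              # assumed: bytes per scan line (1024 pixels, 8 bpp)
HEIGHT = 768              # assumed: number of scan lines


def locate(x, y, pitch=PITCH, window=WINDOW_SIZE):
    """Return (bank, offset) of pixel (x, y). Hypothetical helper."""
    return divmod(y * pitch + x, window)


def banks_needed(pitch=PITCH, height=HEIGHT, window=WINDOW_SIZE):
    """Number of windows needed to cover the whole screen (rounded up)."""
    return -(-(pitch * height) // window)


def split_by_bank(addr, length, window=WINDOW_SIZE):
    """Split a linear write of `length` bytes starting at `addr` into
    (bank, offset, count) chunks that never cross a window boundary."""
    chunks = []
    while length > 0:
        bank, offset = divmod(addr, window)
        count = min(length, window - offset)
        chunks.append((bank, offset, count))
        addr += count
        length -= count
    return chunks


if __name__ == "__main__":
    print("scan lines per bank:", WINDOW_SIZE // PITCH)
    print("banks for the full screen:", banks_needed())
    print("pixel (0, 64) lives at:", locate(0, 64))
    print("8-byte write across a boundary:", split_by_bank(65532, 8))
```

Under these assumptions a scan line never straddles two banks, so a
line-at-a-time image transfer only has to switch banks once every 64
lines. With a pitch that does not divide 64k evenly, split_by_bank shows
where a write would have to be broken up.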

So far, we have been able to benefit from faster S3 chipset
implementations as well as more cpu power. For instance, on my systems:

o 486/33MHz 64k cache 8MB
     S3 911   46k xstones
     S3 801   64k xstones

o 486/50MHz 256k cache 8MB
     S3 801   83k xstones

Note: All benchmarks were executed at 1024x768 45MHz interlace.

In the case of the 801 and 805 DRAM based architectures, at a higher
clock rate you might experience performance degradation. However, I have
not been able to test this hypothesis put forth on this newsgroup. So,
if anyone out there is running XS3 with an 801 card and has a high
resolution monitor, I would appreciate it if you ran xbench at 1024x768
45MHz interlace and at 1024x768 72MHz. All I want with respect to this
issue are the numbers; there have been enough postings with respect to
this issue :-)

Slowly, the server is evolving from its pure 8514/a architecture to the
S3 architecture. The next major jump will be when 16 bit or 24 bit color
gets implemented :-)

Next, the different S3 chipsets:

o 8C911  VRAM based card. This is the first model.

o 8C924  VRAM based card. In essence it is the same as the 911.

o 8C801  DRAM based card. Supports up to 2MB of memory.
         Max resolution is:
            1280x1024  256 colors         at 60 Hz
            1024x768   65k colors         at 43.5 Hz interlaced
            640x480    16 million colors  at 60 Hz

o 8C928  VRAM based card. Will support up to 4MB of memory.
         I don't have the functional specs for a card, but I do have
         the databook.

On Local Bus S3 cards:

It is not clear at this point whether XS3 will benefit from a local bus
S3 implementation. The reason is that most of the graphic functions used
by the server are already implemented by the chip. I do expect image
read/write/fill operations to benefit greatly. I do have a VESA Local
Bus S3 805 card, but I am not done with it yet. And, I am using it right
now. I will absolutely not release the card's name till I am done with
my work here!

Next, how does the S3 architecture fare against other accelerated cards?

The January issue of Byte magazine voted the Actix's GraphicEngine32
(801) as one of the best overall graphic accelerated cards for window
applications. At least on Byte's tests the 801 was faster than the ATI
Ultra Pro (mach 32). And, I really doubt that the tests were executed at
low clock frequencies. However, the article did not state the dot clock
frequency at which the tests were executed. The other faster cards were
based on the 34020 and cost more than $1400.

Dumb vga cards:

On a non-accelerated vga card, all pixels on the screen are manipulated
by the host computer a pixel at a time. If you do any kind of computation
in the background, the performance will suffer drastically. Chipsets such
as the et4000 benefit tremendously when going to a local bus or EISA
implementation. Currently, I don't have any xbench figures for the
ET4000 on a local bus system, and it would be nice if someone posted
their xbench results.

On the topic of local bus IDE cards:

It takes about 6 minutes and 50 seconds to recompile the kernel with
gcc-1.39. With an ISA IDE card, it takes about 7.5 minutes :-)
How much does it cost? $89.

My current hardware configuration:

   Orchid SuperBoard 486/50MHz 256k cache 8MB
   Vga cards which I used:
      Actix GraphicEngine (S3 801)
      Orchid F1280 (S3 911)
      xx brand Local Bus (S3 805)
      Diamond SpeedStar (ET4000)
   Orchid Local Bus IDE controller
   14 inch svga monitor, max resolution 1024x768 45MHz interlace
   216 MB Western Digital (All for X)
   120 MB Western Digital
   5 1/4 inch floppy drive
   3 1/2 inch floppy drive
   Colorado Tape Backup system

Hope this helps,
Amancio Hasty
--
Amancio Hasty            |
Home: (415) 495-3046     |  ftp-site depository of all my work:
e-mail hasty@netcom.com  |  sunvis.rtpnc.epa.gov:/pub/386bsd/incoming