*BSD News Article 9323

Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA5576 ; Fri, 01 Jan 93 01:49:33 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!umn.edu!csus.edu!netcom.com!hasty
From: hasty@netcom.com (Amancio Hasty Jr)
Subject: Re: S3 question - Amancio, are you there?
Message-ID: <1992Dec27.081525.29228@netcom.com>
Organization: Netcom Online Communications Services (408-241-9760 login: guest)
References: <VIXIE.92Dec26034105@cognition.pa.dec.com>
Date: Sun, 27 Dec 1992 08:15:25 GMT
Lines: 287

In article <VIXIE.92Dec26034105@cognition.pa.dec.com> vixie@pa.dec.com (Paul A Vixie) writes:
>I could have addressed this directly to Amancio, but I am betting that a
>lot of other folks would like to know the answer.  I have been away from
>the PC UNIX world for a while now (years, really) but I am presently
>taking a look at different PC configurations for possible BSD/386 or 386BSD
>use.  I've already determined that Localbus ("VESA") is more cost effective
>than EISA (in terms of useful-bit-made-faster per dollar-spent) and that
>a VESA/ISA system is probably what I want unless the price difference of a
>VESA/EISA system is within epsilon of what I have in my new-computer fund.
>
>I see that the two greatest bit-bangers of the average computer are available
>as VESA cards: display, and disk.  I'm still formulating my disk controller
>questions and perhaps I'll ask them in a future post.  Right now I'm trying
>to solve the S3 mystery.
>
>At work I have a EISA/SVGA/34020 board.  It is very fast when run under
>Windows 3.1; however, Microsoft had access to the 34020 specs and I don't,
>so I can't figure out how to port the X server to it and no one in this
>newsgroup seems to have done that either.  It's too bad -- a 34020 with
>a minimal BITBLT interpreter downloaded into it would make for a lightning
>fast X11 server with the 34020 as almost a co-processor.  However, I'm
>fairly sure that the 34020's days are numbered given something called "S3"
>and the "GUI Accelerator" that seem to be taking the market by storm.
>
>I know that SVGA is more or less a hack on the IBM VGA spec to allow more
>pixels; what I don't know is what an "SVGA S3" is.  I have gathered from
>context in posts on this newsgroup that it is some kind of graphics
>accelerator chipset and that there are several different revisions of
>it and that different board manufacturers have had different results.
>Yet, VGA is fundamentally a frame buffer that has some hardware assist
>for certain operations.  Where does S3 fit in?  Is it another IO port, or
>just more opcodes to the existing VGA IO port?  Or just a faster implementation
>of the VGA spec?
>
>There are two reasons I need to know this.  First, if the VGA really is "just
>a frame buffer", then given a fast CPU and VESA it should be trivial to get
>the MIT CFB server running and have it run near the theoretical maximum
>(though at some potentially unnecessary cost in main CPU cycles).  If on
>the other hand VGA is like EGA in that you can only map certain parts into
>memory at a time and it's generally cheaper to send high-level commands and
>let the graphics hardware figure out how to achieve them, then I see a
>problem.
>
>What problem?  Well, DEC did this really neat thing called the "Dragon" chip
>set back on their MicroVAX II/GPX.  It was really really fast -- if you wrote
>your application in FORTRAN on VMS.  On the other hand if you ran under X11,
>things ran doggishly slow and the visual results were often less than perfect.
>This is because the _only_ way to talk to a Dragon is in high-level op-codes,
>and the model X11 lived in was incompatible with the one the Dragon used --
>so achieving one X11 operation often took several, or hundreds, of Dragon
>operations.  Since the Dragon's speed came from its economy of scale, the
>speed was less than amazing.
>
>That seems to be what kills EGA (and non-SVGA VGA) performance on PC's.  You
>can either send lots of not-exactly-what-you-wanted high level operations
>down the "wire" or you can write to memory over a very slow bus.  Either way
>things are very very slow.
>
>So here comes S3.  Is it the salvation to all the world's woes?  That depends.
>Given VESA, one can access the VGA's "array" at memory speed (barring refresh
>stalls -- that whole thing isn't dual-ported, is it?).  Is that enough?  Or,
>if not, is it the S3 that gives one the extra performance and/or op-codes that
>make X11 sing?  And, if that last is true, why isn't an S3 on EISA or even ISA
>"fast enough" ?
>
>I know that Amancio's numbers indicate that the problem _is solved_, one way
>or another.  But before I consider plunking money down to buy one of these
>boxes, I would very much like to know _how_ it was solved.  And, I would like
>to know the answer to the perennial question: "which VESA S3 card is fastest,
>and why?"
>
>Thanks in advance...
>--
>Paul Vixie, DEC Network Systems Lab	
>Palo Alto, California, USA         	"Don't be a rebel, or a conformist;
><vixie@pa.dec.com> decwrl!vixie		they're the same thing, anyway.  Find
><paul@vix.com>     vixie!paul		your own path, and stay on it."  -me

I chose the S3 chipset because:

o	it offers a relatively low cost

	Recently, S3 cards have been listed for less than $200.
	I expect the following price breakdown:
	911/924: around $170
	801 (ISA) / 805 (local bus or EISA): $200 and $250 respectively
	928: probably more than $300 but less than $350
	
o	it is a high-performance graphics engine

	on a 486/50 with 256k cache and 8MB:
		XS3 924 around 48k xstones
		XS3 801 around 83k xstones
		XS3 928 greater than 120k xstones.

o	there is a clear growth path in both functionality and
	performance (from day 1, when I first got my Diamond Stealth,
	I knew there was going to be an S3 928. And I no longer
	have a Diamond Stealth!)

o	the documentation is publicly available

The basic S3 architecture consists of an 8514/a core, a vga core, and memory
management. The 8514/a side of the S3 chipset is not fully compatible
with the 8514/a standard. However, the 8514/a instruction opcodes used in
Kevin Martin's X server are nearly identical to the 8514/a instructions
that the S3 chipset provides. The 8514/a side of XS3-0.1 differs from Kevin's
server in the way pixels are encoded for stipple image transfers. The
change to XS3 needed to incorporate that difference was minor, but it was
difficult at the time because we did not know what the difference was.

XS3 is initialized just like an svga server, with minor changes to support
the added features available in the S3 chipset. Most svga chipsets
provide their added functionality slightly differently, hence the
special initialization code for the S3.

Kevin's server was chosen because it is simple and a great match for the
S3 chipset. Part of the confusion in the early phase of searching for a
server for the S3 chipset is that S3 corporation does not really advertise
the high degree of 8514/a compatibility that the S3 chipset has. In fact,
when I first started I had no clue that S3 had such a degree of compatibility
with the 8514/a!

Much of the speed we see today with the S3 chipset comes from the hardware
graphics functions built into the chip. Line drawing is one example: the
server uses the Bresenham line-drawing algorithm implemented in the S3
chipset. Dashed lines, however, are currently implemented in software.
Rectangle fills are another example; they are done entirely in hardware.
Additionally, the S3 chipset has a command queue 8 commands deep in the
911/924 class and 16 commands deep in the 801/805.
Fortunately, the cost of setting up S3 graphics operations has not proven
to be a significant performance drawback. Obviously, the less set-up we
have to do the better off we are, but that engineering issue must be
weighed against how much it would cost to provide a minimal set-up
scheme for graphics operations.
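For readers who have not met it, the Bresenham algorithm that the chip runs in hardware can be sketched in software. The Python below is the textbook all-octant integer formulation, not the S3's actual line engine; the exact error-term and step values that the 8514/a-style line registers expect are not covered here and may differ in detail.

```python
def bresenham(x0, y0, x1, y1):
    """Return the pixels on the line from (x0, y0) to (x1, y1), inclusive.

    Textbook integer Bresenham, generalized to all octants. Each pixel costs
    only additions, subtractions and comparisons, which is what makes the
    algorithm cheap to put in a graphics chip. (Illustrative sketch; not the
    S3's register-level interface.)
    """
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # combined error term for both axes
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # error says: step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # error says: step along y
            err += dx
            y0 += sy
    return points
```

A line always produces max(|dx|, |dy|) + 1 pixels, one per step along the major axis; the chip does the same stepping without any host involvement once the line is set up.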

Text is fast because of a font cache that stores the fonts in the card's
memory. We blt each character from the card's memory to the location
where it is being drawn. This works the same way as in the 8514/a
server.
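The font cache idea can be modeled in a few lines. This is a hypothetical, simplified sketch: the class name, the layout (glyphs packed left to right in off-screen rows below the visible screen) and the one-pixel-per-list-element framebuffer are all assumptions for illustration, not the 8514/a server's actual data structures. The point it shows is that once a glyph is cached, drawing it costs one screen-to-screen blit instead of a host-to-card transfer.

```python
class FontCache:
    """Hypothetical model of a glyph cache held in off-screen video memory.

    The framebuffer is a list of rows of pixel values. Rows at or beyond
    `visible_height` are assumed to be off-screen and hold cached glyphs.
    """

    def __init__(self, width, visible_height, cache_rows):
        self.width = width
        self.visible_height = visible_height
        self.fb = [[0] * width for _ in range(visible_height + cache_rows)]
        self.slots = {}      # char -> (x, y, w, h) of its cached copy
        self.next_x = 0      # next free column in the cache area
        self.blits = 0       # screen-to-screen copies issued so far

    def _blit(self, sx, sy, dx, dy, w, h):
        # Stand-in for the chip's screen-to-screen BitBLT.
        self.blits += 1
        for row in range(h):
            self.fb[dy + row][dx:dx + w] = self.fb[sy + row][sx:sx + w]

    def cache_glyph(self, ch, bitmap):
        """Copy a glyph bitmap (list of rows) into the off-screen area once."""
        h, w = len(bitmap), len(bitmap[0])
        x, y = self.next_x, self.visible_height
        if x + w > self.width or h > len(self.fb) - y:
            raise RuntimeError("off-screen glyph cache is full")
        for row in range(h):
            self.fb[y + row][x:x + w] = bitmap[row]
        self.slots[ch] = (x, y, w, h)
        self.next_x += w

    def draw_text(self, text, x, y):
        """Draw text by blitting each cached glyph to (x, y), left to right."""
        for ch in text:
            sx, sy, w, h = self.slots[ch]
            self._blit(sx, sy, x, y, w, h)
            x += w
```

Usage: after `cache_glyph("A", ...)` and `cache_glyph("B", ...)`, drawing the string "AB" issues exactly two blits, and no glyph data crosses the bus again.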

The performance of image write, read, and fill operations was increased
by using vga banking: we saw a 10x improvement when we switched to vga
banking. In the 8514/a architecture, all data transfer between the cpu
and the co-processor goes through the data transfer register. Also,
images have to be transferred a line at a time inside a loop.
If there is one area in which the S3 architecture suffers, this is it!
Ideally, I would like to see the chip do DMA transfers from host memory
to the card and calculate the offsets into its own memory, and the
logical converse: have the chip transfer a block of card memory to a
consecutive region of the host's memory.
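The banking arithmetic behind this is simple: with a 64K window, a pixel's linear offset (y * pitch + x at 8 bits per pixel) splits into a bank number (offset >> 16) and an offset within the window (offset & 0xFFFF). The hypothetical sketch below simulates video memory with a bytearray and only counts bank switches; the real bank-select register writes are not shown. It uses an 800-pixel pitch because at 1024 pixels every bank holds exactly 64 whole lines, so no line ever straddles a bank boundary.

```python
WINDOW_SIZE = 0x10000  # 64K aperture into video memory


class BankedFramebuffer:
    """Simulated 8-bit-per-pixel framebuffer reachable only through a 64K window.

    Hypothetical model: `vram` stands in for the card's memory and `bank` is
    the currently selected 64K bank. A bank switch is the register write the
    server tries to minimize.
    """

    def __init__(self, pitch, height):
        self.pitch = pitch
        self.vram = bytearray(pitch * height)
        self.bank = 0
        self.bank_switches = 0

    def set_bank(self, bank):
        if bank != self.bank:
            self.bank = bank
            self.bank_switches += 1

    def window_write(self, offset, data):
        # Write through the window; offset is relative to the current bank.
        if offset + len(data) > WINDOW_SIZE:
            raise ValueError("write runs past the end of the 64K window")
        base = self.bank * WINDOW_SIZE
        self.vram[base + offset:base + offset + len(data)] = data

    def put_scanline(self, x, y, pixels):
        """Copy one row of pixels to (x, y), splitting it at bank boundaries."""
        linear = y * self.pitch + x
        if linear + len(pixels) > len(self.vram):
            raise ValueError("scanline runs past the end of video memory")
        pos = 0
        while pos < len(pixels):
            bank, off = divmod(linear + pos, WINDOW_SIZE)
            self.set_bank(bank)
            n = min(len(pixels) - pos, WINDOW_SIZE - off)  # bytes left in this bank
            self.window_write(off, pixels[pos:pos + n])
            pos += n
```

At an 800-pixel pitch, row 81 starts at offset 64800 and ends past 65536, so writing it takes one bank switch in the middle of the line; a card that maps its entire memory at once would avoid this bookkeeping altogether.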

The stipple operations were improved by using tighter logic and doing
16-bit transfers as opposed to 8-bit transfers via the data
transfer register.
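To see why 16-bit transfers help, count the writes to the data transfer register: packing 16 stipple pixels per write instead of 8 halves the number of I/O operations per row. The sketch below assumes the leftmost pixel goes in the most significant bit and that each row is zero-padded to a whole word; the bit order the S3 actually expects is not given here (and, as noted above, it differs from Kevin's server).

```python
def pack_stipple_row(bits, word_bits):
    """Pack a row of 0/1 stipple pixels into words of `word_bits` bits.

    Assumption: leftmost pixel in the most significant bit, last word
    zero-padded. Not necessarily the S3's native bit order.
    """
    words = []
    for start in range(0, len(bits), word_bits):
        chunk = bits[start:start + word_bits]
        chunk = chunk + [0] * (word_bits - len(chunk))  # pad the final word
        value = 0
        for b in chunk:
            value = (value << 1) | (b & 1)
        words.append(value)
    return words


def transfers_for_stipple(width, height, word_bits):
    """Writes to the data transfer register for a width x height stipple."""
    words_per_row = -(-width // word_bits)  # ceiling division
    return words_per_row * height
```

For a 32x32 stipple that is 128 writes with 8-bit transfers versus 64 with 16-bit transfers, and each write is a relatively slow bus cycle.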

The server now also enjoys a hardware-supported cursor. Some X applications,
such as acm (air combat simulator), run without the cursor ever flickering.
In fact, it is rare to see the cursor flicker at all. The 8514/a architecture
does not have hardware cursor support.
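A software cursor flickers because it is drawn into the framebuffer itself: the pixels underneath must be saved, and the cursor must be removed (those pixels restored) and redrawn whenever it moves or something is drawn beneath it. The hypothetical sketch below models that save-under scheme; a hardware cursor is overlaid by the display chip during scan-out and never touches the framebuffer, so none of this work is needed.

```python
class SoftwareCursor:
    """Hypothetical save-under software cursor over a list-of-rows framebuffer."""

    def __init__(self, fb, image):
        self.fb = fb
        self.image = image   # list of rows; None marks a transparent pixel
        self.saved = None    # (x, y, saved rows) while the cursor is shown

    def show(self, x, y):
        h, w = len(self.image), len(self.image[0])
        # Save the pixels the cursor is about to cover.
        self.saved = (x, y, [row[x:x + w] for row in self.fb[y:y + h]])
        for r in range(h):
            for c in range(w):
                if self.image[r][c] is not None:
                    self.fb[y + r][x + c] = self.image[r][c]

    def hide(self):
        if self.saved is None:
            return
        x, y, rows = self.saved
        for r, row in enumerate(rows):
            self.fb[y + r][x:x + len(row)] = row  # restore the saved pixels
        self.saved = None

    def move(self, x, y):
        # Between hide() and show() the cursor is missing from the screen;
        # if the display scans out during that gap, the user sees a flicker.
        self.hide()
        self.show(x, y)
```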

The 801/805 and 928 architectures are capable of mapping their entire video
memory to the host's address space. Currently, we only map 64k bytes at a
time. This limitation is mostly imposed on us by the kernel!


Further performance improvements were achieved by compiling the server
with gcc-2.3.1. Some of the x11perf results were nearly twice as fast!
The overall performance improvement, measured with xbench, was around 15%.

So far, we have benefited from faster S3 chipset implementations as well
as from more cpu power. For instance, on the following systems of mine:

o	486/33MHz 64k cache 8MB
	S3 911	46k xstones
	S3 801	64k xstones

o	486/50MHz 256k cache 8MB
	S3 801	83k xstones
	
Note:
	All benchmarks were run at 1024x768, 45MHz interlaced.
	It has been suggested on this newsgroup that the DRAM-based 801
	and 805 may lose performance at higher clock rates. However, I
	have not been able to test this hypothesis. So, if anyone out
	there is running XS3 with an 801 card and has a high-resolution
	monitor, I would appreciate it if you ran xbench at 1024x768
	45MHz interlaced and at 1024x768 72MHz. All I want on this issue
	are the numbers; there have been enough postings about it :-)
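The reasoning behind the DRAM hypothesis can be put into rough numbers. While a visible line is being scanned out, the display refresh reads about one pixel per dot clock from video memory; on a DRAM board such as the 801 that bandwidth is shared with drawing operations, whereas VRAM boards (911/924/928) feed the display through a separate serial port. The back-of-the-envelope sketch below assumes 8 bits per pixel and ignores blanking intervals and FIFO effects. It shows only the demand side, not how much bandwidth the 801's DRAM actually provides, which is exactly what the requested xbench numbers would reveal.

```python
def refresh_fetch_rate_mb_s(dot_clock_mhz, bits_per_pixel=8):
    """Approximate MB/s (10**6 bytes) the display refresh reads from video
    memory while a visible line is being scanned out.

    Back-of-the-envelope assumption: one pixel fetched per dot clock during
    active display; blanking and FIFO effects are ignored.
    """
    return dot_clock_mhz * bits_per_pixel / 8


for clock in (45, 72):
    print(f"{clock} MHz at 8 bpp: {refresh_fetch_rate_mb_s(clock):.0f} MB/s")
```

Going from 45MHz to 72MHz raises the refresh demand from about 45 MB/s to about 72 MB/s at 8 bpp, and it doubles again at 16 bpp.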

Slowly, the server is evolving from its pure 8514/a architecture to the
S3 architecture. The next major jump will be when 16-bit or 24-bit
color gets implemented :-)


Next, here are the different S3 chipsets:

o 86C911	VRAM-based card. This is the first model.
o 86C924	VRAM-based card. In essence the same as the 911.
o 86C801	DRAM-based card. Supports up to 2MB of memory.
		Maximum resolutions:
		1280x1024, 256 colors at 60MHz
		1024x768, 65k colors at 43.5MHz interlaced
		640x480, 16 million colors at 60MHz

o 86C928	VRAM-based card. Will support up to 4MB of memory.
		I don't have the functional specs for a card,
		but I do have the databook.
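A quick way to see how much of the 801's 2MB each of these modes uses is to compute the size of one visible screen: width x height x bits per pixel / 8. The sketch ignores any padding of the line pitch and assumes 24-bit color is stored packed at 3 bytes per pixel; if the chip stores it at 4 bytes per pixel instead, 640x480 would need about 1.17MB.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Video memory needed for one visible screen, ignoring pitch padding."""
    return width * height * bits_per_pixel // 8


modes = [
    (1280, 1024, 8),   # 256 colors
    (1024, 768, 16),   # 65k colors
    (640, 480, 24),    # 16 million colors, assuming packed 3 bytes/pixel
]
for w, h, bpp in modes:
    size = framebuffer_bytes(w, h, bpp)
    print(f"{w}x{h} @ {bpp} bpp: {size} bytes ({size / 2**20:.2f} MB)")
```

1280x1024 at 256 colors comes to 1.25MB and 1024x768 at 65k colors to 1.50MB, so both need more than 1MB of video memory; 640x480 with packed 24-bit pixels fits in about 0.88MB.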


On Local Bus S3 cards:

It is not clear at this point whether XS3 will benefit from a local bus
S3 implementation, because most of the graphics functions used by the
server are already implemented in the chip. I do expect image
read/write/fill operations to benefit greatly. I do have a VESA Local
Bus S3 805 card, but I am not done with it yet, and I am using it right
now. I will absolutely not release the card's name until I am done with
my work here!


Next, how does the S3 architecture fare against other accelerated cards?

The January issue of Byte magazine rated the Actix GraphicEngine32 (801)
as one of the best overall graphics accelerator cards for window applications.
At least in Byte's tests, the 801 was faster than the ATI Ultra Pro (Mach32).
I really doubt that the tests were run at low clock frequencies, although
the article did not state the dot clock at which they were run. The only
faster cards were based on the 34020 and cost more than $1400.


Dumb vga cards:

On a non-accelerated vga card, every pixel on the screen is manipulated
by the host computer one pixel at a time. If you run any kind of
computation in the background, performance will suffer drastically.
Chipsets such as the ET4000 benefit tremendously from a local bus or
EISA implementation. Currently, I don't have any xbench figures for the
ET4000 on a local bus system, and it would be nice if someone posted
their xbench results.
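To make the contrast concrete, here is a sketch of a rectangle fill done the unaccelerated way, with the host writing every pixel into a linearly mapped 8-bit framebuffer (simulated with a bytearray; banking is ignored for brevity). An accelerated chip instead takes a handful of register writes (position, size, color, command) and fills the rectangle itself, which is why a dumb card depends so heavily on bus speed.

```python
def software_fill(fb, pitch, x, y, w, h, color):
    """Fill a rectangle on an unaccelerated 8-bpp framebuffer.

    `fb` is a bytearray standing in for linearly mapped video memory. The
    host is responsible for every byte, so the work grows with w * h and all
    of it crosses the expansion bus. Returns the number of bytes written.
    """
    written = 0
    for row in range(y, y + h):
        start = row * pitch + x
        fb[start:start + w] = bytes([color]) * w  # one row of pixels
        written += w
    return written
```

Clearing a full 1024x768 screen this way means 786,432 bytes pushed by the cpu, compared with a few register writes on an accelerated card.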


On the topic of local bus IDE cards:

It takes about 6 minutes and 50 seconds to recompile the kernel with gcc-1.39.
With an ISA IDE card, it takes about 7.5 minutes :-)

How much does it cost? $89.


My current hardware configuration:

Orchid SuperBoard 486/50Mhz 256k cache 8MB

VGA cards I have used:

	Actix GraphicEngine(S3 801)
	Orchid F1280 (S3 911)
	xx brand Local Bus (S3 805)
	Diamond SpeedStar (ET4000)


Orchid Local Bus IDE controller

14-inch svga monitor, max resolution 1024x768 45MHz interlaced


216 MB Western Digital (All for X)
120 MB Western Digital 


5 1/4 inch floppy drive
3 1/2 inch floppy drive

Colorado Tape Backup system 

Hope this helps,
Amancio Hasty




-- 
Amancio Hasty           |  
Home: (415) 495-3046    |  ftp-site depository of all my work:
e-mail hasty@netcom.com	|  sunvis.rtpnc.epa.gov:/pub/386bsd/incoming