*BSD News Article 9347

Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA5613 ; Fri, 01 Jan 93 01:50:52 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!umn.edu!csus.edu!netcom.com!hasty
From: hasty@netcom.com (Amancio Hasty Jr)
Subject: Re: S3 question - Amancio, are you there?
Message-ID: <1992Dec28.054342.13142@netcom.com>
Organization: Netcom Online Communications Services (408-241-9760 login: guest)
References: <VIXIE.92Dec26034105@cognition.pa.dec.com> <1992Dec27.081525.29228@netcom.com> <Bzy9wD.9Ez@pix.com>
Date: Mon, 28 Dec 1992 05:43:42 GMT
Lines: 237

In article <Bzy9wD.9Ez@pix.com> stripes@pix.com (Josh Osborne) writes:
>In article <1992Dec27.081525.29228@netcom.com> hasty@netcom.com (Amancio Hasty Jr) writes:
>>In article <VIXIE.92Dec26034105@cognition.pa.dec.com> vixie@pa.dec.com (Paul A Vixie) writes:
>[...]
>>>I see that the two greatest bit-bangers of the average computer are available
>>>as VESA cards: display, and disk.  I'm still formulating my disk controller
>>>questions and perhaps I'll ask them in a future post.  Right now I'm trying
>>>to solve the S3 mystery.
>
>One problem with VESA LB and disk drives: (I think) VESA LB doesn't allow
>bus-mastering cards.  For SCSI (at least) bus mastering could be quite useful.
>Of course, with current-tech disk drives you need 3 fast disks running at once
>to use all of the ISA bus.  Or you need (say, IDE) controllers with cache on
>them, but it would be better to have an auto-sizing disk cache in main memory
>(like SunOS or Linux), because (a) it would be faster, (b) it would be usable
>as core if that's more useful than disk cache, and (c) you would know whether
>it has been flushed to disk or not.
>
>>>At work I have a EISA/SVGA/34020 board.  It is very fast when run under
>>>Windows 3.1; however, Microsoft had access to the 34020 specs and I don't,
>>>so I can't figure out how to port the X server to it and no one in this
>>>newsgroup seems to have done that either.  It's too bad -- a 34020 with
>>>a minimal BITBLT interpreter downloaded into it would make for a lightning
>>>fast X11 server with the 34020 as almost a co-processor.  However, I'm
>>>fairly sure that the 34020's days are numbered given something called "S3"
>>>and the "GUI Accelerator" that seem to be taking the market by storm.
>
>The 34020 docs are available from TI; I have a set somewhere.  The cross
>compiler is quite expensive, and the old version makes poor code.  Someone
>got an old gcc to work (more or less) with it.  The 34020 is fairly quick;
>I would like to see a 34020 running X on it :-)  (I know it would be faster
>to do most of the X stuff on the [34]86 and let the TI bang bits).
>
>The GUI accels are doing better than the 34020 cards because they are cheap;
>however, I think you can build a 34020 card as cheaply as an S3, but nobody has.
>
>>>I know that SVGA is more or less a hack on the IBM VGA spec to allow more
>>>pixels; what I don't know is what an "SVGA S3" is.  I have gathered from
>>>context in posts on this newsgroup that it is some kind of graphics
>>>accelerator chipset and that there are several different revisions of
>>>it and that different board manufacturers have had different results.
>>>Yet, VGA is fundamentally a frame buffer that has some hardware assist
>>>for certain operations.  Where does S3 fit in?  Is it another IO port, or
>>>just more opcodes to the existing VGA IO port?  Or just a faster implementation
>>>of the VGA spec?
>
>This is answered well below, but I thought I would point out that:
> * VGA only allows 64K of the video memory to be mapped into the PC's address
> space at once.
> * Most SVGAs allow 128K at once, normally as 2 64K windows.
> * Some more useful, but more disgusting, ways of viewing video memory are also
> available.
> * A small number of SVGA chipsets can map all of the video memory into the
> PC, but I don't know if the video cards can do it.  The 386BSD kernel will
> need to be whacked to make it work anyway.
> * The S3 adds a bunch of I/O addresses on top of a normal-looking SVGA chipset.
>
>>>There are two reasons I need to know this.  First, if the VGA really is "just
>>>a frame buffer", then given a fast CPU and VESA it should be trivial to get
>>>the MIT CFB server running and have it run near the theoretical maximum
>>>(though at some potentially unnecessary cost in main CPU cycles).  If on
>>>the other hand VGA is like EGA in that you can only map certain parts into
>>>memory at a time and it's generally cheaper to send high-level commands and
>>>let the graphics hardware figure out how to achieve them, then I see a
>>>problem.
>
>In general you can only map part of the video memory at a time.
>
>>>What problem?  Well, DEC did this really neat thing called the "Dragon" chip
>>>set back on their MicroVAX II/GPX.  It was really really fast -- if you wrote
>>>your application in FORTRAN on VMS.  On the other hand if you ran under X11,
>>>things ran doggishly slow and the visual results were often less than perfect.
>>>This is because the _only_ way to talk to a Dragon is in high-level op-codes,
>>>and the model X11 lived in was incompatible with the one the Dragon used --
>>>so achieving one X11 operation often took several, or hundreds, of Dragon
>>>operations.  Since the Dragon's speed came from its economy of scale, the
>>>speed was less than amazing.
>
>I don't know much about the Dragon (is that the hardware made out of N
>vipers?), but the S3, Mach8, Mach32, and even the 8514/a (or whatever it is)
>have acceleration for short line segments, which I think matches up quite well
>with the use of "spans" in the DDX MI code (not 100%: short lines have limited
>length, spans do not), so even when the exact graphics command X wants is not
>supported by the hardware, this is (and should be) faster than just pushing
>bits onto a dumb buffer, except for really small spans.
>
>[...]
>>>So here comes S3.  Is it the salvation to all the world's woes?  That depends.
>>>Given VESA, one can access the VGA's "array" at memory speed (barring refresh
>>>stalls -- that whole thing isn't dual-ported, is it?).  Is that enough?  Or,
>>>if not, is it the S3 that gives one the extra performance and/or op-codes that
>>>make X11 sing?  And, if that last is true, why isn't an S3 on EISA or even ISA
>>>"fast enough" ?
>
>I *think* (someone *please* correct me if I am wrong!) most of the numbers
>(even the 70k+ ones) were with ISA S3 cards (they may have been in an EISA
>system 'tho).

I have a 486/50MHz system (ISA and VESA Local Bus), and the 83k xstone
posting is for the Actix GraphicEngine32 (S3 8C801, 1MB DRAM, ISA card).

>
>[...now the Hasty-miester speekith...]
>>The image write, read and fill operations' performance was increased by
>>using VGA banking. We experienced a 10x performance improvement when
>>we switched to VGA banking. In the 8514/a architecture, all data transfer
>>between the CPU and co-processor is done via the data transfer register.
>>Also, we have to transfer the images a line at a time inside a loop.
>>If there is one area in which the S3 architecture suffers, this is it!
>>Ideally, I would like to see the chip do DMA transfers from memory
>>to the card and have it calculate the offsets into its memory, and
>>the logical converse: have the chip transfer a block of memory
>>to a consecutive region in the host's memory.
>
>How about XCopyPlane (in XOR mode)?  I don't have an S3 card (yet), but that's
>the single most important thing for my application...
I will let you know how fast it is :-) Also, I would like to know what your
application is.

>
>[...]
>>The 801/805 and 928 architectures are capable of mapping their entire video
>>memory to the host's address space. Currently, we only map 64k bytes at a
>>time. This limitation is mostly imposed on us by the kernel!
>
>Can the video cards do this?
Yes, the 801/805 are capable of mapping up to 2MB of memory.
>  I assume the problem w/ the kernel is allocating
>physically contiguous RAM?

Yes, this is the main problem.
>  The best way to do this is to add a new flag to the
>memory allocator.  The simplest way is to have the device probe allocate the
>VM you need during boot, when most allocations will be contiguous, confirm that
>it _is_ contiguous, and go on...
Thanks, we are looking into it right now...
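
To make the "confirm that it is contiguous" step concrete, here is a rough
sketch in C. It is not kernel code: it assumes the probe routine has already
collected the physical address of each page it allocated (how the kernel
translates virtual to physical is left out), and both pages_contiguous() and
PAGE_BYTES are names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* i386 page size; the name is hypothetical */

/* Return 1 if the n physical page addresses in phys[] form a single
 * ascending, gap-free run, and 0 otherwise.  An empty or one-page
 * list is trivially contiguous. */
static int pages_contiguous(const uint32_t *phys, size_t n)
{
    size_t i;

    assert(phys != NULL || n == 0);
    for (i = 1; i < n; i++) {
        if (phys[i] != phys[i - 1] + PAGE_BYTES)
            return 0;      /* gap or out-of-order page found */
    }
    return 1;
}
```

A probe routine would run this check once at boot and fall back to the 64k
window if it fails.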

>
>>Further performance improvements were achieved by compiling the server
>>with gcc-2.3.1. Some of the x11perf results were nearly twice as fast!
>>Overall performance improvement, using xbench, proved to be around 15%.
>
>Did you remember to use -m486 (to produce code that runs fast on the 486,
>but still runs on the 386), or just have it do 386 code?
Yes, we use the -m486 flag. In fact, this was one of the main motivations
for compiling the server with gcc-2.3.1. I am using gcc-2.3.2 on my machine
and will soon upgrade to gcc-2.3.3 :-)

>
>[...]
>>Slowly, the server is evolving from its pure 8514/a architecture to the
>>S3 architecture. The next major jump will be when 16 bit or 24 bit
>>color gets implemented :-)
>
>I thought the next big jump would be when you can map in 1+M of video memory
>and use it...
In terms of adding functionality which is not available today, I think we
should start working on 16/24 bit colors. At any rate, this is my choice :-)

Most of the graphics operations don't address the video memory directly.
Image write/read/fill are the only X operations which access the video memory
directly.
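
For those operations, a banked copy looks roughly like the sketch below. This
is an illustration under assumptions, not the server's code: the card is
simulated with an array, a 1MB card is assumed, and set_bank(), window() and
banked_write() are hypothetical names (on real hardware set_bank() would be a
register write and the window would sit at a fixed address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 65536u            /* the 64k window we map today */
#define VRAM_SIZE (1024u * 1024u)   /* assumed 1MB card */

static uint8_t vram[VRAM_SIZE];     /* stand-in for the card's memory */
static unsigned current_bank;       /* stand-in for the bank register */

static void set_bank(unsigned bank) /* hypothetical helper */
{
    assert(bank < VRAM_SIZE / BANK_SIZE);
    current_bank = bank;
}

/* Start of the 64k window for whichever bank is selected. */
static uint8_t *window(void)
{
    return vram + (size_t)current_bank * BANK_SIZE;
}

/* Copy len bytes to linear video offset dst, switching banks each
 * time the copy crosses a 64k boundary. */
static void banked_write(uint32_t dst, const uint8_t *src, size_t len)
{
    assert((size_t)dst + len <= VRAM_SIZE);
    while (len > 0) {
        uint32_t off = dst % BANK_SIZE;     /* offset inside the window */
        size_t chunk = BANK_SIZE - off;     /* room left in this bank */

        if (chunk > len)
            chunk = len;
        set_bank(dst / BANK_SIZE);
        memcpy(window() + off, src, chunk);
        dst += (uint32_t)chunk;
        src += chunk;
        len -= chunk;
    }
}
```

With the whole video memory mapped, the loop and the bank switch would
disappear and the copy would be a single memcpy().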

For instance, here is some sample code which copies a character from the card's
memory to the desired location on the display:
                /* Wait for room for the 7 register writes below. */
                WaitQueue(7);
                /* Source: the glyph's cell in the font cache, 32 per row. */
                outpw(CUR_X, (short)(ibm8514FC_X +
                      (((int)chars[i]) % 32) * FC_MAX_WIDTH));
                outpw(CUR_Y, (short)(ibm8514FC_Y +
                      (((int)chars[i]) / 32) * FC_MAX_HEIGHT));
                /* Destination: where the glyph lands on the screen. */
                outpw(DESTX_DIASTP, (short)(x + pci->metrics.leftSideBearing));
                outpw(DESTY_AXSTP, (short)(y - pci->metrics.ascent));
                /* Blit width and height, each minus one. */
                outpw(MAJ_AXIS_PCNT, (short)(GLYPHWIDTHPIXELS(pci) - 1));
                outpw(MULTIFUNC_CNTL, MIN_AXIS_PCNT |
                      (short)(GLYPHHEIGHTPIXELS(pci) - 1));
                outpw(CMD, CMD_BITBLT | INC_X | INC_Y | DRAW | PLANAR | WRTDATA);
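
The CUR_X/CUR_Y arithmetic above is just a grid lookup: the font cache holds 32
glyph cells per row. A stand-alone sketch of that calculation follows. The
struct and function names are invented for illustration, and the base and cell
size parameters stand in for ibm8514FC_X/Y and FC_MAX_WIDTH/HEIGHT, whose real
values are not given here.

```c
#include <assert.h>

#define GLYPHS_PER_ROW 32   /* from the "% 32" and "/ 32" in the sample */

struct cache_pos {
    int x;
    int y;
};

/* Top-left corner of glyph `ch` in a font cache laid out as a grid of
 * GLYPHS_PER_ROW cells per row, each cell cell_w by cell_h pixels,
 * starting at (base_x, base_y) in the card's memory. */
static struct cache_pos glyph_cache_pos(int base_x, int base_y,
                                        unsigned char ch,
                                        int cell_w, int cell_h)
{
    struct cache_pos p;

    assert(cell_w > 0 && cell_h > 0);
    p.x = base_x + (ch % GLYPHS_PER_ROW) * cell_w;  /* column in the grid */
    p.y = base_y + (ch / GLYPHS_PER_ROW) * cell_h;  /* row in the grid */
    return p;
}
```

For example, with 16x32 cells and the cache starting at (0, 768), the ASCII
character 'A' (code 65) sits in column 1 of row 2.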


>[...]
>>Next: how does the S3 architecture fare against other accelerated cards?
>>
>>The January issue of Byte magazine named Actix's GraphicEngine32 (801)
>>one of the best overall accelerated graphics cards for window applications.
>>At least on Byte's tests the 801 was faster than the ATI Ultra Pro (mach 32).
>>And, I really doubt that the tests were executed at low clock frequencies.
>>However, the article did not state the dot clock frequency which the tests 
>>were executed at.  The other faster cards were based on the 34020 and cost
>>more than $1400.
>
>People have had the S3 for long enough to make good use of it; the Mach32 may
>be too new for good drivers to be available yet.

I am assuming that Byte used ATI's drivers in their benchmark.

> If people decide that the
>34020 cards don't need to emulate SVGA/EGA/CGA/Herc in hardware the price
>should drop by more than $1000; if they insist on doing that the price may
>drop by about $1000.  This would be the best card for X, because the 34020
>is fully programmable and can be made more X-oriented than Windows-oriented.
>Also, the 340xx has super great control over the display (size/shape/res/
>borders).  The 34020 can even use the VRAM serial write regs...
>
>[...]
>>On the topic of local bus IDE cards:
>>
>>It takes about 6 and 50 seconds to recompile the kernel with  gcc-1.39.
>>With an ISA IDE card, it takes about 7.5 minutes :-)
>>
>>How much does it cost? $89.
>
>What does "6 and 50 seconds" mean?  Most IDE local bus cards mainly add lots
>of cache.  We can do better by adding more RAM to the main system and using
>it wisely...
6 minutes and 50 seconds vs. 7.5 minutes to compile the kernel.
Orchid claims an 8MB data transfer rate, and I am not going to get into
a long philosophical argument here about what makes a good benchmark
for disk controllers :-)
The VESA Local Bus IDE controller is a non-caching controller, and please
don't forget it costs $89!
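
To put the two compile times on the same scale: 6 minutes 50 seconds is 410
seconds and 7.5 minutes is 450 seconds, a saving of 40 seconds, or a little
under 9%. The helper below (percent_saved() is a name made up for this note)
does that arithmetic; it says nothing about whether a kernel build is a good
disk benchmark.

```c
#include <assert.h>

/* Percentage of wall-clock time saved when a job that took old_s
 * seconds now takes new_s seconds. */
static double percent_saved(double old_s, double new_s)
{
    assert(old_s > 0.0);
    return 100.0 * (old_s - new_s) / old_s;
}
```

Called as percent_saved(450.0, 410.0), it gives 100 * 40 / 450.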
>
>[...]
>-- 
>           stripes@pix.com              "Security for Unix is like
>      Josh_Osborne@Real_World,The          Multitasking for MS-DOS"
>      "The dyslexic porgramer"                  - Kevin Lockwood
>We all agree on the necessity of compromise.  We just can't agree on
>when it's necessary to compromise.       - Larry Wall


-- 
Amancio Hasty           |  
Home: (415) 495-3046    |  ftp-site depository of all my work:
e-mail hasty@netcom.com	|  sunvis.rtpnc.epa.gov:/pub/386bsd/incoming