*BSD News Article 7519

Xref: sserve comp.unix.bsd:7569 comp.benchmarks:2329 comp.arch:27947 comp.arch.storage:671
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!darwin.sura.net!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!rutgers!cbmvax!jesup
From: jesup@cbmvax.commodore.com (Randell Jesup)
Newsgroups: comp.unix.bsd,comp.benchmarks,comp.arch,comp.arch.storage
Subject: Re: Disk performance issues, was IDE vs SCSI-2 using iozone
Message-ID: <36794@cbmvax.commodore.com>
Date: 7 Nov 92 23:45:36 GMT
References: <1992Nov7.102940.12338@igor.tamri.com>
Reply-To: jesup@cbmvax.commodore.com (Randell Jesup)
Organization: Commodore, West Chester, PA
Lines: 214

jbass@igor.tamri.com (John Bass) writes:
>There should be something to learn or debate in this posting for everyone!

	Yup.  Good posting.

>There are a number of significant issues in comparing IDE vs SCSI-2, to
>avoid comparing apples to space ships -- this topic is loaded with traps.

	Very true.  Hopefully I can show some of the issues from a non-clone
non-Unix point of view.

>For years people have generally claimed SCSI to be faster than ESDI/IDE
>for all the wrong reasons ... this was mostly due to the fact that
>SCSI drives implemented lookahead caching of a reasonable length before
>caching appeared in WD1003 interface compatible controllers (MFM, RLL, ESDI,
>IDE). Today, nearly all have lookahead caching. Properly done, ESDI/IDE
>should be slightly faster than an equivalent SCSI implementation. This
>means hostadapters & drives of equivalent technology.

	You're right about the early SCSI days - for example, SCSI got zone
recording earlier, etc.  The base technology in IDE/SCSI is now the same, so
it comes down to software and interface issues.  I'll agree that for a single-
disk system and small transfers, IDE has a slight margin in speed over SCSI,
due to lower command overhead.  This may change as SCSI-fast becomes more
universal, especially in large transfers and as drives get into >4MB/s
sustained rates.
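	To see why a fixed per-command cost matters mostly for small transfers,
here is a rough Python sketch.  The overhead figures (200us vs 500us per
command) and the 2MB/s media rate are illustrative assumptions, not
measurements of any real IDE or SCSI drive:

```python
# Rough model: every request pays a fixed command overhead, then moves data
# at the media rate.  The overheads and rate below are assumed, not measured.

def effective_rate(request_bytes: int, overhead_us: float, media_mb_s: float) -> float:
    """Delivered throughput in MB/s (1 MB/s == 1 byte/us)."""
    transfer_us = request_bytes / media_mb_s
    return request_bytes / (overhead_us + transfer_us)

IDE_OVERHEAD_US = 200.0   # hypothetical lower command overhead
SCSI_OVERHEAD_US = 500.0  # hypothetical higher command overhead
MEDIA_MB_S = 2.0          # hypothetical sustained media rate

for size in (512, 65536):
    ide = effective_rate(size, IDE_OVERHEAD_US, MEDIA_MB_S)
    scsi = effective_rate(size, SCSI_OVERHEAD_US, MEDIA_MB_S)
    print(f"{size:6d} bytes: IDE {ide:.2f} MB/s, SCSI {scsi:.2f} MB/s")
```

With these made-up numbers the gap is large for a single 512-byte sector and
nearly vanishes at 64KB, which is the trend described above.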

>PC Architecture and I/O subsystem architecture issues are the real answer
>to this question ... it is not only a drive technology issue.

	Quite.  For later reference, Commodore uses both SCSI and IDE.  IDE
is used in the A600, A1200, and the A4000 (though SCSI boards are coming
for that), and we have done several SCSI boards in the past (A2090, A2091,
A590), and machines with SCSI (A3000).  The IDE is hooked up to 680x0
interfaces in a memory-mapped manner, and is read with interrupts and
programmed-IO (our SCSI controllers are all DMA-driven (bus-master DMA in
PC-speak)).

	Why do we use IDE for some machines, particularly the low end?
Because it's _cheap_ (for the host).

>First, dumb IDE adapters are WD1003-WHA interface compatible, which means:
>
>	1) the transfer between memory and controller/drive are done in
>	software .... IE tight IN16/WRITE16 or READ16/OUT16 loop. The
>	IN and OUT instructions run at 286-6MHz bus speeds on all ISA
>	machines 286 to 486 ... time invariant ... about 900us per 16bits.
>	This will be referred to as Programmed I/O, or PIO.

	I think you mean 900ns, or you have very slow drives... :-)  This gives
a max burst rate of around 2MB/s.
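	A quick back-of-envelope check of that figure, assuming one 16-bit word
moves per 900ns I/O cycle and ignoring loop, interrupt and command overhead
(so real throughput will be lower):

```python
# Peak programmed-I/O rate: one 16-bit word per IN/OUT bus cycle.
# Assumes 900 ns per word and no other overhead.

def pio_burst_rate(ns_per_word: float, bytes_per_word: int = 2) -> float:
    """Return the peak PIO rate in MB/s (10**6 bytes per second)."""
    words_per_second = 1e9 / ns_per_word
    return words_per_second * bytes_per_word / 1e6

rate = pio_burst_rate(900)
print(f"{rate:.2f} MB/s")                       # 2.22 MB/s
print(f"{512 / rate:.0f} us per 512-byte sector")  # 230 us
```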

>	2) the controller interrupts once per 512 sector, and the driver
>	must transfer the sector. On drives much faster than 10mbit/sec
>	the processor is completely saturated during diskio interrupts
>	and no concurrent user processing takes place. This is fine for
>	DOS, but causes severe performance problems for all multitasking OS's.
>	(Poor man's disconnect/reconnect, allows multiple concurrent drives).

	Once per sector?  Don't PCs use the ReadMultiple/WriteMultiple
commands?  I guess not (which matches what I've heard elsewhere).  Our IDE
implementations will use read/write multiple to reduce CPU interrupt and
task-switching overhead on longer reads, and we transfer up to 16 sectors per
interrupt.  BTW, we have on occasion found bugs in some drives' RWMultiple
commands, or funny performance results, like slower throughput with larger
transfers.
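	The interrupt savings are easy to count.  A small sketch, assuming each
interrupt costs roughly the same fixed overhead regardless of block size; the
16-sector block matches the figure above, while the 64KB request size is
arbitrary:

```python
import math

SECTOR_BYTES = 512

def interrupts_needed(request_bytes: int, sectors_per_interrupt: int) -> int:
    """Host interrupts for one read when the drive interrupts once per
    block of up to `sectors_per_interrupt` sectors."""
    sectors = math.ceil(request_bytes / SECTOR_BYTES)
    return math.ceil(sectors / sectors_per_interrupt)

request = 64 * 1024  # an arbitrary 64KB read
print(interrupts_needed(request, 1))   # one interrupt per sector: 128
print(interrupts_needed(request, 16))  # read multiple, 16 sectors per block: 8
```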

	I don't understand your comment about "poor man's disconnect".  While
you may be waiting for an interrupt, unless you have multiple IDE busses you
can't use your second IDE drive until the IO on the first is complete.

	Disconnect is where SCSI can make big wins over IDE.  First, especially
with multi-tasking OS's, you can keep multiple drives busy at once (since
drives are usually slower than the bus, and are often seeking as well).  Also
it can be very useful in supporting slower devices, like tape drives and
CDROMs, because you can burst the data quickly, and then disconnect while
the slow device writes/reads from buffers (and so avoid killing regular disk
performance while using slow devices).  And of course you can hook up more
than disks to SCSI...  SCSI also supports removable media better, though
the CAM-ATA group is working on a removable media extension to IDE (ATA).
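	A deliberately crude timing model shows where the multi-drive win comes
from.  It assumes one outstanding request per drive, all issued at once, with
made-up seek times (including rotational latency) and transfer times; without
disconnect the bus is held for the whole command, with it only the data
phases share the bus:

```python
def total_time_no_disconnect(requests):
    """Bus held for the whole command: requests run strictly one after another."""
    return sum(seek + xfer for seek, xfer in requests)

def total_time_with_disconnect(requests):
    """Targets disconnect while seeking, so seeks on different drives overlap;
    only the data phases serialize on the shared bus."""
    bus_free_at = 0.0
    # Drives reconnect in the order their seeks finish.
    for seek, xfer in sorted(requests):
        start = max(bus_free_at, seek)
        bus_free_at = start + xfer
    return bus_free_at

# One request per drive: (seek_ms, transfer_ms).  Illustrative numbers only.
reqs = [(15.0, 4.0), (12.0, 4.0), (18.0, 4.0)]
print(total_time_no_disconnect(reqs))    # 57.0
print(total_time_with_disconnect(reqs))  # 24.0
```

With a single drive the two models give the same answer; the gain only
appears once several devices share the bus.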

>	5) for sequential disk reads with lookahead caching, the
>	system is completely CPU bound for drives above 10mbit/sec.
>	All writes, and reads without lookahead caching lose one
>	rev per request, unless a very large low level interleave
>	is present. 1:1 is ALMOST ALWAYS the best performing interleave,
>	even taking this into account, due to multiple sector requests
>	at the filesystem and paging interfaces.

	Write-buffering (starting to be commonly available as an option on
SCSI drives) can help in avoiding slipping revs.  So long as ordering is
maintained, write-buffering in practice is fine for most uses of desktop
systems.  Of course, this can apply to either IDE or SCSI, but for IDE you'll
have to add commands to turn it on, or have it default to on, or use jumpers.
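	To put a number on "lose one rev per request", here is a sketch of
sequential-write throughput with and without write-buffering.  The 3600 RPM
spindle, 32 sectors per track and 8KB requests are assumptions chosen for
round numbers; seeks, head switches and command overhead are ignored:

```python
RPM = 3600              # assumed spindle speed
SECTORS_PER_TRACK = 32  # assumed; real (zoned) drives vary by cylinder
SECTOR_BYTES = 512

def write_throughput_kb_s(sectors_per_request: int, write_buffered: bool) -> float:
    """Sequential-write throughput in KB/s (1 KB = 1024 bytes).

    Without write-buffering each request is assumed to miss the next sector
    and wait one full revolution; with it the drive keeps streaming."""
    rev_ms = 60_000 / RPM
    media_ms = sectors_per_request / SECTORS_PER_TRACK * rev_ms
    lost_ms = 0.0 if write_buffered else rev_ms
    kb = sectors_per_request * SECTOR_BYTES / 1024
    return kb / ((media_ms + lost_ms) / 1000)

for buffered in (False, True):
    print("buffered" if buffered else "unbuffered",
          round(write_throughput_kb_s(16, buffered)), "KB/s")
```

Under these assumptions, 8KB unbuffered writes reach a third of the media
rate, while buffered writes run at the full rate.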

>There will be a strong market for high performance IDE hostadapters
>in the future, for UNIX, Novell, OS/2 and NT ... which are NOT PIO via
>IN/OUT instructions. Both ISA memory mapped and Bus Mastering IDE
>host adapters should appear soon .... some vendors are even building
>RAID IDE hostadapters. I hope this article gets enough press to make
>endusers/vars aware enough to start asking for the RIGHT technology.
>While the ISA bus used like this is a little slow, it is fast enough
>to handle the same transfer rates and number of drives as SCSI.

	I'd think the ISA bus would run out of steam before SCSI (or a faster
IDE or RAID-IDE) would.  EISA/MCA could handle it, but ISA doesn't deal well
with 4+MB/s transfers (and cuts CPU performance a lot at those speeds, too,
for the MT OS's you mention).

>What the market really needs are some CHEAP but very dumb IDE and SCSI
>adapters that are only a bus to bus interface with a fast Bus Mastering
>DMA for each drive. In theory these would be a medium sized gate array
>for IDE, plus a 53C?? for SCSI and cost about $40 IDE, and $60 SCSI.

	Well, Commodore has built SCSI controllers with a WD SCSI chip, a
gate array to handle bus/chip interface issues and a DMA FIFO, random PALs/
connectors/board/EPROMs, and we even threw on 2MB of RAM sockets because
it cost almost nothing (given the gate array).  I don't know the cost (and
couldn't say if I did), but it's not expensive.  The board is currently
available from dealers for as low as $49 (though that may be an overstock
price - it cost ~$200+ when introduced in '88, I think).  This can do up to
~1.7MB/s (limited by the old Zorro-II bus interface and a simplistic gate
array).  Basically the same interface is on A3000's, with a wider/smarter
gate array and on the '030 bus, and there we've gotten 4.2MB/s sustained,
with <20% bus usage.

	The 53c7x0 series has potential to get every possible bit of
performance out of a SCSI bus, up to the limit of your drives/devices.  They
also require little surrounding logic, since they have bus-mastering built-in,
and can basically sit on the host bus (x86 or 680x0) and handle the entire
IO, including disconnect/reselect.  This means little/no host CPU overhead,
minimal host bus overhead, minimal command latency, and reasonable cost
(they're not cheap, but you don't need DMA controllers or gate arrays to
use them).

> For
>486 systems they would blow the sockets off even the fastest adapters built
>today since the 486 has faster CPU resources to follow SCSI protocol -- more
>so than what we find on the fast adapters, and certainly faster than the
>crawlingly slow 8085/Z80 adapters. With such adapters, IDE would be both faster and
>cheaper than SCSI -- maybe we would see more IDE tapes and CD ROMS. Certainly
>the product's firmware development would be shorter than any SCSI-II effort.

	The IDE drives would be (and always will be) slightly cheaper (not
much).  The host interface for IDE is slightly cheaper (though once you add
a gate array/chip/etc to handle bus-master DMA, the gap narrows).  As a pure,
single-disk interface the IDE will be slightly faster (as per above).  As
a multi-disk interface, or generalized IO interface (tape drives, CDROM, etc)
SCSI has a large edge.  Also, IDE can only handle 2 devices.  Even if IDE
tape drives and CDROMs were available (they're not), you'd rapidly start
needing multiple IDE interfaces.
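	The device-count arithmetic, as a trivial sketch: an IDE interface takes
a master and a slave, while a narrow (8-bit) SCSI bus has eight IDs, one of
which the host adapter uses.  The device mix below is hypothetical (and, as
noted, the IDE tape and CD-ROM would not actually exist today):

```python
import math

IDE_DEVICES_PER_INTERFACE = 2  # master + slave
SCSI_DEVICES_PER_BUS = 7       # 8 IDs on a narrow bus, one used by the host adapter

def interfaces_needed(devices: int, per_interface: int) -> int:
    return math.ceil(devices / per_interface)

# Hypothetical desktop: two disks, a tape drive and a CD-ROM.
devices = 4
print("IDE interfaces:", interfaces_needed(devices, IDE_DEVICES_PER_INTERFACE))  # 2
print("SCSI buses:", interfaces_needed(devices, SCSI_DEVICES_PER_BUS))           # 1
```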

>All IDE and SCSI drives have a microprocessor which oversees the bus and
>drive operation. Generally this is a VERY SLOW 8 bit micro ... 8048, Z80,
>or 8085 core/class CPU. The IDE bus protocol is MUCH simpler than SCSI-2,
>which allows IDE drives to be more responsive. Some BIG/FAST/EXPENSIVE
>SCSI drives are starting to use 16-bit micros to get the performance up.

	Right.  This is the reason IDE is slightly faster for small, frequent
IO's.

>High performance hostadapters on the EISA and MCA platforms are appearing
>that have fast 16 bit micros ... and the current prices reflect not only
>the performance .... but reduced volumes as well.

	There's no reason (except maybe slow bus speeds or large fancy
caches) why SCSI adapters should need their own processors.  This is much as
you stated above.

>Lookahead caches are very good things ... but fragile ... the filesystem,
>disksort, and driver must all attempt to preserve locality long enough to
>allow them to work. This is a major problem for many UNIX systems ... DOS
>is usually easy .... single process, mostly reads. Other than the several
>extent based filesystems (SGI) ... the balance of the UNIX filesystems
>fail to maintain locality during allocation of blocks in a file ... some
>like the BSD filesystem and SCO's AFS manage short runs ... but not good
>enough.  Log structured filesystems without extensive cache memory, and late
>binding, suffer the same problem.

	Commodore's filesystems generally manage to put large blocks of
data in contiguous chunks.  I'd hope that HPFS does also.
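	Why lookahead is "fragile" can be shown with a toy model of a drive that
has a single lookahead segment: after reading block b it holds blocks b+1
through b+16, and the next read hits only if it lands in that window.  The
window size and block numbers are assumptions for illustration:

```python
def lookahead_hit_rate(blocks, lookahead: int = 16) -> float:
    """Fraction of block reads served from a single-segment lookahead buffer.

    Assumed model: after any read of block b the drive holds blocks
    b+1 .. b+lookahead; the next read hits only if it falls in that window."""
    lo, hi = 0, -1  # empty window before the first read
    hits = 0
    for b in blocks:
        if lo <= b <= hi:
            hits += 1
        lo, hi = b + 1, b + lookahead
    return hits / len(blocks)

file_a = list(range(1000, 1032))  # one file, laid out contiguously
file_b = list(range(5000, 5032))  # another contiguous file elsewhere on disk

# One process reading file_a front to back: only the first read misses.
print(f"{lookahead_hit_rate(file_a):.2f}")       # 0.97
# Two processes whose requests interleave: every read misses.
interleaved = [blk for pair in zip(file_a, file_b) for blk in pair]
print(f"{lookahead_hit_rate(interleaved):.2f}")  # 0.00
```

Both files are perfectly contiguous; the locality is lost purely by the
order in which requests reach the drive.  A drive with several cache segments
would do better on the interleaved case than this one-segment model.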

>One final note on caching hostadapters ... the filesystem should make better
>use of any memory devoted to caching, compared with ANY hostadapter. Unless
>there is some unreasonable restriction, the OS should yield better
>performance with the additional buffer cache space than the adapter.

	This is Commodore's approach.

>Even the fastest 486 PC UNIX systems are filesystem CPU bound to between
>500KB and 2.5MB/sec ... drive subsystems faster than this are largely
>useless (a waste of money) ... especially most RAID designs. Doing
>page flipping (not bcopy) to put the stuff into user space can improve
>things if aligned properly inside well behaved applications.

	This is a combination of poor hardware interfaces and the OS interface.
AmigaDos tries to do transfers direct to user buffers where possible,
especially for large reads, and reserve most of its buffer space for
small (<1 block) reads and filesystem structures (file headers, etc).  This
can provide very low-overhead IO for the common cases.  For example, with
a drive that does ~4.3MB/s off the platters, and 4.2MB/s through the SCSI
controller, we can get 4.0MB/s through the filesystem (with little impact
on available CPU).
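	A minimal Python sketch of that split, under loudly stated assumptions:
a contiguously stored file, 512-byte blocks, and a toy Disk class whose
read_into stands in for a DMA transfer.  Disk, ContiguousFile and their
methods are hypothetical names, and this is not AmigaDOS code; it only shows
the policy of sending aligned whole blocks straight into the caller's buffer
and routing partial-block reads through a small cache:

```python
BLOCK = 512  # assumed filesystem block size


class Disk:
    """Toy block device; `requests` counts I/O operations issued to it."""

    def __init__(self, image: bytes):
        self.image = image
        self.requests = 0

    def read_into(self, first_block: int, dest: memoryview) -> None:
        # Stands in for a DMA transfer of len(dest) bytes straight into `dest`.
        self.requests += 1
        start = first_block * BLOCK
        dest[:] = self.image[start:start + len(dest)].ljust(len(dest), b"\0")


class ContiguousFile:
    """Hypothetical file stored in consecutive blocks from `first_block`."""

    def __init__(self, disk: Disk, first_block: int, size: int):
        self.disk, self.first_block, self.size = disk, first_block, size
        self.cache = {}  # block index -> bytes, used only for partial-block reads

    def _cached_block(self, index: int) -> bytes:
        if index not in self.cache:
            buf = bytearray(BLOCK)
            self.disk.read_into(self.first_block + index, memoryview(buf))
            self.cache[index] = bytes(buf)
        return self.cache[index]

    def readinto(self, offset: int, user_buf: bytearray) -> int:
        """Copy file bytes starting at `offset` into `user_buf`; return the count."""
        end = min(offset + len(user_buf), self.size)
        out = memoryview(user_buf)
        pos = 0
        while offset < end:
            index, within = divmod(offset, BLOCK)
            whole_blocks = (end - offset) // BLOCK
            if within == 0 and whole_blocks > 0:
                # Aligned run of whole blocks: one request, straight into the
                # caller's buffer, bypassing the cache entirely.
                nbytes = whole_blocks * BLOCK
                self.disk.read_into(self.first_block + index, out[pos:pos + nbytes])
            else:
                # Partial block: serve it from (and fill) the small-read cache.
                nbytes = min(BLOCK - within, end - offset)
                out[pos:pos + nbytes] = self._cached_block(index)[within:within + nbytes]
            offset += nbytes
            pos += nbytes
        return pos


disk = Disk(bytes(range(256)) * 16)  # 4KB disk image
f = ContiguousFile(disk, first_block=2, size=1500)
big = bytearray(1200)
f.readinto(0, big)       # 2 whole blocks direct, then 176 bytes via the cache
small = bytearray(50)
f.readinto(1100, small)  # same partial block: served from the cache
print(disk.requests)     # 2
```

The large read costs one device request and no intermediate copy for its
whole blocks, while repeated small reads of the same block hit the cache.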

	I suspect with work done in the filesystems and with the right bus
and controller, drives faster than 2MB/s could be useful for the 486 PC Unix
systems you mention above.

-- 
To be or not to be = 0xff
-
Randell Jesup, Jack-of-quite-a-few-trades, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.cbm.commodore.com  BIX: rjesup  
Disclaimer: Nothing I say is anything other than my personal opinion.