Xref: sserve comp.arch:27981 comp.unix.bsd:7602 comp.os.linux:15102
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!agate!dog.ee.lbl.gov!horse.ee.lbl.gov!torek
From: torek@horse.ee.lbl.gov (Chris Torek)
Newsgroups: comp.arch,comp.unix.bsd,comp.os.linux
Subject: Re: IDE faster than Mips SCSI disk?????
Date: 8 Nov 1992 12:00:26 GMT
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 123
Message-ID: <27298@dog.ee.lbl.gov>
References: <1992Nov6.033942.21194@ntuix.ntu.ac.sg> <1992Nov6.142946.17430@walter.bellcore.com>
Reply-To: torek@horse.ee.lbl.gov (Chris Torek)
NNTP-Posting-Host: 128.3.112.15

In article <1992Nov6.142946.17430@walter.bellcore.com>
mo@bellcore.com writes:
>one must be very careful that one isn't comparing the
>performance of two disk controllers and not the attached drives.
>... the problem, however, is really *knowing* what one is measuring.

Indeed.  Mike O'Dell knows whereof he speaks.  In this case, of
course, the comparison is between the overall performance of two
complete systems.

Whenever you run any benchmark, all you are *definitely* measuring is
the speed of the system at that particular benchmark.  Running bonnie
or iozone on a MIPS and a 386 box will tell you how fast those
systems are at running bonnie or iozone.

Now, the authors of benchmarks try hard to write them such that they
correlate well with performance at other things.  With any luck, the
performance you get running bonnie is fairly indicative of the
performance you will get running other I/O-bound tasks.  But all you
have really measured is overall system performance, NOT individual
disk, controller, protocol, bus, CPU, memory, or kernel code
performance.  If you can hold most of these variables constant, you
may be able to get meaningful comparisons between IDE and SCSI
drives, but in practice you have to change at least two variables
simultaneously.

As far as `knowing your measurements' goes: in I/O there are at least
two interesting numbers, `latency' and `throughput'.  Latency is
essentially turnaround time---the lag from when you ask for something
until it actually happens.  Throughput is `total work done' per unit
time.  Usually you want small latencies and large throughput: things
get done immediately, and lots of things get done.  But getting both
is expensive.  As a general rule, things you do to reduce latency
worsen throughput, and things you do to increase throughput increase
latency.  (There are exceptions to this rule.)

Large latencies do not necessarily reduce throughput.  At a busy
supermarket, for instance, one generally has to wait in line quite a
while (high latency), yet the volume of people moving through all the
checkouts tends to be high (good throughput).  Supermarkets use
parallelism: lots of lines.  This is one of the exceptions:
parallelism generally improves throughput without having much effect
on latency---but not always; it only helps if you do not run into
another, separate bottleneck.

With disk drives and controllers, fancy protocols can improve
throughput (by allowing more stuff to run in parallel), but they
often have a high cost in latency.  As others have noted,
manufacturers are notoriously cheap, and tend to use 8085s and Z80s
and such in their I/O devices.  These CPUs are horrendously slow, and
take a long time to interpret the fancy protocols.

Raw disk throughput depends entirely on bit density.  If the platters
spin at 3600 rpm, and there are 60 512-data-byte sectors on each
track, exactly 3600 sectors pass under the read/write head each
second (assuming only one head is active at a time).  This is 1.8 MB
of data, and therefore the fastest the disk could possibly read or
write is 1.8 MB/s.  Seek times and head-switch or settle delays will
only reduce this.
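
That arithmetic is easy to mechanize.  Here is a minimal sketch in C;
the spindle speed and geometry are just the hypothetical numbers from
the example above, not any particular drive's:

    #include <stdio.h>

    int
    main(void)
    {
        double rpm = 3600.0;    /* spindle speed (hypothetical) */
        double spt = 60.0;      /* sectors per track */
        double bps = 512.0;     /* data bytes per sector */
        double sectors_per_sec = (rpm / 60.0) * spt;
        double bytes_per_sec = sectors_per_sec * bps;

        /* prints: 3600 sectors/s = 1.84 MB/s peak media rate */
        printf("%.0f sectors/s = %.2f MB/s peak media rate\n",
            sectors_per_sec, bytes_per_sec / 1e6);
        return 0;
    }
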
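
And to tie this back to the earlier point about benchmarks: any
number you get from a program like the one below includes the drive,
the controller, the bus, the kernel, the buffer cache, and the timer,
all at once.  This is only a sketch (the name `probe' and its
constants are invented for illustration), assuming nothing beyond
POSIX read() and gettimeofday():

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define BLKSIZE 8192            /* bytes per read: arbitrary */
    #define NREADS  1000            /* reads to time */

    int
    main(int argc, char **argv)
    {
        static char buf[BLKSIZE];
        struct timeval t0, t1;
        double secs;
        int fd, i;

        if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
            fprintf(stderr, "usage: probe file-or-raw-device\n");
            exit(1);
        }
        gettimeofday(&t0, NULL);
        for (i = 0; i < NREADS; i++)
            if (read(fd, buf, BLKSIZE) != BLKSIZE)
                break;              /* short read or EOF: stop */
        gettimeofday(&t1, NULL);
        secs = (t1.tv_sec - t0.tv_sec) +
            (t1.tv_usec - t0.tv_usec) / 1e6;
        if (i == 0 || secs <= 0.0) {
            fprintf(stderr, "probe: nothing measured\n");
            exit(1);
        }
        /* Two views of one run: average turnaround per request
         * (latency) and total work done over time (throughput).
         * Neither isolates any single component of the system. */
        printf("%d reads, %.2f ms/read, %.2f MB/s\n",
            i, secs * 1000.0 / i, (double)i * BLKSIZE / secs / 1e6);
        return 0;
    }

Run it twice in a row on the same file and the second set of numbers
will mostly be measuring your buffer cache; that is not a bug in the
program, it is the point.
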
Disk latency is influenced by a number of factors.  Seek time is
relevant, but the so-called `average seek time' is computed from a
largely baseless assumption, namely that sectors are requested in a
uniform random fashion.  In all real systems, sector requests follow
patterns; the exact patterns depend greatly on systemic details that
are not well suited to analysis.  So manufacturers fall back on the
randomness assumption and quote the seek time for 1/3 the maximum
stroke distance.  (Actuators tend to be nonlinear, so time(1/3 max
distance) != 1/3 time(max distance).  The latter number is typically
smaller; manufacturers may thus use it instead, whether out of
`specsmanship' or simple ignorance....)

Rotational rate also affects latency: the faster the disk spins, the
sooner any particular sector comes around under the heads.  On the
other hand, if the disk spins quickly and the overall system runs
slowly, the next sector may already have `flown by' by the time it is
requested.  This is called `blowing a rev', and it is the basis of
disk interleaving.  One can also compensate with track buffers, but
those require hardware, which is considered expensive.  Interleaving
can be done in software: in the drive, in the controller, in the
driver, in the file system, or in any combination thereof.  Figuring
out where and when to interleave, and how much, is horrendously
confusing, because all the factors affect each other and they are
usually not plainly labelled (they are all just called `the
interleave factor', as if there were only one).
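
To make the interleave arithmetic concrete, here is one common way of
laying out a single track in software; the 17 sectors per track and
the 3:1 factor are made-up numbers, and the drive, controller, and
driver each do their own variant of this:

    #include <stdio.h>

    #define SPT 17              /* sectors per track (hypothetical) */

    int
    main(void)
    {
        int track[SPT];         /* physical slot -> logical sector */
        int i, pos = 0, k = 3;  /* k = interleave factor (3:1) */

        for (i = 0; i < SPT; i++)
            track[i] = -1;      /* mark every slot empty */
        for (i = 0; i < SPT; i++) {
            while (track[pos] != -1)    /* slot taken: slip forward */
                pos = (pos + 1) % SPT;
            track[pos] = i;             /* place logical sector i */
            pos = (pos + k) % SPT;      /* skip ahead k slots */
        }
        printf("physical order at %d:1 interleave:", k);
        for (i = 0; i < SPT; i++)
            printf(" %d", track[i]);
        printf("\n");
        return 0;
    }

At 3:1, consecutive logical sectors sit three slots apart, so a slow
host gets two full sector times to turn the next request around
without blowing a rev; the price is that a host fast enough to keep
up now needs three revolutions to read the track instead of one.  Now
imagine that factor applied in the drive, the controller, and the
driver at once, each unaware of the others.
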
When you start building complete systems, other effects creep in.
For instance, separate disk drives can seek separately, in parallel.
Some busses permit this, some do not.  The SCSI protocol includes a
SEEK command, which may or may not be implemented in any particular
SCSI unit.  SCSI also allows `disconnecting', so that a target can
release the bus while a disk seeks, but this requires cooperation
from the host adapter.  Early versions of SCSI, and the original SASI
from which SCSI evolved, did not permit disconnecting, so it was
added as an option.

Disk controllers often have caches; these may or may not be separate
from track buffers.  The algorithms used to fill and flush these
caches affect both latency and throughput.  The busses involved put
their own constraints on timing, again affecting both, and the CPU's
involvement, if any, affects both as well.  The design and coding of
the kernel and the various drivers all have some effect.

Most Unix systems do memory-to-memory copies, whether as DMA from
device memory into a buffer cache followed by buffer-to-user copies,
or DMA directly to user space, or whatever; often this means that the
memory bus speed controls total throughput.  (It used to be that
memory speeds outpaced peripheral speeds comfortably, and hence
kernels did not need to worry about this.  Now the equations have
changed, but many kernels have not.  Incidentally, note that DMA to
user space has its drawbacks: if the resulting page is not
copy-on-write or otherwise marked clean, a later request for the same
sector must return to the disk, rather than reusing the cached copy.)

All in all, system characterisation is extremely difficult.
Single-number measures should be taken with an entire salt lick.
Good benchmarks must include more information---if nothing else, at
least a complete system description.
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 510 486 5427)
Berkeley, CA                    Domain: torek@ee.lbl.gov