*BSD News Article 86310

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!news.mel.aone.net.au!grumpy.fl.net.au!news.webspan.net!www.nntp.primenet.com!nntp.primenet.com!dciteleport.com!worldnet.att.net!news.mathworks.com!enews.sgi.com!chronicle.mti.sgi.com!news
From: Dror Maydan <maydan@mti.sgi.com>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.arch,comp.benchmarks,comp.sys.super
Subject: Re: benchmarking discussion at Usenix?
Date: Thu, 09 Jan 1997 10:45:05 -0800
Organization: Silicon Graphics
Lines: 39
Distribution: inet
Message-ID: <32D53CB1.41C6@mti.sgi.com>
References: <5am7vo$gvk@fido.asd.sgi.com> <32D3EE7E.794B@nas.nasa.gov>
NNTP-Posting-Host: three.mti.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.0S (X11; I; IRIX 6.2 IP20)
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:33688 comp.arch:62201 comp.benchmarks:18685 comp.sys.super:6828

Hugh LaMaster wrote:
> 
> Question: Aside from (pure) latency (currently measured by lmbench)
> and bandwidth (measured by lmbench and STREAM), does concurrency matter?
> 
> Answer: I think so.  There are a number of cases of interest.
> "Bandwidth" is (often) based on stride-1 concurrency.  Also interesting:
> Concurrency with stride-N.  Concurrency on gather/scatter.
> 
> Putting everything in units of time (seconds x 10 ^^ -N) to:
> latency (lmbench):         fetch          random address or datum
> stride-1 (1/bandwidth):    fetch&process  time/unit of contiguous data
> stride-N (1/bandwidth):    fetch&process  every N-th datum
> gather/scatter (1/B-W):    fetch&process  random data
> subword (1/bandwidth):     fetch&process  8/16/(32) data bits
>                                           within word
> 
> For some machines, the bandwidth doesn't vary much, with roughly
> constant bandwidth over all possibilities; on some machines, extra
> load/store paths allow 2-3X improvements; on some machines, subword
> instructions (e.g. VIS and so on) vastly speed up "vector/parallel"
> operations within a word.  A major battle of the early 80's was the
> CDC Cyber 205 vs. Cray-1/S.  The Cyber 205 had greater stride-1
> and gather-scatter bandwidth, the Cray-1/S better latency and stride-N
> performance.  Each machine had its applications where it outperformed
> the other.  All these types of bandwidths and concurrencies are
> worthwhile to examine systematically.  Most "scalar" code, including
> compilers, tends to be dominated by latency, while many engineering,
> scientific, and graphics/image processing applications tend to be
> more bandwidth-intensive.
> 

One more interesting category is the latency of accessing objects larger
than 4 bytes.  On many cache-based machines, reading every element of a
cache line costs about the same as reading one element, because the
first miss brings in the whole line.  I've never seen measurements, but
my guess is that many data elements in compilers are bigger than 4
bytes; i.e., spatial locality works for compilers.

Dror