Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!news.mel.aone.net.au!grumpy.fl.net.au!news.webspan.net!www.nntp.primenet.com!nntp.primenet.com!dciteleport.com!worldnet.att.net!news.mathworks.com!enews.sgi.com!chronicle.mti.sgi.com!news
From: Dror Maydan <maydan@mti.sgi.com>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.arch,comp.benchmarks,comp.sys.super
Subject: Re: benchmarking discussion at Usenix?
Date: Thu, 09 Jan 1997 10:45:05 -0800
Organization: Silicon Graphics
Lines: 39
Distribution: inet
Message-ID: <32D53CB1.41C6@mti.sgi.com>
References: <5am7vo$gvk@fido.asd.sgi.com> <32D3EE7E.794B@nas.nasa.gov>
NNTP-Posting-Host: three.mti.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.0S (X11; I; IRIX 6.2 IP20)
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:33688 comp.arch:62201 comp.benchmarks:18685 comp.sys.super:6828

Hugh LaMaster wrote:
>
> Question: Aside from (pure) latency (currently measured by lmbench)
> and bandwidth (measured by lmbench and STREAM), does concurrency matter?
>
> Answer: I think so. There are a number of cases of interest.
> "Bandwidth" is (often) based on stride-1 concurrency. Also interesting:
> Concurrency with stride-N. Concurrency on gather/scatter.
>
> Putting everything in units of time (seconds x 10 ^^ -N) to:
>   latency (lmbench):       fetch random address or datum
>   stride-1 (1/bandwidth):  fetch&process time/unit of contiguous data
>   stride-N (1/bandwidth):  fetch&process every N-th datum
>   gather/scatter (1/B-W):  fetch&process random data
>   subword (1/bandwidth):   fetch&process 8/16/(32) data bits
>                            within word
>
> For some machines, the bandwidth doesn't vary much, with roughly
> constant bandwidth over all possibilities; on some machines, extra
> load/store paths allow 2-3X improvements; on some machines, subword
> instructions (e.g. VIS and so on) vastly speed up "vector/parallel"
> operations within a word. A major battle of the early 80's was the
> CDC Cyber 205 vs. Cray-1/S. The Cyber 205 had greater stride-1
> and gather-scatter bandwidth, the Cray-1/S better latency and stride-N
> performance. Each machine had its applications where it outperformed
> the other. All these types of bandwidths and concurrencies are
> worthwhile to examine systematically. Most "scalar" code, including
> compilers, tend to be dominated by latency, while many engineering,
> scientific, and graphics/image processing applications tend to be
> more bandwidth-intensive.
>

One more interesting category is the latency of accessing objects
bigger than 4 bytes. On many cache-based machines, accessing
everything in a cache line is just as fast as accessing one element.
I've never seen measurements, but my guess is that many data elements
in compilers are bigger than 4 bytes; i.e., spatial locality works for
compilers.

Dror
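
To make the cache-line point concrete, here is a rough sketch (not a benchmark) that counts how many distinct cache lines a set of accesses covers. It assumes a 32-byte line purely for illustration; the names LINE_SIZE, lines_touched and strided are made up for this example and come from neither lmbench nor STREAM. It models only which lines are touched, not actual timing.

```python
LINE_SIZE = 32  # bytes; an assumed cache-line size, chosen only for illustration


def lines_touched(addresses, object_size, line_size=LINE_SIZE):
    """Count distinct cache lines covered by accesses of object_size bytes."""
    lines = set()
    for addr in addresses:
        first = addr // line_size                      # line holding the first byte
        last = (addr + object_size - 1) // line_size   # line holding the last byte
        lines.update(range(first, last + 1))
    return len(lines)


def strided(count, stride_bytes, base=0):
    """Byte addresses of `count` accesses spaced `stride_bytes` apart."""
    return [base + i * stride_bytes for i in range(count)]


if __name__ == "__main__":
    n = 64
    cases = [
        ("stride-1, 4-byte words", strided(n, 4), 4),
        ("stride-8, 4-byte words", strided(n, 32), 4),
        ("one aligned 32-byte object", [0], 32),
        ("one 4-byte word", [0], 4),
    ]
    for label, addrs, size in cases:
        print(label, "->", lines_touched(addrs, size), "line(s)")
```

Under this model, an aligned 32-byte object costs the same single line fill as one 4-byte word, while 64 words at a stride of one line each touch 64 lines instead of 8. Whether real compilers behave this way is exactly the measurement that seems to be missing.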