Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.mel.connect.com.au!news.syd.connect.com.au!phaedrus.kralizec.net.au!news.mel.aone.net.au!grumpy.fl.net.au!news.webspan.net!newsfeeds.sol.net!hammer.uoregon.edu!arclight.uoregon.edu!enews.sgi.com!ames!cnn.nas.nasa.gov!news
From: Hugh LaMaster <lamaster@nas.nasa.gov>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.arch,comp.benchmarks,comp.sys.super
Subject: Re: benchmarking discussion at Usenix?
Date: Fri, 17 Jan 1997 15:45:52 -0800
Organization: NASA Ames Research Center
Lines: 72
Distribution: inet
Message-ID: <32E00F30.15FB@nas.nasa.gov>
References: <5am7vo$gvk@fido.asd.sgi.com> <32D3EE7E.794B@nas.nasa.gov> <32D53CB1.41C6@mti.sgi.com> <32DAD735.59E2@nas.nasa.gov> <32DD6761.167E@mti.sgi.com>
NNTP-Posting-Host: jeeves.nas.nasa.gov
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.0 (X11; U; IRIX 5.2 IP12)
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:34166 comp.arch:62495 comp.benchmarks:18949 comp.sys.super:6869

Dror Maydan wrote:
>
> Hugh LaMaster wrote:
> > same on all three.  Numbers, anyone?

BTW, there are some more recent numbers in the following online paper
(there seems to be no information on whether it has received further
publication; it appears to have been done as a class project):

"Cache Behaviour of the SPEC95 Benchmark Suite",
Sanjoy Dasgupta and Edouard Servan-Schreiber
http://http.cs.berkeley.edu/~dasgupta/paper/rep/rep.html

The paper looks at a subset of SPEC95 on a SPARC.  It suggests that
32-Byte block sizes are optimal for SPEC95 with small (< 128KB)
caches on the machine in question (presumably with 32-bit addresses).
It appears to me from this data that a 64-bit-address machine would
likely do better with 64-Byte blocks, since the optimum is already
leaning in that direction.  Larger caches also do better with larger
blocks, so machines with large unified L1/L2 caches would likely do
better with larger blocks.  In short, it looks like the vendors have
probably already done a pretty good job of optimizing their machines
to run SPEC95.  [Surprise, surprise.]

> My point was that different machines do have different line sizes, and
> the differences are quite large.  On the SGI R10000, the secondary line
> size is 128 Bytes.  On some IBM Power 2's, the line size is 256 Bytes.
> I'm pretty sure that some other vendors use 32 Byte line sizes.
> Why different vendors use different line sizes is probably related to
> both system issues and to which types of applications they try to
> optimize.

We seem to be in raging agreement up to this point.

> But, it is irrelevant to the benchmarking issue.

I still like to think that microbenchmarks like lmbench and STREAM,
larger benchmarks like SPEC95, and full-sized application performance
could be correlated, and even "understood", starting from basic
machine performance.  So, I think I disagree with the above
statement.

> The issue
> is that lmbench measures the latency for fetching a single pointer.  On
> such a benchmark a large-line machine will look relatively worse
> compared to the competition than if instead one used a benchmark that
> measured the latency of fetching a cache line.

Certainly true.  Of course, in some cases, the machines which have
long main-memory latencies are *also* the machines with poor
bandwidth.

> Now which benchmark is "better".  I think both are interesting.  Which
> is more relevant to a typical integer application?  I don't know.
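(For concreteness: the "single pointer" measurement discussed above
is a dependent pointer chase.  The following is a rough sketch of the
idea, in the spirit of lmbench's lat_mem_rd, but it is my own
illustration, not the actual lmbench source; the array size, stride,
and iteration count are made up.)

  /*
   * Sketch of a dependent pointer-chase latency loop.  Every load
   * depends on the previous one, so the loop time is pure
   * load-to-use latency, and a long-line machine gets no credit
   * for the extra bytes each miss brings in.
   */
  #include <stdio.h>
  #include <stdlib.h>

  #define NPTRS  (1 << 20)               /* 8 MB of pointers, 64-bit */
  #define STRIDE (128 / sizeof(char *))  /* one pointer per 128-Byte line */

  int main(void)
  {
      char **ring = malloc(NPTRS * sizeof(char *));
      char **p;
      long i;

      /* Link one pointer per cache line into a ring, so once the
         array exceeds the cache, every dereference misses.  (A real
         benchmark would randomize the order to defeat prefetch.) */
      for (i = 0; i < NPTRS; i += STRIDE)
          ring[i] = (char *)&ring[(i + STRIDE) % NPTRS];

      p = ring;
      for (i = 0; i < 10000000; i++)  /* time this loop externally;    */
          p = (char **)*p;            /* time/iteration ~ miss latency */

      printf("%p\n", (void *)p);      /* keep the chase live */
      return 0;
  }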
I don't think there is any doubt that the latencies of the (entire)
memory hierarchy are a major determinant of "integer" performance.
For engineering and scientific codes, the picture is murkier.  Some
codes are pretty much 100% bandwidth-determined.  Others are not much
different from "integer" codes [assuming you have a modern, fast FP
implementation].  It is actually the middle ground that is most
"interesting": the codes which can't be trivially transformed into
contiguous memory references, which have independent computed
indices, and so on.  This is the area where "concurrency", as
distinguished from the bandwidth:latency ratio, gets interesting.
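To make that middle ground concrete, here is a sketch (with made-up
array names and sizes) of the two extremes: a STREAM-style triad,
which is unit-stride and bandwidth-limited, and a gather through a
computed index array, where the misses are independent and what
matters is how many of them the machine can keep in flight at once:

  #define N 1000000

  void triad(double *a, const double *b, const double *c, double s)
  {
      long i;
      /* Unit-stride streams: trivially prefetchable, so sustained
         bandwidth, not latency, sets the speed. */
      for (i = 0; i < N; i++)
          a[i] = b[i] + s * c[i];
  }

  void gather(double *a, const double *b, const long *idx, double s)
  {
      long i;
      /* Computed indices: each b[idx[i]] can miss anywhere, but the
         misses are mutually independent, so the achievable overlap
         of outstanding misses, the "concurrency", sets the speed. */
      for (i = 0; i < N; i++)
          a[i] = s * b[idx[i]] + a[i];
  }

A machine with plenty of bandwidth but little tolerance for multiple
outstanding misses can fly through the first loop and crawl through
the second, which is exactly why the two kinds of benchmarks need not
agree.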