Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mira.net.au!news.vbc.net!vbcnet-west!samba.rahul.net!rahul.net!a2i!bug.rahul.net!rahul.net!a2i!ns2.mainstreet.net!news.pbi.net!news.mathworks.com!enews.sgi.com!chronicle.mti.sgi.com!news
From: Dror Maydan <maydan@mti.sgi.com>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.arch,comp.benchmarks,comp.sys.super
Subject: Re: benchmarking discussion at Usenix?
Date: Wed, 15 Jan 1997 15:25:21 -0800
Organization: Silicon Graphics
Lines: 35
Distribution: inet
Message-ID: <32DD6761.167E@mti.sgi.com>
References: <5am7vo$gvk@fido.asd.sgi.com> <32D3EE7E.794B@nas.nasa.gov> <32D53CB1.41C6@mti.sgi.com> <32DAD735.59E2@nas.nasa.gov>
NNTP-Posting-Host: three.mti.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.0S (X11; I; IRIX 6.2 IP20)
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:33823 comp.arch:62300 comp.benchmarks:18753 comp.sys.super:6841

Hugh LaMaster wrote:
>
> Dror Maydan wrote:
>
> > One more interesting category is the latency accessing objects bigger
> > than 4 bytes.  On many cache based machines accessing everything in a
> > cache line is just as fast as accessing one element.  I've never seen
> > measurements, but my guess is that many data elements in compilers are
> > bigger than 4 bytes; i.e., spatial locality works for compilers.
>
> Well, optimum cache line sizes have been studied extensively.
> I'm sure there must be tables in H&P et al. showing hit rate
> as a function of line size and total cache size.  For reasonably
> large caches, I think the optimum used to be near 16 bytes for
> 32-bit byte-addressed machines.  I don't know that I have seen more
> recent tables for 64-bit code on, say, Alpha, but my guess is that
> 32 bytes is probably superior to 16 bytes given the larger address
> sizes, not to mention alignment considerations.  Just a guess.
> Also, we often (but not always) have two levels of cache now,
> and sometimes three, and the optimum isn't necessarily the
> same on all three.  Numbers, anyone?

My point was that different machines do have different line sizes, and the
differences are quite large.  On the SGI R10000, the secondary cache line
size is 128 bytes.  On some IBM POWER2 machines, the line size is 256
bytes.  I'm pretty sure that some other vendors use 32-byte line sizes.
Why different vendors choose different line sizes is probably related both
to system issues and to which types of applications they try to optimize.

But that is irrelevant to the benchmarking issue.  The issue is that
lmbench measures the latency of fetching a single pointer.  On such a
benchmark, a large-line machine will look relatively worse compared to the
competition than it would on a benchmark that measured the latency of
fetching an entire cache line.  Now, which benchmark is "better"?  I think
both are interesting.  Which is more relevant to a typical integer
application?  I don't know.