Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.mel.connect.com.au!news.syd.connect.com.au!news.bri.connect.com.au!fjholden.OntheNet.com.au!not-for-mail
From: Tony Griffiths <tonyg@OntheNet.com.au>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.unix.bsd.bsdi.misc,comp.sys.sgi.misc
Subject: Re: no such thing as a "general user community"
Date: Mon, 07 Apr 1997 10:37:33 +1000
Organization: On the Net (ISP on the Gold Coast, Australia)
Lines: 118
Message-ID: <334841CD.16B4@OntheNet.com.au>
References: <331BB7DD.28EC@net5.net> <5hnam9$393@hoopoe.psc.edu> <5hp7p3$1qb@fido.asd.sgi.com> <5hqc45$hlm@flea.best.net> <5i397n$eva@nyheter.chalmers.se>
Reply-To: tonyg@OntheNet.com.au
NNTP-Posting-Host: swanee.nt.com.au
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.0 (WinNT; I)
To: Mats Olsson <matso@dtek.chalmers.se>
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:38601 comp.unix.bsd.bsdi.misc:6603 comp.sys.sgi.misc:29738

Mats Olsson wrote:
> 
> In article <5hqc45$hlm@flea.best.net>,
> Matt Dillon <dillon@flea.best.net> wrote:
> > Larry, no matter what the results, you can't seriously be advocating
> > that testing two OS's on two different platforms is scientific (!).
> > Well? Yes? No?
> 
> That depends on whether the difference between the platforms is
> significant in light of the results and what you are trying to show.
> I.e., if the results are very similar and you are trying to show that
> one OS is better than the other, then the differences between the
> systems must be carefully analyzed to see if they are significant.
> 
> So, testing two OSes on two different platforms isn't necessarily
> bad science. How the collected data is used can be bad science.

Anyone doing benchmarks has to be EXTREMELY careful about the environment!
Even the "smallest" difference can have unforeseen consequences.

A 'real life' example that cost me some skin off my back...

(a) Benchmark a DECsystem-10 (actually a DECSYSTEM-20 but loading TOPS-10
    instead of TOPS-20! What's TOPS, I hear you say? That's another story).
    Run the customer's tests over a weekend and collect all the printouts,
    console logs, etc.

(b) Customer likes the result and buys the system.

(c) System is installed at the customer site and lucky me gets sold with it.
    First job is to re-run the benchmarks to "prove" that we didn't cheat
    the first time round. This is where things start to go wrong!!!

    The first thing to note is that two of the three disks delivered are
    NOT the same as in the original benchmark. In fact, they are 2.5 times
    bigger and 3 times faster in transfer rate! You little ripper, you say!
    Bigger, faster disks at the same price. What a nice vendor DEC is!!!

(d) Run the benchmarks on the new system and everything goes swimmingly
    EXCEPT for the "Interactive Responsiveness" test. It goes from 0.8s to
    1.2s! A blink of an eye, I say... A 50% increase, says the customer.
    Both of us are right, but the customer refuses to pay the last $200,000
    of the contract until the 'problem' is fixed!!!

Ok, so now it's a matter of poring over the printouts and logs to see what
is happening. After several days it hits me between the eyes... On the
original benchmark, the system averaged 2%-3% idle time over a 1 hr
benchmark run. On the new system, idle is 0% over the same period. The
bigger, faster disks are allowing more jobs to run in a shorter time and,
as a consequence, the compute queue is now MUCH deeper. This is what is
causing the increase in response time (i.e. the decrease in responsiveness).
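(For the curious, here is a rough back-of-the-envelope sketch of that effect
in Python. It is a toy round-robin model under assumed parameters, not the
real TOPS-10 scheduler, and the queue depths and timings are invented purely
to show how a deeper compute queue stretches interactive response time.)

    # Hypothetical illustration only -- a toy round-robin model, not TOPS-10.
    QUANTUM = 0.020  # 20 ms time slice (50 Hz clock, as above)

    def response_time(queue_depth, quanta_per_job=7, burst=0.020):
        """Crude estimate: a short interactive burst waits for one full
        pass of the compute queue, where each competing job may burn up
        to quanta_per_job quanta, and then runs itself."""
        return queue_depth * quanta_per_job * QUANTUM + burst

    print(response_time(queue_depth=5))   # ~0.72 s -- some idle time left
    print(response_time(queue_depth=8))   # ~1.14 s -- saturated, 0% idle

Nothing in the toy model is specific to the hardware; the only thing that
changes is how many runnable jobs sit ahead of an interactive request, which
is exactly what the extra throughput from the faster disks altered. (The
quanta_per_job knob also hints at the scheduler tuning described next.)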
I report this to the customer, written up nice and scientifically.
Unofficially, their technical people agree with me, but the benchmark still
"fails" to meet their requirements, so the $200,000 is still outstanding.
My management/sales person is becoming nervous and my blood pressure is
going up considerably.

More time spent exercising the grey matter... OK, if the compute queue is
too deep, why not reduce the period of time that processes can stay in the
high priority queue (PQ1) before they are forced into the lower priority
queue (PQ2)? So, I take the system again and reduce the number of quanta
that a process can have while in PQ1 from the default of 7 (7 x 20ms in a
50Hz country) to only 1. Lo and behold, the 'responsiveness' test now goes
from 1.2s down to 0.6s while all other tests still meet or exceed the
original benchmark. Great! Report the results to the customer, assuming
that they will be delighted (and pay the outstanding monies!).

Nope!!! You've changed the parameters, so "Do not pass GO, do not collect
200,000". At a blood pressure of 250 over 180, I charge out and head back
to the DEC office swearing and cursing. My recommendation to management is
to pull the plug on the system (which is at this time in full productive
use) and refund the money already paid. This does not go down too well, as
can be imagined. ;-)

Finally, after several calming cups of coffee, an idea comes to us. The
"You've changed the parameters" quote from the customer hits us between the
eyes... Yes we have: we've given them bigger, faster disks than in the
original benchmark. Swap them out and we should then be able to reproduce
the original results within the desired limits (-5% <-> +5%).

Arrange a meeting with the customer and tell them of our intended "fix" to
the responsiveness 'problem'. After a short pause, they agree to accept the
system as it now stands and WAIVE the benchmark requirements of system
acceptance and, btw, thank you for doing all this tuning work to determine
the optimal operating parameters!!!

I am ready to KILL, KILL, KILL.

> 
> /Mats

The moral of the story is... "DON'T CHANGE ANYTHING WHEN BENCHMARKING!"
Even the slightest difference in h/w or s/w will jump up and bite you on
the bum!

Tony