*BSD News Article 92184

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.rmit.EDU.AU!news.unimelb.EDU.AU!cs.mu.OZ.AU!munnari.OZ.AU!uunet!in2.uu.net!144.212.100.12!news.mathworks.com!enews.sgi.com!news.corp.sgi.com!news.sgi.com!news1.best.com!nntp1.ba.best.com!not-for-mail
From: dillon@flea.best.net (Matt Dillon)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.sgi.misc
Subject: Re: no such thing as a "general user community"
Date: 26 Mar 1997 00:35:42 -0800
Organization: BEST Internet Communications, Inc.
Lines: 54
Message-ID: <5han4u$fnf@flea.best.net>
References: <331BB7DD.28EC@net5.net> <5h91l2$gua@innocence.interface-business.de> <5h9rr0$2sj@flea.best.net> <5h9vft$8eo@fido.asd.sgi.com>
NNTP-Posting-Host: flea.best.net
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:37944 comp.sys.sgi.misc:29484

:In article <5h9vft$8eo@fido.asd.sgi.com>,
:Nate Tuck <nate@blit.engr.sgi.com> wrote:
:>In article <5h9rr0$2sj@flea.best.net>,
:>Matt Dillon <dillon@flea.best.net> wrote:
:..
:>
:>How do you define these load metrics?  What is a heavy load on each of
:>the machines?  Is it measured by xload, response time on some app, or
:>what?  How many users can you stick on each machine before the load
:>becomes heavy?

    I generally define a 'heavy load' as 'undergoing paging quite often'
    or 'significant number of processes blocked in I/O wait states'
    and a 'medium-to-heavy load' as 'occasionally paging'.  Our mail/www
    proxies each typically have around 80 sendmail processes running and
    30-40 active WWW connections, plus named.  I consider this a medium
    load.  Our main news machine runs its disks and network maxed out half
    of the time, with around 70 incoming and 95 outgoing processes (running
    Diablo).  That's heavily loaded.  The main newsreader machine, running
    inn 1.5.x, has around 250 nnrpd's going at any given moment but plenty
    of I/O and cpu cycles to spare, and runs its disks at around 30%
    saturation (at a guess)... that's medium loaded.  Except for the newsreader
    machine, the machines have 128MB of ram in them at the moment.
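
    The load definitions above can be sketched as a small classifier.
    This is only an illustration: the struct fields, the function name
    classify_load, and both thresholds are assumptions, since the post
    gives no numbers for 'quite often' or 'significant'.  The samples
    would have to come from something like vmstat or sar.

```c
#include <assert.h>
#include <stddef.h>

/* Load categories taken from the definitions in the post. */
enum load_level { LOAD_LIGHTER, LOAD_MEDIUM_TO_HEAVY, LOAD_HEAVY };

/* One sampling interval. Field names are hypothetical. */
struct load_sample {
    unsigned long pages_out;   /* pages paged out during the interval */
    int procs_blocked_io;      /* processes blocked in I/O wait */
    int procs_total;           /* all processes at sample time */
};

/* Assumed thresholds; the post does not quantify them. */
#define OFTEN_FRACTION       0.50  /* "quite often": half the intervals */
#define SIGNIFICANT_BLOCKED  0.25  /* "significant": a quarter of all processes */

enum load_level classify_load(const struct load_sample *s, size_t n)
{
    size_t paging = 0, blocked = 0;

    assert(s != NULL || n == 0);
    if (n == 0)
        return LOAD_LIGHTER;

    for (size_t i = 0; i < n; i++) {
        /* Count intervals that saw any paging at all. */
        if (s[i].pages_out > 0)
            paging++;
        /* Count intervals where a large share of processes sat in I/O wait. */
        if (s[i].procs_total > 0 &&
            (double)s[i].procs_blocked_io / s[i].procs_total >= SIGNIFICANT_BLOCKED)
            blocked++;
    }

    if ((double)paging / n >= OFTEN_FRACTION ||
        (double)blocked / n >= OFTEN_FRACTION)
        return LOAD_HEAVY;            /* paging often, or many I/O-blocked */
    if (paging > 0)
        return LOAD_MEDIUM_TO_HEAVY;  /* occasionally paging */
    return LOAD_LIGHTER;
}
```

    Counting intervals rather than raw page totals is a deliberate
    choice here, because the definitions are about how often paging
    happens, not how much.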

:>Religious differences and rose-colored glasses aside, what is the
:>difference in throughput between the two platforms in the specific
:>case of BEST?  Where does SGI need to do some code tuning?
:>
:>I'm interested.
:>
:>nate

    Well, this is a biased answer (as, probably, all my SGI-related comments
    are), but my opinion is that the IRIX kernel needs a complete workover,
    especially the network, paging, and block I/O code.  The problems are
    mainly related to inefficiency.  Odd situations can result in huge
    swings in performance.  Sometimes the rtnetd's take huge globs of cpu,
    sometimes not.  Paging often hits degenerate cases where the machine's
    performance drops by an order of magnitude which, given the rate that
    new connections come in, generally spells a quick death.  It's so bad
    that we STILL have a once-a-minute cron job running on the two L's which 
    allocates 130 MBytes of ram, touches it all, then exits.  The block I/O
    is extremely inefficient for a general multitasking load, mainly owing
    to terrible buffer cache management and 16K I/O operations (64-bit
    kernels).  Basically, we will see the performance drop from one moment
    to the next without any discernible cause.  fork/exec overhead is also
    really, really bad, and shellx needs to do about 40 fork/execs a second
    at peak.  Even now, after midnight, it's doing 20/sec.  The VM system
    is crazy... it reserves 'virtual' swap on a per-process basis even for
    read-only shared mmap()s, and if it does that, god only knows what it's
    doing with other shared objects.  Complete insanity.
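
    The cron job mentioned above (allocate 130 MBytes, touch it all,
    exit) could look roughly like the sketch below.  The actual program
    is not shown in the post, so the function name touch_memory and the
    one-write-per-page approach are assumptions, and the post does not
    explain why this helps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate `bytes` of memory, write one byte into every page so each
   page is actually faulted in, then free it.  Returns the number of
   pages touched, or -1 if the allocation failed. */
long touch_memory(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    long touched = 0;
    volatile unsigned char *buf;

    if (page <= 0)
        page = 4096;               /* fallback guess if sysconf fails */
    assert(page > 0);
    if (bytes == 0)
        return 0;

    buf = malloc(bytes);
    if (buf == NULL)
        return -1;

    /* volatile keeps the compiler from optimizing the writes away */
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        buf[off] = 1;
        touched++;
    }

    free((void *)buf);
    return touched;
}
```

    A main() that calls touch_memory(130UL * 1024 * 1024) and exits,
    run from a crontab entry with "* * * * *" in the time fields, would
    match the once-a-minute schedule described.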

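    To put a number on fork/exec overhead, one rough approach is to time
    a loop of fork, exec and wait.  This is a minimal sketch, not how
    the figures above were measured: fork_exec_rate is a hypothetical
    helper, and running the cycles one after another on an idle box
    says little about behaviour under a real multi-user load.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork and exec `path` n times, waiting for each child to finish.
   Returns completed fork/exec/wait cycles per second, or -1.0 if any
   step fails (including the exec itself). */
double fork_exec_rate(const char *path, int n)
{
    struct timespec t0, t1;
    double secs;

    assert(path != NULL);
    if (n <= 0 || clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;

    for (int i = 0; i < n; i++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1.0;
        if (pid == 0) {
            execl(path, path, (char *)NULL);
            _exit(127);            /* only reached if exec failed */
        }
        if (waitpid(pid, &status, 0) < 0)
            return -1.0;
        if (!WIFEXITED(status) || WEXITSTATUS(status) == 127)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;
    secs = (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? n / secs : -1.0;
}
```

    Calling fork_exec_rate("/bin/true", 1000) on two platforms gives a
    crude upper bound to compare against the 20 to 40 per second that
    shellx needs.
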
					-Matt