Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mira.net.au!news.netspace.net.au!news.mel.connect.com.au!munnari.OZ.AU!news.ecn.uoknor.edu!solace!news.stealth.net!demos!news1.best.com!nntp1.best.com!flash.noc.best.net!not-for-mail
From: dillon@best.com (Matthew Dillon)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD as news-server??
Date: 17 Oct 1996 14:17:26 -0700
Organization: Best Internet Communications, Inc. (info@best.com)
Lines: 169
Distribution: world
Message-ID: <5467p6$bl4@flash.noc.best.net>
References: <537ddl$3cc@amd40.wecs.org> <544bat$41o@twwells.com> <544nas$b5h@flash.noc.best.net> <5462it$r37@twwells.com>
NNTP-Posting-Host: flash.noc.best.net

:In article <5462it$r37@twwells.com>, T. William Wells <bill@twwells.com> wrote:
:>In article <544nas$b5h@flash.noc.best.net>,
:>Matthew Dillon <dillon@best.com> wrote:
:>: :>One other thing: you simply cannot run streaming and nonstreaming
:>: :>feeds into the same server. Or, you can, but the nonstreaming
:>: :>feeds will get so far behind as to be pointless. Even with fast
:>: :>disks, this will be true....
:>:
:>:    Well, the article writing overhead *could* be decoupled relatively
:>:    easily from INND. It would be a 'one-hour hack' in programming
:>:    terms. You just pipe the data to another process and go on to the
:>:    next article.
:>
:>Even if you do this, I think the nonstreaming feeds will
:>still get crunched. Innd processes the stuff from each stream all
:>at once, so this introduces latency in all the other feeds. This
:>is a *real* problem for nonstreaming feeds because any latency
:>above the network latency directly slows the nonstreaming feed.
:>(This is, in fact, the problem that streaming was invented to
:>solve....)

    I can see the logic to that, but I think reasonable history file
    caching pretty much fixes the problem.  Even with multiple feeds
    coming in, you still only have a relatively fixed article creation
    rate, with the remaining overhead going to history file lookups and
    sending back 'I've already got it' codes.

    A streaming feed tends to be bursty, with a *lot* of idle time
    between bursts... for example:

Oct 17 12:47:01 5H:news1 newslink[13484]: nntp1.best.com:nntp1.S43565  final  113 secs 739 acc 44 dup 0 rej, 783 tot (415/min latency us/them: 43/28 mS)
Oct 17 12:50:31 5H:news1 newslink[13642]: nntp-best.primenet.com:primenet.S40440  final  15 secs 8 acc 807 dup 0 rej, 815 tot (3260/min latency us/them: 0/17 mS)

    The first line is our newsfeed -> newsreader feed.  With most
    articles accepted, it takes approximately 113 seconds to transmit
    739 articles.  Batches are 5 minutes apart, so there is another 187
    seconds of 100% IDLE.

    The second line shows what history file caching does for you... in
    this particular case, primenet got behind in their feed to us and
    then caught up... but we already had the articles.  The article
    rate reflects the history file caching - over 54 articles/sec, most
    NOT accepted.  In this case, the entire 5 minute batch took 15
    seconds of INN's time.

    So, from my point of view, innd tends to have plenty of free cycles
    for non-streaming feeds even in the face of streaming feeds.  While
    it might lose some in latency when several streaming feeds are in
    operation, it gains it right back when those same feeds are in
    their idle period.
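    (As an aside, the 'pipe the data to another process and go on to
    the next article' decoupling mentioned at the top would look
    roughly like the sketch below.  This is only an illustration of the
    idea, not INN source; the helper process and all of its names are
    made up.)

/*
 * Sketch only: hand each article to a child "writer" process over a
 * pipe so the main loop never waits on the disk write itself.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static int
spawn_writer(void)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) < 0 || (pid = fork()) < 0) {
        perror("pipe/fork");
        exit(1);
    }
    if (pid == 0) {
        /* child: read article data from the pipe, do the slow writes */
        char buf[8192];
        ssize_t n;

        close(fds[1]);
        while ((n = read(fds[0], buf, sizeof(buf))) > 0)
            (void)write(STDOUT_FILENO, buf, n);  /* spool to disk here */
        _exit(0);
    }
    close(fds[0]);
    return fds[1];              /* parent keeps the write side */
}

int
main(void)
{
    int writer = spawn_writer();
    const char *article = "Path: example!not-for-mail\n\nfake body\n";
    int i;

    /* main loop: queue the article, immediately go on to the next one */
    for (i = 0; i < 3; i++)
        (void)write(writer, article, strlen(article));
    close(writer);
    wait(NULL);
    return 0;
}

    (A pipe only buffers a few K, of course, so the writer still has to
    keep up on average; the win is just that the main loop isn't
    stalled on each individual article write.)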
:>: :>Alas, this is only true if your feeds are all so close to "real
:>: :>time" that things remain in the cache. Otherwise, caching doesn't
:>: :>do anything for you. (In my system, I solve this problem with a
:>: :>message id daemon, which eliminates most redundant history
:>: :>lookups.)
:>:
:>:    You can cache a *lot* of history file. Sure, the cache will not be
:>:    as optimal, but it will still be there, and it will be a disk read
:>:    rather than a file create.
:>
:>Well maybe. There's an awful lot of disk activity going on on a
:>news server and most of it isn't in the history file. This one, I
:>suspect, isn't going to be answered without tools that can
:>examine the buffering directly.

    The history file has its own partition.  And, as I said, you need
    memory for filesystem caches.  But if you set it up right, there
    isn't a problem.

:>: :>Yes it does. Because if innd can't buffer it, you get entries lost
:>: :>into the batch file. Unless you go to pains to ensure that those
:>: :>entries get processed, you end up with nnrpds wasting time
:>: :>recreating those entries.
:>:
:>:    Huh? I have no idea what you are talking about here. nnrpd does not
:>:    go around creating .overview entries. It's asynchronous, and it has
:>:    no effect whatsoever on innd unless it gets behind. I have NEVER seen
:>:    overchan get behind... ever... the system could be dying and overchan
:>:    still wouldn't get behind.
:>
:>Ok, here's what happens. If overchan gets behind, innd starts
:>creating a batch file for it. That goes in your out.going
:>directory. This *does* happen and did happen for me until I moved
:>the overview to a separate disk. This batch file doesn't ever get
:>processed. Thus some entries are lost from the overview. This is
:>not a catastrophe: nnrpd considers the spool to be the master; if
:>there is a file in the directory which doesn't have an overview
:>entry, it creates one on the fly. This entry is *not* written to
:>the overview file, it's purely internal to the nnrpd. The "wasting
:>time" I was referring to is the time to open the articles with
:>missing entries and read them for overview data.

    overchan should never get behind.  If it does, one has other
    problems to deal with.  I've run two busy news machines for two
    years now.  Overchan has not ever, not once, gotten behind.  The
    rest of the system could be dying and overchan still wouldn't get
    behind.

    Also, keep in mind my original comment... I said that if you had
    three or more spool disks, you would not have to separate the
    overview files.  In fact, I would guess that reserving a single
    spindle just for overview would cause more problems than it would
    solve.  If it's in the spool, overview disk activity gets spread
    around the N spindles like everything else in the spool.

    It's much more scalable to stripe a single spool across three or
    four (or more!) physical disks and just put the overview in the
    spool.  I suppose if you were paranoid, you could create a second
    directory hierarchy on the same (striped) partition as the spool.
    But as I said... 99% of the time (in my view), the .overview file
    will already be in the vnode or namei cache, or the blocks relating
    to the directory will already be cached from the article file's
    creation.

>: :>Alas not, because overchan is asynchronous. By the time it's
>: :>ready to fiddle with the overview file, that directory stuff is
>: :>likely to be long gone.
>:
>:    This is not true at all. A 4K buffer is equivalent to less than
>:    a hundred articles. It's still cached. We aren't talking about hour
>:    delays here, or even 5 minute delays. We are talking about 30 seconds
>:    of delay here.
>
>Well, I can't show you directory statistics anymore (because of my
>directory structure changes) but when your popular directories
>are hundreds of k's and you have a lot of nnrpds floating around
>reading from the disk, the cache turnover is pretty damned fast.
>This is another one of those where it would be nice to instrument
>the cache....
>
:>: :>Irrelevant because, even if FreeBSD doesn't copy or write the
:>: :>data, it _does_ allocate swap space. Get a bunch of these all at
:>: :>once and your server will refuse to fork. There are certain news
:>: :>clients which have a bad habit of making large numbers of nntp
:>: :>connections all at once. This makes random things fail on the
:>: :>server.
:>:
:>:    No, FreeBSD does not allocate swap space. Lookee here, program #2:
:>
:>OK, then *you* tell *me* what EAGAIN from fork means. :-) When I
:>checked the kernel code, it looked like nothing short of a swap
:>shortage would cause it. (Well, running out of slots for child
:>processes could, too, but I don't think that's the case here.
:>Even at peak times, I've still got about 50% leeway in my process
:>slots before I hit the limit.)

    I think you jumped to conclusions as to what exactly was being
    duplicated.  The only memory the kernel really needs to allocate to
    do a fork() is a few pages here and there... the page directory,
    process structure, descriptor array, plus a few pages that get
    written to immediately, such as one page in the stack, maybe a page
    or two of data... it really isn't much.  We are talking perhaps
    16 KBytes, maybe a little more if the page table is huge (though
    mayhaps FreeBSD makes the pagetable pages copy-on-write as well :-)).

    The only other thing that could cause fork to fail is the user
    process limit, which is normally 20 or 40 on FreeBSD machines by
    default.  For my test, I unlimited that resource.

    On FreeBSD machines, actual swap space is only allocated when a
    pageout must occur, and even then we are only talking about one
    page no matter how many tasks are sharing the data.
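    (The 'program #2' mentioned above isn't quoted here, but the gist
    of that kind of test is simple enough to sketch: dirty a large
    buffer, then fork children that only read it.  The sketch below is
    not the original program; the sizes and names are arbitrary.)

/*
 * Sketch (not the original "program #2"): touch a large buffer, then
 * fork children that never write to it.  With copy-on-write fork()
 * the children just share the parent's pages, so no swap needs to be
 * reserved up front and the forks succeed.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define BUFSIZE (64 * 1024 * 1024)      /* 64 MB - pick something big */
#define NCHILD  20

int
main(void)
{
    char *buf = malloc(BUFSIZE);
    int i;

    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    memset(buf, 1, BUFSIZE);            /* make every page resident */

    for (i = 0; i < NCHILD; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");             /* EAGAIN would show up here */
            return 1;
        }
        if (pid == 0) {
            /* child: read the shared buffer, never write to it */
            volatile char c = buf[BUFSIZE - 1];

            (void)c;
            sleep(10);
            _exit(0);
        }
    }
    printf("forked %d children sharing a %d MB buffer\n",
           NCHILD, BUFSIZE / (1024 * 1024));
    while (wait(NULL) > 0)
        ;
    return 0;
}

    (On a system that reserved swap at fork time, the loop would start
    failing once the children's combined image exceeded swap; with
    copy-on-write the pages stay shared and, as noted above, swap only
    gets consumed when a pageout actually occurs.)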
						-Matt

--
    Matthew Dillon    Engineering, BEST Internet Communications, Inc.
                      <dillon@best.net>
    [always include a portion of the original email in any response!]