*BSD News Article 80897


Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.wildstar.net!serv.hinet.net!news.cc.nctu.edu.tw!spring.edu.tw!howland.erols.net!www.nntp.primenet.com!nntp.primenet.com!news1.best.com!nntp1.best.com!flash.noc.best.net!not-for-mail
From: dillon@best.com (Matthew Dillon)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD as news-server??
Date: 17 Oct 1996 15:53:24 -0700
Organization: Best Internet Communications, Inc. (info@best.com)
Lines: 170
Distribution: world
Message-ID: <546dd4$bn7@flash.noc.best.net>
References: <537ddl$3cc@amd40.wecs.org> <544nas$b5h@flash.noc.best.net> <5462it$r37@twwells.com> <5467p6$bl4@flash.noc.best.net>
NNTP-Posting-Host: flash.noc.best.net

:In article <5467p6$bl4@flash.noc.best.net>,
:Matthew Dillon <dillon@best.com> wrote:
:>:In article <5462it$r37@twwells.com>, T. William Wells <bill@twwells.com> wrote:
:>:>In article <544nas$b5h@flash.noc.best.net>,
:>:>Matthew Dillon <dillon@best.com> wrote:
:>:>: :>One other thing: you simply cannot run streaming and nonstreaming
:...
:>
:>Oct 17 12:47:01 5H:news1 newslink[13484]: nntp1.best.com:nntp1.S43565^Ifinal^I 113 secs  739 acc   44 dup    0 rej,  783 tot (415/min latency us/them: 43/28 mS)
:>Oct 17 12:50:31 5H:news1 newslink[13642]: nntp-best.primenet.com:primenet.S40440^Ifinal^I  15 secs    8 acc  807 dup    0 rej,  815 tot (3260/min latency us/them: 0/17 mS)
:>
:>    The first line is our newsfeed -> newsreader feed.  With most articles 
:>    accepted, it takes approximately 113 seconds to transmit 739 articles.  
:>    Batches are 5 minutes apart, so there is another 187 seconds of 100% IDLE.
:>
:>    the second line shows what history file caching does for you... in 
:>    this particular case, primenet got behind in their feed to us and then 
:>    caught up... but we already had the articles.  The article rate 
:>    reflects the history file caching - over 54 articles/sec, most NOT
:>    accepted.  In this case, the entire 5 minute batch took 15 seconds

    Oops, I posted the wrong log lines!!!

    The first line is correct, but it is the outgoing feed from the point
    of view of the newsfeeds machine ... so the numbers reflect the
    incoming feed from the point of view of the news*reader* machine.

    The second line, however, is actually our bilateral outgoing feed to
    primenet.  (Primenet is probably the best feed on the planet... they
    run realtime channels on a honking machine while we run 5 minute
    batches, so they have generally had the articles we send them for a
    few minutes already.)
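
    As a sanity check on the figures in the two newslink lines quoted
    above, here is a rough Python sketch that pulls the counters out of a
    summary line and recomputes the rate.  The field layout is inferred
    only from those two sample lines, and the names SUMMARY and summarize
    are made up for illustration; none of this is part of newslink.

```python
import re

# Field layout inferred from the two sample lines only (an assumption,
# not taken from any newslink documentation).
SUMMARY = re.compile(
    r"(?P<secs>\d+) secs\s+(?P<acc>\d+) acc\s+(?P<dup>\d+) dup\s+"
    r"(?P<rej>\d+) rej,\s+(?P<tot>\d+) tot"
)

LINE_1 = ("Oct 17 12:47:01 5H:news1 newslink[13484]: "
          "nntp1.best.com:nntp1.S43565^Ifinal^I 113 secs  739 acc   44 dup"
          "    0 rej,  783 tot (415/min latency us/them: 43/28 mS)")
LINE_2 = ("Oct 17 12:50:31 5H:news1 newslink[13642]: "
          "nntp-best.primenet.com:primenet.S40440^Ifinal^I  15 secs    8 acc"
          "  807 dup    0 rej,  815 tot (3260/min latency us/them: 0/17 mS)")


def summarize(line):
    """Return the counters from one summary line plus derived rates."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("no newslink summary found")
    f = {k: int(v) for k, v in m.groupdict().items()}
    f["per_min"] = f["tot"] * 60 // f["secs"]   # whole articles per minute
    f["per_sec"] = f["tot"] / f["secs"]         # articles per second
    f["consistent"] = f["acc"] + f["dup"] + f["rej"] == f["tot"]
    return f
```

    Run on the two lines, per_min comes out as 415 and 3260, matching the
    figures the log itself prints, and the second line works out to a
    little over 54 articles/sec.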

    I'm not sure I can find any good incoming feed examples that show
    off the bursts in the log files, because most of the incoming feeds
    are from channels.  Ah!   I know, the article log....

Oct 17  14:50:23.515 + newshub.sdsu.edu
Oct 17  14:50:23.645 + newshub.sdsu.edu
Oct 17  14:50:23.694 + newshub.sdsu.edu
Oct 17  14:50:23.784 + newshub.sdsu.edu
Oct 17  14:50:23.870 + news.sgi.com
Oct 17  14:50:23.969 + news.sgi.com
Oct 17  14:50:24.292 + news.sgi.com
Oct 17  14:50:24.439 + news.sgi.com
Oct 17  14:50:24.874 + news.sgi.com
Oct 17  14:50:25.293 + news.sgi.com
Oct 17  14:50:25.355 + news.sgi.com
Oct 17  14:50:25.783 + news.sgi.com
Oct 17  14:50:26.410 + noos.hooked.net
Oct 17  14:50:26.496 + noos.hooked.net
Oct 17  14:50:26.697 + noos.hooked.net
Oct 17  14:50:26.765 + noos.hooked.net
Oct 17  14:50:26.831 + noos.hooked.net
Oct 17  14:50:27.313 + noos.hooked.net
Oct 17  14:50:27.617 + noos.hooked.net
Oct 17  14:50:27.636 + noos.hooked.net
Oct 17  14:50:27.777 + noos.hooked.net
Oct 17  14:50:27.955 + noos.hooked.net
Oct 17  14:50:42.560 - newshub.sdsu.edu
Oct 17  14:50:42.698 + newshub.sdsu.edu
Oct 17  14:50:42.801 + newshub.sdsu.edu
Oct 17  14:50:42.926 + newshub.sdsu.edu
Oct 17  14:50:42.928 - newshub.sdsu.edu
Oct 17  14:50:43.362 + newshub.sdsu.edu
Oct 17  14:50:43.486 + newshub.sdsu.edu
Oct 17  14:50:43.655 + newshub.sdsu.edu
Oct 17  14:50:43.832 + newshub.sdsu.edu
Oct 17  14:50:44.009 + newshub.sdsu.edu
Oct 17  14:50:44.404 + newshub.sdsu.edu
Oct 17  14:50:44.623 + news.sgi.com
Oct 17  14:50:44.713 + news.sgi.com
Oct 17  14:50:45.372 + noos.hooked.net
Oct 17  14:50:45.491 + noos.hooked.net
Oct 17  14:50:46.334 + newshub.sdsu.edu
Oct 17  14:50:46.482 + newshub.sdsu.edu
Oct 17  14:50:46.633 + newshub.sdsu.edu
Oct 17  14:50:46.692 + newshub.sdsu.edu
Oct 17  14:50:46.785 + newshub.sdsu.edu
Oct 17  14:50:46.788 - newshub.sdsu.edu
Oct 17  14:50:46.992 + newshub.sdsu.edu
Oct 17  14:50:47.310 + newshub.sdsu.edu
Oct 17  14:50:47.612 + newshub.sdsu.edu
Oct 17  14:50:47.752 + newshub.sdsu.edu
Oct 17  14:50:48.016 + newshub.sdsu.edu
Oct 17  14:50:48.020 - newshub.sdsu.edu
Oct 17  14:50:48.245 + noos.hooked.net
Oct 17  14:50:48.404 + noos.hooked.net
Oct 17  14:50:48.531 + noos.hooked.net
Oct 17  14:50:48.578 + noos.hooked.net

    ok, to understand the above, you have to understand how INND calculates
    that timestamp on the left.  Normally, INND does this once every 
    select() loop.  I've hacked it to calculate the exact time as of when
    it logs the article.
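
    The difference between the two timestamping schemes can be shown with
    a toy model.  This is not INN code: log_batches and its arguments are
    hypothetical, and each "batch" simply stands in for the articles
    handled during one pass of the select() loop.

```python
import itertools


def log_batches(batches, clock, per_article=False):
    """Return (timestamp, article) pairs for a sequence of batches.

    clock is any zero-argument callable returning the current time.
    With per_article=False the time is read once per pass, so every
    article in a batch shares one stamp.  With per_article=True the
    time is read again for each article as it is logged.
    """
    entries = []
    for batch in batches:
        loop_time = clock()                  # one reading per pass
        for article in batch:
            stamp = clock() if per_article else loop_time
            entries.append((stamp, article))
    return entries


def fake_clock():
    """A deterministic stand-in clock: 0, 1, 2, ... on successive calls."""
    return itertools.count().__next__
```

    With the per-pass scheme a burst of articles collapses onto a single
    timestamp, which hides exactly the per-article spacing the log above
    is meant to show.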

    I want you to note a couple of things:

    (a) Note how the incoming feeds tend to 'burst'... you see a whole bunch
	from a single source, then a whole bunch from another source, and
	so on.  This is streaming at work, whether the incoming feeds are
	from file batches or buffered channels.

    (b) Note the dead time.  There is one stretch, from 14:50:27 to 14:50:42
	in this particular sample, where innd is 100% idle for 15 seconds!
	(And, no, innd was not swapped out :-)).  This occurs a lot, but I am
	not going to post thousands of lines of log files to prove it :-).

    (c) Note the article rates.  About 0.2 seconds per article written
	to disk == 5 articles/sec (though I should note that at the time
	I took these readings, a fastrm was in progress).

    (d) Note the earlier assertion that streaming mode can steal INND away
	for seconds at a time... seems to be true given the above numbers.
	For example, newshub.sdsu.edu near the end takes innd away for 
	2 seconds.
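
    Points (a), (b) and (d) can be checked mechanically.  The following is
    a minimal Python sketch, assuming every log line has the shape shown
    above (month, day, time, a '+' or '-' flag, host).  The function names
    are invented for illustration, and the flag is carried through without
    being interpreted.  SAMPLE is five lines copied from the log above.

```python
from itertools import groupby

SAMPLE = """\
Oct 17  14:50:27.777 + noos.hooked.net
Oct 17  14:50:27.955 + noos.hooked.net
Oct 17  14:50:42.560 - newshub.sdsu.edu
Oct 17  14:50:42.698 + newshub.sdsu.edu
Oct 17  14:50:42.801 + newshub.sdsu.edu
"""


def parse(line):
    """Split one log line into (seconds_of_day, flag, host)."""
    _mon, _day, clock, flag, host = line.split()
    h, m, s = clock.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s), flag, host


def bursts(entries):
    """Group consecutive entries from one host: (host, count, duration)."""
    out = []
    for host, run in groupby(entries, key=lambda e: e[2]):
        run = list(run)
        out.append((host, len(run), run[-1][0] - run[0][0]))
    return out


def longest_gap(entries):
    """Largest interval between two consecutive entries, in seconds."""
    return max(b[0] - a[0] for a, b in zip(entries, entries[1:]))


ENTRIES = [parse(line) for line in SAMPLE.splitlines()]
```

    On SAMPLE, bursts(ENTRIES) reports a run of 2 from noos.hooked.net
    followed by a run of 3 from newshub.sdsu.edu, and longest_gap(ENTRIES)
    is about 14.6 seconds, the idle stretch mentioned in (b).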

    I would assert the following as a conclusion:

    * Until you get up to a dozen or more *full* feeds, the only thing
      that counts is the article-creation rate.  That is, the
      'I already have it' response tends to be answered from the cached
      history file and can therefore be ignored.

    * That streaming mode is more efficient (caveat: in the face of 
      non-streaming mode, see below)... for several reasons.  It saves
      TCP packets, it allows disk and network latencies to overlap, and
      it allows statistically significant locality of reference to propagate
      in the face of a large number of incoming feeds.

    * That while non-streaming mode feeds will suffer, I suggest that the
      dead time is sufficient to handle most lower-latency non-streaming
      mode feeds.  Higher-latency non-streaming mode feeds may have 
      problems, though, because they will not be able to get enough
      transactions in the periods of dead time given to them.
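
    The latency argument in the last two points can be put into a
    back-of-envelope model.  This is only a sketch under simplifying
    assumptions: a fixed round-trip time and a fixed per-article service
    time, both in whole milliseconds, with TCP windowing and disk
    contention ignored.  The function names and the example figures are
    hypothetical, not measurements.

```python
def lockstep_ms(n, rtt_ms, service_ms):
    """Non-streaming: each offer waits for its reply before the next."""
    return n * (rtt_ms + service_ms)


def streaming_ms(n, rtt_ms, service_ms):
    """Streaming: offers are pipelined, so the round trip is paid about
    once and the rest of the time is per-article service."""
    return rtt_ms + n * service_ms


def transactions_in_gap(gap_ms, rtt_ms, service_ms):
    """How many lock-step transactions fit into one idle period."""
    return gap_ms // (rtt_ms + service_ms)
```

    For example, with 800 offers, a 100 ms round trip and 20 ms of service
    per offer, the lock-step feed needs 96000 ms against 16100 ms for the
    streamed one.  A 15 second idle period holds 300 lock-step
    transactions at a 30 ms round trip but only 50 at 280 ms, which is why
    the higher-latency non-streaming feeds are the ones likely to suffer.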

    My frank opinion is that everyone should run streaming-mode feeds.  It
    removes a lot of question marks for full feeds, and just does not matter
    for non-full feeds.  As news administrators, you have control over what
    full feeds you want to bring in.

    Most real full-feed hubs use streaming nowadays anyway... it is not as if
    you will have much of a choice.   In the last 12 months, all but one
    of my incoming full feeds went from non-streaming to streaming.

    Non full-feed sites?  Well, for outgoing feeds we don't care.  For
    incoming feeds the article rate is low enough that it doesn't matter
    either.

    News has gotten a WHOLE lot more reliable for us in the last year.  Our 
    outgoing feeds not only do not get behind any more, but when something
    goes wrong and they find a need to catch up, they catch up very 
    quickly... usually within two hours.  I couldn't say that a year ago. 
    A year ago, if something went wrong, it took 24 to 48 hours to catch up
    again.  Many things have changed, but I think the two most notable 
    reasons for the improvement have been the switch to streaming mode 
    connections and improved network connectivity.

    The funny thing is, the improvement has not been due so much to the
    article creation streaming... in fact, I would say that the majority
    of the improvement has been due to the ability to stream ihave 
    requests that will be mostly 'I already have it' responses.  This is
    especially true when one is playing catchup.

						-Matt

-- 
    Matthew Dillon   Engineering, BEST Internet Communications, Inc.
		     <dillon@best.net>
    [always include a portion of the original email in any response!]