Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.wildstar.net!serv.hinet.net!news.cc.nctu.edu.tw!spring.edu.tw!howland.erols.net!www.nntp.primenet.com!nntp.primenet.com!news1.best.com!nntp1.best.com!flash.noc.best.net!not-for-mail
From: dillon@best.com (Matthew Dillon)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD as news-server??
Date: 17 Oct 1996 15:53:24 -0700
Organization: Best Internet Communications, Inc. (info@best.com)
Lines: 170
Distribution: world
Message-ID: <546dd4$bn7@flash.noc.best.net>
References: <537ddl$3cc@amd40.wecs.org> <544nas$b5h@flash.noc.best.net> <5462it$r37@twwells.com> <5467p6$bl4@flash.noc.best.net>
NNTP-Posting-Host: flash.noc.best.net
:In article <5467p6$bl4@flash.noc.best.net>,
:Matthew Dillon <dillon@best.com> wrote:
:>:In article <5462it$r37@twwells.com>, T. William Wells <bill@twwells.com> wrote:
:>:>In article <544nas$b5h@flash.noc.best.net>,
:>:>Matthew Dillon <dillon@best.com> wrote:
>:>: :>One other thing: you simply cannot run streaming and nonstreaming
:...
:>
:>Oct 17 12:47:01 5H:news1 newslink[13484]: nntp1.best.com:nntp1.S43565^Ifinal^I 113 secs 739 acc 44 dup 0 rej, 783 tot (415/min latency us/them: 43/28 mS)
:>Oct 17 12:50:31 5H:news1 newslink[13642]: nntp-best.primenet.com:primenet.S40440^Ifinal^I 15 secs 8 acc 807 dup 0 rej, 815 tot (3260/min latency us/them: 0/17 mS)
:>
:> The first line is our newsfeed -> newsreader feed. With most articles
:> accepted, it takes approximately 113 seconds to transmit 739 articles.
:> Batches are 5 minutes apart, so there is another 187 seconds of 100% IDLE.
:>
:> the second line shows what history file caching does for you... in
:> this particular case, primenet got behind in their feed to us and then
:> caught up... but we already had the articles. The article rate
:> reflects the history file caching - over 54 articles/sec, most NOT
:> accepted. In this case, the entire 5 minute batch took 15 seconds
Oops, I posted the wrong log lines!!!
The first line is correct, but it is the outgoing feed from the point
of view of the newsfeeds machine ... so the numbers reflect the
incoming feed from the point of view of the news*reader* machine.
The second line, however, is actually our bilateral outgoing feed to
primenet... (primenet is probably the best feed on the planet... they
run realtime channels on a honking machine while we run 5 minute
batches, so they generally already have the articles a few minutes
before we send them).
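As a quick check on the two quoted newslink lines: the per-minute figure appears to be just the total article count divided by the elapsed seconds, scaled to a minute and truncated. The Python sketch below assumes that reading (the newslink source is not shown in this thread), and `per_minute` is a name made up for illustration.

```python
def per_minute(total_articles, elapsed_secs):
    """Articles per minute, truncated to an integer (assumed to match the log)."""
    return int(total_articles * 60 / elapsed_secs)

# Totals and durations copied from the two newslink lines quoted above.
nntp1_rate = per_minute(783, 113)      # logged as 415/min
primenet_rate = per_minute(815, 15)    # logged as 3260/min
primenet_per_sec = 815 / 15            # the "over 54 articles/sec" figure
```

Both results agree with the logged rates, which is at least consistent with the assumption.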
I'm not sure I can find any good incoming feed examples that show
off the bursts in the log files, because most of the incoming feeds
are from channels. Ah! I know, the article log....
Oct 17 14:50:23.515 + newshub.sdsu.edu
Oct 17 14:50:23.645 + newshub.sdsu.edu
Oct 17 14:50:23.694 + newshub.sdsu.edu
Oct 17 14:50:23.784 + newshub.sdsu.edu
Oct 17 14:50:23.870 + news.sgi.com
Oct 17 14:50:23.969 + news.sgi.com
Oct 17 14:50:24.292 + news.sgi.com
Oct 17 14:50:24.439 + news.sgi.com
Oct 17 14:50:24.874 + news.sgi.com
Oct 17 14:50:25.293 + news.sgi.com
Oct 17 14:50:25.355 + news.sgi.com
Oct 17 14:50:25.783 + news.sgi.com
Oct 17 14:50:26.410 + noos.hooked.net
Oct 17 14:50:26.496 + noos.hooked.net
Oct 17 14:50:26.697 + noos.hooked.net
Oct 17 14:50:26.765 + noos.hooked.net
Oct 17 14:50:26.831 + noos.hooked.net
Oct 17 14:50:27.313 + noos.hooked.net
Oct 17 14:50:27.617 + noos.hooked.net
Oct 17 14:50:27.636 + noos.hooked.net
Oct 17 14:50:27.777 + noos.hooked.net
Oct 17 14:50:27.955 + noos.hooked.net
Oct 17 14:50:42.560 - newshub.sdsu.edu
Oct 17 14:50:42.698 + newshub.sdsu.edu
Oct 17 14:50:42.801 + newshub.sdsu.edu
Oct 17 14:50:42.926 + newshub.sdsu.edu
Oct 17 14:50:42.928 - newshub.sdsu.edu
Oct 17 14:50:43.362 + newshub.sdsu.edu
Oct 17 14:50:43.486 + newshub.sdsu.edu
Oct 17 14:50:43.655 + newshub.sdsu.edu
Oct 17 14:50:43.832 + newshub.sdsu.edu
Oct 17 14:50:44.009 + newshub.sdsu.edu
Oct 17 14:50:44.404 + newshub.sdsu.edu
Oct 17 14:50:44.623 + news.sgi.com
Oct 17 14:50:44.713 + news.sgi.com
Oct 17 14:50:45.372 + noos.hooked.net
Oct 17 14:50:45.491 + noos.hooked.net
Oct 17 14:50:46.334 + newshub.sdsu.edu
Oct 17 14:50:46.482 + newshub.sdsu.edu
Oct 17 14:50:46.633 + newshub.sdsu.edu
Oct 17 14:50:46.692 + newshub.sdsu.edu
Oct 17 14:50:46.785 + newshub.sdsu.edu
Oct 17 14:50:46.788 - newshub.sdsu.edu
Oct 17 14:50:46.992 + newshub.sdsu.edu
Oct 17 14:50:47.310 + newshub.sdsu.edu
Oct 17 14:50:47.612 + newshub.sdsu.edu
Oct 17 14:50:47.752 + newshub.sdsu.edu
Oct 17 14:50:48.016 + newshub.sdsu.edu
Oct 17 14:50:48.020 - newshub.sdsu.edu
Oct 17 14:50:48.245 + noos.hooked.net
Oct 17 14:50:48.404 + noos.hooked.net
Oct 17 14:50:48.531 + noos.hooked.net
Oct 17 14:50:48.578 + noos.hooked.net
OK, to understand the above, you have to understand how INND calculates
that timestamp on the left. Normally, INND does this once every
select() loop. I've hacked it to take the exact time at the moment
it logs each article.
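To make the difference concrete, here is a minimal Python sketch of the two timestamping strategies. It is not INND code: `process_batch` and `fake_clock` are hypothetical names, and the clock is a stand-in that advances a fixed step on every reading so the effect is easy to see.

```python
import itertools

def process_batch(articles, clock, per_article_stamp):
    """Log one select()-style loop pass over `articles`.

    clock: callable returning the current time in seconds.
    per_article_stamp: False reads the clock once per pass (the stock
    behaviour described above); True reads it again for each article.
    """
    loop_time = clock()          # one reading at the top of the pass
    log = []
    for art in articles:
        stamp = clock() if per_article_stamp else loop_time
        log.append((stamp, art))
    return log

def fake_clock(step=0.1):
    """Hypothetical clock that advances `step` seconds on every reading."""
    ticks = itertools.count()
    return lambda: round(next(ticks) * step, 3)

coarse = process_batch(["a", "b", "c"], fake_clock(), per_article_stamp=False)
exact = process_batch(["a", "b", "c"], fake_clock(), per_article_stamp=True)
```

With the once-per-pass reading, every article handled in the same pass shares one timestamp, so gaps between articles inside a burst would not show up in the log. The per-article reading is what makes sub-second spacing like that in the log above visible.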
I want you to note a couple of things:
(a) Note how the incoming feeds tend to 'burst'... you see a whole bunch
from a single source, then a whole bunch from another source, and
so on. This is streaming... whether the incoming feeds are from
file batches or buffered channels.
(b) Note the dead time. There is one point 14:50:27 to 14:50:42 in this
particular sample where innd is 100% idle for 15 seconds! (and, no,
innd was not swapped out :-)). (this occurs a lot, but I am not
going to post thousands of lines of log files to prove it :-)).
(c) Note the article rates. About 0.2 seconds per article written
to disk == 5 articles/sec (though I should note that at the time
I took these readings, a fastrm was in progress).
(d) Note the earlier assertion that streaming mode can steal INND away
for seconds at a time... seems to be true given the above numbers.
For example, newshub.sdsu.edu near the end takes innd away for
2 seconds.
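Points (a) and (b) can be pulled out of the article log mechanically. The Python sketch below works on a short contiguous excerpt of the log above and assumes each line is exactly "month day time flag host", as shown there. The helper names `parse`, `idle_gaps`, and `bursts` are made up for illustration, and the 5-second threshold is an arbitrary choice.

```python
from itertools import groupby

# Contiguous excerpt copied from the article log above.
SAMPLE = """\
Oct 17 14:50:27.313 + noos.hooked.net
Oct 17 14:50:27.617 + noos.hooked.net
Oct 17 14:50:27.636 + noos.hooked.net
Oct 17 14:50:27.777 + noos.hooked.net
Oct 17 14:50:27.955 + noos.hooked.net
Oct 17 14:50:42.560 - newshub.sdsu.edu
Oct 17 14:50:42.698 + newshub.sdsu.edu
Oct 17 14:50:42.801 + newshub.sdsu.edu
Oct 17 14:50:42.926 + newshub.sdsu.edu
"""

def parse(line):
    """Return (seconds since midnight, flag, host) for one log line."""
    _month, _day, clock, flag, host = line.split()
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds), flag, host

def idle_gaps(entries, threshold=5.0):
    """(start, end) pairs where consecutive entries are >= threshold seconds apart."""
    return [(a[0], b[0]) for a, b in zip(entries, entries[1:])
            if b[0] - a[0] >= threshold]

def bursts(entries):
    """Runs of consecutive entries from the same host, as (host, count)."""
    return [(host, len(list(run)))
            for host, run in groupby(entries, key=lambda e: e[2])]

entries = [parse(line) for line in SAMPLE.splitlines()]
gaps = idle_gaps(entries)
runs = bursts(entries)
```

On this excerpt the single gap found is the roughly 15-second dead time from point (b), and the runs show one burst from noos.hooked.net followed by one from newshub.sdsu.edu.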
I would assert the following as a conclusion:
* Until you get up to a dozen or more *full* feeds, the only thing
that counts are article-creation rates. That is, the
'I already have it' response tends to be cached and therefore
ignorable.
* That streaming mode is more efficient (caveat: in the face of
non-streaming mode, see below)... for several reasons. It saves
TCP packets, it allows disk and network latencies to overlap, and
it allows statistically significant locality of reference to propagate
in the face of a large number of incoming feeds.
* That while non-streaming mode feeds will suffer, I suggest that the
dead time is sufficient to handle most lower-latency non-streaming
mode feeds. Higher-latency non-streaming mode feeds may have
problems, though, because they will not be able to get enough
transactions in the periods of dead time given to them.
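A rough way to see why latency matters so much for non-streaming feeds is an idealized model, sketched below in Python. It is an assumption made for illustration, not a measurement: a lock-step feed pays a full round trip plus server work per article, while a streaming feed pays the round trip about once and then only the server work. The function names and the round-trip values are hypothetical. The 0.2 s service time and the 15 s window are taken from points (c) and (b) above.

```python
import math

def lockstep_seconds(n_articles, rtt, service):
    """Non-streaming: each offer waits a full round trip plus server work."""
    return n_articles * (rtt + service)

def streaming_seconds(n_articles, rtt, service):
    """Streaming: offers are pipelined, so the round trip is paid about once."""
    return rtt + n_articles * service

def offers_in_window(window, rtt, service):
    """How many lock-step transactions fit into an idle window."""
    return math.floor(window / (rtt + service))

SERVICE = 0.2    # about 0.2 s per article, as in point (c) above
WINDOW = 15.0    # the 15-second dead time from point (b) above

# Hypothetical round-trip times for a nearby and a distant peer.
near = offers_in_window(WINDOW, rtt=0.03, service=SERVICE)
far = offers_in_window(WINDOW, rtt=0.5, service=SERVICE)
```

Under these assumed numbers the nearby peer fits about three times as many transactions into the same dead time as the distant one, which is the effect the last bullet describes.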
My frank opinion is that everyone should run streaming-mode feeds. It
removes a lot of question marks for full feeds, and just does not matter
for non-full feeds. As news administrators, you have control over what
full feeds you want to bring in.
Most real full-feed hubs use streaming nowadays anyway... it is not as if
you will have much of a choice. In the last 12 months, all but one
of my incoming full feeds went from non-streaming to streaming.
Non full-feed sites? Well, for outgoing feeds we don't care. For
incoming feeds the article rate is low enough that it doesn't matter
either.
News has gotten a WHOLE lot more reliable for us in the last year. Our
outgoing feeds not only do not get behind any more, but when something
goes wrong and they find a need to catch up, they catch up very
quickly... usually within two hours. I couldn't say that a year ago.
A year ago, if something went wrong, it took 24 to 48 hours to catch up
again. Many things have changed, but I think the two most notable
reasons for the improvement have been the switch to streaming mode
connections and improved network connectivity.
The funny thing is, the improvement has not been due so much to the
article creation streaming... in fact, I would say that the majority
of the improvement has been due to the ability to stream ihave
requests that will be mostly 'I already have it' responses. This is
especially true when one is playing catchup.
-Matt
--
Matthew Dillon Engineering, BEST Internet Communications, Inc.
<dillon@best.net>
[always include a portion of the original email in any response!]