Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.wildstar.net!serv.hinet.net!news.cc.nctu.edu.tw!spring.edu.tw!howland.erols.net!www.nntp.primenet.com!nntp.primenet.com!news1.best.com!nntp1.best.com!flash.noc.best.net!not-for-mail
From: dillon@best.com (Matthew Dillon)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD as news-server??
Date: 17 Oct 1996 15:53:24 -0700
Organization: Best Internet Communications, Inc. (info@best.com)
Lines: 170
Distribution: world
Message-ID: <546dd4$bn7@flash.noc.best.net>
References: <537ddl$3cc@amd40.wecs.org> <544nas$b5h@flash.noc.best.net> <5462it$r37@twwells.com> <5467p6$bl4@flash.noc.best.net>
NNTP-Posting-Host: flash.noc.best.net

:In article <5467p6$bl4@flash.noc.best.net>,
:Matthew Dillon <dillon@best.com> wrote:
:>:In article <5462it$r37@twwells.com>, T. William Wells <bill@twwells.com> wrote:
:>:>In article <544nas$b5h@flash.noc.best.net>,
:>:>Matthew Dillon <dillon@best.com> wrote:
>:>:
:>One other thing: you simply cannot run streaming and nonstreaming
:...
:>
:>Oct 17 12:47:01 5H:news1 newslink[13484]: nntp1.best.com:nntp1.S43565^Ifinal^I 113 secs 739 acc 44 dup 0 rej, 783 tot (415/min latency us/them: 43/28 mS)
:>Oct 17 12:50:31 5H:news1 newslink[13642]: nntp-best.primenet.com:primenet.S40440^Ifinal^I 15 secs 8 acc 807 dup 0 rej, 815 tot (3260/min latency us/them: 0/17 mS)
:>
:>    The first line is our newsfeed -> newsreader feed.  With most articles
:>    accepted, it takes approximately 113 seconds to transmit 739 articles.
:>    Batches are 5 minutes apart, so there is another 287 seconds of 100% IDLE.
:>
:>    the second line shows what history file caching does for you... in
:>    this particular case, primenet got behind in their feed to us and then
:>    caught up... but we already had the articles.  The article rate
:>    reflects the history file caching - over 54 articles/sec, most NOT
:>    accepted.
:>    In this case, the entire 5 minute batch took 15 seconds

    Oops, I posted the wrong log lines!!!  The first line is correct, but
    it is the outgoing feed from the point of view of the newsfeeds
    machine ... so the numbers reflect the incoming feed from the point of
    view of the news*reader* machine.

    The second line, however, is actually our bilateral outgoing feed to
    primenet... (primenet is probably the best feed on the planet... they
    run realtime channels on a honking machine while we run 5 minute
    batches, so they generally already have the articles we send them by
    a few minutes).

    I'm not sure I can find any good incoming feed examples that show off
    the bursts in the log files, because most of the incoming feeds are
    from channels.  Ah!  I know, the article log....

Oct 17 14:50:23.515 + newshub.sdsu.edu
Oct 17 14:50:23.645 + newshub.sdsu.edu
Oct 17 14:50:23.694 + newshub.sdsu.edu
Oct 17 14:50:23.784 + newshub.sdsu.edu
Oct 17 14:50:23.870 + news.sgi.com
Oct 17 14:50:23.969 + news.sgi.com
Oct 17 14:50:24.292 + news.sgi.com
Oct 17 14:50:24.439 + news.sgi.com
Oct 17 14:50:24.874 + news.sgi.com
Oct 17 14:50:25.293 + news.sgi.com
Oct 17 14:50:25.355 + news.sgi.com
Oct 17 14:50:25.783 + news.sgi.com
Oct 17 14:50:26.410 + noos.hooked.net
Oct 17 14:50:26.496 + noos.hooked.net
Oct 17 14:50:26.697 + noos.hooked.net
Oct 17 14:50:26.765 + noos.hooked.net
Oct 17 14:50:26.831 + noos.hooked.net
Oct 17 14:50:27.313 + noos.hooked.net
Oct 17 14:50:27.617 + noos.hooked.net
Oct 17 14:50:27.636 + noos.hooked.net
Oct 17 14:50:27.777 + noos.hooked.net
Oct 17 14:50:27.955 + noos.hooked.net
Oct 17 14:50:42.560 - newshub.sdsu.edu
Oct 17 14:50:42.698 + newshub.sdsu.edu
Oct 17 14:50:42.801 + newshub.sdsu.edu
Oct 17 14:50:42.926 + newshub.sdsu.edu
Oct 17 14:50:42.928 - newshub.sdsu.edu
Oct 17 14:50:43.362 + newshub.sdsu.edu
Oct 17 14:50:43.486 + newshub.sdsu.edu
Oct 17 14:50:43.655 + newshub.sdsu.edu
Oct 17 14:50:43.832 + newshub.sdsu.edu
Oct 17 14:50:44.009 + newshub.sdsu.edu
Oct 17 14:50:44.404 + newshub.sdsu.edu
Oct 17 14:50:44.623 + news.sgi.com
Oct 17 14:50:44.713 + news.sgi.com
Oct 17 14:50:45.372 + noos.hooked.net
Oct 17 14:50:45.491 + noos.hooked.net
Oct 17 14:50:46.334 + newshub.sdsu.edu
Oct 17 14:50:46.482 + newshub.sdsu.edu
Oct 17 14:50:46.633 + newshub.sdsu.edu
Oct 17 14:50:46.692 + newshub.sdsu.edu
Oct 17 14:50:46.785 + newshub.sdsu.edu
Oct 17 14:50:46.788 - newshub.sdsu.edu
Oct 17 14:50:46.992 + newshub.sdsu.edu
Oct 17 14:50:47.310 + newshub.sdsu.edu
Oct 17 14:50:47.612 + newshub.sdsu.edu
Oct 17 14:50:47.752 + newshub.sdsu.edu
Oct 17 14:50:48.016 + newshub.sdsu.edu
Oct 17 14:50:48.020 - newshub.sdsu.edu
Oct 17 14:50:48.245 + noos.hooked.net
Oct 17 14:50:48.404 + noos.hooked.net
Oct 17 14:50:48.531 + noos.hooked.net
Oct 17 14:50:48.578 + noos.hooked.net

    OK, to understand the above, you have to understand how INND
    calculates that timestamp on the left.  Normally, INND does this once
    every select() loop.  I've hacked it to calculate the exact time as of
    when it logs the article.

    I want you to note a couple of things:

    (a) Note how the incoming feeds tend to 'burst'... you see a whole
	bunch from a single source, then a whole bunch from another
	source, and so on.  This is streaming, whether the incoming feeds
	are from file batches or buffered channels.

    (b) Note the dead time.  There is one point, 14:50:27 to 14:50:42 in
	this particular sample, where innd is 100% idle for 15 seconds!
	(And, no, innd was not swapped out :-).)  This occurs a lot, but I
	am not going to post thousands of lines of log files to prove
	it :-).

    (c) Note the article rates.  About 0.2 seconds per article written to
	disk == 5 articles/sec (though I should note that at the time I
	took these readings, a fastrm was in progress).

    (d) Note the earlier assertion that streaming mode can steal INND
	away for seconds at a time... seems to be true given the above
	numbers.  For example, newshub.sdsu.edu near the end takes innd
	away for 2 seconds.
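
    For anyone who wants to pull the bursts and the dead time out of an
    article log of this shape mechanically, here is a minimal Python
    sketch.  It is not part of INND or of my hacks to it.  The function
    names (parse, bursts, idle_gaps), the one-second idle threshold and
    the assumed year are all made up for illustration, and the sample is
    just six lines copied from around the idle period in the log above.

```python
from datetime import datetime
from itertools import groupby

# Six lines copied from the article log above, straddling the idle period.
SAMPLE = """\
Oct 17 14:50:27.636 + noos.hooked.net
Oct 17 14:50:27.777 + noos.hooked.net
Oct 17 14:50:27.955 + noos.hooked.net
Oct 17 14:50:42.560 - newshub.sdsu.edu
Oct 17 14:50:42.698 + newshub.sdsu.edu
Oct 17 14:50:42.801 + newshub.sdsu.edu
"""


def parse(text, year=1996):
    """Turn log text into a list of (timestamp, flag, host) tuples.

    The log carries no year, so one is assumed. %b expects English
    month abbreviations, which holds in the default C locale.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        mon, day, clock, flag, host = line.split()
        ts = datetime.strptime(f"{year} {mon} {day} {clock}",
                               "%Y %b %d %H:%M:%S.%f")
        entries.append((ts, flag, host))
    return entries


def bursts(entries):
    """Group consecutive entries from one host: (host, count, seconds)."""
    out = []
    for host, group in groupby(entries, key=lambda e: e[2]):
        group = list(group)
        span = (group[-1][0] - group[0][0]).total_seconds()
        out.append((host, len(group), span))
    return out


def idle_gaps(entries, threshold=1.0):
    """Return (start, end, seconds) for gaps longer than threshold seconds."""
    gaps = []
    for (a, _, _), (b, _, _) in zip(entries, entries[1:]):
        delta = (b - a).total_seconds()
        if delta > threshold:
            gaps.append((a, b, delta))
    return gaps


if __name__ == "__main__":
    log = parse(SAMPLE)
    for host, count, span in bursts(log):
        print(f"{host}: {count} articles in {span:.3f} s")
    for start, end, secs in idle_gaps(log):
        print(f"idle {secs:.3f} s from {start.time()} to {end.time()}")
```

    Run over the full log, bursts() gives the per-source runs from (a)
    and idle_gaps() picks out the dead time from (b).  The flag column is
    carried along but not interpreted here.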
    I would assert the following as a conclusion:

    * Until you get up to a dozen or more *full* feeds, the only thing
      that counts is the article-creation rate.  That is, the 'I already
      have it' response tends to be cached and is therefore ignorable.

    * Streaming mode is more efficient (caveat: in the face of
      non-streaming mode, see below)... for several reasons.  It saves
      TCP packets, it allows disk and network latencies to overlap, and
      it allows statistically significant locality of reference to
      propagate in the face of a large number of incoming feeds.

    * While non-streaming mode feeds will suffer, I suggest that the dead
      time is sufficient to handle most lower-latency non-streaming mode
      feeds.  Higher-latency non-streaming mode feeds may have problems,
      though, because they will not be able to get enough transactions
      in the periods of dead time given to them.

    My frank opinion is that everyone should run streaming-mode feeds.
    It removes a lot of question marks for full feeds, and just does not
    matter for non-full feeds.  As news administrators, you have control
    over what full feeds you want to bring in.  Most real full-feed hubs
    use streaming nowadays anyway... it is not as if you will have much
    of a choice.  In the last 12 months, all but one of my incoming full
    feeds went from non-streaming to streaming.

    Non full-feed sites?  Well, for outgoing feeds we don't care.  For
    incoming feeds the article rate is low enough that it doesn't matter
    either.

    News has gotten a WHOLE lot more reliable for us in the last year.
    Our outgoing feeds not only do not get behind any more, but when
    something goes wrong and they need to catch up, they catch up very
    quickly... usually within two hours.  I couldn't say that a year ago.
    A year ago, if something went wrong, it took 24 to 48 hours to catch
    up again.  Many things have changed, but I think the two most notable
    reasons for the improvement have been the switch to streaming mode
    connections and improved network connectivity.
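
    To put rough numbers on why latency hurts a non-streaming feed so
    much more than a streaming one, here is a back-of-the-envelope Python
    sketch.  It is a deliberately simplified model and not a measurement:
    it assumes a lock-step feed pays one full round trip plus the
    server's per-article time for every transaction, while a streaming
    feed keeps several requests in flight and so is limited by whichever
    is slower, the server or the pipeline.  The function names, the
    300 ms round trip and the window of 10 are hypothetical; only the
    200 ms per article and the 15 second idle period echo figures from
    the log above.

```python
def lockstep_rate(rtt_ms, server_ms):
    """Articles/sec when each transaction waits a round trip plus server time."""
    return 1000.0 / (rtt_ms + server_ms)


def streaming_rate(rtt_ms, server_ms, window):
    """Articles/sec with up to `window` requests in flight (simplified model).

    The server time cannot be hidden, so it is a floor on the
    per-article cost; the round trip is spread across the window.
    """
    per_article_ms = max(server_ms, (rtt_ms + server_ms) / window)
    return 1000.0 / per_article_ms


def lockstep_transactions_in_idle(idle_secs, rtt_ms, server_ms):
    """Whole lock-step transactions that fit into one idle period."""
    return int(idle_secs * 1000 // (rtt_ms + server_ms))


if __name__ == "__main__":
    # Hypothetical high-latency peer: 300 ms round trip, 200 ms per article.
    print(lockstep_rate(300, 200))
    print(streaming_rate(300, 200, window=10))
    # How much a lock-step feed gets done in a 15 second idle period.
    print(lockstep_transactions_in_idle(15, 300, 200))
```

    In this model the lock-step rate keeps falling as the round trip
    grows, while the streaming rate stays pinned at the server's own
    limit as long as the window is large enough to cover the round trip.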
    The funny thing is, the improvement has not been due so much to the
    article creation streaming... in fact, I would say that the majority
    of the improvement has been due to the ability to stream ihave
    requests that will be mostly 'I already have it' responses.  This is
    especially true when one is playing catchup.

						-Matt

--
    Matthew Dillon   Engineering, BEST Internet Communications, Inc.
		     <dillon@best.net>
    [always include a portion of the original email in any response!]