Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.Hawaii.Edu!news.uoregon.edu!newsgate.cuhk.edu.hk!hpg30a.csc.cuhk.hk!news.cuhk.edu.hk!news.sprintlink.net!news-stk-11.sprintlink.net!www.nntp.primenet.com!nntp.primenet.com!howland.erols.net!news.mathworks.com!uunet!in3.uu.net!twwells!twwells!not-for-mail
From: bill@twwells.com (T. William Wells)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD as news-server??
Date: 17 Oct 1996 00:05:49 -0400
Organization: None, Mt. Laurel, NJ
Lines: 155
Message-ID: <544bat$41o@twwells.com>
References: <537ddl$3cc@amd40.wecs.org> <53u1ic$61i@flash.noc.best.net> <53ucuj$8qh@twwells.com> <543urf$ar3@flash.noc.best.net>
NNTP-Posting-Host: twwells.com

In article <543urf$ar3@flash.noc.best.net>,
Matthew Dillon <dillon@best.com> wrote:
: :>Also, experience (and my theoretical analysis) shows that multiple
: :>parallel feeds generally work better than streaming.
:
: Well, I've definitely never had a problem running streaming
: mode, and I *have* tested it with and without.

As have I. My results are the exact opposite of yours. Sigh!

: Perhaps the machine you were running it on wasn't tuned for it.
: I find that you get much better results with larger TCP window
: sizes... it tends to make the streaming much more efficient.

This is contrary to my expectations, if one is disk bottlenecked.
This, I suspect, is the difference between your system and mine.
The disks I use are pretty generic; I suspect that they're really
not suited to the task. However, given the growth of feeds, what's
true for my system today is almost certainly going to become true
for everyone else in the not too distant future. Exponential
growth does that. :-)

Anyway, the reason this is important is that if the overhead of
writing articles gets too large, it exceeds the ability of the
protocol to overlap it with network latency.
Once that starts happening, the protocol slows down dramatically
and things like increasing the TCP window size will only make
things worse.

One other thing: you simply cannot run streaming and nonstreaming
feeds into the same server. Or, you can, but the nonstreaming
feeds will get so far behind as to be pointless. Even with fast
disks, this will be true....

: :>What this means is that optimizations regarding the history file
: :>are generally pointless. Keeping the history file in memory cuts
: :>out at most 8K per article of disk activity -- while INN spends
: :>time waiting on that 64K (it's mostly directory stuff, so INN
: :>doesn't get buffer cache benefits for it). Since these two
: :>operations can be done somewhat asynchronously, you don't get
: :>much "win" by minimizing history accesses.
:
: Perhaps it is pointless with a single feed, but it certainly
: is NOT pointless if you have multiple redundant feeds.

I have four feeds. My disk statistics really don't reflect your
opinion. Or, put it this way: if you have ten incoming feeds and
they all require a disk hit, that's twenty-five disk hits per
second. This doesn't strain the disks at all....

: History file caching is EXTREMELY important, because it means
: that 6 out of 7 responses to IHAVE requests will be cached
: (because the response is 'I've already got the article'), and
: thus involve *NO* disk activity whatsoever.

Alas, this is only true if your feeds are all so close to "real
time" that things remain in the cache. Otherwise, caching doesn't
do anything for you. (In my system, I solve this problem with a
message id daemon, which eliminates most redundant history
lookups.)

: There are a thousand things that can create references
: to unlinked history files... literally!

I don't have them. Ever. Maybe it's just luck. :-)

: Without enough room
: to maneuver, any one of these items can completely destroy
: your history file if you do not have enough free space on
: the partition.

I haven't had that experience. The few times that I've had that
partition overflow during the daily history rebuild, expire caught
the overflow and just didn't bother renaming the files. Thus the
history wasn't rebuilt that day but it got rebuilt the next. Which
was good enough.

: :>But be sure to put the overview files in a separate directory tree
: :>-- otherwise overchan spends a lot of time directory searching.
:
: The overview file is normally near the beginning of the directory.

No it is not. Because when you do an expireover, it makes a new
overview file and renames it to the old one. There's no guarantee
that that will end up near the beginning of the directory. In
fact, odds are pretty good it won't be.

: Statistically, it's a wash.

Other people have completely different experiences. The INN
documentation also disagrees.

: Besides, overchan is an asynchronous
: process. It does not really matter if it takes a little extra
: overhead...

Yes it does. Because if innd can't buffer it, you get entries lost
into the batch file. Unless you go to pains to ensure that those
entries get processed, you end up with nnrpds wasting time
recreating those entries.

: it's in the noise because the directory in question
: has *already* been cached by the act of writing out the article
: file in the first place. The namei caching works for .overview
: files as well.

Alas not, because overchan is asynchronous. By the time it's ready
to fiddle with the overview file, that directory stuff is likely
to be long gone.

: I'm beginning to wonder.... what kind of hardware are you
: running this stuff on?

I described it in another post....

: :>As I said, I don't think this makes much difference anymore. For
: :>sure, on the system I have, it makes things *much* worse to have
: :>a large data segment for innd.
:
: Any UNIX that implements vfork() will not care at all, and FreeBSD
: doesn't care whether you use fork() *or* vfork().
: It's a big zero
: time-wise, even with huge data segments.

Irrelevant because, even if FreeBSD doesn't copy or write the data,
it _does_ allocate swap space. Get a bunch of these all at once and
your server will refuse to fork. There are certain news clients
which have a bad habit of making large numbers of nntp connections
all at once. This makes random things fail on the server.

: You are only running a few nnrpd's, it doesn't matter. But
: shared-active saves a huge amount of startup processing plus,

Good point.

: if
: you have a full active file, on the order of the size of the
: active file (500K to 1MB usually) per nnrpd process.

Yeah. For awhile, I was running a hacked nnrpd that read the
active file for each new newsgroup the user wanted. 300K images.
:-) Only problem was some newsreaders that positively insisted on
looking up several hundred newsgroups all at once, so I
regretfully had to retire that hack.

: :>screw you up memory-wise. Basically, it's a bad idea to run
: :>channel feeds. For that matter, I think I'm going to remove the
: :>last of mine (for overview). Then innd will *never* fork -- and
: :>that's one less thing to get in the way of shovelling articles as
: :>fast as possible. :-)
:
: Hmm.. file batching overview :-) :-) :-)

No particular reason not to. It certainly works for C news and the
nnrpds will deal with records not in the file quite yet.
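
For anyone wondering what the "message id daemon" mentioned earlier
amounts to, here is a rough sketch of the general idea only, not the
daemon actually in use. It assumes a bounded in-memory set of recently
offered message IDs that is consulted before the disk-backed history
lookup. The names RecentIdCache, want_article, and history_lookup, and
the default capacity, are all hypothetical.

```python
from collections import OrderedDict


class RecentIdCache:
    """Bounded LRU set of recently offered message IDs (hypothetical)."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self._ids = OrderedDict()

    def seen(self, message_id):
        """Return True if message_id was offered recently; record it either way."""
        if message_id in self._ids:
            self._ids.move_to_end(message_id)  # refresh its position
            return True
        self._ids[message_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)      # evict the oldest entry
        return False


def want_article(message_id, cache, history_lookup):
    """Decide whether to accept an offered article.

    history_lookup stands in for the disk-backed history check; it is
    only called when the in-memory cache has no record of the ID.
    """
    if cache.seen(message_id):
        return False                 # another feed already offered it
    return not history_lookup(message_id)
```

With several redundant feeds, only the first offer of a given ID
reaches the history file; later offers are refused from memory for as
long as the ID stays in the cache. This sketch ignores the case where
the first transfer fails, which a real daemon would have to handle.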