Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!munnari.OZ.AU!news.ecn.uoknor.edu!solace!dataphone!www.nntp.primenet.com!nntp.primenet.com!enews.sgi.com!news.sgi.com!newshub.sdsu.edu!news1.best.com!nntp1.best.com!usenet
From: dillon@flea.best.net (Matt Dillon)
Newsgroups: comp.mail.sendmail,comp.mail.smail,comp.unix.bsd.freebsd.misc
Subject: Re: Sendmail vs. Smail...
Date: 10 Dec 1996 06:33:53 GMT
Organization: BEST Internet Communications, Inc.
Lines: 115
Message-ID: <58j08h$1mn@nntp1.best.com>
References: <57tf61$gq7@raven.eva.net> <58be63$eu@stdismas.bogon.com> <58i45b$apn@nntp1.best.com> <58i81f$ajf@crystal.WonderWorks.COM>
NNTP-Posting-Host: flea.best.net
Xref: euryale.cc.adfa.oz.au comp.mail.sendmail:34974 comp.mail.smail:2672 comp.unix.bsd.freebsd.misc:32328

:In article <58i81f$ajf@crystal.WonderWorks.COM>,
:Kyle Jones <kyle_jones@wonderworks.com> wrote:
:>Matt Dillon <dillon@flea.best.net> wrote:
:> > [...]
:> > * Turn ON ForkEachJob ... you don't really have a choice.
:> >   If you don't, the sendmails running the queue can build
:> >   to five times their normal RSS and effectively run the
:> >   machine out of memory. Unfortunately, turning
:> >   ForkEachJob on also blows the connection cache... oh
:> >   well. Maybe a later version of sendmail will allow one
:> >   to specify how many jobs per fork one can run :-).
:>
:>You can combat this growth with smaller queues. Take fifteen
:>thousand queued messages and spread them over 150 directories and
:>queue runs aren't so bad. The sendmail process runs out of jobs
:>before it gets really fat. Limiting queue size is a good idea
:>anyway because of the linear directory searches combined with
:>directory update locking that can keep open() and unlink()
:>blocked for a long time.

I've found that limiting the queue size causes some pieces of mail to sit in there for hours, even to perfectly valid destinations, while others breeze through in minutes.
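Kyle's "spread them over 150 directories" idea can be sketched as a hash from queue ID to subdirectory. This is a minimal illustration under stated assumptions: the `/var/spool/mqueue/qNNN` layout, the `queue_dir_for` helper and the use of SHA-256 are all hypothetical choices for the example, not sendmail's own scheme. With 15,000 messages spread evenly, each directory holds roughly 100 entries, so every linear directory search is about 150 times shorter.

```python
import hashlib
from collections import Counter

NUM_QUEUE_DIRS = 150              # "spread them over 150 directories"
QUEUE_ROOT = "/var/spool/mqueue"  # hypothetical root; the qNNN subdirectory names are made up


def queue_dir_for(queue_id: str, num_dirs: int = NUM_QUEUE_DIRS) -> str:
    """Pick a subdirectory for a queue file by hashing its queue ID.

    SHA-256 is used only for its even spread and run-to-run stability
    (Python's built-in hash() of a str is salted per process).
    """
    digest = hashlib.sha256(queue_id.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % num_dirs
    return f"{QUEUE_ROOT}/q{index:03d}"


if __name__ == "__main__":
    # 15,000 made-up queue IDs, matching the queue size in the example above
    ids = [f"AA{n:05d}" for n in range(15000)]
    per_dir = Counter(queue_dir_for(q) for q in ids)
    print(len(per_dir), "directories used")
    print("largest directory holds", max(per_dir.values()), "queue files")
```

In this hypothetical layout each queue runner would be pointed at a single subdirectory, so it sees on the order of 100 files instead of 15,000 and runs out of jobs before its RSS grows large.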
I've pretty much given up on limiting queue size as a means to limit queue runs. I've used the multiple-queue approach before too, but threw it away when MinQueueAge came out... you can get nearly the same effect simply by increasing MinQueueAge, or by running X sendmails with one MinQueueAge value and Y sendmails with another. The only reason I was using the multiple-queue approach at all was a series of catastrophic cascade failures caused by the kernel's linear searches of the spool directory on file create/remove. MinQueueAge and the .hoststat stuff pretty much fixed it, and the problem went away entirely when we switched to FreeBSD.

The huge queues also created problems for us back when we were still using -q5m... no, -q15m... no, -q30m... no :-) It never worked. Basically, this sort of cascade failure occurs when the queue directory gets large enough that -qXXXm winds up starting more sendmails than the system can deal with, because with a queue that large each run is still going when the next one starts, and file creates/removes start to clog the directory (processes sitting on filesystem locks trying to update it). God, what a mess that was.

The current system gets some pretty spectacular tests... whenever the network to a particular huge... actually very huge, but unnamed, provider barfs (heh), our mail queues shoot up at a rate of 3,000 messages an hour. The pagers start going off when the queue hits 10,000 messages, and I start praying when it passes 30,000.

All of this taught me a very important lesson: NEVER, NEVER mount /var/spool/mqueue as its own partition... you not only can't rename it (the usual way to swap a bloated queue directory for a fresh one), you can't clear the blocks allocated to the directory either!

Oh, another good reason to run with ForkEachJob turned on... it allows you to kill sendmail 'nicely'. You simply kill the daemon and the children of the queue-running maintenance program; the children of the children are the ones doing the actual queue processing, so you let them finish their current queue file and exit normally.
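The MinQueueAge approach amounts to a per-job age check at the start of each queue run. The sketch below models it in Python, assuming (as the post implies) that a job's age is measured from the last delivery attempt recorded in its queue file. The `QueueEntry` type and `eligible` function are hypothetical names for illustration, not sendmail internals. Running two groups of queue runners with different `min_queue_age` values reproduces the "X sendmails with one value, Y with another" setup.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QueueEntry:
    """Hypothetical view of one queued job."""
    queue_id: str
    last_attempt: float  # time of the last delivery attempt, in seconds


def eligible(entries: Iterable[QueueEntry], now: float, min_queue_age: float) -> List[str]:
    """MinQueueAge-style filter: keep only jobs last tried at least min_queue_age seconds ago."""
    return [e.queue_id for e in entries if now - e.last_attempt >= min_queue_age]


if __name__ == "__main__":
    now = 10_000.0
    queue = [
        QueueEntry("A", now - 60),    # tried a minute ago
        QueueEntry("B", now - 900),   # tried 15 minutes ago
        QueueEntry("C", now - 7200),  # tried two hours ago
    ]
    # "X" runners with a 15-minute minimum age, "Y" runners with a 1-hour one
    print(eligible(queue, now, min_queue_age=15 * 60))  # ['B', 'C']
    print(eligible(queue, now, min_queue_age=60 * 60))  # ['C']
```

The filter only helps if every attempt updates `last_attempt`. A job whose queue file is never rewritten looks "old" on every run, which is how a job can slip past MinQueueAge and be retried over and over.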
Poof, with ForkEachJob on you've brought down your mail system without a single repeated message! Nice!

:> > There's a story here: We once received a mail bomb where
:> > the bomber sent the entire message body as a header.
:> > There were only about 50 of these messages in the mail
:> > queue, but they caused the sendmails running the queue
:> > to grow to about 8 MBytes RSS. The machine, with 128 MB
:> > of RAM, started to swap! Holy cow!
:>
:>Groovy. sendmail used to crash when faced with such headers,
:>freeing memory in the process. Sometimes sendmail bugs are your
:>friends. :)

Tell me about it! There was one spam with a badly munged address that caused sendmail to:

  * make a connection to the destination
  * send the email message to the destination
  * then crash before it could remove the queue file
  * repeat ...

OUCH! The real clincher: it didn't update the queue file either, so MinQueueAge had no effect on the retries. I call it the auto-remote-spamming tool. I've got the blown-up address saved away in case I ever need to use it on someone ;-)

:>
:> > I've also tried to turn off ForkEachJob with the latest
:> > 8.8.4 release... it doesn't work. The sendmails still
:> > build up to around a 3 MB RSS and kill the machine.
:>
:>You might be able to get by with fewer queue run processes if you
:>run some of them with drastically smaller connection timeout
:>values. If a host doesn't respond in 15 seconds, it probably
:>isn't going to respond at all. Instead of (typically) waiting 60
:>seconds, give up and move on. A complete pass of the queue takes
:>much less time, and responsive hosts are rewarded by getting
:>their mail delivered sooner. The sluggish hosts will be serviced
:>eventually by the queue runs that use the RFC 1123 minimum
:>timeouts. You should not let the fast queue runners write into
:>the persistent host status cache, or slow hosts will never get
:>their mail.

Ah, an interesting idea. I may experiment with this some.
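Kyle's fast-runner/slow-runner split can be illustrated with a toy simulation. Everything here is an assumption made for the illustration:

* each host either accepts a connection after a fixed delay or never answers;
* a runner that hits its connect timeout gives up on that job;
* the optional `host_status` dict stands in for the persistent host status cache, and unlike a real cache it never expires entries.

The host names, `queue_pass` and the 60-second "slow" timeout are hypothetical. The last call shows why fast runners must not write into the shared cache: the 15-second runner marks a 30-second host as down, and the 60-second runner then skips it.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (assumption): seconds until a host accepts a connection, or None if it never does.
HOSTS: Dict[str, Optional[float]] = {
    "fast.example": 1.0,
    "slow.example": 30.0,
    "dead.example": None,
}
JOBS = ["fast.example", "slow.example", "dead.example", "dead.example"]  # one entry per queued job


def queue_pass(jobs: List[str], hosts: Dict[str, Optional[float]], connect_timeout: float,
               host_status: Optional[Dict[str, str]] = None) -> Tuple[float, List[str]]:
    """Simulate one queue run; return (seconds spent, hosts delivered to, in order).

    host_status stands in for a persistent host status cache: if given, timed-out
    hosts are recorded as "down" and later jobs to them are skipped. Pass None for
    a fast runner that must not write into the cache.
    """
    elapsed = 0.0
    delivered: List[str] = []
    for host in jobs:
        if host_status is not None and host_status.get(host) == "down":
            continue  # cache says the host is down: skip without trying
        delay = hosts[host]
        if delay is not None and delay <= connect_timeout:
            elapsed += delay
            delivered.append(host)
        else:
            elapsed += connect_timeout  # waited out the whole timeout, then moved on
            if host_status is not None:
                host_status[host] = "down"
    return elapsed, delivered


if __name__ == "__main__":
    print(queue_pass(JOBS, HOSTS, 15.0))              # (46.0, ['fast.example'])
    slow_cache: Dict[str, str] = {}
    print(queue_pass(JOBS, HOSTS, 60.0, slow_cache))  # (91.0, ['fast.example', 'slow.example'])
    # Pitfall: a fast runner writing into the shared cache marks slow.example down...
    shared: Dict[str, str] = {}
    queue_pass(JOBS, HOSTS, 15.0, shared)
    # ...so the slow runner sharing that cache never delivers to it.
    print(queue_pass(JOBS, HOSTS, 60.0, shared))      # (1.0, ['fast.example'])
```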
.hoststat seems to have made the DNS delays an order of magnitude less invasive, and sendmail 8.8.x has some nifty options for connect and initial-connect timeouts.

-Matt