Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.wildstar.net!news.ececs.uc.edu!news.kei.com!news.mathworks.com!news.sprintlink.net!news-peer.sprintlink.net!arclight.uoregon.edu!nntp.primenet.com!news1.best.com!nntp1.best.com!usenet
From: dillon@flea.best.net (Matt Dillon)
Newsgroups: comp.mail.sendmail,comp.mail.smail,comp.unix.bsd.freebsd.misc
Subject: Re: Sendmail vs. Smail...
Date: 9 Dec 1996 22:34:19 GMT
Organization: BEST Internet Communications, Inc.
Lines: 144
Message-ID: <58i45b$apn@nntp1.best.com>
References: <57tf61$gq7@raven.eva.net> <589e5u$35t@stdismas.bogon.com> <589jsm$o2v@ezekiel.eunet.ie> <58be63$eu@stdismas.bogon.com>
NNTP-Posting-Host: flea.best.net
Xref: euryale.cc.adfa.oz.au comp.mail.sendmail:34973 comp.mail.smail:2671 comp.unix.bsd.freebsd.misc:32318

:In article <58be63$eu@stdismas.bogon.com>,
:John Henders <jhenders@bogon.com> wrote:
:>In <589jsm$o2v@ezekiel.eunet.ie> nick@eunet.ie (Nick Hilliard) writes:
:>
:>>John Henders (jhenders@bogon.com) said:
:>>: Then why don't you look at exim. It really has the best of both worlds.
:>>: Human readably config files, and much more flexibility than smail. How
:>
:>>... but retains smail's poor queueing strategy and lack of header rewriting
:>>capabilities :-(
:>
:>Umm, exim has header rewriting. I use it. I don't know what queue
:>strategy problems you mean, as I've had 7000 messages queued up for a
:>host we provide secondary MX when it went down with a hard drive crash
:>for a half a day, and it took less than 40 minutes to send them all when
:>the machine came up again. A machine I recently changed from smail to
:>exim used to typically have 1-200 messages in the input queue at any one
:>time, now it averages about 40.
:>
:>
:>--
:>    Artificial Intelligence stands no chance against Natural Stupidity.
:>        GAT d- -p+(--) c++++ l++ u++ t- m--- W--- !v
:>        b+++ e* s-/+ n-(?)
:>        h++ f+g+ w+++ y*
:>

    This is a relatively simple case because most of your queue was
    going to a single machine. Queueing strategy problems tend to become
    interesting when you have significant outgoing mail that is more
    randomized... that is, mail that goes to many different destinations.

    The problem is that you wind up with lots of sendmail's hanging on
    timeouts... either connection timeouts or DNS timeouts. If you have
    just one destination, this isn't a problem because all your
    sendmail's get stuck and unstuck at the same time. However, if you
    have multiple destinations, you can wind up in a situation where
    mail to good destinations gets backed up because sendmail's are
    hanging on mail to bad destinations. The sendmail's running the
    queue get starved for real traffic because they are all 'blown' on
    the bad destinations, and mail to the good destinations comes in
    faster than the sendmail's running the queue can process it. This
    leads to a cascade failure.

    The limiting factor is that you can only run so many sendmail (or
    equivalent) processes on any given machine at once. I run about 80
    sendmail's on the queue in each of our FreeBSD mail machines (two
    200 MHz Pentium Pro systems w/ 128 MB of RAM each). This in turn
    leads to a certain mail-volume-handling ability which is one number
    for normal operations, and another, lower number when something is
    blown up somewhere on the net.

    Below is an example. At this particular moment there are 28
    sendmail's running the queue out of 80 possible. Most of the
    sendmail's are stuck in the 'from queue' state, which translates
    mostly to 'waiting on DNS' or 'waiting on the first connection'.
    Only 5 of the sendmail's are actually pushing data! And this is
    *with* .hoststat and MinQueueAge turned on.
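Counting states by eye gets tedious once there are dozens of queue runners. As a rough illustration only, here is a minimal Python sketch that tallies lines in the same format as the listing that follows. The function names `runner_state` and `tally` are made up for this example, and treating every title that does not end in "from queue" as a runner that has reached a remote host is an assumption of the sketch, not something sendmail guarantees.

```python
from collections import Counter


def runner_state(ps_line):
    """Classify one ps line by the process title shown after 'sendmail:'.

    Assumption: a title ending in 'from queue' means the runner is still
    waiting (DNS or first connection); anything else counts as 'active'.
    """
    title = ps_line.split("sendmail:", 1)[1].strip()
    return "from queue" if title.endswith("from queue") else "active"


def tally(ps_lines):
    """Count queue runners per state, skipping lines that are not sendmail."""
    return Counter(runner_state(line) for line in ps_lines if "sendmail:" in line)


# Three lines copied from the listing in this post.
sample = [
    "root 19203 28673 280 688 0:00.02 sendmail: OAA27931: from queue",
    "root 19347 17960 280 872 0:00.04 sendmail: OAA13728 holly.colostate.edu.: client DATA status",
    "root 19274 6954 280 864 0:00.03 sendmail: LAA00748 bowsoft.bowsoft.com.: user open",
]
counts = tally(sample)
print(counts["from queue"], counts["active"])  # prints "1 2"
```

Fed the full listing, the same tally should reproduce the numbers quoted above: 28 runners, of which 5 are not in the 'from queue' state.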
root 19203 28673 280  688 0:00.02 sendmail: OAA27931: from queue
root 19361 28206 280  624 0:00.02 sendmail: VAA08388: from queue
root 19199 27293 280  740 0:00.04 sendmail: SAA06612: from queue
root 18810 26274 280  736 0:00.03 sendmail: SAA10910: from queue
root 19175 25305 280  696 0:00.02 sendmail: RAA03845: from queue
root 19153 24911 280  692 0:00.02 sendmail: QAA17152: from queue
root 19188 24445 280  680 0:00.03 sendmail: OAA11702: from queue
root 19216 22315 280  628 0:00.02 sendmail: WAA19029: from queue
root 18963 21238 280  700 0:00.03 sendmail: MAA11108: from queue
root 19227 19459 280  648 0:00.02 sendmail: XAA25929: from queue
root 19347 17960 280  872 0:00.04 sendmail: OAA13728 holly.colostate.edu.: client DATA status
root 19176 17824 280  700 0:00.02 sendmail: QAA27734: from queue
root 10717 16684 280 1200 0:00.07 sendmail: OAA12139: from queue
root 19262 15796 280  844 0:00.03 sendmail: OAA12946 shellx.best.com.: client MAIL
root 19165 15164 280  692 0:00.02 sendmail: DAA01515: from queue
root 19275 13982 280  636 0:00.02 sendmail: BAA14916: from queue
root 17660 12336 280 1208 0:00.07 sendmail: NAA04998 mailhost.tardis.ed.ac.uk.: client EHLO
root 19182 12261 280  672 0:00.03 sendmail: LAA21511: from queue
root 18882 11900 280  640 0:00.02 sendmail: OAA11000: from queue
root 19282  9961 280  688 0:00.02 sendmail: SAA18855: from queue
root 19189  9740 280  740 0:00.03 sendmail: TAA11556: from queue
root 19235  8836 280  740 0:00.03 sendmail: MAA21528: from queue
root 19274  6954 280  864 0:00.03 sendmail: LAA00748 bowsoft.bowsoft.com.: user open
root 19284  6224 280  680 0:00.02 sendmail: KAA03552: from queue
root 18600  3032 280 2300 0:00.27 sendmail: NAA01143 mailhost.cityscape.co.uk.: user open
root 19207  3007 280  740 0:00.03 sendmail: RAA02724: from queue
root 18734   790 280  648 0:00.02 sendmail: RAA14607: from queue
root 19341   565 280  672 0:00.02 sendmail: KAA22982: from queue

    My 'cascade failure avoidance' strategy works something like this:
    (this strategy is applicable
    to any machine handling more than one mail message every 10 seconds.
    Our machines each handle about 3 mail messages a second at peak.)

    * Set the delivery mode to queue-only in the sendmail.cf file. If
      you set it to background, you can crash the machine if someone
      mail bombs you, or even (when it is used as a relay) if a commonly
      used destination is down.

    * Do not use the -q15m or equivalent option to sendmail at all.

    * Run a program that maintains not more than N (80 in my case)
      sendmail -q's running the queue, and fork/exec's a new one once
      every 2 minutes while the total number running is less than N.

      Choose N such that all N sendmail's can be running without
      killing the machine.

    * Set MinQueueAge to at least 30 minutes. I use an hour, and turn
      on the .hoststat failure-caching features (if using sendmail
      8.8.4 or greater). Leave the host status timeout at 30m.

      You can adjust MinQueueAge depending on your situation. For
      example, on our mailing lists machine we use a 2 hour MinQueueAge
      and actually run several maintenance processes that run X, Y,
      and Z number of sendmail's with different MinQueueAge's.

    * Turn ON ForkEachJob ... you don't really have a choice. If you
      don't, the sendmail's running the queue can build to five times
      their normal RSS and effectively run the machine out of memory.

      Unfortunately, turning ForkEachJob on also blows the connection
      cache... oh well. Maybe a later version of sendmail will allow
      one to specify how many jobs per fork one can run :-).

      There's a story here: We once received a mail bomb where the
      bomber sent the entire message body as a header. There were only
      about 50 of these messages in the mail queue, but they caused the
      sendmail's running the queue to grow to about 8 MBytes RSS. The
      machine, with 128 MB of RAM, started to swap! Holy cow!

      I've also tried to turn off ForkEachJob with the latest 8.8.4
      release... it doesn't work. The sendmail's still build up to
      around a 3 MB RSS and kill the machine.
      Some people believe that ForkEachJob uses more CPU resources...
      in fact it uses about the same CPU resources but only 1/4 of the
      memory, and relatively consistent short-term memory use at that,
      so the page reclaim rate is very good.

      So, ForkEachJob is turned on.

    That's it... pretty simple, eh?

                                                -Matt
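
The "program that maintains not more than N sendmail -q's" in the list above is not shown in the post. As a rough illustration only, here is a minimal Python sketch of that kind of supervisor. It is not the program actually used; the names `supervise`, `start_queue_runner`, `MAX_RUNNERS` and `SPAWN_INTERVAL` are invented for the example, and it assumes `sendmail` is on the PATH and that `sendmail -q` performs a single queue run and then exits.

```python
import subprocess
import time

MAX_RUNNERS = 80       # N from the post; pick N so that N runners cannot kill the machine
SPAWN_INTERVAL = 120   # seconds; start at most one new queue runner every 2 minutes


def start_queue_runner():
    """Start one queue run (assumes 'sendmail' is on PATH) and return its handle."""
    return subprocess.Popen(["sendmail", "-q"])


def supervise(limit=MAX_RUNNERS, interval=SPAWN_INTERVAL,
              spawn=start_queue_runner, sleep=time.sleep, cycles=None):
    """Keep at most `limit` queue runners alive, adding one per interval.

    `spawn` and `sleep` are injectable so the loop can be exercised without
    touching a real mail queue; `cycles=None` means run forever.
    """
    running = []
    done = 0
    while cycles is None or done < cycles:
        # poll() returns None while a child is still running; drop finished ones.
        running = [proc for proc in running if proc.poll() is None]
        if len(running) < limit:
            running.append(spawn())
        sleep(interval)
        done += 1
    return running
```

Calling `supervise()` with the defaults would loop forever, adding one queue runner every two minutes until 80 are alive. Because new runners are added slowly rather than all at once, a burst of mail or a dead destination cannot push the process count past the limit.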