*BSD News Article 84616

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.wildstar.net!news.ececs.uc.edu!news.kei.com!news.mathworks.com!news.sprintlink.net!news-peer.sprintlink.net!arclight.uoregon.edu!nntp.primenet.com!news1.best.com!nntp1.best.com!usenet
From: dillon@flea.best.net (Matt Dillon)
Newsgroups: comp.mail.sendmail,comp.mail.smail,comp.unix.bsd.freebsd.misc
Subject: Re: Sendmail vs. Smail...
Date: 9 Dec 1996 22:34:19 GMT
Organization: BEST Internet Communications, Inc.
Lines: 144
Message-ID: <58i45b$apn@nntp1.best.com>
References: <57tf61$gq7@raven.eva.net> <589e5u$35t@stdismas.bogon.com> <589jsm$o2v@ezekiel.eunet.ie> <58be63$eu@stdismas.bogon.com>
NNTP-Posting-Host: flea.best.net
Xref: euryale.cc.adfa.oz.au comp.mail.sendmail:34973 comp.mail.smail:2671 comp.unix.bsd.freebsd.misc:32318

:In article <58be63$eu@stdismas.bogon.com>,
:John Henders <jhenders@bogon.com> wrote:
:>In <589jsm$o2v@ezekiel.eunet.ie> nick@eunet.ie (Nick Hilliard) writes:
:>
:>>John Henders (jhenders@bogon.com) said:
:>>: Then why don't you look at exim. It really has the best of both worlds.
:>>: Human readably config files, and much more flexibility than smail. How
:>
:>>... but retains smail's poor queueing strategy and lack of header rewriting
:>>capabilities :-(
:>
:>Umm, exim has header rewriting. I use it. I don't know what queue
:>strategy problems you mean, as I've had 7000 messages queued up for a
:>host we provide secondary MX when it went down with a hard drive crash
:>for a half a day, and it took less than 40 minutes to send them all when
:>the machine came up again. A machine I recently changed from smail to
:>exim used to typically have 1-200 messages in the input queue at any one
:>time, now it averages about 40.
:>
:>
:>-- 
:>      Artificial Intelligence stands no chance against Natural Stupidity.
:>                GAT d- -p+(--) c++++ l++ u++ t- m--- W--- !v
:>                     b+++ e* s-/+ n-(?) h++ f+g+ w+++ y*
:>

    This is a relatively simple case because most of your queue was going
    to a single machine.

    Queueing strategy problems tend to become interesting when you have
    significant outgoing mail that is more randomized, i.e. it goes to many
    different destinations.  The problem is that you wind up with lots of
    sendmail's hanging on timeouts... either connection timeouts or DNS
    timeouts.

    If you have just one destination, this isn't a problem because all your
    sendmail's get stuck and unstuck at the same time.  However, if you
    have multiple destinations, you can wind up in a situation where mail
    to good destinations gets backed up because sendmail's are hanging on
    mail to bad destinations.  The sendmail's running the queue get starved
    for real traffic because they are all 'blown' on the bad destinations,
    and mail to the good destinations comes in faster than the sendmail's 
    running the queue can process them.  This leads to a cascade failure.

    The limiting factor is that you can only run so many sendmail (or
    equivalent) processes on any given machine at once.  I run about 80
    sendmail's on the queue in each of our FreeBSD mail machines (two 200MHz
    Pentium Pro systems w/ 128MB of ram each).  This in turn leads to a
    certain mail-volume-handling ability which is one number for normal
    operations, and another lower number when something is blown up somewhere
    on the net.
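
    A rough way to see where the two numbers come from (this model is a
    sketch, not something from the measurements above): if each message
    ties up one queue runner for some number of seconds, sustained
    throughput is the runner count divided by the mean seconds per
    message.  The 10 second and 300 second figures and the 20% bad
    fraction below are hypothetical, picked only to illustrate the shape
    of the problem.

```python
def queue_throughput(runners, bad_fraction, good_secs, bad_secs):
    """Messages per second that `runners` queue runners can sustain.

    Each message occupies one runner for good_secs (healthy destination)
    or bad_secs (destination that hangs until a timeout fires).
    """
    mean_secs = (1.0 - bad_fraction) * good_secs + bad_fraction * bad_secs
    return runners / mean_secs


# Hypothetical numbers, chosen only for illustration.
normal = queue_throughput(80, bad_fraction=0.0, good_secs=10.0, bad_secs=300.0)
degraded = queue_throughput(80, bad_fraction=0.2, good_secs=10.0, bad_secs=300.0)
print(f"normal:   {normal:.2f} msgs/sec")    # normal:   8.00 msgs/sec
print(f"degraded: {degraded:.2f} msgs/sec")  # degraded: 1.18 msgs/sec
```

    Under these assumptions a modest share of hanging destinations cuts
    capacity several-fold, which is how incoming mail can outrun the
    queue runners.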

    Below is an example.  At this particular moment there are 28 sendmail's
    running the queue out of 80 possible.  Most of the sendmail's are stuck
    in the 'from queue' state, which translates mostly to 'waiting on DNS'
    or 'waiting on the first connection'.  Only 5 of the sendmail's 
    are actually pushing data!  And this is *with* .hoststat and MinQueueAge
    turned on.

root     19203 28673  280  688   0:00.02 sendmail: OAA27931: from queue
root     19361 28206  280  624   0:00.02 sendmail: VAA08388: from queue
root     19199 27293  280  740   0:00.04 sendmail: SAA06612: from queue
root     18810 26274  280  736   0:00.03 sendmail: SAA10910: from queue
root     19175 25305  280  696   0:00.02 sendmail: RAA03845: from queue
root     19153 24911  280  692   0:00.02 sendmail: QAA17152: from queue
root     19188 24445  280  680   0:00.03 sendmail: OAA11702: from queue
root     19216 22315  280  628   0:00.02 sendmail: WAA19029: from queue
root     18963 21238  280  700   0:00.03 sendmail: MAA11108: from queue
root     19227 19459  280  648   0:00.02 sendmail: XAA25929: from queue
root     19347 17960  280  872   0:00.04 sendmail: OAA13728 holly.colostate.edu.: client DATA status
root     19176 17824  280  700   0:00.02 sendmail: QAA27734: from queue
root     10717 16684  280 1200   0:00.07 sendmail: OAA12139: from queue
root     19262 15796  280  844   0:00.03 sendmail: OAA12946 shellx.best.com.: client MAIL
root     19165 15164  280  692   0:00.02 sendmail: DAA01515: from queue
root     19275 13982  280  636   0:00.02 sendmail: BAA14916: from queue
root     17660 12336  280 1208   0:00.07 sendmail: NAA04998 mailhost.tardis.ed.ac.uk.: client EHLO
root     19182 12261  280  672   0:00.03 sendmail: LAA21511: from queue
root     18882 11900  280  640   0:00.02 sendmail: OAA11000: from queue
root     19282  9961  280  688   0:00.02 sendmail: SAA18855: from queue
root     19189  9740  280  740   0:00.03 sendmail: TAA11556: from queue
root     19235  8836  280  740   0:00.03 sendmail: MAA21528: from queue
root     19274  6954  280  864   0:00.03 sendmail: LAA00748 bowsoft.bowsoft.com.: user open
root     19284  6224  280  680   0:00.02 sendmail: KAA03552: from queue
root     18600  3032  280 2300   0:00.27 sendmail: NAA01143 mailhost.cityscape.co.uk.: user open
root     19207  3007  280  740   0:00.03 sendmail: RAA02724: from queue
root     18734   790  280  648   0:00.02 sendmail: RAA14607: from queue
root     19341   565  280  672   0:00.02 sendmail: KAA22982: from queue
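
    A listing like the one above can be tallied mechanically.  The
    following is a minimal sketch that assumes the process-title layout
    shown here ("sendmail: <queue id> [<host>]: <state>"); the function
    name tally_states is made up for the example, and the sample lines
    are copied from the listing.

```python
from collections import Counter


def tally_states(ps_lines):
    """Count queue runners by the state shown in their process title."""
    states = Counter()
    for line in ps_lines:
        # Everything after "sendmail: " is the process title, e.g.
        # "OAA27931: from queue" or "OAA12946 shellx.best.com.: client MAIL".
        _, sep, title = line.partition("sendmail: ")
        if not sep:
            continue  # not a sendmail process line
        # The state is whatever follows the last ": " in the title.
        state = title.rsplit(": ", 1)[-1].strip()
        states[state] += 1
    return states


sample = [
    "root     19203 28673  280  688   0:00.02 sendmail: OAA27931: from queue",
    "root     19347 17960  280  872   0:00.04 sendmail: OAA13728 holly.colostate.edu.: client DATA status",
    "root     19274  6954  280  864   0:00.03 sendmail: LAA00748 bowsoft.bowsoft.com.: user open",
]
counts = tally_states(sample)
stuck = counts["from queue"]
active = sum(counts.values()) - stuck
print(f"{stuck} in 'from queue', {active} past it")
```

    Fed the full listing, the same tally would separate the runners
    still waiting in 'from queue' from the few that have moved on to a
    remote host.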


    My 'cascade failure avoidance' strategy works something like this:

	(this strategy is applicable to any machine handling more than
	 one mail message every 10 seconds.  Our machines each
	 run about 3 mail messages a second at peak)

    * Set the delivery mode to queue-only in the sendmail.cf file.  If you
      set it to background, you can crash the machine if someone mail bombs
      you, or even (when used as a relay) if a commonly used destination is down.

    * Do not use the -q15m or equivalent option to sendmail at all.

    * Run a program that maintains not more than N (80 in my case) 
      sendmail -q's running the queue, and fork/exec's a new one once every 
      2 minutes while the total number running is less than N.  Choose N such 
      that all N sendmail's can be running without killing the machine.

    * Set MinQueueAge to at least 30 minutes.  I use an hour, and turn on
      the .hoststat failure-caching features (if using sendmail 8.8.4 or
      greater).  Leave the host status timeout at 30m.  You can adjust 
      MinQueueAge depending on your situation.  For example, on our mailing
      lists machine we use a 2 hour MinQueueAge and actually run several
      maintenance processes that run X, Y, and Z number of sendmail's with
      different MinQueueAge's.

    * Turn ON ForkEachJob ... you don't really have a choice.  If you don't,
      the sendmail's running the queue can build to five times their normal
      RSS and effectively run the machine out of memory.  Unfortunately,
      turning ForkEachJob on also blows the connection cache... oh well.
      Maybe a later version of sendmail will allow one to specify how many
      jobs per fork one can run :-).

      There's a story here:  We once received a mail bomb where the bomber
      sent the entire message body as a header.  There were only about 50
      of these messages in the mail queue, but they caused the sendmail's
      running the queue to grow to about 8 MBytes RSS.  The machine, with
      128 MB of ram, started to swap!  Holy cow!

      I've also tried to turn off ForkEachJob with the latest 8.8.4 release...
      it doesn't work.  The sendmail's still build up to around a 3 MB RSS
      and kill the machine. 

      Some people believe that ForkEachJob uses more cpu resources... in fact
      it uses about the same cpu resources but only 1/4 of the memory, and
      relatively consistent short term memory use at that so the page reclaim
      rate is very good.

      So, ForkEachJob is turned on.
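
    The program that keeps the queue runners topped up is not shown in
    this post.  A minimal sketch of what such a program might look like
    in Python is given below; it is an illustration, not the program
    actually in use.  The sendmail path, the names maintain and reap,
    and the cycles/sleep parameters (there only to make the loop
    testable) are all assumptions.

```python
import subprocess
import time

MAX_RUNNERS = 80       # N; choose so that N runners cannot kill the machine
SPAWN_INTERVAL = 120   # seconds between new runners (one every 2 minutes)
RUNNER_CMD = ["/usr/sbin/sendmail", "-q"]  # assumed path; adjust for your system


def reap(runners):
    """Return only the runners that are still alive (poll() reaps the rest)."""
    return [p for p in runners if p.poll() is None]


def maintain(cmd=RUNNER_CMD, max_runners=MAX_RUNNERS,
             interval=SPAWN_INTERVAL, cycles=None, sleep=time.sleep):
    """Start at most one queue runner per interval while below max_runners.

    cycles=None loops forever; a finite number is handy for testing.
    """
    runners = []
    done = 0
    while cycles is None or done < cycles:
        runners = reap(runners)
        if len(runners) < max_runners:
            runners.append(subprocess.Popen(cmd))
        sleep(interval)
        done += 1
    return runners


# To run it for real (loops forever), call:
#     maintain()
```

    Spawning one runner per interval rather than N at once means the
    runners start at staggered times, and the cap keeps a pile of hung
    runners from growing without bound.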


    That's it... pretty simple, eh?

						-Matt