Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.wildstar.net!news.ececs.uc.edu!news.kei.com!news.mathworks.com!news.sprintlink.net!news-peer.sprintlink.net!arclight.uoregon.edu!nntp.primenet.com!news1.best.com!nntp1.best.com!usenet
From: dillon@flea.best.net (Matt Dillon)
Newsgroups: comp.mail.sendmail,comp.mail.smail,comp.unix.bsd.freebsd.misc
Subject: Re: Sendmail vs. Smail...
Date: 9 Dec 1996 22:34:19 GMT
Organization: BEST Internet Communications, Inc.
Lines: 144
Message-ID: <58i45b$apn@nntp1.best.com>
References: <57tf61$gq7@raven.eva.net> <589e5u$35t@stdismas.bogon.com> <589jsm$o2v@ezekiel.eunet.ie> <58be63$eu@stdismas.bogon.com>
NNTP-Posting-Host: flea.best.net
Xref: euryale.cc.adfa.oz.au comp.mail.sendmail:34973 comp.mail.smail:2671 comp.unix.bsd.freebsd.misc:32318
:In article <58be63$eu@stdismas.bogon.com>,
:John Henders <jhenders@bogon.com> wrote:
:>In <589jsm$o2v@ezekiel.eunet.ie> nick@eunet.ie (Nick Hilliard) writes:
:>
:>>John Henders (jhenders@bogon.com) said:
:>>: Then why don't you look at exim. It really has the best of both worlds.
:>>: Human readably config files, and much more flexibility than smail. How
:>
:>>... but retains smail's poor queueing strategy and lack of header rewriting
:>>capabilities :-(
:>
:>Umm, exim has header rewriting. I use it. I don't know what queue
:>strategy problems you mean, as I've had 7000 messages queued up for a
:>host we provide secondary MX when it went down with a hard drive crash
:>for a half a day, and it took less than 40 minutes to send them all when
:>the machine came up again. A machine I recently changed from smail to
:>exim used to typically have 1-200 messages in the input queue at any one
:>time, now it averages about 40.
:>
:>
:>--
:> Artificial Intelligence stands no chance against Natural Stupidity.
:> GAT d- -p+(--) c++++ l++ u++ t- m--- W--- !v
:> b+++ e* s-/+ n-(?) h++ f+g+ w+++ y*
:>
This is a relatively simple case because most of your queue was going
to a single machine.

Queueing strategy problems tend to become interesting when you have
significant outgoing mail that is more randomized, i.e. it goes to many
different destinations. The problem is that you wind up with lots of
sendmails hanging on timeouts, either connection timeouts or DNS
timeouts.

If you have just one destination, this isn't a problem because all your
sendmails get stuck and unstuck at the same time. However, if you
have multiple destinations, you can wind up in a situation where mail
to good destinations gets backed up because sendmails are hanging on
mail to bad destinations. The sendmails running the queue get starved
for real traffic because they are all 'blown' on the bad destinations,
and mail to the good destinations comes in faster than the sendmails
running the queue can process it. This leads to a cascade failure.
The limiting factor is that you can only run so many sendmail (or
equivalent) processes on any given machine at once. I run about 80
sendmails on the queue in each of our FreeBSD mail machines (two 200MHz
Pentium Pro systems w/ 128 MB of RAM each). This in turn caps the mail
volume each machine can handle: one number for normal operations, and
another, lower number when something is blown up somewhere on the net.
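
The two capacity numbers can be illustrated with some back-of-envelope arithmetic. The sketch below is only a simplified model, not a measurement: it assumes every delivery attempt either completes in a fixed time or hangs for a full timeout. Apart from the 80 runners, every figure in it (5 seconds, 300 seconds, 10 percent bad destinations) is hypothetical.

```python
def queue_throughput(runners, ok_seconds, timeout_seconds, bad_fraction):
    """Messages per second that `runners` queue runners can sustain.

    bad_fraction is the share of delivery attempts that hang until a
    timeout; every other attempt is assumed to take ok_seconds.
    """
    mean_seconds = (1.0 - bad_fraction) * ok_seconds + bad_fraction * timeout_seconds
    return runners / mean_seconds


# Hypothetical figures, apart from the 80 runners mentioned in the post.
normal = queue_throughput(80, ok_seconds=5.0, timeout_seconds=300.0, bad_fraction=0.0)
degraded = queue_throughput(80, ok_seconds=5.0, timeout_seconds=300.0, bad_fraction=0.1)
print(f"normal: {normal:.1f} msg/s, degraded: {degraded:.1f} msg/s")
# prints "normal: 16.0 msg/s, degraded: 2.3 msg/s"
```

Under these assumed numbers, one attempt in ten hanging on a timeout cuts capacity to roughly a seventh, which is how incoming mail can start to outpace the queue runners.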
Below is an example. At this particular moment there are 28 sendmails
running the queue out of 80 possible. Most of the sendmails are stuck
in the 'from queue' state, which translates mostly to 'waiting on DNS'
or 'waiting on the first connection'. Only 5 of the sendmails
are actually pushing data! And this is *with* .hoststat and MinQueueAge
turned on.
root 19203 28673 280 688 0:00.02 sendmail: OAA27931: from queue
root 19361 28206 280 624 0:00.02 sendmail: VAA08388: from queue
root 19199 27293 280 740 0:00.04 sendmail: SAA06612: from queue
root 18810 26274 280 736 0:00.03 sendmail: SAA10910: from queue
root 19175 25305 280 696 0:00.02 sendmail: RAA03845: from queue
root 19153 24911 280 692 0:00.02 sendmail: QAA17152: from queue
root 19188 24445 280 680 0:00.03 sendmail: OAA11702: from queue
root 19216 22315 280 628 0:00.02 sendmail: WAA19029: from queue
root 18963 21238 280 700 0:00.03 sendmail: MAA11108: from queue
root 19227 19459 280 648 0:00.02 sendmail: XAA25929: from queue
root 19347 17960 280 872 0:00.04 sendmail: OAA13728 holly.colostate.edu.: client DATA status
root 19176 17824 280 700 0:00.02 sendmail: QAA27734: from queue
root 10717 16684 280 1200 0:00.07 sendmail: OAA12139: from queue
root 19262 15796 280 844 0:00.03 sendmail: OAA12946 shellx.best.com.: client MAIL
root 19165 15164 280 692 0:00.02 sendmail: DAA01515: from queue
root 19275 13982 280 636 0:00.02 sendmail: BAA14916: from queue
root 17660 12336 280 1208 0:00.07 sendmail: NAA04998 mailhost.tardis.ed.ac.uk.: client EHLO
root 19182 12261 280 672 0:00.03 sendmail: LAA21511: from queue
root 18882 11900 280 640 0:00.02 sendmail: OAA11000: from queue
root 19282 9961 280 688 0:00.02 sendmail: SAA18855: from queue
root 19189 9740 280 740 0:00.03 sendmail: TAA11556: from queue
root 19235 8836 280 740 0:00.03 sendmail: MAA21528: from queue
root 19274 6954 280 864 0:00.03 sendmail: LAA00748 bowsoft.bowsoft.com.: user open
root 19284 6224 280 680 0:00.02 sendmail: KAA03552: from queue
root 18600 3032 280 2300 0:00.27 sendmail: NAA01143 mailhost.cityscape.co.uk.: user open
root 19207 3007 280 740 0:00.03 sendmail: RAA02724: from queue
root 18734 790 280 648 0:00.02 sendmail: RAA14607: from queue
root 19341 565 280 672 0:00.02 sendmail: KAA22982: from queue
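
A listing like the one above can be tallied with a few lines of Python. This is only a sketch: it assumes each line carries a process title starting with "sendmail: ", and it treats any title ending in "from queue" as waiting and everything else as active, following the reading of the states given above. The function name and the labels are made up for illustration.

```python
def summarize(ps_lines):
    """Count queue runners that are waiting versus talking to a host."""
    counts = {"waiting": 0, "active": 0}
    for line in ps_lines:
        # Everything after "sendmail: " is the process title.
        _, marker, title = line.partition("sendmail: ")
        if not marker:
            continue  # not a sendmail line
        if title.strip().endswith("from queue"):
            counts["waiting"] += 1
        else:
            counts["active"] += 1
    return counts


# Three lines copied from the listing above.
sample = [
    "root 19203 28673 280 688 0:00.02 sendmail: OAA27931: from queue",
    "root 19347 17960 280 872 0:00.04 sendmail: OAA13728 holly.colostate.edu.: client DATA status",
    "root 19274 6954 280 864 0:00.03 sendmail: LAA00748 bowsoft.bowsoft.com.: user open",
]
print(summarize(sample))  # prints {'waiting': 1, 'active': 2}
```

Run over all 28 lines of the listing, the same logic gives 23 waiting and 5 active.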
My 'cascade failure avoidance' strategy works something like this:
(this strategy is applicable to any machine handling more than
one mail message every 10 seconds. Our machines each
handle about 3 mail messages a second at peak)
* Set the delivery mode to queue-only in the sendmail.cf file. If you set
it to background, you can crash the machine if someone mail bombs you or
even, when the machine is used as a relay, if a commonly used destination is down.
* Do not use the -q15m or equivalent option to sendmail at all.
* Run a program that maintains not more than N (80 in my case)
sendmail -q's running the queue, and fork/exec's a new one once every
2 minutes while the total number running is less than N. Choose N such
that all N sendmails can be running without killing the machine.
* Set MinQueueAge to at least 30 minutes. I use an hour, and turn on
the .hoststat failure-caching features (if using sendmail 8.8.4 or
greater). Leave the host status timeout at 30m. You can adjust
MinQueueAge depending on your situation. For example, on our mailing
lists machine we use a 2 hour MinQueueAge and actually run several
maintenance processes that run X, Y, and Z number of sendmails with
different MinQueueAge settings.
* Turn ON ForkEachJob ... you don't really have a choice. If you don't,
the sendmails running the queue can build to five times their normal
RSS and effectively run the machine out of memory. Unfortunately,
turning ForkEachJob on also blows the connection cache... oh well.
Maybe a later version of sendmail will allow one to specify how many
jobs per fork one can run :-).
There's a story here: We once received a mail bomb where the bomber
sent the entire message body as a header. There were only about 50
of these messages in the mail queue, but they caused the sendmails
running the queue to grow to about 8 MBytes RSS. The machine, with
128 MB of RAM, started to swap! Holy cow!

I've also tried to turn off ForkEachJob with the latest 8.8.4 release...
it doesn't work. The sendmails still build up to around a 3 MB RSS
and kill the machine.
Some people believe that ForkEachJob uses more CPU resources... in fact
it uses about the same CPU resources but only 1/4 of the memory, and
its short-term memory use is relatively consistent at that, so the page
reclaim rate is very good.
So, ForkEachJob is turned on.
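
The maintenance program from the list above (at most N queue runners, one new runner every 2 minutes) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the program actually in use: the sendmail path, the function names, and the single-threaded polling loop are all made up for the example.

```python
import subprocess
import time

MAX_RUNNERS = 80       # N from the post; pick what your machine can take
SPAWN_INTERVAL = 120   # seconds between new queue runners (2 minutes)
COMMAND = ["/usr/sbin/sendmail", "-q"]  # assumed path; adjust for your system


def reap(children):
    """Drop runners that have exited; poll() also collects their exit status."""
    return [child for child in children if child.poll() is None]


def step(children, max_runners=MAX_RUNNERS, spawn=None):
    """Reap finished runners, then start at most one new runner."""
    if spawn is None:
        spawn = lambda: subprocess.Popen(COMMAND)
    children = reap(children)
    if len(children) < max_runners:
        children.append(spawn())
    return children


def main():
    """Loop forever: one step every SPAWN_INTERVAL seconds."""
    children = []
    while True:
        children = step(children)
        time.sleep(SPAWN_INTERVAL)
```

Calling main() starts the loop. Because at most one runner is started per interval, the pool fills gradually instead of launching all N queue runs at once.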
That's it... pretty simple, eh?
-Matt