*BSD News Article 56597

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!munnari.OZ.AU!news.hawaii.edu!ames!agate!howland.reston.ans.net!usc!news.cerf.net!news.titan.com!news.tcst.com!op.net!candle.pha.pa.us!not-for-mail
From: root@candle.pha.pa.us (Bruce Momjian)
Newsgroups: comp.unix.bsd.bsdi.misc
Subject: Re: BSDI 2.0.1 swapspace leak fix?
Date: 11 Dec 1995 00:32:33 GMT
Organization: a consultant's basement
Lines: 216
Message-ID: <4afu71$ai@picasso.op.net>
References: <4a0hk0$20d@news2.ucsd.edu> <4aa9mf$6pr@picasso.op.net> <4aaehi$eav@moon.igcom.net>
NNTP-Posting-Host: s1-03.ppp.op.net

David Bauman (david@terra.igcom.net) wrote:
: : Brian Kantor (brian@nothing.ucsd.edu) wrote:
: : : We're suffering from running out of swapspace; it appears that it's a
: : : known problem with BSDI 2.0.1 (and probably earlier) where processes
: : : that fork chew up swapspace.  Any patch to fix this yet?
: 
: : I asked BSDI if the next release will fix this problem and was told the
: : "swap overallocation" bug has been improved but not eliminated in the
: : 2.0/2.0.1 release.  I read this to say they do not have a fix for this
: : in the next release.
: 
: This is unacceptable.  I have multiple machines running BSDI BSD/OS 2.0.1 
: and my whole business relies on BSD/OS.  The price that BSD charges for 
: their software should run with NO bugs whatsoever.

I don't know how realistic that standard is.  If you required all
software you paid for to ship with no bugs you wouldn't have much
software.

I worked with Mike Karels to identify the bug in 1.0.  He has looked
into it and talked to the initial Mach developer and the solution is not
easy.  It exists in all 386 BSD implementations as far as I know.

They have made some changes to make 2.0 less prone to this problem.

Attached is a posting addressing the issue.

---------------------------------------------------------------------------

Because the topic has come up, I would like to clear up the cause of the
swap overallocation bug, and to confirm that it can lock up the machine
completely with no warning.  I know because I worked with Mike Karels to
find the bug.  Basically, with 5MB of RAM and no X-Windows, my system
locked up every seven days.  "pstat -T" showed swap allocated getting
bigger and bigger until the system locked up.

This is probably not the problem this particular person is having, but
it is possible.  I have advised the user to log "pstat -T" from a cron
job to eliminate this as a possible cause.
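
For anyone who wants to do that logging from a small C program rather
than a shell one-liner, here is a minimal sketch.  The helper name
log_command_output() and the log path in the usage note are made up for
illustration; the only thing taken from the text above is the "pstat -T"
command itself.  It runs a command with popen() and appends the output,
under a timestamp header, to a log file.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for popen()/pclose() in strict modes */
#endif

#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: run `cmd` through the shell and append its
 * output to `logpath`, preceded by a "=== timestamp: cmd" header line.
 * Returns 0 on success, -1 on any failure.
 */
int log_command_output(const char *cmd, const char *logpath)
{
	FILE *in, *out;
	char line[512];
	char stamp[32];
	time_t now = time(NULL);
	struct tm *tm = localtime(&now);
	int status;

	if (tm == NULL ||
	    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm) == 0)
		return -1;

	if ((out = fopen(logpath, "a")) == NULL)
		return -1;

	if ((in = popen(cmd, "r")) == NULL)
	{
		fclose(out);
		return -1;
	}

	/* header first, then the command's output copied line by line */
	fprintf(out, "=== %s: %s\n", stamp, cmd);
	while (fgets(line, sizeof line, in) != NULL)
		fputs(line, out);

	status = pclose(in);
	if (fclose(out) != 0)
		return -1;
	return status == -1 ? -1 : 0;
}
```

A tiny program that calls log_command_output("pstat -T", "/var/tmp/swap.log")
and is started from cron every hour or so would give you a record of
swap growth leading up to a lockup.  The path is only an example.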

The new version of BSD/OS does not fix this bug.  Mike Karels is aware
of it and certainly wants to fix it, but it is a major job.  The
easiest solution for users is to add more RAM so the condition does not
occur.  With 16MB RAM running X, I get no lockups.

Attached is an old posting outlining the problem:

---------------------------------------------------------------------------


>From maillist Tue Dec 14 23:52:45 1993
Subject: Swap overallocation
To: bsdi-users@bsdi.com (BSDI mailing list)
Date: Tue, 14 Dec 1993 23:52:45 -0500 (EST)
Cc: karels@bsdi.com
X-Mailer: ELM [version 2.4 PL20]
Content-Type: text
Content-Length: 5046      
Status: OR

I am running BSD/386 from BSDI.  When running with 5MB of RAM, I found
that the system locked up about once every week.  In researching the
problem with Mike Karels of BSDI, I think we have found a bug that
exists on BSD/386 and most free 386-based *BSD systems.  Here are the
details.

First, let me define copy-on-write (COW):  When a process forks, the OS
maps the address space of both the parent and child to the same memory
pages, and both processes start running.  If either process writes to
one of these shared pages, the OS makes a copy of that page.  One
process gets the original, the other gets the copy.
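
The effect of that definition can be seen from user space with a small
sketch like the one below.  The function name cow_demo() is made up for
illustration.  It cannot show the kernel's page sharing directly; it
only shows the guarantee that COW has to preserve, namely that a write
made by the child after fork() is not seen by the parent.  The child
changes a variable and reports its view through a pipe, and the parent
then reports its own, unchanged view.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, let the child overwrite `value`, and
 * return what each side ends up seeing.  Returns 0 on success, -1 on
 * failure.
 */
int cow_demo(int *parent_value, int *child_value)
{
	int value = 1;		/* lives on a page shared after fork() */
	int fds[2];
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
	{
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0)
	{
		/* child: this write lands on the child's private copy */
		close(fds[0]);
		value = 2;
		if (write(fds[1], &value, sizeof value) != (ssize_t)sizeof value)
			_exit(1);
		_exit(0);
	}

	/* parent: read the child's view, then report our own */
	close(fds[1]);
	if (read(fds[0], child_value, sizeof *child_value)
	    != (ssize_t)sizeof *child_value)
	{
		close(fds[0]);
		waitpid(pid, NULL, 0);
		return -1;
	}
	close(fds[0]);
	waitpid(pid, NULL, 0);

	*parent_value = value;
	return 0;
}
```

After a successful call, the parent's value is still 1 while the child
saw 2, even though both started out mapped to the same page.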

Ok, here is the bug we have found:  If a process forks a child, and the
parent writes to its memory pages (forcing a COW), and those pages are
paged out to swap before the child exec's or exits, the parent's and
child's<!> swap space is not released until the parent exits.

The ramification of this is that if you have a long-running process
that forks a lot, like a shell, and your system does a lot of paging,
that long-running process will allocate more and more swap until it
exits.  It is particularly a problem with non-csh shells (csh uses
vfork and exec), because they often run scripts by forking themselves,
and the child running the script may exist for quite some time without
exec'ing or exit'ing.

Here are Mike Karels' more detailed words on the subject:

---------------------------------------------------------------------------

... The problem here is that if the process forks, and the parent modifies
data pages while the child exists, it must make copies of those pages
(copy-on-write after fork).  If those copies are paged out, then both
the copies and the originals will occupy space until the parent exits,
even if the child exits.

I think I described the chains of shadow objects that were accumulating,
and the fact that those are supposed to get coalesced.  It turns out
that the code to coalesce does not work if an object has been paged out.
This is the scenario that causes problems:

	- a long-lived program forks repeatedly,
	- the parent modifies data space before the child does exec or exit,
	  and
	- the parent's modified pages get paged out before child does exec
	  or exit.

The only situation in which this seems to be a problem is if a login
shell (or any long-running interactive shell) runs scripts by forking
and running them directly.  This will not happen with csh; I don't
know about ksh or bash.  (It does not happen with csh because it uses
vfork, and re-exec's itself if running a csh script).  It also does
not happen if the scripts are "executable" scripts, i.e. those that start
with #!/bin/sh.  It is also a problem only if the script or other system
activity uses enough memory for the shell to be paged out while the
script is running.
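
To illustrate the distinction drawn above between a child that lingers
as a copy of the shell and one that execs right away, here is a minimal
sketch of the second pattern.  The helper name run_with_prompt_exec()
is made up, and this is not presented as one of the workarounds
mentioned in this message; it only shows a parent whose child replaces
its address space immediately after fork(), so the window in which the
two share pages is as short as possible.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `command` under /bin/sh in a child that
 * execs immediately.  Returns the command's exit status, or -1 if the
 * fork or wait failed or the child did not exit normally.
 */
int run_with_prompt_exec(const char *command)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;

	if (pid == 0)
	{
		/* child: drop the shared address space at once */
		execl("/bin/sh", "sh", "-c", command, (char *)NULL);
		_exit(127);	/* only reached if the exec failed */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_with_prompt_exec("exit 7") returns 7.  Whether this
pattern avoids the swap growth on a given system is something to check
with "pstat -T", not something the sketch proves.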

The bad news is that this problem is not easy to solve... However, I
think there are some workarounds that can be used for the moment.

---------------------------------------------------------------------------

My experience with 5MB of RAM and 20MB of swap running several screens
(no X, no networking) was that because I never logged out, my shell
accumulated swap space until it ran out.  About every 7 days the system
had to be rebooted (everything had stopped running).

I hope this helps explain some lockup problems some people may be
having.  Has anyone solved this problem?  I don't know the specifics of
why it is occurring, or why it is hard to solve, but if someone has
already solved it, I would love to hear about it.

Attached is a program that illustrates the problem.  With MAKE_CHILD
undefined, swap space is allocated the first time through the loop, and
stays pretty constant.  With MAKE_CHILD defined, free swap space
decreases rapidly each time through the loop until the system runs out
of swap space and locks up.  Note that each child is killed before the
loop is restarted, yet the free swap space continues to decline rapidly.
You will need to define some things at the top before you compile,
including your system's program for monitoring swap space.

---------------------------------------------------------------------------


/* show swap overallocation bug in child processes */
/* Bruce Momjian, root@candle.uucp */

/* tabs = 4 */

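#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAKE_CHILD

/* make this higher if you have more than 8 MB of RAM */
#define SYSTEM_RAM	4

/* program to show remaining swap space, vmstat? */
#define SHOWSWAP	"swaptotal"

int k = 1024;

int main(void)
{
	char *y;
	pid_t c_pid;
	int j;
	char *t;

	/* make my address space big */
	if ( (y=malloc(SYSTEM_RAM*k*k)) == NULL)
	{
		perror("Malloc");
		exit(1);
	}

	while (1)
	{
#ifdef MAKE_CHILD
		if ((c_pid = fork()) < 0)
		{
			/* bail out: kill(-1, ...) below would signal every process we own */
			perror("fork");
			exit(1);
		}
		if (c_pid == 0)
		{
			/* child just sits there sharing the parent's pages */
			sleep(1000);
			_exit(0);	/* never fall into the parent's loop */
		}
#endif

		/* parent touches memory to force COW copy */
		for (j=0,t=y; j < SYSTEM_RAM*k*k; j+=k)
		{
			*t = 'x';
			t += k;
		}

#ifdef MAKE_CHILD
		kill(c_pid,SIGHUP);
		/* reap the child so zombies do not fill the process table */
		waitpid(c_pid, NULL, 0);
#endif

		puts("done ");
		system(SHOWSWAP);
	}
	/* NOT REACHED */
	return 0;
}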


-- 
Bruce Momjian                          |  830 Blythe Avenue
root@candle.pha.pa.us                  |  Drexel Hill, Pennsylvania 19026 
  +  If your life is a hard drive,     |  (610) 353-9879(w) 
  +  Christ can be your backup.        |  (610) 853-3000(h)