*BSD News Article 5452

Xref: sserve comp.bugs.4bsd:1909 comp.unix.bsd:5500 comp.sys.sun.admin:5366
Path: sserve!manuel!munnari.oz.au!yoyo.aarnet.edu.au!news.adelaide.edu.au!cs.adelaide.edu.au!cagney
From: cagney@cs.adelaide.edu.au (Andrew Cagney - aka Noid)
Newsgroups: comp.bugs.4bsd,comp.unix.bsd,comp.sys.sun.admin
Subject: Bug fix for amd, stop it `hanging machines'
Message-ID: <19p2vdINNb4i@huon.itd.adelaide.edu.au>
Date: 23 Sep 92 06:31:09 GMT
Reply-To: cagney@cs.adelaide.edu.au (Andrew Cagney - aka Noid)
Followup-To: comp.unix.bsd
Organization: Comp Sci, Uni of Adelaide, Australia
Lines: 75
NNTP-Posting-Host: winnie.cs.adelaide.edu.au

[ Followups to: comp.unix.bsd ]

Below is a patch for amd5.3beta that fixes a bug which can cause amd to
`hang' a Unix system.  The bug is also present in earlier versions of amd.

A machine with this problem will show an error-hook in the amq output
and a large number of processes stuck in the wait state.

The problem occurs when (for a given mount map):

	1. on evaluating a poorly constructed mount map entry
	   (eg cd /bug/bad), amd finds that no members match, so it
	   creates an error-hook, eg:

		bad     type:=link;fs:=/usr/local;host==edam

	   (the host here is not edam, so nothing matches -> error-hook)

	2. another entry in the same map (eg cd /bug/good) is evaluated.
	   It may or may not result in a mount, but one of its members
	   also doesn't match.

		good    type:=link;fs:=/usr/local;host==achilles \
			|| type:=link;fs:=/no.such.app

	   (the first member misses as the host isn't achilles either; the
	   second member is ok)

	3. a reference to the first entry (eg cd /bug/bad) is made (all
	   within 19 seconds, which is the timeout for the error-hook entry).
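
The member matching in steps 1 and 2 can be sketched roughly as below.
This is only a simplified model, assuming a member is either guarded by a
single host== selector or unguarded; the struct, the function
select_member() and the host name used to call it are hypothetical and
are not taken from the amd source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of one member of a map entry: an optional host==
 * selector plus the fs:= target.  Not amd's real data structures.
 */
struct member {
    const char *host_selector;  /* NULL: no host== selector, always matches */
    const char *fs;             /* value given by fs:= */
};

/*
 * Return the first member whose selector matches this_host, or NULL when
 * none match (the case in which amd creates an error-hook).  *misses is
 * set to the number of members rejected along the way.
 */
static const struct member *
select_member(const struct member *m, size_t n, const char *this_host,
              size_t *misses)
{
    size_t i;

    assert(m != NULL && this_host != NULL && misses != NULL);
    *misses = 0;
    for (i = 0; i < n; i++) {
        if (m[i].host_selector == NULL ||
            strcmp(m[i].host_selector, this_host) == 0)
            return &m[i];
        (*misses)++;
    }
    return NULL;
}

/* The two entries from the steps above. */
static const struct member bad_entry[] = {
    { "edam", "/usr/local" },
};
static const struct member good_entry[] = {
    { "achilles", "/usr/local" },
    { NULL, "/no.such.app" },
};
```

On a host that is neither edam nor achilles, select_member() returns NULL
for bad_entry (step 1) and the /no.such.app member for good_entry after
one miss (step 2).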

Things go wrong at point 2. Amd maintains an internal list of all the `mounts'
including error-hook mounts.  During 2, amd attempts to re-use the error-hook
created in 1.  In doing this, the routine find_mntfs() incorrectly changes the
error-hook status to (effectively) `being mounted in the background', a state
that will never complete :-).

From this point on, any lookup on the above map that finds the error hook
will be marked as `being mounted in the background' and will hence never
return.

The patch below stops amd modifying the error-hook entry (I don't see any
reason for doing this) when find_mntfs() finds an error-hook mount.
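
To make the effect of the change concrete, here is a toy model of the
lookup with and without the early return.  It is a sketch under stated
assumptions, not amd's code: the struct, the state names, lookup_mount()
and the `patched' switch are all hypothetical, and the state reset on the
shared re-use path is only a guess at how the status ends up changed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not amd's real structures or flags. */
enum mount_state { ST_ERROR, ST_IN_PROGRESS, ST_MOUNTED };

struct mount_entry {
    const char *mount_point;
    int is_error_hook;          /* models mf->mf_ops == &efs_ops */
    enum mount_state state;
    int refcount;
};

/*
 * Look up mount_point in tab.  want_error_hook models ops == &efs_ops.
 * With patched == 0 a matching error-hook falls through to the shared
 * re-use path, which (by assumption) resets its state; with patched != 0
 * it is handed back untouched, as the patch does with dup_mntfs(mf).
 */
static struct mount_entry *
lookup_mount(struct mount_entry *tab, size_t n, const char *mount_point,
             int want_error_hook, int patched)
{
    size_t i;

    assert(tab != NULL && mount_point != NULL);
    for (i = 0; i < n; i++) {
        struct mount_entry *e = &tab[i];

        if (strcmp(e->mount_point, mount_point) != 0)
            continue;
        if ((!want_error_hook) != (!e->is_error_hook))
            continue;           /* never mix error-hooks and real mounts */
        if (want_error_hook && patched) {
            e->refcount++;      /* re-use the error-hook as it stands */
            return e;
        }
        if (e->state != ST_MOUNTED)
            e->state = ST_IN_PROGRESS;  /* assumed reset for a new attempt */
        e->refcount++;
        return e;
    }
    return NULL;
}
```

In this model an unpatched lookup of an error-hook leaves it in
ST_IN_PROGRESS, which nothing will ever complete, while the patched
lookup only bumps the reference count and leaves the state as ST_ERROR.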

I should note that this patch only fixes the above case.  Similar problems
occur (on rare occasions) when a remote mount is being slow.  Does anyone
have a more general fix?

					Andrew Cagney
					Computer Science
					Adelaide University


*** mntfs.c.orig        Mon Aug 10 17:58:09 1992
--- mntfs.c     Mon Aug 10 18:52:45 1992
***************
*** 171,184 ****
--- 171,186 ----
                        if (ops == &efs_ops) {
                                /*
                                 * If the existing ops are not efs_ops
                                 * then continue...
                                 */
                                if (mf->mf_ops != &efs_ops)
                                        continue;
+                               else
+                                       return dup_mntfs(mf);
                        } else /* ops != &efs_ops */ {
                                /*
                                 * If the existing ops are efs_ops
                                 * then continue...
                                 */
                                if (mf->mf_ops == &efs_ops)
                                        continue;
.......................