Xref: sserve comp.bugs.4bsd:1909 comp.unix.bsd:5500 comp.sys.sun.admin:5366
Path: sserve!manuel!munnari.oz.au!yoyo.aarnet.edu.au!news.adelaide.edu.au!cs.adelaide.edu.au!cagney
From: cagney@cs.adelaide.edu.au (Andrew Cagney - aka Noid)
Newsgroups: comp.bugs.4bsd,comp.unix.bsd,comp.sys.sun.admin
Subject: Bug fix for amd, stop it `hanging machines'
Message-ID: <19p2vdINNb4i@huon.itd.adelaide.edu.au>
Date: 23 Sep 92 06:31:09 GMT
Reply-To: cagney@cs.adelaide.edu.au (Andrew Cagney - aka Noid)
Followup-To: comp.unix.bsd
Organization: Comp Sci, Uni of Adelaide, Australia
Lines: 75
NNTP-Posting-Host: winnie.cs.adelaide.edu.au

[ Followups to: comp.unix.bsd ]

Below is a patch for amd5.3beta that fixes one bug that results in amd
`hanging' a unix system. The bug was present in earlier versions of
amd. A machine which has this problem will have an error-hook in the
amq output and a large number of processes stuck in the wait state.

The problem occurs when (for a given mount map):

1. upon evaluating a poorly constructed mount map entry
   (eg cd /bug/bad) amd finds that no members match, so it creates
   an error-hook. eg:

	bad	type:=link;fs:=/usr/local;host==edam

   (the host here is not edam, so nothing matches -> error-hook)

2. another entry in the same map (eg cd /bug/good) is evaluated. It
   may or may not result in a mount; however, one of its members
   also doesn't match.

	good	type:=link;fs:=/usr/local;host==achilles \
		|| type:=link;fs:=/no.such.app

   (the first entry misses as the host isn't achilles either, the
   second entry is ok)

3. a reference to the first entry (eg cd /bug/bad) is made, all
   within 19 seconds, which is the timeout for the error-hook entry.

Things go wrong at point 2. Amd maintains an internal list of all the
`mounts', including error-hook mounts. During 2, amd attempts to re-use
the error-hook created in 1. In doing this, the routine find_mntfs()
incorrectly changes the error-hook status to (effectively) `being
mounted in the background'. This is something that will never
finish :-).

From this point on, any lookup on the above map that finds the error
hook will be marked as `being mounted in the background' and will
hence never return.

The patch below stops amd from modifying the error-hook entry when
find_mntfs() finds an error-hook mount (I don't see any reason for
modifying it).

I should note that this patch only fixes the above case. Similar
problems occur (on rare occasions) when a remote mount is being
slow. Does anyone have a more general fix?

			Andrew Cagney
			Computer Science
			Adelaide University

*** mntfs.c.orig	Mon Aug 10 17:58:09 1992
--- mntfs.c	Mon Aug 10 18:52:45 1992
***************
*** 171,184 ****
--- 171,186 ----
  			if (ops == &efs_ops) {
  				/*
  				 * If the existing ops are not efs_ops
  				 * then continue...
  				 */
  				if (mf->mf_ops != &efs_ops)
  					continue;
+ 				else
+ 					return dup_mntfs(mf);
  			} else /* ops != &efs_ops */ {
  				/*
  				 * If the existing ops are efs_ops
  				 * then continue...
  				 */
  				if (mf->mf_ops == &efs_ops)
  					continue;
.......................
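
For readers without the amd source to hand, the effect of the patch can be sketched with a toy model. This is not amd's code: the struct, the field names, the STATE_PENDING value and both lookup functions are hypothetical stand-ins made up for illustration, and the sketch assumes that dup_mntfs() does no more than take another reference on the existing entry. It only shows why resetting a re-used error-hook is harmful and how returning early avoids it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for amd's internal mount list. */
enum kind { KIND_ERROR_HOOK, KIND_REAL };

#define STATE_PENDING (-1)      /* models "being mounted in the background" */

struct entry {
    const char *mount_point;
    enum kind   kind;
    int         state;          /* STATE_PENDING, 0 = mounted, >0 = failed */
    int         refcount;
};

/* Unpatched behaviour as described in the post: any matching entry is
 * re-used and its state is reset as if a new mount had just started. */
struct entry *lookup_unpatched(struct entry *tab, size_t n,
                               const char *mp, enum kind want)
{
    assert(tab != NULL && mp != NULL);
    for (size_t i = 0; i < n; i++) {
        struct entry *e = &tab[i];
        if (strcmp(e->mount_point, mp) != 0)
            continue;
        /* error-hooks only match error-hooks, real mounts only real mounts */
        if ((want == KIND_ERROR_HOOK) != (e->kind == KIND_ERROR_HOOK))
            continue;
        e->state = STATE_PENDING;   /* harmful for an error-hook: nothing
                                       will ever complete this "mount" */
        e->refcount++;
        return e;
    }
    return NULL;
}

/* Patched behaviour: a matching error-hook is handed back untouched,
 * apart from the extra reference. */
struct entry *lookup_patched(struct entry *tab, size_t n,
                             const char *mp, enum kind want)
{
    assert(tab != NULL && mp != NULL);
    for (size_t i = 0; i < n; i++) {
        struct entry *e = &tab[i];
        if (strcmp(e->mount_point, mp) != 0)
            continue;
        if ((want == KIND_ERROR_HOOK) != (e->kind == KIND_ERROR_HOOK))
            continue;
        if (e->kind == KIND_ERROR_HOOK) {
            e->refcount++;          /* early return, state left alone */
            return e;
        }
        e->state = STATE_PENDING;   /* real mounts are still (re)started */
        e->refcount++;
        return e;
    }
    return NULL;
}
```

In this model, a failed error-hook that goes through lookup_unpatched() comes back in the pending state, so every later waiter blocks on it; through lookup_patched() it keeps its failure status and callers get their error straight away.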