*BSD News Article 61611

Newsgroups: comp.unix.bsd.bsdi.misc
Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.cs.su.oz.au!metro!metro!munnari.OZ.AU!spool.mu.edu!agate!howland.reston.ans.net!gatech!newsfeed.internetmci.com!in2.uu.net!news.new-york.net!spcuna!spcvxb!terry
From: terry@spcvxb.spc.edu (Terry Kennedy, Operations Mgr.)
Subject: Re: Help please....reboots when changin /etc
X-Nntp-Posting-Host: spcvxa.spc.edu
References: <4g5bq4$gqm@news.mixcom.com>
Sender: news@spcuna.spc.edu (Network News)
X-Nntp-Posting-User: TERRY
Organization: St. Peter's College, US
Date: Sat, 17 Feb 1996 21:33:34 GMT
Message-ID: <1996Feb17.163334.1@spcvxb.spc.edu>
Lines: 112

In article <4g5bq4$gqm@news.mixcom.com>, root@I_should_not_put_my_domain_in_etc_NNTP_INEWS_DOMAIN (root) writes:
> 	I have a BSDI server running.  It seems to be running perfectly 
> except for one major, annoying problem.  Any time I change any file in the
> /etc directory, the kernel panics and reboots.  I can't change the password,
> add ftp access, etc.  I'm an experienced Unix administrator so it's not
> that I'm doing something stupid.  Can anyone help me out with this?

  You don't say what version of BSD/OS, nor what the panic message was, so
there's not a lot I can do to help you. My advice would be to make sure you
have all of the patches for your version applied, and to read the following
article I wrote on debugging and reporting crash dumps. Though it's targeted
to the "trap type 12" panic, the debugging hints apply to all panics.

[Begin article]
Jim Hribnak <hribnak@nucleus.com> writes:

> Trap type 12 code 0xefbf0003 eip 0xf00...

> Any help on this I would certainly appreciate it...

  [I think something similar to what follows is in the Lizard Book - however,
it bears repeating].

  Much of what follows (tracing through the crash and pinpointing what is
causing it) will almost certainly require the kernel sources to be installed.

  Trap type 12s (page faults) come in two main flavors - bad hardware and
bad software. The hardware kind will generally happen at random, with
different register values each time. Software problems will generally show
a pattern, usually with identical values in most of the registers.
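
  As a rough illustration of that rule of thumb, here is a minimal sketch
that compares the faulting addresses collected from several crashes. The
function name, the labels it returns, and the cutoffs are my own assumptions,
not anything BSD/OS provides, and the sample addresses are illustrative only.

```python
from collections import Counter


def classify_crashes(fault_addresses):
    """Guess the likely cause from the faulting address of each crash.

    Heuristic only (an assumption, not a diagnosis): identical addresses
    suggest software, all-different addresses suggest hardware.
    """
    if len(fault_addresses) < 2:
        return "unknown"  # one crash cannot show a pattern
    most_common_count = Counter(fault_addresses).most_common(1)[0][1]
    if most_common_count == len(fault_addresses):
        return "software"  # every crash faulted at the same place
    if most_common_count == 1:
        return "hardware"  # no two crashes agree
    return "unclear"  # partial agreement; collect more crashes


if __name__ == "__main__":
    print(classify_crashes([0xF002ABB4, 0xF002ABB4, 0xF002ABB4]))
```

  Comparing only one register is a simplification; in practice you would
look at the whole trap frame and the backtrace as well.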

  For hardware problems, first try disabling the external motherboard cache,
then the CPU cache, then adding wait states to memory and cache, and finally
swapping the motherboard for a different brand to see if the problem goes
away.

  For software problems, build a kernel with debugging enabled (config -g).
Save the resulting bsd.gdb file for later and install the new kernel in
/bsd. Boot the system and collect several crashes (either by provoking them,
if you know how, or by waiting for them to happen). Make sure you have a
/var/crash directory with enough free space: each crash dump takes up as
much disk space as the machine has main memory, so a 64MB system makes a
64MB crash file.
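
  The space arithmetic is simple enough to check by hand, but as a minimal
sketch it might look like the following. The helper names are hypothetical,
the amount of main memory is passed in rather than detected, and the only
rule assumed is the one above (one dump is as large as main memory).

```python
import shutil

MB = 1024 * 1024


def dumps_that_fit(free_bytes, ram_bytes):
    """How many full crash dumps fit, assuming each is as large as RAM."""
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    return free_bytes // ram_bytes


def crash_dir_capacity(ram_bytes, crash_dir="/var/crash"):
    """Number of dumps the file system holding crash_dir can still take."""
    return dumps_that_fit(shutil.disk_usage(crash_dir).free, ram_bytes)


if __name__ == "__main__":
    # A 64MB machine with 200MB free has room for three dumps.
    print(dumps_that_fit(200 * MB, 64 * MB))
```

  Since you want several crashes to compare, it is worth making sure the
answer is comfortably more than one before you start.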

  Now for the fun part. Run "gdb -k /bsd.gdb bsdcore.#" and then "bt" to
see where the system is faulting. In the example session at the end of this
article, the problem is in the line that starts with "#3". Looking at that
line, we see that we were in an fchmod() syscall and failed at line 1499 in
vfs_syscalls.c. If there are more crashes, we'd look at all of them to see
if they're all happening at the same place.
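
  If you have many dumps to compare, picking out the interesting frame can
be scripted. What follows is only a sketch: it assumes the "bt" output has
the shape shown in this article, and it assumes that boot, panic and trap
are the frames that merely report the crash. The function names are made
up for the example, and the sample text is abbreviated from the session in
this article.

```python
import re

# Frames that only report the crash; the faulting code is below them.
# This set is an assumption based on the backtrace in this article.
REPORTING_FRAMES = {"boot", "panic", "trap"}

FRAME_START = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\w+) \(")
LOCATION = re.compile(r"\bat (\S+):(\d+)$")


def split_frames(bt_text):
    """Group backtrace lines into frames; continuation lines are joined."""
    frames = []
    for line in bt_text.splitlines():
        if line.startswith("#"):
            frames.append(line.strip())
        elif frames and line.strip():
            frames[-1] += " " + line.strip()
    return frames


def first_suspect_frame(bt_text):
    """Return (frame number, function, file, line) for the first frame that
    is not a crash-reporting frame, or None if no such frame is found."""
    for frame in split_frames(bt_text):
        start = FRAME_START.match(frame)
        where = LOCATION.search(frame)
        if not start or not where:
            continue
        number, function = int(start.group(1)), start.group(2)
        if function not in REPORTING_FRAMES:
            return number, function, where.group(1), int(where.group(2))
    return None


# Abbreviated copy of the backtrace (trap frame contents shortened).
SAMPLE_BT = """\
#0  boot (arghowto=256) at ../../i386/i386/machdep.c:564
#1  0xf0017665 in panic (fmt=0xf00865b3 "page fault")
    at ../../kern/subr_prf.c:126
#2  0xf0086973 in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -267868940,
      tf_esp = 0, tf_ss = -260425472}) at ../../i386/i386/trap.c:227
#3  0xf002abb4 in fchmod (p=0xf07a3900, uap=0xefbfffa4, retval=0xefbfff9c)
    at ../../kern/vfs_syscalls.c:1499
"""

if __name__ == "__main__":
    print(first_suspect_frame(SAMPLE_BT))
```

  Running this over the "bt" output of each dump and comparing the results
is a quick way to see whether the crashes all land on the same source line.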

  Next, we look at the source for the module that's been identified, and
look at the lines around the line number called out in the debug display.

  We see that they are:

1497	LEASE_CHECK(vp, p, p->p_ucred, LEASE_WRITE);
1498	VOP_LOCK(vp);
1499	if (vp->v_mount->mnt_flag & MNT_RDONLY)
1500		error = EROFS;
1501	else {

  So it appears that we have a problem evaluating vp->v_mount->mnt_flag.
In fact, it's dereferencing a null pointer.

  So, now we can ask BSDI support if they know of a problem with the fchmod
syscall crashing due to a null pointer problem. It turns out they do, and
this is fixed in 2.1. Your own problem will presumably be one they don't
already know about, so they may want your bsd.gdb file and crash dumps to
help pinpoint it. Or, it may be obvious from the crash where the problem
is. In any event, by providing a concise description of the problem, you'll
be able to get a faster answer. You may even be able to fix the problem
yourself (if you're feeling adventurous), in which case you should send your
diff to BSDI along with a description of the problem it fixes.

  Here's the complete debugging session including finding out what user and
program caused the crash:

Script started on Wed Jan  3 08:09:18 1996
(0:1) ritz:/home/terry/crash# gdb -k /bsd.gdb bsdcore.9
#0  boot (arghowto=256) at ../../i386/i386/machdep.c:564
../../i386/i386/machdep.c:564: No such file or directory.
sp=efbffe0c pc=f00841e7 psr=0
panic: page fault
(kgdb) bt
#0  boot (arghowto=256) at ../../i386/i386/machdep.c:564
#1  0xf0017665 in panic (fmt=0xf00865b3 "page fault")
    at ../../kern/subr_prf.c:126
#2  0xf0086973 in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -267868940, 
      tf_esi = -257798528, tf_ebp = -272629928, tf_isp = -272630100, 
      tf_ebx = 0, tf_edx = 91936, tf_ecx = 26400, tf_eax = 0, tf_trapno = 12, 
      tf_err = 0, tf_eip = -268260428, tf_cs = 8, tf_eflags = 66198, 
      tf_esp = 0, tf_ss = -260425472}) at ../../i386/i386/trap.c:227
#3  0xf002abb4 in fchmod (p=0xf07a3900, uap=0xefbfffa4, retval=0xefbfff9c)
    at ../../kern/vfs_syscalls.c:1499
#4  0xf0086f3b in syscall (frame={sf_edi = 120196, sf_esi = 120100, 
      sf_ebp = -272639472, sf_ebx = 2400, sf_edx = 0, sf_ecx = 0, 
      sf_eax = 124, sf_eflags = 535, sf_eip = 24425, sf_cs = 31, 
      sf_esp = -272639484, sf_ss = 39}) at ../../i386/i386/trap.c:556
(kgdb) set $p = (struct proc *) 0xf07a3900
(kgdb) print $p.p_pid
$1 = 28142
(kgdb) quit
(0:2) ritz:/home/terry/crash# ps -aux -M bsdcore.9 | grep 28142
benjamin 28142  0.0  0.3   328  208  p4- R     7:28AM    0:06.77 /home/benjamin
(0:3) ritz:/home/terry/crash# exit
Script done on Wed Jan  3 08:11:02 1996
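
  One note on reading the trap frame above: gdb prints the saved registers
as signed decimal numbers, which makes them hard to match against the hex
addresses in the backtrace. A minimal sketch of the conversion, assuming
32-bit registers (the helper name is made up for the example):

```python
def as_u32_hex(value):
    """Show a signed 32-bit register value as an unsigned hex address."""
    return "0x%08x" % (value & 0xFFFFFFFF)


if __name__ == "__main__":
    # tf_eip from frame #2 of the session above
    print(as_u32_hex(-268260428))  # prints 0xf002abb4
```

  The result, 0xf002abb4, is the same address gdb shows for frame #3, which
confirms that the saved eip in the trap frame points into fchmod().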
[End article]

	Terry Kennedy		  Operations Manager, Academic Computing
	terry@spcvxa.spc.edu	  St. Peter's College, Jersey City, NJ USA
        +1 201 915 9381 (voice)   +1 201 435-3662 (FAX)