#! rnews 6181 bsd
Newsgroups: comp.unix.bsd.bsdi.misc
Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.cs.su.oz.au!metro!metro!munnari.OZ.AU!spool.mu.edu!agate!howland.reston.ans.net!gatech!newsfeed.internetmci.com!in2.uu.net!news.new-york.net!spcuna!spcvxb!terry
From: terry@spcvxb.spc.edu (Terry Kennedy, Operations Mgr.)
Subject: Re: Help please....reboots when changin /etc
X-Nntp-Posting-Host: spcvxa.spc.edu
References: <4g5bq4$gqm@news.mixcom.com>
Sender: news@spcuna.spc.edu (Network News)
X-Nntp-Posting-User: TERRY
Organization: St. Peter's College, US
Date: Sat, 17 Feb 1996 21:33:34 GMT
Message-ID: <1996Feb17.163334.1@spcvxb.spc.edu>
Lines: 112

In article <4g5bq4$gqm@news.mixcom.com>, root@I_should_not_put_my_domain_in_etc_NNTP_INEWS_DOMAIN (root) writes:
> I have a BSDI server running. It seems to be running perfectly
> except for one major, annoying problem. Any time I change any file in the
> /etc directory, the kernel panics and reboots. I can't change the password,
> add ftp access, etc. I'm an experienced Unix administrator so it's not
> that I'm doing something stupid. Can anyone help me out with this?

  You don't say what version of BSD/OS, nor what the panic message was,
so there's not a lot I can do to help you. My advice would be to make
sure you have all of the patches for your version applied, and to read
the following article I wrote on debugging and reporting crash dumps.
Though it's targeted to the "trap type 12" panic, the debugging hints
apply to all panics.

[Begin article]

Jim Hribnak <hribnak@nucleus.com> writes:
> Trap type 12 code 0xefbf0003 eip 0xf00...
> Any help on this I would certainly appreciate it...

  [I think something similar to what follows is in the Lizard Book -
however, it bears repeating].

  Lots of the following (tracing through the crash and pinpointing what
is causing it) will almost certainly require the kernel sources to be
loaded.

  Trap type 12's come in two main flavors - bad hardware and bad
software. The hardware kind will generally happen randomly and will give
different values each time. Software problems will generally have some
sort of pattern and will usually have identical values for most of the
registers.

  For hardware problems, first try disabling external motherboard cache,
the CPU cache, then adding wait states to memory and cache, then try
swapping the motherboard with a different brand to see if the problem
goes away.

  For software problems, build a kernel with debugging enabled
(config -g). Save the resulting bsd.gdb file for later and install the
new kernel in /bsd. Boot the system and produce several crashes (if you
can make it crash, or wait for it to crash a few times). Make sure you
have a /var/crash directory and that it's got enough space (crash dumps
take up disk space equal to the amount of main memory - a 64MB system
makes a 64MB crash file).

  Now for the fun part. Do a "gdb -k /bsd.gdb bsdcore.#" and then a "bt"
to see where the system is faulting. In the following example, the
problem is in the line that starts with "#3". Looking at that line, we
see that we were in a fchmod() syscall and failed at line 1499 in
vfs_syscalls.c. If there are more crashes, we'd look at all of them to
see if they're all happening at the same place.

  Next, we look at the source for the module that's been identified, and
look at the lines around the line number called out in the debug
display. We see that they are:

1497            LEASE_CHECK(vp, p, p->p_ucred, LEASE_WRITE);
1498            VOP_LOCK(vp);
1499            if (vp->v_mount->mnt_flag & MNT_RDONLY)
1500                    error = EROFS;
1501            else {

  So it appears that we have a problem evaluating
vp->v_mount->mnt_flag. In fact, it's dereferencing a null pointer. So,
now we can ask BSDI support if they know of a problem with the fchmod
syscall crashing due to a null pointer problem. It turns out they do,
and this is fixed in 2.1.
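
  A side note on reading the trap frame in the debugging session further
down: kgdb prints the saved registers as signed decimals, while the
backtrace shows addresses in hex, so the two are awkward to compare by
eye. Here is a minimal sketch of the conversion, assuming a shell with
POSIX arithmetic expansion and integers wider than 32 bits. The name
to_hex32 is made up for this example; it is not a BSD/OS or gdb command.

```shell
# Convert a signed 32-bit decimal, as kgdb prints trap-frame fields,
# to the unsigned hex form used for addresses in the backtrace.
# to_hex32 is a hypothetical helper name, not a system command.
to_hex32() {
    # Mask to the low 32 bits so a negative value wraps to its
    # two's-complement representation.
    printf '0x%08x\n' "$(( $1 & 0xffffffff ))"
}

to_hex32 -268260428    # tf_eip from the trap frame; prints 0xf002abb4
```

  Fed the tf_eip value from the session, this gives 0xf002abb4, which is
the same address gdb reports for the fchmod frame (#3). That should be a
reasonable cross-check that the frame you are reading is the one that
took the fault.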

  Presumably when you ask them about your problem, they won't know about
it and may want your bsd.gdb and crash dumps to help pinpoint the
problem. Or, it may be obvious from the crash where the problem is. In
any event, by providing a concise description of the problem, you'll be
able to get a faster answer. You may even be able to fix the problem
yourself (if you're feeling adventurous), in which case you should send
your diff to BSDI along with the description of what problem it fixes.

  Here's the complete debugging session including finding out what user
and program caused the crash:

Script started on Wed Jan 3 08:09:18 1996
(0:1) ritz:/home/terry/crash# gdb -k /bsd.gdb bsdcore.9
#0 boot (arghowto=256) at ../../i386/i386/machdep.c:564
../../i386/i386/machdep.c:564: No such file or directory.
sp=efbffe0c pc=f00841e7 psr=0
panic: page fault
(kgdb) bt
#0 boot (arghowto=256) at ../../i386/i386/machdep.c:564
#1 0xf0017665 in panic (fmt=0xf00865b3 "page fault")
    at ../../kern/subr_prf.c:126
#2 0xf0086973 in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -267868940,
    tf_esi = -257798528, tf_ebp = -272629928, tf_isp = -272630100,
    tf_ebx = 0, tf_edx = 91936, tf_ecx = 26400, tf_eax = 0, tf_trapno = 12,
    tf_err = 0, tf_eip = -268260428, tf_cs = 8, tf_eflags = 66198,
    tf_esp = 0, tf_ss = -260425472}) at ../../i386/i386/trap.c:227
#3 0xf002abb4 in fchmod (p=0xf07a3900, uap=0xefbfffa4, retval=0xefbfff9c)
    at ../../kern/vfs_syscalls.c:1499
#4 0xf0086f3b in syscall (frame={sf_edi = 120196, sf_esi = 120100,
    sf_ebp = -272639472, sf_ebx = 2400, sf_edx = 0, sf_ecx = 0,
    sf_eax = 124, sf_eflags = 535, sf_eip = 24425, sf_cs = 31,
    sf_esp = -272639484, sf_ss = 39}) at ../../i386/i386/trap.c:556
(kgdb) set $p = (struct proc *) 0xf07a3900
(kgdb) print $p.p_pid
$1 = 28142
(kgdb) quit
(0:2) ritz:/home/terry/crash# ps -aux -M bsdcore.9 | grep 28142
benjamin 28142 0.0 0.3 328 208 p4- R 7:28AM 0:06.77 /home/benjamin
(0:3) ritz:/home/terry/crash# exit

Script done on Wed Jan 3 08:11:02 1996

[End article]

        Terry Kennedy             Operations Manager, Academic Computing
        terry@spcvxa.spc.edu      St. Peter's College, Jersey City, NJ USA
        +1 201 915 9381 (voice)   +1 201 435-3662 (FAX)