*BSD News Article 18640

Path: sserve!newshost.anu.edu.au!munnari.oz.au!constellation!osuunx.ucc.okstate.edu!moe.ksu.ksu.edu!crcnis1.unl.edu!wupost!howland.reston.ans.net!darwin.sura.net!udel!sbcs.sunysb.edu!stark.UUCP!gene
From: gene@cs.sunysb.edu!stark (Gene Stark)
Newsgroups: comp.os.386bsd.questions
Subject: Re: KERNEL PANIC ANYONE
Date: 19 Jul 93 14:35:00
Organization: Gene Stark's home system
Lines: 44
Message-ID: <GENE.93Jul19143500@stark.uucp>
References: <22222@durer.cme.nist.gov> <1993Jul12.180752.29982@prepress.com>
	<GENE.93Jul14074659@stark.uucp> <CA8wL9.7G4@veda.is>
	<GENE.93Jul17074743@stark.uucp> <22brtkINNi99@kralizec.zeta.org.au>
NNTP-Posting-Host: stark.uucp
In-reply-to: bde@kralizec.zeta.org.au's message of 19 Jul 1993 01:57:40 +1000

In article <22brtkINNi99@kralizec.zeta.org.au> bde@kralizec.zeta.org.au (Bruce Evans) writes:

   [Lots of previous quotes about bootstrap and kernel bzeroing BSS]

   Maybe that problem I noticed with the DMA bounce buffer at 0xFE098000?
   Together with the extra 7 pages for the initial page table, this would
   make the limit 0xFE091000.
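
Bruce's figure can be checked with a little arithmetic.  The sketch below
is only an illustration: it assumes 4 KB i386 pages and assumes the seven
page-table pages sit directly below the bounce buffer.  The macro and
function names are made up for this example and do not come from the
kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE        0x1000u      /* assumed: 4 KB i386 page */
#define BOUNCE_BUF_ADDR  0xFE098000u  /* address quoted by Bruce */
#define INIT_PT_PAGES    7u           /* "extra 7 pages" from the quote */

/* Highest address the kernel image may reach if INIT_PT_PAGES pages
 * must still fit between its end and the bounce buffer. */
static uint32_t kernel_load_limit(void)
{
    assert(BOUNCE_BUF_ADDR % PAGE_SIZE == 0);  /* buffer is page aligned */
    return BOUNCE_BUF_ADDR - INIT_PT_PAGES * PAGE_SIZE;
}

/* Nonzero if a kernel ending at `end` stays at or under that limit. */
static int kernel_fits(uint32_t end)
{
    return end <= kernel_load_limit();
}
```

Under these assumptions a kernel ending at 0xFE090000 fits and one ending
at 0xFE092000 does not.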

Now I really don't know what's going on.  In trying to upgrade to
patchkit 0.2.4 I started getting crashes when I installed the new interrupt
code.  The problem appeared to be triggered by my TW-523 driver, because
the system crashed when I first tried to use that device.
(Bruce, you know about this because I assumed it had something to do with
the interrupt code, and I subjected you to a barrage of E-mail messages :-))

After a long weekend of debugging, the best I could determine was that
a stray pointer somewhere was causing memory to be trashed, leading to
a crash *much later* when a corrupted stack caused a return from "hardclock"
to jump into never-never land.  I did get a chance to appreciate in detail
Bruce's very nice work on the interrupt code, though.

I had noticed the "kernel must bzero" messages coming from the bootstrap
program, but I had ignored them up until that point.  It occurred to me
that a nonzero initialization of a BSS pointer variable could produce
problems of the type that I was seeing.  I rebuilt my kernel with some
inessential stuff left out so that it loads under 0xFE090000, and the
system has been up ever since.
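
To illustrate the kind of failure I have in mind, here is a user-space
sketch, not kernel code.  The structure, the driver routine, and the fill
byte are all hypothetical; the only point is that code which trusts a BSS
pointer to start out NULL skips its setup when that memory was never
zeroed.  It also assumes, as on the 386, that an all-zero-bits pointer
is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver's uninitialized (BSS) state. */
struct fake_bss {
    char *buf;                  /* driver assumes this starts out NULL */
};

static char real_buffer[64];

/* Lazy setup that trusts BSS to be zero.  Returns 1 if it installed
 * the buffer, 0 if it believed one was already there. */
static int driver_open(struct fake_bss *bss)
{
    assert(bss != NULL);
    if (bss->buf == NULL) {
        bss->buf = real_buffer;
        return 1;
    }
    return 0;
}

/* Simulate what was left in BSS before the driver ran.
 * A fill byte of 0 corresponds to a properly cleared BSS. */
static int buffer_is_valid_after_open(unsigned char fill)
{
    struct fake_bss bss;

    memset(&bss, fill, sizeof bss);
    driver_open(&bss);
    return bss.buf == real_buffer;
}
```

With a fill byte of 0 the driver ends up pointing at its real buffer; with
a nonzero fill it keeps the garbage pointer, and any later write through
it would trash memory somewhere else.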

I saw the comments in locore.s, and I assumed that I had correctly surmised
the problem.  Now when I take a closer look, I realize that I don't have the
foggiest idea.  I don't know about the bounce buffer stuff.  Where is it
defined?  I don't have any SCSI device drivers loaded, and I only have 8MB
of memory.  I haven't yet dug in to find out what is going on with the page
tables, GDT, etc., and I don't have ready access to the proper hardware
manuals to really understand what is going on.

So, I still conclude that there is a very subtle problem with loading
kernels in a certain size range, but I am not at all sure what is
causing it.

							- Gene Stark

--
							stark@cs.sunysb.edu