*BSD News Article 3151

Newsgroups: comp.unix.bsd
Path: sserve!manuel!munnari.oz.au!news.hawaii.edu!ames!elroy.jpl.nasa.gov!usc!sdd.hp.com!caen!hellgate.utah.edu!fcom.cc.utah.edu!gateway.univel.com!gateway.novell.com!thisbe!terry
From: terry@thisbe.npd.Novell.COM (Terry Lambert)
Subject: Re: mysterious system hangups
Message-ID: <1992Aug5.172213.829@gateway.novell.com>
Sender: terry@thisbe (Terry Lambert)
Nntp-Posting-Host: thisbe.eng.sandy.novell.com
Organization: Novell NPD -- Sandy, UT
References:  <1992Aug4.175738.7008@Unibase.SK.CA>
Date: Wed, 5 Aug 1992 17:22:13 GMT
Lines: 60


Thanks for the succinct summary!

In article <1992Aug4.175738.7008@Unibase.SK.CA>, roe@Unibase.SK.CA (Roe Peterson) writes:
|> This is in relation to the many reports of mysterious system hangups
|> during heavy disk load.
|> 
|> Summary:
|> 	- regardless of disk controller, heavy disk access (e.g.,
|> 	  extract, kernel rebuild, libc.a rebuild) seems to
|> 	  cause system lockups at unpredictable intervals.
|> 	- sync() still seems to be running, as no file-system
|> 	  damage occurs after a reset.
|> 	- interrupts seem to work (keyboard LEDs still
|> 	  respond, character echo still works, telnet
|> 	  connects, but no login process appears).
|> 	- amount of memory and swap space appear to have nothing
|> 	  to do with the problem (despite previous theories
|> 	  posted by yours truly).
|> 	- occasionally, rather than a hangup, a "panic: kmem_malloc:
|> 	  kern_map too small" happens.
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This item is not related to the hangs; it is a separate problem. The machine
has more memory than the kernel can map, given the fixed number of kernel map
entries. From James da Silva:

	"I don't think this is related to the NFS patch.  Bill posted
	 about this problem last week.  A kernel table is sized too small for
	 active machines with large (12MB or more) memory.  Try modifying the
	 value of MAX_KMAPENT in /sys/vm/vm_map.h from 500 to 1000.  It worked
	 for me."
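For reference, the suggested change amounts to a one-line edit. This assumes
MAX_KMAPENT is a plain #define, as is usual for kernel table sizes; the
surrounding contents of vm_map.h may differ between kernel versions, and the
kernel must be rebuilt for the change to take effect.

```c
/* /sys/vm/vm_map.h (excerpt; surrounding definitions omitted) */
#define MAX_KMAPENT 1000   /* was 500; too small for busy machines with 12MB+ of memory */
```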


|> 	- running manual sync commands during such disk activity
|> 	  seems to make things work better.
|> 
|> I've finally had my system stay up long enough to compile a debug
|> version of the kernel, and a different thing happens:  I get a
|> "panic: remrq"
|> 
|> Is anyone out there familiar enough with BSD to tell me if this
|> problem could be related to the sync daemon?

This happens to me on install, so I don't think the sync daemon is the cause.
It also still occurs at a low clock rate, which suggests that the problem is
not related to the lbolt value or HZ.


Per my previous posting: does anyone want to post a "dist.fs" with
O_WRITESYNC permanently wired on? As the Bugs Bunny exchange goes:
"Batten down them hatches!" "They're already battened down!" "Well, batten
'em down again! We'll teach those hatches!"
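Roe's summary notes that running manual sync commands during heavy disk
activity seems to help. Until there is a real fix, that workaround could be
automated with a small loop that calls sync() more often than the update
daemon's usual 30-second cycle. What follows is a minimal sketch in C; the
function name periodic_sync and its parameters are hypothetical, not part of
any BSD utility, and the loop only papers over the hang rather than fixing it.

```c
#define _XOPEN_SOURCE 700
#include <unistd.h>   /* sync(), sleep() */

/*
 * Hypothetical helper (not a BSD utility): ask the kernel to flush
 * dirty buffers `iterations` times, sleeping `interval` seconds
 * between flushes.  A short interval approximates typing "sync"
 * by hand during a big extract or libc.a rebuild.
 * Returns the number of sync() calls made.
 */
int periodic_sync(int iterations, unsigned int interval)
{
    int done = 0;

    while (done < iterations) {
        sync();                 /* schedule all dirty buffers for writing */
        done++;
        if (done < iterations && interval > 0)
            sleep(interval);    /* wait before the next flush */
    }
    return done;
}
```

Called as, say, periodic_sync(360, 5) from a small main() started before a
libc.a rebuild, it would flush every five seconds for about half an hour.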


					Terry Lambert
					terry_lambert@gateway.novell.com
					terry@icarus.weber.edu
---
Disclaimer:  Any opinions in this posting are my own and not those of
my present or previous employers.