Newsgroups: comp.os.386bsd.bugs
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!howland.reston.ans.net!math.ohio-state.edu!caen!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Nethack
Message-ID: <1993Jun29.181749.5833@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University, Ogden, UT
References: <20bfrm$le7@pdq.coe.montana.edu> <20cab6$b2d@binkley.cs.mcgill.ca> <C990xF.43n@sneaky.lonestar.org>
Date: Tue, 29 Jun 93 18:17:49 GMT
Lines: 78

In article <C990xF.43n@sneaky.lonestar.org> gordon@sneaky.lonestar.org (Gordon Burditt) writes:
>> Whenever I tried to play it more than once or twice, it would
>> die (on start up) complaining of some "init-prob error on 4 (215%)"
>> or SOMETHING like that (my memory fails me).
>
>I get this also, sometimes. The error suggests that the sum of the
>probabilities in some table don't add to 100%. In this case, since
>the program worked once, it means the tables have been trashed.
>
>Some additional information:
>
>After a failure, on a quiet system, you get the same failure, over and
>over. If you compare the installed executable vs. the one in the
>build directory, they are identical. HOWEVER, if you copy the
>executable in the build directory over the installed one, the problem
>seems to go away. For a while.
>
>After a failure, doing something time-consuming and disk-intensive,
>like building a kernel or grepping the news spool, the problem may
>go away.
>
>Conclusion, totally without proof:
>
>Modified data is getting cached somewhere, probably in the VM system.
>I suspect Nethack is modifying read-only storage somehow and the
>modified version is getting re-used. But I haven't found the place
>where it's doing it.

This shouldn't be true. The data cache for a particular inode is in a
list off the vnode for that inode.
The in-core memory structure for the inode is a copy of the on-disk
inode, held as a substructure of the in-core vnode -- there is no raw
in-core inode. When the reference count on the vnode goes to 0, it is
placed on the free list. Writeback of modified data takes place at
this time.

Since the text pages are marked read only, obviously writes are
occurring that are not trapped. This is not an unreasonable assumption,
given that writes are not trapped through the normal mechanism on 386
processors -- a write to a read-only page from supervisor mode does
not generate an exception; only a write from user mode does.

Data pages getting written to and not being copied on write to swap
(*this* is what gives the symptoms you are seeing!) can result from
one of four situations:

1) The data pages are being written back to the file.

2) You are running multiple copies simultaneously, so that data pages
   are being written only in core, but the core copy is shared.

3) Some global data is assumed to be aggregate initialized (most
   likely to 0) by the compiler, yet this is not occurring (i.e., a
   compiler bug).

4) Some stack variables are being used before they are initialized,
   and you are getting the same pages for your stack on consecutive
   runs.

The way this is handled in most protected mode OS's on the
brain-damaged Intel architecture is to ensure that copy-on-write data
pages are marked read only, and that copy-on-write is actually handled
during the trap... this implies a reverse (or indexed) lookup to
determine whether the trap is occurring on a real read-only page or on
a copy-on-write page marked read only to generate the trap.

A piece of this solution was in the first patchkit as part of the
write fault fix. It is part of the general solution to the overall
problem of using your real program image as a swap store instead of
using real swap.

Implementations, anyone?
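
To make the trap-time lookup concrete, here is a minimal user-space
sketch in C. It is only a simulation under stated assumptions, not
386BSD code: struct pte, map_image(), write_fault() and store_byte()
are hypothetical names invented for illustration, the "page table" is
a plain array, and a second array stands in for swap-backed private
pages. The point it shows is the per-page lookup that tells a real
read-only page apart from a copy-on-write page that was marked read
only just to force the trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 16   /* tiny pages keep the simulation small */
#define NPAGES    4

/* Hypothetical software page-table entry; NOT the real 386 PTE layout. */
struct pte {
    unsigned char *frame;  /* storage currently backing this virtual page */
    bool hw_writable;      /* what the MMU would enforce */
    bool cow;              /* software flag: read-only only to force a trap */
};

enum fault_result { FAULT_COPIED, FAULT_VIOLATION };

static unsigned char image[NPAGES][PAGE_SIZE];        /* shared program image */
static unsigned char private_pool[NPAGES][PAGE_SIZE]; /* stands in for swap */
static struct pte page_table[NPAGES];

/* Map every page read-only onto the shared image. The first n_text pages
 * are genuine text; the remaining pages are data, flagged copy-on-write. */
void map_image(size_t n_text)
{
    for (size_t i = 0; i < NPAGES; i++) {
        page_table[i].frame = image[i];
        page_table[i].hw_writable = false;
        page_table[i].cow = (i >= n_text);
    }
}

/* Write-fault handler: the indexed lookup decides which kind of
 * read-only page took the trap. */
enum fault_result write_fault(size_t vpn)
{
    assert(vpn < NPAGES);
    struct pte *p = &page_table[vpn];

    if (!p->cow)
        return FAULT_VIOLATION;           /* real read-only page */

    /* Copy-on-write: give the process a private copy, then allow writes. */
    memcpy(private_pool[vpn], p->frame, PAGE_SIZE);
    p->frame = private_pool[vpn];
    p->hw_writable = true;
    p->cow = false;
    return FAULT_COPIED;
}

/* Simulated store instruction. Returns false on a genuine violation. */
bool store_byte(size_t vpn, size_t off, unsigned char value)
{
    assert(vpn < NPAGES && off < PAGE_SIZE);

    if (!page_table[vpn].hw_writable &&
        write_fault(vpn) == FAULT_VIOLATION)
        return false;

    page_table[vpn].frame[off] = value;
    return true;
}
```

In this sketch, after map_image(2) a store to page 3 succeeds and lands
in the private copy while the shared image stays untouched, and a store
to page 0 is refused. If the write never trapped at all (the supervisor
mode case), the store would go straight into the shared image, which
matches the symptom of a "trashed" table surviving across runs.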

Terry Lambert
terry@icarus.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.