*BSD News Article 17906

Newsgroups: comp.os.386bsd.bugs
Path: sserve!newshost.anu.edu.au!munnari.oz.au!uunet!elroy.jpl.nasa.gov!swrinde!cs.utexas.edu!utah-morgan!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Nethack
Message-ID: <1993Jul3.055522.4000@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University, Ogden, UT
References: <20cab6$b2d@binkley.cs.mcgill.ca> <C990xF.43n@sneaky.lonestar.org> <C9J9H8.Ltu@sneaky.lonestar.org>
Date: Sat, 3 Jul 93 05:55:22 GMT
Lines: 124

In article <C9J9H8.Ltu@sneaky.lonestar.org> gordon@sneaky.lonestar.org (Gordon Burditt) writes:
>Ok, I managed to duplicate the "nethack problem" in a much simpler program.
>System:  386bsd, patchkit 0.2.3.

[ ... ]

>Now, run gub twice.  The second time, you get password file data for the
>before AND after printouts.  The modified page is hanging around 
>somewhere in the system between executions.
>No, this highly useful program :-( is NOT being used by more than one
>user at a time, and it doesn't fork(), and the sticky bit isn't set.
>% cmp bug gub
>The changes DO NOT get written back to the executable.  Maybe
>some other bug does this, but this one doesn't.
>% cp bug gub
>Run gub.  You get the correct BEFORE printout this time.
>Somehow the data hanging around in the system got flushed or fixed.
>Run gub.  The problem is back again.
>% cmp bug gub
>Changes still not written back to executable ...
>
>The bar and baz variables are needed to make sure the page foo is on
>is not written to by ordinary store instructions as well as read().
>
>
>Now, the question I have is, with this bug in the system, why does
>it stay up for more than 10 minutes?  Why can I run the compiler
>without it crashing?  
>
>Is there a 486-specific fix for this (set the WP bit in the cr0 register?  
>anything else needed or is that alone enough?)

This would probably be enough if the process creation code didn't depend
on write protection being unenforced during process creation.
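
For reference, a minimal sketch of the bit in question.  WP is bit 16 of
CR0 on the 486; the 386 does not have it.  The helper names cr0_with_wp()
and cr0_wp_enabled() are made up for illustration and only manipulate a
copy of the register value; actually loading CR0 takes a privileged move
in kernel mode, which is not shown here.

```c
#include <stdint.h>

/* CR0.WP is bit 16 on the 486 and later; the 386 has no such bit. */
#define CR0_WP ((uint32_t)1 << 16)

/* Hypothetical helper: return the CR0 value with WP turned on. */
static uint32_t cr0_with_wp(uint32_t cr0)
{
    return cr0 | CR0_WP;
}

/* Hypothetical helper: nonzero if supervisor-mode writes to
 * read-only pages would fault with this CR0 value. */
static int cr0_wp_enabled(uint32_t cr0)
{
    return (cr0 & CR0_WP) != 0;
}
```

With WP clear (or on a 386), a kernel-mode store into a read-only user
page simply succeeds, which is why the copy-on-write fault never happens
for data written by read().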

Given exactly the behaviour you have described (note: it is *still*
possible for a write-back to occur to the swap store file given the
particular conditions in place during a system shutdown), the problem is
obvious.  The question is which of the three things that aren't being
done is the root cause:

1)	Data pages for the process are being written, but aren't being
	marked dirty like they should be.

2)	When the reference count drops to 0 on an in-core vnode, the
	FS cache buffers associated with it aren't marked invalid and
	returned to the pool.  This is a result of the way vnodes are
	shared between all processes (and is incidentally a pain in the
	ass, since it limits the maximum size of on-disk inode data
	to 188 bytes, a nice binary number -- NOT!).  The only fix would
	be to make the vnode a field in the inode instead of the other
	way around and set up the hash function to have a list of
	per-file-system inodes containing the vnodes that get passed
	around.  The vnode code itself sucks (in vgoneall() in vfs_subr.c,
	the stupid thing is removed from the hash chain without the
	inode hash being cleaned out until later -- vgoneall() is called
	on the swap store vnode in exit(), which is called from rexit(),
	which is called from the exit system call).

3)	When the hash list is checked on open in iget() in ufs_inode.c,
	if the thing isn't found, the initialization doesn't make sure
	that all associated cache buffers are freed (this could be argued
	to be a problem in getnewvnode(), since the problem is probably
	common to all file systems).  Since you haven't made any other
	references to files, you get the same vnode as before and ...
	the cache chain serendipitously points to the pages at the
	addresses you expect.  You can probably amuse yourself for hours
	by writing two programs and watching them contaminate each other.
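
A toy user-space model of causes 2 and 3, under loud assumptions: none of
this is the 386BSD code, and toy_vnode, toy_getnewvnode(), toy_vrele() and
toy_read() are hypothetical stand-ins.  It only shows how a free list that
hands back the vnode just released, with its buffer still marked valid,
reproduces the "stale data on the second run" symptom, and how
invalidating on allocation hides it.

```c
#include <assert.h>
#include <string.h>

#define NVNODE 4
#define BUFSZ  16

/* Toy stand-in for a vnode with one attached cache buffer. */
struct toy_vnode {
    int  in_use;
    int  buf_valid;   /* buffer is believed to match the file */
    char buf[BUFSZ];
};

static struct toy_vnode pool[NVNODE];
static int freelist[NVNODE] = { 0, 1, 2, 3 };  /* freelist[0] is the head */
static int nfree = NVNODE;

/* Take the vnode at the head of the free list.  With invalidate set,
 * any buffer left over from the previous user is thrown away. */
static struct toy_vnode *toy_getnewvnode(int invalidate)
{
    assert(nfree > 0);
    int idx = freelist[0];
    memmove(&freelist[0], &freelist[1],
            (size_t)(nfree - 1) * sizeof freelist[0]);
    nfree--;

    struct toy_vnode *vp = &pool[idx];
    vp->in_use = 1;
    if (invalidate) {
        vp->buf_valid = 0;
        memset(vp->buf, 0, BUFSZ);
    }
    return vp;
}

/* Put a vnode back at the HEAD of the free list, buffer untouched. */
static void toy_vrele(struct toy_vnode *vp)
{
    assert(nfree < NVNODE);
    memmove(&freelist[1], &freelist[0],
            (size_t)nfree * sizeof freelist[0]);
    freelist[0] = (int)(vp - pool);
    nfree++;
    vp->in_use = 0;
}

/* "Read" the file: only goes to disk if the buffer is not valid. */
static const char *toy_read(struct toy_vnode *vp, const char *ondisk)
{
    if (!vp->buf_valid) {
        strncpy(vp->buf, ondisk, BUFSZ - 1);
        vp->buf[BUFSZ - 1] = '\0';
        vp->buf_valid = 1;
    }
    return vp->buf;
}
```

Get a vnode with toy_getnewvnode(0), read it, scribble on vp->buf, release
it with toy_vrele(), and get one again: you are handed the same slot and
toy_read() returns the scribbled data rather than what is "on disk".
Passing 1 to toy_getnewvnode() forces the reload.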

As to why the compiler doesn't have this behaviour, it's probably because
it does all of the following:

1)	Explicitly calls exit(); the "exit by running off the end of the
	program" cleanup seems to have a bug as well.

2)	It opens a bunch of vnodes after the one it's using (you do this
	too, but you put them back on the freelist in the same order so
	that when your program exits, its vnode is always the first one).

3)	It jumps around enough in its memory that the modified data is
	forced to swap (otherwise you'd run out of memory fast) instead of
	just hanging around off the vnode.


So:

1)	Put used vnodes on the end of the hash list instead of the start
	(this is a false fix, since the vnode you are using may be the
	last free one -- oh c'mon!  it *could* happen).

2)	Change the vnode allocation/deallocation interface  (this needs to
	be done anyway to allow real file systems to be written).

3)	Fix the copyin/copyout routines to mark written pages dirty, and
	to check for and fake page faults (there's a fix for this, but it
	generates false faults).

4)	Set the flag on the 486 and fix the process start code.

5)	Fix the process exit code so the non-exit() call process exit will
	work correctly.

6)	Fix getnewvnode() so that it invalidates the buffers (if not actually
	deallocating them).

7)	Fix the process swapping so that it comes from swap rather than the
	program image (this will also get rid of the bizarre crashes that
	can occur when you are out of memory, allow real system dumps without
	needing more memory than swap has, and speed up swapping at the cost
	of slowing down startup slightly, since copy to swap can be done on
	an as-needed basis and the current tradeoff is "optimizing the boot
	code at the expense of the program").
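
To make cause 1 and fix 3 concrete, here is a toy user-space model, not
the kernel routines themselves.  Everything in it (toy_pte,
toy_cow_fault(), toy_copyout_unchecked(), toy_copyout_checked()) is a
hypothetical stand-in.  The assumption is a read-only, copy-on-write
mapping of a page shared with the program image: the unchecked path
writes straight through the mapping the way a kernel-mode store does when
write protection is not enforced, and the checked path fakes the fault
first and then marks the page dirty.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGESZ 32

/* Toy page-table entry: where the page lives and how it may be used. */
struct toy_pte {
    char *page;      /* backing memory */
    int   writable;  /* 0 = read-only mapping (e.g. copy-on-write) */
    int   dirty;     /* set once the page has been modified */
};

/* Stand-in for the fault handler: give the process a private copy. */
static int toy_cow_fault(struct toy_pte *pte)
{
    char *priv = malloc(PAGESZ);
    if (priv == NULL)
        return -1;
    memcpy(priv, pte->page, PAGESZ);
    pte->page = priv;
    pte->writable = 1;
    return 0;
}

/* Protection unenforced: the store lands in the shared page and the
 * dirty flag is never set. */
static void toy_copyout_unchecked(struct toy_pte *pte, const char *src,
                                  size_t len)
{
    assert(len <= PAGESZ);
    memcpy(pte->page, src, len);
}

/* Check first, fake the fault if needed, then write and mark dirty. */
static int toy_copyout_checked(struct toy_pte *pte, const char *src,
                               size_t len)
{
    assert(len <= PAGESZ);
    if (!pte->writable && toy_cow_fault(pte) != 0)
        return -1;
    memcpy(pte->page, src, len);
    pte->dirty = 1;
    return 0;
}
```

With the unchecked version the shared page ends up holding the new data
and nothing records that it changed; with the checked version the shared
page is untouched and the process gets its own dirty copy.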



Suffice it to say, it's a rather involved set of problems, not just a single
problem, and there are people working on it, but no, there's not a fix yet
(but actually explicitly calling exit() might help for right now).


					Terry Lambert
					terry@icarus.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.