Newsgroups: comp.os.386bsd.development
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: File Truncation Philosophy
Message-ID: <1993Apr13.203234.16408@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <1993Apr8.025858.22137@uvm.edu> <1993Apr11.035322.19610@fcom.cc.utah.edu> <C5FJx6.o5w@ns1.nodak.edu>
Date: Tue, 13 Apr 93 20:32:34 GMT
Lines: 111

In article <C5FJx6.o5w@ns1.nodak.edu> tinguely@plains.NoDak.edu (Mark Tinguely) writes:
>the dumb approach.
> Once the file starts executing, fail writes to the file. Though this is
> extremely simple, it is dumb because once opened successfully, writes
> should not fail. Also Terry points out EBUSY is not POSIX compliant.

EBUSY *is* in POSIX; it's ETXTBSY that's missing. But the return value
of EBUSY would be incorrectly overloaded (as if it were a lock) on
return from open(). There is also the issue of an EBUSY resulting from
a read, write, or other operation on a file; this seems (from my
reading) to be an illegal return. Definitely not acceptable.

>better (than the dumb) approach.
> When a program wants to execute an already open file (again as Terry
> said preferably a writable open file) or open a executable file, copy
> the file as a temporary in the filesystem. By adding a new vnode
> reference to the vnode structure, we can allow other programs that
> also start executing the now open file, to use this copy of the program
> (so we do not fill the filesystem with these temporaries).
>
> We have two choices of the life time of the temporary (assume the
> original write lasts longer than the running of the program), we can
> keep the lifetime of the temporary until we close the write.
> In this way we keep the file closer to the original and cut down in the
> copying overhead (in a sense you could think of this like the old days
> when programs were copied to swap and used the sticky bit). On the other
> hand we could make the temporary disappear with the last use and if it
> gets executed several time, a new copy is made (just like normal
> execute). Obviously if the temporary executes longer than the write,
> the temporary will stay around until the program finishes.
>
> I think appropriate approach is closing the file after all the copies of
> executing programs have ended and creating a new one if needed.
>
> When this thread was started, I was thinking we would have to implement
> this temporary file approach. Do we lose anything by this temporary in
> the filesystem versus in the VM (swap/memory)?

Uh, um, erg... >*ahem*<... uh...

Well, there seem to be two issues to consider. One is the dnlc
(directory name lookup cache) and the other is subsequent opens.

The dnlc (which isn't called the dnlc, and seems to be less general
purpose than you'd want in the 386BSD implementation) caches file names
shorter than some watermark length, the vnode pointer for the directory
that the file name lives in, and the vnode for the open instance of
that file. What this boils down to is that opens and/or lookups don't
necessarily fetch an inode reference to do the open if there is a cache
hit.

Since the exec doesn't have its own lookup hooks at the VFS interface,
there is no way to distinguish what kind of open is occurring: an open
for exec, which should return the "temporary" inode, or a plain open,
which should return the "real" inode. I don't see a way the distinction
could be made at the VFS layer between two in-core vnodes pointing to
the same file... the lookup will pick one or the other (the first one
in cache) to return.

This leaves us with two possible approaches to implement the temporary
inode mechanism, both of them unpleasant.
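
As an illustration of that ambiguity, here is a toy model in Python. It
is not 386BSD code, and `Vnode`, `NameCache`, `enter`, and `lookup` are
all made-up names. It assumes a name cache keyed only on the directory
vnode and the component name, so whatever was cached first is what every
caller gets back, whether the caller is open() or exec:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Vnode:
    """Toy stand-in for an in-core vnode (hypothetical, not the kernel struct)."""
    ident: int
    role: str


class NameCache:
    """Toy name cache keyed on (directory vnode id, component name) only."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], Vnode] = {}

    def enter(self, directory: Vnode, name: str, vnode: Vnode) -> None:
        # The first entry for a key wins; a later one cannot coexist with it.
        self._entries.setdefault((directory.ident, name), vnode)

    def lookup(self, directory: Vnode, name: str) -> Optional[Vnode]:
        # Nothing in the key says whether the caller is open() or exec.
        return self._entries.get((directory.ident, name))


bin_dir = Vnode(1, "directory")
real = Vnode(2, "real inode")
shadow = Vnode(3, "temporary inode")

cache = NameCache()
cache.enter(bin_dir, "prog", real)
cache.enter(bin_dir, "prog", shadow)  # same key, so the first entry stays

for_open = cache.lookup(bin_dir, "prog")  # a plain open
for_exec = cache.lookup(bin_dir, "prog")  # an open on behalf of exec
print("open ->", for_open.role)
print("exec ->", for_exec.role)
```

Both callers get the same vnode back. For exec to see the temporary
one, either the cached entry has to be flushed or the caller's intent
has to reach the lookup.
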
The first is to coerce cache flushes of the UFS dnlc usages in the case
of a "shadowed" inode, and the second is to push the knowledge of the
type of lookup/open below the VFS interface (by adding an extension to
it). Both of these require changes to each file system type instance
below the VFS interface, and the second requires changes to the NFS
protocol to support an additional op.

The interface I envisioned would involve a copy-on-write-to-file of the
text pages to swap, and a re-marking of the process pages as
page-from-swap instead of page-from-file. This is expensive because
there is no back reference from a vnode pointer to the processes which
are executing the image (although we can either provide a hash-link or
a cache of the in-core structure to allow this, with an additional
flag).

The main issue to address here is copy-on-write text pages once they
are in the swap, since the default is to assume that if I have a page
in swap, it got there because I already modified it (and thus I am free
to modify it again)... a hellacious security hole in a naive
implementation.

New processes executed from a modified file would use the modified file
as a swap store until they terminate, write a text page, or the file is
modified again (making it unusable as a swap store).

I think the gymnastics can be hidden in the kernel interface presented
to the VFS by providing a modification advisory call that a file system
makes before opening a file for write or truncating (calling it with
the vnode).

Trying to copy the original image pre-modification seems to be a
mistake (this was the suggestion I made in my last post) because it
would by definition *prevent* direct unification of the VM and buffer
caches (a suggestion I also made in the last post as a speedup). A
non-anonymous unification, where control of a page is explicitly handed
off between the two, would still be a possible solution, but it's more
complicated than it has to be...
besides, it makes more sense to run the code in the file instead of the
code that used to be in the file (the only exception is the case of a
fork of the original process: it would keep the swap image of the
original file as shared text).

I think I (and others) need to hit the 1003.1 book and see what can be
slipped in through the cracks in the standard to arrive at the best
approach... this assumes POSIX compliance is a goal: it's one of mine,
but potentially not one of Bill and Lynne's... any comments?


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------