Return to BSD News archive
Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.rmit.EDU.AU!news.unimelb.EDU.AU!munnari.OZ.AU!news.ecn.uoknor.edu!qns3.qns.com!imci4!newsfeed.internetmci.com!in1.uu.net!news.artisoft.com!usenet From: Terry Lambert <terry@lambert.org> Newsgroups: comp.os.linux.development.system,comp.unix.bsd.freebsd.misc Subject: Re: Ideal filesystem Date: 10 Apr 1996 07:43:09 GMT Organization: Artisoft, Inc. Lines: 243 Message-ID: <4kfoqd$dgs@coyote.Artisoft.COM> References: <4hptj4$cf4@cville-srv.wam.umd.edu> <4jerrj$f12@park.uvsc.edu> <jlemonDp1GFM.H4I@netcom.com> <4jpjb6$77c@park.uvsc.edu> <jlemonDpEw1v.4Ez@netcom.com> NNTP-Posting-Host: hecate.artisoft.com Xref: euryale.cc.adfa.oz.au comp.os.linux.development.system:21081 comp.unix.bsd.freebsd.misc:17066 jlemon@netcom.com (Jonathan Lemon) wrote: Finally, someone who understands the hash problem which is introduced by weenieing-out on looking for the executable fork! ] % chmod +attributed_directory_bits $home/bin/foo ] % chmod +executable_directory_bits $home/bin/foo ] % chmod +attributed_directory_bits $home/bin/ls ] % rehash ] % ls ] foo ls ] % rm foo/a.out ] % foo ] % foo: executable not found. ] ] Yes, this breaks 'drop-through' shell hashing, in that if I had ] /usr/X11/bin/foo, it would not get run. So what? I don't think ] it's that big of a deal, and could probably be implemented either way. It's a gratuitous change in behaviour. It is therefore, by definition, evil. It can be avoided. It should be avoided. THe problem exists because the a.out fork is seperable from the file foo. This is an artifact of the file foo being a directory and the conents of directories being seperable -- either\ intentionally, as in your example above, or unintentionally as a result of a system carsh, bad block, whatever. The association between "foo" and it's "a.out fork" is not handled atomically, and operations on the fork are not idempotent -- therefore one cannot make consistency guarantees about it. And all because we want to abuse the existing FS instead of solving the problem below the user/kernel boundry. ] >A real index would be searchable in O(log2(n)+1) compares for ] >a total of n entries, whereas a directory used as an index ] >varies by implementation, and is never better than that, and ] >is more frequently simply O(n/2+1). ] ] I wasn't claiming that it was an efficient index, only that it _was_ one. ] Besides, O(n/2) is for a pure sequential search - if you have a multilevel ] directory, (eg: /u/j/jl/jlemon) you certainly get better than O(n/2). I was only referring to the "terminal" component, which has been converted to a directory, in the case that the a.out is "implied" by an attribute bit -- an attribute bit not guaranteed to be consistent with the a.out's existance. ] >By restricting the allowable entry manipulations to not include ] >exposure in the FS namespace, the implementation can prevent it ] >from ever being possible to dissociated a fork from the node ] >that contains it. ] ] Uh. In an ideal world, maybe. But in an ideal world, lost+found would never ] be used, either. You don't need an ideal world to kill lost+found. You need to use LFS instead of UFS on your BSD system. Or you need to use VXFS on your SVR4/UnixWAre system. Or JFS on your AIX system. Or you need to integrate the code from Appendix A of the Soft Updates paper and use UFS (putting the block count consistency fix in the mount code itself) and delete fsck entirely. I have Implemented an attributed file system before for Novell/USG as part of the NetWare for UNIX 4.x product. Once you know what you need, it's not that hard... the hard part is making sure that the changes you introduce are minimal. All you need is parent inode and fork ID. You throw away forks when you throw away the las reference for the parent inode. You use a name space escape (preferrrably POSIX, if you are willing to modify the lookup code -- we didn't for licensing reasons), and you are in. Reassociation recovery is as simple as looking for "secondary inodes" without parent references and reconnecting them using the parent pointer reference in the secondary. The interesting part of this (now sidetracked into directories vs. forks) discussion is the ability to tie code into file system events -- looking at file systems as directed graphs of events in a hierarchy (one to which you could apply ordering, commutation, association, and conflict resoloution rules in order to apply a technology like soft udates generally instead of having to use FS specific code). ] In another article of yours, I think that you mentioned something like ] "copying an EA file from one filesystem to another - either everything gets ] copied, or nothing.". ] ] Please tell me what happens when we pull the power switch in the middle of ] the copy. Does fsck just delete everything because it is an incomplete ] copy? The target is partially complete. Obviously, this would require that the copy operated on a fork by fork basis. In reality, you want the copy to be an atomic kernel operation, where possible. Specifically, if I have two filesystems mounted from a remote machine, and I copy from one to the other, I want to send a message to the server to do the copy instead of pulling the bits over the wire and pushing them back. As a logically atomic operation, the operation would rool forward or back, depending on the complexity of available logging semantics in the underlying FS. ] If the disk block containing the "root" of the EA is damaged, does that mean ] I automatically lose _all_ of the files within that EA? You mean "all the EA's in the file". The answer is "yes". You might as well ask "if the inode for a file is damages so that I don't have any of my block pointers available, do I lose the data in the blocks that were pointed to by the block pointers that are no longer there". Effectively, you have damaged the file beyond algorithmic recoverability in both cases. On the other hand, if I lose a file to lost and found (say we are stupid, and don't implement either log structuring or soft updates or synchronous ordering of any kind), I do *not* get a file for each of the attributes for the lost file -- I get one file with all EA's, intact. ] >The failure mode in the directories is from the lack of a parent ] >pointer due to its exposure in the FS name space, specifically, ] >the way in which hard links are currently implemented. ] ] This sounds like you want to have a double-linked nodes (eg, the node knows ] about it's parent), which is currently not possible, as you pointed out, since ] with hard links, a file may have more than one "parent". Leaving this aside ] for the moment, wouldn't having double links increase the failure modes of ] the filesystem? No. It would only increase the recovery modes. Since using ordered I/O *guarantees* deterministic recoverability, the links will be consistently maintained. The hard links work because you dicorce the object pointed to by the directory reference from the file. That is, each directory entry points to one object, and each object points to one file in the flat numeric namespace. The directory entry target objects are maintained as a forward linked list (again, deterministic recoverability: only one link may be severed at a time, so the ring is always recoverable). Each "directory node" is a reference instance for the flat name space object, and the flat name space object has a pointer to one parent (the original, to start with, but maybe further along the ring if the original is deleted). Since it doesn't matter which ring element is pointed to by a file, the file back-link is always recoverable. Each directory node has a pointer to its parent node. Thus each hard link describes a single vector to the root of the file system from its location. An in core open file instance must reference the directory node, not the flat node, and the directory node in core references the in core flat node -- the traditional vnode. The flat node goes in the system open file table, while the directory instance goes in the per process open file table. Thus from any open fd, you can recover the path used to open it, assuming the path still exists. An unlinked file will be detectable as such. One could easily add a "create entry for "in core directory node" to cause the entire process context to be checkpointable. ] >A fork is not a hard link, and if it is to not be limited by the ] >size of an on disk inode structure, it must not be associated ] >the same way inodes are associated with names. ] > ] >To think of it another way, DOS, OS/2, NetWare, and NTFS can not ] >lose file names because they are not directory entries, they are ] >attributes of the file. UNIX can, because logically names are ] >not attributes. ] ] Um, so I blow away the FAT table in DOS. I have not lost the filename, ] I have lost the _file_, "because the filename is an attribute of the file". ] Thanks but no thanks. In the link architecture I'm suggesting, the file name is *not* an attribute of a file. It is a pointer to a directory node. But since each directory node has only one directory entry pointing to it, even in the hard link case, and each directory node has a pointer to its parent, one can recover the name information if one stored it in an attribute, to recover the file. So the file won't be lost, it will be replaced where it came from. This is important, because we can not afford to have a fork dissociated from the file that contains it (the main drawback of the file-as-directory). Obviously, if I destroy information, then information is destroyed. To complain about that is to complain about tautology. 8-). If you blow away the inode's block pointers in current UNIX file systems (ie: clri/fsdb), you are just as screwed as the DOS example. The point to my use of the DOS example was the issue of reverse mapping, and the *option* to place a recovery attribute on a file (it being impossible to seperate an attribute instance from a file instance, except logically) and actually recover the name information. In a decent implementation, you'd roll the transaction forward or back (depending on intent versus event logging) and be done with it... the name recovery is just an example application that would want the connection to be atomic and operations on it to be idempotent. It's not my *only* example. Regards, Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.