Newsgroups: comp.os.386bsd.development
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Compressing file system ?
Message-ID: <1993Aug9.233341.20631@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University, Ogden, UT
References: <EgN6R_K00WB5JOZpRc@andrew.cmu.edu>
Date: Mon, 9 Aug 93 23:33:41 GMT
Lines: 98

In article <EgN6R_K00WB5JOZpRc@andrew.cmu.edu> "Alex R.N. Wetmore" <aw2t+@andrew.cmu.edu> writes:

[ ... compressed FS ... ]

>LKM's would seem to be the way to do this.  On the other hand, it seems
>that having shared libraries would give a large enough increase in disk
>space, without all of the headaches of compression (although shared libs
>have enough headaches themselves).

Yes and no (OK, I jumped in 8-)).

LKM's provide the ability to load junk into the kernel; there is still a
piece missing, and that's the ability to attribute the file and to do the
compression on block boundaries (as has been suggested).  File attribution
is the most serious issue, and almost requires a Rosenthal or Heidemann
style of vnode stacking to handle.  You *could* load layers as kernel
modules *after* that condition was met, however.

The use of an NFS file system for compressing/decompressing executables is
broken in several ways.  The most important (and most obvious) is that the
current handling of ETXTBUSY, which comes from using the image as the
program swap store, is broken, broken, broken.  Specifically, there is no
way on an NFS server to tell whether the remote machine opened the
executable to copy it, to write it, or to execute it.  The vnode getting
the flagging is an alias generated by NFS, referring to the local buffer
cache pages in any case.  If I am decompressing a file from a block buffer
and then copying the block buffer out "over the wire" (which may be a local
loopback mount), I have the problem of maintaining attribution state
between the client and the server based on the file's intended use on the
"remote machine".  An NFS implementation would work, then, but it would
work poorly.

It should be obvious that for a consistent external view, the decompression
must occur on the NFS server, but for consistent flagging of modification,
the decompression must take place on the client.  A trade-off must be made,
and it will be expensive.

An existing implementation allows reference-delay flagging of executables,
such that the decompression occurs on reference, and the recompression
occurs over an amount of idle time in which no file blocks are requested or
written.  This is clearly broken for many NFS uses (the most glaring being
a remote swap partition; the second most glaring being as backing store for
an executable, where the page requests to the server are timed out because
of the frequency of client cache hits, but the decompress/compress cycles
occur while the file is in use because not all pages are in the client
cache, or can be flushed from the client cache by large non-hit activities,
like an "ls -R" of some partition).  A partial fix that any sane person
would stuff into this code immediately is to disallow the recompress while
the file has an open reference instance; a sketch of such a guard follows.
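To make that concrete, here is a minimal sketch (not code from any of the
implementations mentioned above) of what such a guard might look like in a
compression layer's idle-recompression pass.  The cnode structure, its
fields, and cnode_recompress_blocks() are invented for illustration; only
the guard itself is the point.

#include <sys/param.h>
#include <sys/errno.h>
#include <sys/vnode.h>

/*
 * Per-file state a compression layer might keep.  All of this (the
 * cnode, its fields, CN_DECOMPRESSED) is made up for the example.
 */
struct cnode {
	struct vnode	*cn_vnode;	/* underlying vnode */
	int		 cn_idle_ticks;	/* ticks since last block read/write */
	int		 cn_flags;
};
#define	CN_DECOMPRESSED	0x0001		/* currently expanded on disk */

static int cnode_recompress_blocks(struct cnode *);	/* hypothetical */

/*
 * Called from the layer's idle sweep: recompress only files that are
 * expanded, idle long enough, and -- the partial fix above -- not
 * currently referenced at all (open, mmap'd, or in use as text).
 */
static int
cnode_try_recompress(struct cnode *cp, int idle_threshold)
{
	struct vnode *vp = cp->cn_vnode;

	if ((cp->cn_flags & CN_DECOMPRESSED) == 0)
		return (0);		/* nothing to do */

	/* v_usecount/VTEXT as in the CSRG vnode; adjust to taste. */
	if (vp->v_usecount > 0 || (vp->v_flag & VTEXT))
		return (EBUSY);		/* open reference: leave it alone */

	if (cp->cn_idle_ticks < idle_threshold)
		return (EAGAIN);	/* not idle long enough yet */

	return (cnode_recompress_blocks(cp));
}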
I have also seen a suggestion that the compression be on block boundaries;
this would be acceptable if a different block handling mechanism were used,
such that the cache blocks were all 8k or so and the on-disk blocks were
divided into low-penalty fragments.  With the exception of the last block
in the file, the on-disk blocks would all be considered to contain 8k worth
of data, and the in-core buffers would be filled from smaller-than-expected
on-disk blocks through a decompressing fbread and a compressing fbwrite
that copied to a separate block from the vmcache (a rough sketch of such a
read/write pair is tacked on at the end of this post).  This would imply
that no unification was possible, or, if unification was to be done, that
there must be a change in the vnode flags to include whether or not the
file is compressed.  This is in line with Peter's "chmod +c" suggestion,
and with the treatment of cache buffers as compressed or uncompressed based
on which side of the block I/O interface they were on.  I arbitrarily
picked 8k to keep the minimum fragment size at 1k after it had been halved.

The main problem with this approach is that the fragmentation and
allocation policies on the FS must be drastically changed; for instance,
since frags are used for all file contents, cylinder grouping must be
thrown out the window.  In addition, fragments must no longer be considered
second class citizens, and this would certainly interfere with their
ability to be shared between files.

This would also get around the mmap problem, as the 8k buffers hung off the
vnode as "buffer cache" would be in uncompressed format and could be
considered atomic for the purposes of mapping.

Perhaps the best implementation of a compressed file scheme I have seen was
the one in the UCLA Ficus implementation in BSD4.4; it has many of the
attributes I have tagged as "desirable" (noting that the aesthetics are
dictated by teeny, tiny disks rather than by clean kernel implementation or
fast access).

I'd encourage anyone who wants to get into building file systems to do so;
there are very few live projects of this type out there, and I can count
the number of commercial file system vendors (that's all they do) on one
hand.  I'd also point you to ftp.ucla.com (where you can get the Heidemann
papers) and to the various proceedings of Usenix for the past 4 years,
where there have been many papers on this and other file system topics.

Happy hacking!


					Terry Lambert
					terry@icarus.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.
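(The promised sketch of the decompressing fbread / compressing fbwrite
shape, tacked on here at the end.  None of this is FFS code; cfs_fbread(),
cfs_fbwrite(), the cblk header, and the squeeze()/expand() helpers are all
invented names, and a real layer would read the header fragment first
rather than the whole worst-case extent.  In-core buffers are always a full
CFS_BSIZE; on-disk blocks are whatever smaller number of 1k frags the data
squeezed into.)

#include <sys/param.h>		/* roundup(), size_t, daddr_t, u_long */
#include <sys/errno.h>

#define	CFS_BSIZE	8192	/* logical (in-core) block size */
#define	CFS_FSIZE	1024	/* on-disk fragment size */

struct cblk {			/* per-block on-disk header */
	u_long	cb_clen;	/* compressed length, in bytes */
	/* cb_clen bytes of compressed data follow */
};

/* Hypothetical helpers: raw fragment I/O and a compressor of your choice. */
int	frag_read(daddr_t blkno, void *dst, size_t nbytes);
int	frag_write(daddr_t blkno, const void *src, size_t nbytes);
size_t	squeeze(const void *src, size_t slen, void *dst, size_t dlen);
size_t	expand(const void *src, size_t slen, void *dst, size_t dlen);

/*
 * Fill a full-sized in-core buffer from a smaller-than-expected
 * on-disk block (the decompressing "fbread").
 */
int
cfs_fbread(daddr_t blkno, char *cache_buf)	/* cache_buf: CFS_BSIZE bytes */
{
	char ondisk[sizeof(struct cblk) + CFS_BSIZE];
	struct cblk *cb = (struct cblk *)ondisk;
	int error;

	if ((error = frag_read(blkno, ondisk, sizeof(ondisk))) != 0)
		return (error);
	if (cb->cb_clen > CFS_BSIZE ||
	    expand(ondisk + sizeof(*cb), cb->cb_clen,
		cache_buf, CFS_BSIZE) != CFS_BSIZE)
		return (EIO);		/* corrupt, or never compressed */
	return (0);
}

/*
 * Compress a full-sized in-core buffer into a separate staging block
 * before it crosses the block I/O interface (the compressing "fbwrite");
 * the caller allocates roundup(cb_clen, CFS_FSIZE) worth of frags.
 * Incompressible blocks would need a "stored as-is" escape; elided here.
 */
int
cfs_fbwrite(daddr_t blkno, const char *cache_buf)
{
	char staging[sizeof(struct cblk) + CFS_BSIZE];
	struct cblk *cb = (struct cblk *)staging;

	cb->cb_clen = squeeze(cache_buf, CFS_BSIZE,
	    staging + sizeof(*cb), CFS_BSIZE);
	return (frag_write(blkno, staging,
	    sizeof(*cb) + roundup(cb->cb_clen, CFS_FSIZE)));
}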