Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!newshost.telstra.net!act.news.telstra.net!psgrain!newsfeed.internetmci.com!in2.uu.net!news.reference.com!cnn.nas.nasa.gov!gizmo.nas.nasa.gov!not-for-mail
From: truesdel@gizmo.nas.nasa.gov (Dave Truesdell)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Subject: Re: The better (more suitable)Unix?? FreeBSD or Linux
Date: 14 Mar 1996 18:10:22 -0800
Organization: A InterNetNews test installation
Lines: 96
Message-ID: <4iajie$9fn@gizmo.nas.nasa.gov>
References: <4gejrb$ogj@floyd.sw.oz.au> <4gilab$97u@park.uvsc.edu> <4giqu8$aqk@park.uvsc.edu> <4gira2$a9d@park.uvsc.edu> <hpa.31321eee.I.use.Linux@freya.yggdrasil.com> <4h7t5i$qoh@park.uvsc.edu> <DnoqB4.2sy@pe1chl.ampr.org> <4hirl9$nr7@gizmo.nas.nasa.gov> <Dnu8FD.CK2@pe1chl.ampr.org>
NNTP-Posting-Host: gizmo.nas.nasa.gov
X-Newsreader: NN version 6.5.0 #61 (NOV)
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:15565 comp.os.linux.development.system:19516

rob@pe1chl.ampr.org (Rob Janssen) writes:

>In <4hirl9$nr7@gizmo.nas.nasa.gov> truesdel@gizmo.nas.nasa.gov (Dave Truesdell) writes:

>>The point you seem to want to ignore is, while data integrity is not
>>guaranteed, it only affects those files being written at the time of a
>>crash.  If you don't guarantee metadata integrity, you could lose *every*
>>file on an active filesystem.

>Please show us how that can happen, and how sync metadata is going to
>avoid it.  I think you are only spreading FUD.

>(or is there some inherent fragility in FFS that is not in the classic
>UNIX filesystems and ext2fs?)

The "classic" UNIX filesystem?  As opposed to those in V6 or V7?  What
makes you think the "classic" UNIX filesystem didn't have the same need
for metadata integrity?  The only difference between the "classic" days
and today is that today's systems tend to be much larger and stress a
filesystem's design and implementation to a greater extent.  And *that*
tends to exacerbate any weaknesses in either.

How would sync metadata avoid these problems?  First, as others in this
thread have pointed out, what avoids these problems is metadata
*integrity*.  Synchronous metadata update is just one method of
maintaining it; other mechanisms, such as *ordered* asynchronous
updates, would do as well.  I haven't read the ext2fs code, so for all
I know it could maintain metadata integrity by ordering its
asynchronous writes.

Now, what does metadata integrity mean for your filesystem?  I'll give
two examples: the first from experience, the second a simple thought
experiment.

First case: restoring a large filesystem on a large machine.  Here's an
example of one of those 8-hour restores I mentioned.  The setup: a
500GB disk array, mounted async; 1GB of memory (more than 500MB of it
allocated to the buffer cache); ~1.5 million i-nodes to restore; the
restore running in single-user mode (no update daemon running).  If the
restore had been running for several hours and a hardware glitch
crashed the machine, what state do you think the filesystem would be
in?

In this situation the data blocks, which are written only once, would
age quickly and get flushed to disk as new data came in.  How about
indirect blocks?  They can be updated multiple times as files grow, so
they don't age quite as fast.  Directory blocks?  They get written
multiple times as new files and directories are created, so they don't
age quite so fast either, and they're less likely to get flushed to
disk.  The same is true for inode blocks, too.
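To make that aging effect concrete, here's a toy write-back cache
simulation.  This is not FFS, ext2fs, or update-daemon code; the flush
policy, the block names, and the numbers are all invented purely to
illustrate the point:

/*
 * Toy write-back cache with a made-up aging policy: a dirty block is
 * written to disk only after it has sat unmodified for FLUSH_AGE
 * "ticks".  Re-dirtying a block resets its age.
 */
#include <stdio.h>

#define NBLOCKS   8
#define FLUSH_AGE 3

struct buf {
    const char *name;
    int dirty;      /* modified in memory, not yet on disk */
    int age;        /* ticks since last modification */
    int on_disk;    /* current contents reached stable storage */
};

static struct buf cache[NBLOCKS] = {
    { "data block 1",    0, 0, 0 },
    { "data block 2",    0, 0, 0 },
    { "data block 3",    0, 0, 0 },
    { "data block 4",    0, 0, 0 },
    { "indirect block",  0, 0, 0 },
    { "directory block", 0, 0, 0 },
    { "inode block",     0, 0, 0 },
    { "superblock",      0, 0, 0 },
};

static void bdirty(int i)   /* a write (re)dirties a block */
{
    cache[i].dirty = 1;
    cache[i].age = 0;
    cache[i].on_disk = 0;
}

static void tick(void)      /* one pass of the hypothetical flusher */
{
    int i;

    for (i = 0; i < NBLOCKS; i++) {
        if (cache[i].dirty && ++cache[i].age >= FLUSH_AGE) {
            cache[i].dirty = 0;     /* old enough: write it out */
            cache[i].on_disk = 1;
        }
    }
}

int main(void)
{
    int t, i;

    /*
     * Simulated restore: each step writes one new data block exactly
     * once, but also re-dirties the indirect, directory, inode and
     * superblock blocks, just as creating new files does.
     */
    for (t = 0; t < 4; t++) {
        bdirty(t);
        bdirty(4); bdirty(5); bdirty(6); bdirty(7);
        tick();
    }
    for (t = 0; t < FLUSH_AGE; t++) {   /* more of the same... */
        tick();
        bdirty(4); bdirty(5); bdirty(6); bdirty(7);
    }

    /* ...and then the machine crashes.  What made it to disk? */
    for (i = 0; i < NBLOCKS; i++)
        printf("%-16s %s\n", cache[i].name,
            cache[i].on_disk ? "on disk" : "LOST (dirty in memory)");
    return 0;
}

Run it and the data blocks, each touched once, all reach the disk,
while the indirect, directory, and inode blocks are still sitting dirty
in memory when the "crash" comes, because every new file reset their
age.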
So, what situation are you left with?  Unless all the metadata gets
written to disk, you may have most of your data safely on disk, but if
the metadata hasn't been flushed, you may not know which i-nodes have
been allocated, which data blocks have been allocated, which data
blocks belong to which i-nodes, and so on.

How would maintaining metadata integrity have changed things?  Just as
above, most of the data would have been flushed to disk, so no great
difference there.  What would be different is that the file structure
itself would have been maintained in a sensible state on disk, instead
of a random patchwork of inconsistent information.

While the average system running *BSD or Linux is several orders of
magnitude smaller, the situation differs only in degree, not in kind.
The large buffer cache and the lack of a running update daemon didn't
create the problem; they only exaggerated it by allowing a larger
number of inconsistencies to accumulate.  Smaller caches and periodic
syncs only narrow the window of vulnerability; they don't close it.

BTW, just to see what would happen, I tried to run fsck on the partial
filesystem.  After what seemed like several hundred screens of just
about every error fsck can detect, it finally dumped core.

Here's the thought experiment.  Take a small filesystem with only one
non-zero-length file in it; call it file "A".  Delete file "A" and
create a second non-zero-length file named "B".  Now crash the system
without completely syncing.  When you go back and examine that
filesystem, what will you find?  Will you find file "A" still in
existence and intact?  Will you find file "B" in existence and intact?
What would you find if one of "A"'s blocks had been reused by "B"?  If
the integrity of the metadata is not maintained, you could find file
"A" with a chunk of "B"'s data in it.  The situation gets worse if the
reused block is an indirect block: how would the system interpret data
that overwrote an indirect block?

>>If you ever had to manage systems where a restore takes 8 hours to run, even
>>when mounted async, you might care more about having a filesystem that
>>maintained metadata integrity.

>I have used and maintained UNIX systems for well over 12 years, I have
>had to come back in over the weekend to move 80MB filesystems or to wait hours
>just to load the base system, I have seen many interesting things happen
>after system crashes, but I *never* have seen a system or even heard of
>a system that lost all its files after a simple crash.

How many of those systems didn't attempt to maintain consistent
metadata?

I've run V6 on a PDP-11/34 in half a meg of RAM, using a pair of
RK05's for a whopping 10MB of filesystem.  I've written trillion-byte
files as part of testing new modifications to filesystem code.  I've
tested filesystems that claimed to be great improvements over FFS, and
trashed them (the filesystem could *NOT* be repaired) simply by
writing two large files simultaneously.  I've seen many people who
think they've invented a "better" filesystem, and I've seen how often
they've been wrong.
-- 
T.T.F.N., Dave Truesdell    truesdel@nas.nasa.gov/postmaster@nas.nasa.gov
Wombat Wrestler/Software Packrat/Baby Wrangler/Newsmaster/Postmaster