Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!bunyip.cc.uq.oz.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.cis.okstate.edu!news.ksu.ksu.edu!news.mid.net!newsfeeder.gi.net!newsfeed.internetmci.com!EU.net!sun4nl!cs.vu.nl!philip
From: philip@cs.vu.nl (Philip Homburg)
Subject: Re: The better (more suitable) Unix?? FreeBSD or Linux
Nntp-Posting-Host: centaur.cs.vu.nl
References: <4gejrb$ogj@floyd.sw.oz.au> <4hirl9$nr7@gizmo.nas.nasa.gov> <Dnu8FD.CK2@pe1chl.ampr.org> <4iajie$9fn@gizmo.nas.nasa.gov>
Sender: news@cs.vu.nl
Organization: Fac. Wiskunde & Informatica, VU, Amsterdam
Date: Fri, 15 Mar 1996 20:09:41 GMT
Message-ID: <DoBs05.B19.0.-s@cs.vu.nl>
Lines: 70
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:16035 comp.os.linux.development.system:20021

In article <4iajie$9fn@gizmo.nas.nasa.gov>,
Dave Truesdell <truesdel@gizmo.nas.nasa.gov> wrote:
%First case: Restoring a large filesystem on a large machine.
%
%Here's an example of one of those 8 hour restores I mentioned. The setup: a
%500GB disk array, mounted async; 1GB memory (>500MB was allocated to the
%buffer cache); ~1.5 million i-nodes to restore; running the restore in
%single user mode (no update daemon running). If the restore had been running
%for several hours, and a hardware glitch crashed the machine, what state do
%you think the filesystem would be in? In this situation, data blocks, which
%are only written once, would age quickly and get flushed to disk as new data
%came in. How about indirect blocks? They can be updated multiple times as
%files grow, so they don't age quite as fast. Directory blocks? They can get
%written multiple times, as new files and directories are created, so they
%don't age quite so fast, either, so they're less likely to get flushed to
%disk. The same is true for inode blocks, too. So, what situation are you
%left with?
%Unless all the metadata gets written to disk, you may have most of your data
%safely on disk, but if the metadata hasn't been flushed, you may not know
%which i-nodes have been allocated; which data blocks have been allocated;
%which data blocks belong to which i-nodes; etc.

OK, 8 hours for 500GB and ~1.5 million i-nodes.

Filesystem throughput: 500GB / (8*3600 seconds) = 17MB per second
(quite impressive).
Average file size: 500GB / 1.5 million i-nodes = 333KB per file.
Files per second: 1.5 million i-nodes / (8*3600 seconds) = 52 files/second.

At these speeds I don't see why you expect blocks to be cached for a long
time. Furthermore, a filesystem that implements async metadata updates can
still provide a synchronous sync(2) system call. Even an asynchronous sync
system call which only writes all data to disk would be sufficient in this
case.

%BTW, just to see what would happen, I tried to run an fsck on the partial
%filesystem. After what seemed like several hundred screens of just about
%every error that fsck could detect, it finally dumped core.

That says something only about the quality of the fsck implementation...

%Here's a thought experiment. Let's take a small filesystem, with only one
%non-zero length file in it. Call it file "A". Delete file "A" and create a
%second non-zero length file named "B". Now, crash the system, without
%completely syncing. When you go back and examine that filesystem, what will
%you find? Will you find file "A" still in existence and intact? Will you
%find file "B" in existence and intact? What would you find if one of "A"'s
%blocks had been reused by "B"? If the integrity of the metadata is not
%maintained, you could find file "A" with a chunk of "B"'s data in it. The
%situation gets worse if the reused block is an indirect block. How would the
%system interpret data that overwrote an indirect block?

This does not `destroy' your filesystem: fsck will (should) duplicate all
blocks shared by multiple files.
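[The back-of-envelope figures above (17MB/s, 333KB/file, 52 files/second)
can be reproduced with a short script. This sketch is not part of the
original post; it assumes decimal units (1GB = 10^9 bytes, etc.), which is
consistent with the rounded numbers quoted.]

```python
# Recompute the restore figures from the quoted scenario:
# 500GB restored in 8 hours, ~1.5 million i-nodes.
GB = 1000 ** 3   # assumption: decimal units, matching the post's rounding
MB = 1000 ** 2
KB = 1000

total_bytes = 500 * GB
seconds = 8 * 3600
inodes = 1_500_000

throughput_mb_s = total_bytes / seconds / MB   # bytes/s -> MB/s
avg_file_kb = total_bytes / inodes / KB        # bytes/file -> KB/file
files_per_s = inodes / seconds                 # i-nodes restored per second

print(f"throughput:   {throughput_mb_s:.1f} MB/s")
print(f"avg file:     {avg_file_kb:.0f} KB")
print(f"files/second: {files_per_s:.1f}")
```

At roughly 52 new files and 17MB of fresh data arriving every second, even a
500MB buffer cache turns over in well under a minute, which is the point of
the reply: few blocks stay cached long enough for write ordering to matter
much here.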
%How many of those systems didn't attempt to maintain consistent metadata?
%
%I've run V6 on a PDP-11/34 in a half meg of RAM, using a pair of RK05's for
%a whopping 10MB for the filesystem. I've written trillion byte files as part
%of testing new modifications to the filesystem code. I've tested filesystems
%that claimed to be great improvements over the FFS, that I've been able to
%trash (the filesystem could *NOT* be repaired) simply by writing two large
%files simultaneously. I've seen many people who think they've invented a
%"better" filesystem and how often they've been wrong.

How do you define `could *NOT* be repaired'? How do you destroy old,
untouched files by creating new files or by deleting files?


Philip Homburg