Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!bunyip.cc.uq.oz.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.cis.okstate.edu!news.ksu.ksu.edu!news.mid.net!newsfeeder.gi.net!newsfeed.internetmci.com!EU.net!sun4nl!cs.vu.nl!philip
From: philip@cs.vu.nl (Philip Homburg)
Subject: Re: The better (more suitable) Unix?? FreeBSD or Linux
Nntp-Posting-Host: centaur.cs.vu.nl
References: <4gejrb$ogj@floyd.sw.oz.au> <4hirl9$nr7@gizmo.nas.nasa.gov> <Dnu8FD.CK2@pe1chl.ampr.org> <4iajie$9fn@gizmo.nas.nasa.gov>
Sender: news@cs.vu.nl
Organization: Fac. Wiskunde & Informatica, VU, Amsterdam
Date: Fri, 15 Mar 1996 20:09:41 GMT
Message-ID: <DoBs05.B19.0.-s@cs.vu.nl>
Lines: 70
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:16035 comp.os.linux.development.system:20021

In article <4iajie$9fn@gizmo.nas.nasa.gov>,
Dave Truesdell <truesdel@gizmo.nas.nasa.gov> wrote:
%First case: Restoring a large filesystem on a large machine.
%
%Here's an example of one of those 8 hour restores I mentioned. The setup: a
%500GB disk array, mounted async; 1GB memory (>500MB was allocated to the
%buffer cache); ~1.5 million i-nodes to restore; running the restore in
%single user mode (no update daemon running). If the restore had been running
%for several hours, and a hardware glitch crashed the machine, what state do
%you think the filesystem would be in? In this situation, data blocks, which
%are only written once, would age quickly and get flushed to disk as new data
%came in. How about indirect blocks? They can be updated multiple times as
%files grow, so they don't age quite as fast. Directory blocks? They can get
%written multiple times, as new files and directories are created, so they
%don't age quite so fast, either, so they're less likely to get flushed to
%disk. The same is true for inode blocks, too. So, what situation are you
%left with?
%Unless all the metadata gets written to disk, you may have most of your data
%safely on disk, but if the metadata hasn't been flushed, you may not know
%which i-nodes have been allocated; which data blocks have been allocated;
%which data blocks belong to which i-nodes; etc.

OK, 8 hours for 500GB and ~1.5 million i-nodes.

Filesystem throughput: 500GB / (8*3600 seconds) = 17MB per second
(quite impressive).
Average file size: 500GB / 1.5 million i-nodes = 333KB per file.
Files per second: 1.5 million i-nodes / (8*3600 seconds) = 52 files/second.

At these speeds I don't see why you expect blocks to be cached for a long
time. Furthermore, a filesystem that implements async metadata updates can
still provide a synchronous sync(2) system call. Even an asynchronous sync
system call which only writes all data to disk would be sufficient in this
case.

%BTW, just to see what would happen, I tried to run an fsck on the partial
%filesystem. After what seemed like several hundred screens of just about
%every error that fsck could detect, it finally dumped core.

That says something only about the quality of the fsck implementation...

%Here's a thought experiment. Let's take a small filesystem, with only one
%non-zero length file in it. Call it file "A". Delete file "A" and create a
%second non-zero length file named "B". Now, crash the system, without
%completely syncing. When you go back and examine that filesystem, what will
%you find? Will you find file "A" still in existence and intact? Will you
%find file "B" in existence and intact? What would you find if one of "A"'s
%blocks had been reused by "B"? If the integrity of the metadata is not
%maintained, you could find file "A" with a chunk of "B"'s data in it. The
%situation gets worse if the reused block is an indirect block. How would the
%system interpret data that overwrote an indirect block?

This does not `destroy' your filesystem: fsck will (should) duplicate all
blocks shared by multiple files.
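[The back-of-envelope figures above (17MB/s, 333KB/file, 52 files/second)
can be reproduced with a short script. This sketch is not part of the
original post; it assumes decimal units (1GB = 10^9 bytes, etc.), which is
consistent with the rounded numbers quoted.]

```python
# Recompute the restore figures from the quoted scenario:
# 500GB restored in 8 hours, ~1.5 million i-nodes.
GB = 1000 ** 3   # assumption: decimal units, matching the post's rounding
MB = 1000 ** 2
KB = 1000

total_bytes = 500 * GB
seconds = 8 * 3600
inodes = 1_500_000

throughput_mb_s = total_bytes / seconds / MB   # bytes/s -> MB/s
avg_file_kb = total_bytes / inodes / KB        # bytes/file -> KB/file
files_per_s = inodes / seconds                 # i-nodes restored per second

print(f"throughput:   {throughput_mb_s:.1f} MB/s")
print(f"avg file:     {avg_file_kb:.0f} KB")
print(f"files/second: {files_per_s:.1f}")
```

At roughly 52 new files and 17MB of fresh data arriving every second, even a
500MB buffer cache turns over in well under a minute, which is the point of
the reply: few blocks stay cached long enough for write ordering to matter
much here.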
%How many of those systems didn't attempt to maintain consistent metadata?
%
%I've run V6 on a PDP-11/34 in a half meg of RAM, using a pair of RK05's for
%a whopping 10MB for the filesystem. I've written trillion byte files as part
%of testing new modifications to the filesystem code. I've tested filesystems
%that claimed to be great improvements over the FFS, that I've been able to
%trash (the filesystem could *NOT* be repaired) simply by writing two large
%files simultaneously. I've seen many people who think they've invented a
%"better" filesystem and how often they've been wrong.

How do you define `could *NOT* be repaired'? How do you destroy old,
untouched files by creating new files or by deleting files?


Philip Homburg