*BSD News Article 92238

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.ecn.uoknor.edu!solace!nntp.se.dataphone.net!nntp.uio.no!newsfeed.nacamar.de!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!newsxfer3.itd.umich.edu!news1.best.com!nntp1.ba.best.com!not-for-mail
From: dillon@flea.best.net (Matt Dillon)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.unix.bsd.bsdi.misc,comp.sys.sgi.misc
Subject: Re: no such thing as a "general user community"
Date: 29 Mar 1997 01:58:46 -0800
Organization: BEST Internet Communications, Inc.
Lines: 98
Message-ID: <5hip4m$ss7@flea.best.net>
References: <331BB7DD.28EC@net5.net> <5hfl3n$a3t@fido.asd.sgi.com> <5hh5n2$9q8@flea.best.net> <5hhi67$1gl@fido.asd.sgi.com>
NNTP-Posting-Host: flea.best.net
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:37980 comp.unix.bsd.bsdi.misc:6493 comp.sys.sgi.misc:29496

:In article <5hhi67$1gl@fido.asd.sgi.com>,
:Ray Chen <rcc@tilt.engr.sgi.com> wrote:
:>Ok, I've stayed out of this so far because I've been busy
:>fixing bugs and making some of those performance improvements
:>people have been grumping about us not doing :-) but I've got
:>to jump in now.
:>
:>In article <5hh5n2$9q8@flea.best.net>,
:>Matt Dillon <dillon@flea.best.net> wrote:
:>>    I don't think a log based filesystem will be much of a win over FFS
:...
:>
:>Matt, if you can give me more data to make it easier for us to
:>reproduce the scenarios you've seen, I'll try and see that the
:>problems are fixed.  Just because most of our customers don't do
:>sustained paging for example, that doesn't mean IRIX shouldn't do
:>it well.

    I'll email you on this.

:>But I'm sorry.  The filesystem comments are just flat-out wrong.
:>
:>FFS will *always* be slower doing file creates than XFS or for that
:>matter, any good journalling filesystem.
:>
:>The fundamental problem with FFS is that to guarantee safety, when
:>you do file deletion/creation, the writes have to be ordered.  The
:>first set of updates have to hit the disk regardless of how you order
:>the directory update and inode deallocation before the second set hits.

    Actually, this isn't entirely true.  To guarantee safety on file
    create, the only thing you need to do is guarantee that the inode
    is pre-cleared (before the create).  You can then update the inode
    and directory entry asynchronously.  If a crash occurs, fsck
    will either find a directory entry pointing to a clear inode,
    or an inode without an associated directory entry.
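
    The create argument above can be checked with a toy model.  This is
    only a sketch, not FFS code: the names classify() and
    create_crash_outcomes(), the state labels, and the verdict strings
    are all made up for illustration.  It enumerates which of the two
    asynchronous writes reached the disk before a crash and what an
    fsck-style pass could conclude.

```python
from itertools import product


def classify(inode_state, dirent_present):
    """Toy stand-in for what an fsck-style pass could conclude (hypothetical)."""
    if dirent_present and inode_state == "clear":
        return "repairable: entry points to clear inode"
    if dirent_present and inode_state == "stale":
        # Old contents (e.g. a named pipe) look like a valid inode.
        return "undetected: entry points to stale inode"
    if not dirent_present and inode_state == "allocated":
        return "repairable: inode without directory entry"
    return "consistent"


def create_crash_outcomes(precleared=True):
    """Map (inode_write_landed, dirent_write_landed) -> verdict after a crash."""
    # State of the on-disk inode before the create starts.
    before = "clear" if precleared else "stale"
    outcomes = {}
    for inode_landed, dirent_landed in product([False, True], repeat=2):
        inode_state = "allocated" if inode_landed else before
        outcomes[(inode_landed, dirent_landed)] = classify(inode_state, dirent_landed)
    return outcomes


if __name__ == "__main__":
    for landed, verdict in create_crash_outcomes().items():
        print(landed, verdict)
```

    In this model, with the inode pre-cleared, every combination is
    either consistent or one of the two cases named above.  Calling
    create_crash_outcomes(precleared=False) shows the case the
    pre-clear is meant to rule out: a directory entry that lands on
    its own and points at whatever the inode used to be.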

    To guarantee safety on file delete, the inode must be cleared
    synchronously but the directory entry (if it does not split itself
    across a sector boundary) can be updated asynchronously.  If a crash
    occurs, fsck will possibly find a directory entry to a cleared
    inode.
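
    The delete case can be sketched the same way.  Again this is a toy
    model under stated assumptions, not FFS code: fsck_verdict() and
    delete_crash_outcomes() are hypothetical names, and the model
    assumes the directory entry fits in one sector so its update lands
    atomically.  Because the inode clear is forced to disk first, the
    reachable on-disk states form a simple sequence.

```python
def fsck_verdict(inode_state, dirent_present):
    """Toy stand-in for what an fsck-style pass could conclude (hypothetical)."""
    if dirent_present and inode_state == "clear":
        return "repairable: entry points to cleared inode"
    if not dirent_present and inode_state == "allocated":
        return "repairable: inode without directory entry"
    return "consistent"


def delete_crash_outcomes():
    """List the on-disk states a crash can leave when the inode clear lands first."""
    # Each state is (inode_state, dirent_present), in the order writes land.
    reachable = [
        ("allocated", True),   # crash before anything lands: file still intact
        ("clear", True),       # synchronous inode clear landed, dirent write pending
        ("clear", False),      # asynchronous directory update landed as well
    ]
    return [(state, fsck_verdict(*state)) for state in reachable]


if __name__ == "__main__":
    for state, verdict in delete_crash_outcomes():
        print(state, verdict)
```

    The only non-consistent state in the list is the directory entry
    pointing to a cleared inode.  The state ("allocated", False) never
    appears, because the ordering puts the inode clear on disk before
    the directory update can land.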

:>Otherwise, you can get nice anomalies like a file changing to a named
:>pipe because the inode happened to be a named pipe before it was deleted
:>and reallocated as a file.

    ... which is easy to fix:  since you have to update the inode on delete
    anyway, you might as well clear it (or mark it as unallocated).

:>Enough about journalling vs. FFS.  People have talked about XFS's
:>main claims to fame.  I'd like to set the record straight.
:>
:>XFS's main claims to fame are the S-words:  speed, scalability, safety.
:>
:>Speed:  we're fast.  We hit >300 MB/sec the first day we shipped and
:>that number's been going up ever since.  As the I/O hardware gets
:>faster, so do we.  We've done >500 MB/sec for something like over
:...
:>Scalability:  we can work on big files and filesystems.  80 GB
:>filesystems are routine.  So are 12 GB files.  We work on large
:...
:>directories.  Put a million files into a directory.  The filesystem
:>still runs fast.
:>
:>Safety:  if your computer crashes for some reason, the 80 GB filesystem 
:>recovers in < 15 seconds and it's just fine.

    These are all good points.  I agree completely.

:>>    crash very often), especially if FFS is further adjusted to set the 
:>>    clean bit on mounted filesystems that have been synced up and are idle.
:>>
:>>						-Matt
:>
:>We have 24x7 customers running high-availability configurations who
:>would disagree with you about fsck.  They don't *ever* want to run
:>fsck on a 40 GB filesystem.  If they crash, they want to be back up
:>fast.  fsck is just too slow.

    This is a valid point too, though I would never personally design such
    a system myself... too much can go wrong with the complex hardware
    AND software that makes up such a configuration.  I might use the
    configuration, but it would be in a dual-redundant machine setup
    rather than a quick-reboot setup.  Either that or I would use a
    dedicated NFS box.  I just don't trust complex operating systems
    (UNIX is a relatively complex operating system) enough... the worst
    thing that can happen is that a kernel bug in some unrelated subsystem
    will corrupt filesystem data.

						-Matt

:>--
:>Raymond C. Chen, PhD                 rcc@sgi.com
:>Member of Technical Staff            Silicon Graphics, Inc. (SGI)
:>High-End Operating Systems           Generic Disclaimer:  I speak only for me.