Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!bunyip.cc.uq.oz.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.ysu.edu!usenet.ins.cwru.edu!agate!howland.reston.ans.net!newsfeed.internetmci.com!inet-nntp-gw-1.us.oracle.com!news.caldera.com!news.cc.utah.edu!park.uvsc.edu!usenet
From: Terry Lambert <terry@lambert.org>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Subject: Re: The better (more suitable)Unix?? FreeBSD or Linux
Date: 12 Feb 1996 00:41:05 GMT
Organization: Utah Valley State College, Orem, Utah
Lines: 117
Message-ID: <4fm2b1$ivs@park.uvsc.edu>
References: <4er9hp$5ng@orb.direct.ca> <strenDM7Gr4.Cn2@netcom.com> <DMD8rr.oIB@isil.lloke.dna.fi> <4f9skh$2og@dyson.iquest.net> <4fg8fe$j9i@pell.pell.chi.il.us>
NNTP-Posting-Host: hecate.artisoft.com
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:14119 comp.os.linux.development.system:17740

orc@pell.chi.il.us (Orc) wrote:
] Do you have any concrete evidence to back up this assertion?

How about proof by induction?

If I have N metadata writes outstanding, then in case of a crash, I must resolve the inconsistency caused by N(N-1) potentially "correct" states for the outstanding metadata (we can assume one write per state if we assume idempotent operations occur atomically).

Ext2fs allows N pending metadata writes. UFS allows 1 pending metadata write.

For Ext2fs, the number remains N(N-1). For UFS, the number is 1(1-1), or 0 (which is to say, the recovery process is deterministic).

] No, this isn't a Linux vs FreeBSD debate, though it's certainly
] one of the things that makes FreeBSD less attractive for my news
] machine; I keep people on one side stating that writing metadata
] out of order is safer than treating metadata like anything else,

You mean "in order".
Synchronous metadata writes ensure that the writes are "in order" with respect to other metadata, which is to say that the FS structure may be deterministically recovered in case of a failure to write metadata.

Synchronous writes are actually unnecessary, as long as a delayed ordered write mechanism is employed to ensure idempotence. They are just the easiest way to implement ordering guarantees.

It is the ordering guarantees that are important, not the synchronicity or non-synchronicity of the underlying mechanism for making those guarantees.

UFS in Solaris and in SVR4 ES/MP (UnixWare 2.x) uses a delayed ordered write mechanism as part of the file system multithreading for support of Symmetric MultiProcessing. Use of synchronicity to provide ordering guarantees precludes reentrancy for metadata operations.

Other facilities, such as journalling and logging, *also* provide ordering guarantees.

The best paper I have seen on this so far is Gregory R. Ganger and Yale N. Patt's paper "Metadata Update Performance in File Systems", where they propose a mechanism they term "soft updates". A related paper, Eric H. Herrin II and Raphael A. Finkel's "The Viva File System", goes into some detail on what constitutes an idempotent vs. a non-idempotent operation, and where you must guarantee order atomicity -- as does the UCB "SPRITE" paper.

] and I've seen people on the other side mentioning that writing the
] metadata, then, at some distant time in the future, coming back and
] putting the data down opens a wonderful window of opportunity for
] squeaky-clean-looking but completely garbaged files.

Yes and no.

Yes in the case of a recovery, since with N (N>1), there are O((N-1)^2) potential "consistent" states to which the file system may be restored by the post-event recovery process.
No in the case that async I/O on non-metadata data will potentially cause it to be corrupt anyway -- just not in such a way as to cause the file system to be inconsistent, and therefore unrunnable.

UFS is concerned that the recovered state match the intended state prior to the crash down to the granularity of a single operation. It is also concerned with strict implementation of POSIX update semantic guarantees.

Ext2fs is concerned that the recovered state match the intended state prior to the crash down to the granularity of the number of operations potentially outstanding at the far end of the sync frequency window.

] And my experience running news on filesystems without
] synchronous metadata writes certainly hasn't shown any
] vulnerability, even when I've been running beta software like
] a software disk array that showed the distressing tendency to
] lock up and die when being driven hard. (Okay, so it's possible
] that every time it died it caused filesystem problems only on
] the articles which I didn't read, but it certainly never
] corrupted the directory structure; that's only happened when
] I've foolishly dropped too many power eaters into the machine
] and had the disks starve in the middle of a metadata write.)

Most likely you haven't hit the window. The disk syncing window on ext2fs is smaller than the UFS window (i.e., it is synced more frequently in an attempt to foreshorten the window). This reduces the probability in direct proportion to the MTBF of your power supply or other event that may cause a spontaneous reboot (or require a user-directed reset without a normal shutdown).

This does not mean that the window is not there.

As far as successful recovery following a soft failure: all file system recovery tools will, when run, result in a consistent file system structure.
The question is what is the probability of arriving at the "correct" consistent state given a large number of "potential" consistent states resulting from the permutations of predicted outcome for all potential outstanding metadata operations at the time of the crash.


                                        Regards,
                                        Terry Lambert
                                        terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.
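
For readers who want the "delayed ordered write" idea from the post in concrete form, here is a minimal sketch in Python. It is not taken from UFS, Solaris, or the soft updates paper; the PendingWrite class, the flush_ordered function, and the three example updates are all hypothetical. The only thing carried over from the post is the ordering rule: a write may be delayed, but it is never issued before the write it depends on. The sketch also assumes the dependencies contain no cycles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingWrite:
    """One delayed metadata write (hypothetical structure, for illustration)."""
    what: str                          # description of the update
    depends_on: Optional[int] = None   # index of a write that must go out first
    written: bool = False              # True once the write has been issued


def flush_ordered(queue: List[PendingWrite]) -> List[str]:
    """Issue every queued write, returning the order in which they went out.

    Nothing is written until flush time (delayed), but a write is never
    issued ahead of the write it depends on (ordered).
    Assumes the dependency chain is acyclic.
    """
    issued: List[str] = []

    def flush_one(i: int) -> None:
        w = queue[i]
        if w.written:
            return
        if w.depends_on is not None:
            flush_one(w.depends_on)    # push the prerequisite out first
        w.written = True
        issued.append(w.what)

    for i in range(len(queue)):
        flush_one(i)
    return issued


if __name__ == "__main__":
    # Queued in the "wrong" order on purpose: the directory entry is queued
    # before the inode it points at has been initialized.
    queue = [
        PendingWrite("add directory entry for new file", depends_on=1),
        PendingWrite("initialize inode for new file"),
        PendingWrite("update directory mtime", depends_on=0),
    ]
    for step, what in enumerate(flush_ordered(queue), start=1):
        print(step, what)
```

Run as a script, this lists the inode initialization first, then the directory entry, then the mtime update, even though they were queued in a different order. Nothing in it is synchronous: the writes sit in the queue until flush time, and only their relative order is constrained.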