*BSD News Article 61977

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.bhp.com.au!mel.dit.csiro.au!munnari.OZ.AU!news.ecn.uoknor.edu!qns3.qns.com!imci4!newsfeed.internetmci.com!news.msfc.nasa.gov!sol.ctr.columbia.edu!hamblin.math.byu.edu!park.uvsc.edu!usenet
From: Terry Lambert <terry@lambert.org>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Subject: Re: The better (more suitable)Unix?? FreeBSD or Linux
Date: 22 Feb 1996 18:57:52 GMT
Organization: Utah Valley State College, Orem, Utah
Lines: 95
Message-ID: <4giebg$70o@park.uvsc.edu>
References: <4er9hp$5ng@orb.direct.ca> <4g33tp$esr@park.uvsc.edu> <4g57cj$gc3@pell.pell.chi.il.us> <4g5k95$28m@park.uvsc.edu> <4ggol0$38h@pell.pell.chi.il.us>
NNTP-Posting-Host: hecate.artisoft.com
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:14221 comp.os.linux.development.system:17840

orc@pell.chi.il.us (Orc) wrote:
] >That's an easy one to answer: so if you screw up, you *only*
] >screw up the data that you are writing instead of screwing up
] >all the data on the file system (or at least some of the data
] >that was there before and totally uninvolved in the screwed
] >transaction).
] 
]     Now this is the point where I get interested. Could you describe how
] sync metadata doesn't screw up unrelated data vs async metadata?  (I'm
] really interested in your analysis of this, since it's better than the
] IS NOT! IS TOO! that this conversation has become.)

Consider:

1)	I have a file "foo" that has inode 175 and points to
	blocks 10, 20, 30, and 40

2)	I delete file foo.  Inode 175 is freed.

3)	I create file "fie", which allocates inode 176.  I
	allocate block 10.  I write sensitive information in
	block 10.

4)	The system crashes before the new block list for inode
	175 is written and before "fie" is committed to disk,
	but after block 10 is committed to disk.

This can happen because we are using async I/O: nothing enforces
the ordering of the series of structural modifications, so the
operations are not guaranteed to be idempotent.

The I/O subsystem is free to reorder async I/O's for efficiency,
said reordering potentially resulting in the scenario described.

Basically I have silently corrupted an [apparently] unrelated
file (from a user perspective, after the crash, the non-commit
of the data has "rolled back" the deletion of "foo").
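
To make the sequence concrete, here is a toy model of it.  This is
only a sketch: the dictionary "disk", the three write functions, and
the string contents are all made up for illustration, and freeing the
inode and removing the directory entry are lumped into one write to
keep it short.  No real file system is structured this way.

```python
def initial_disk():
    # On-disk state before step 2: "foo" is inode 175 with four blocks.
    return {
        "dir": {"foo": 175},
        "inodes": {175: [10, 20, 30, 40]},
        "blocks": {10: "foo-0", 20: "foo-1", 30: "foo-2", 40: "foo-3"},
    }

# The three writes requested by steps 2 and 3, in request order.
def free_inode_175(disk):
    # Simplification: directory entry and inode are released together.
    del disk["dir"]["foo"]
    del disk["inodes"][175]

def create_fie(disk):
    disk["dir"]["fie"] = 176
    disk["inodes"][176] = [10]

def write_block_10(disk):
    disk["blocks"][10] = "SENSITIVE"

def crash_after(order, n):
    # Only the first n writes of the given order reach the disk.
    disk = initial_disk()
    for write in order[:n]:
        write(disk)
    return disk

def read_file(disk, name):
    inode = disk["dir"][name]
    return [disk["blocks"][b] for b in disk["inodes"][inode]]

# Async: the I/O subsystem happened to push the data block out first,
# and the crash came before either metadata write (step 4).
reordered = [write_block_10, free_inode_175, create_fie]
after_crash = crash_after(reordered, 1)
foo_contents = read_file(after_crash, "foo")
print(foo_contents)   # ['SENSITIVE', 'foo-1', 'foo-2', 'foo-3']
```

After the crash "foo" is back, "fie" does not exist, and the first
block of "foo" holds the data that was meant for "fie".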

Alternately, I could have recommitted inode 175 to a new file,
but not updated its block list.  For instance, if the async write
to create the directory entry occurred before the async write to
reset the inode, I could easily find myself with someone else's
file in my directory.
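
The same kind of toy model covers this second case.  Again a sketch
under assumptions: the directory names "mine" and "theirs", the file
name "newfile", and the "owner" field are invented for the example.

```python
def initial_disk():
    # "foo" lives in someone else's directory and owns inode 175.
    return {
        "dirs": {"theirs": {"foo": 175}, "mine": {}},
        "inodes": {175: {"owner": "them", "blocks": [10, 20, 30, 40]}},
    }

# Writes in request order: drop the old entry, reset the inode for
# its new owner, then add the new directory entry.
def remove_foo_entry(disk):
    del disk["dirs"]["theirs"]["foo"]

def reset_inode_175(disk):
    disk["inodes"][175] = {"owner": "me", "blocks": []}

def add_new_entry(disk):
    disk["dirs"]["mine"]["newfile"] = 175

def crash_after(order, n):
    # Only the first n writes of the given order reach the disk.
    disk = initial_disk()
    for write in order[:n]:
        write(disk)
    return disk

# Async: the directory entry write went out first, then the crash.
reordered = [add_new_entry, remove_foo_entry, reset_inode_175]
after_crash = crash_after(reordered, 1)
seen = after_crash["inodes"][after_crash["dirs"]["mine"]["newfile"]]
print(seen)   # {'owner': 'them', 'blocks': [10, 20, 30, 40]}
```

My new directory entry names an inode that still carries the old
owner and the old block list.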


It's possible to come up with similar scenarios using directory
contents, etc.  The only constraint is the reuse constraint on
blocks inherent in the ordering of operations in the file system
(for instance, I could *not* allocate a block that was already
allocated).

I'm not suggesting that this is frequent, only that the probability
is non-zero.


] >It's a limitation on the amount of damage you can do.
] 
]     My initial guess would be that sync metadata would increase
] the window for corrupting data, because first you write the
] metadata, then you go back and write the data; unless you've
] got a structure where the metadata is at one end of the disk
] and the data is at the other, it seems like it would average
] 1.5 passes over the platter to write things out.  Contrarily,
] if metadata and data is mixed together, it will get dumped to
] disk in one pass, and even though it's more likely you'll end
] up with data written and the metadata not updated, it's all
] shovelled to disk faster.

Your initial guess must be wrong.  First of all, there is large
cylinder locality between inodes and the data they reference.  That's
what cylinder groups are for.

Second, there is nothing preventing the async writes from reaching
the disk in exactly the order in which the I/O requests were made.
That ordering has exactly the same (smaller) set of failure modes
as the sync metadata case.

Third, a sync metadata request followed by one or more async data
requests can never result in the data being written before the
metadata, only after.

That is, it is only possible to write data blocks for a file to
the correct file in the sync case.  This is not true of the async
case (as shown above).
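
One way to check this is to enumerate every write order and every
crash point for the foo/fie example.  The sketch below assumes a toy
model: three writes, with the sync case modelled as "writes reach
the disk in request order".  All names are invented for the example.
It counts only misdirected new data (a block written for one inode
but reachable from a different file); it does not count a new file
seeing stale contents in a reused block, which is a separate issue.

```python
from itertools import permutations

FREE_FOO = "free inode 175"
CREATE_FIE = "create fie as inode 176 owning block 10"
DATA_FIE = "write fie's data to block 10"
REQUEST_ORDER = (FREE_FOO, CREATE_FIE, DATA_FIE)

def state_after(order, n):
    # Disk state if the crash comes after the first n writes of order.
    disk = {"dir": {"foo": 175},
            "inodes": {175: [10, 20, 30, 40]},
            "written_for": {}}   # block -> inode the new data was for
    for write in order[:n]:
        if write == FREE_FOO:
            disk["dir"].pop("foo")
            disk["inodes"].pop(175)
        elif write == CREATE_FIE:
            disk["dir"]["fie"] = 176
            disk["inodes"][176] = [10]
        else:
            disk["written_for"][10] = 176
    return disk

def misdirected(disk):
    # True if some file on disk points at a block holding data that
    # was written on behalf of a different inode.
    for inode in disk["dir"].values():
        for block in disk["inodes"][inode]:
            if disk["written_for"].get(block, inode) != inode:
                return True
    return False

def count_bad(orders):
    return sum(misdirected(state_after(order, n))
               for order in orders
               for n in range(len(order) + 1))

# Async: any of the 6 orders may occur.  Sync: request order only.
async_bad = count_bad(list(permutations(REQUEST_ORDER)))
sync_bad = count_bad([REQUEST_ORDER])
print(async_bad, sync_bad)   # 4 0
```

Of the 24 (order, crash point) pairs in the async case, 4 leave
fie's data inside foo; of the 4 crash points in the sync case, none
do.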


It's common among engineers to want "faster" to be "more correct";
the universe rarely complies.  8-).


                                        Terry Lambert
                                        terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.