Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.bhp.com.au!mel.dit.csiro.au!munnari.OZ.AU!news.ecn.uoknor.edu!qns3.qns.com!imci4!newsfeed.internetmci.com!news.msfc.nasa.gov!sol.ctr.columbia.edu!hamblin.math.byu.edu!park.uvsc.edu!usenet
From: Terry Lambert <terry@lambert.org>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.os.linux.development.system
Subject: Re: The better (more suitable)Unix?? FreeBSD or Linux
Date: 22 Feb 1996 18:57:52 GMT
Organization: Utah Valley State College, Orem, Utah
Lines: 95
Message-ID: <4giebg$70o@park.uvsc.edu>
References: <4er9hp$5ng@orb.direct.ca> <4g33tp$esr@park.uvsc.edu> <4g57cj$gc3@pell.pell.chi.il.us> <4g5k95$28m@park.uvsc.edu> <4ggol0$38h@pell.pell.chi.il.us>
NNTP-Posting-Host: hecate.artisoft.com
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:14221 comp.os.linux.development.system:17840

orc@pell.chi.il.us (Orc) wrote:
] >That's an easy one to answer: so if you screw up, you *only*
] >screw up the data that you are writing instead of screwing up
] >all the data on the file system (or at least some of the data
] >that was there before and totally uninvolved in the screwed
] >transaction).
]
] Now this is the point where I get interested. Could you describe how
] sync metadata doesn't screw up unrelated data vs async metadata? (I'm
] really interested in your analysis of this, since it's better than the
] IS NOT! IS TOO! that this conversation has become.)

Consider:

1) I have a file "foo" that has inode 175 and points to blocks 10, 20,
   30, and 40

2) I delete file foo. Inode 175 is freed.

3) I create file "fie", which allocates inode 176. I allocate block 10.
   I write sensitive information in block 10.

4) The system crashes before the new block list for inode 175 is written
   and before "fie" is committed to disk, but after block 10 is committed
   to disk.
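The four steps above can be modeled as a toy crash simulation. This is a hypothetical Python sketch, not FFS code: the dict-based "disk image" and the names free_foo, create_fie, write_data and leaked are illustrative assumptions. It treats each write as committing atomically, assumes an async crash can leave any subset of the writes on disk (because they may be reordered), and assumes a sync-metadata crash can leave only a prefix of the request order on disk.

```python
from itertools import combinations

# Hypothetical toy model, not real FFS code: the "disk image" is a dict of
# directory entries, inode block lists, and block contents.
def initial_disk():
    return {
        "dirents": {"foo": 175},            # step 1: "foo" is inode 175
        "inodes": {175: [10, 20, 30, 40]},  # ...pointing at blocks 10, 20, 30, 40
        "blocks": {10: "old foo data"},
    }

# Each function is one write request; each is assumed to commit atomically.
def free_foo(disk):    # step 2: metadata write deleting "foo"
    del disk["dirents"]["foo"]
    disk["inodes"][175] = []

def create_fie(disk):  # step 3: metadata write creating "fie" on block 10
    disk["dirents"]["fie"] = 176
    disk["inodes"][176] = [10]

def write_data(disk):  # step 3: data write of the sensitive block
    disk["blocks"][10] = "SENSITIVE"

REQUESTS = [free_foo, create_fie, write_data]  # order the requests were made

def apply(writes):
    """Disk image after a crash that left exactly `writes` committed."""
    disk = initial_disk()
    for w in writes:
        w(disk)
    return disk

def leaked(disk):
    """True if any file other than "fie" can read the sensitive block."""
    for name, ino in disk["dirents"].items():
        blocks = disk["inodes"].get(ino, [])
        if name != "fie" and any(disk["blocks"].get(b) == "SENSITIVE" for b in blocks):
            return True
    return False

def async_states():
    # Async writes may be reordered, so a crash can leave any subset committed.
    for k in range(len(REQUESTS) + 1):
        for subset in combinations(REQUESTS, k):
            yield apply(subset)

def sync_metadata_states():
    # Each metadata write completes before the next request is issued, so a
    # crash can only leave a prefix of the request order committed.
    for k in range(len(REQUESTS) + 1):
        yield apply(REQUESTS[:k])

print(any(leaked(d) for d in async_states()))          # True
print(any(leaked(d) for d in sync_metadata_states()))  # False
```

In this model exactly two async crash states leak: the ones where block 10's data reached the disk but the deletion of "foo" did not. In every prefix of the request order the deletion lands first, so "foo" can never be seen pointing at the new data.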
This can happen because we are using async I/O, so the write ordering does
not enforce the series of structural modifications needed to keep the
operations idempotent. The I/O subsystem is free to reorder async I/Os for
efficiency, and that reordering can produce the scenario described.

Basically, I have silently corrupted an [apparently] unrelated file (from a
user perspective, after the crash, the non-commit of the data has "rolled
back" the deletion of "foo").

Alternatively, I could have recommitted inode 175 to a new file, but not
updated its block list. For instance, if the async write creating the
directory entry occurred before the async write resetting the inode, I
could easily find myself with someone else's file in my directory.

It's possible to come up with similar scenarios using directory contents,
etc. The only constraint is the reuse constraint on blocks inherent in the
ordering of operations in the file system (for instance, I could *not*
allocate a block that was already allocated).

I'm not suggesting that this is frequent, only that the probability is
non-zero.

] >It's a limitation on the amount of damage you can do.
]
] My initial guess would be that sync metadata would increase
] the window for corrupting data, because first you write the
] metadata, then you go back and write the data; unless you've
] got a structure where the metadata is at one end of the disk
] and the data is at the other, it seems like it would average
] 1.5 passes over the platter to write things out. Contrarily,
] if metadata and data is mixed together, it will get dumped to
] disk in one pass, and even though it's more likely you'll end
] up with data written and the metadata not updated, it's all
] shovelled to disk faster.

Your initial guess must be wrong.

First of all, there is large cylinder locality between inodes and the
data they contain. That's what cylinder groups are for.
Second, there is nothing preventing the async ordering from matching the
exact order in which the I/O requests were made. That would produce
exactly the same (smaller) set of failure modes as the sync metadata case.

Third, a sync metadata request followed by one or more async data requests
can never result in the data being written before the metadata, only
after. That is, in the sync case, data blocks for a file can only ever be
written to the correct file. This is not true of the async case (as shown
above).

It's common among engineers to want "faster" to be "more correct"; the
universe rarely complies. 8-).

                                        Terry Lambert
                                        terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.