Xref: sserve comp.os.386bsd.questions:12321 comp.os.386bsd.misc:3184
Newsgroups: comp.os.386bsd.questions,comp.os.386bsd.misc
Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!msuinfo!agate!howland.reston.ans.net!swrinde!elroy.jpl.nasa.gov!decwrl!netcomsv!netcomsv!calcite!vjs
From: vjs@calcite.rhyolite.com (Vernon Schryver)
Subject: Re: NFS buffering (was Whats wrong with Linux networking ???)
Message-ID: <CuH8uH.4Ev@calcite.rhyolite.com>
Organization: Rhyolite Software
Date: Sat, 13 Aug 1994 14:13:28 GMT
References: <32bflj$lig@cesdis1.gsfc.nasa.gov> <CuDJox.HE2@calcite.rhyolite.com> <32gk4d$ee@cesdis1.gsfc.nasa.gov>
Lines: 74

In article <32gk4d$ee@cesdis1.gsfc.nasa.gov> becker@cesdis.gsfc.nasa.gov
(Donald Becker) writes:
>Vernon Schryver <vjs@calcite.rhyolite.com> wrote:
>>>The NFS protocol assures the client that when the write-RPC returns,
>>>the data block has been committed to persistent storage.  For common
>>>implementations that means the block has been physically queued for
>>>writing, not just put in the buffer cache. ...
>>
>>An NFS server that only queues the block for writing before responding,
>>instead of waiting for the disk controller to say that the write has
>>been completed, does not meet the NFS "stable storage" rules.  Such a
>
>Yes, Vernon, I deliberately used the word "queue" there.  (I was going
>to explain it, but felt it would detract from the main point of the
>article.)  It's not the operating system buffer cache I'm referring to,
>but the disk controller queue.  Most modern disk controllers, both IDE
>and SCSI, actually just queue write requests and return immediately.
>Sure, the vulnerability window is limited to tens of milliseconds, but
>I suspect most systems technically violate the "committed to stable
>storage" rule.  Not that I think this is particularly bad or
>dangerous...

Whether that is bad or dangerous is irrelevant.  Your suspicion is
completely wrong in the commercial world.  You cannot report LADDIS
numbers using such a write-caching disk controller.  That's a fact.
Well, you might cheat for a while, but you'll get busted.

Separately, the commercial-grade systems I know about emphatically do
not turn on the write caches in disks.  Doing so trashes "filesystem
hardening" without gaining any performance.  You don't spend lots of
time on your disk queuing algorithms while paying attention to
filesystem hardening only to throw up your hands and hope the disk
firmware authors did their part, even if you yourself have not found
many serious firmware bugs in vendors' drives.  (No, I personally
haven't, but the people I work with who write the disk drivers have
unending lists of firmware bugs in new and old drives.)

Of course, what happens on Joe Hobbyist's homebrew PC with a no-name
motherboard is a different story.  LADDIS numbers are not exactly
relevant there.  For that matter, "stable storage" is not always ... a
concern.
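To make the "stable storage" rule concrete, here is a minimal C sketch
(the handler names and calling convention are hypothetical, not taken
from any real NFS server) contrasting a reply sent after the write is
merely scheduled with one sent only after fsync() returns.  Note that
fsync() can only be trusted to reach the platter if the drive's write
cache is off, which ties into the hardening point above.

#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical NFSv2 write handlers (sketch only).  The server owns
 * fd; buf, len, and off come from the decoded write RPC.  Returning 0
 * stands for sending the "write done" reply to the client.
 */

/* Unsafe: the data may still be in the buffer cache, or merely
 * queued inside a write-caching disk controller, when the client is
 * told it is safe.  A crash in that window loses acknowledged data. */
int handle_write_unsafe(int fd, const void *buf, size_t len, off_t off)
{
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;
    return 0;               /* replying here violates the rule */
}

/* Conforming: fsync() does not return until the data, and the
 * metadata needed to find it again, are on stable storage, so the
 * reply really means what the NFS protocol says it means. */
int handle_write_stable(int fd, const void *buf, size_t len, off_t off)
{
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;
    if (fsync(fd) != 0)     /* wait for the media, not the queue */
        return -1;
    return 0;               /* now the client may be answered */
}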
>>> ... You can get around this by writing a client
>>>implementation that allows multiple outstanding write requests for
>>>each writing thread, at the expense of write order inconsistency.

>A simple, common example: 'tail -f logfile', where "logfile" is written
>by a NFS client.  With multi-threaded writes it could show spurious
>zeroed blocks, while a single-threaded client would produce the
>expected results.

That is entirely false, in both premise and reasoning.

1.  If you do `tail -f logfile` on the client doing the writing, you
cannot tell anything about the order in which blocks are written to
the disk, regardless of whether the disk is local or NFS or RFS.

2.  If you do `tail -f logfile` on some other machine, then the
effects of NFS retransmissions can show temporarily zeroed blocks.
Some biods (or other NFS daemons) will finish sooner than others.

3.  Typical NFS client implementations, at least those influenced by
both the System V and BSD local filesystem designs, are not in the
least careful about the order in which they write blocks from their
caches.  The update or bdflush daemon simply looks for dirty blocks in
the common buffer cache and either hands them to a biod for the NFS
transaction or does the NFS transaction itself, just as it would for a
local disk.

(2) and (3) can and do cause "spurious zeroed blocks" (the sketch
below shows the effect).  I've seen them, but of course only in
multiple-client situations, and not for slowly growing files like log
files.

Vernon Schryver    vjs@rhyolite.com
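A minimal C sketch of the effect behind points (2) and (3).  It uses a
local sparse file so the out-of-order arrival is easy to reproduce;
the two pwrite() calls stand in for racing biods or a retransmitted
write RPC, and the pread() plays the reader on another client:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char block[8192];
    char readback[8192];
    int fd = open("demo.log", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    memset(block, 'x', sizeof block);

    /* The second block lands first, as if its write won the race.
     * The file is now 16 KB long, with a hole in the first 8 KB. */
    pwrite(fd, block, sizeof block, 8192);

    /* A reader arriving now sees the hole as a zeroed block. */
    ssize_t n = pread(fd, readback, sizeof readback, 0);
    printf("before the late write: n=%zd, first byte=%d\n",
           n, readback[0]);           /* 8192 bytes, all zero */

    /* The "missing" block arrives; the zeros were only transient. */
    pwrite(fd, block, sizeof block, 0);
    pread(fd, readback, sizeof readback, 0);
    printf("after the late write: first byte=%c\n", readback[0]);

    close(fd);
    return 0;
}

Until the late write for the first block lands, any reader sees 8 KB
of zeros; afterward the zeros vanish, which is exactly the transient,
multiple-client effect described above.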