Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!spool.mu.edu!usenet.eel.ufl.edu!gatech!news.mathworks.com!uunet!in2.uu.net!news.artisoft.com!usenet
From: Terry Lambert <terry@lambert.org>
Newsgroups: comp.os.linux.networking,comp.unix.bsd.netbsd.misc,comp.unix.bsd.freebsd.misc
Subject: Re: TCP latency
Date: Tue, 16 Jul 1996 12:34:23 -0700
Organization: Me
Lines: 119
Message-ID: <31EBEEBF.5E0B4E7E@lambert.org>
References: <4paedl$4bm@engnews2.Eng.Sun.COM> <31E7C0DD.41C67EA6@dyson.iquest.net> <4s8tcn$jsh@fido.asd.sgi.com> <31E80ACA.167EB0E7@dyson.iquest.net> <4sadde$qsv@linux.cs.Helsinki.FI> <31EA9FBC.41C67EA6@star-gate.com> <DuLzKz.Fsy@kroete2.freinet.de>
NNTP-Posting-Host: hecate.artisoft.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.01 (X11; I; Linux 1.1.76 i486)
Xref: euryale.cc.adfa.oz.au comp.os.linux.networking:45449 comp.unix.bsd.netbsd.misc:4081 comp.unix.bsd.freebsd.misc:23760

Erik Corry wrote:
] : > In article <31E80ACA.167EB0E7@dyson.iquest.net>,
] : > John S. Dyson <toor@dyson.iquest.net> wrote:
] : > >I think that was a kind-of cute situation. We decided NOT
] : > >to special case the syscall that Larry uses for the
] : > >null-syscall case.
]
] I think what John wrote above can only be interpreted as a complaint
] that Linux has a special case for the null syscall. I certainly
] interpreted it that way, so did Linus, and so did most people
] reading the message. If nobody special-cased the null syscall, why
] bring it up at all.

I interpreted it to mean that, in John's opinion, Larry picked a poor system call because of the inherent VFS/VOP implementation bias in using a zero length write to /dev/null, as opposed to some other system call which would not also measure FS layer overhead.

In addition, since I am significantly better informed on the BSD FS internals than your average code hack, I can understand how the statement was misinterpreted.
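For concreteness, here is a minimal sketch of the kind of measurement being argued about: timing zero-length write(2) calls to /dev/null. It is a hypothetical illustration, not the benchmark's actual source; the helper name null_write_ns() and the use of POSIX clock_gettime(CLOCK_MONOTONIC) as the timer are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average wall-clock nanoseconds per zero-length
 * write(2) to /dev/null over `iterations` calls. Returns -1.0 on error.
 */
static double null_write_ns(long iterations)
{
    char byte = 0;
    struct timespec start, end;
    int fd;

    assert(iterations > 0);
    fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        /* No data moves, so the time measured is the full kernel path
           for the call (on BSD, including the VFS/specfs descent). */
        if (write(fd, &byte, 0) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
            (double)(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

Dividing by the iteration count gives a per-call figure, but that figure is only "system call overhead" if the kernel does nothing else on the way to and from the device, which is exactly the point in dispute.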
However, I can assure you it was a misinterpretation. The zero length write to /dev/null goes through vfs_syscalls.c, then vnode_if.c, then spec_vnops.c, then a structurally bogus lock/unlock pair surrounding a call through the cdevsw.

While it may be useful to measure VFS interface overhead (which you would do by doing a zero length write to a block device instead of a character device, and subtracting out the system call overhead), a null write in this fashion is far from simply system call overhead, which is what we are purporting to measure with the test. The relevant fragment of spec_vnops.c:

	case VCHR:
		VOP_UNLOCK(vp, 0, p);
		error = (*cdevsw[major(vp->v_rdev)]->d_write)
			(vp->v_rdev, uio, ap->a_ioflag);
		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, p);
		return (error);

	case VBLK:
		if (uio->uio_resid == 0)
			return (0);
		if (uio->uio_offset < 0)
			return (EINVAL);

Note that the VBLK case returns without descending into the devswitch (which is going away soon anyway, along with specfs itself); the VCHR case does not.

In case it isn't obvious, pretesting the uio in the VCHR case, as is done in the VBLK case, would be one way to special case the call, as John suggested; pretesting the uio before descending into the VOP calls through vnode_if.c at all would be the other. Both of these would significantly "improve" the BSD "performance" on this "benchmark". If the pretest were applied at the system call layer, though, it's pretty obvious that the MSDOSFS semantics of "a zero length write sets EOF" could not be supported through the interface.

] It looks from this as if John thinks the reason Larry benchmarks
] the null syscall is that Larry thinks people want to do thousands
] of null syscalls per second. Of course the null syscall isn't
] important, it's just a way of measuring the syscall overhead when
] you make a useful syscall. And that (I hope everyone can agree) is
] an interesting figure.

It *is* an interesting figure.
It just isn't the figure that is returned by this particular choice of system call in the BSD case, and thus you cannot compare the Linux and BSD values as "system call overhead", which is what this test purports to do by the labelling of its output.

] If John thinks there's a better (historical?) way to test that
] overhead he doesn't say what it is.

Yes, and "shame on John" for this. He's doing what I usually do, which is assuming a significant amount of context: in this case, enough knowledge of the BSD FS call path implementation to know whether that particular call is a good one for measuring what it purports to measure.

My personal suggestion would be something like setgid(), toggling back and forth between groups (to avoid optimistic caching, in case you were wondering). This could still invoke group validation instead of simple call overhead, so you should be aware of the implementation on the system you are testing.

I can't in good conscience suggest getpid(); as has been pointed out, it is a poor null system call, since a correct implementation would cache the PID in user space in the library (and implement cache coherency on the child side of the fork call in the library). Other calls are subject to skew for reasons other than caching.

Probably the best bet would be an agreed upon "null system call" kernel entry, so that the system call turnaround is the *only* thing measured. You could do this on most modern systems (even down to SVR3) if you are willing to write a loadable system call (which you can do in SVR3 if you are clever at all).

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
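The setgid() toggling suggestion might be sketched as follows. This is a hypothetical illustration under stated assumptions: the helper name setgid_toggle_ns() is made up, POSIX clock_gettime(CLOCK_MONOTONIC) is assumed as the timer, and the caller supplies the two group IDs to alternate between.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average nanoseconds per setgid(2) call while
 * alternating between group IDs `a` and `b`, so no two consecutive
 * calls request the same group. An unprivileged caller gets EPERM for
 * any group other than its real or saved group; such calls still make
 * the kernel round trip, but then also time the permission check.
 * The count of failed calls is stored in *failures, and the caller's
 * original real group ID is set back afterwards.
 */
static double setgid_toggle_ns(gid_t a, gid_t b, long iterations,
                               long *failures)
{
    gid_t orig = getgid();
    struct timespec start, end;
    long failed = 0;

    assert(iterations > 0 && failures != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        if (setgid((i & 1) ? b : a) != 0)
            failed++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    *failures = failed;
    if (setgid(orig) != 0)      /* restore the starting group */
        return -1.0;
    return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
            (double)(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

A nonzero failure count means the run was timing rejected calls (entry, permission check, return) rather than successful group changes; either way, as noted above, you need to know what the target system's setgid() actually does before calling the result "system call overhead".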
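To illustrate why getpid() makes a poor null system call, here is a minimal sketch of the library-side caching described above, with the cache made coherent on the child side of fork. The names lib_getpid(), lib_fork() and child_cache_is_coherent() are hypothetical; this is not any particular C library's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library-side cache; 0 means "not fetched yet". */
static pid_t cached_pid = 0;

/* getpid() replacement: only the first call enters the kernel. */
static pid_t lib_getpid(void)
{
    if (cached_pid == 0)
        cached_pid = getpid();
    return cached_pid;
}

/* fork() wrapper: the child inherits the parent's cache, so it must
   invalidate it, or lib_getpid() would return the parent's PID. */
static pid_t lib_fork(void)
{
    pid_t pid = fork();
    if (pid == 0)
        cached_pid = 0;
    return pid;
}

/* Returns 1 if a forked child sees its own PID through the cache. */
static int child_cache_is_coherent(void)
{
    int status;
    pid_t pid;

    (void)lib_getpid();            /* warm the cache in the parent */
    pid = lib_fork();
    if (pid < 0)
        return 0;
    if (pid == 0)
        _exit(lib_getpid() == getpid() ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With a cache like this in place, a loop of lib_getpid() calls never enters the kernel after the first one, so it measures a memory load rather than system call turnaround.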