*BSD News Article 74602

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.mel.connect.com.au!news.mira.net.au!inquo!in-news.erinet.com!newsfeeder.sdsu.edu!nntp.primenet.com!news.mathworks.com!enews.sgi.com!fido.asd.sgi.com!neteng!lm
From: lm@neteng.engr.sgi.com (Larry McVoy)
Newsgroups: comp.os.linux.networking,comp.unix.bsd.netbsd.misc,comp.unix.bsd.freebsd.misc
Subject: Re: TCP latency
Date: 24 Jul 1996 21:07:40 GMT
Organization: Silicon Graphics Inc., Mountain View, CA
Lines: 98
Message-ID: <4t63as$f2q@fido.asd.sgi.com>
References: <4paedl$4bm@engnews2.Eng.Sun.COM> <4sadde$qsv@linux.cs.Helsinki.FI> <31E9E3A7.41C67EA6@dyson.iquest.net> <4sefde$f0l@fido.asd.sgi.com> <4socfr$3ot@dworkin.wustl.edu>
Reply-To: lm@slovax.engr.sgi.com
NNTP-Posting-Host: neteng.engr.sgi.com
X-Newsreader: TIN [version 1.2 PL2]
Xref: euryale.cc.adfa.oz.au comp.os.linux.networking:46228 comp.unix.bsd.netbsd.misc:4156 comp.unix.bsd.freebsd.misc:24335

Chuck Cranor (chuck@ccrc.wustl.edu) wrote:
: In article <4sefde$f0l@fido.asd.sgi.com>,
: Larry McVoy <lm@slovax.engr.sgi.com> wrote:
: >Umm, I'd be happy to entertain suggestions for a better measurement of
: >a null entry into the system.  I don't want something that anyone special
: >cases - that's just worthless.  I want something that is actually measuring
: >all the work you need to do to get to the point that you can do something in
: >the kernel.

: I took Larry's lat_syscall.c and a few of J Wunsch's suggestions for 
: different system calls to try, and ran a few tests.  Here are the results:

: [note: sparc 2 is running SunOS 4.1.3_U1 (48MB RAM), P5-133MHz is running
:  NetBSD/i386 (32MB RAM) ... both systems unloaded.   all numbers are
:  microseconds]

: program       description             Sparc2 (usec)   P5-133MHz (usec)
: lat_syscall   write 1 to /dev/null    61              6
: lat_gettime   gettimeofday(&tv,0)     27              5
: lat_kill      kill(1,0)               23              2
: lat_umask     umask(0)                19              2
: lat_getppid   getppid()               17              1

: Given that data, it seems like lat_syscall's writing 1 byte to /dev/null
: is indeed a poor measurement of "Null syscall."   This leaves me with
: two questions:

: 1. Larry, when you were designing lat_syscall, did you see numbers like 
:    the above?   If not, then I would consider that a mistake.   If so, 
:    then why did you stick with "write 1 to /dev/null" as a measurement
:    of "Null syscall" (which I also consider a mistake)?

Sure did.  If there is a mistake here, it is my choice of name for the
benchmark.  What I wanted was an entry into the kernel that represented
the approximate real, average cost of getting to the point of being able
to do something useful & common.  I'll try and provide some insight into
the thinking that went into this:

	getpid()	It can be trivially optimized down to a memory
			read.  The variance from one system to the next
			does not reflect anything that can be used as a 
			performance comparison.  
	
	getppid()	This one turns into a trap plus a read.  It is a
			"read only" type benchmark; I wanted one that 
			had to do some work.
	
	gettimeofday()	Some systems have a global variable that gets
			updated out of hardclock() every tick (1/HZ seconds,
			typically 10 millisecs).  So this also can degenerate into
			a trap plus a memory load.  But other systems 
			actually read a high resolution clock for this
			system call, and reading it takes variable amounts
			of time, again making the results not very useful.
	
	kill(), umask()	These are the best "null system call" benchmark
			choices I've seen.  I'd vary the mask in the umask 
			one so that it was actually changing state.  The 
			only rationale for not using these is this:  the
			main reason I wanted a "null system call" test was
			that for all of the other benchmarks, I wanted to
			be able to "decompose" them into the various costs.
			For example, the pipe latency benchmark is really

			process 1		process 2
			write()
			ctx switch   ->
						read()
						write()
					<-	ctx sw
			read()

			So it is 4 system calls and 2 context switches.
			I wanted to be able to look at the pipe benchmark
			and have the numbers roughly add up.  And they
			typically do.  
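
To make the "numbers roughly add up" check concrete, here is a minimal
sketch of the arithmetic.  The function names and the sample figures in
the note below are made up for illustration; they are not part of
lat_syscall.c or any measured result.

```c
/*
 * Hypothetical helper: predicted pipe round trip, following the
 * decomposition above (4 system calls + 2 context switches).
 * All times are in microseconds.
 */
static double pipe_round_trip_estimate_us(double syscall_us, double ctxsw_us)
{
    return 4.0 * syscall_us + 2.0 * ctxsw_us;
}

/*
 * Relative gap between a measured round trip and the estimate.
 * 0.0 means the pieces add up exactly; 0.25 means the measured
 * number is 25% above the sum of its parts.
 */
static double decomposition_gap(double measured_us, double syscall_us,
                                double ctxsw_us)
{
    double estimate = pipe_round_trip_estimate_us(syscall_us, ctxsw_us);
    return (measured_us - estimate) / estimate;
}
```

With an assumed 6 usec syscall and an assumed 10 usec context switch,
the estimate is 44 usec; a measured 55 usec round trip would then be
25% above the sum of the parts.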

So, I'm willing to cop to the criticism that the labeling was crappy and
perhaps I should call it the "null I/O syscall".  

I'm also willing (and interested) to find a different syscall that just
measures trap overhead, but I haven't seen one yet that I really like.
getppid() may be the best out there, though: it's hard to cache 
that.  Thoughts?
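
For anyone who wants to compare the candidates side by side, here is a
rough sketch of the timing loops.  This is not the actual lat_syscall.c;
the function names are hypothetical, and it assumes a POSIX system with
gettimeofday() and a writable /dev/null.  The umask loop varies the mask
so the call changes state each time, as suggested above.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Elapsed microseconds between two gettimeofday() samples. */
static double elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1e6
         + (end->tv_usec - start->tv_usec);
}

/* Average cost of writing 1 byte to /dev/null; -1.0 on failure. */
static double time_null_write_us(long iterations)
{
    struct timeval start, end;
    char byte = 0;
    long i;
    int fd = open("/dev/null", O_WRONLY);

    if (fd < 0)
        return -1.0;
    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++) {
        if (write(fd, &byte, 1) != 1) {
            close(fd);
            return -1.0;
        }
    }
    gettimeofday(&end, NULL);
    close(fd);
    return elapsed_us(&start, &end) / iterations;
}

/* Average cost of getppid(): a trap plus a read. */
static double time_getppid_us(long iterations)
{
    struct timeval start, end;
    long i;

    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++)
        (void)getppid();
    gettimeofday(&end, NULL);
    return elapsed_us(&start, &end) / iterations;
}

/* Average cost of umask() with a varying mask; restores the old mask. */
static double time_umask_us(long iterations)
{
    struct timeval start, end;
    long i;
    mode_t saved = umask(0);    /* remember the caller's mask */

    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++)
        (void)umask((mode_t)(i & 077));
    gettimeofday(&end, NULL);
    umask(saved);               /* put the original mask back */
    return elapsed_us(&start, &end) / iterations;
}
```

Call each function with a large iteration count (say 100000) and print
the results in usecs per call.  The clock here is only as good as
gettimeofday() on the machine under test, so small iteration counts will
give noisy numbers.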

: 2. How much of the difference between FreeBSD lat_syscall and Linux
:    lat_syscall can be attributed to VFS overhead in FreeBSD?   Or 
:    more generally, how does the overhead and functionality of Linux's 
:    VFS layer compare with FreeBSD's VFS layer?

Both FreeBSD and Linux offer roughly the same VFS interface.  It took me
a while to wrap my brain around Linus' thinking in his stuff, having
come from a SunOS/BSD background and having spent a lot of time working
in that area, but at this point, I think I can do everything in Linux that
I could do in *BSD or SunOS, in the VFS areas.
--
---
Larry McVoy     lm@sgi.com     http://reality.sgi.com/lm     (415) 933-1804