Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!soda.berkeley.edu!wjolitz
From: wjolitz@soda.berkeley.edu (William F. Jolitz)
Newsgroups: comp.os.386bsd.bugs
Subject: Re: Excessive Interrupt Latencies
Date: 23 Mar 1993 19:39:40 GMT
Organization: U.C. Berkeley, CS Undergraduate Association
Lines: 134
Distribution: world
Message-ID: <1onp1s$nbf@agate.berkeley.edu>
References: <GENE.93Mar15115756@stark.stark.uucp>
NNTP-Posting-Host: soda.berkeley.edu

In article <GENE.93Mar15115756@stark.stark.uucp> gene@stark.uucp (Gene Stark) writes:
>I have been trying to get some insight into the *real* problems underlying
>the "com: silo overflow" problems.  By hacking in some instrumentation
>using the recently posted high-precision "microtime" routine, I have been
>able to convince myself that the problem is that the latency between the
>time the com hardware requests an interrupt and the time control reaches
>the comintr routine is often as much as 400us (on my 486DX/33) and can be
>as long as 1.5ms or more.  In addition, the system sometimes seems to get
>into a state where latencies over 1ms seem to be the norm, rather than the
>exception.

I can confirm that this is occurring.  I've instrumented the kernel and
added trace mechanisms that have found some serious time wasters.

[BTW, I had microtime working in the pre-net2 system; it just gave the
wrong results because I did not notice that the value was a down counter
(I'd assumed it counted up), so it was taken out.  I needed it for the
time traces lately, so it's back in.  Add the line:

	outb(port+3, timer<<6);		/* emit latch command */

to the function getit() before the first inb(), and add the function:

	/*
	 * get fraction of tick in units of microseconds
	 */
	getmicrofraction()
	{
		extern unsigned it_ticksperintr; /* starting count of timer */

		/* the counter counts down, so the elapsed count is
		   it_ticksperintr - current; scale that fraction of a
		   tick to microseconds (a tick is 1000000/hz us long) */
		return ((it_ticksperintr - getit(0, 0)) * (1000000 / hz) /
		    it_ticksperintr);
	}

and replace the line before the while in the function microtime() with:

	tvp->tv_usec += tick + getmicrofraction();

If you are interested in microsecond times in 0.1, you can incorporate
the above changes into your code (sorry, no diffs, as I'm doing this off
the top of my head for 0.1 -- 0.2 has a different file layout at the
moment that makes a 0.1 "patch" inconvenient).]
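For context, here is roughly what getit() would look like with the latch
command in place.  This is a sketch from memory rather than an actual 0.1
excerpt; the port arithmetic and declarations are assumptions, so check
them against your clock.c:

	/*
	 * Read back the current count of an i8254 timer channel.  The
	 * latch command freezes the selected channel's count so that
	 * the two inb()s below read a consistent low/high byte pair.
	 */
	getit(unit, timer)
		int unit, timer;
	{
		int port = 0x40 + 8*unit;	/* assumed: 0x40 for timer 1,
						 * 0x48 for the EISA timer 2 */
		int low, high;

		outb(port+3, timer<<6);		/* emit latch command */
		low = inb(port+timer);		/* low byte of latched count */
		high = inb(port+timer);		/* high byte of latched count */
		return ((high << 8) | low);
	}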
At the moment, one thing that can be done with Compaq/EISA machines is to
set the second timer unit (which provides an NMI on timeout) for a
millisecond whenever the *first* spl/interrupt lockout occurs, and to
disable the timer when the spl/interrupt lockout clears.  By saving the
return PC or interrupt vector, one can display the originating "block" to
decipher why interrupts have been locked out.

> [stuff about promoting comintr to splhigh]

Yes, the tty driver is botched in a few places in ttyinput() -- there is
some naive code which iterates assuming clists, and that will be removed
soon -- but the problems I've found so far have been the result of an
unfinished driver.  I suggest that you use time measurement and profiling
to find the problems.  Some are structural.

>One problem in trying to figure out what is going on is that it is very
>difficult to track priority levels through the code in locore.s.

... And elsewhere in the kernel ...

>I have a sneaking suspicion that under certain circumstances control
>is leaving the context switcher and reaching a user process in the system
>at splhigh when it shouldn't.  This would cause a long stretch of system
>code to be executed with interrupts masked, producing the observed
>latencies.

This did occur with a previous version of the system.  That was why an
spl0 was done after completing a trap (likewise a syscall).  I've not
seen this in 0.1.

>In trying to understand what is happening, I came across the following code
>in locore.s (occurs about line 1302, at the end of "swtch"):
>
>	movl	%ecx,_curproc		# into next process
>	movl	%edx,_curpcb
>
>	/* pushl	PCB_IML(%edx)
>	call	_splx
>	popl	%eax */
>
>	movl	%edx,%eax		# return (1);
>	ret

A change in the system occurred just prior to net2's release -- this
obsoleted the saving/restoring of the interrupt priority level by swtch().
Instead, the code calling swtch() must do an splclock() first (to lock out
changes to the process run queues), and afterward set the priority back to
the appropriate level (in most cases, splnone()).  The bug in 0.0 (and
net2, btw) was a missing splnone() after swtch() in certain places.  This
was fixed in 0.1.  (A sketch of this calling convention appears at the end
of this article.)  0.2, naturally, does this differently (soon, real soon
-- don't ask about it just yet), since the cost of the damn spl's is a
fair portion of a context switch, and 0.2 is attempting to be a
considerably more "lightweight" system for reasons like this.  However,
0.1 is correct and consistent here, so this is not a problem.

>The thing that concerns me is the commented-out code for restoring the
>priority level from the pcb.  It looks to me like when this code is
>called at splclock(), (see for example, the end of "cpu_exit" in
>vm_machdep.c), then control could be returning to a user process in the
>system at splclock, instead of whatever priority it ought to be running at.

NO!  swtch() just goes to some other kernel-mode process that called
swtch(), so all returns from swtch() need to be followed by an splnone()
or somesuch (excepting cpu_exit(), from which a return is guaranteed
never to occur, obviously!).  Note that if no process is ready to run,
the kernel idles at splnone().

There are many improvements made in 0.2 concerning time management, and,
with the new diagnostics I've added, the next release promises to keep
things safe at interrupt level.  It's been a perpetual problem with all
systems I've worked on (excepting the true real-time ones) that serial
ports drop characters due to interrupt lockout caused by programming
errors.  This problem with UNIX-like systems is rooted in a driver/system
interface that relies on implicit changes in priority level as part of
the programming interface.  This is not so with other systems --
especially on the PC.  I've found that the "right" solution has to
enforce appropriate use of short-term interrupt lockout, if it is used at
all.  If you don't solve the root problem here, then a later improvement
or new driver just busts it again.  If you don't buy this argument, look
at various editions of SunOS for proof.

The solution requires a significant change (which should have happened
back in the PDP-11 days, ...) that has been postponed too long:
interrupt priority levels need to be a managed resource/interface.  More
on this later.

Hope this helps and is not too leading, but 0.1 does have a few
limitations in the performance arena, and this is one of the excellent
reasons for finishing and releasing 0.2.  Please continue to look
carefully at 0.1 -- maybe you'll spot something I've missed along the
way, and I'd appreciate hearing about it.  Critical feedback is always
welcome, especially when one goes to the trouble of getting some numbers
to back it up.

Bill.
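As an illustration of the swtch() calling convention discussed above,
here is a minimal sketch.  wait_for_work() is a hypothetical caller, not
an actual 0.1 routine; only splclock(), swtch(), and splnone() come from
the discussion:

	/*
	 * Hypothetical blocking primitive showing the post-net2
	 * convention: the caller, not swtch(), manages the interrupt
	 * priority level around the context switch.
	 */
	void
	wait_for_work()
	{
		splclock();	/* lock out changes to the process run queues */
		swtch();	/* run some other process until rescheduled */
		splnone();	/* drop back down ourselves -- swtch() no
				 * longer restores the old level; a missing
				 * splnone() here was the 0.0/net2 bug */
	}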