Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!soda.berkeley.edu!wjolitz
From: wjolitz@soda.berkeley.edu (William F. Jolitz)
Newsgroups: comp.os.386bsd.bugs
Subject: Re: Excessive Interrupt Latencies
Date: 23 Mar 1993 19:39:40 GMT
Organization: U.C. Berkeley, CS Undergraduate Association
Lines: 134
Distribution: world
Message-ID: <1onp1s$nbf@agate.berkeley.edu>
References: <GENE.93Mar15115756@stark.stark.uucp>
NNTP-Posting-Host: soda.berkeley.edu

In article <GENE.93Mar15115756@stark.stark.uucp> gene@stark.uucp (Gene Stark) writes:
>I have been trying to get some insight into the *real* problems underlying
>the "com: silo overflow" problems.  By hacking in some instrumentation
>using the recently posted high-precision "microtime" routine, I have been
>able to convince myself that the problem is that the latency between the
>time the com hardware requests an interrupt and the time control reaches
>the comintr routine is often as much as 400us (on my 486DX/33) and can be
>as long as 1.5ms or more.  In addition, the system sometimes seems to get
>into a state where latencies over 1ms seem to be the norm, rather than the
>exception.

I can confirm that this is occurring.  I've instrumented the kernel and
added trace mechanisms that have found some serious time wasters.

[BTW, I had microtime working in the pre-net2 system; it just gave the
wrong results because I did not notice that the value was a down counter
(I'd assumed it counted up), so it was taken out.  I needed it for the
time traces lately, so it's back in.  Add the line:

	outb(port+3, timer<<6);		/* emit latch command */

to the function getit() before the first inb(), and add the function:

	/*
	 * get fraction of tick in units of microseconds
	 */
	getmicrofraction()
	{
		extern unsigned it_ticksperintr; /* starting count of timer */

		/* the counter counts down, so the elapsed count is
		   it_ticksperintr - current; scale that fraction of a
		   tick to microseconds (a tick is 1000000/hz us long) */
		return ((it_ticksperintr - getit(0, 0)) * (1000000 / hz) /
		    it_ticksperintr);
	}

and replace the line before the while in the function microtime() with:

	tvp->tv_usec += tick + getmicrofraction();

If you are interested in microsecond times in 0.1, you can incorporate
the above changes into your code (sorry, no diffs, as I'm doing this off
the top of my head for 0.1 -- 0.2 has a different file layout at the
moment that makes a 0.1 "patch" inconvenient).]
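For context, here is roughly what getit() would look like with the latch
command in place.  This is a sketch from memory rather than an actual 0.1
excerpt; the port arithmetic and declarations are assumptions, so check
them against your clock.c:

	/*
	 * Read back the current count of an i8254 timer channel.  The
	 * latch command freezes the selected channel's count so that
	 * the two inb()s below read a consistent low/high byte pair.
	 */
	getit(unit, timer)
		int unit, timer;
	{
		int port = 0x40 + 8*unit;	/* assumed: 0x40 for timer 1,
						 * 0x48 for the EISA timer 2 */
		int low, high;

		outb(port+3, timer<<6);		/* emit latch command */
		low = inb(port+timer);		/* low byte of latched count */
		high = inb(port+timer);		/* high byte of latched count */
		return ((high << 8) | low);
	}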
At the moment, one thing that can be done with Compaq/EISA machines is to
set the second timer unit (which provides an NMI on timeout) for a
millisecond whenever the *first* spl/interrupt lockout occurs, and to
disable the timer when the spl/interrupt lockout clears.  By saving the
return PC or interrupt vector, one can display the originating "block" to
decipher why interrupts have been locked out.

> [stuff about promoting comintr to splhigh]

Yes, the tty driver is botched in a few places in ttyinput() -- there is
some naive code which iterates assuming clists, and that will be removed
soon -- but the problems I've found so far have been the result of an
unfinished driver.  I suggest that you use time measurement and profiling
to find the problems.  Some are structural.

>One problem in trying to figure out what is going on is that it is very
>difficult to track priority levels through the code in locore.s.

... And elsewhere in the kernel ...

>I have a sneaking suspicion that under certain circumstances control
>is leaving the context switcher and reaching a user process in the system
>at splhigh when it shouldn't.  This would cause a long stretch of system
>code to be executed with interrupts masked, producing the observed
>latencies.

This did occur with a previous version of the system.  That was why an
spl0 was done after completing a trap (likewise a syscall).  I've not
seen this in 0.1.

>In trying to understand what is happening, I came across the following code
>in locore.s (occurs about line 1302, at the end of "swtch"):
>
>	movl	%ecx,_curproc		# into next process
>	movl	%edx,_curpcb
>
>	/* pushl	PCB_IML(%edx)
>	call	_splx
>	popl	%eax */
>
>	movl	%edx,%eax		# return (1);
>	ret

A change in the system occurred just prior to net2's release -- this
obsoleted the saving/restoring of the interrupt priority level by swtch().
Instead, the code calling swtch() must do an splclock() first (to lock out
changes to the process run queues), and afterward set the priority back to
the appropriate level (in most cases, splnone()).  The bug in 0.0 (and
net2, btw) was a missing splnone() after swtch() in certain places.  This
was fixed in 0.1.  (A sketch of this calling convention appears at the end
of this article.)  0.2, naturally, does this differently (soon, real soon
-- don't ask about it just yet), since the cost of the damn spl's is a
fair portion of a context switch, and 0.2 is attempting to be a
considerably more "lightweight" system for reasons like this.  However,
0.1 is correct and consistent here, so this is not a problem.

>The thing that concerns me is the commented-out code for restoring the
>priority level from the pcb.  It looks to me like when this code is
>called at splclock(), (see for example, the end of "cpu_exit" in
>vm_machdep.c), then control could be returning to a user process in the
>system at splclock, instead of whatever priority it ought to be running at.

NO!  swtch() just goes to some other kernel-mode process that called
swtch(), so all returns from swtch() need to be followed by an splnone()
or somesuch (excepting cpu_exit(), from which a return is guaranteed
never to occur, obviously!).  Note that if no process is ready to run,
the kernel idles at splnone().

There are many improvements made in 0.2 concerning time management, and,
with the new diagnostics I've added, the next release promises to keep
things safe at interrupt level.  It's been a perpetual problem with all
systems I've worked on (excepting the true real-time ones) that serial
ports drop characters due to interrupt lockout caused by programming
errors.  This problem with UNIX-like systems is rooted in a driver/system
interface that relies on implicit changes in priority level as part of
the programming interface.  This is not so with other systems --
especially on the PC.  I've found that the "right" solution has to
enforce appropriate use of short-term interrupt lockout, if it is used at
all.  If you don't solve the root problem here, then a later improvement
or new driver just busts it again.  If you don't buy this argument, look
at various editions of SunOS for proof.

The solution requires a significant change (which should have happened
back in the PDP-11 days, ...) that has been postponed too long:
interrupt priority levels need to be a managed resource/interface.  More
on this later.

Hope this helps and is not too leading, but 0.1 does have a few
limitations in the performance arena, and this is one of the excellent
reasons for finishing and releasing 0.2.  Please continue to look
carefully at 0.1 -- maybe you'll spot something I've missed along the
way, and I'd appreciate hearing about it.  Critical feedback is always
welcome, especially when one goes to the trouble of getting some numbers
to back it up.

Bill.
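As an illustration of the swtch() calling convention discussed above,
here is a minimal sketch.  wait_for_work() is a hypothetical caller, not
an actual 0.1 routine; only splclock(), swtch(), and splnone() come from
the discussion:

	/*
	 * Hypothetical blocking primitive showing the post-net2
	 * convention: the caller, not swtch(), manages the interrupt
	 * priority level around the context switch.
	 */
	void
	wait_for_work()
	{
		splclock();	/* lock out changes to the process run queues */
		swtch();	/* run some other process until rescheduled */
		splnone();	/* drop back down ourselves -- swtch() no
				 * longer restores the old level; a missing
				 * splnone() here was the 0.0/net2 bug */
	}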