Path: sserve!newshost.anu.edu.au!munnari.oz.au!spool.mu.edu!howland.reston.ans.net!bogus.sura.net!udel!sbcs.sunysb.edu!stark.UUCP!gene
From: gene@stark.uucp (Gene Stark)
Newsgroups: comp.os.386bsd.bugs
Subject: Excessive Interrupt Latencies
Date: 15 Mar 93 11:57:56
Organization: Gene Stark's home system
Lines: 49
Distribution: world
Message-ID: <GENE.93Mar15115756@stark.stark.uucp>
NNTP-Posting-Host: stark.uucp

I have been trying to get some insight into the *real* cause of the "com: silo overflow" errors. By hacking in some instrumentation using the recently posted high-precision "microtime" routine, I have convinced myself that the problem is interrupt latency: the time between the com hardware requesting an interrupt and control reaching the comintr routine is often as much as 400us (on my 486DX/33) and can be 1.5ms or more. In addition, the system sometimes appears to get into a state in which latencies over 1ms are the norm rather than the exception.

I hacked up a version of the com driver in which the comintr routine runs at splhigh, does only the minimum of service, and queues the remaining work to run later at spltty. This dramatically reduces, but does not completely eliminate, silo overflow errors. I conclude that some part of the system occasionally runs for more than 1ms with interrupts masked. This seems excessive, and I have been trying to track down where it happens.

One obstacle is that it is very difficult to follow priority levels through the code in locore.s. I have a sneaking suspicion that under certain circumstances control leaves the context switcher and reaches a user process executing in the kernel at splhigh when it shouldn't. That would cause a long stretch of kernel code to run with interrupts masked, producing the observed latencies.
In trying to understand what is happening, I came across the following code in locore.s (around line 1302, at the end of "swtch"):

        movl    %ecx,_curproc           # into next process
        movl    %edx,_curpcb
/*      pushl   PCB_IML(%edx)
        call    _splx
        popl    %eax*/
        movl    %edx,%eax               # return (1);
        ret

What concerns me is the commented-out code that restores the priority level saved in the pcb. It looks to me as though, when swtch is called at splclock() (see, for example, the end of "cpu_exit" in vm_machdep.c), control could return to a user process in the kernel at splclock instead of at whatever priority it ought to be running at.

Can anyone shed some light on this?

- Gene Stark

-- 
stark@cs.sunysb.edu