Path: sserve!manuel!munnari.oz.au!spool.mu.edu!agate!ames!sgi!rigden.wpd.sgi.com!rpw3
From: rpw3@rigden.wpd.sgi.com (Rob Warnock)
Newsgroups: comp.unix.bsd
Subject: Re: ASTs for device driver usage (was Re: COM driver and X11)
Message-ID: <n6su1e0@sgi.sgi.com>
Date: 12 Jul 92 12:09:38 GMT
Article-I.D.: sgi.n6su1e0
Sender: rpw3@rigden.wpd.sgi.com
Organization: Silicon Graphics, Inc.  Mountain View, CA
Lines: 146

cgd@agate.berkeley.edu (Chris G. Demetriou) writes:
+---------------
| There are some significant technical and aesthetic problems with
| getting the com driver to work as above (using ASTs -- and that's
| really the only way to do it well...):
+---------------

Well, not quite. What we did at Fortune Systems (circa 1981) was take *every* hardware device in the system and convert it to a "pseudo-DMA" device. That is, when an interrupt occurred, a *small* assembly-language routine grabbed the status (and, for serial I/O, the data), queued an entry at the appropriate level in one of a few little real-time "run queues", set a flag, and dismissed the interrupt.

I have written several long net articles on the scheme ("two-level interrupts") before. I'll see if I can dig them out. But here's a brief intro:

The amount of work done by the "real" hardware interrupt routines should be minimal, just enough to emulate an abstract "smart" DMA controller. For example, the disk interrupt turned a traditional "one command/one interrupt" controller into a virtual "linked list of commands/linked list of statuses" controller, giving a 2nd-level virtual interrupt only when the queue of work to do fell empty. The serial interrupt moved one byte into or out of one of a pair of ping-pong buffers (per port), and queued a 2nd-level "virtual interrupt" only when it switched buffers (or, checked in the clock interrupt, when there had been 1/30 second with no activity).

The queued 2nd-level "virtual" interrupts were serviced by a common multi-level task dispatcher (I hesitate to say "scheduler" -- it wasn't that complex), which simply ripped tasks to run off the queues and called them, as if they were in fact interrupt routines. (This is why I sometimes call the 2nd-level service routines "virtual interrupt" handlers.) This dispatcher ran at the lowest-priority interrupt service level in the system, and was in fact simply a few lines of code added to the callout dispatcher. (Indeed, an earlier single-level version of the scheme just used "timeout(,0,)" to enqueue a task to run.)

The "spl*()" routines were redefined to *not* shut off true hardware interrupts, but simply to raise and lower the virtual interrupt "level", which determined only which queues would get serviced. (Later, we even made the virtual levels preemptable, though on fast processors a single level without preemption suffices.) This *COMPLETELY* decoupled actual hardware interrupt priorities from "virtual interrupt" priorities (the actual service order), which could then be chosen as needed to maximize throughput.

Since the "spl*()" routines didn't actually turn off hardware interrupts, serial I/O overruns could not occur unless some 2nd-level interrupt completion task ran longer than the time it took to fill both of the ping-pong buffers. So rather than spending lots of time rewriting all of the old krufty disk buffer cache code to not keep interrupts off so long, we simply *measured* the worst-case interrupt-off time we saw, and made the serial I/O pseudo-DMA ping-pong buffers bigger than that (about twice as big -- then as now, memory was cheaper than kernel-hacker time).
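To make that concrete, here is a rough sketch in C of the serial receive half of such a scheme. This is purely illustrative -- the names (pdma_rxintr(), vintr_post(), VQ_TTY, and so on) are invented for this posting, not the actual Fortune sources -- but it shows just how little work the 1st-level handler does:

    /* Illustrative sketch only: invented names, not the actual Fortune code. */

    #define PDMA_BUFSZ  512     /* sized bigger than the worst measured interrupt-off time */
    #define VQ_TTY      3       /* virtual priority level used for tty completions */

    void vintr_post(int q, void (*func)(void *), void *arg);  /* see dispatcher sketch below */
    void pdma_rxdone(void *arg);

    struct pdma_port {
        char    buf[2][PDMA_BUFSZ]; /* the ping-pong buffers */
        int     cur;                /* buffer the 1st level is currently filling */
        int     cnt;                /* bytes in that buffer so far */
        int     lastcnt;            /* for the clock-tick activity check */
        char    *donebuf;           /* filled (or flushed) buffer for the 2nd level */
        int     donecnt;            /* and how many bytes are in it */
    };

    /*
     * 1st level: the real receive interrupt.  "c" is the byte just pulled
     * from the SIO data register.  No tty code, no spl*() -- just stash the
     * byte and, when a buffer fills, post a 2nd-level "virtual interrupt".
     */
    void
    pdma_rxintr(struct pdma_port *pp, int c)
    {
        pp->buf[pp->cur][pp->cnt++] = c;
        if (pp->cnt >= PDMA_BUFSZ) {
            pp->donebuf = pp->buf[pp->cur];
            pp->donecnt = pp->cnt;
            pp->cur ^= 1;                       /* flip to the other buffer */
            pp->cnt = 0;
            vintr_post(VQ_TTY, pdma_rxdone, pp);
        }
    }

    /*
     * Called from the clock interrupt: if a partly filled buffer has seen no
     * new activity since the last tick (the "1/30 second" rule), flush it too.
     */
    void
    pdma_rxflush(struct pdma_port *pp)
    {
        if (pp->cnt != 0 && pp->cnt == pp->lastcnt) {
            pp->donebuf = pp->buf[pp->cur];
            pp->donecnt = pp->cnt;
            pp->cur ^= 1;
            pp->cnt = 0;
            vintr_post(VQ_TTY, pdma_rxdone, pp);
        }
        pp->lastcnt = pp->cnt;
    }

    /*
     * 2nd level: the "virtual interrupt" handler, called later by the
     * dispatcher at the lowest real interrupt priority.  This is where the
     * old interrupt-time work (line discipline, wakeups, etc.) now happens.
     */
    void
    pdma_rxdone(void *arg)
    {
        struct pdma_port *pp = arg;

        /* feed pp->donebuf[0 .. pp->donecnt-1] to the tty code here */
    }

(The transmit side would be the mirror image: the 1st level pulls bytes out of a pre-filled buffer and posts a virtual interrupt when it runs dry.)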
As a result, we obtained a 12-fold increase in serial I/O throughput. When we started, we couldn't even keep a single 2400-baud UUCP going without frequently dropping a character. When we finished, we could run *three* 9600-baud UUCPs simultaneously and never drop a character. And this was on a 5.5 MHz 68000, with serial I/O chips that had only a couple of characters of FIFO (Zilog SIOs). Surely a 33 MHz (or faster) 386/486 can do as well!?!

+---------------
| traditionally, ASTs are used only for forcing context
| switching, and evoking the behavior described above in the
| networking code (so that, for instance, you can get lots
| of packets from the ethernet in a burst, without having to
| call ip_input on each as they are received...)
+---------------

Recent studies (Van Jacobson, plus other unpublished work) have shown that while this strategy -- effectively a kind of 2-level interrupt, if you think about it -- was useful when CPUs were very slow and network cards had little buffering on them, these days many controllers can queue several -- sometimes dozens of -- received packets. So in terms of total CPU overhead, it may be *better* to go ahead and call ip_input() once for every packet, and skip the overhead of IF_{EN,DE}QUEUEing and especially the overhead of the AST and attendant context switch.

+---------------
| now, the problem is this: the networking code is more or
| less part of the standard kernel, while the com driver
| (and device drivers in general) are *NOT*.  to add an
| AST especially for the com driver would be, well,
| ugly, because it would violate the machine-independence
| of most of the "kern" and "net" subdirs...
+---------------

Especially since the next generation of network code will probably *remove* the networking AST!

+---------------
| In my mind, the solution is this:
| add an AST type for "device AST" -- which would be
| dispatched to a machine-dependent handler, which would
| then check which devices needed the AST, and call their handlers.
+---------------

See the above sketch of the "2-level interrupt" scheme. If it is to work, it must be done for *all* devices which cause interrupts, so that the "spl*()" routines can be turned into "software" priority levels and the real hardware interrupts left enabled (almost) all of the time.

Note: The one place where a true hardware interrupt-off may still be needed is when the 2nd-level service routine is manipulating data shared with the 1st-level pseudo-DMA service routine. But that is a *very* localized interlock, and (if done correctly) *never* requires that the interrupt enable be off for more than a few instructions.

+---------------
| the drawback of this is that there would be an extra level
| of indirection in the entire thing, and it could make
| some of the device driver/interrupt handler code a bit uglier...
+---------------

Believe it or not, it can actually *significantly* increase performance! The little light-weight pseudo-DMA interrupts just queue up work to be done, not saving/restoring much state, and the "2nd-level interrupt" task dispatcher calls one (virtual) interrupt routine after another, seldom doing a full context save/restore.

...provided that interrupts are fairly "bursty", so that you can run more than one 2nd-level service routine per invocation of the dispatcher. And our experience at Fortune, and my experience since then, has been that interrupts *are* bursty enough for it to be a big win.
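Again, purely as an illustration (same invented names as the sketch above, not any real kernel's code, with the queue-pool and machine-dependent hardware-mask helpers left as declarations), the "software spl" and the dispatcher together come to only a few dozen lines of C:

    /* Illustrative sketch only, continuing the one above: still invented names. */

    struct vintr {
        struct vintr    *next;
        void            (*func)(void *);
        void            *arg;
    };

    #define NVQ 4                   /* virtual levels 1..NVQ; bigger number = higher priority */

    static struct vintr *vq_head[NVQ + 1], *vq_tail[NVQ + 1];
    static int softspl;             /* queues at or below this level are masked */
    static int vintr_pending;       /* the "flag" checked on the way out of the lowest level */

    struct vintr    *vintr_alloc(void);         /* from a preallocated pool */
    void             vintr_free(struct vintr *);
    struct vintr    *vq_dequeue(int);           /* uses the same short interlock as below */
    int              hard_intr_off(void);       /* machine-dependent: mask/unmask the */
    void             hard_intr_restore(int);    /* real hardware interrupts briefly */

    /*
     * spl*() no longer touches the hardware interrupt mask; it just moves the
     * software level that decides which queues the dispatcher may run.  (A
     * real splx() would also kick the dispatcher if lowering the level
     * uncovers pending work.)
     */
    int
    splsoft(int newlevel)
    {
        int old = softspl;

        softspl = newlevel;
        return (old);
    }

    /*
     * Called from the 1st-level pseudo-DMA handlers.  The *only* true
     * hardware-interrupt-off interlock in the whole scheme is here (and
     * inside vq_dequeue()), and it covers just the few instructions that
     * link the entry onto its queue.
     */
    void
    vintr_post(int q, void (*func)(void *), void *arg)
    {
        struct vintr *v = vintr_alloc();
        int s;

        v->func = func;
        v->arg = arg;
        v->next = NULL;
        s = hard_intr_off();                /* interrupts off for a few instructions... */
        if (vq_tail[q] != NULL)
            vq_tail[q]->next = v;
        else
            vq_head[q] = v;
        vq_tail[q] = v;
        hard_intr_restore(s);               /* ...and back on again */
        vintr_pending = 1;
    }

    /*
     * The dispatcher: a few lines grafted onto the callout/softclock code,
     * running at the lowest-priority *hardware* interrupt level.  It rips
     * entries off the queues, highest virtual level first, and calls the
     * handlers back to back, with no full context save/restore per
     * "virtual interrupt".
     */
    void
    vintr_dispatch(void)
    {
        struct vintr *v;
        int q;

        vintr_pending = 0;
        for (q = NVQ; q > softspl; q--)
            while ((v = vq_dequeue(q)) != NULL) {
                (*v->func)(v->arg);
                vintr_free(v);
            }
    }

The point to notice is that hard_intr_off()/hard_intr_restore() bracket only the handful of queue-linking instructions; everything else, including all the 2nd-level handlers, runs with the real hardware interrupts enabled.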
-Rob

p.s. The notion of "2-level interrupts" and "pseudo-DMA" is not new. It has been used in real-time and data-communications kernels for over 25 years. We used it in the "WSCHED" communications kernel at Digital Communications Associates in 1972. [I wrote the guts of the interrupt system and task scheduler for WSCHED.] That system was able to handle over 10,000 characters per second *through* the system (in and out counts as "1"), with an interrupt per character on each side... on an 833 KHz (!) DEC PDP-8.

p.p.s. Another thing that's a really big win is always polling the I/O hardware to see if there's another interrupt pending before dismissing the current one. If so, you can save all of the dismiss/interrupt overhead. Of course, in the @#&^%$! IBM PC architecture, many devices are not capable of being polled in this way, and you have to play weird games with the interrupt-controller chip. (*Ugh!*)

-----
Rob Warnock, MS-9U/510		rpw3@sgi.com
Silicon Graphics, Inc.		(415)390-1673
2011 N. Shoreline Blvd.
Mountain View, CA  94043