Path: sserve!manuel!munnari.oz.au!spool.mu.edu!agate!ames!sgi!rigden.wpd.sgi.com!rpw3
From: rpw3@rigden.wpd.sgi.com (Rob Warnock)
Newsgroups: comp.unix.bsd
Subject: Re: ASTs for device driver usage (was Re: COM driver and X11)
Message-ID: <n6su1e0@sgi.sgi.com>
Date: 12 Jul 92 12:09:38 GMT
Article-I.D.: sgi.n6su1e0
Sender: rpw3@rigden.wpd.sgi.com
Organization: Silicon Graphics, Inc.  Mountain View, CA
Lines: 146

cgd@agate.berkeley.edu (Chris G. Demetriou) writes:
+---------------
| There are some significant technical and aesthetic problems with
| getting the com driver to work as above (using ASTs -- and that's
| really the only way to do it well...):
+---------------

Well, not quite. What we did at Fortune Systems (circa 1981) was take *every* hardware device in the system and convert it to a "pseudo-DMA" device. That is, when an interrupt occurred, a *small* assembly-language routine grabbed the status (and, for serial I/O, the data), queued an entry at the appropriate level in one of a few little real-time "run queues", set a flag, and dismissed the interrupt.

I have written several long net articles on the scheme ("two-level interrupts") before. I'll see if I can dig them out. But here's a brief intro:

The amount of work done by the "real" hardware interrupt routines should be minimal, just enough to emulate an abstract "smart" DMA controller. For example, the disk interrupt turned a traditional "one command/one interrupt" controller into a virtual "linked list of commands/linked list of statuses" controller, giving a 2nd-level virtual interrupt only when the queue of work to do fell empty. The serial interrupt moved one byte into or out of one of a pair of ping-pong buffers (per port), and queued a 2nd-level "virtual interrupt" only when it switched buffers (or, checked in the clock interrupt, when there had been 1/30 second with no activity).

The queued 2nd-level "virtual" interrupts were serviced by a common multi-level task dispatcher (I hesitate to say "scheduler" -- it wasn't that complex), which simply ripped tasks to run off the queues and called them, as if they were in fact interrupt routines. (This is why I sometimes call the 2nd-level service routines "virtual interrupt" handlers.) This dispatcher ran at the lowest-priority interrupt service level in the system, and was in fact simply a few lines of code added to the callout dispatcher. (Indeed, an earlier single-level version of the scheme just used "timeout(,0,)" to enqueue a task to run.)

The "spl*()" routines were redefined to *not* shut off true hardware interrupts, but simply to raise and lower the virtual interrupt "level", which determined only which queues would get serviced. (Later, we even made the virtual levels preemptable, though on fast processors a single level without preemption suffices.) This *COMPLETELY* decoupled actual hardware interrupt priorities from "virtual interrupt" priorities (the actual service order), which could then be chosen as needed to maximize throughput.

Since the "spl*()" routines didn't actually turn off hardware interrupts, serial I/O overruns could not occur unless some 2nd-level interrupt completion task ran longer than the time it took to fill both of the ping-pong buffers. So rather than spending lots of time rewriting all of the old krufty disk buffer cache code to not keep interrupts off so long, we simply *measured* the worst-case interrupt-off time we saw, and made the serial I/O pseudo-DMA ping-pong buffers bigger than that (about twice as big -- then as now, memory was cheaper than kernel-hacker time).
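To make that concrete, here is a rough sketch in C of the serial receive half of such a scheme. This is purely illustrative -- the names (pdma_rxintr(), vintr_post(), VQ_TTY, and so on) are invented for this posting, not the actual Fortune sources -- but it shows just how little work the 1st-level handler does:

    /* Illustrative sketch only: invented names, not the actual Fortune code. */

    #define PDMA_BUFSZ  512     /* sized bigger than the worst measured interrupt-off time */
    #define VQ_TTY      3       /* virtual priority level used for tty completions */

    void vintr_post(int q, void (*func)(void *), void *arg);  /* see dispatcher sketch below */
    void pdma_rxdone(void *arg);

    struct pdma_port {
        char    buf[2][PDMA_BUFSZ]; /* the ping-pong buffers */
        int     cur;                /* buffer the 1st level is currently filling */
        int     cnt;                /* bytes in that buffer so far */
        int     lastcnt;            /* for the clock-tick activity check */
        char    *donebuf;           /* filled (or flushed) buffer for the 2nd level */
        int     donecnt;            /* and how many bytes are in it */
    };

    /*
     * 1st level: the real receive interrupt.  "c" is the byte just pulled
     * from the SIO data register.  No tty code, no spl*() -- just stash the
     * byte and, when a buffer fills, post a 2nd-level "virtual interrupt".
     */
    void
    pdma_rxintr(struct pdma_port *pp, int c)
    {
        pp->buf[pp->cur][pp->cnt++] = c;
        if (pp->cnt >= PDMA_BUFSZ) {
            pp->donebuf = pp->buf[pp->cur];
            pp->donecnt = pp->cnt;
            pp->cur ^= 1;                       /* flip to the other buffer */
            pp->cnt = 0;
            vintr_post(VQ_TTY, pdma_rxdone, pp);
        }
    }

    /*
     * Called from the clock interrupt: if a partly filled buffer has seen no
     * new activity since the last tick (the "1/30 second" rule), flush it too.
     */
    void
    pdma_rxflush(struct pdma_port *pp)
    {
        if (pp->cnt != 0 && pp->cnt == pp->lastcnt) {
            pp->donebuf = pp->buf[pp->cur];
            pp->donecnt = pp->cnt;
            pp->cur ^= 1;
            pp->cnt = 0;
            vintr_post(VQ_TTY, pdma_rxdone, pp);
        }
        pp->lastcnt = pp->cnt;
    }

    /*
     * 2nd level: the "virtual interrupt" handler, called later by the
     * dispatcher at the lowest real interrupt priority.  This is where the
     * old interrupt-time work (line discipline, wakeups, etc.) now happens.
     */
    void
    pdma_rxdone(void *arg)
    {
        struct pdma_port *pp = arg;

        /* feed pp->donebuf[0 .. pp->donecnt-1] to the tty code here */
    }

(The transmit side would be the mirror image: the 1st level pulls bytes out of a pre-filled buffer and posts a virtual interrupt when it runs dry.)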
As a result, we obtained a 12-fold increase in serial I/O throughput. When we started, we couldn't even keep a single 2400-baud UUCP going without frequently dropping a character. When we finished, we could run *three* 9600-baud UUCPs simultaneously and never drop a character. And this was on a 5.5 MHz 68000, with serial I/O chips that had only a couple of characters of FIFO (Zilog SIOs). Surely a 33 MHz (or faster) 386/486 can do as well!?!

+---------------
| traditionally, ASTs are used only for forcing context
| switching, and evoking the behavior described above in the
| networking code (so that, for instance, you can get lots
| of packets from the ethernet in a burst, without having to
| call ip_input on each as they are received...)
+---------------

Recent studies (Van Jacobson, plus other unpublished work) have shown that while this strategy -- effectively a kind of 2-level interrupt, if you think about it -- was useful when CPUs were very slow and network cards had little buffering on them, these days many controllers can queue several -- sometimes dozens of -- received packets. So in terms of total CPU overhead, it may be *better* to go ahead and call ip_input() once for every packet, and skip the overhead of IF_{EN,DE}QUEUEing and especially the overhead of the AST and attendant context switch.

+---------------
| now, the problem is this: the networking code is more or
| less part of the standard kernel, while the com driver
| (and device drivers in general) are *NOT*.  to add an
| AST especially for the com driver would be, well,
| ugly, because it would violate the machine-independence
| of most of the "kern" and "net" subdirs...
+---------------

Especially since the next generation of network code will probably *remove* the networking AST!

+---------------
| In my mind, the solution is this:
| add an AST type for "device AST" -- which would be
| dispatched to a machine-dependent handler, which would
| then check which devices needed the AST, and call their handlers.
+---------------

See the above sketch of the "2-level interrupt" scheme. If it is to work, it must be done for *all* devices which cause interrupts, so that the "spl*()" routines can be turned into "software" priority levels and the real hardware interrupts left enabled (almost) all of the time.

Note: The one place where a true hardware interrupt-off may still be needed is when the 2nd-level service routine is manipulating data shared with the 1st-level pseudo-DMA service routine. But that is a *very* localized interlock, and (if done correctly) *never* requires that the interrupt enable be off for more than a few instructions.

+---------------
| the drawback of this is that there would be an extra level
| of indirection in the entire thing, and it could make
| some of the device driver/interrupt handler code a bit uglier...
+---------------

Believe it or not, it can actually *significantly* increase performance! The little light-weight pseudo-DMA interrupts just queue up work to be done, not saving/restoring much state, and the "2nd-level interrupt" task dispatcher calls one (virtual) interrupt routine after another, seldom doing a full context save/restore.

...provided that interrupts are fairly "bursty", so that you can run more than one 2nd-level service routine per invocation of the dispatcher. And our experience at Fortune, and my experience since then, has been that interrupts *are* bursty enough for it to be a big win.
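Again, purely as an illustration (same invented names as the sketch above, not any real kernel's code, with the queue-pool and machine-dependent hardware-mask helpers left as declarations), the "software spl" and the dispatcher together come to only a few dozen lines of C:

    /* Illustrative sketch only, continuing the one above: still invented names. */

    struct vintr {
        struct vintr    *next;
        void            (*func)(void *);
        void            *arg;
    };

    #define NVQ 4                   /* virtual levels 1..NVQ; bigger number = higher priority */

    static struct vintr *vq_head[NVQ + 1], *vq_tail[NVQ + 1];
    static int softspl;             /* queues at or below this level are masked */
    static int vintr_pending;       /* the "flag" checked on the way out of the lowest level */

    struct vintr    *vintr_alloc(void);         /* from a preallocated pool */
    void             vintr_free(struct vintr *);
    struct vintr    *vq_dequeue(int);           /* uses the same short interlock as below */
    int              hard_intr_off(void);       /* machine-dependent: mask/unmask the */
    void             hard_intr_restore(int);    /* real hardware interrupts briefly */

    /*
     * spl*() no longer touches the hardware interrupt mask; it just moves the
     * software level that decides which queues the dispatcher may run.  (A
     * real splx() would also kick the dispatcher if lowering the level
     * uncovers pending work.)
     */
    int
    splsoft(int newlevel)
    {
        int old = softspl;

        softspl = newlevel;
        return (old);
    }

    /*
     * Called from the 1st-level pseudo-DMA handlers.  The *only* true
     * hardware-interrupt-off interlock in the whole scheme is here (and
     * inside vq_dequeue()), and it covers just the few instructions that
     * link the entry onto its queue.
     */
    void
    vintr_post(int q, void (*func)(void *), void *arg)
    {
        struct vintr *v = vintr_alloc();
        int s;

        v->func = func;
        v->arg = arg;
        v->next = NULL;
        s = hard_intr_off();                /* interrupts off for a few instructions... */
        if (vq_tail[q] != NULL)
            vq_tail[q]->next = v;
        else
            vq_head[q] = v;
        vq_tail[q] = v;
        hard_intr_restore(s);               /* ...and back on again */
        vintr_pending = 1;
    }

    /*
     * The dispatcher: a few lines grafted onto the callout/softclock code,
     * running at the lowest-priority *hardware* interrupt level.  It rips
     * entries off the queues, highest virtual level first, and calls the
     * handlers back to back, with no full context save/restore per
     * "virtual interrupt".
     */
    void
    vintr_dispatch(void)
    {
        struct vintr *v;
        int q;

        vintr_pending = 0;
        for (q = NVQ; q > softspl; q--)
            while ((v = vq_dequeue(q)) != NULL) {
                (*v->func)(v->arg);
                vintr_free(v);
            }
    }

The point to notice is that hard_intr_off()/hard_intr_restore() bracket only the handful of queue-linking instructions; everything else, including all the 2nd-level handlers, runs with the real hardware interrupts enabled.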
-Rob

p.s. The notion of "2-level interrupts" and "pseudo-DMA" is not new. It has been used in real-time and data-communications kernels for over 25 years. We used it in the "WSCHED" communications kernel at Digital Communications Associates in 1972. [I wrote the guts of the interrupt system and task scheduler for WSCHED.] That system was able to handle over 10,000 characters per second *through* the system (in and out counts as "1"), with an interrupt per character on each side... on an 833 KHz (!) DEC PDP-8.

p.p.s. Another thing that's a really big win is always polling the I/O hardware to see if there's another interrupt pending before dismissing the current one. If so, you can save all of the dismiss/interrupt overhead. Of course, in the @#&^%$! IBM PC architecture, many devices are not capable of being polled in this way, and you have to play weird games with the interrupt-controller chip. (*Ugh!*)

-----
Rob Warnock, MS-9U/510		rpw3@sgi.com
Silicon Graphics, Inc.		(415)390-1673
2011 N. Shoreline Blvd.
Mountain View, CA  94043