*BSD News Article 33930

Xref: sserve comp.os.386bsd.misc:3063 comp.os.linux.misc:21227
Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!msuinfo!agate!dog.ee.lbl.gov!news.cs.utah.edu!u.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (Terry Lambert)
Newsgroups: comp.os.386bsd.misc,comp.os.linux.misc
Subject: Re: STREAMS  (was I hope this wont ignite ...)
Date: 6 Aug 1994 07:29:48 GMT
Organization: Weber State University, Ogden, UT
Lines: 57
Message-ID: <31ve5c$4b1@u.cc.utah.edu>
References: <31d5ls$8e9@quagga.ru.ac.za> <Cu0w8x.923@seas.ucla.edu> <Cu2Ey9.2oM@calcite.rhyolite.com>
NNTP-Posting-Host: cs.weber.edu

In article <Cu2Ey9.2oM@calcite.rhyolite.com> vjs@calcite.rhyolite.com (Vernon Schryver) writes:
] Unfortunately, all of those put and service functions and the generic
] nature of the stream head and scheduler ensure that STREAMS are never
] as fast as sockets.  I think you can make "page flipping" and "hardware
] checksumming" work with STREAMS (two primary techniques for fast
] networking), but I doubt it is possible to make a "squashed STREAMS
] stack" without doing fatal violence to the fundamental ideas of STREAMS.
] The fastest TCP/IP implementations are based on sockets, not STREAMS,
] and they run 2 to 20 times faster (yes, twenty, as in Gbit/sec).

You can build a "stack compiler" that takes I/O and connection
specifications for multiple stacks and "squashes" them into a single
stack with apparently discrete interfaces.  There is at least one
commercial implementation that does this (I would have to look at my
notes at work to see which one).
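
As a rough sketch of what "squashing" means (not any particular
product's stack compiler), compare a chain of per-module put
procedures with the single fused routine a compiler might emit for
that same chain.  All names here (put_fn, put_a, demo_chain,
send_squashed) are hypothetical, and the one-byte "headers" are
stand-ins for real protocol headers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical module "put" procedure: prepends its header in front of
 * the data that currently starts at buf[*off], moving *off back. */
typedef void (*put_fn)(unsigned char *buf, size_t *off);

static void put_a(unsigned char *buf, size_t *off) { assert(*off >= 1); buf[--*off] = 'A'; }
static void put_b(unsigned char *buf, size_t *off) { assert(*off >= 1); buf[--*off] = 'B'; }
static void put_c(unsigned char *buf, size_t *off) { assert(*off >= 1); buf[--*off] = 'C'; }

/* The "stack" as three discrete modules, top to bottom. */
static const put_fn demo_chain[3] = { put_a, put_b, put_c };

/* Layered path: one indirect call per module on the way down. */
void send_layered(const put_fn *chain, size_t n,
                  unsigned char *buf, size_t *off)
{
    for (size_t i = 0; i < n; i++)
        chain[i](buf, off);
}

/* Squashed path: what a stack compiler might generate for demo_chain.
 * Same bytes end up in the buffer, but in one step with no per-module
 * calls or scheduling in between. */
void send_squashed(unsigned char *buf, size_t *off)
{
    assert(*off >= 3);
    *off -= 3;
    memcpy(buf + *off, "CBA", 3);
}
```

The caller still sees the same result from either path; only the
internal module boundaries have disappeared.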

The page flipping and HW checksumming are both good points.  Another
technique is to "pre-know" how much you need to read at the card
level; you can do this with incestuous knowledge on a per-protocol
basis in the drivers; this can nearly triple burst rate (but won't
do anything for propagation delay).
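
One way to picture the "pre-know" trick, assuming Ethernet II framing
carrying IPv4 (an assumption for illustration; the function name and
the PEEK_LEN constant are hypothetical): the driver peeks at the first
few bytes on the card, reads the IPv4 total-length field, and then
knows exactly how many bytes to pull across.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ETH_HDR_LEN = 14,               /* dst(6) + src(6) + ethertype(2) */
    PEEK_LEN    = ETH_HDR_LEN + 4   /* enough to reach IPv4 total length */
};

/* Given the first bytes of a received frame, return how many bytes the
 * driver needs to read from the card, or 0 if the frame is not one this
 * shortcut understands (the caller then falls back to the generic path). */
size_t frame_bytes_needed(const uint8_t *peek, size_t peek_len)
{
    assert(peek != NULL);

    if (peek_len < PEEK_LEN)
        return 0;
    if (peek[12] != 0x08 || peek[13] != 0x00)   /* ethertype != IPv4 */
        return 0;
    if ((peek[ETH_HDR_LEN] >> 4) != 4)          /* IP version nibble */
        return 0;

    /* IPv4 total length: bytes 2..3 of the IP header, big-endian,
     * covering the IP header plus payload. */
    size_t ip_total = ((size_t)peek[ETH_HDR_LEN + 2] << 8)
                    | (size_t)peek[ETH_HDR_LEN + 3];
    if (ip_total < 20)                          /* shorter than a header */
        return 0;

    return ETH_HDR_LEN + ip_total;
}
```

This is exactly the kind of protocol knowledge a generic driver is not
supposed to have, which is why it counts as "incestuous".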

Another "cheat" is to start pushing a packet and shove it all the way
down at a high priority.  This can't be combined with "squashing",
and leads to some cute problems unless a lot of thought is taken
beforehand.

One "trick" that does do "fatal violence to the fundamental ideas of
STREAMS" (I like that phrase) is doubly mapping the buffers, pinning
the pages, and passing the address rather than the data itself.  This
requires preparing the page mapping in advance so the kernel and user
space mappings are the same.  Packet assembly at the stream "tail" must
take this into account, but if done correctly, this will save two copies
and a *lot* of page overhead on a 386 (less so on a 486 or other
rational kernel page protecting architecture).
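
A user-space analogy only, assuming a POSIX system: the sketch below
maps the same pages twice and pins them, so data written through one
view is visible through the other without a copy.  It does not
reproduce the kernel-side trick of giving both mappings the same
address; the struct and function names are hypothetical, and the
"kview"/"uview" labels are just stand-ins for the two sides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of the same physical pages. */
struct double_map {
    void  *kview;   /* stands in for the kernel-side mapping */
    void  *uview;   /* stands in for the user-side mapping */
    size_t len;
};

/* Returns 0 on success, -1 on failure. */
int double_map_create(struct double_map *dm, size_t len)
{
    assert(dm != NULL && len > 0);

    FILE *f = tmpfile();            /* anonymous backing object */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }

    dm->kview = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    dm->uview = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);                      /* mappings outlive the descriptor */

    if (dm->kview == MAP_FAILED || dm->uview == MAP_FAILED) {
        if (dm->kview != MAP_FAILED) munmap(dm->kview, len);
        if (dm->uview != MAP_FAILED) munmap(dm->uview, len);
        return -1;
    }
    dm->len = len;

    /* Best-effort pin; may fail under resource limits, which this
     * sketch deliberately ignores. */
    (void)mlock(dm->kview, len);
    return 0;
}

void double_map_destroy(struct double_map *dm)
{
    munmap(dm->kview, dm->len);
    munmap(dm->uview, dm->len);
}
```

Once the two views exist, "sending" reduces to handing over an offset
into the shared region rather than the bytes themselves.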

Another "trick" is to preallocate the buffers to include the protocol
header and thus avoid the assembly entirely (leaving only the copy
to card memory, and only that if that is a consideration and the card
doesn't DMA from main memory).  This does violence to the buffer return
and the stream head, and generally doubles the buffer memory consumption
(to be safe).  The user space copyin is done into the real buffer as
a unit instead of into "real" (separate) mbufs.  This technique is
not usable simultaneously with the previous one, unless the user space
application has incestuous knowledge of the protocol and can handle
skipping the encapsulation (header) data in dealing with the buffer
contents.
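
A minimal sketch of the preallocation idea, under assumed sizes
(HDR_ROOM and PAYLOAD_MAX are made-up numbers, and pkt_buf and its
functions are hypothetical, not mbuf or STREAMS mblk APIs): the buffer
reserves room in front of the payload, the user data is copied in once
as a unit, and each layer fills its header into the reserved room
instead of assembling a new packet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { HDR_ROOM = 64, PAYLOAD_MAX = 1500 };   /* assumed sizes */

struct pkt_buf {
    unsigned char storage[HDR_ROOM + PAYLOAD_MAX];
    size_t start;   /* offset of the first valid byte */
    size_t len;     /* number of valid bytes from start */
};

void pkt_init(struct pkt_buf *p)
{
    assert(p != NULL);
    p->start = HDR_ROOM;    /* payload always lands after the header room */
    p->len = 0;
}

/* Stands in for the user-space copyin: one contiguous copy. */
int pkt_copyin(struct pkt_buf *p, const void *data, size_t n)
{
    if (n > PAYLOAD_MAX)
        return -1;
    memcpy(p->storage + HDR_ROOM, data, n);
    p->start = HDR_ROOM;
    p->len = n;
    return 0;
}

/* Each layer writes its header into the reserved room just in front of
 * the current start; the payload is never moved or re-copied. */
int pkt_push_header(struct pkt_buf *p, const void *hdr, size_t n)
{
    if (n > p->start)
        return -1;          /* reserved header room exhausted */
    p->start -= n;
    memcpy(p->storage + p->start, hdr, n);
    p->len += n;
    return 0;
}
```

The unused part of the header room in every buffer is where the extra
memory consumption comes from.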

STREAMS can be high performance, but, as you note, almost at the
penalty of no longer being STREAMS except in the technical sense.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.