*BSD News Article 33985

Xref: sserve comp.os.386bsd.misc:3076 comp.os.linux.misc:21295
Newsgroups: comp.os.386bsd.misc,comp.os.linux.misc
Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!msuinfo!uwm.edu!lll-winken.llnl.gov!decwrl!decwrl!netcomsv!calcite!vjs
From: vjs@calcite.rhyolite.com (Vernon Schryver)
Subject: Re: STREAMS  (was I hope this wont ignite ...)
Message-ID: <Cu58wD.E1G@calcite.rhyolite.com>
Organization: Rhyolite Software
Date: Sun, 7 Aug 1994 02:43:25 GMT
References: <Cu0w8x.923@seas.ucla.edu> <Cu2Ey9.2oM@calcite.rhyolite.com> <31ve5c$4b1@u.cc.utah.edu>
Lines: 84

In article <31ve5c$4b1@u.cc.utah.edu> terry@cs.weber.edu (Terry Lambert) writes:
>In article <Cu2Ey9.2oM@calcite.rhyolite.com> vjs@calcite.rhyolite.com (Vernon Schryver) writes:
>] Unfortunately, all of those put and service functions and the generic
>] nature of the stream head and scheduler ensure that STREAMS are never
>] as fast as sockets.  I think you can make "page flipping" and "hardware
>] checksumming" work with STREAMS (two primary techniques for fast
>] networking), but I doubt it is possible to make a "squashed STREAMS
>] stack" without doing fatal violence to the fundamental ideas of STREAMS.

> ...
>One "trick" that does do "fatal violence to the fundamental ideas of
>STREAMS" (I like that phrase) is doubly mapping the buffers, pinning
>the pages, and passing the address rather than the data itself.  This
>requires pre-preparing the page mapping so the kernel and user space
>mapping is the same.  Packet assembly at the stream "tail" must take
>this into account, but if done correctly, this will save two copies
>and a *lot* of page overhead on a 386 (less so on a 486 or other
>rational kernel page protecting architecture).

That's exactly what I call "page flipping."  I don't think it does
violence to STREAMS.   Simply create a new STREAMS buffer type.  It's
easier to create STREAMS buffer types than fancy mbuf clusters.  I don't
know why fewer people play such games with STREAMS buffers than mbufs.
"Type 3" mbufs were the rage at Sun in 1986.  My FDDI code has been
"page flipping" mbufs for years, with gratifying performance results.
HP's FDDI code also page flips, with performance almost as good.  Output
page flipping is quite easy if you have copy-on-write.  Input is harder,
but modifying ld(1) to page-align big buffers, by default or with a
special option, makes it practical.
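
A minimal sketch of why buffer alignment matters for input page flipping,
assuming a power-of-two page size.  The helper name flippable_pages() is
hypothetical and not taken from any real kernel; it only counts how many
whole pages of a user buffer could be remapped rather than copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of whole pages in a user buffer that
 * could be remapped ("flipped") instead of copied.
 */
static size_t flippable_pages(const void *buf, size_t len, size_t pagesz)
{
    /* Assumption: the page size is a nonzero power of two. */
    assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);

    if (((uintptr_t)buf & (pagesz - 1)) != 0)
        return 0;           /* unaligned start: fall back to copying */

    return len / pagesz;    /* a tail shorter than a page is still copied */
}
```

An unaligned buffer yields zero flippable pages, which is the case that
page-aligning big buffers at link time is meant to avoid.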


>Another "trick" is to preallocate the buffers to include the protocol
>header and thus avoid the assembly entirely (leaving only the copy
>to card memory, and only that if that is a consideration and the card
>doesn't DMA from main memory).  This does violence to the buffer return
>and the stream head, and generally doubles the buffer memory consumption
>(to be safe).  The user space copyin is done into the real buffer as
>a unit instead of into "real" (separate) mbufs.  This technique is
>not usable simultaneously with the previous one, unless the user space
>application has incestuous knowledge of the protocol and can handle
>skipping the encapsulation (header) data in dealing with the buffer
>contents.

This is an ancient BSD mbuf trick.  I don't think it does any violence
to STREAMS.  At most your STREAMS modules have to peek at STREAMS buffer
reference counts and know more than they should about the underlying
implementation of the buffers (e.g. to do as you say and avoid writing
on buffers that are not really simple buffers.)  My first commercial
STREAMS code in 1986 played such games to make tty's go faster on some
68000-based systems.  (That's not intended as a brag, but as proof it's
not rocket science.)
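
A minimal sketch of the reference-count check, using simplified stand-ins
for the STREAMS message and data blocks.  The struct and function names
here are hypothetical; real STREAMS uses mblk_t and dblk_t with fields
such as b_rptr and db_ref, and this is not that code.  The header is
written into preallocated headroom only when the data block is unshared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for a STREAMS data/message block. */
struct dblk { unsigned char *base; unsigned ref; };
struct mblk { struct dblk *datap; unsigned char *rptr; unsigned char *wptr; };

/*
 * Prepend a header in front of the data already in the block.
 * Returns 1 on success.  Returns 0 if the buffer is shared or has too
 * little headroom; the caller would then link a separate header block.
 */
static int prepend_in_place(struct mblk *mp, const void *hdr, size_t hdrlen)
{
    assert(mp != NULL && mp->datap != NULL);

    size_t headroom = (size_t)(mp->rptr - mp->datap->base);
    if (mp->datap->ref != 1 || headroom < hdrlen)
        return 0;               /* never scribble on a shared buffer */

    mp->rptr -= hdrlen;         /* grow the message toward the base */
    memcpy(mp->rptr, hdr, hdrlen);
    return 1;
}
```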


>STREAMS can be high performance, but, as you note, at almost the
>penalty of not being STREAMS any more except in the technical sense.

I disagree.  I don't think you can build what I understand Van Jacobson
calls a "squashed stack" without changing the STREAMS head code beyond
recognition.  Remember that Jacobson's neat idea (as I understand it)
is to cache the entire pile of headers, from TCP through MAC, and when
the user makes a write(2) call, combine a copy of that cached glob of
headers with the user's data while doing the TCP checksum, make the
mindless modifications to about 10 bytes among those 54 bytes (for
Ethernet) or 64 bytes (for FDDI with typical MACs), and stick the result
on the MAC chip's DMA queue.  Those mindless modifications consist
of adding values to the previous contents--e.g. TCP seq #, IP ID,
and IP cksum.  Note that there is no ARP lookup, no running through
TCP state machine switches, and no IP fiddling.  It's just "header
prediction" or "header compression" taken to its obvious conclusion
("obvious" once you're told about it, that is).
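
A minimal sketch of the cached-header idea as described above, not Van
Jacobson's code.  It assumes a 54-byte Ethernet + IP + TCP header with no
options and fixed-size segments, so the IP length field never changes.
The names hdr_cache and hdr_cache_advance() are hypothetical.  The IP
checksum is patched incrementally in the RFC 1624 style instead of being
recomputed; the TCP checksum, which would be folded into the copy of the
user's data, is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 14-byte Ethernet, 20-byte IP, 20-byte TCP header. */
enum { ETH_LEN = 14, IP_LEN = 20, HDR_LEN = 54 };
enum {
    IP_OFF      = ETH_LEN,
    IP_LEN_OFF  = IP_OFF + 2,
    IP_ID_OFF   = IP_OFF + 4,
    IP_SUM_OFF  = IP_OFF + 10,
    TCP_SEQ_OFF = IP_OFF + IP_LEN + 4
};

struct hdr_cache { uint8_t hdr[HDR_LEN]; };

/* Big-endian (network order) field accessors. */
static uint16_t get16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static uint32_t get32(const uint8_t *p) { return ((uint32_t)get16(p) << 16) | get16(p + 2); }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)v); }

/* Full one's-complement checksum; used only to build or verify a header. */
static uint16_t ip_cksum(const uint8_t *ip, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += get16(ip + i);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Incremental update when one 16-bit word changes: HC' = ~(~HC + ~m + m'). */
static uint16_t cksum_adjust(uint16_t hc, uint16_t oldw, uint16_t neww)
{
    uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~oldw + neww;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Advance the cached headers for the next segment by adding to old values. */
static void hdr_cache_advance(struct hdr_cache *c, uint32_t prev_payload_len)
{
    assert(c != NULL);
    uint8_t *h = c->hdr;
    uint16_t old_id = get16(h + IP_ID_OFF);
    uint16_t new_id = (uint16_t)(old_id + 1);

    put16(h + IP_ID_OFF, new_id);
    put16(h + IP_SUM_OFF, cksum_adjust(get16(h + IP_SUM_OFF), old_id, new_id));
    put32(h + TCP_SEQ_OFF, get32(h + TCP_SEQ_OFF) + prev_payload_len);
}
```

The fast path touches only the IP ID, the IP checksum, and the TCP
sequence number; everything else in the cached glob is copied unchanged.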

Think about how the STREAMS head would have to be smart enough to do
all of this and bypass all of the put and service functions, except when
something exceptional has happened in which case it must do the old
fashioned stuff.  Note also that the STREAMS head would have to arrange
to keep the user data around on some queue somewhere in case of
retransmissions.  On the other hand, not having seen Van Jacobson's
code, but having thought a little about it, this seems to me like fairly
straightforward violence to the BSD sosend() function--yeah, I understand
the protocol switch is much changed and sosend() may not be called
sosend() anymore, but those are not a big deal.


Vernon Schryver    vjs@rhyolite.com