Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.wildstar.net!cancer.vividnet.com!hunter.premier.net!news-res.gsl.net!news.gsl.net!nntp.coast.net!swidir.switch.ch!scsing.switch.ch!news.rccn.net!master.di.fc.ul.pt!usenet
From: Pedro Roque Marques <roque@oberon.di.fc.ul.pt>
Newsgroups: comp.os.linux.networking,comp.unix.bsd.netbsd.misc,comp.unix.bsd.freebsd.misc
Subject: Re: TCP latency
Date: 07 Jul 1996 00:49:45 +0100
Organization: Faculdade de Ciencias da Universidade de Lisboa
Lines: 191
Message-ID: <x7ybkxgcx2.fsf@oberon.di.fc.ul.pt>
References: <4paedl$4bm@engnews2.Eng.Sun.COM> <4qaui4$o5k@fido.asd.sgi.com> <4qc60n$d8m@verdi.nethelp.no> <31D2F0C6.167EB0E7@inuxs.att.com> <4rfkje$am5@linux.cs.Helsinki.FI> <31DC8EBA.41C67EA6@dyson.iquest.net>
NNTP-Posting-Host: oberon.di.fc.ul.pt
Mime-Version: 1.0 (generated by tm-edit 7.69)
Content-Type: text/plain; charset=US-ASCII
X-Newsreader: Gnus v5.2.25/XEmacs 19.14
Xref: euryale.cc.adfa.oz.au comp.os.linux.networking:44226 comp.unix.bsd.netbsd.misc:3950 comp.unix.bsd.freebsd.misc:22952

>>>>> "John" == John S Dyson <toor@dyson.iquest.net> writes:

John> Linus Torvalds wrote:
>> In article <31D2F0C6.167EB0E7@inuxs.att.com>, John S. Dyson
>> > All this TCP latency discussion is interesting, but how does this
>> > significantly impact performance when streaming data through the
>> > connection?  Isn't TCP a streaming protocol?
>>
>> No.  TCP is a _stream_ protocol, but that doesn't mean that it
>> is necessarily a _streamING_ protocol.

John> Okay, you CAN kind-of misuse it by using TCP for a single
John> transaction, like simple HTTP transactions.  That is the

I don't think HTTP was the issue here.  Using TCP you have two very
different classes of applications: bulk data transfers, and what is
traditionally called interactive traffic (small packets, delay
sensitive; sometimes you can/should aggregate several writes into a
datagram that goes out on the stream).

If you want to measure bulk data performance you use something like
bw_tcp (big buffered writes down the pipe).  If you want to measure
*one* of the factors that influence TCP performance for this so-called
"interactive" traffic, one choice that seems reasonable is to test
latency, usually defined as how long it takes to push a byte back and
forth on a socket n times (a rough sketch of what I mean appears a bit
further down).

John> reason for the implementation of the so far little used
John> protocol extension TTCP.  (FreeBSD has it for example.)

Well, since you're the one who started talking about HTTP...  T/TCP is
not, IMHO, a good solution for that either.  The greatest problem with
HTTP is that there is no multiplexing of requests on a connection in
most current implementations, and that well-known browsers do "smart"
things like opening 5 simultaneous connections.  This is the perfect
example of what you shouldn't do over a large-scale network...
Actually, my wild guess is that since HTTP amounts to around 80% of
backbone traffic, we should be more or less in the same situation,
congestion-wise, that we were in before TCP had Van's congestion
control and avoidance.

John> Also, there are advanced features in www browsers/servers
John> like Netscape where the connection is kept up for more than
John> one transaction.  (Why be silly to re-establish a
John> connection, when you could have kept the previous one up?)

right.
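Just to be concrete about the latency test I mean above, here is a
rough sketch of such a ping-pong loop.  This is not lmbench's actual
code, just an illustration I threw together: error checking is
omitted, ROUNDS is arbitrary, and the kernel picks the port.

/* tcplat.c -- rough TCP ping-pong latency sketch (illustration only).
 * The parent connects to a forked echo child over loopback and
 * bounces one byte back and forth ROUNDS times, then reports the
 * average round trip in microseconds.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#define ROUNDS 10000                    /* arbitrary round count */

int main(void)
{
    int lsock, csock, ssock, i, one = 1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char byte = 'x';
    struct timeval t0, t1;
    double usecs;

    /* listening socket on an ephemeral loopback port */
    lsock = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    bind(lsock, (struct sockaddr *)&addr, sizeof(addr));
    listen(lsock, 1);
    getsockname(lsock, (struct sockaddr *)&addr, &alen);

    if (fork() == 0) {                  /* child: trivial echo server */
        ssock = accept(lsock, NULL, NULL);
        while (read(ssock, &byte, 1) == 1)
            write(ssock, &byte, 1);
        _exit(0);
    }

    /* parent: connect and ping-pong; disable Nagle so the 1-byte
     * writes are not aggregated */
    csock = socket(AF_INET, SOCK_STREAM, 0);
    connect(csock, (struct sockaddr *)&addr, sizeof(addr));
    setsockopt(csock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

    gettimeofday(&t0, NULL);
    for (i = 0; i < ROUNDS; i++) {
        write(csock, &byte, 1);
        read(csock, &byte, 1);
    }
    gettimeofday(&t1, NULL);

    usecs = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    printf("avg round trip: %.1f usec\n", usecs / ROUNDS);

    close(csock);
    wait(NULL);
    return 0;
}

Run it over localhost on an idle machine and the figure you get is
the same kind of number this whole lmbench discussion is about.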
>> But many applications don't really care about bandwidth past a
>> certain point (they might need only a few kB/s), but latency can
>> be supremely important.  The TCP connection might be used for some
>> kind of interactive protocol, where you send lots of small
>> request/reply packets back and forth (*).

John> With many/most web pages being 1-2K, the transfer rate
John> starts to overcome the latency, doesn't it?  For very small

That is not the case for X Window, for instance.

John> transactions, maybe 100 bytes the latency is very very
John> important.  How many web pages are that small???
John> Now I can understand that there might be specific
John> applications where there are only a few hundred bytes
John> transferred, but those appear to be in the

The number of bytes used in the connection is not important.  What
Linus tries to measure is how fast we can put a packet on the other
side, when the packet is small.  This is relevant for a lot of
applications.

John> minority.  (Especially where it is bad that a latency of
John> 100usecs worse is bad in a SINGLE THREADED environment.)
John> Note -- in most single threaded environments, 100usecs is in
John> the noise.

>> (*) Now somebody is bound to bring up UDP, but no, UDP is _not_
>> a good protocol for many of these things.  If you need good
>> performance over wildly different network connections, and
>> require in-order and reliable connections, UDP sucks.  You'd end
>> up doing all the work TCP does in user space instead, and TCP
>> would be a lot more efficient.  UDP is fine for _certain_
>> applications, but if you think you should always use UDP for
>> request/reply, you're wrong.

John> There are a few applications that need very low latency, but
John> remember, latency != CPU usage also.  You might have a
John> 100usec additional latency, but that might be buried by
John> another concurrent connection...  As long as the latency
John> doesn't tie up the CPU, and you have many multiple streams,
John> it isn't very important, is it?  (Unless you have realtime
John> requirements in the region of 100usecs.)  I guess it is
John> possible that the application have a 100usec realtime
John> requirement, isn't it?  :-).

Well, these tests are made without any other load on the system...
Any Unix I know of tries to put a write on the network immediately if
it has nothing else to do, so what is really measured here is how
many CPU cycles it costs to put a packet on the net.  The same goes
for receiving, of course...  I do believe that in this particular
case latency = CPU usage.

>> Wrong.  TCP latency is very important indeed.  If you think
>> otherwise, you're probably using TCP just for ftp.

John> I guess FreeBSD-current makes it up by being faster with the
John> fork/execs done by simple www servers.  (About 1.1msecs on a
John> properly configured P5-166.)  :-)

Oh, you took the chance to "sell your product" too...  :-)
Yes, FreeBSD should be very suitable as a WWW server, but I don't
think that affects our ongoing discussion about the relevance of TCP
latency.
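Coming back to the latency = CPU usage point: a rough way to check it
on an idle machine is to time the same ping-pong loop with
getrusage() as well as gettimeofday().  The helper below is just my
own sketch (the name time_pingpong and the csock/ROUNDS names from
the earlier sketch are mine); note that RUSAGE_SELF only charges the
client process, while over loopback the echoing peer burns cycles too.

/* Compare wall-clock time with CPU time charged to this process
 * over the same ping-pong loop; drop this in place of the timing
 * code in the earlier sketch, passing it the connected csock. */
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>

void time_pingpong(int csock, int rounds)
{
    struct timeval t0, t1;
    struct rusage r0, r1;
    char byte = 'x';
    double wall, cpu;
    int i;

    gettimeofday(&t0, NULL);
    getrusage(RUSAGE_SELF, &r0);
    for (i = 0; i < rounds; i++) {
        write(csock, &byte, 1);
        read(csock, &byte, 1);
    }
    getrusage(RUSAGE_SELF, &r1);
    gettimeofday(&t1, NULL);

    wall = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    cpu  = (r1.ru_utime.tv_sec - r0.ru_utime.tv_sec) * 1e6
         + (r1.ru_utime.tv_usec - r0.ru_utime.tv_usec)
         + (r1.ru_stime.tv_sec - r0.ru_stime.tv_sec) * 1e6
         + (r1.ru_stime.tv_usec - r0.ru_stime.tv_usec);

    printf("wall: %.1f usec/round, cpu: %.1f usec/round\n",
           wall / rounds, cpu / rounds);
}

If the machine really is idle, the two numbers should track each
other closely, which is all I am claiming above.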
>> Think Quality (latency) vs Quantity (throughput).  Both are
>> important, and depending on what you need, you may want to
>> prioritize one or the other (and you obviously want both, but
>> that can sometimes be prohibitively expensive).

John> The quality vs. quantity is interesting, since I consider
John> for certain applications, slower transfer rates
John> *significantly* impact quality.  The ability to handle many
John> many concurrent connections seriously impacts quality.  Some
John> very low quality trivial algorithms might work well in the
John> single threaded trivial cases such as lmbench.

I'm lost as to where you are trying to lead the discussion:

1) scalability of BSD TCP and Linux TCP
2) relevance of the tests to the behaviour under high load
3) overall system scalability

3) is really completely outside of my domain, so I'll stick with 1)
and 2).

- scalability of TCP

Hard to tell :-)  I'd better just point out the differences and let
you form your own opinion.  The big difference between the two is the
way they handle timers: BSD with the fast and slow timeouts, and
Linux with per-socket timers with precise values.  You can argue that
those 200ms/500ms timeouts are cheaper when you have a loaded
machine...  however those functions have to look through all the
sockets, so they have O(n) complexity.  On Linux, on the other hand,
you have O(n) complexity in the add_timer function, which is called
for every send and receive.  True, the cost of Linux timers is
greater, but they are always more precise than BSD's timers.  Since I
religiously dislike the BSD way of doing TCP timers ;-) let me add
that those timer values will probably be a bit more broken under high
load :-)  (A caricature of the two styles is sketched at the end of
this post.)

- relevance of high load to the latency test

Really not much, other than the fact that we're getting an IMHO good
approximation of the CPU cycles needed to put a (small-sized) packet
on the network.  Note also that under Linux the cost of sending a
small-sized packet only differs from the cost of sending a full-sized
packet in the memory allocation and copy functions, while for BSD
this doesn't hold, since you start messing with mbuf clusters (which
people in the Linux camp usually find very inelegant ;-)).

I just remembered that BSD uses a preallocated mbuf pool while Linux
allocates an skb for every send...  I don't feel comfortable enough
to comment on whether this can affect performance...  I'm sure you
are, though ;-)

John> I guess what I am saying is that the results would look more
John> credible with a real load, where the networking code would
John> be exercised more.

Well, my belief is that neither Linux nor FreeBSD has its TCP
specially designed for high load (read 1000+ TCP connections).  I say
this because when you enter that zone it starts to make sense to have
special features to deal with a great number of connections in
TIME-WAIT and SYN-RECV (due to unroutable syn,acks).  Also, there
might be alternative solutions to the timer handling problem.

John> One last comment -- if you notice that FreeBSD isn't that
John> much slower in the localhost case, it appears that the

Relax.  :-)  4.4BSD networking is still considered to be the leading
stuff.  But I think (in a very biased and personal opinion) that
Linux networking already beats the 4.3-based implementations that are
still used in lots of commercial Unixes.  (Hmm... the value of this
statement is really next to 0, since comparing a whole networking
implementation is kind of silly...  take it with a few grains of
salt.)

regards,
  Pedro.
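P.S.  Since I keep going on about the two timer styles, here is the
caricature I promised above.  This is *not* the real BSD or Linux
source; all the names (tcpcb, rexmt_ticks, add_timer_sketch, ...) are
made up, and it only illustrates where the linear walks are: once per
500ms tick over all sockets in the BSD style, once per packet over
the pending timers in the (then-current) Linux style.

#include <stdio.h>

struct tcpcb {                      /* one per connection (made up) */
    int rexmt_ticks;                /* retransmit countdown, 500ms ticks */
    struct tcpcb *next;
};

struct tcpcb *all_tcbs;             /* global list of connections */

/* BSD style: a slow-timeout routine runs every 500ms and walks every
 * connection -- O(number of sockets) per tick, 500ms granularity. */
void tcp_slowtimo_sketch(void)
{
    struct tcpcb *tp;
    for (tp = all_tcbs; tp != NULL; tp = tp->next)
        if (tp->rexmt_ticks > 0 && --tp->rexmt_ticks == 0)
            printf("retransmit timer fired for %p\n", (void *)tp);
}

/* Linux style: every send/receive (re)arms a precise per-socket
 * timer; inserting into a sorted list is O(number of pending
 * timers), but the expiry times are exact. */
struct timer {
    long expires;                   /* absolute expiry time */
    struct timer *next;
};

struct timer *timer_list;

void add_timer_sketch(struct timer *t)
{
    struct timer **pp = &timer_list;
    while (*pp != NULL && (*pp)->expires < t->expires)
        pp = &(*pp)->next;          /* walk to the right slot: O(n) */
    t->next = *pp;
    *pp = t;
}

int main(void)
{
    /* two fake connections, one tick of the slow timer */
    struct tcpcb a = { 2, NULL }, b = { 1, &a };
    struct timer t1 = { 100, NULL }, t2 = { 50, NULL };

    all_tcbs = &b;
    tcp_slowtimo_sketch();          /* b fires, a just counts down */

    add_timer_sketch(&t1);
    add_timer_sketch(&t2);          /* t2 sorts in ahead of t1 */
    printf("first pending timer expires at %ld\n", timer_list->expires);
    return 0;
}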