Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!nntp.coast.net!howland.reston.ans.net!news.sprintlink.net!news-stk-200.sprintlink.net!news.sprintlink.net!news-stk-11.sprintlink.net!news.sprintlink.net!new-news.sprintlink.net!server1.nw.ixe.net!server1.adam.ixe.net!wirehub!Leiden.NL.net!sun4nl!surfnet.nl!swsbe6.switch.ch!scsing.switch.ch!news.rccn.net!master.di.fc.ul.pt!usenet
From: Pedro Roque Marques <roque@di.fc.ul.pt>
Newsgroups: comp.os.linux.networking,comp.unix.bsd.netbsd.misc,comp.unix.bsd.freebsd.misc
Subject: Re: TCP latency
Date: 10 Jul 1996 23:39:59 +0100
Organization: Faculdade de Ciencias da Universidade de Lisboa
Lines: 133
Sender: roque@oberon.di.fc.ul.pt
Message-ID: <x74tnfn35s.fsf@oberon.di.fc.ul.pt>
References: <4paedl$4bm@engnews2.Eng.Sun.COM> <31D2F0C6.167EB0E7@inuxs.att.com> <4rfkje$am5@linux.cs.Helsinki.FI> <31DC8EBA.41C67EA6@dyson.iquest.net> <4rlf6i$c5f@linux.cs.Helsinki.FI> <31DEA3A3.41C67EA6@dyson.iquest.net> <Du681x.2Gy@kroete2.freinet.de> <31DFEB02.41C67EA6@dyson.iquest.net> <4rpdtn$30b@symiserver2.symantec.com> <x7ohlq78wt.fsf@oberon.di.fc.ul.pt> <Pine.LNX.3.91.960709020017.19115I-100000@reflections.mindspring.com>
NNTP-Posting-Host: oberon.di.fc.ul.pt
Mime-Version: 1.0 (generated by tm-edit 7.69)
Content-Type: text/plain; charset=US-ASCII
X-Newsreader: Gnus v5.2.25/XEmacs 19.14
Xref: euryale.cc.adfa.oz.au comp.os.linux.networking:44726 comp.unix.bsd.netbsd.misc:3991 comp.unix.bsd.freebsd.misc:23248

>>>>> "Todd" == Todd Graham Lewis <tlewis@mindspring.com> writes:

Todd> On 8 Jul 1996, Pedro Roque Marques wrote:
>> >>>>> "tedm" == tedm <tedm@agora.rdrop.com> writes:

>> tedm> I feel this has gotten so academic that it is meaningless.

>> I, for one, am so tired of seeing non-technical arguments on
>> Usenet about supposedly technical issues that I never
>> consider a discussion to be getting "too academic".

Todd> Then let's get to it.
Todd> A steak dinner at the next NANOG or LISA, my treat, to
Todd> anyone who makes a substantial contribution to this thread.
Todd> There are two questions in this message; answer them if you
Todd> have the time.

With such a prize I figure you'll get a lot of answers ;-)

>> Having good throughput and/or latency in TCP is much harder than
>> most people believe.

Todd> Question #1: Which aspects of network performance under
Todd> FreeBSD and/or Linux are most in need of improvement? Extra
Todd> credit for well-reasoned answers. If latency is
Todd> unimportant, which I don't think anyone is seriously
Todd> asserting, then what else is important, to which everyone
Todd> should have an answer.

I can only answer in terms of Linux. I never looked at the FreeBSD
code in particular, although I believe it is largely based on BSD
Net/3.

I think we should split our view of the TCP/IP stack into three
areas: TCP, routing table lookup/maintenance, and UDP performance.

- TCP

I believe Linux TCP can be improved on all fronts :-). It's also
important here to distinguish between raw performance on loopback or
fast networks, behaviour over delayed/congested/lossy paths, and
scalability to large servers. Raw performance is usually achieved by
spending the fewest CPU cycles per packet sent, so it shouldn't
conflict with the other two goals. The other two raise a very tricky
question: how to do timers? As I mentioned in a previous post,
BSD-style timers tend to be cheaper, but they are less accurate than
one would normally want. One of the finest remarks I've heard on this
issue was "if most of the time the retransmit timer won't expire, why
set it in the first place?" The tough part is that when the timer
does expire, we want it to expire with the greatest precision we can
achieve.
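To make the cheap-but-imprecise trade-off concrete, here is a minimal sketch in C of a BSD-style coarse retransmit timer, assuming a 500 ms tick like the classic BSD slow timeout. The names (struct conn, ms_to_ticks, arm_rexmt, slow_tick, expiry_window) are hypothetical and not taken from either kernel. Arming the timer is a single store, which is why resetting it on every segment costs almost nothing; the price is that expiry is quantized to tick boundaries.

```c
/* Hypothetical sketch of a BSD-style coarse retransmit timer.
 * Assumption: a 500 ms tick, as in the classic BSD slow timeout. */
#define TICK_MS 500u

struct conn {
    unsigned rexmt_ticks;   /* ticks left; 0 means "not armed" */
    int expired;            /* set when the countdown reaches zero */
};

/* Convert a timeout to whole ticks, rounding up (at least one tick). */
static unsigned ms_to_ticks(unsigned ms)
{
    unsigned t = (ms + TICK_MS - 1) / TICK_MS;
    return t ? t : 1;
}

/* Arming or re-arming is one store: cheap enough to do per segment. */
static void arm_rexmt(struct conn *c, unsigned rto_ms)
{
    c->rexmt_ticks = ms_to_ticks(rto_ms);
    c->expired = 0;
}

/* Run once per tick for every connection (the periodic scan). */
static void slow_tick(struct conn *c)
{
    if (c->rexmt_ticks != 0 && --c->rexmt_ticks == 0)
        c->expired = 1;
}

/* Arming happens at an arbitrary point between two ticks, so the first
 * tick may come almost at once or a full tick later: the real delay
 * before expiry lies in the half-open range (lo_ms, hi_ms]. */
static void expiry_window(unsigned rto_ms, unsigned *lo_ms, unsigned *hi_ms)
{
    unsigned t = ms_to_ticks(rto_ms);
    *lo_ms = (t - 1) * TICK_MS;
    *hi_ms = t * TICK_MS;
}
```

With a 1200 ms RTO the countdown is 3 ticks, so the retransmission can fire anywhere from just over 1000 ms to 1500 ms after arming. A precise per-connection timer kept in an ordered list pins the expiry time down, at the cost of more work every time it is set or cleared.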
But back to Linux improvements. On performance, I think we can
improve things a lot by handling the send and receive queues in a
simpler way and by doing header prediction (these two can account for
a 30% improvement in localhost throughput on a 486/66). Another thing
coming up is the change in the way Linux on ix86 handles user and
kernel segments: rumor has it that up to one CPU cycle per memory
access can be saved when copying between user and kernel space, which
will show up in the general performance of the network stack. For
instance, Linus posted a profile of bw_tcp on localhost some time ago
in which memcpy_to/from user space shows up as using around 90% of
the CPU.

We should also fix some details, like only sending an ack after
checking the transmit queue (this is actually required by the specs),
and reach final agreement on how we should do delayed acks (this
detail shows up very nicely in the lat_tcp test). Instead of doing
what BSD does, delaying acks by some random time between 0 and
200 ms, Linux tries to predict the appropriate ack delay according to
RFC 813 (as suggested in RFC 1122), which is quite tricky to get
right.

Once that is done, if we are still unhappy with raw performance, we
can adopt Van Jacobson's way of receiving TCP segments: the network
interrupt delivers the packet directly to the socket receive queue;
the process wakes up, goes through the IP and TCP checks, and copies
the buffer into its address space while checksumming. You can hardly
do it faster... the trouble is that it breaks layering. The Linux
network stack has a very simple and, IMHO, elegant layering that it
would be a shame to break, but...

As for TCP behaviour over delayed/congested paths... we've fixed RTO
estimation already ;-) (although only lately). I think that ensuring
code correctness, i.e. finding bugs, should be the main concern in
the short term.
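The copy-while-checksumming step in Van Jacobson's receive path can be sketched in C as a single pass that both copies the data and accumulates the 16-bit one's-complement Internet checksum of RFC 1071, so each byte is touched once instead of twice. This is a portable, byte-oriented illustration under that assumption; the name csum_and_copy is hypothetical, and production stacks typically do this word-at-a-time in hand-tuned assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy len bytes from src to dst and return the
 * RFC 1071 Internet checksum of the data, in a single pass.
 * Bytes are summed as big-endian 16-bit words (network order). */
uint16_t csum_and_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;   /* wide accumulator: carries are folded at the end */
    size_t i = 0;

    /* Copy and add one 16-bit word per iteration. */
    for (; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }

    /* An odd trailing byte is summed as if padded with a zero byte. */
    if (i < len) {
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }

    /* Fold carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

For the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 the folded sum is 0xddf2, so the function returns 0x220d. The same idea applies on the send side, which is why checksum-and-copy costs about as much as a plain memcpy.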
One of the things I've been experimenting with is TCP Vegas
congestion avoidance, but I don't have any results yet on whether it
actually behaves better than Van Jacobson's original algorithm. Then
there is a whole set of "little" fixes, like fast retransmit with some
corrections by Sally Floyd, that are quite important (this one is in
already).

As for supporting big WWW/FTP servers, your biggest enemies are the
timer-setting overhead and sockets in TIME-WAIT and SYN-RECV that
occupy memory that could be freed.

- As for routing table maintenance/lookup, I'm not familiar enough
with it, but I think that in 2.0 Linux has reasonably good/fast code.
Rumor is that the big plan is to keep part of the routing table in
user space (which has the great advantage of being swappable).

- UDP performance. I think the major improvement will be on the ix86,
with the new way of accessing user memory. Checksum-and-copy is a
very good thing (TM): the price of csum and copy is close to the price
of a memcpy. The rest of the code involved in a send is mainly a
routing table lookup (with its cached link-layer address for the next
hop).

Todd> Question #2: Does latency optimization have ancillary
Todd> benefits in terms of general code robustness or quality?

Generally, yes. To reduce latency you must look into reducing the
costs involved in sending and receiving a packet; remember that in
this case the CPU costs are mainly in the TCP processing functions.
Warning: this doesn't mean that latency figures are a valid way of
comparing the quality of two distinct systems!

Todd> As a good neighbor, I ask you to do us a favor: Please don't
Todd> storm off in a huff muttering about "Those stupid Linux
Todd> guys." If you have input to make on the network
Todd> optimization issue(s), then I would love to hear it.

Hum... the "ignorant mob" reputation Linux has in some circles. Rumor
has it that there are actually some T-shirts with the saying: "Linux:
putting power in the wrong hands".
I'd like one myself, since I find it a good joke on both the
criticized and the critics. :-)

Todd> We're listening, and we're curious. (And we're offering
Todd> steak dinners.)

Do I win a beer at least? :-) I've surely tired my fingers writing
this much rubbish ;-)

regards,
  Pedro.