Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.mira.net.au!news.vbc.net!samba.rahul.net!rahul.net!a2i!olivea!venus.sun.com!news2me.EBay.Sun.COM!engnews2.Eng.Sun.COM!usenet
From: mukesh@lucknow.eng.sun.com (Mukesh Kacker)
Newsgroups: comp.unix.bsd.misc,comp.unix.sys5.r4
Subject: Re: TCP socket close() behavior (Was: Imcomplete Documents and Images from httpd)
Followup-To: comp.unix.bsd.misc,comp.unix.sys5.r4
Date: 22 May 1996 16:20:52 GMT
Organization: Sun Microsystems Inc., Mountain View, CA
Lines: 88
Message-ID: <MUKESH.96May22092052@lucknow.eng.sun.com>
References: <4m7r3m$9qt@nntpb.cb.att.com> <31979d8d@yoda.omnicron.com> <319900f5.21380062@news.meganet.nl> <319a205f@yoda.omnicron.com> <319b6555@yoda.omnicron.com> <4nkuav$q54@noao.edu> <31a1f624@yoda.omnicron.com>
NNTP-Posting-Host: lucknow.eng.sun.com
In-reply-to: ford@omnicron.com's message of Tue, 21 May 1996 16:58:12 GMT
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.misc:1089 comp.unix.sys5.r4:11005

In article <31a1f624@yoda.omnicron.com> ford@omnicron.com (Mike "Ford" Ditto) writes:

> I submit for discussion the question of whether the kernel must attempt
> to deliver pending sent data when a close() is performed on a connected
> TCP socket without the SO_LINGER option enabled.
>
> I and a few other people wrote about a problem with several httpd
> packages on various SVR4 systems. I tracked the problem down to what I
> declared to be a bug in the httpd software. Both NCSA httpd 1.5.1 and
> apache 1.0.3 have this "bug".

There might be bugs in httpd software, but not using SO_LINGER is not
one of them. You should not rely on SO_LINGER to provide an iron-clad
guarantee of delivery of data. Some guarantees need application
involvement.

The cause of the truncated transfers might be a more basic wrong
assumption made in the design of these network programs: not doing an
application-level handshake in a two-way data transfer to make sure
the data got there. Some of this is speculation of course, but this
may be what the problem is.

If you use tcpdump (or snoop on Sun platforms) or a protocol analyser
such as Sniffer, you might see an exchange like:

    local                                remote
    -----                                ------
    close()
            -----data in transit---->    (lot of buffered data)
            <----incoming data------
            --------TCP RST--------->    (lot of buffered data flushed,
                                          and probably the buggy program
                                          does not report the error, or
                                          reports it only to a log)

One way to fix this in applications is to use shutdown(fd, 1) to send
the FIN and then wait for EOF (0 on a read()), rather than calling
close() right away. What failed above is that the protocol allowed the
possibility of incoming data when one end had already closed. Read
"TCP/IP Illustrated Vol 1" by Richard Stevens to understand this
scenario.
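For concreteness, here is a minimal sketch of that fix in C. The
function name and buffer size are mine, not from any particular httpd;
it just follows the shutdown()-then-wait-for-EOF recipe above:

    #include <unistd.h>      /* read(), close() */
    #include <sys/types.h>   /* ssize_t */
    #include <sys/socket.h>  /* shutdown() */

    /* Call in place of a bare close() once the last byte of the
     * reply has been written on the connected TCP socket 'fd'. */
    int
    graceful_close(int fd)
    {
        char    buf[512];
        ssize_t n;

        /* Half-close: send the FIN ("no more data from me") but
         * leave the receiving direction open.  1 == SHUT_WR. */
        if (shutdown(fd, 1) < 0)
            return -1;

        /* Wait for the peer's EOF: read() returning 0 means the
         * other end has seen all our data and closed, so nothing is
         * left in flight to be flushed by a RST. */
        while ((n = read(fd, buf, sizeof buf)) > 0)
            ;                /* discard any late incoming data */
        if (n < 0)
            return -1;

        /* close() can report errors too; check it, as noted below. */
        return close(fd);
    }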
It is amazing how many buggy applications there are which do not work
in the above scenario. There could be http server-client pairs which
fail in such a scenario, though I am speculating. Even a close() call
may return an error if the problem is known, but many applications do
not check the return value of close().

Some basic TCP/IP trivia in detail now... which you might already
know. When you make a call to write()/send() in the socket interface,
you have typically only handed the data to the local OS kernel (the
TCP/IP stack's buffers), which will attempt to deliver it to the
remote end. How fast the data really makes it there depends on the
speed of the link, on how fast the application removes it from the
remote machine's buffers, and on whether things get throttled by flow
control or not.

TCP's guarantee of reliable delivery is that it will try its best to
get the data there, or it will let you know (with an error such as
EPIPE) that it could not deliver the data. And even getting it to the
other end only means getting it to the other end's OS, not to the
application, unless you build a handshake into your design.

One case where TCP tries its best but cannot inform one end that
something has failed is when the fd was close()'d, so there is no way
to inform the application at that end. Most TCP implementations do try
their best to deliver data after close(), but there is a worst-case
abort time after which they give up. If you had a fast sender and a
slow receiver in the above scenario, and the remote end blocked for 3
hours (such as blocked on a write() to a file on an NFS server which
went down), you can be sure that the data will not make it. I think
most TCP implementations try for approximately 10-15 minutes after
close(). Such latent bugs only show up in scenarios where such a delay
occurs.

I saw some comments that STREAMS-based stacks do not try to deliver
after close(), which is not true in general. [If there is such a
TCP/IP stack, it is a poor implementation, and I am certainly not
familiar with all implementations.] In Solaris 2.X, which has a
STREAMS-based stack, the detached Stream hangs around even after the
close(), trying to deliver the data, though the link to the
application - the file descriptor - is gone.

So this may not be an SO_LINGER issue. You should investigate what is
going on on the wire and draw conclusions based on that. Using
SO_LINGER may delay the above scenario (I will have to look more to
see if even that is correct!), but there will always be a worst case,
which can only be guarded against by fixing the application-level
handshake on a two-way data transfer, such as the one above.

-Mukesh Kacker
 Internet Engineering
 Sun Microsystems Inc (SunSoft)
 mukesh.kacker@eng.sun.com
--