Return to BSD News archive
Xref: sserve comp.unix.ultrix:13375 comp.sys.dec:9194 comp.unix.bsd:4251 comp.bugs.4bsd:1901 comp.sys.next:16919 Path: sserve!manuel!munnari.oz.au!network.ucsd.edu!sdd.hp.com!uakari.primate.wisc.edu!usenet.coe.montana.edu!news.u.washington.edu!milton.u.washington.edu!corey From: corey@milton.u.washington.edu (Corey Satten) Newsgroups: comp.unix.ultrix,comp.sys.dec,comp.unix.bsd,comp.bugs.4bsd,comp.sys.next Subject: bug/fix for sort !!! Keywords: bug fix sort Message-ID: <1992Aug27.202530.28111@u.washington.edu> Date: 27 Aug 92 20:25:30 GMT Sender: news@u.washington.edu (USENET News System) Reply-To: corey@cac.washington.edu Organization: University of Washington, Seattle Lines: 148 As hard as it is to believe, after all these years of use, I found a bug in /usr/bin/sort when used with the -u flag. The bug only shows up for relatively large inputs and causes a few of the last lines of output to be omitted. Below is a script you can feed both to /bin/sh to test your sort and to "patch" to (hopefully - but as always, no guarantees) fix your sort.c. The context diff is for BSD 4.3 sort.c, however the exact same code exists (at different line numbers) in Ultrix 4.1 sort.c so you can still apply the patch verbatim. -------- Corey Satten, corey@cac.washington.edu Networks and Distributed Computing University of Washington #!/bin/sh # # Test for sort -u bug. The bug causes a small number of lines to be # omitted from the sort -u output for relatively large sort inputs. # # It is caused by the merge code not handling EOF properly in the next-to-last # temp-file generated by sort. In order to trigger the bug, enough input # must be presented that sort merges a larger number of tempfiles into a # smaller number and then merges the smaller number. Alternatively, # the uninitialized memory which is compared against the last value # could accidently contain that value. # # This bug is known to be present in: # BSD 4.3 Tahoe # Sequent Dynix V3.1.4 # NeXT version 2.1 # Ultrix version 4.1, 4.2a # AT&T 7300/3b1 version 3.51 # # It appears not to be present in SunOS 4.x, OSF/1 however this may only # be because the size of the input required to trigger the bug has been # raised (the size of the tempfiles and memory arrays has been raised). # # A context diff of what I hope is a fix to the 4.3BSD version is # appended at the end of this script. Use at your own risk. # # Corey Satten, corey@cac.washington.edu, July 31, 1992 TMP=/tmp/data$$ trap "rm -f $TMP.c $TMP; exit 0" 0 1 2 13 15 COUNT=${1-45} # value supplied as $1, else default to 45 lines=`echo "$COUNT^3 * 1.015625 / 10 * 10" | bc` echo testing sort with input of roughly $lines lines 1>&2 cat <<EOF >$TMP.c #include <stdio.h> #define r(x) ("qwertyuiopasdfghjklzxcvbnm"[pos(x)%26]) #define pos(x) ((x)>0?(x):5-(x)) main() { int lines = 0; int i,j,k; for (i=0; i<$COUNT; ++i) { for (j=0; j<$COUNT; ++j) { for (k=0; k<$COUNT; ++k) { if ( (++lines & 63) == 0) printf("zzzzzzzzzzz - OK\n"); printf("%c%c%c%c%c%c\n", r(k), r(j+k), r(j-k), r(i+k), r(i-k), r(i+j+k)); } } } } EOF cc $TMP.c -o $TMP || { rm -f $TMP.o echo echo " Sorry, your C compiler didn't compile my program." 1>&2 exit 1 } if $TMP | sort -u | grep -s '^zzzzzzzzzzz' ; then echo echo " sort -u worked for this input (COUNT=$COUNT) but to be safe, you" echo ' might wish to try again with a larger value of COUNT. You may' echo ' supply a value for COUNT as an argument to this script.' echo ' The number of lines is roughly COUNT to the 3rd power.' exit 0 else echo echo ' sort -u failed on this system. The line beginning with zzzzzzzzzz' echo ' was omitted from the sort -u output. You might wish to compare' echo ' sort -u output with output from sort|uniq to see if there are any' echo ' other differences.' exit 1 fi # I believe this change applied to the 4.3BSD code fixes the problem. # Use at your own risk. -Corey # *** sort.c Fri Jul 31 08:35:33 1992 --- new_sort.c Fri Jul 31 14:33:20 1992 *************** *** 387,392 **** --- 387,393 ---- int j; int k, l; int muflg; + int last_muflg; p = (struct merg *)lspace; j = 0; *************** *** 432,438 **** term(); } } ! if(muflg){ cp = ibuf[i-1]->l; dp = p->l; do { --- 433,439 ---- term(); } } ! if(last_muflg = muflg){ /* yes assignment. Corey */ cp = ibuf[i-1]->l; dp = p->l; do { *************** *** 452,458 **** *ip = *(ip-1); *(ip-1) = jp; } ! if(!muflg) break; j = (*compare)(ibuf[i-1]->l,p->l); if(cflg) { --- 453,466 ---- *ip = *(ip-1); *(ip-1) = jp; } ! /* ! * Instead of testing muflg, I introduce last_muflg ! * because when i-- reaches 1 and the value of muflg ! * changes, the code suddenly compares against p->l ! * which hasn't had a recent value copied into it. ! * Corey. ! */ ! if(!last_muflg) break; j = (*compare)(ibuf[i-1]->l,p->l); if(cflg) {