Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!simtel!zombie.ncsc.mil!cs.umd.edu!not-for-mail
From: torek@elf.bsdi.com (Chris Torek)
Newsgroups: comp.unix.bsd.bsdi.misc
Subject: Re: Major bug? (probably programmer error :)
Date: 4 Sep 1995 18:41:40 -0700
Organization: Berkeley Software Design, Inc.
Lines: 79
Message-ID: <42g9sk$9ee@elf.bsdi.com>
References: <42ff8m$b3d@clarknet.clark.net> <VIXIE.95Sep4112251@wisdom.vix.com> <42fpad$rie@clarknet.clark.net>
Reply-To: torek@bsdi.com
NNTP-Posting-Host: elf.bsdi.com

In article <VIXIE.95Sep4112251@wisdom.vix.com> vixie@wisdom.vix.com
(Paul A Vixie) notes that seeking by ...

>>>    lseek(fd,-1L*sizeof(b),1);
>>>    write(fd,&b,sizeof(b));

creates a file with a large `hole' (a sparse file).

>>Try this:
>>    printf("offset: %ld\n", -1L*sizeof(b));

Actually, this should be printed with `%lu' on any current BSD/OS
system.

In article <42fpad$rie@clarknet.clark.net> Alan Weiner
<alweiner@clark.net> asks:

>Doesn't -1L*sizeof(b) do an implicit conversion to long?

No.  In general, the type of -1L*sizeof(b) will be either
`unsigned long' or `long'.

The type of the result of sizeof() is size_t, which is (by definition
in the ANSI C standard) an unsigned integral type.  For various
reasons, it will almost always be either `unsigned int' or
`unsigned long'.

On platforms in which sizeof(int) < sizeof(long), if the type of
`size_t' is `unsigned int', `-1L * sizeof(b)' will convert sizeof(b)
from unsigned int to signed long and will result in a negative number.
For instance, if sizeof(b) is 6U, on such a machine this will compute
-1L * 6U, giving -6L.  This might occur on a PC running MS-DOS.

On the other hand, on platforms on which sizeof(int) < sizeof(long)
but the type of `size_t' is `unsigned long', the multiplication will
be done in `unsigned long' and will result in a positive number.
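
The two cases above can be reproduced on a current compiler by letting
fixed-width types stand in for the platform types.  This is only a
sketch and not part of the original article: the function names are
hypothetical, uint32_t/int64_t model an `unsigned int' size_t next to a
wider `long', and uint64_t/int64_t model a size_t of `unsigned long'.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models size_t == unsigned int with a wider long (assumption: uint32_t
 * stands in for size_t, int64_t for long).  The wider signed type can
 * hold every uint32_t value, so `size' is converted to it and the
 * product stays negative.
 */
int64_t scaled_narrow_size(uint32_t size)
{
    return (int64_t)-1 * size;
}

/*
 * Models size_t == unsigned long (assumption: uint64_t stands in for
 * size_t, int64_t for long).  Both operands have the same width, so the
 * signed one is converted to unsigned and the product wraps modulo 2^64.
 */
uint64_t scaled_wide_size(uint64_t size)
{
    return (int64_t)-1 * size;
}

/* Self-check for a 6-byte object, as in the text. */
void check_conversions(void)
{
    assert(scaled_narrow_size(6) == -6);
    assert(scaled_wide_size(6) == UINT64_C(18446744073709551610));
}
```

Calling check_conversions() passes silently; the second value is
2^64 - 6, the wrapped form of -6 in 64-bit unsigned arithmetic.
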
If ULONG_MAX is 0xffffffffffffffff and sizeof(b) is 6UL on such a
platform, -1L * sizeof(b) will be 0xfffffffffffffffa or
18446744073709551610 (i.e., ULONG_MAX + 1 - 6).  This might occur on,
say, a DEC Alpha.

On platforms on which sizeof(int) == sizeof(long), the type of size_t
can be either `unsigned int' or `unsigned long' without affecting the
result.  The multiplication will be done in unsigned arithmetic and
will again result in ULONG_MAX + 1 - 6 (assuming 6 bytes for `b').

All existing BSD/OS 1.x and 2.x platforms have size_t defined as
`unsigned int' and sizeof(int) == sizeof(long).  We therefore fall
in this last category.  Since our ULONG_MAX is 0xffffffff
(4294967295), you get values like 4294967290.

You might legitimately wonder how this ever worked.  The answer is
that this bug was dormant -- it could only show up when we supported
larger files (and 9GB disks, etc.).  In BSD/OS, file sizes and offsets
have type `off_t'.  In 1.x, `off_t' was simply `long'.  This is how
the bug was hidden: you would compute -1L * 6U as 4294967290, but then
add the current file offset (6) to the given lseek offset (4294967290),
resulting in an overflow (4294967296) which was truncated to 0.  The
second write() call would thus rewrite the desired bytes.

In BSD/OS 2.x (2.0 and 2.0.1), `off_t' is a 64-bit integral type.  The
addition no longer overflows -- 4294967290 + 6 is simply 4294967296,
so the seek lands 4GB into the file instead of at offset 0.

Because the type of the result of `sizeof' is implementation-defined,
code that intends to seek backwards over a record it just read should
say:

    lseek(fd, -(off_t)sizeof record, SEEK_CUR);

rather than:

    lseek(fd, (off_t)-sizeof record, SEEK_CUR);

Such code will work anywhere the basic idea works, regardless of the
relative sizes and types of `off_t' and `size_t'.
-- 
In-Real-Life: Chris Torek, Berkeley Software Design Inc
Berkeley, CA    Domain: torek@bsdi.com    +1 510 549 1145
  `... if we wish to count lines of code, we should not regard them as
   ``lines produced'' but as ``lines spent.'' ' --Edsger Dijkstra
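
As an illustration of the recommended idiom, here is a minimal sketch
of a read/seek-back/rewrite sequence on a POSIX system.  It is not from
the original article: `struct record', rewrite_record() and
rewrite_demo() are hypothetical names, and the 6-byte record simply
mirrors the size used in the arithmetic above.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() in strict ISO C modes */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical 6-byte record, matching the size used in the text. */
struct record {
    char bytes[6];
};

/*
 * Read one record at the current offset, then overwrite it in place.
 * Returns 0 on success, -1 on a short read/write or a failed seek.
 */
int rewrite_record(int fd, const struct record *replacement)
{
    struct record scratch;

    if (read(fd, &scratch, sizeof scratch) != (ssize_t)sizeof scratch)
        return -1;

    /* Convert to off_t first, then negate: the offset is exactly -6. */
    if (lseek(fd, -(off_t)sizeof scratch, SEEK_CUR) == (off_t)-1)
        return -1;

    if (write(fd, replacement, sizeof *replacement)
        != (ssize_t)sizeof *replacement)
        return -1;
    return 0;
}

/*
 * Write one record to a temporary file, rewrite it, and return the
 * final file size, or -1 on error.  A correct backward seek leaves the
 * size at one record; a huge forward seek would leave a sparse file.
 */
off_t rewrite_demo(void)
{
    const struct record first = { "AAAAA" };   /* 5 chars + NUL */
    const struct record second = { "BBBBB" };
    FILE *tmp = tmpfile();
    off_t size;
    int fd;

    assert(sizeof first == 6);   /* the example assumes no padding */
    if (tmp == NULL)
        return -1;
    fd = fileno(tmp);

    if (write(fd, &first, sizeof first) != (ssize_t)sizeof first ||
        lseek(fd, 0, SEEK_SET) == (off_t)-1 ||
        rewrite_record(fd, &second) != 0) {
        fclose(tmp);
        return -1;
    }

    size = lseek(fd, 0, SEEK_END);
    fclose(tmp);
    return size;
}
```

rewrite_demo() should report a file size of 6 when the seek goes
backwards as intended.  The file size is checked because an oversized
positive offset shows up as a large hole in the file.
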