Newsgroups: comp.unix.bsd
Path: sserve!manuel!munnari.oz.au!spool.mu.edu!wupost!udel!sbcs.sunysb.edu!sbcs!stark
From: stark@cs.sunysb.edu (Gene Stark)
Subject: Program dies with FP Exception
Message-ID: <STARK.92Sep13002650@sbstark.cs.sunysb.edu>
Sender: usenet@sbcs.sunysb.edu (Usenet poster)
Nntp-Posting-Host: sbstark
Organization: SUNY at Stony Brook Computer Science Dept.
Date: Sun, 13 Sep 1992 05:26:50 GMT
Lines: 68

Here's a tough one I've been trying to track down -- maybe somebody
out there who knows more can guess what is going on.

I am running 386BSD on a 486/33 system with 4MB RAM and a 210MB Conner
IDE drive. A program I was working on dies with signal 8 (Floating
point exception) in a perfectly repeatable fashion. It is not so easy
to tell where the exception actually comes from, though, because the
signal seems to be delivered to the process much later, when it is
leaving the system after a call to "write".

I haven't been able to get a small test program that repeats the bug;
however, there seem to be several crucial elements involved:

  (1) A call to "atof", which returns a double that is then stored in
      a temporary on the stack. Removing the call removes the error.

  (2) The actual magnitude of the number being converted by "atof".
      I found that the strings "1e10" and "1e12" cause the error, but
      "1e9", "1e6", and "0.0" do not.

  (3) Some later "write" system calls. The signal is actually
      delivered on the fourth call to write after the atof. What
      happens in the interim is just C code without any other system
      calls. I do not know what causes the signal to be delivered at
      the point where it actually is.
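The three elements above can be sketched as a small C harness. This is only an illustration of the pattern, not the original program: the names convert_then_write, on_fpe, and writes_done are hypothetical, and, as noted above, a small standalone program is not expected to reproduce the fault. The handler simply reports how many writes had completed if a SIGFPE does arrive.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* All names here are hypothetical; this is not the original program. */
static volatile sig_atomic_t writes_done = 0;

/* Report how many writes had completed when SIGFPE arrived, then exit.
   Only async-signal-safe calls (write, _exit) are used here. */
static void on_fpe(int sig)
{
    char msg[] = "SIGFPE after write #0\n";
    ssize_t n;

    (void)sig;
    /* Patch the single digit just before the trailing newline. */
    msg[sizeof msg - 3] = (char)('0' + writes_done % 10);
    n = write(2, msg, sizeof msg - 1);
    (void)n;
    _exit(1);
}

/* Convert text with atof into a stack temporary, then issue up to
   nwrites one-byte writes to fd with no other system calls in between.
   Returns the converted value. */
double convert_then_write(const char *text, int fd, int nwrites)
{
    volatile double tmp;    /* volatile forces a real store to memory */
    int i;

    assert(nwrites >= 0);
    writes_done = 0;
    signal(SIGFPE, on_fpe);

    tmp = atof(text);       /* element (1): result kept in a temporary */

    for (i = 0; i < nwrites; i++) {     /* element (3): later writes */
        if (write(fd, ".", 1) != 1)
            break;
        writes_done = i + 1;
    }

    signal(SIGFPE, SIG_DFL);
    return tmp;
}
```

Calling convert_then_write("1e10", 1, 4) and then convert_then_write("1e9", 1, 4) exercises the magnitudes from element (2); on a system without the fault, both calls simply return the converted value.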
After a lot of debugging, I boiled the problem down to this section of
source code:

	lp->token.value.flot = atof("1e10");

This compiles (no optimization) to the following:

	pushl $LC10
	call _atof
	addl $-8,%esp
	fstpl (%esp)		# This instruction seems to be the culprit
	popl %eax
	popl %edx
	movl 8(%ebp),%ecx
	movl %eax,20(%ecx)
	movl %edx,24(%ecx)

Removing the "fstpl" instruction removes the error. Placing the code:

	pushl $1
	call _sleep

immediately after the "fstpl" instruction also removes the error.

Taking the code out of context and putting it in a small test program
does not produce the error, so presumably there is some interaction
with the virtual memory state. I also tried putting the instruction

	movl $0,(%esp)

just after the fstpl instruction, on the theory that maybe the fstpl
was causing a page fault with bad consequences, but this did not
eliminate the error.

So, are these enough clues that somebody who knows more than I do can
guess what the problem might be? Any help appreciated.

							- Gene Stark