Newsgroups: comp.os.386bsd.bugs
Path: sserve!newshost.anu.edu.au!munnari.oz.au!metro!ipso!runxtsa!bde
From: bde@runx.oz.au (Bruce Evans)
Subject: Re: Floating exceptions?
Message-ID: <1993Mar24.071239.17071@runx.oz.au>
Organization: RUNX Un*x Timeshare. Sydney, Australia.
References: <f0XUP76@quack.kfu.com>
Date: Wed, 24 Mar 93 07:12:39 GMT
Lines: 193

In article <f0XUP76@quack.kfu.com> mrapple@quack.kfu.com (Nick Sayer) writes:
>0.2.2 running on a 486-50 with 16M RAM. I compiled xv_calctool
>(patchlevel 12). If I try and find ln(10000)/ln(10), it crashes with
>a floating point exception. The same code on a sun does not.

I think Sun (SPARC) systems use the IEEE default of all floating point
exceptions masked. i386 Un*xes traditionally unmask the worst of the
exceptions, because too many programs don't bother to check for them.
386BSD does the same as the other i386 Un*xes here. Someday I want to
use the IEEE default. Maybe it's sufficient to check for exceptions in
exit(). But first the libraries have to be improved.

>If I try and get a stack-trace on the resulting core, it
>crashes in routines that haven't the slightest thing to do with
>floating point.
>
>Has anyone seen this behavior before? Might there be some delay between
>the occurrence of the problem and the exception or something?

There are several delays. First, the i387 delays reporting an error
until the next FPU instruction (an ISA h/w bug sometimes causes it to
report an error immediately). Second, a 386BSD kernel bug sometimes
delays the delivery of the signal about an error. Third, looking at
things with a debugger tends to cause errors to be reported too early.

This is the README from my npx-0.4.tar.Z package, where some of the bugs
are fixed. npx-0.4 only works on 486's.
---
There are many bugs in floating point error handling in 386BSD-0.1.
Here are my fixes for most of them. I have tested them on a 486DX and
(in a slightly different form) on a 386 (no 387).
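To make the masked/unmasked distinction in the reply concrete, here is a small decoder for the x87 control word. The bit layout (six exception mask bits, precision control in bits 8-9, rounding control in bits 10-11) is the documented 387/486 one; the names CW_*, traps_on(), precision_bits() and describe_cw() are hypothetical helpers for illustration. 0x037F is the word FINIT loads (everything masked, the IEEE-style default). The word 0x0372 used below is only an illustrative "unmask the worst ones" setting (invalid, zero-divide, overflow); the post does not say which word 386BSD actually loads.

```c
#include <stdio.h>

/* x87 control-word fields (387/486 layout).  A SET mask bit means the
 * exception is masked: the FPU records it in the status word and returns a
 * default result.  A CLEAR bit means it is unmasked: the FPU reports an error
 * (IRQ13 or exception 16), which a Unix kernel turns into SIGFPE. */
enum {
    CW_IM = 0x0001,       /* invalid operation */
    CW_DM = 0x0002,       /* denormalized operand */
    CW_ZM = 0x0004,       /* zero divide */
    CW_OM = 0x0008,       /* overflow */
    CW_UM = 0x0010,       /* underflow */
    CW_PM = 0x0020,       /* precision (inexact result) */
    CW_PC_SHIFT = 8,      /* precision control, bits 8-9 */
    CW_RC_SHIFT = 10      /* rounding control, bits 10-11 */
};

/* Nonzero if the exception selected by mask_bit is unmasked (would trap). */
int traps_on(unsigned cw, unsigned mask_bit)
{
    return (cw & mask_bit) == 0;
}

/* Significand width selected by the PC field; 0 for the reserved encoding. */
int precision_bits(unsigned cw)
{
    static const int bits[4] = { 24, 0, 53, 64 };
    return bits[(cw >> CW_PC_SHIFT) & 3];
}

/* Print a human-readable summary of a control word. */
void describe_cw(const char *label, unsigned cw)
{
    static const char *const rounding[4] = {
        "to nearest", "down (toward -inf)", "up (toward +inf)", "toward zero"
    };
    static const struct { unsigned bit; const char *name; } exc[] = {
        { CW_IM, "invalid" },  { CW_DM, "denormal" },  { CW_ZM, "zero-divide" },
        { CW_OM, "overflow" }, { CW_UM, "underflow" }, { CW_PM, "precision" },
    };

    printf("%s 0x%04X: %d-bit precision, rounding %s\n", label, cw,
           precision_bits(cw), rounding[(cw >> CW_RC_SHIFT) & 3]);
    for (size_t i = 0; i < sizeof exc / sizeof exc[0]; i++)
        printf("  %-12s %s\n", exc[i].name,
               traps_on(cw, exc[i].bit) ? "unmasked (traps)" : "masked");
}
```

Calling describe_cw("FINIT default", 0x037F) from a main() lists all six exceptions as masked, with 64-bit precision and round-to-nearest; with 0x0372, invalid, zero-divide and overflow show up as unmasked, which is the kind of setting under which a stray invalid operation deep inside a library routine kills the program instead of quietly producing a NaN.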
How well the fixes work depends on the system:

486DX:
    Floating point error handling now uses exception 16 instead of
    IRQ13 to report errors (the method is reported at boot time).
    Exception 16 is designed correctly, so it is possible for the
    kernel to get everything right.
486SX:
    ?
386/387:
    IRQ13's at inappropriate times are now detoxified. FP errors are
    still sometimes reported early at unpredictable times (after the
    kernel preempts the process) and at predictable times (after the
    usual program executes certain unusual FP instructions, and when
    it gives up control to a debugger).
386/287:
    ?
All h/w:
    Context switching and exit() now never clobber the FP context.
    SIGFPE's are now delivered as soon as possible.
Emulator:
    Still lacks error handling. It now needs to handle fwait but
    doesn't.

----- Files -----

README:
    o This file.
fpetest.c:
    o Test program. Run as "fpetest -z" to see the options. To stress
      the system, run several copies concurrently. This will crash
      386BSD-0.1 eventually. To demonstrate the exit() bug in
      386BSD-0.1, run the program "double x; main() { x = x + 1; }"
      in a shell loop concurrently. This might crash the system.
      After applying the patches, run the tests overnight. This should
      not crash the system.
npx.diff:
    o Patches. Apply using "cd /; patch -p <somewhere/npx.diff" or by
      editing out the pieces that apply to the individual directories
      (/sys/i386/i386, /sys/i386/include and /sys/i386/isa) and working
      in each directory separately.
    o All patches are relative to the 386BSD-0.1 distribution except
      the one for machdep.c. The patch for machdep.c is small and
      unimportant and should work anyway.
npx.c:
    o Complete replacement for /sys/i386/isa/npx.c. Since the asm is
      now written correctly, it should work with gcc-2.
test.486.ex16:
    o Output from running "fpetest" on a 486DX using exception 16
      error reporting (after these patches have been applied). Bit
      0x0008 (CR0_TS) in the machine status word may vary.
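Why an emulator has trouble with fwait comes down to how the CPU decides to raise #NM ("device not available", exception 7). The sketch below encodes the EM/MP/TS action table from Intel's 386/486 manuals: ordinary x87 instructions trap when either EM or TS is set, while WAIT/FWAIT ignores EM and traps only when both MP and TS are set. CR0_TS = 0x0008 matches the bit mentioned above; CR0_MP = 0x0002 and CR0_EM = 0x0004 are the architectural values. raises_nm() and enum fpu_insn are hypothetical names, and the model ignores everything except these three bits.

```c
#include <stdbool.h>

/* CR0 bits that govern FPU access (386/486 layout). */
#define CR0_MP 0x0002u  /* monitor coprocessor: lets TS affect WAIT/FWAIT */
#define CR0_EM 0x0004u  /* emulation: x87 (ESC) opcodes raise #NM */
#define CR0_TS 0x0008u  /* task switched */

enum fpu_insn { INSN_X87, INSN_FWAIT };

/* Would executing `insn` with this CR0 value raise #NM (exception 7)?
 * Encodes the EM/MP/TS table from Intel's manuals. */
bool raises_nm(unsigned cr0, enum fpu_insn insn)
{
    if (insn == INSN_FWAIT)
        /* EM is ignored for WAIT/FWAIT; it traps only if MP and TS are both set. */
        return (cr0 & CR0_MP) != 0 && (cr0 & CR0_TS) != 0;
    /* Every other x87 instruction traps if either EM or TS is set. */
    return (cr0 & (CR0_EM | CR0_TS)) != 0;
}
```

So with EM set alone, an fwait executes silently and never reaches an emulator; an emulator that must see fwait's has to rely on TS, assuming MP is also set (the README does not say how MP is configured).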
test.486.irq13:
    o Output from running "fpetest" on a 486DX using IRQ13 error
      reporting (after these patches have been applied and the
      exception 16 initialization in npx.c has been deleted).

------- Changes -------

/sys/i386/conf/Makefile.i386:
    o It now depends on machine/specialreg.h, and it always depended
      on $S/net/netisr.h. The patch is not included here. (mkdep needs
      to be fixed to handle asm files and to handle the -p option
      properly for genassym.c.)
/sys/i386/i386/locore.s:
    o Avoid any bogus IRQ13 from fnsave.
    o Update FP flags in pcb to reflect the fact that fnsave clobbers
      the state. I don't know how 0.1 worked without this. In 0.0,
      context switches sometimes clobbered the state.
    o Clear npxproc when it becomes invalid. This is required at least
      for the new checks in npx.c.
    o Fully support the FPU exception (#16). Have to handle it like
      IRQ13 except for IRQ stuff.
    o Finish traps and syscalls with doreti() instead of spl0() to
      handle AST's. This is required to handle asynchronous signals
      ASAP when they occur in kernel mode. Even (bogus) IRQ13's can
      occur while in kernel mode. npxdna() allows them because it is
      too much trouble to stop them, and they can be nested in the
      trace trap handler. spl0() cannot do enough because the stack
      frame is inconvenient.
/usr/src/sys/i386/i386/machdep.c:
    o CR0_TS is now used for emulation, not CR0_EM. Actually, the
      changed line should be deleted. If we have NPX, then npxinit()
      will do the work. We may as well have NPX if we have math
      emulation, since the h/w support is small compared with the
      emulator.
/sys/i386/i386/vm_machdep.c:
    o _Completely_ free the coprocessor when we are done with it.
      Without the fix, another process may inherit the exiting
      process's FP state, and npxintr may use a NULL pointer
      (npxproc).
/sys/i386/include/npx.h:
    o The "standard" npx control words are all braindamaged.
/sys/i386/include/specialreg.h:
    o Fix comments (CR0_EM isn't for npx emulation!).
    o Add some defines for 486 (npx uses only CR0_NE).
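The vm_machdep.c item is easier to see in a toy model of lazy FPU switching, where the kernel only records which process owns the FPU registers (npxproc) and reloads them when that owner changes. This is a simulation, not 386BSD's code: npxproc and npxdna() are names from the README, but here the FPU is a plain struct, "trapping" is a function call, and model_switch(), model_exit(), fp_read(), fp_write() and proc_init() are hypothetical helpers. The key case is a new process that reuses the exiting process's struct proc: if npxproc still points there, npxdna() thinks the registers are already loaded and the new process inherits the old FP state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of lazy FPU context switching (hypothetical, for illustration). */

struct fpstate { double regs[8]; };        /* stand-in for an fnsave area */

struct proc {
    int pid;
    struct fpstate pcb_save;               /* FP context saved in the PCB */
};

static struct fpstate fpu;                 /* the "hardware" FPU registers */
static struct proc *npxproc;               /* process whose state is loaded */
static bool cr0_ts;                        /* next FP instruction traps */

/* New process: clean FP context, as if after fninit. */
void proc_init(struct proc *p, int pid)
{
    memset(p, 0, sizeof *p);
    p->pid = pid;
}

/* Context switch: leave the FPU alone, just arm the trap (set CR0_TS). */
void model_switch(void) { cr0_ts = true; }

/* Device-not-available handler: hand the FPU to curproc. */
void npxdna(struct proc *curproc)
{
    cr0_ts = false;
    if (npxproc == curproc)
        return;                            /* registers already belong to it */
    if (npxproc != NULL)
        npxproc->pcb_save = fpu;           /* save previous owner's state */
    fpu = curproc->pcb_save;               /* load curproc's state */
    npxproc = curproc;
}

/* Any FP instruction: trap first if CR0_TS is set, as the CPU would. */
double fp_read(struct proc *p, int reg)
{
    if (cr0_ts)
        npxdna(p);
    return fpu.regs[reg];
}

void fp_write(struct proc *p, int reg, double v)
{
    if (cr0_ts)
        npxdna(p);
    fpu.regs[reg] = v;
}

/* Exit-time fix in the spirit of the vm_machdep.c change: if the dying
 * process owns the FPU, forget it, so a later process that reuses the same
 * struct proc cannot match npxproc and inherit the old registers. */
void model_exit(struct proc *p)
{
    if (npxproc == p)
        npxproc = NULL;
}
```

Two processes that alternate keep separate FP state; a process that exits through model_exit() leaves nothing behind; and one that "exits" without it lets the next user of the same struct proc read its leftover registers.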
/sys/i386/isa/icu.s:
    o Support aston() by checking astpending in doreti(). The changes
      to locore.s cause doreti() to be called early enough for signals
      to be delivered ASAP. The change to icu.s alone is sufficient to
      fix itimers (they used to have about 10 Hz precision instead of
      100 Hz).
/sys/i386/isa/isa.c:
    o Utility routine for probing ISA interrupts.
/sys/i386/isa/npx.c:
    o Cleaned up inline asm.
    o Probe for exception 16 working and IRQ13 not working. Use
      exception 16 if possible. It should always work on 486DX's. I
      doubt it will work on 386's (ISA probably requires it to be
      broken). I don't know what happens on 486SX's.
    o Set CR0_EM and toggle CR0_TS for emulation. CR0_EM is no good for
      emulating an x87 because it doesn't trap fwait's. The emulator
      needs to be fixed to handle fwait's. Now it botches even the
      decoding of them.
    o Fixed order of initialization.
    o Fixed npxintr() and npxdna() to handle nested interrupts.

--- Etc ---

The library has a lot more floating point bugs. I have fixed the
following. The fixes are not included here.

/usr/src/include/math.h:
    o Stop gcc from crashing when it tries to compile HUGE_VAL. The
      crash is due to bugs in the library atof and in the kernel's
      floating point error handling.
/usr/src/lib/libc/i386/gen/fixdfsi.s:
    o Fix to round towards 0 as specified by ANSI.
/usr/src/lib/libm/common_source/pow.c:
    o Avoid overflow bug (the patch was botched for 0.1).
    o Use a volatile variable to stop gcc from optimizing away
      calculations that are being made for their side effects on the
      FPU exception flags. (This stuff is broken in other ways but...)
/sys/i386/include/float.h:
    o DBL_MAX was too large and might overflow (actually it doesn't).

I have not fixed the following library bugs.

/usr/src/lib/libc/i386/gen/fixunsdfsi.s:
    o Same bug as for fixdfsi. Someone posted fixed versions of both.
/usr/src/lib/libc/i386/stdlib/atof.c:
    o atof() is inaccurate and allows overflow exceptions.
/usr/src/lib/libc/stdio/vfprintf.c:
    o Inaccurate.
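ANSI C requires a double-to-int conversion to discard the fraction, so 2.7 must become 2 and -2.7 must become -2; round-to-nearest (the x87 default rounding mode) would give 3 and -3 instead. The sketch below computes the required result from the IEEE-754 bit pattern using only integer operations. It is not a translation of the i386 fixdfsi.s routine, and fix_dfsi() is a hypothetical name. ANSI leaves out-of-range and NaN inputs undefined; this sketch saturates and maps NaN to 0, a policy chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Truncating double -> int32_t conversion done on the IEEE-754 bit pattern. */
int32_t fix_dfsi(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret without aliasing issues */

    int negative = (int)(bits >> 63);
    int biased = (int)((bits >> 52) & 0x7FF);  /* 11-bit exponent field */
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);

    if (biased == 0x7FF) {                     /* infinity or NaN */
        if (frac != 0)
            return 0;                          /* NaN: illustrative choice */
        return negative ? INT32_MIN : INT32_MAX;
    }

    int e = biased - 1023;                     /* unbiased exponent */
    if (e < 0)
        return 0;                              /* |x| < 1, including zeros and subnormals */
    if (e >= 31)                               /* |x| >= 2^31: saturate (-2^31 itself is exact) */
        return negative ? INT32_MIN : INT32_MAX;

    uint64_t sig = frac | (UINT64_C(1) << 52); /* restore the implicit leading 1 */
    uint32_t mag = (uint32_t)(sig >> (52 - e)); /* shifting out fraction bits truncates */
    return negative ? -(int32_t)mag : (int32_t)mag;
}
```

Because the magnitude is computed first and the sign applied afterwards, both signs truncate toward zero automatically; for in-range inputs the result matches what a conforming compiler produces for a plain (int32_t)x cast.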
/usr/src/lib/libc/stdio/vfscanf.c:
    o scanf uses atof() so it's broken too.
/usr/src/lib/libm/*:
    o STDC functions aren't STDC conformant (they allow exceptions, at
      least with the current FP control word, and don't set errno).
/usr/libexec/cc1:
    o gcc uses atof() so it's broken too.
binaries:
    o Damaged FP constants may have been compiled into a lot of
      programs.
---
--
Bruce Evans  bde@runx.oz.au