Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!nntp.coast.net!swidir.switch.ch!swsbe6.switch.ch!news.belnet.be!news.rediris.es!acebo.sdi.uam.es!b12mc6.cnb.uam.es!user
From: jrvalverde@samba.cnb.uam.es (jr)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: Signal 11
Date: Mon, 27 May 1996 19:02:45 +0100
Organization: Centro Nacional de Biotecnologia
Lines: 87
Message-ID: <jrvalverde-2705961902450001@b12mc6.cnb.uam.es>
References: <nD356D43A@longacre.demon.co.uk>
NNTP-Posting-Host: b12mc6.cnb.uam.es
X-Newsreader: Value-Added NewsWatcher 2.0b24.0+

In article <nD356D43A@longacre.demon.co.uk>,
searle@longacre.demon.co.uk (Michael Searle) wrote:

> Do processes exiting on signal 11 always mean bad hardware (probably
> memory or mainboard), or can they be caused by other things (like
> buggy executables)? I have had them occasionally, but mostly on new
> software I hadn't tried before. I have never had gcc failing (and I
> have done several
...

And many people answered... Lemme try too.

I have had the same problem with my brand-new Pentium-133 with 32 MB of
EDO RAM and enough swap space. I monitored memory consumption and
almost never reached swap before the signal 11. I could reproduce the
behavior under Windows 3.1 (with lots of difficulty, but that's a
crappy system), FreeBSD and Linux. I guess the problem with Win 3.1 was
that I could hardly push the hardware as much as under the other OSes,
so it failed less. Also, I hardly use Win 3.1 at all.

The problem could usually be solved by retrying the command, and when
that failed, by cleaning up memory (to stop the ghost(+) of the last
run from being reused), which you can do with a 'dd' copy from the hard
disk to memory or, as I also tried, with an ad-hoc program.

(+) Unix retains the pages of the last programs run in case you run
them again: it already has them in core and doesn't need to re-read
them from disk, thus giving a faster response. That's what I call the
ghost, or old carcass (my term).
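If you want to try the 'dd' trick, here is a minimal sketch; the device
name and sizes are my assumptions for a FreeBSD box of this vintage, so
adjust them to your own disk and RAM:

    # Read more data than the machine has RAM (here 64 MB against my
    # 32 MB), flooding the buffer cache so the ghost pages of earlier
    # runs get evicted. /dev/wd0 is the first IDE disk under FreeBSD;
    # any sufficiently big file works too.
    dd if=/dev/wd0 of=/dev/null bs=64k count=1024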
If in spite of that I kept running the system, sooner or later the
filesystem or the kernel would go berserk. With the FS it would mean
i-nodes that weren't correct in the kernel's eyes. With the kernel it
would mean I couldn't run any more programs, or I got a memory error
with appropriate data about the virtual page that was corrupt.

All this was similar under Linux and FreeBSD, running different
versions of GCC, and also running different programs. Heavy load would
delay the crashes, I assume by flushing the ghosts and making it more
difficult to find already loaded programs. It was more frequent when
compiling the kernel, and while I had the CD-ROM running. It was less
frequent in February (colder weather here) than in April (milder
temperatures).

Suggestions:
- Bad RAM
- Bad HD
- Bad cache
- Bad bus/motherboard
- Bad CPU
- Bad cooler -> overheating
- Interference from other devices
- All or any combination of the above
- Bad swap partition: erroneous transfers disk <-> memory
- Others.

I have taken my machine in for repair (it's still under warranty) but I
seriously doubt they'll find anything wrong, since I'm sure they'll
only test it under DOS/Windows, which doesn't let you push the system
nearly as hard. (Yep! I just called them and they confirmed this: they
haven't found anything with their off-the-shelf test programs for
Windows; they'll try a script I left for UNIX next.)

You should also have a look at this URL. It will tell you a lot (I
discovered it too late): http://www.bitwizard.nl/sig11/

In short, it is most probably a hardware problem, due to current faster
CPUs pushing the limits of borderline hardware. It can be difficult to
detect, but the URL gives some help. The most difficult part will
probably be proving the problem to your provider, since it is hard to
find a Windows program that stresses the hardware as strongly as UNIX
allows GCC to, and they will surely only speak DOS/Windows... It's even
worse here, where they don't even know English, and I suspect they
won't understand the text at the URL (sigh).

What I have given them is a script that does a 'make clean' and
recompiles the kernel repeatedly, comparing the output of one make with
the next to see if there are differences (they can only come from
errors). In my case 10 compiles would give 3-4 failures, but you might
need more.
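In case it helps, a rough sketch of such a script; the kernel build
directory and the pass count are assumptions, so adapt them to your own
source tree:

    #!/bin/sh
    # Rebuild the kernel several times and compare each build log with
    # the previous one. On sound hardware the logs should be identical;
    # gcc dying with signal 11 shows up as a difference.
    cd /usr/src/sys/compile/GENERIC || exit 1
    pass=1
    while [ $pass -le 10 ]; do
        make clean > /dev/null 2>&1
        make > /tmp/make.$pass 2>&1
        if [ $pass -gt 1 ]; then
            if cmp -s /tmp/make.`expr $pass - 1` /tmp/make.$pass; then
                echo "pass $pass: identical"
            else
                echo "pass $pass: DIFFERS -- suspect the hardware"
            fi
        fi
        pass=`expr $pass + 1`
    done

jr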