Path: sserve!euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!simtel!zombie.ncsc.mil!news.mathworks.com!uunet!in2.uu.net!news1.digital.com!pa.dec.com!nntpd.lkg.dec.com!usenet
From: Jon Jenkins <jenkinsj@ozy.dec.com>
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Intel Premiere EIDE bug and FreeBSD
Date: 9 Aug 1995 09:03:22 GMT
Organization: Digital Equipment Corporation
Lines: 258
Message-ID: <409tkq$89t@nntpd.lkg.dec.com>
NNTP-Posting-Host: ozyd13-p3.ozy.dec.com
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="-------------------------------12807794882140328675151403282"
X-Mailer: Mozilla 1.1N (X11; I; BSD/386 uname failed)
X-URL: file:/root/pci1.txt

This is a multi-part message in MIME format.

---------------------------------12807794882140328675151403282
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii

Below I include a post from the comp.sys.intel group about problems with Pentium motherboards with the RZ-1000 PCI controller chip.

Does FreeBSD use this prefetch method for fast EIDE transfers?

--
----------------------------------------------------------------------
Name: Dr Jon Jenkins             Location: Digital Equipment Corporation NaC
Voice/Fax: 61-7-55-75-0151/100             Burnett Place, Research Park,
Inet: jenkinsj@ozy.dec.com                 Bond University, Gold Coast
Close Proximity: "HEY YOU !!!"             QLD, AUSTRALIA 4229

"Daddy, what's outside the Universe?" (My 5 year old.....)
-----------------------------------------------------------------------

---------------------------------12807794882140328675151403282
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain

From: "Kevin T. Van Maren" <vanmaren@fast.cs.utah.edu>
Newsgroups: comp.os.os2.bugs,alt.sys.pc-clone.dell,alt.sys.pc-clone.gateway2000,comp.sys.intel,comp.os.linux.hardware,comp.sys.ibm.pc.hardware.storage,comp.sys.ibm.pc.hardware.chips,comp.sys.ibm.pc.hardware.systems,comp.sys.ibm.pc.hardware.misc
Subject: Re: Latest scoop on the [Intel Premiere] "DMA" bug. It is NOT DMA!
Date: 8 Aug 1995 17:57:50 GMT
Organization: University of Utah
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Message-ID: <4088iu$k9n@magus.cs.utah.edu>
References: <3vms83$bdm@news2.delphi.com> <3vrd19$lgj@apollo0.Stanford.EDU> <3vtnqj$c26@clarknet.clark.net> <sharma.16.762.3022A165@postbox.acs.ohio-state.edu>

Have you called Intel at 1-800-628-8686? Intel tech support stonewalled me, denying there was a problem at all--1000's of people HAVEN'T called. Let's get people calling Intel complaining so that they have to *admit* there is a problem, and not just tell you to upgrade your BIOS to ver 13.

Kevin T. Van Maren
vanmaren@fast.cs.utah.edu

Here is Roedy's summary:

This 4 page essay tells everything I have been able to find out about the misnamed "DMA bug": the corruption problems in EIDE disk I/O on PCI motherboards.

SERIOUS PCI MOTHERBOARD FLAW
There is an extremely serious flaw affecting about 1/3 of all PCI motherboards. Any motherboard containing the PC-Tech RZ-1000 PCI EIDE controller chip is affected. This includes motherboards from AT&T, Dell, Gateway, IBM and Intel. Since Intel makes so many of the motherboards sold under other brand names, many machines are affected, both 486 and Pentium PCI.

This flaw only shows up when you run a multitasking operating system like OS/2, Linux or another flavour of UNIX. It does not show up with operating systems that do only one I/O at a time, such as DOS or Windows. NT is a multitasking operating system, but it uses the EIDE controller in a slow mode that bypasses the flaw.

WHAT ARE THE SYMPTOMS?
When you are using an EIDE hard disk attached to the EIDE motherboard port, it subtly corrupts your files by changing or shifting bytes every once in a while. This will introduce bugs into EXE files, subtle errors into your spreadsheets, and stray characters into your word processing documents. This mainly happens when you are simultaneously using your floppy drive or modem port. The same sorts of problem may occur when reading a CD-ROM drive attached to the motherboard EIDE port.

I repeat, the problem only shows up on PCI systems under Linux, OS/2 and other true multitasking operating systems. Because DOS, Windows and NT use the chip in its slow mode, they don't manifest the problem.

IS IT SERIOUS?
This flaw is extremely nasty. It is causing hundreds of times more havoc than the infamous Pentium divide flaw ever did. Not only does this corruption occur, but it occurs quietly, often going unnoticed. When the system crashes, you usually put the blame on the operating system or the application. It might actually be this faulty controller chip nailing you. When a directory becomes corrupted, you may not notice it until you have irreparably corrupted the directory. If a spreadsheet application reads a comma-delimited ASCII file, it may simply "miss" a few bytes in a number, an error that may go unnoticed, and that error could cascade through the rest of the spreadsheet.

If you have had unexplained crashes in OS/2, you have probably experienced the problem, and should make a thorough check of your data to make sure you don't have hidden corruption. Remember that the bug may only slightly alter your data, and the corruption may not be obvious.

HOW DO YOU TELL IF YOU HAVE THE FLAW?
Scott Llewelyn, the author of PowerQuest's PartitionMagic, discovered the flaw and has done most of the work documenting it.
He wrote a program called IOtest that can detect the flaw if:

1) You are using OS/2.
2) You are willing to go through the hassle of creating a separate small partition to run the test in. His program PartitionMagic can be used to make room to create one.
3) You have an EIDE hard disk attached to your EIDE port. It cannot detect the problem if you only have an EIDE CD-ROM, or if the EIDE port is currently unused.

You can find the test program on the Internet Web at:

http://www.powerquest.com/

If you don't have Internet access, I can mail you a copy for $5 to cover duplication, shipping and handling. See my address at the end of the article.

You can also have a look at your motherboard. Between the PCI slots, at the edge of the motherboard, look for a rectangular chip about 1 by 2 cm (.5" x .75") that says RZ-1000 on it. The text appears near the top of the chip.

WHAT CAN YOU DO IF YOU HAVE THE FLAW?
Unfortunately, the chip is soldered in. The only way to repair the flaw is to replace the whole motherboard, keeping the socketed chips such as the CPU, DRAM and SRAM cache. It will be very expensive for computer and motherboard manufacturers to fix the flaw. Dell and Intel have so far refused to replace the defective motherboards, even though Dell guarantees their machines to be OS/2 compatible.

Some BIOSes have a feature to turn off the "prefetch" buffer on the EIDE controller. This makes it run more slowly, but bypasses the flaw.

You could buy a PCI EIDE paddleboard controller to replace the one on the motherboard. You must disable the one on the motherboard. This would waste one of your precious slots, however.

You can buy a SCSI hard disk and CD-ROM, and avoid using the EIDE ports. Under OS/2 and Linux, SCSI gives better performance, but costs more.

According to Scott Llewelyn of PowerQuest, IBM is working on the problem and will have a fix out for Warp sometime by September 1995. This does not help users of Linux or other operating systems.
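
For readers on systems that IOtest does not cover, the general idea of a write-then-read-back check can be sketched roughly as follows. This is not IOtest and makes no claim about how IOtest works; the function names (pattern_block, write_pattern, find_bad_blocks) and the block size are hypothetical choices for illustration. It also assumes the reads really reach the disk: an operating system may satisfy them from its buffer cache, in which case the controller is never exercised and a clean result proves nothing.

```python
import hashlib
import os

BLOCK = 64 * 1024  # bytes per block; an arbitrary size chosen for this sketch


def pattern_block(index):
    """Return a deterministic block whose content depends on its position,
    so that changed or shifted bytes no longer match on read-back."""
    seed = index.to_bytes(8, "little")
    out = bytearray()
    counter = 0
    while len(out) < BLOCK:
        out += hashlib.sha256(seed + counter.to_bytes(4, "little")).digest()
        counter += 1
    return bytes(out[:BLOCK])


def write_pattern(path, blocks):
    """Write `blocks` pattern blocks to `path` and ask the OS to flush them."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(i))
        f.flush()
        os.fsync(f.fileno())


def find_bad_blocks(path, blocks):
    """Read the file back and return the indices of blocks that differ
    from the expected pattern (an empty list means no mismatch was seen)."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK) != pattern_block(i):
                bad.append(i)
    return bad
```

To have any chance of provoking the problem described in this essay, such a check would presumably need a test file larger than the machine's RAM, on the EIDE disk in question, read back while the floppy drive or a COM port is busy generating interrupts. A run that finds no bad blocks does not show the board is free of the flaw.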

According to Patrick Duffy's "PCI Motherboards for OS/2" list, Fixpack 5 bypasses a flaw in the PC-Tech RZ-1000 EIDE controller. However, the ill-fated chip appears to have two flaws.

I DON'T USE WARP. WHY SHOULD I CARE?
The EIDE controller chip is supposed to work in a speedy prefetch mode. Some day operating systems like Windows and NT may evolve to exploit this high speed mode. If your motherboard has the flaw, it will not run at the speed you paid for. If ever you sell your machine, it will be worth less than a fully functioning one. Who knows, someday your machine may be running OS/2, Linux or Unix.

WHERE HAS THE FLAW BEEN FOUND?
On the Internet, in comp.os.os2.bugs, people have reported finding this flaw in the following specific motherboards:

Dell Dimension XPS P90
Intel Premiere
Intel Plato 90
Midwest Micro P90

TECHNICALLY WHAT IS THE FLAW?
In order for the bug to appear, an interrupt must occur at the exact time that the processor is reading the data from the controller. The interrupt can be anything: mouse, timer, floppy, COM port, etc. So during normal use, the bug might only cause a handful of reads to be corrupted, and may go unnoticed. But it is still a potential disaster.

When the driver reads the RZ-1000 EIDE I/O port 1x7 during interrupt processing to clear the interrupt, the chipset improperly puts the status in its read-ahead FIFO, which should contain only data, resulting in data corruption. Other reports claim that whatever data was in the registers at the time of the interrupt is discarded and replaced with the next 2 or 4 bytes, resulting in a shift-type corruption. In either case, if some other active but unrelated device (e.g. sound card, COM port or floppy drive) is generating interrupts at just the wrong time during EIDE reads or writes, the EIDE transfer will be corrupted.

There are two other flaws that show similar symptoms:

Older non-PCI AT machines often cannot handle more than one DMA transfer at a time.
Gazelle's freeware DMATest.Exe can detect this flaw. I will include DMATest.Exe with IOtest if you write me. Since the RZ-1000 flaw mimics that old problem, it is often erroneously referred to as the "DMA bug". The RZ-1000 bug has nothing to do with DMA. EIDE devices run in polled PIO mode. DOS and Windows tolerate a faulty DMA controller since they do not do more than one I/O at a time. However, OS/2 and Linux will not work with a faulty DMA controller.

Intel Premiere motherboards have a couple of known bugs. One of these was due to a bug in the early revision of Intel's Neptune PCI chipset, so it only affected early-revision boards with 90/100 MHz Pentiums. In contrast, the RZ-1000 flaw affects PCI motherboards at any speed.

SPECULATION
Because setting the flaw right would be so expensive, I suspect that motherboard manufacturers will continue to stonewall. Once the OS/2 patch is out, the pressure to set things right will dwindle. If the motherboard manufacturer refuses to replace the defective motherboard, they should at least give you a paddleboard PCI EIDE controller at least as fast as the defective one on the motherboard.

Intel has already set the precedent by offering to replace defective Pentiums, even though the divide flaw could be bypassed with software. The RZ-1000 flaw is far more serious. Since Windows users are not complaining, I expect that Intel will get away with stonewalling this time.

CONTACTING THE AUTHOR
The author, Roedy Green, is a computer consultant who prefers to work on Forth, C++, Delphi, DOS, OS/2 and Internet Web projects. Please report which machines you find the flaw in, and which software and fixpacks you were using at the time. Send email via: Roedy@bix.com or discuss this problem on the Internet newsgroup in: comp.os.os2.bugs.
You can also write via snail mail:

Roedy Green
Canadian Mind Products
#601 - 1330 Burrard Street
Vancouver, BC CANADA V6Z 2B8
(604) 685-8412

The home page for PowerQuest (who discovered the bug) is http://www.powerquest.com/
At the bottom there is a link to the PCI Hardware Defect.

Kevin T. Van Maren
vanmaren@fast.cs.utah.edu

---------------------------------12807794882140328675151403282--