*BSD News Article 48262


Path: sserve!euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!simtel!zombie.ncsc.mil!news.mathworks.com!uunet!in2.uu.net!news1.digital.com!pa.dec.com!nntpd.lkg.dec.com!usenet
From: Jon Jenkins <jenkinsj@ozy.dec.com>
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Intel Premiere EIDE bug and FreeBSD
Date: 9 Aug 1995 09:03:22 GMT
Organization: Digital Equipment Corporation
Lines: 258
Message-ID: <409tkq$89t@nntpd.lkg.dec.com>
NNTP-Posting-Host: ozyd13-p3.ozy.dec.com
Mime-Version: 1.0
Content-Type: multipart/mixed;
	boundary="-------------------------------12807794882140328675151403282"
X-Mailer: Mozilla 1.1N (X11; I; BSD/386 uname failed)
X-URL: file:/root/pci1.txt

This is a multi-part message in MIME format.

---------------------------------12807794882140328675151403282
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii

Below I include a post from the comp.sys.intel group
about problems with Pentium motherboards with the
RZ-1000 PCI controller chip. Does FreeBSD use
this prefetch method for fast EIDE transfers?



-- 
----------------------------------------------------------------------
Name:      Dr Jon Jenkins    Location: Digital Equipment Corporation NaC
Voice/Fax: 61-7-55-75-0151/100         Burnett Place, Research Park,  
Inet:      jenkinsj@ozy.dec.com        Bond University, Gold Coast
Close Proximity: "HEY YOU !!!"         QLD, AUSTRALIA 4229
"Daddy, what's outside the Universe?" (My 5 year old.....)
-----------------------------------------------------------------------

---------------------------------12807794882140328675151403282
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain

From: "Kevin T. Van Maren" <vanmaren@fast.cs.utah.edu>
Newsgroups: comp.os.os2.bugs,alt.sys.pc-clone.dell,alt.sys.pc-clone.gateway2000,comp.sys.intel,comp.os.linux.hardware,comp.sys.ibm.pc.hardware.storage,comp.sys.ibm.pc.hardware.chips,comp.sys.ibm.pc.hardware.systems,comp.sys.ibm.pc.hardware.misc
Subject: Re: Latest scoop on the [Intel Premiere] "DMA" bug. It is NOT DMA!
Date: 8 Aug 1995 17:57:50 GMT
Organization: University of Utah
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Message-ID: <4088iu$k9n@magus.cs.utah.edu>
References: <3vms83$bdm@news2.delphi.com> <3vrd19$lgj@apollo0.Stanford.EDU> <3vtnqj$c26@clarknet.clark.net> <sharma.16.762.3022A165@postbox.acs.ohio-state.edu>

Have you called Intel at 1-800-628-8686?
Intel tech support stonewalled me, denying there was
a problem at all--1000's of people HAVEN'T called.

Let's get people calling Intel complaining so that
they have to *admit* there is a problem, and not
just tell you to upgrade your BIOS to ver 13.

Kevin T. Van Maren
vanmaren@fast.cs.utah.edu


Here is Roedy's summary:

This 4 page essay tells everything I have been able to find out about the
misnamed "DMA bug", the corruption problems in EIDE disk I/O on PCI
motherboards:

SERIOUS PCI MOTHERBOARD FLAW

There is an extremely serious flaw affecting about 1/3 of all PCI
motherboards. Any motherboard containing the PC-Tech RZ-1000 PCI EIDE
controller chip is affected. This includes motherboards from AT&T, Dell,
Gateway, IBM and Intel. Since Intel makes so many of the motherboards sold
under other brand names, many machines are affected, both 486 and Pentium
PCI.

This flaw only shows up when you run a multitasking operating system like
OS/2, Linux or another flavour of UNIX. It does not show up with operating
systems that do only one I/O at a time such as DOS or Windows. NT is a
multi-tasking operating system, but it uses the EIDE controller in a slow
mode that bypasses the flaw.

WHAT ARE THE SYMPTOMS?

When you are using an EIDE hard disk attached to the EIDE motherboard port,
it subtly corrupts your files by changing or shifting bytes every once in a
while. This will introduce bugs into EXE files, and subtle errors in your
spreadsheets, and stray characters into your word processing documents.
This mainly happens when you are simultaneously using your floppy drive or
modem port.

The same sorts of problem may occur on reading a CD-ROM drive attached to
the motherboard EIDE port.

I repeat, the problem only shows up on PCI systems under Linux and OS/2 and
other true multitasking operating systems. Because DOS, Windows and NT use
the chip in its slow mode, they don't manifest the problem.

IS IT SERIOUS?

This flaw is extremely nasty. It is causing hundreds of times more havoc
than the infamous Pentium divide flaw ever did.

Not only does this corruption occur, but it occurs quietly, often going
unnoticed. When the system crashes, you usually put the blame on the
operating system or the application. It might actually be this faulty
controller chip nailing you.
When a directory becomes corrupted, you may not notice until the damage
is irreparable. If a spreadsheet application reads a
comma-delimited ASCII file, it may simply "miss" a few bytes in a number,
an error that may go unnoticed, and that error could cascade through the
rest of the spreadsheet.

If you have had unexplained crashes in OS/2, you have probably experienced
the problem, and should make a thorough check of your data to make sure you
don't have hidden corruption. Remember that the bug may only slightly alter
your data, and the corruption may not be obvious.
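
One way to make such a check, sketched here under assumptions that are not
part of the original essay: if you have a backup taken on a machine or
controller you trust, compare checksums of the live files against it. The
function names and the directory layout are hypothetical. Files you have
legitimately edited since the backup will also be reported, so treat the
result as a list of candidates to inspect, not as proof of corruption.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(live_dir, backup_dir):
    """List files (relative to backup_dir) whose live copy differs.

    Hypothetical helper: walks the backup tree and compares each file
    with the file of the same relative path under live_dir. Files that
    are missing from live_dir are skipped.
    """
    live_dir, backup_dir = Path(live_dir), Path(backup_dir)
    mismatches = []
    for backup_file in sorted(backup_dir.rglob("*")):
        if not backup_file.is_file():
            continue
        rel = backup_file.relative_to(backup_dir)
        live_file = live_dir / rel
        if live_file.is_file() and file_digest(live_file) != file_digest(backup_file):
            mismatches.append(str(rel))
    return mismatches
```

Calling find_mismatches("/home", "/mnt/backup/home") (both paths are
placeholders) returns the relative names of files worth a closer look.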

HOW DO YOU TELL IF YOU HAVE THE FLAW?

Scott Llewelyn, the author of PowerQuest's PartitionMagic, discovered the
flaw and has done most of the work documenting it. He wrote a program
called IOtest that can detect the flaw if:

1) You are using OS/2

2) You are willing to go through the hassle of creating a separate small
partition to run the test in. His program PartitionMagic can be used to
make room to create one.

3) You have an EIDE hard disk attached to your EIDE port. It cannot detect
the problem if you only have an EIDE CD-ROM, or if the EIDE port is
currently unused.

You can find the test program on the Internet Web at:
    http://www.powerquest.com/

If you don't have Internet access, I can mail you a copy for $5 to cover
duplication, shipping and handling. See my address at the end of the
article.

You can also have a look at your motherboard. Between the PCI slots, at the
edge of the motherboard, look for a rectangular chip about 1 by 2 cm (.5" x
.75") that says RZ-1000 on it. The text appears near the top of the chip.

WHAT CAN YOU DO IF YOU HAVE THE FLAW?

Unfortunately, the chip is soldered in. The only way to repair the flaw is
to replace the whole motherboard, keeping the socketed chips such as the
CPU, DRAM and SRAM cache. It will be very expensive for computer and
motherboard manufacturers to fix the flaw. Dell and Intel have so far
refused to replace the defective motherboards even though Dell guarantees
their machines to be OS/2 compatible.

Some BIOSes have a feature to turn off the "prefetch" buffer on the EIDE
controller. This makes it run more slowly, but bypasses the flaw.

You could buy a PCI EIDE paddleboard controller, to replace the one on the
motherboard. You must disable the one on the motherboard. This would waste
one of your precious slots, however.

You can buy a SCSI hard disk and CD-ROM, and avoid using the EIDE ports.
Under OS/2 and Linux, SCSI gives better performance, but costs more.

According to Scott Llewelyn of PowerQuest, IBM is working on the problem
and will have a fix out for Warp sometime by September 1995. This does not
help the Linux or other operating system users.

According to Patrick Duffy's "PCI Motherboards for OS/2" list, Fixpack 5
bypasses a flaw in the PC-Tech RZ-1000 EIDE controller. However, the
ill-fated chip appears to have two flaws.

I DON'T USE WARP.  WHY SHOULD I CARE?

The EIDE controller chip is supposed to work in a speedy prefetch mode.
Some day operating systems like Windows and NT may evolve to exploit this
high speed mode.  If your motherboard has the flaw, it will not run at the
speed you paid for.  If ever you sell your machine, it will be worth less
than a fully functioning one.

Who knows, someday your machine may be running OS/2, Linux or Unix.

WHERE HAS THE FLAW BEEN FOUND?

On the Internet, in comp.os.os2.bugs, people have reported finding this
flaw in the following specific motherboards:
Dell Dimension XPS P90
Intel Premiere
Intel Plato 90
Midwest Micro P90

TECHNICALLY WHAT IS THE FLAW?

In order for the bug to appear, an interrupt must occur at the exact time
that the processor is reading the data from the controller. The interrupt
can be anything: mouse, timer, floppy, COM port, etc. So during normal use,
the bug might only cause a handful of reads to be corrupted, and may go
unnoticed. But it is still a potential disaster.

When the driver reads the RZ-1000 EIDE I/O port 1x7 during interrupt
processing to clear the interrupt, the chipset improperly puts the status
in its read-ahead FIFO, which should contain only data, resulting in data
corruption. Other reports claim that whatever data was in the registers at
the time of the interrupt is discarded and replaced with the next 2 or 4
bytes, resulting in a shift type corruption.

In either case, if some other active but unrelated device (e.g. sound card,
COM port or floppy drive) is generating interrupts at just the wrong time
during EIDE reads or writes, the EIDE transfer will be corrupted.
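
The two reported mechanisms can be illustrated with a toy model. This is
only a sketch of what the driver would see in each case, not a description
of the chip's real internals: the function name, the word-list
representation of the FIFO, and the status value 0x50 are all assumptions
made for the illustration.

```python
def simulate_read(disk_words, count, interrupt_at=None, mode="status",
                  status=0x50):
    """Return the `count` words a driver would read from the data port.

    disk_words   -- the words the drive actually holds, in order
    interrupt_at -- index at which an unrelated interrupt hits, or None
    mode         -- "status": a status value lands in the data FIFO
                    "shift":  the in-flight word is discarded
    status       -- illustrative status value (an assumption)
    """
    stream = list(disk_words)
    if interrupt_at is not None:
        if mode == "status":
            # First report: status is pushed into the read-ahead FIFO,
            # so a foreign value appears and later data arrives late.
            stream.insert(interrupt_at, status)
        elif mode == "shift":
            # Second report: the pending word is lost and the following
            # data moves up to take its place.
            del stream[interrupt_at]
        else:
            raise ValueError("mode must be 'status' or 'shift'")
    return stream[:count]


disk = [0x100, 0x101, 0x102, 0x103, 0x104, 0x105]
clean = simulate_read(disk, 4)
with_status = simulate_read(disk, 4, interrupt_at=2, mode="status")
with_shift = simulate_read(disk, 4, interrupt_at=2, mode="shift")
```

In both corrupted cases the first two words match the clean read and
everything from the interrupt onward is displaced by one word, which is
consistent with the "changed or shifted bytes" symptom described earlier.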

There are two other flaws that show similar symptoms:

Older non-PCI AT machines often cannot handle more than one DMA transfer at
a time. Gazelle's freeware DMATest.Exe can detect this flaw. I will include
DMATest.Exe with IOtest if you write me. Since the RZ-1000 flaw mimics that
old problem, it is often erroneously referred to as the "DMA bug". The
RZ-1000 bug has nothing to do with DMA.  EIDE devices run in polled PIO
mode.  DOS and Windows tolerate a faulty DMA controller since they do not
do more than one I/O at a time. However, OS/2 and Linux will not work with a
faulty DMA controller.

Intel Premiere motherboards have a couple of known bugs. One of these was
due to a bug in the early revision of Intel's Neptune PCI chipset, so it
only affected early-revision boards with 90/100 MHz Pentiums. In contrast,
the RZ-1000 flaw affects PCI motherboards at any speed.

SPECULATION

Because setting the flaw right would be so expensive, I suspect that
motherboard manufacturers will continue to stonewall.  Once the OS/2 patch
is out, the pressure to set things right will dwindle.  If the motherboard
manufacturer refuses to replace the defective motherboard, they should at
least give you a paddleboard PCI EIDE controller at least as fast as the
defective one on the motherboard.  Intel has already set the precedent by
offering to replace defective Pentiums, even though the divide flaw could
be bypassed with software.  The RZ-1000 flaw is far more serious.  Since
Windows users are not complaining, I expect that Intel will get away with
stonewalling this time.

CONTACTING THE AUTHOR

The author, Roedy Green, is a computer consultant who prefers to work on
Forth, C++, Delphi, DOS, OS/2 and Internet Web projects.

Please report which machines you find the flaw in, and which software and
fixpacks you were using at the time. Send email via:
        Roedy@bix.com
or discuss this problem on the Internet newsgroup in:
        comp.os.os2.bugs.
You can also write via snail mail:
Roedy Green
Canadian Mind Products
#601 - 1330 Burrard Street
Vancouver, BC  CANADA
V6Z 2B8
(604) 685-8412


The home page for Powerquest (who discovered the bug) is
http://www.powerquest.com/

At the bottom there is a link to the PCI Hardware Defect.

Kevin T. Van Maren
vanmaren@fast.cs.utah.edu


---------------------------------12807794882140328675151403282--