*BSD News Article 25077

Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!howland.reston.ans.net!europa.eng.gtefsd.com!avdms8.msfc.nasa.gov!sol.ctr.columbia.edu!hamblin.math.byu.edu!news.byu.edu!cwis.isu.edu!u.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Newsgroups: comp.os.386bsd.development
Subject: Re: [FreeBSD 1.0R] DMA Problems?
Date: 16 Dec 1993 21:28:07 GMT
Organization: Weber State University, Ogden, UT
Lines: 76
Message-ID: <2eqjt7$dqm@u.cc.utah.edu>
References: <CHCErs.G5w@genesis.nred.ma.us> <2dj25i$1ga@u.cc.utah.edu> <2encotINN3sq@bonnie.sax.de>
NNTP-Posting-Host: cs.weber.edu

In article <2encotINN3sq@bonnie.sax.de> j@uriah.sax.de (J Wunsch) writes:
>In <2dj25i$1ga@u.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
>
>>[ ... DMA problems with FDC ... ]
>
>>It is probably your cache. ...
>>... device initiated DMA has the potential
>>to update memory without updating cache (on reads) or write data that is
>>valid in cache but not in memory (unless your cache is write through) on
>>writes.
>
>Umm, Terry, i think you failed here.
>The FDC does not do any ``device initiated DMA'' in this sense. It's simply
>using the DMA chipset feature. This *should* not conflict with cache usage -
>or the chipset design is broken. [We do not speak about SCSI host adaptors
>that do bus-master DMA.]

Geeze... this is a real old discussion; let's see if I can remember...

He was talking about DMA problems trashing his floppy I/O.  Without a more
complete description of exactly what was happening (and if he could have
given one, he wouldn't have needed to ask the question 8-)), the only
thing I could see was the data being read off the disk using DMA and then
written to the floppy not using DMA.  BECAUSE the floppy does *not* _ALSO_
use DMA, there could be a cache-related problem.

This could probably be ignored if it were program-generated data going
to the floppy... but if it were, for instance, a tar of a directory or a
cp of a file, then the DMA into the machine from the disk could be what
is screwing up, and that would be because of cache problems.
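To make the cache argument concrete, here is a toy model in Python, not real driver code.  It assumes a single cached copy of a buffer and a DMA engine that writes main memory directly without snooping or invalidating the cache; the class and function names are hypothetical.  It only shows the inbound (disk-to-memory) case, where the CPU keeps reading stale cached data until the line is invalidated.

```python
# Toy model of a non-snooping cache sitting in front of main memory.
# Assumption: one cache line, and a DMA engine that writes memory directly
# without invalidating or updating the cached copy.

class ToyCache:
    def __init__(self, memory):
        self.memory = memory      # shared "main memory" (a bytearray)
        self.line = None          # cached copy, or None if invalid

    def cpu_read(self, i):
        # CPU reads are served from the cache once the line is loaded.
        if self.line is None:
            self.line = bytes(self.memory)
        return self.line[i]

    def invalidate(self):
        # What a driver must do on a non-coherent system after inbound DMA.
        self.line = None


def dma_to_memory(memory, data):
    # Device-to-memory DMA (e.g. a disk read): memory changes, cache does not.
    memory[:] = data


memory = bytearray(b"\xaa" * 16)      # old buffer contents
cache = ToyCache(memory)
cache.cpu_read(0)                     # CPU touches the buffer; line gets cached
dma_to_memory(memory, b"\x55" * 16)   # disk DMA overwrites memory behind the cache

stale = cache.cpu_read(0)             # CPU still sees the old data
cache.invalidate()
fresh = cache.cpu_read(0)             # CPU now sees the data that arrived
print(hex(stale), hex(fresh))
```

This prints `0xaa 0x55`: whatever the CPU then copies out to the floppy is the stale value unless something invalidates the line first.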

>In fact, i'm also experiencing lotta DMA overruns when attempting to do
>some floppy IO while some heavy compile job is running. This is in a box
>with an Adaptec SCSI, and i didn't track it down whether it's the CPU
>load that causes the trouble (would really look strange to me), or if
>it's the heavy disk IO that happens while compiling.

Were the original errors DMA overruns?  If so, it's a retry condition,
although I can't see how this would be possible, given that most SCSI
controllers don't allow more outstanding requests than they have buffer
descriptors, even if you are doing scatter/gather.
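The invariant being relied on here can be sketched as a descriptor pool.  This is a generic Python model under stated assumptions, not AHA firmware or any real driver's API: the adapter owns a fixed number of descriptors, and a request that finds none free is held in the driver's own queue instead of being handed to the adapter.  The class and method names are hypothetical.

```python
# Sketch of a host adapter that bounds outstanding requests by the number
# of buffer descriptors it owns. Assumption: a fixed pool of descriptors;
# a request that finds the pool empty is queued by the driver, not sent.

from collections import deque


class DescriptorPool:
    def __init__(self, n_descriptors):
        self.free = n_descriptors
        self.waiting = deque()    # requests held back by the driver
        self.active = set()       # requests the adapter is working on

    def submit(self, req):
        # Only hand a request to the adapter if a descriptor is free.
        if self.free > 0:
            self.free -= 1
            self.active.add(req)
            return True
        self.waiting.append(req)
        return False

    def complete(self, req):
        # Completion frees a descriptor; start the next queued request.
        self.active.remove(req)
        self.free += 1
        if self.waiting:
            self.submit(self.waiting.popleft())


pool = DescriptorPool(2)
results = [pool.submit(r) for r in ("a", "b", "c")]   # third request must wait
pool.complete("a")                                     # "c" now gets a descriptor
```

With two descriptors the third submit returns `False` and only starts once "a" completes, so the adapter never has more than two requests in flight.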

>Assuming it's the latter, so it could be some problem with the host adaptor
>DMA cycles - does the adaptec really care for other DMA requests floating
>around on the bus while it is considering to take over the bus?

I'm pretty sure that multiple AHA cards in the same machine work; I
haven't seen anything to the contrary, but then again "absence of
evidence is not evidence of absence"... not that this is a justification
for believing that was happening, only for not ruling it out.

In any case, there is a bus arbitration cycle that it would need to go
through.

You *can* twiddle the bus-on/bus-off time on AHA controllers with their
config disks, but the only thing I have seen blow up is the standard
VESA problem, and then only on slow machines (unless it's a VESA board,
in which case it will probably happen all the time): DMA steals the
refresh cycles (VESA boards are more likely to steal too many), and
then memory over a certain amount will start getting parity errors,
or, if you don't have parity, your machine will just seem flaky and
die occasionally.
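The refresh-stealing point can be put in rough numbers.  The Python below is a back-of-the-envelope model, not anything taken from Adaptec's documentation: it assumes the adapter holds the bus for a fixed bus-on time, releases it for a fixed bus-off time, and that DRAM refresh needs the bus roughly every 15 microseconds (an assumed PC/AT-class figure).  The bus-on/bus-off pairs in the loop are made up for illustration.

```python
import math

# Back-of-the-envelope model of a bus master starving DRAM refresh.
# Assumptions (illustrative only, not Adaptec specifications): the master
# holds the bus for bus_on_us, releases it for bus_off_us, and a refresh
# cycle needs the bus every refresh_us microseconds.


def bus_share(bus_on_us, bus_off_us):
    """Fraction of bus time the master owns in steady state."""
    return bus_on_us / (bus_on_us + bus_off_us)


def refreshes_blocked_per_burst(bus_on_us, refresh_us):
    """Worst-case refresh requests that arrive while one burst holds the bus."""
    return math.ceil(bus_on_us / refresh_us)


REFRESH_US = 15.0   # assumed PC/AT-class refresh interval

for on, off in [(8, 4), (40, 4)]:
    share = bus_share(on, off)
    blocked = refreshes_blocked_per_burst(on, REFRESH_US)
    print(f"on={on}us off={off}us share={share:.0%} blocked refreshes={blocked}")
```

The first row comes out at a 67% bus share with 1 blocked refresh, the second at 91% with 3.  In this model a count of 1 just means a refresh is delayed; anything higher means whole refresh slots go by while the adapter owns the bus, which is the starvation case described above.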

>As a workaround, Serge Vakulenko proposed to simply ignore the DMA overrun
>error and retry the transfer until it succeeds (or some other error occurs).

Yup.  The DMA timeout falls into the soft error classification; standard
procedure is to retry until the *contiguous* (consecutive) failure count
hits some maximum, after which it is treated as a hard error.
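A minimal sketch of that retry policy in Python, assuming a multi-block transfer where each successful block resets the failure counter; `SoftError`, `HardError`, `read_blocks`, and `flaky_read` are hypothetical names, not the FreeBSD floppy driver's actual code.  The point of counting *contiguous* failures is that a drive which keeps making progress is never escalated, no matter how many soft errors it racks up in total.

```python
# Sketch of soft-error handling: retry while a block fails with a soft
# error, and escalate to a hard error once the number of *consecutive*
# failures reaches a limit. All names here are hypothetical.


class SoftError(Exception):
    """Transient failure, e.g. a DMA overrun or timeout."""


class HardError(Exception):
    """Failure reported upward once retries are exhausted."""


def read_blocks(blocks, read_one, max_consecutive=5):
    """Read every block; reset the failure count whenever one succeeds."""
    data = []
    failures = 0
    i = 0
    while i < len(blocks):
        try:
            data.append(read_one(blocks[i]))
            failures = 0              # progress made: start counting afresh
            i += 1
        except SoftError:
            failures += 1
            if failures >= max_consecutive:
                raise HardError(f"block {blocks[i]}: "
                                f"{failures} consecutive soft errors")
    return data


# Simulated device: every block fails twice before it reads cleanly.
attempts = {}


def flaky_read(block):
    attempts[block] = attempts.get(block, 0) + 1
    if attempts[block] <= 2:
        raise SoftError("DMA overrun")
    return f"data{block}"


print(read_blocks([0, 1, 2], flaky_read, max_consecutive=3))
```

With a limit of 3 this reads all three blocks despite six soft errors in total, because no block ever fails three times in a row; a limit on the *total* count of 3 would have given up partway through block 1.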


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.