Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.rmit.EDU.AU!news.unimelb.EDU.AU!munnari.OZ.AU!news.mel.connect.com.au!news.mira.net.au!news.vbc.net!samba.rahul.net!rahul.net!a2i!olivea!charnel.ecst.csuchico.edu!psgrain!usenet.eel.ufl.edu!newsfeed.internetmci.com!in2.uu.net!news.dca.net!dca.net!awhite
From: Andrew White <awhite@dca.net>
Newsgroups: comp.unix.bsd.bsdi.misc
Subject: FIX (?!): BusLogic BT-946C firmware problem
Date: Tue, 11 Jun 1996 04:49:33 -0400
Organization: DCANet - Delaware Common Access Network
Lines: 154
Message-ID: <Pine.BSI.3.91.960611044017.2184A-100000-100000@dca.net>
NNTP-Posting-Host: dca.net
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
To: bsdi-users@bsdi.com

SUMMARY: The BusLogic PCI SCSI controller (BT-946C) has firmware
problems that cause irrecoverable hangs on the SCSI bus under BSD/OS
2.x. This message outlines migrating from the BT-946C to another SCSI
controller without reformatting the disk.

I am sending this message out to the BSDI-users mailing list and to
comp.unix.bsd.bsdi.misc in the hopes that it saves someone a lot of
time and trouble -- I certainly found this to be a vexing problem.

The problem I experienced was that one of our BSD/OS servers, running
BSD/OS 2.01 at the time, would suddenly freeze (up to 5 times per day,
ouch!). Shortly thereafter, I would see kernel messages on the console
which repeated every thirty seconds or so. The messages were of the
form:

    bha0: command timed out for XX seconds

The only way to recover from this problem was a cold reboot.

Donn Seeley of BSDI posted this explanation in January 1994 of what
causes this type of error:

--
From donn@BSDI.COM Thu Jan 20 08:26:18 1994
Date: Thu, 20 Jan 94 13:26:18 -0500
From: donn@BSDI.COM (Donn Seeley)
Message-Id: <9401201826.AA05857@BSDI.COM>
To: gks1!greg%ucdavis.edugks1!greg@ucdavis.edu, bsdi-users@bsdi.com
Subject: Re: aha0: command timed out
Cc: bsdi-users@BSDI.COM
Status: OR

Currently the aha driver performs the following operations if a
command times out:

        detect stale command by checking timestamps
        issue a host adapter abort for the given command
        receive notification of the aborted command
        deliver notification to the machine-independent SCSI disk code

The machine-independent SCSI disk (sd) code does the following:

        observe an error
        call sderror() to print an error message
        use return value from sderror to classify the error
        if it's a retryable error and we haven't exceeded the retry
                count, retry the command
        otherwise return an EIO error

This procedure usually works fine if the problem lies with the SCSI
target rather than the SCSI host adapter. If the host adapter itself
is completely wedged, then we never receive notification of the
aborted command, and we loop waiting for the host adapter to
acknowledge an abort. At this stage, we could try a host adapter
reset if we notice that the host adapter is being uncooperative, but
experience says that this doesn't work -- the only cure is an ISA bus
reset (that is, a reboot). If the host adapter is not a critical
component in your system, I suppose the driver could just mark the
adapter as 'dead' and continue. Unfortunately most people have their
root disk on the host adapter and they really do need to reboot when
the timeout message cycles. It's difficult to come up with a good
strategy for dealing with broken host adapters because the
possibilities for breakage are quite large and the number of examples
is quite small.

Ideally host adapters would never wedge :-),

Donn
--

In this case, the problem was indeed a "wedged" adapter. Calls to
BSDI tech support confirmed that there exists a firmware problem with
the BusLogic (which BusLogic tech support naturally denies).
Apparently, the machine which hosts BSDI's web site, www.bsdi.com,
was switched from a BusLogic PCI adapter to an NCR PCI controller
because of this same problem!

The "fix" is to replace the SCSI adapter. I decided to upgrade to
BSD/OS 2.1 and use an Adaptec AHA-2940 PCI adapter, although I could
just as easily have used one of the many NCR PCI controllers.

The problem with switching adapters for me was that I'd chosen to use
aha boot blocks, which only work with BusLogic and Adaptec 15XX-ISA
controllers. I also did not have an FDISK table on my boot disk, which
is apparently required for the AHA-2940.

I used the following procedure to create an FDISK table and write
BIOS boot blocks to my boot disk (while keeping my existing
filesystems and data), which allowed me to boot from the Adaptec
controller. I compiled this procedure from several conversations with
BSDI tech support and from a post to bsdi-users by Paul Borman of
BSDI. Please note that while this procedure worked for me, it may not
work for you -- please call BSDI tech support (and not me!) if you run
into problems using it.

1. Print the current config of your boot disk with this command, for
   reference before beginning:

        disksetup sd0

2. Format a floppy and save the first 16 sectors of the boot disk to
   it, in case things get really messed up!

        fdformat /dev/rfd0c floppy
        dd if=/dev/rsd0c of=/dev/rfd0c bs=512 count=16

3. Enter single user mode ("shutdown now"), and use disksetup -i to
   create the FDISK table etc. Verbatim from Paul Borman's post to
   bsdi-users:

        Run disksetup -i to begin writing FDISK table. When prompted,
        say you have coresidency. Say that BSD/OS and DOS is *not*
        your setup.
        Once you get into the FDISK screen, add exactly one partition
        that starts at 0 and is the whole disk of type BSDI, make sure
        you mark it active! Use your old BSD disklabel when asked (it
        should still work). Install bootany.sys, but when asked if BSD
        is bootable, say NO!!! Install the appropriate boot blocks
        (almost always the bios bootblocks these days) and write
        everything out.

        The 2.1 version of bootany will not ask any questions and
        should boot BSD directly. If you had said "YES" that BSD was
        bootable it would have prompted you to press <F1>. See
        bootany(8) for more information about bootany.

This procedure worked fine for me (thanks Paul!). Note that I had to
go into my CMOS setup and into the setup of my BusLogic and Adaptec
controllers to enable "Large disk access mode for >1GB disks (DOS
only)" so that the BIOS and BSDI would agree on the geometry and size
of the disk. Also note that after following the above procedure, I
could no longer boot from the BT-946C controller, but I could boot
from the Adaptec 2940.

Note that if you need to restore the partition table, boot records,
etc., you can use the floppy created in step 2 to restore this
information to sd0. First, remove the write protection from this area
of the disk:

        disksetup -W sd0

Then use dd to restore the information from the floppy:

        dd if=/dev/rfd0c of=/dev/rsd0c bs=512 count=16

Be careful out there folks!

-Andrew White

Andrew White             | DCANet: Internet Access for the Delaware Valley
andrew@dca.net           | Offering dialup, ISDN, and dedicated Internet access
(302) 654-1019           | in the 215/302/610 area codes.
http://andrew.white.org/ | e-mail: info@dca.net  web: http://www.dca.net/
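
A footnote on the floppy backup from step 2 and the dd restore: dd by
itself says nothing about whether the copy on the floppy actually
matches the disk, so it may be worth reading it back before relying
on it. What follows is only a minimal sketch and was not part of the
procedure described in this post. The function names save_boot_area
and same_boot_area are hypothetical, and the sketch assumes a POSIX
shell with mktemp and cmp available, which may not hold on a stock
BSD/OS 2.x install. The 512-byte block size and 16-sector count are
taken from the dd commands in the post.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not BSD/OS tools.
# BS and SECTORS mirror the dd arguments used in step 2 of the post.
BS=512
SECTORS=16

# Copy the first 16 sectors of $1 onto the start of $2.
# conv=notrunc leaves the rest of the target untouched if it is a file.
save_boot_area() {
    dd if="$1" of="$2" bs="$BS" count="$SECTORS" conv=notrunc 2>/dev/null
}

# Return 0 if the first 16 sectors of $1 and $2 are byte-identical,
# 1 if they differ, 2 if temporary files could not be created.
same_boot_area() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    dd if="$1" of="$tmp_a" bs="$BS" count="$SECTORS" 2>/dev/null
    dd if="$2" of="$tmp_b" bs="$BS" count="$SECTORS" 2>/dev/null
    status=0
    cmp -s "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Example (as root, with the device names used in the post):
#   save_boot_area /dev/rsd0c /dev/rfd0c &&
#       same_boot_area /dev/rsd0c /dev/rfd0c && echo "floppy copy verified"
```

The same comparison could be run after a restore, with the arguments
swapped, to confirm that the disk again matches the floppy.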