Return to BSD News archive
Path: sserve!newshost.anu.edu.au!munnari.oz.au!constellation!convex!convex!cs.utexas.edu!sdd.hp.com!nigel.msen.com!yale.edu!newsserver.jvnc.net!udel!newsserv.cs.sunysb.edu!stark.UUCP!cs.sunysb.edu!newsserv!stark!gene From: newsserv!stark!gene@cs.sunysb.edu (Gene Stark) Newsgroups: comp.os.386bsd.bugs Subject: Hanging 2nd hard drive [PATCH] (was FreeBSD 1.0 EPSILON won't boot) Date: 25 Oct 93 07:53:29 Organization: Gene Stark's home system Lines: 732 Message-ID: <NEWSSERV!STARK!GENE.93Oct25075329@stark.uucp> References: <dgontz.54.00154821@po-box.esu.edu> NNTP-Posting-Host: stark.uucp In-reply-to: dgontz@po-box.esu.edu's message of 25 Oct 93 01:16:45 GMT In article <dgontz.54.00154821@po-box.esu.edu> dgontz@po-box.esu.edu (Douglas Gontz) writes: I just installed freeBSD 1.0 EPSILON I was able to ftp from freebsd.cdrom.com. I am having several problems getting the system to function as anything other than a single user system. I am especially having trouble getting the system to function with the second hard drive. I had some similar troubles when I installed my second hard drive a few weeks ago. I went through the wd driver and replaced the remaining infinite loops with timeouts. This made it work better for me. I had sent the patches below to the FreeBSD group, but understandably enough they are reluctant to do much with them right now. The reason is that everybody has tried patching the wd driver, and it fixes some systems and breaks others. Anyway, I am posting these in the hopes that they may be useful to some people. At least they remove the infinite loops, which I feel have absolutely no place in a driver intended for a timesharing system. Thus, you should not experience any hanging with this driver, though your disks still might not work right. - Gene StarK (Copy of bug report and patch follows) -------------------- Subject: Problems with wd driver and second IDE drive Index: sys/i386/isa/wd.c FreeBSD-epsilon Description: About a week ago I added a second IDE drive, and this exercised some problems with the wd driver (I'll bet you've heard *that* before :-)). In particular, I noticed the following things: (1) Hanging in the driver just after fsck of root completes and the second drive is opened for the first time. (2) Failure of wdgetctlr to work on the second drive. In an attempt to fix some of these things, I went through the code, and did some cleaning up. In particular, I replaced all the timeout loops with a macro that makes things easier to read (hopefully), and I removed the two remaining infinite loops in wdintr (infinite loops have no business in kernel routines, in my opinion). I also added a timeout to catch missed interrupts and restart the transfer. These changes at least keep my system from hanging on bootup, though they don't solve the underlying problem (which is that nobody seems to understand how the IDE interface works). I also added code to reset the controller at the beginning of wdgetctlr. This makes the wdgetctlr command on the second drive at least respond without error, although I still have the problem that the data I get back is the data for the first drive, and not the second. (In the process of debugging, I did once start seeing the correct data for both drives, but this happy state of affairs went away when I did a power cycle/reboot.) I am enclosing the modified driver below in the hopes that the timeout code might at least solve some of the hanging problems people have been having. Unfortunately, I haven't been able to fix everything, even on my own system. The whole interface seems very timing/state sensitive, and I don't have proper documentation on it. Repeat-By: Fix: Apply the following diffs: *** /usr.src/sys/i386/isa/wd.c Wed Oct 6 07:06:09 1993 --- wd.c Mon Oct 11 07:34:06 1993 *************** *** 56,64 **** --- 56,70 ---- * 17 May 93 Rodney W. Grimes Fixed all 1000000 to use WDCTIMEOUT, * and increased to 1000000*10 for new * intr-0.1 code. + * 9 Oct 93 Eugene W. Stark Deleted remaining infinite loops, + * removed stale code and excess gotos, + * added timeout/restart to handle + * missed interrupts, hacked wddump + * to work with syscons. */ /* TODO:peel out buffer at low ipl, speed improvement */ + /* make dk_copenpart and dk_bopenpart do something useful */ #include "wd.h" *************** *** 75,80 **** --- 81,87 ---- #include "buf.h" #include "uio.h" #include "malloc.h" + #include "kernel.h" #include "machine/cpu.h" #include "i386/isa/isa.h" #include "i386/isa/isa_device.h" *************** *** 89,94 **** --- 96,118 ---- #define WDCTIMEOUT 10000000 /* arbitrary timeout for drive ready waits */ #endif + #define WAITWHILE(cond,tout) { \ + int tymeout = WDCTIMEOUT; \ + while(cond) { \ + if(--tymeout < 0) { tout } \ + } \ + } + + #define WDCRESET(wdc) { \ + /* reset the device */ \ + outb((wdc)+wd_ctlr, (WDCTL_RST|WDCTL_IDS)); \ + DELAY(1000); \ + outb((wdc)+wd_ctlr, WDCTL_IDS); \ + DELAY(1000); \ + (void) inb((wdc)+wd_error); /* XXX! */ \ + outb((wdc)+wd_ctlr, WDCTL_4BIT); \ + } + #define RETRIES 5 /* number of retries before giving up */ #define MAXTRANSFER 32 /* max size of transfer in page clusters */ *************** *** 157,166 **** --- 181,192 ---- static void wdustart(struct disk *); static void wdstart(); + static void wdrestart(struct buf *bp); static int wdcommand(struct disk *, int); static int wdcontrol(struct buf *); static int wdsetctlr(dev_t, struct disk *); static int wdgetctlr(int, struct disk *); + static int wdctimeout(struct buf *); /* * Probe for controller. *************** *** 215,222 **** int wdattach(struct isa_device *dvp) { ! int unit; ! /* int unit = dvp->id_unit;*/ for (unit=0; unit< _NWD; unit++) { struct disk *du; --- 241,247 ---- int wdattach(struct isa_device *dvp) { ! int unit, x; for (unit=0; unit< _NWD; unit++) { struct disk *du; *************** *** 229,237 **** --- 254,264 ---- } /* print out description of drive, suppressing multiple blanks*/ + x = splbio(); if(wdgetctlr(unit, du) == 0) { int i, blank; char c; + splx(x); printf("wd%d: unit %d type ", unit, unit); for (i = blank = 0 ; i < sizeof(du->dk_params.wdp_model); i++) { char c = du->dk_params.wdp_model[i]; *************** *** 250,255 **** --- 277,283 ---- printf("\n"); du->dk_unit = unit; } + splx(x); } return(1); } *************** *** 297,303 **** /* otherwise, process transfer request */ } - q: /* queue transfer on drive, activate drive and controller if idle */ dp = &wdutab[unit]; s = splbio(); --- 325,330 ---- *************** *** 305,311 **** if (dp->b_active == 0) wdustart(du); /* start drive */ if (wdtab.b_active == 0) ! wdstart(s); /* start controller */ splx(s); return; --- 332,338 ---- if (dp->b_active == 0) wdustart(du); /* start drive */ if (wdtab.b_active == 0) ! wdstart(); /* start controller */ splx(s); return; *************** *** 362,383 **** struct buf *dp; register struct bt_bad *bt_ptr; long blknum, pagcnt, cylin, head, sector; ! long secpertrk, secpercyl, addr, i, timeout; int unit, s, wdc; ! loop: ! /* is there a drive for the controller to do a transfer with? */ ! dp = wdtab.b_actf; ! if (dp == NULL) ! return; ! ! /* is there a transfer to this drive ? if so, link it on ! the controller's queue */ ! bp = dp->b_actf; ! if (bp == NULL) { ! wdtab.b_actf = dp->b_forw; ! goto loop; ! } /* obtain controller and drive information */ unit = wdunit(bp->b_dev); --- 389,409 ---- struct buf *dp; register struct bt_bad *bt_ptr; long blknum, pagcnt, cylin, head, sector; ! long secpertrk, secpercyl, addr, i; int unit, s, wdc; ! do { ! /* is there a drive for the controller to do a transfer with? */ ! dp = wdtab.b_actf; ! if (dp == NULL) ! return; ! ! /* is there a transfer to this drive ? if so, link it on ! the controller's queue */ ! bp = dp->b_actf; ! if (bp == NULL) ! wdtab.b_actf = dp->b_forw; ! } while(bp == NULL); /* obtain controller and drive information */ unit = wdunit(bp->b_dev); *************** *** 466,487 **** du->dk_bc += DEV_BSIZE; /* controller idle? */ ! timeout = 0; ! while (inb(wdc+wd_status) & WDCS_BUSY) ! { ! if (++timeout > WDCTIMEOUT) ! { ! printf("wd.c: Controller busy too long!\n"); ! /* reset the device */ ! outb(wdc+wd_ctlr, (WDCTL_RST|WDCTL_IDS)); ! DELAY(1000); ! outb(wdc+wd_ctlr, WDCTL_IDS); ! DELAY(1000); ! (void) inb(wdc+wd_error); /* XXX! */ ! outb(wdc+wd_ctlr, WDCTL_4BIT); ! break; ! } ! } /* stuff the task file */ outb(wdc+wd_precomp, lp->d_precompcyl / 4); --- 492,501 ---- du->dk_bc += DEV_BSIZE; /* controller idle? */ ! WAITWHILE((inb(wdc+wd_status) & WDCS_BUSY), ! {printf("wd.c: Controller busy too long!\n"); ! WDCRESET(wdc) ! break;}) /* stuff the task file */ outb(wdc+wd_precomp, lp->d_precompcyl / 4); *************** *** 508,529 **** outb(wdc+wd_sdh, WDSD_IBM | (unit<<4) | (head & 0xf)); /* wait for drive to become ready */ ! timeout = 0; ! while ((inb(wdc+wd_status) & WDCS_READY) == 0) ! { ! if (++timeout > WDCTIMEOUT) ! { ! printf("wd.c: Drive busy too long!\n"); ! /* reset the device */ ! outb(wdc+wd_ctlr, (WDCTL_RST|WDCTL_IDS)); ! DELAY(1000); ! outb(wdc+wd_ctlr, WDCTL_IDS); ! DELAY(1000); ! (void) inb(wdc+wd_error); /* XXX! */ ! outb(wdc+wd_ctlr, WDCTL_4BIT); ! goto RETRY; ! } ! } /* initiate command! */ #ifdef B_FORMAT --- 522,531 ---- outb(wdc+wd_sdh, WDSD_IBM | (unit<<4) | (head & 0xf)); /* wait for drive to become ready */ ! WAITWHILE(((inb(wdc+wd_status) & WDCS_READY) == 0), ! {printf("wd.c: Drive busy too long!\n"); ! WDCRESET(wdc) ! goto RETRY;}) /* initiate command! */ #ifdef B_FORMAT *************** *** 539,564 **** #endif } /* if this is a read operation, just go away until it's done. */ if (bp->b_flags & B_READ) return; /* ready to send data? */ ! timeout = 0; ! while ((inb(wdc+wd_status) & WDCS_DRQ) == 0) ! { ! if (++timeout > WDCTIMEOUT) ! { ! printf("wd.c: Drive not ready for too long!\n"); ! /* reset the device */ ! outb(wdc+wd_ctlr, (WDCTL_RST|WDCTL_IDS)); ! DELAY(1000); ! outb(wdc+wd_ctlr, WDCTL_IDS); ! DELAY(1000); ! (void) inb(wdc+wd_error); /* XXX! */ ! outb(wdc+wd_ctlr, WDCTL_4BIT); ! goto RETRY; ! } ! } /* then send it! */ outsw (wdc+wd_data, addr+du->dk_skip * DEV_BSIZE, --- 541,557 ---- #endif } + /* set alarm clock in case we never get an interrupt */ + timeout(wdctimeout, bp, 2*hz); + /* if this is a read operation, just go away until it's done. */ if (bp->b_flags & B_READ) return; /* ready to send data? */ ! WAITWHILE(((inb(wdc+wd_status) & WDCS_DRQ) == 0), ! {printf("wd.c: Drive not ready for too long!\n"); ! WDCRESET(wdc) ! goto RETRY;}) /* then send it! */ outsw (wdc+wd_data, addr+du->dk_skip * DEV_BSIZE, *************** *** 595,601 **** printf("I "); #endif ! while ((status = inb(wdc+wd_status)) & WDCS_BUSY) ; /* is it not a transfer, but a control operation? */ if (du->dk_state < OPEN) { --- 588,597 ---- printf("I "); #endif ! WAITWHILE(((status = inb(wdc+wd_status)) & WDCS_BUSY), ! {printf("wd.c: Controller not ready after interrupt!\n"); ! wdrestart(bp); ! return;}) /* is it not a transfer, but a control operation? */ if (du->dk_state < OPEN) { *************** *** 655,662 **** chk = min(DEV_BSIZE / sizeof(short), du->dk_bc / sizeof(short)); /* ready to receive data? */ ! while ((inb(wdc+wd_status) & WDCS_DRQ) == 0) ! ; /* suck in data */ insw (wdc+wd_data, --- 651,660 ---- chk = min(DEV_BSIZE / sizeof(short), du->dk_bc / sizeof(short)); /* ready to receive data? */ ! WAITWHILE(((inb(wdc+wd_status) & WDCS_DRQ) == 0), ! {printf("wd.c: not ready to receive data!\n"); ! wdrestart(bp); ! return;}) /* suck in data */ insw (wdc+wd_data, *************** *** 678,683 **** --- 676,682 ---- /* see if more to transfer */ if (du->dk_bc > 0 && (du->dk_flags & DKFL_ERROR) == 0) { + untimeout(wdctimeout, bp); wdstart(); return; /* next chunk is started */ } else if ((du->dk_flags & (DKFL_SINGLE|DKFL_ERROR)) *************** *** 685,690 **** --- 684,690 ---- du->dk_skip = 0; du->dk_flags &= ~DKFL_ERROR; du->dk_flags |= DKFL_SINGLE; + untimeout(wdctimeout, bp); wdstart(); return; /* redo xfer sector by sector */ } *************** *** 692,697 **** --- 692,698 ---- done: /* done with this transfer, with or without error */ + untimeout(wdctimeout, bp); du->dk_flags &= ~DKFL_SINGLE; wdtab.b_actf = dp->b_forw; wdtab.b_errcnt = 0; *************** *** 715,720 **** --- 716,755 ---- } /* + * Timeout routine in case no interrupt is forthcoming for + * the current transfer. Reset the device, and start over. + */ + + int + wdctimeout(struct buf *bp) + { + printf("wd.c: Transfer is taking too long -- restarting\n"); + wdrestart(bp); + } + + /* + * Reset the controller and restart the current transfer in case the + * controller doesn't do something when it seems like it should. + */ + + void + wdrestart(struct buf *bp) + { + register struct disk *du; + int i; + + i = splbio(); + untimeout(wdctimeout, bp); + du = wddrives[wdunit(bp->b_dev)]; + WDCRESET(du->dk_port) + wdtab.b_active = 0; + du->dk_skip = 0; + du->dk_flags &= ~DKFL_ERROR; + wdstart(); + splx(i); + } + + /* * Initialize a drive. */ int *************** *** 743,749 **** * or longer if there isn't one there. */ bzero(&du->dk_dd, sizeof(du->dk_dd)); - #undef d_type /* fix goddamn segments.h! XXX */ du->dk_dd.d_type = DTYPE_ST506; du->dk_dd.d_ncylinders = 1024; du->dk_dd.d_secsize = DEV_BSIZE; --- 778,783 ---- *************** *** 781,787 **** * that overlaps another partition which is open * unless one is the "raw" partition (whole disk). */ ! if ((du->dk_openpart & mask) == 0 /*&& part != RAWPART*/ && part != WDRAW) { int start, end; pp = &du->dk_dd.d_partitions[part]; --- 815,821 ---- * that overlaps another partition which is open * unless one is the "raw" partition (whole disk). */ ! if ((du->dk_openpart & mask) == 0 && part != WDRAW) { int start, end; pp = &du->dk_dd.d_partitions[part]; *************** *** 793,800 **** if (pp->p_offset + pp->p_size <= start || pp->p_offset >= end) continue; - /*if (pp - du->dk_dd.d_partitions == RAWPART) - continue; */ if (pp - du->dk_dd.d_partitions == WDRAW) continue; if (du->dk_openpart & (1 << (pp - --- 827,832 ---- *************** *** 856,864 **** wdtab.b_active = 1; /* wait for drive and controller to become ready */ ! for (i = WDCTIMEOUT; (inb(wdc+wd_status) & (WDCS_READY|WDCS_BUSY)) ! != WDCS_READY && i-- != 0; ) ! ; outb(wdc+wd_command, WDCC_RESTORE | WD_STEP); du->dk_state++; splx(s); --- 888,896 ---- wdtab.b_active = 1; /* wait for drive and controller to become ready */ ! WAITWHILE(((inb(wdc+wd_status) & (WDCS_READY|WDCS_BUSY)) ! != WDCS_READY),{break;}) ! outb(wdc+wd_command, WDCC_RESTORE | WD_STEP); du->dk_state++; splx(s); *************** *** 874,880 **** goto tryagainrecal; } bp->b_error = ENXIO; /* XXX needs translation */ ! goto badopen; } /* some controllers require this ... */ --- 906,915 ---- goto tryagainrecal; } bp->b_error = ENXIO; /* XXX needs translation */ ! printf(": status %b error %b\n", stat, WDCS_BITS, ! inb(wdc + wd_error), WDERR_BITS); ! bp->b_flags |= B_ERROR; ! return(1); } /* some controllers require this ... */ *************** *** 892,903 **** panic("wdcontrol"); } /* NOTREACHED */ - - badopen: - printf(": status %b error %b\n", stat, WDCS_BITS, - inb(wdc + wd_error), WDERR_BITS); - bp->b_flags |= B_ERROR; - return(1); } /* --- 927,932 ---- *************** *** 908,938 **** */ static int wdcommand(struct disk *du, int cmd) { ! int timeout = WDCTIMEOUT, stat, wdc; /* controller ready for command? */ wdc = du->dk_port; ! while (((stat = inb(wdc + wd_status)) & WDCS_BUSY) && timeout > 0) ! timeout--; ! if (timeout <= 0) ! return(-1); /* send command, await results */ outb(wdc+wd_command, cmd); ! while (((stat = inb(wdc+wd_status)) & WDCS_BUSY) && timeout > 0) ! timeout--; ! if (timeout <= 0) ! return(-1); if (cmd != WDCC_READP) return (stat); /* is controller ready to return data? */ ! while (((stat = inb(wdc+wd_status)) & (WDCS_ERR|WDCS_DRQ)) == 0 && ! timeout > 0) ! timeout--; ! if (timeout <= 0) ! return(-1); ! return (stat); } --- 937,957 ---- */ static int wdcommand(struct disk *du, int cmd) { ! int stat, wdc; /* controller ready for command? */ wdc = du->dk_port; ! WAITWHILE(((stat = inb(wdc + wd_status)) & WDCS_BUSY), return(-1);) /* send command, await results */ outb(wdc+wd_command, cmd); ! WAITWHILE(((stat = inb(wdc+wd_status)) & WDCS_BUSY), return(-1);) if (cmd != WDCC_READP) return (stat); /* is controller ready to return data? */ ! WAITWHILE((((stat = inb(wdc+wd_status)) & (WDCS_ERR|WDCS_DRQ)) == 0), ! return(-1);) return (stat); } *************** *** 967,986 **** /* * issue READP to drive to ask it what it is. */ static int wdgetctlr(int u, struct disk *du) { ! int stat, x, i, wdc; char tb[DEV_BSIZE]; struct wdparams *wp; - x = splbio(); /* not called from intr level ... */ wdc = du->dk_port; outb(wdc+wd_sdh, WDSD_IBM | (u << 4)); stat = wdcommand(du, WDCC_READP); if (stat < 0) { - splx(x); return(stat); } /* --- 986,1005 ---- /* * issue READP to drive to ask it what it is. + * EWS: call at splbio() */ static int wdgetctlr(int u, struct disk *du) { ! int stat, i, wdc; char tb[DEV_BSIZE]; struct wdparams *wp; wdc = du->dk_port; + WDCRESET(wdc) /* EWS: seems to help sometimes */ outb(wdc+wd_sdh, WDSD_IBM | (u << 4)); stat = wdcommand(du, WDCC_READP); if (stat < 0) { return(stat); } /* *************** *** 992,998 **** stat = wdcommand(du, WDCC_RESTORE | WD_STEP); if (stat & WDCS_ERR) { stat = inb(wdc+wd_error); - splx(x); return(stat); } --- 1011,1016 ---- *************** *** 1001,1007 **** strncpy(du->dk_params.wdp_model, "Unknown Type", sizeof du->dk_params.wdp_model); du->dk_dd.d_type = DTYPE_ST506; - splx(x); return(0); } --- 1019,1024 ---- *************** *** 1222,1228 **** addr = (char *) 0; /* starting address */ /* toss any characters present prior to dump */ ! while (sgetc(1)) ; /* size of memory to dump */ --- 1239,1245 ---- addr = (char *) 0; /* starting address */ /* toss any characters present prior to dump */ ! while (sgetc(1) & 0xff) /* EWS: A hack to work with syscons */ ; /* size of memory to dump */ *************** *** 1354,1360 **** (int) addr += 512; /* operator aborting dump? */ ! if (sgetc(1)) return(EINTR); } return(0); --- 1371,1377 ---- (int) addr += 512; /* operator aborting dump? */ ! if (sgetc(1) & 0xff) /* EWS: A hack to work with syscons */ return(EINTR); } return(0); --