Newsgroups: comp.unix.bsd
Path: sserve!manuel!munnari.oz.au!uunet!gatech!destroyer!caen!hellgate.utah.edu!fcom.cc.utah.edu!gateway.univel.com!gateway.novell.com!ithaca!terry
From: terry@ithaca.npd.Novell.COM (Terry Lambert)
Subject: Satanic boot problem tracked to CMOS, wd.c
Message-ID: <1992Jul23.152046.13374@gateway.novell.com>
Keywords: 386bsd wd.c boot CMOS satan
Sender: terry@ithaca (Terry Lambert)
Nntp-Posting-Host: ithaca.eng.sandy.novell.com
Organization: Novell NPD -- Sandy, UT
Date: Thu, 23 Jul 1992 15:20:46 GMT
Lines: 126

Well, curiosity got the better of me... and I found what I believe to be *the* boot problem... well, several boot problems, actually.

The magic file? usr/src/sys.386bsd/i386/i386/machdep.c!

1) The value of 'maxmem' is global. This should result in it being auto-initialized to 0, if the compiler is a compiler. If either the 'biosbasemem' or 'biosextmem' is "invalid", then maxmem is set by "maxmem = min (maxmem, 640/4);". Since maxmem is still 0 at that point, the result is 0, which is clearly incorrect, as the boot code is obviously running in RAM somewhere... besides, maxmem is calculated off 'Maxmem' directly after the if statement, blowing the value to 0-1, which puts us at 0xffffffff for our amount of memory.

Correction: First, this is incorrect; the value being set in the default case should be 'Maxmem', not 'maxmem'. It is very arguable that the min of 0 and anything will be zero; why is the 'min()' function called at all in this case? It is also arguable that a machine with a base memory of less than 640K is unable to boot 386BSD, so the forced default should be 640K in the "bad CMOS" case. If the machine actually has less than 640K, it will fail anyway; but if the thing *has* 640K, this will allow it to boot.

2) If the amount of extended memory is not greater than 0, or the biosbasemem is not equal to 640, 'Maxmem' is *never* set. This is the missing "not handled" case which would more correctly be the second "else".

Correction: I suggest prompting the user for the amount of memory in the machine at this point, and jumping to just after the "#endif" for "NDDB" to avoid reiterating the boundary check code.

I suspect that one of these two (fatal) cases is being triggered by my CMOS having "incorrect" values. There are several reasons this might occur:

1) The CMOS truly has "incorrect" values. A diagnostic to this effect, along with what the values retrieved were, and a "Hit any key to continue" message immediately following the "degraded mode" message would greatly help debugging this. This is, I believe, the case, although the reason the values are "incorrect" is that "_rtcin" is broken.

2) The CMOS has the correct values, but the read of the CMOS fails due to timing; most likely, this is related to the reset rate of various items on my bus. I suspect that the longest delay reset items, specifically the built-in bus mouse, are the most likely suspects if this is indeed the cause. Again, the modified machdep.c would help me narrow this down.

The HP Vectra problems could easily be related. Dollars to donuts says that my AT&T machines and the Vectra store their CMOS values in a strange place, unexpected by BSDI. There is code to the effect that "probing breaks certain 386 AT relics"; I suppose *NOT* probing is the cause of our problems. I suspect that only one location is being used, and that the entire memory is being listed there. Again, without a boot diagnostic with sufficient delay, I have no way of telling.

Additional notes on boundary conditions:

I would suggest that the expression "maxmem = Maxmem - 1;" be checked for minimum and maximum bounds (it immediately follows the "if" on line 876 of machdep.c). This is more likely to be the intent of the misuse of the "min()" expression for the first case of the "if".

What I suspect: '_rtcin' in locore.s is broken.
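
Before getting to '_rtcin' itself, the machdep.c corrections suggested above could be sketched roughly as follows. This is only a sketch and not the actual machdep.c code: the function name size_memory, the max_pages parameter, the constant names, and the use of 0xffff as the "invalid" CMOS value are all assumptions made for illustration. It also falls back to 640K in the "not handled" case rather than prompting the user, purely to keep the example short.

```c
/* Hypothetical sketch of the suggested memory sizing; not the real machdep.c. */

#define PAGE_KB      4U                      /* i386 page size, in KB */
#define BASEMEM_KB   640U                    /* conventional base memory, in KB */
#define MIN_PAGES    (BASEMEM_KB / PAGE_KB)  /* 640K expressed in pages */
#define CMOS_INVALID 0xffffU                 /* assumed "invalid" marker */

/*
 * Return the physical memory size in pages, given the base and extended
 * memory sizes (in KB) as read from the CMOS, and an upper limit in pages.
 */
unsigned long
size_memory(unsigned int biosbasemem, unsigned int biosextmem,
    unsigned long max_pages)
{
	unsigned long pages;

	if (biosbasemem == CMOS_INVALID || biosextmem == CMOS_INVALID) {
		/* Bad CMOS: force the 640K default instead of min(0, ...). */
		pages = MIN_PAGES;
	} else if (biosextmem > 0 && biosbasemem == BASEMEM_KB) {
		/* Extended memory starts at the 1MB mark. */
		pages = (1024UL + biosextmem) / PAGE_KB;
	} else {
		/* The "not handled" case: never leave the value unset. */
		pages = MIN_PAGES;
	}

	/* Bounds check before anything computes "pages - 1". */
	if (pages < MIN_PAGES)
		pages = MIN_PAGES;
	if (pages > max_pages)
		pages = max_pages;

	return pages;
}
```

For example, size_memory(640, 3072, 4096) gives 1024 pages (4MB), while a bad CMOS read such as size_memory(0xffff, 0xffff, 4096) gives 160 pages (640K), so a later "- 1" can no longer wrap around to 0xffffffff. That covers the machdep.c side; the '_rtcin' routine in locore.s is a separate matter.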
Specifically, '_rtcin' reads as follows:

	.globl _rtcin
_rtcin:
	movl 4(%esp),%eax
	outb %al,$0x70
	subl %eax,%eax		# clr eax
	inb $0x71,%al		# Compaq SystemPro
	ret

This should probably look like the following to guarantee that it is more generic (and therefore more likely to work):

	.globl _rtcin
_rtcin:
	movl 4(%esp),%eax
	outb %al,$0x70
	inb $0x71,%al		# Compaq SystemPro/ATT/HP
	andl $0x000000ff, %eax	# Fix big nasty bug
	ret

I believe that the zeroing of eax is detrimental, and have removed it; only a byte of the value returned is defined... the rest is undefined, and is set by the setup program to whatever.

One of the reasons I need these fixes is to rebuild the kernel: the machine 386BSD currently runs on at Weber State University (1 whole box) has a problem with both memory and disk space. The machine on which I was doing file system development under 0.0 has been confiscated to teach NetWare classes (somewhat ironic, considering that I work for Novell), and this has brought me to a halt. My cross-compilation environment has died on a couple of the new header files; I really can't justify the time to fix this until I have 386bsd up on at least one real box, and I can't get it up on a real box until I have fixed binaries 8-(.

Any of the partial (machdep.c) or full (locore.s) fixes suggested on a dist.fs disk would be greatly appreciated! I'm sure that this would be very helpful in diagnosing the HP Vectra problem, if it didn't fix it outright, and would certainly serve to expose a lot of internals students to BSD as well as System V.

Regards,
Terry Lambert
terry_lambert@gateway.novell.com
terry@icarus.weber.edu
---
Disclaimer: Any opinions in this posting are my own and not those of my present or previous employers.