Xref: sserve comp.unix.bsd:3844 comp.sys.ibm.pc.hardware:29366
Newsgroups: comp.unix.bsd,comp.sys.ibm.pc.hardware
Path: sserve!manuel!munnari.oz.au!mips!mips!darwin.sura.net!wupost!uunet!mcsun!sunic!aun.uninett.no!barsoom!barsoom!tih
From: tih@barsoom.nhh.no (Tom Ivar Helbekkmo)
Subject: Re: Disklabeling (was Re: Another Adaptec Question)
Message-ID: <tih.714115743@barsoom>
Sender: news@barsoom.nhh.no (USENET News System)
Organization: Norwegian School of Economics
References: <1992Aug16.144341.24052@Informatik.TU-Muenchen.DE> <15776@star.cs.vu.nl> <1992Aug17.173807.2309@Informatik.TU-Muenchen.DE> <#79m_a.alm@netcom.com>
Distribution: world,fj,spec
Date: Tue, 18 Aug 1992 05:29:03 GMT
Lines: 133

alm@netcom.com (Andrew Moore) writes:

>To return to 386BSD and disklabeling a SCSI drive:
>am I correct in assuming that the specification of # of cylinders,
>sectors per cylinder, etc does not matter at all, so long as the total
>number of blocks is correct?

No, you're not...  It *does* matter to the file system.  Although the
following was written while I was trying to figure out how to set up
SCSI disks correctly for Ultrix, it's relevant to any system using the
Berkeley Fast File System.  This is a summary I posted to the net in
comp.unix.ultrix a while back:

I recently asked how to correctly set up disk partitions for the SCSI
disks connected to our DECstations, specifying some of the problems I
had understanding what was right and wrong.  I've had several very
interesting responses, and feel that I've learned quite a bit of
useful stuff here...  Thanks go to Klaus Steinberger, Walter Wong,
Mike Mitchell, and especially to Stefan Esser, who took the time to
explain a lot of details to me in our email correspondence.

Anyway -- to recap my situation, I wanted to make sure I partitioned
my disks so partition boundaries were placed at cylinder boundaries,
and their sizes worked out properly to an integral number of cylinder
groups, 16 cylinders per group being the default number.  Looking at
the /etc/disktab entries for the disks helped me little, since DEC
obviously hadn't cared about this in their setup, and multiplying
sectors/track by tracks/cylinder by cylinders didn't work out to the
specified total number of sectors on the disks anyway!

Well, it turns out that the situation is more complex than that...
The BSD Fast File System uses certain heuristics to allocate disk
blocks within a partition.  Some of these are supposed to increase
data security (against accidental loss), some to make file access
more efficient.  For instance:

- Groups of inodes are allocated in each cylinder group in the
  partition, and attempts are made to keep file data blocks near the
  inodes describing them.  (Efficiency)
- Each cylinder group has a redundant copy of the superblock, which
  is staggered by one track per cylinder group, to keep the copies on
  different platters.  Inodes follow the superblock copy, to stagger
  those as well.  (Security)
- File data blocks are allocated for rotational contiguity.  The
  optimal block is not necessarily the one following the previous
  one; if the system is not fast enough to schedule a new disk
  transfer in time, a "rotationally later" block is selected.  If the
  optimal block on the disk is already taken, the same block (or one
  following it as closely as possible) on another track in the same
  cylinder is tried instead, and so on.  (Efficiency)

There's more to it than this, of course, but note the assumptions
being made here.  The file system needs to know the correct disk
geometry: cylinders, heads, and sectors per track.  The product of
the last two of these must be the correct number of sectors per
cylinder.
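
The arithmetic behind those two checks (does the stated geometry
multiply out to the stated disk size, and does a partition start on a
cylinder boundary and cover a whole number of cylinder groups) can be
sketched roughly as follows.  This is only an illustration in Python,
not anything taken from Ultrix or 386BSD: the class, the function
names and the disk figures are all made up for the example, and the
only number borrowed from the text is the default of 16 cylinders per
group.

```python
from dataclasses import dataclass

# Default mentioned in the post; everything else below is hypothetical.
CYLINDERS_PER_GROUP = 16


@dataclass(frozen=True)
class Geometry:
    """Disk geometry as the file system is told about it."""
    sectors_per_track: int
    tracks_per_cylinder: int  # number of heads
    cylinders: int

    @property
    def sectors_per_cylinder(self) -> int:
        return self.sectors_per_track * self.tracks_per_cylinder

    @property
    def total_sectors(self) -> int:
        return self.sectors_per_cylinder * self.cylinders


def geometry_mismatch(geom: Geometry, reported_total: int) -> int:
    """Sectors by which the reported size exceeds the geometry product."""
    return reported_total - geom.total_sectors


def partition_is_aligned(geom: Geometry, offset: int, size: int,
                         cpg: int = CYLINDERS_PER_GROUP) -> bool:
    """True if the partition starts on a cylinder boundary and its size
    is a whole number of cylinder groups."""
    spc = geom.sectors_per_cylinder
    return offset % spc == 0 and size % (spc * cpg) == 0


def round_down_to_groups(geom: Geometry, size: int,
                         cpg: int = CYLINDERS_PER_GROUP) -> int:
    """Largest size not exceeding `size` that is a whole number of
    cylinder groups."""
    group = geom.sectors_per_cylinder * cpg
    return (size // group) * group


# Made-up disk: 36 sectors/track, 9 heads, 1000 cylinders.
example = Geometry(sectors_per_track=36, tracks_per_cylinder=9,
                   cylinders=1000)
wanted = 100000                      # sectors we would like in a partition
usable = round_down_to_groups(example, wanted)
print(example.sectors_per_cylinder, usable,
      partition_is_aligned(example, 10 * example.sectors_per_cylinder,
                           usable))
```

A nonzero result from geometry_mismatch() is the symptom described
above: the stated geometry does not account for every sector on the
disk, so boundaries computed from it need not match the real ones.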

The file system must also know the rotational speed of the disk, and
it assumes that sectors within tracks are numbered in parallel, so
that sector 0 of each track in a cylinder passes the read/write head
at the same time.

Guess what?  This doesn't hold true for SCSI disks!  These disks tend
to do quite a bit of optimization of their own, behind the file
system's back...  For instance:

- Tracks are usually rotationally staggered, to optimize the time to
  sequentially get from the last sector of one track to the first
  sector of the (logically) next one.  This counteracts the
  rotational delay optimization in the file system.
- Spare sectors (for bad block replacement) are usually allocated on
  a per-cylinder basis.  This is a good strategy for optimal disk
  utilization and effective relocation, but it breaks the file
  system's calculation of where cylinder boundaries are, since heads
  multiplied by sectors per track does not equal (usable) sectors per
  cylinder.
- Large SCSI disks tend to use zone bit recording, which means that
  there are more sectors per track on the outer tracks than on the
  inner ones.  Then they lie to the file system about geometry,
  giving it something that works out close to the correct size of the
  disk.  Again, this ruins the file system's attempt to intelligently
  use cylinder boundary information, which is guaranteed to be wrong.

So, what do you do if you want optimal performance from a SCSI disk?
Well, as long as the disk does not do zone bit recording, there may
be hope.  SCSI disks can be reparameterized and reformatted.
However, the number of parameters that you can change varies from
disk to disk.  (See the man page entry on 'rzdisk' for more
information on how to examine and change these parameters.)

- If you can set track skew and cylinder skew parameters to zero,
  thus reorienting the geometry of the disk to what the file system
  expects, you can get the timing calculations to work.
- If you can make the disk allocate spare sectors on a per-track
  basis, you can make the cylinder boundary calculations work right,
  by using, say, one spare per track, and telling the file system
  that the disk has one less sector per track than it really does.
  (The file system doesn't know about spares; it counts usable
  sectors.)  This means that tracks with more than one fault will be
  reallocated to the spare cylinders you reserve at the end of the
  disk, but that can't very well be helped.
- If spare sectors can only be allocated on a per-cylinder basis, a
  hack is still possible!  According to Stefan Esser, you can specify
  (through /etc/disktab and/or mkfs) that the disk has only one head,
  with a rather large number of sectors per track (the number of
  actually usable sectors per real cylinder).  He notes, however,
  that a patch to the ufs_alloc() function in the file system is
  necessary, because, as shipped from DEC, it can't handle this large
  number of sectors per track.

It would seem, then, that the correct choice, when you need high disk
throughput, is to get a disk that does not do zone bit recording, and
that can be reparameterized to use a non-staggered layout with a
spare sector per track.  This will normally mean smaller disks, and
thus an increased number of drives to achieve the same storage space
-- which isn't too bad anyway if you're really into speed; e.g. two
optimized drives on each of two SCSI controllers should be much
better than two non-optimized, bigger drives on one controller.

I expect, though, that future versions of the BSD Fast File System
will have knowledge of SCSI disks, and how to use them effectively.
I understand that Sun has already made such changes, resulting in
noticeable improvements.

-tih
-- 
Tom Ivar Helbekkmo, NHH, Bergen, Norway.  Telephone: +47-5-959205
Postmaster for domain nhh.no.  Internet mail: tih@barsoom.nhh.no