Xref: sserve comp.unix.ultrix:15351 comp.sys.dec:10554 comp.unix.bsd:8557
Path: sserve!manuel.anu.edu.au!munnari.oz.au!news.hawaii.edu!ames!saimiri.primate.wisc.edu!zaphod.mps.ohio-state.edu!wupost!uunet!mcsun!Germany.EU.net!rrz.uni-koeln.de!IKP.Uni-Koeln.DE!se
From: se@IKP.Uni-Koeln.DE (Stefan Esser)
Newsgroups: comp.unix.ultrix,comp.sys.dec,comp.unix.bsd
Subject: Better matching SCSI drive characteristics (with patch for 386BSD) (was Re: installation of a scsi disk)
Date: 2 Dec 1992 18:58:09 GMT
Organization: Institute for Mathematics, University of Cologne, Germany
Lines: 225
Distribution: world
Message-ID: <1fj101INNdeu@rs1.rrz.Uni-Koeln.DE>
References: <2021@nikhefh.nikhef.nl> <2022@nikhefh.nikhef.nl> <1992Nov20.215457.1595@nntpd2.cxo.dec.com> <1992Nov25.185108.23362@infodev.cam.ac.uk> <1992Nov27.233646.16972@nntpd2.cxo.dec.com>
NNTP-Posting-Host: snert.ikp.uni-koeln.de
Keywords: ultrix Seagate SCSI

In article <1992Nov27.233646.16972@nntpd2.cxo.dec.com>, alan@nabeth.enet.dec.com (Alan Rollow - Alan's Home for Wayward Tumbleweeds.) writes:
|> >It's high time the filing system STOPPED TRYING to understand geometry
|> >of SCSI discs, in my opinion.
|>
|> Agreed.  And as it turns out, if you don't use rotational delay,
|> about the only bit of geometry information used by newfs(8) is
|> to setup the cylinder groups.  You could get really fancy and
|> partition along the zones and setup different entries for the
|> different zones (if you thought it mattered enough):
|>
|>     disk-foo-zone-1:mumble:whatever:\
|>         :ns#zone-1-sectors:nt#tracks:nc#whatever-works:\
|>
|> It is starting to look like disks are more and more often starting
|> to push the boundry conditions assumed by newfs(8).

It only requires a trivial change in alloccgblk() to add an
allocation scheme much better suited to SCSI disks!
(I didn't perform extensive tests to verify the improvement, but it's
a minor change and it prevents some unnecessary seeks and lost
revolutions due to incorrect assumptions about the drive geometry, so
it ought to be worth the effort.)

The relevant code is (taken from ufs_alloc.c in the 386BSD sources,
but present in at least BSD4.2, BSD4.3 and several Ultrix releases):

        if (cg_blktot(cgp)[cylno] == 0)
                goto norot;
        if (fs->fs_cpc == 0) {
                /*
                 * block layout info is not available, so just have
                 * to take any block in this cylinder.
                 */
/***/           bpref = howmany(fs->fs_spc * cylno, NSPF(fs));
                goto norot;
        }

The only change required is to remove the initialization of 'bpref'
to the start of the cylinder in the marked line (/***/).

This doesn't have any negative effect, since that line is only
executed in case of a 'misconfiguration': the condition
'fs->fs_cpc == 0' is true iff there are too many sectors per track
for the kernel's "rotational positions table".

This is triggered by setting the number of heads to '1' and the
number of sectors per track either to the real number of sectors per
cylinder (which on quite a few drives isn't a multiple of the number
of heads or the number of sectors per track!) or to some 1000 sectors
for ZBR drives.

The result is that the file system prefers allocation in ascending
logical block number, spiraling over the blocks of the cylinder.

This differs from the normal behaviour when the file system block
succeeding the last one allocated to a file is unavailable. In that
case the BSD FFS allocates a free block under another head if there
seems to be one at a rotationally near position. But given the fact
that most SCSI drives lie about their geometry, this doesn't work
well with them (even if they don't use ZBR).

With the above one-line patch (commenting out the marked line) and a
disktab entry specifying ONE head per cylinder, the allocation becomes
much better suited for SCSI drives.
Advantages:

Switching heads makes the SCSI drive's read-ahead caching useless.
It's often better to skip a block or two than to switch heads.

If the drive incorporates track skew to compensate for the head
switch time, then the rotational positions can't be computed the way
the FFS considers right. In this case the FFS loses on average half
a revolution of the disk when switching heads.

Even worse, if the drive has a certain number of alternate sectors
per cylinder, then the cylinder boundary isn't where the FFS expects
it to be! This means that the FFS optimizations, which try to keep
the blocks of a file within one cylinder, now tend to spread the file
over two adjacent cylinders.

E.g. my Fujitsu M2266 has 85 sectors/track and 15 heads. The cylinder
thus contains 1275 sectors, but 3 of them are alternates and only
1272 are available to the file system. With 'ns#85:nt#15', the first
FFS cylinder extends 3 sectors into the second physical cylinder, the
second FFS cylinder 6, ... At cyl. 212 the distance has grown to half
a cylinder, resulting in the first half of FFS cyl. 212 residing on
drive cyl. 212 and the second half lying on drive cyl. 213. This
results in seeks between cylinders 212 and 213 when the FFS code
believes it is just switching heads.

When the (patched) FFS sees this drive as 'ns#1272:nt#1', it does the
right thing! Blocks are allocated within the physical cylinder; head
switching can happen in the drive, but the FFS doesn't need to know
about that.

This applies to ZBR drives as well, since switching heads results in
a loss of half a revolution plus head switch time on them too. When
just allocating blocks in ascending order, the seek to the next
cylinder happens just once (the FFS can't be told where the cyl.
boundary really is), but that's not bad compared to a worst-case
scenario of some 50 seeks that may result from the standard FFS
allocation scheme on such a drive.
I nearly forgot to explain why it's not enough to just mkfs a
filesystem with one head and secpertrk := heads * secpertrk (i.e. the
number of sectors per cylinder) specified.

The line

        bpref = howmany(fs->fs_spc * cylno, NSPF(fs));

sets the preferred block to the first block of the current cylinder.

The kernel arrives there only if the preferred block (usually the one
succeeding the last one allocated to that file) is unavailable.

And when 'bpref' isn't reset, but is allowed to keep its value on
entry to 'norot', the kernel allocates the next free block after
bpref (wrapping around to the start of the cylinder in case there
wasn't any free block).

'bpref' has been verified to be a valid block number before, and it
has been checked at the start of the alloccgblk subroutine that there
is at least one free block in that cylinder.

So the patch can't do any harm: it doesn't change the layout policy
unless the file system has been created with 'ns' in the range of a
few hundred (didn't check the limit, >1000 works for me...).

! Without the patch applied, specifying e.g. ns#1000 leads to a very
! bad allocation scheme. The kernel will scan for a free block
! from the beginning of the cylinder, resulting in severely fragmented
! files and wasting lots of CPU cycles ...

The patch can be used on systems with a mix of SCSI and e.g. IPI
drives. The original BSD FFS layout policy will be used for the IPI
drives, which have been 'newfs'ed with their physical geometry data
as usual (and not with nt#1).

There are other (more important) improvements that ought to be
applied to the FFS to make it work better with SCSI drives (e.g.
> 8KB transfers), but since this one is so simple and doesn't have
any negative impact, I'd like to see it incorporated into at least
386BSD.

#>>>> In case somebody wants to try it, here is a patch.
#>>>> I had prepared it some time ago but never sent it.
#>>>> It contains some comments to become part of the
#>>>> patched ufs_alloc.c.
#>>>> This has been tested on several DECstations running
#>>>> Ultrix 4.1-4.2a, and never failed over the last 2 years ...
#>>>> (I had to apply a binary patch on these systems,
#>>>> since I don't have access to Ultrix sources.)
#>>>> It won't change ANYTHING in the behaviour of your system,
#>>>> unless you create a new file system with a large number
#>>>> of sectors per track. This is best done by specifying
#>>>> ns#1000:nt#1 in the drive's definition in /etc/disktab
#>>>> or by specifying these values directly to mkfs.

I'd like to get some feedback, in case you try it ...

Stefan

*** ufs_alloc.c~	Fri Aug 28 11:14:03 1992
--- ufs_alloc.c	Fri Aug 28 11:47:23 1992
***************
*** 715,719 ****
  			 * to take any block in this cylinder.
  			 */
! 			bpref = howmany(fs->fs_spc * cylno, NSPF(fs));
  			goto norot;
  		}
--- 715,766 ----
  			 * to take any block in this cylinder.
  			 */
! 	/****
! 	 * the standard BSD distributions probably
! 	 * back to the first 4.2 release do the following,
! 	 * but I don't see any advantage in doing so.
! 	 * If it's commented out, SCSI drives can be supported
! 	 * better, because it's possible to force a reasonable
! 	 * block layout for a SCSI drive without changing
! 	 * the behaviour for other drives ...
! 	 */
! 	/*		bpref = howmany(fs->fs_spc * cylno, NSPF(fs)); */
! 	/*
! 	 * The above line forced the norot code to scan
! 	 * for a free block starting at the <<beginning
! 	 * of the current cylinder>>. I don't see any reason
! 	 * for doing this, it should always be better to use
! 	 * the <<current value of bpref>> as a starting point
! 	 * (which is already guaranteed to be valid for this
! 	 * purpose). By doing this, the BSD FFS block allocation
! 	 * scheme, which uses heuristics generally not applicable
! 	 * to current SCSI drives, can be selectively switched
! 	 * to a mode which doesn't make as many assumptions about
! 	 * the exact knowledge of the drive geometry and
! 	 * which takes better advantage of the preread cache
! 	 * common on SCSI drives. To enable this mode, simply
! 	 * use a disktab entry with more sectors per track
! 	 * than can be dealt with in the table used for finding
! 	 * a rotationally near block (which doesn't work on SCSI
! 	 * drives anyway). By declaring a track to contain some
! 	 * 1000 sectors (I use the number of *data* sectors per
! 	 * cylinder) the allocation now prefers a sector with
! 	 * a slightly higher logical block number than the last
! 	 * one used for the file. This increases the probability
! 	 * of finding the data in the drive's preread cache.
! 	 * This is of much higher importance when using ZBR
! 	 * (zone bit recording) drives, where it's impossible
! 	 * to specify any geometry near that required by the
! 	 * BSD FFS block allocation heuristics.
! 	 * Another problem with FFS disktab specifications not
! 	 * matching the actual geometry of the drive is that
! 	 * the FFS tries to allocate a sector within the same
! 	 * *cylinder*, but in fact without knowing the borders
! 	 * of cylinders on most SCSI drives. This normally
! 	 * leads to many unnecessary next-track seeks, since
! 	 * blocks get allocated spread over the second half of
! 	 * one cylinder and the first one of the next because
! 	 * of wrong assumptions about the cylinder borders.
! 	 *		920828, Stefan Esser, <se@ikp.uni-koeln.de>
! 	 ****/
  			goto norot;
  		}

--
Stefan Esser, Institute of Nuclear Physics, University of Cologne, Germany
se@IKP.Uni-Koeln.DE                                    [134.95.192.50]