Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!spool.mu.edu!olivea!strobe!jerry
From: jerry@strobe.ATC.Olivetti.Com (Jerry Aguirre)
Newsgroups: news.software.nntp,comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD 2.1.5, INN 1.4unoff4, and mmap()
Date: 19 Jul 1996 00:38:06 GMT
Organization: Olivetti ATC; Cupertino, CA; USA
Lines: 53
Distribution: inet
Message-ID: <4smlde$8er@olivea.ATC.Olivetti.Com>
References: <mandrews.837437077@bob.wittenberg.edu> <4se8tq$sgf@news.demos.su> <4sg80v$3uu@brasil.moneng.mei.com> <4sggo5$ls@news.demos.su>
NNTP-Posting-Host: strobe.atc.olivetti.com
Xref: euryale.cc.adfa.oz.au news.software.nntp:24518 comp.unix.bsd.freebsd.misc:23967

In article <4sggo5$ls@news.demos.su> andy@sun-fox.demos.su (Andrew A. Vasilyev) writes:
> Could you provide the theoretical value of stripe size for multiple
> reading/writing of small files?

For a UFS file system the optimal stripe value is the cylinder group
size. UFS tries to put the directory, inode, and file all in the same
cylinder group. Thus accessing one article should take one large seek,
to get to that cylinder group, and then zero to a few small seeks within
the cylinder group for the other accesses. With the entire cylinder
group on one disk, only that disk is occupied for that access. The
other disks are free to be accessing other articles in other groups.

Simple concatenation will also isolate accesses this way, but the load
will not be evenly balanced. Depending on the way the directories are
created, the system will tend to concentrate accesses on the first few
drives and the other drives will be less active. When the initial
directories are created space is plentiful, so they tend to land in the
first part of the drive. With a stripe size of one cylinder group the
accesses are distributed more evenly across the drives. That size is
small enough to randomize the accesses while still keeping individual
newsgroups on individual drives.

It is easy to see that optimizing for large files is incorrect. Before
switching to ODS I had the typical multiple drives. At one point I had
the binary hierarchy mounted on one drive and the rest on another. The
space usage was about the same, but the I/O bandwidth on the binary disk
was only a fraction of the other's. Creating lots of small articles
requires considerably more I/O than the same space in a few big
articles. If all we were shipping around were 1 MB binaries there would
be no disk I/O problems; it is all those 1 KB articles that are killing
performance.

This assumes that one is dealing with multiple accesses. If all you ran
was innd creating articles this would not be critical, but presumably
you have some readers or outgoing feeds. Each of those processes is
making accesses independent of innd's article creation and independent
of each other. For planning purposes they are all making random
accesses to different parts of the file system.

If you're concerned about maximum performance then disk seek time, not
read/write time, becomes the critical factor. In the past 10 years,
while CPUs have gotten 100 times faster, disks 100 times bigger, and RAM
1000 times bigger, seek times have improved by only about a factor of 3.
(Assuming 24 ms access time then vs. 8 ms today.)

In contrast, something like expire's access to the history file would
benefit from a smaller stripe size. That would let the read-ahead code
access different drives and increase the rate at which expire can
sequentially read and write the history file. Given that expire already
runs fairly fast, this doesn't seem to be a goal worth pursuing.
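
To make the small-article arithmetic concrete, here is a rough
back-of-envelope sketch. The 8 ms seek figure is the one quoted above;
the 5 MB/s sequential transfer rate and the three-seeks-per-article
count are only illustrative assumptions, and the program models the
argument, not UFS itself.

/* Back-of-envelope model: why lots of 1 KB articles are seek-bound
 * while one big binary of the same total size is transfer-bound.
 * The 8 ms seek time is taken from the post; the transfer rate and
 * the seeks-per-article count are assumptions for illustration. */
#include <stdio.h>

#define SEEK_MS        8.0   /* assumed average seek + rotation */
#define XFER_MB_PER_S  5.0   /* assumed sequential transfer rate */

/* Disk time to fetch one article: a few seeks plus the transfer. */
static double article_ms(double size_kb, int seeks)
{
    double xfer_ms = size_kb / 1024.0 / XFER_MB_PER_S * 1000.0;
    return seeks * SEEK_MS + xfer_ms;
}

int main(void)
{
    /* One 1 MB binary: one long seek to its cylinder group, then the
     * directory, inode, and data accesses nearby (call it 3 seeks). */
    double big = article_ms(1024.0, 3);

    /* 1024 articles of 1 KB each: every article pays its own seeks. */
    double small = 1024.0 * article_ms(1.0, 3);

    printf("1 x 1 MB binary  : %8.1f ms  (mostly transfer)\n", big);
    printf("1024 x 1 KB text : %8.1f ms  (mostly seeks)\n", small);
    return 0;
}

With these assumed numbers the single 1 MB binary costs roughly 0.2
seconds of disk time, while the same megabyte split into 1 KB articles
costs about 25 seconds, nearly all of it spent seeking.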