Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!spool.mu.edu!olivea!strobe!jerry
From: jerry@strobe.ATC.Olivetti.Com (Jerry Aguirre)
Newsgroups: news.software.nntp,comp.unix.bsd.freebsd.misc
Subject: Re: FreeBSD 2.1.5, INN 1.4unoff4, and mmap()
Date: 19 Jul 1996 00:38:06 GMT
Organization: Olivetti ATC; Cupertino, CA; USA
Lines: 53
Distribution: inet
Message-ID: <4smlde$8er@olivea.ATC.Olivetti.Com>
References: <mandrews.837437077@bob.wittenberg.edu> <4se8tq$sgf@news.demos.su> <4sg80v$3uu@brasil.moneng.mei.com> <4sggo5$ls@news.demos.su>
NNTP-Posting-Host: strobe.atc.olivetti.com
Xref: euryale.cc.adfa.oz.au news.software.nntp:24518 comp.unix.bsd.freebsd.misc:23967

In article <4sggo5$ls@news.demos.su> andy@sun-fox.demos.su (Andrew A. Vasilyev) writes:
> Could you provide the theoretical value of stripe size for multiple
> reading/writing of small files?

For a UFS file system the optimal stripe value is the cylinder group
size. UFS tries to put the directory, inode, and file all in the same
cylinder group. Thus accessing one article should take one large seek,
to get to that cylinder group, and then zero to a few small seeks within
the cylinder group for the other accesses. With the entire cylinder
group on one disk, only that disk is occupied for that access. The
other disks are free to be accessing other articles in other groups.

Simple concatenation will also isolate accesses this way, but the load
will not be evenly balanced. Depending on the way the directories are
created, the system will tend to concentrate accesses on the first few
drives and the other drives will be less active. When the initial
directories are created space is plentiful, so they tend to land in the
first part of the drive. With a stripe size of one cylinder group the
accesses are distributed more evenly across the drives. That size is
small enough to randomize the accesses while still keeping individual
newsgroups on individual drives.

It is easy to see that optimizing for large files is incorrect. Before
switching to ODS I had the typical multiple drives. At one point I had
the binary hierarchy mounted on one drive and the rest on another. The
space usage was about the same, but the I/O bandwidth on the binary disk
was only a fraction of the other's. Creating lots of small articles
requires considerably more I/O than the same space in a few big
articles. If all we were shipping around were 1 MB binaries there would
be no disk I/O problems; it is all those 1 KB articles that are killing
performance.

This assumes that one is dealing with multiple accesses. If all you ran
was innd creating articles this would not be critical, but presumably
you have some readers or outgoing feeds. Each of those processes is
making accesses independent of innd's article creation and independent
of each other. For planning purposes they are all making random
accesses to different parts of the file system.

If you're concerned about maximum performance then disk seek time, not
read/write time, becomes the critical factor. In the past 10 years,
while CPUs have gotten 100 times faster, disks 100 times bigger, and RAM
1000 times bigger, seek times have improved by only about a factor of 3.
(Assuming 24 ms access time then vs. 8 ms today.)

In contrast, something like expire's access to the history file would
benefit from a smaller stripe size. That would let the read-ahead code
access different drives and increase the rate at which expire can
sequentially read and write the history file. Given that expire already
runs fairly fast, this doesn't seem to be a goal worth pursuing.
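
To make the small-article arithmetic concrete, here is a rough
back-of-envelope sketch. The 8 ms seek figure is the one quoted above;
the 5 MB/s sequential transfer rate and the three-seeks-per-article
count are only illustrative assumptions, and the program models the
argument, not UFS itself.

/* Back-of-envelope model: why lots of 1 KB articles are seek-bound
 * while one big binary of the same total size is transfer-bound.
 * The 8 ms seek time is taken from the post; the transfer rate and
 * the seeks-per-article count are assumptions for illustration. */
#include <stdio.h>

#define SEEK_MS        8.0   /* assumed average seek + rotation */
#define XFER_MB_PER_S  5.0   /* assumed sequential transfer rate */

/* Disk time to fetch one article: a few seeks plus the transfer. */
static double article_ms(double size_kb, int seeks)
{
    double xfer_ms = size_kb / 1024.0 / XFER_MB_PER_S * 1000.0;
    return seeks * SEEK_MS + xfer_ms;
}

int main(void)
{
    /* One 1 MB binary: one long seek to its cylinder group, then the
     * directory, inode, and data accesses nearby (call it 3 seeks). */
    double big = article_ms(1024.0, 3);

    /* 1024 articles of 1 KB each: every article pays its own seeks. */
    double small = 1024.0 * article_ms(1.0, 3);

    printf("1 x 1 MB binary  : %8.1f ms  (mostly transfer)\n", big);
    printf("1024 x 1 KB text : %8.1f ms  (mostly seeks)\n", small);
    return 0;
}

With these assumed numbers the single 1 MB binary costs roughly 0.2
seconds of disk time, while the same megabyte split into 1 KB articles
costs about 25 seconds, nearly all of it spent seeking.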