*BSD News Article 62901

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!bunyip.cc.uq.oz.au!munnari.OZ.AU!spool.mu.edu!howland.reston.ans.net!newsfeed.internetmci.com!EU.net!sun4nl!Utrecht.NL.net!news.iaf.nl!news.es.iaf.nl!fozzie.sun3.iaf.nl!fozzie.sun3.iaf.nl!not-for-mail
From: geert@fozzie.sun3.iaf.nl (Geert Bosch)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: Poor performance with INN and FreeBSD.
Followup-To: comp.unix.bsd.freebsd.misc
Date: 24 Feb 1996 23:05:51 +0100
Organization: La Calandre Infortunee
Lines: 48
Distribution: inet
Message-ID: <4go23v$8hp@fozzie.sun3.iaf.nl>
References: <311F8C62.4BC4@pluto.njcc.com> <DMu8B6.6Jn@twwells.com> <4gdgc6$ron@olivea.atc.olivetti.com> <4gf2p0$209@fozzie.sun3.iaf.nl> <4gij7n$pmm@uriah.heep.sax.de>
NNTP-Posting-Host: fozzie.sun3.iaf.nl
X-Newsreader: TIN [version 1.2 PL2]

J Wunsch (j@uriah.heep.sax.de) wrote:
: BSD uses a filesystem with linear directories, but the directory
: contains just an pointer to the i-node information (unlike e.g. OS/2,
: where the directory contains the equivalent of an i-node itself).
But if you have to look for a certain filename you'd still have to
do a lineair search which is what is causing the large overhead when
accessing news articles in a large directory. 

The fact that the directory itself only contains the inodes makes it
faster to remove inodes from the directory, but it doesn't help in
finding the right inode given a certain directory. So, when I want
to access one article (say 23456) from a large directory I still have
to do a lineair search comparing all filenames. The only difference
with having the complete i-node information in the inode-table instead
of in the directory is that you've got to dereference the inode numbers,
which even may be scattered around the inode table(s). Or am I mistaken?

Of course having this extra level of indirection makes it possible to
implement hard links easily. Wouldn't it be better if this level of
redirection was used to maintain a sorted index? When using weak constraints
it might be possible to get a real performance boost on searching large
directories without sacrifying any compatibility with current software.

Something like: ``we'll promise to try to keep directories sorted by name''.
Software like INN could take advantage of that by first doing a binary
search and, if it fails falling back to lineair search starting with
the last entry and searching backwards. When directories would be sorted 
in idle-time and on some places when they are in memory anyway or in
some strategic places like after the INN expire and renumbering has
been done, most of the time the binary search would succeed. In the 
other cases, there's a good chance the missed file is new and is added 
to the end of the directory so it will be found very early in the linear search.

The worst that you'd get is to do the (relatively fast) binary search in
addition to the lineair search, but that should be a marginal slow-down.
Moreover, it would be rather straightforward to only bother doing the
binary search on large directories, since linear search is efficient
on small directories.

Since I don't know much about filesystems, my proposal might very well
be completely stupid or not applicable to FreeBSD. Since I'm not afraid
to learn, please tell me why.

Greetings,
   Geert
-- 
E-Mail: geert@sun3.iaf.nl      ***   Lbh whfg penpxrq ZvpebFbsg'f   ***
 Phone: +31-53-4303054         **  arj naq vzcebirq frphevgl flfgrz. **