Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.Hawaii.Edu!news.uoregon.edu!hammer.uoregon.edu!news-peer.gsl.net!news.gsl.net!news.mathworks.com!uunet!in3.uu.net!nwnews.wa.com!news1.halcyon.com!usenet
From: "Duane H. Hesser" <dhh@androcles.com>
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: Problems with HTDIG on 2.1.5R
Date: Mon, 11 Nov 1996 22:06:09 -0800
Organization: Northwest Nexus Inc.
Lines: 78
Message-ID: <328813D1.41C67EA6@androcles.com>
References: <56007m$s7f@service3.uky.edu>
NNTP-Posting-Host: androcles.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.0Gold (X11; I; FreeBSD 2.1.5-STABLE i386)

John Soward wrote:
>
> I'm experiencing some problems with htdig (3.0.4 and 3.0.5) under FreeBSD
> 2.1.5R. Everything compiles and runs fine -- but when I attempt to index a
> large server, I get a memory allocation error (in 3.0.5 I get 'out of memory
> in 'new''). I have 128M in the machine with 256M of swap. I'm running as
> 'root'...top show's I'm only using about 8M of memory.
>
> I've compiled the same code with gcc/g++ 2.7.2 on an HPUX10 machine and it
> completes the index fine...
>
> I've tried installing gcc/g++ 2.7.2.1 and libg++2.7.2 on the FreeBSD machine
> -- but get the same error...is this a gnumalloc problem?
>
> Anyone else have this problem?
>
> thanx,
> --
> John Soward <a href="http://neworder.cc.uky.edu/">JpS</a>
> Systems Programmer 'The Midnight sun will burn you up.'
> University of Kentucky (NeXT and MIME mail OK) -R. Smith

I have experienced similar problems with Htdig under Ultrix 4.3 and
HPUX 9.03. There are two possible sources of the problem (if your
problem is similar to mine).

The first problem I would consider most likely, except that you say
that "top" does not report large memory usage. This problem occurs in
'htmerge'.
Look in the file 'htmerge/doc.cc'. After reading all URLs into a
linked list, a 'while' loop reads each document into a structure 'ref',
processes it, then reads the next document into 'ref'. As this loop
proceeds, the entire web structure is read into memory and LEFT THERE.
Try adding 'delete ref' at the bottom of the loop. If this isn't your
problem, someday it will be.

The other problem is 'sort'. Actually, there are two possible problems.
The first is temp directory space. Htdig may easily require > 100
megabytes of disk space for a sort; if you are sorting in /usr/tmp,
make sure it's big enough. Or set TMPDIR to someplace big enough. It
should be easy enough to determine if you're running out of sort
filespace. Sorry I can't be more specific--all of the relevant files
and notes are at work, and you know what they say--the memory is the
first thing to go :).

There is another possible problem with 'sort', which revealed itself
only under Ultrix (not HPUX). The System 5'ish 'sort' uses internal
buffers that adapt to large memory requirements, requirements that
will overwhelm the older BSD'ish (actually Version 7) 'sort'. Ultrix
has both styles, in different paths, but Htdig uses a configurable
path to 'sort' in some places and a hard-coded path in others. It was
necessary to ensure that the System 5'ish 'sort' was used everywhere.
I'm just not sure whether the GNU sort used by FreeBSD preserves this
problem or not.

All of this is pretty vague, I realize, but it should give you some
places to look. If necessary, send me some mail, and I can try to
generate some diffs.

One last thing that I don't remember--the exact version of Htdig that
I have. I CAN verify that gcc 2.7.2 was used for both compiles.

--
Duane H. Hesser dhh@androcles.com
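
The 'delete ref' suggestion for the loop in 'htmerge/doc.cc' can be
sketched roughly as below. This is NOT the actual htdig source: the
names DocumentRef, read_document and process_all are hypothetical
stand-ins invented for illustration, and the real loop reads from a
document database rather than a vector of strings. The sketch only
shows why a record allocated on every pass stays in memory unless it
is deleted at the bottom of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-document record the post calls 'ref'.
// It is not htdig's real class; it only counts how many records are alive.
struct DocumentRef {
    static int live;                 // records currently allocated
    std::string url;
    explicit DocumentRef(const std::string& u) : url(u) { ++live; }
    ~DocumentRef() {
        assert(live > 0);            // would trip on a double delete
        --live;
    }
};
int DocumentRef::live = 0;

// Hypothetical loader: allocates one record per URL with 'new'.
DocumentRef* read_document(const std::string& url) {
    return new DocumentRef(url);
}

// Walks the URL list and returns how many records allocated during this
// call are still in memory afterwards.
// free_each == false mimics the behaviour described in the post (nothing
// is released, on purpose); free_each == true applies 'delete ref'.
int process_all(const std::vector<std::string>& urls, bool free_each) {
    const int before = DocumentRef::live;
    for (std::size_t i = 0; i < urls.size(); ++i) {
        DocumentRef* ref = read_document(urls[i]);
        // ... process 'ref' here ...
        if (free_each) {
            delete ref;              // the suggested fix, at the bottom of the loop
        }
    }
    return DocumentRef::live - before;
}
```

Calling process_all(urls, false) leaves one record behind per URL, so
memory grows with the size of the site being merged; with
process_all(urls, true) nothing accumulates. Whether the real 'ref' can
safely be deleted at that point depends on the htdig version in use, so
check that nothing else still holds the pointer.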