*BSD News Article 2957

Path: sserve!manuel!munnari.oz.au!mips!mips!sdd.hp.com!usc!news
From: merlin@neuro.usc.edu (merlin)
Newsgroups: comp.unix.bsd
Subject: the cause of poor floating point performance on 386bsd/gcc/libm.a
Date: 2 Aug 1992 16:33:54 -0700
Organization: University of Southern California, Los Angeles, CA
Lines: 84
Sender: merlin@neuro.usc.edu (merlin)
Message-ID: <l7os72INNjnu@neuro.usc.edu>
References: x
NNTP-Posting-Host: neuro.usc.edu

I speculated in a previous posting that the cause of poor floating point
performance on 386bsd/gcc/libm.a was emulation of 80387 operations.  One
person pointed out that a common benchmark of floating point performance
was quite good -- and by implication 386bsd was not emulating the 80387
operations.  So, based on a suggestion by another individual, I profiled
the entire BRLCAD rt (photorealistic ray tracer code) and discovered rt
was spending much of its time in calls to transcendental functions.  From
there it was a simple matter to go to the library source code, which
clearly showed the transcendental functions were being computed in
software using only simple (+, -, *, /) floating point ops.
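
As a rough illustration of what "computed in software" means here, the
sketch below evaluates sine using nothing but +, -, *, and /.  It is NOT
the actual 386bsd libm.a source; the name soft_sin, the constant
SKETCH_PI, the crude range reduction, and the fixed term count are all
assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical constant for this sketch only. */
#define SKETCH_PI 3.14159265358979323846

/*
 * Hypothetical helper: sine built from +, -, *, / alone, with no
 * 80387 transcendental instruction involved.
 */
double soft_sin(double x)
{
    double term, sum;
    int n;

    /* The subtraction loop below is only sensible for moderate |x|. */
    assert(x > -1.0e6 && x < 1.0e6);

    /* Crude range reduction into [-pi, pi]. */
    while (x > SKETCH_PI)
        x -= 2.0 * SKETCH_PI;
    while (x < -SKETCH_PI)
        x += 2.0 * SKETCH_PI;

    /* Taylor series: x - x^3/3! + x^5/5! - ... (12 further terms). */
    term = x;
    sum = x;
    for (n = 1; n <= 12; n++) {
        term = -term * x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}
```

Every call goes through tens of multiplies and divides, which gives a
feel for why a ray tracer that leans heavily on such routines could slow
down noticeably compared with a single hardware instruction.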

So, I checked around on the net and found a mathematics library 'mathlib2'
written by Glenn Geers (glenn@qed.physics.su.oz.au) which seemed to have
the appropriate code for performing the higher functions using an 80387.
Glenn's code required trivial modifications to be run under 386bsd -- it
was simply a matter of prepending his entry points (e.g. 'sin') with an
underscore (e.g. '_sin') so that they would be recognized by 386bsd's ld.
I also had to prepend an underscore to a call to 'fprintf' in one of his
routines.  Finally, I didn't know what port to use for '_iob' so I just
commented out one small bit of 80387 error reset code.  

Then I recompiled the BRLCAD rt and ran some benchmark examples.  These
benchmarks ran within about one percent of the time taken to execute the
same examples under SCO UNIX SYSV/386 3.2r2.0 ODT 1.1 compiled with gcc.
Unfortunately there are some very minor differences in the results which
were generated by this revised 386bsd lib387.a and SCO ODT's libm.a; but
those differences are few in number (only a small percentage difference
in a few pixels scattered throughout a very large image area) - and most
likely represent differences in precision between the two libraries.  My
guess is the difference arises from something trivial like a need to set
64 bit rather than 32 bit floating point precision.
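
For anyone who wants to test that guess, here is a minimal sketch of how
the precision setting could be changed.  On the 80387 the precision
control field occupies bits 8-9 of the FPU control word and selects a
24, 53, or 64 bit significand for the results of basic arithmetic.  The
helper names and macros below are hypothetical, the inline assembly
assumes gcc on an x86 target, and whether this has any bearing on the
pixel differences described above is untested.

```c
#include <assert.h>

/* Precision-control (PC) field: bits 8-9 of the x87 control word. */
#define X87_PC_MASK     0x0300u
#define X87_PC_SINGLE   0x0000u  /* 24-bit significand */
#define X87_PC_DOUBLE   0x0200u  /* 53-bit significand */
#define X87_PC_EXTENDED 0x0300u  /* 64-bit significand */

/* Hypothetical helper: return cw with its PC field replaced by pc. */
unsigned short x87_with_precision(unsigned short cw, unsigned short pc)
{
    /* pc must lie inside the field; the encoding 01 is reserved. */
    assert((pc & ~X87_PC_MASK) == 0 && pc != 0x0100u);
    return (unsigned short)((cw & ~X87_PC_MASK) | pc);
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Read, modify, and reload the live control word (gcc inline asm). */
void x87_set_precision(unsigned short pc)
{
    unsigned short cw;

    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    cw = x87_with_precision(cw, pc);
    __asm__ volatile ("fldcw %0" : : "m" (cw));
}
#endif
```

Calling x87_set_precision(X87_PC_EXTENDED) at the start of a test
program and rerunning the benchmark would be one way to see whether the
scattered pixel differences change at all.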

Unfortunately, I am not an expert in 80386/80387 assembly code so I don't
know where to begin looking for an appropriate modification to close the
apparent gap in precision between the two libraries.  However, the result
in terms of performance and precision using Glenn's libraries in place of
the current 386bsd libm.a may be of interest to some readers of the group. 

Perhaps someone with some expertise in 80386/80387 assembly code would be
kind enough to take a look at Glenn's libraries, set maximum appropriate
precision, and put an updated libm.a (perhaps called lib387.a which would
imply use of the library requires an 80387) online at agate and elsewhere.

It seems to me this may be an important enhancement to the 386bsd system
because it makes serious scientific computation (e.g. 3D photorealistic
rendering which requires large virtual address space for the underlying
geometric models) possible on very inexpensive consumer grade computers.  
Moreover, when a series of 386bsd systems is linked together over
Ethernet, software with tcp/ip based distributed computing features
(e.g. BRLCAD) will be able to coordinate parallel execution of segments
of any really large problem across multiple systems for enhanced
floating point performance.

This is exactly the kind of thing which I was discussing with my friend
just a few weeks ago -- the availability of a free 386bsd will make it
possible to put very powerful and sophisticated computing capabilities
(e.g. BRLCAD, GRASS, IRAF, KHOROS) in the hands of people who could never 
before afford the artificially inflated cost of the previously required 
AT&T licensed software (SCO ODT 1.1 configured like 386bsd would sell for 
about $4,285 -- and that's a big break on SCO's usual piecemeal pricing) 
required to run publicly available scientific computing source code.

Thanks, AJ

p.s.  I suppose I should point out UNIX is claimed to be a trademark by
some people at AT&T and ODT is claimed to be a trademark by some people
at SCO.  However, it is my impression the term UNIX has fallen into the
public domain.  Moreover, if anyone has a right to trademark the term
'ODT' it would be Digital Equipment Corporation.  My copy of DEC's 1975 
'lsi11 pdp11/03 processor handbook' on page 7-2 says ODT was built into
their console microcode to eliminate the need for console switches.  I
would bet the pdp11/03 would run AT&T SYSV/386 and X11R4 very very very
slowly.  It's been a long time, but I seem to recall using ODT on older
systems including OS/8 (PDP-8) and BBN TENEX/DEC TOPS-10 (PDP10) systems.
So, I suspect ODT has also fallen into the public domain.

------------------------------------------------------------------------------
Alexander-James Annala
Principal Investigator
Neuroscience Image Analysis Network
HEDCO Neuroscience Building, Fifth Floor
University of Southern California
University Park
Los Angeles, CA 90089-2520
------------------------------------------------------------------------------