Path: sserve!manuel!munnari.oz.au!mips!mips!sdd.hp.com!usc!news
From: merlin@neuro.usc.edu (merlin)
Newsgroups: comp.unix.bsd
Subject: the cause of poor floating point performance on 386bsd/gcc/libm.a
Date: 2 Aug 1992 16:33:54 -0700
Organization: University of Southern California, Los Angeles, CA
Lines: 84
Sender: merlin@neuro.usc.edu (merlin)
Message-ID: <l7os72INNjnu@neuro.usc.edu>
References: x
NNTP-Posting-Host: neuro.usc.edu

I speculated in a previous posting that the cause of poor floating point performance on 386bsd/gcc/libm.a was emulation of 80387 operations. One person pointed out that a common benchmark of floating point performance was quite good -- and by implication 386bsd was not emulating the 80387 operations.

So, based on a suggestion by another individual, I profiled the entire BRLCAD rt (photorealistic ray tracer code) and discovered rt was spending much of its time in calls to transcendental functions. It was a simple matter from there to go to the library source code -- the source clearly showed the transcendental operations were being done in software using only the simple (+, -, *, /) floating point ops rather than the 80387's own transcendental instructions.

So, I checked around on the net and found a mathematics library 'mathlib2' written by Glenn Geers (glenn@qed.physics.su.oz.au) which seemed to have the appropriate code for performing the higher functions using an 80387.

Glenn's code required trivial modifications to run under 386bsd -- it was simply a matter of prepending an underscore to his entry points (e.g. 'sin' becomes '_sin') so that they would be recognized by 386bsd's ld. I also had to prepend an underscore to a call to 'fprintf' in one of his routines. Finally, I didn't know what port to use for '_iob' so I just commented out one small bit of 80387 error reset code.

Then I recompiled the BRLCAD rt and ran some benchmark examples.
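
As an illustration of the difference described above, here is a minimal C sketch of a sine routine that hands the work to the coprocessor's fsin instruction instead of evaluating it in software with +, -, *, /. This is not Glenn Geers's mathlib2 code. The name sin387 is hypothetical, the GCC-style inline assembly is an assumption about the compiler, and on non-x86 targets the sketch simply falls back to the C library's sin.

```c
#include <math.h>

/* Hypothetical helper, not part of mathlib2 or the 386bsd libm.
 * On x86 it evaluates sine with the x87 fsin instruction; elsewhere
 * it falls back to the C library so the file still compiles. */
double sin387(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double result;
    /* "t" names the top of the x87 register stack; "0" ties the
     * input to the same register, so fsin replaces x with sin(x). */
    __asm__ ("fsin" : "=t" (result) : "0" (x));
    return result;
#else
    return sin(x);
#endif
}
```

In a hand-written assembly library for 386bsd the exported label would be '_sin' rather than 'sin', as noted above. The fsin instruction only accepts arguments with magnitude below 2^63, so a complete library routine would have to handle larger arguments; this sketch does not.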
These benchmarks ran within about one percent of the time taken to execute the same examples under SCO UNIX SYSV/386 3.2r2.0 ODT 1.1 compiled with gcc.

Unfortunately there are some very minor differences between the results generated with this revised 386bsd lib387.a and those generated with SCO ODT's libm.a; but those differences are few in number (only a small percentage difference in a few pixels scattered throughout a very large image area) -- and most likely represent differences in precision between the two libraries. My guess is the difference arises from something trivial like a need to set 64 bit rather than 32 bit floating point precision.

Unfortunately, I am not an expert in 80386/80387 assembly code so I don't know where to begin looking for an appropriate modification to close the apparent gap in precision between the two libraries. However, the results in terms of performance and precision using Glenn's libraries in place of the current 386bsd libm.a may be of interest to some readers of the group.

Perhaps someone with some expertise in 80386/80387 assembly code would be kind enough to take a look at Glenn's libraries, set the maximum appropriate precision, and put an updated libm.a (perhaps called lib387.a, which would imply that use of the library requires an 80387) online at agate and elsewhere.

It seems to me this may be an important enhancement to the 386bsd system because it makes serious scientific computation (e.g. 3D photorealistic rendering, which requires a large virtual address space for the underlying geometric models) possible on very inexpensive consumer grade computers. Moreover, by linking a series of 386bsd systems together over Ethernet, software containing tcp/ip based distributed computing features (BRLCAD) will be able to coordinate parallel execution of segments of any really large problem across multiple systems for enhanced floating point performance.
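
For anyone who wants to follow up on the precision guess above: the 80387 keeps its rounding precision in the precision-control field (bits 8 and 9) of its control word, which selects a 24, 53, or 64 bit significand. The C sketch below decodes that field and, on x86 with a GCC-style compiler, reads and writes the control word with fnstcw and fldcw. All function names are hypothetical, and whether this setting actually explains the pixel differences between the two libraries is not established here.

```c
#include <stdint.h>

/* Decode the precision-control field (bits 8-9) of an x87 control
 * word. Returns the significand width in bits, or 0 for the
 * reserved encoding 01. */
int x87_precision_bits(uint16_t cw)
{
    switch ((cw >> 8) & 0x3) {
    case 0x0: return 24;   /* single precision   */
    case 0x2: return 53;   /* double precision   */
    case 0x3: return 64;   /* extended precision */
    default:  return 0;    /* reserved           */
    }
}

/* Return a copy of cw with the precision-control field set to 11,
 * i.e. extended precision; every other bit is left unchanged. */
uint16_t x87_with_extended_precision(uint16_t cw)
{
    return (uint16_t)(cw | 0x0300);
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Store the current x87 control word into memory and return it. */
uint16_t x87_read_control_word(void)
{
    uint16_t cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (cw));
    return cw;
}

/* Load a new x87 control word from memory. */
void x87_write_control_word(uint16_t cw)
{
    __asm__ __volatile__ ("fldcw %0" : : "m" (cw));
}
#endif
```

A library start-up routine could call x87_write_control_word(x87_with_extended_precision(x87_read_control_word())) before doing any arithmetic, then compare the rendered images again.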

This is exactly the kind of thing which I was discussing with my friend just a few weeks ago -- the availability of a free 386bsd will make it possible to put very powerful and sophisticated computing capabilities (e.g. BRLCAD, GRASS, IRAF, KHOROS) in the hands of people who could never before afford the artificially inflated cost of the previously required AT&T licensed software (SCO ODT 1.1 configured like 386bsd would sell for about $4,285 -- and that's a big break on SCO's usual piecemeal pricing) needed to run publicly available scientific computing source codes.

Thanks, AJ

p.s. I suppose I should point out UNIX is claimed to be a trademark by some people at AT&T and ODT is claimed to be a trademark by some people at SCO. However, it is my impression the term UNIX has fallen into the public domain. Moreover, if anyone has a right to trademark the term 'ODT' it would be Digital Equipment Corporation. My copy of DEC's 1975 'lsi11 pdp11/03 processor handbook' on page 7-2 says ODT was built into their console microcode to eliminate the need for console switches. I would bet the pdp11/03 would run AT&T SYSV/386 and X11R4 very very very slowly. It's been a long time, but I seem to recall using ODT on older systems including OS/8 (PDP-8) and BBN TENEX/DEC TOPS-10 (PDP10) systems. So, I suspect ODT has also fallen into the public domain.

------------------------------------------------------------------------------
Alexander-James Annala
Principal Investigator
Neuroscience Image Analysis Network
HEDCO Neuroscience Building, Fifth Floor
University of Southern California
University Park
Los Angeles, CA 90089-2520
------------------------------------------------------------------------------