*BSD News Article 12410

Xref: sserve comp.unix.sys5.r4:1750 comp.unix.pc-clone.32bit:1800 comp.unix.bsd:11642 comp.os.mach:2623 comp.bugs.sys5:1566 news.answers:6367
Path: sserve!manuel.anu.edu.au!munnari.oz.au!uunet!cs.utexas.edu!zaphod.mps.ohio-state.edu!pitt.edu!gvls1!snark!esr
From: esr@snark.thyrsus.com (Eric S. Raymond)
Newsgroups: comp.unix.sys5.r4,comp.unix.pc-clone.32bit,comp.unix.bsd,comp.os.mach,comp.bugs.sys5,news.answers
Subject: Known Bugs in the USL UNIX distribution
Message-ID: <1kjRVw#6ZKSmX9jc7Zv6W5XZc763zfK=esr@snark.thyrsus.com>
Date: 9 Mar 93 04:13:34 GMT
Expires: 7 Jun 93 23:00:00 GMT
Sender: esr@snark.thyrsus.com (Eric S. Raymond)
Followup-To: comp.unix.sys5.r4
Lines: 1362
Approved: news-answers-request@MIT.Edu

Archive-name: usl-bugs
Last-update: Mon Mar  8 22:11:44 1993
Version: 12.0

Many FAQs, including this one, are available via FTP on the archive site
rtfm.mit.edu (aka pit-manager.mit.edu or 18.172.1.27) in the directory
pub/usenet/news.answers.  The name under which this FAQ is archived appears in
the Archive-name line above.  This FAQ is updated monthly; if you want the
latest version, please query the archive rather than emailing the overworked
maintainer.

This is the bug posting's first anniversary.

What's new in this issue:
   * New bug info (see below)

(In the table below, bugs new this issue are marked with a ** at the
left margin; old bugs for which information has been added are marked
with *)

0. Table of Contents
I. Introduction
II. General Bugs
    1. UNIX kernel must lie below the 1024-cylinder mark
    2. Suid programs dump core when signalled
    3. DMAs on large ISA machines may fail
    4. There is a cylinder limit on disk size
    5. more(1) doesn't handle SIGWINCH
*   6. X performance problem
    7. C shell background process termination logs you out
    8. A security hole in login
    9. COFF problems with long filenames
    10. Flakeouts in the Wangtek device driver
    11. A kernel declaration bug
    12. Reading tar archives with cpio foos up on multiply-linked files
    13. Process accounting is broken
    14. tar(1) foos up in the presence of symbolic links
    15. Symbolic links can interfere with shellscript execution
    16. Piping a csh builtin causes the shell to hang.
    17. tar(1) fails to restore adjacent symbolic links properly
    18. COFF binaries linked with curses(3) and shared libc hang
    19. shl hangs, sxt devices bad
    20. num-lock prevents mouse from working properly
    21. adjtime() doesn't work
    23. cron mail doesn't go through aliasing
    24. fragility in xterm
    25. csh lossage due to bad optimization
    26. Bug in cp(1)
    27. tbl -me doesn't work
    28. who -r fragility leads to boot-time problems
    29. at(1) breaks here-documents in shell scripts
    30. UHC mouse driver ignores the middle button.
    31. mmap access doesn't update file mod times
**  32. AT&T select(2) is incompatible with BSD select(2)
**  33. (4.2) The login program requires its PPID to be 1
**  34. (4.2) Bad MAXMINOR values can make the system unbootable
**  35. (4.2) Incompatible change in TZ interpretation
**  36. Nulls in pixmaps can crash X
III. Serial-port and tty administration problems
    1. Dropout problems with tty devices
    2. Quick port setup option in sysadm is broken
    3. ttymon drops DTR
*   4. (4.2) Terminating cu to a direct line locks up the port
IV. Networking and File-Sharing Bugs
    1. NFS locking is unusably slow
    2. UFS file system problems
    3. Byte-order problem with NFS when accessing Sun disks
    4. Under weird circumstances, lseek on UFS may cause corruption
    5. FTP problems
    6. A bug in the WD80x3 support
**  7. Security hole near fingerd
V. SCSI Support Problems
    1. sar is confused by SCSI
    2. A configuration problem
*   3. Synchronous SCSI hang problem
    4. ps chokes on commands that do SCSI I/O
    5. Transfer speed problems with Adaptec 1542B on 486s
VI. Development Tools Problems
*   1. General UCB library brokenness
    2. USL emulation of BSD signals doesn't work
    3. Possible string library problems
    4. USL's ndbm support is broken.
    5. An include file is missing
    6. sscanf(3) has a potential bug
    7. shmat(2) vs. vfork(2)
    8. FIONREAD fails on regular files
    9. fread(3) does the wrong thing on pipes and FIFOs
    10. putw appears to be broken
    11. Compiler problems
VII. The FUBYTE Problem
VIII. Destiny and Dell

I. Introduction

This posting lists known bugs in System V Release 4 implementations, and known
fixes applied by various porting houses (there are also random bits of
information about SCO UNIX here and there).  It was formerly part of the
386-buyers-faq issues 1.0 through 4.0, and is still best read in conjunction
with the pc-unix/software FAQ descended from that posting.

This document is maintained and periodically updated as a service to the net by
Eric S.  Raymond <esr@snark.thyrsus.com>, who began it for the very best
self-interested reason that he was in the market and didn't believe in plonking
down several grand without doing his homework first (no, I don't get paid for
this, though I have had a bunch of free software and hardware dumped on me as a
result of it!).  Corrections, updates, and all pertinent information are
welcomed at that address.

This posting is periodically broadcast to the USENET group comp.unix.sysv386
and to a list of vendor addresses.  If you are a vendor representative, please
check to make sure the information on your company is current and correct.  If
it is not, please email me a correction ASAP.  If you are a knowledgeable user
of any of these products, please send me a precis of your experiences for the
improvement of future issues.

The bug descriptions often include indications of fixes by the various porting
houses to their current releases.  These are:

Consensys UNIX Version 1.3			abbreviated as "Cons" below
Dell UNIX Issue 2.2				abbreviated as "Dell" below
Esix Revision A					abbreviated as "Esix" below
Micro Station Technology SVr4 UNIX		abbreviated as "MST" below
Microport System V Release 4.0 version 4	abbreviated as "uPort" below
UHC Version 3.6					abbreviated as "UHC" below
SCO Open DeskTop 1.1				abbreviated as "SCO" below

II. General Bugs

1. UNIX kernel must lie below the 1024-cylinder mark
   Bela Lubkin says "SCO's boot filesystem must lie below 1024 cylinder mark;
anything else can be anywhere.  This is more-or-less a limitation of the BIOS
interface that the bootstrap loader must use.  Could be circumvented by going
directly to controller hardware in the bootstrap loader, but that would be
horrendously complex with all the controllers & host adapters to be supported."
   Actually this is not quite right.  It's the *kernel* that must lie below
the 1K-cylinder mark; the rest of the root partition could extend above it.
But since partition endpoints are the only way to control where physical
blocks get allocated, it comes to the same thing.
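
   The BIOS limit comes from the register layout of the classic INT 13h
disk calls, which leave only 10 bits for the cylinder number.  Here is a
minimal C sketch of that packing.  The function name int13_pack_chs is made
up for illustration; it is not taken from any vendor's bootstrap loader.

```c
/* Sketch only: pack a cylinder/sector pair the way the classic INT 13h
   read call (AH=02h) expects it in CH and CL.  The function name is
   hypothetical; it is not taken from any vendor's bootstrap code. */

#define INT13_MAX_CYLINDER 1023   /* 10 bits: 8 in CH, 2 in CL */
#define INT13_MAX_SECTOR   63     /* 6 bits in CL, numbered from 1 */

/* Returns 0 on success, -1 if the address cannot be expressed. */
int
int13_pack_chs (unsigned cylinder, unsigned sector,
                unsigned char *ch, unsigned char *cl)
{
  if (cylinder > INT13_MAX_CYLINDER)
    return -1;                    /* above the 1024-cylinder mark */
  if (sector < 1 || sector > INT13_MAX_SECTOR)
    return -1;

  *ch = (unsigned char) (cylinder & 0xff);
  *cl = (unsigned char) ((((cylinder >> 8) & 0x03) << 6) | sector);
  return 0;
}
```

Cylinder 1023 packs cleanly; cylinder 1024 is rejected, which is why a loader
that relies on the BIOS cannot reach a kernel stored above that mark.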

2. Suid programs dump core when signalled
   Mark Snitily of SGCS says that under many SVr4s, signalling a
process that is running suid root will cause it to core-dump.  He says
Dell and MST have fixed this, and SCO doesn't suffer from this.

3. DMAs on large ISA machines may fail
   On ISA machines with more than 16MB of RAM, SVr4 may try to do DMA
from outside the bus's address space, causing serious problems.  UNIX ought
to do an in-memory copy to within the low 16MB but the USL base code doesn't.
   Dell says they've fixed this, and that's been confirmed by a user.
   UHC says they've fixed this; they add that the special buffer-allocation
logic to handle the problem can be turned off with a tunable kernel parameter
if you've got less than 16M.
   Microport says they've fixed this in their new 4.1 release, shipping early
March.
   Esix offers a patch to correct this problem.
   SCO used to have a similar bug but fixed it long ago.
   John Sully <jms@mport.com> writes: "This was due to a bug in pre version 4
dma code.  The USL code has always at least attempted to do a copy from low
memory to high memory on systems with more than 16Mb of RAM.  By the way UHC is
wrong; the buffer allocation code only comes into play if you have more than
16Mb of memory.  You can turn it off if you have a machine (ie. an EISA bus)
which will allow you to do DMA above 16Mb.  You *must* have this tunable
(MAXDMAPAGE) turned on if you are using *ISA* bus masters in a system with more
than 16Mb of ram.  Unfortunately doing this will affect all drivers which do
dma as there is no good way to do this on a per-driver basis."
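
   The 16MB figure follows from the ISA bus having 24 address lines.  The
check a driver needs before handing a buffer to an ISA bus master can be
sketched as follows.  This is an illustration under that assumption only;
the names are hypothetical and this is not the USL or any vendor's code.

```c
/* Sketch only: decide whether a transfer must be copied through a
   buffer in low memory before an ISA device can DMA it.  Names are
   hypothetical; this is not the USL or vendor kernel code. */

#define ISA_DMA_LIMIT 0x1000000UL   /* 16MB: ISA has 24 address lines */

/* Returns 1 if any byte of [phys, phys+len) lies at or above 16MB. */
int
isa_dma_needs_bounce (unsigned long phys, unsigned long len)
{
  if (len == 0)
    return 0;
  if (phys >= ISA_DMA_LIMIT)
    return 1;
  return len > ISA_DMA_LIMIT - phys;
}
```

A transfer that ends exactly at the 16MB boundary is still safe; one byte
more and the data has to be staged through low memory first.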

4. There is a cylinder limit on disk size
   Stock USL code is limited to 1,024 cylinders per Winchester, which
might cause problems with some disk drives.
   Microport, Dell, Esix, MST, and UHC have fixed this.

5. more(1) doesn't handle SIGWINCH
   It doesn't get its window size from the stty/termio structures, so it
doesn't cope with SIGWINCH properly.

6. X performance problem
   Stock X11R4 and R5 (at least prior to 1.2E) are said to hog the
processor if you use the LOCALCONNECT option.  Jan Brittenson
<bson@gnu.ai.mit.edu> posted the following workaround:

   I don't know what causes the standard X server to hog the CPU, but
it can be avoided. Use the following program instead of xinit. Compile
it with `$CC -O -o xserv xserv.c -lX11' where CC is either
/usr/ccs/bin/cc or gcc. Set DISPLAY and XINITRC and run `xserv' from
your home directory. This is just a q&d hack, and not really a
substitute for xinit -- but it works.

/* xserv.c -- start X server

   Start X server. Similar to xinit, but intended to
   circumvent the X386 CPU Hog Mode

   Jan Brittenson, June 2 1992  05:15 am
   with corrections by Adam Donnison <adam@shinto.saki.com.au> Tue, 2 Mar 1993
*/

#include <stdio.h>
#include <sys/types.h>
#include <signal.h>
#include <setjmp.h>
#include <unistd.h>
#include <libgen.h>

#include <X11/Xlib.h>
#include <X11/Xos.h>
#include <X11/Xmu/SysUtil.h>


extern int errno;

/* This may need to be "/usr/X386/bin/X386" */
#define DEFAULT_XPATH "/usr/bin/X11/X"

/* Start X server. Fork-exec server, passing the DISPLAY environment
   variable. Wait for server to get up and running (at which point it
   passes back a SIGUSR1), at which point the user xinitrc file is run. */

#define XINITRC ".xinitrc"
#define DEFAULT_XCOMMAND "xterm -g +1+1 -n login -display :0"

extern void *malloc (), free ();
extern char *basename (), *getenv (), *strcpy ();

/* X stuff */
Display *top_display;


/* This is supposed to be in libgen.a...  Not declared static, because
   basename() is already declared extern above and in <libgen.h>. */
char
*basename (s0)
  char *s0;
{
  register char *s1;

  for (s1 = s0 + strlen (s0) - 1;
       s1 > s0 && *s1 != '/'; s1--);

  if (*s1 == '/')
    return s1+1;

  return s1;
}

jmp_buf sigusr1_frame;

static void
caught_sigusr1 (int dummy) { longjmp (sigusr1_frame, !0); }


static char
*dispname (s0)
  char *s0;
{
  register char *s1;

  for (s1 = s0 + strlen (s0) - 1;
       s1 > s0 && *s1 != ':'; s1--);

  return s1;
}


/* No arguments */
int
main (argc, argv)
int argc;
char **argv;
{
  char *xserver_file, *xinitrc_file, *home_path, *display, *display_X_arg;
  int xserver_pid, orgmask;
  
  
  /* Not that it really matters, just to avoid being used as a direct
     replacement for xinit. */
  
  if (argc != 1)
    {
      fprintf (stderr, "usage: %s\n", basename (*argv));
      exit (1);
    }
  
  
  /* Resolve xinitrc path. This is done before the server is
     started. */
  
  if (!(home_path = getenv ("HOME")))
    home_path = "/etc";
  
  if (!(xinitrc_file = getenv ("XINITRC")))
    {
      xinitrc_file = malloc (strlen (home_path) + 1 + strlen (XINITRC) + 1);
      sprintf (xinitrc_file, "%s/%s", home_path, XINITRC);
    }
  else
    xinitrc_file = strdup (xinitrc_file);


  /* Resolve display */
  if (!(display = getenv ("DISPLAY")))
    display = display_X_arg = ":0.0";
  else
    display_X_arg = dispname (display);


  /* Tell server to notify us when up and running */
  signal (SIGUSR1, SIG_IGN);
  orgmask = sigblock (sigmask (SIGUSR1));

  /* Start server */
  if (!(xserver_pid = vfork ()))
    {
      xserver_file = DEFAULT_XPATH;
      
      execl (xserver_file, xserver_file, display_X_arg, NULL);

      fprintf (stderr, "%s: can't exec %s (errno = %d) -- start-up aborted\n",
               basename (*argv), xserver_file, errno);
      exit (1);
    }

  if (xserver_pid < 0)
    {
      fprintf (stderr, "%s: can't fork (errno = %d) -- start-up aborted\n",
               basename (*argv), errno);
      
      exit (1);
    }
  
  /* Await signal from server */
#if 0
  /* Why the #@$*! doesn't this work?! */
  sigsetmask (orgmask);
  alarm (20);
  sigpause (sigmask (SIGUSR1) | sigmask (SIGALRM));
#else
  sleep (5);
#endif

  /* Open display */
  if (!(top_display = XOpenDisplay (display)))
    {
      fprintf (stderr, "%s: unable to open display '%s' -- start-up aborted\n",
               basename (*argv), display);
      exit (1);
    }
  
  /* Execute xinitrc file */
  if (system (xinitrc_file) < 0)
    system (DEFAULT_XCOMMAND);
      
  /* Close display */
  XCloseDisplay (top_display);

  /* Terminate server */
  kill (xserver_pid, SIGTERM);

  /* Finished */
  free (xinitrc_file);
  return 0;
}

7. C shell background process termination logs you out
   In C shell, unless "ignoreeof" is set, termination of a background
process will log you out.  With "ignoreeof" set, just the message
"Use logout to exit" will be printed.

8. A security hole in login
   David Wexelblat <dwex@mtgzfs3.att.com> reports: "There is a HUGE security
hole in /bin/login in all USL derived SVR4s before 4.0.4.  Refer to CERT
advisory CA-91:08, dated 5/23/91.  This is known to be present in AT&T SVR4
2.1, and Microport SVR4 3.1.  ESIX claims to have fixed it, Microport reports
that it is fixed in 4.1.  I won't give any more details unless necessary.
Suffice to say that this bug allows any non-privileged user on an SVR4 system
to get read-write access to any file on the system."

9. COFF problems with long filenames
   A source at Dell urges: "Our SVR4v2 did some stuff that USL didn't get
around to until SVR4v4.  Try Dell UNIX 2.1 with a COFF program on a large UFS
filesystem in a directory with long names.  Runs on Dell UNIX.  Breaks on
others."  I don't have more definite info yet.

10. Flakeouts in the Wangtek device driver
   Dell reports that USL's Wangtek device driver is seriously flaky.  "How'd
you like a multi volume backup where the second and subsequent volumes don't
follow on from the previous volumes?"  UHC confirms this and is actively
working on the problem.
   An anonymous SCOer says "The QIC02 tape controller `standard' is seriously
flaky.  Our driver's in pretty good shape but nobody will ever have a truly
solid driver that supports every QIC02 controller you can find."
   Gordon Ross <gwr@mc.com> reports: "Actually, the SCSI tape target driver
`st01' has a similar problem at version 4.0.3 which I corrected while I worked
on the SVR4 code.  The correction was provided to the support group at USL.
The actual problem was that the SCSI tape would return a `check status'
completion code which was just trying to inform the driver of the arrival
of the `logical end of media' indication but the driver was treating it
as an error.  The tape drive had in fact written the data, but the driver
incorrectly assumed that the "check status" return meant that it failed.
The result of this is that when you write into the end of the tape, you
can read back one more "chunk" than you wrote.  Of course, cpio does not
like this at all when doing multi-volume backups..."

11. A kernel declaration bug
    A botch in USL's /etc/conf/pack.d/kernel/space.c (which is present in
Consensys 1.3, Dell 2.1, Esix 4.0.3A, Microport 4.0.3 and 4.0.4 and may also be
present in other SVr4s) can step on the linesw[] table.  The problem is that
the domain name array initialization is wrong and too short; thus, when it's
set, data past the end of the array can be stomped.  To fix this, find the
following near line 247:

	char srpc_domain[] = SRPC_DOMAIN;

and change it to

	char srpc_domain[SYS_NMLN] = SRPC_DOMAIN;

then rebuild the kernel.
   Microport officially knows about this bug and plans to fix it in a
near-future update release.  It has been fixed in Dell 2.2.
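
   The reason the one-line change matters: an array declared with empty
brackets is sized to fit its initializer and nothing more, so a longer
domain name stored later runs off the end.  A small C sketch follows.  The
values given to SYS_NMLN and SRPC_DOMAIN here are assumptions made for the
illustration; check <sys/utsname.h> and space.c on your own system.

```c
#include <stddef.h>

/* Sketch only.  Both macro values below are assumptions made for this
   illustration; check <sys/utsname.h> and space.c on your own system. */
#define SYS_NMLN    257           /* assumed value */
#define SRPC_DOMAIN ""            /* assumed default: empty string */

char unsized_domain[] = SRPC_DOMAIN;          /* as shipped */
char sized_domain[SYS_NMLN] = SRPC_DOMAIN;    /* after the fix */

/* Bytes available for a domain name, terminator included. */
size_t unsized_room (void) { return sizeof unsized_domain; }
size_t sized_room (void)   { return sizeof sized_domain; }
```

With an empty initializer the unsized array has room for the terminator
only, so any non-empty domain name written into it lands on whatever the
linker placed next.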

12. Reading tar archives with cpio foos up on multiply-linked files
    Paul De Bra <debra@info.win.tue.nl> reports the following:
    In theory, cpio(1) is supposed to be able to read tar(1) archives.  In
practice...don't try it.  Multiply-linked files will be extracted from the
archive, whether or not they match the current pattern and whether or not
you have selected 'u'.  This happens even if you use the `t' option, so
it's not even safe to list the archive files!

13. Process accounting is broken
    In 4.0.3, process accounting doesn't work.  From examining the accounting
scripts, it appears that /usr/lib/acct/accton is supposed to set a return code
depending on whether accounting was switched on already or not.  However, it
always returns the same result - accounting switched off.  This means that the
/usr/lib/acct/ckpacct script, which is run every hour to keep the process
accounting log in check, instead turns off accounting the first time it is run
after booting.  The same happens with the nightly /usr/lib/acct/monacct
program.
   I don't yet know whether this bug is present in 4.0.4.  It is definitely
un-fixed in Dell 2.1 and Consensys 1.3.  In Dell 2.2 the return bug is fixed,
but accounting isn't automatically enabled at boot time.

14. tar(1) foos up in the presence of symbolic links
    Tar can get the names of symbolic links wrong when creating an archive.
This bug can be demonstrated by doing the following:

   mkdir t
   cd t
   touch a 1234567890
   ln -s 1234567890 b
   ln -s a c
   tar vcf ../t.tar .

   The output generated by tar is:

   a ./ 0 tape blocks
   a ./a 0 tape blocks
   a ./1234567890 0 tape blocks
   a ./b symbolic link to 1234567890
   a ./c symbolic link to a234567890

(Note that the above commands must be run in the order shown and in a new
directory.)  This bug is nasty.  Recommended solution: use GNU tar.
   This is reported from Esix 4.0.3 and Consensys 1.3, but probably exists on
other SVr4s as well.  It has been fixed in Dell 2.2.

15. Symbolic links can interfere with shellscript execution
   There is a problem running #! scripts when symbolic links are involved.
Typing in the following from a command shell demonstrates the problem:

   mkdir a b
   ln -s a c
   cd a
   cat > script <<!
   #!/bin/sh
   echo Hello
   !
   chmod 755 script
   cd ../b
   ln -s ../c/script .
   ./script

The message generated from the last line is:

     a/script: a/script: cannot open

   This is reported from Esix 4.0.3, Consensys 1.3, and Dell 2.2, but
probably exists on other SVr4s as well.

16. Piping a csh builtin causes the shell to hang.
   While running csh, this can be demonstrated by some of the following:

   echo Hello | cat
   history | more

(One solution is to use tcsh-6.02.)
   This is reported from Esix 4.0.3 and Consensys 1.3, but probably exists on
other SVr4s as well.  It is reported fixed in Dell 2.2.

17. tar(1) fails to restore adjacent symbolic links properly
  Arthur Krewatt <...!rutgers!mcdhup!kilowatt!krewat> reports:
  SVR4 tar has another strange bug.  If, when restoring files, you restore
one file that is a link, say "a ->/a/b/c/d/e", and there is another link just
after it called "b ->/a/b/c", tar will restore it as "b ->/a/b/c/d/e".
This looks like a missing NUL at the end of the string, as if someone
did a memmove or memcpy(dest,src,strlen(src)); where it should be
strlen(src)+1 to include the NUL.
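
  The suspected mistake is easy to reproduce with a reused buffer.  The C
sketch below is a guess at the shape of the bug based on the report above,
not the actual SVR4 tar source; both function names are hypothetical.

```c
#include <string.h>

/* Sketch only: reproduce the symptom with a reused buffer.  This is a
   guess at the shape of the bug, not the actual SVR4 tar source. */

/* Copies src without its terminating NUL, as the buggy code is
   suspected to do.  Whatever dest held beyond that point survives. */
void
copy_link_buggy (char *dest, const char *src)
{
  memcpy (dest, src, strlen (src));
}

/* Copies src including the terminating NUL. */
void
copy_link_fixed (char *dest, const char *src)
{
  memcpy (dest, src, strlen (src) + 1);
}
```

If the buffer still holds "/a/b/c/d/e" from the previous link, the buggy
copy of "/a/b/c" leaves "/a/b/c/d/e" behind, which matches the symptom.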

18. COFF binaries linked with curses(3) and shared libc hang
   ...eating the CPU.  Cause unknown.

19. shl hangs, sxt devices bad
   shl(1) does not work.  Try creating a layer and doing an 'ls'.  Your session
hangs.  Bruce Momjian <root%candle.uucp@bts.com>, who reported this bug, says
he believes it is the sxt devices which are broken.  It definitely exists in
Consensys 1.3.

20. num-lock prevents mouse from working properly
   When using the Motif window manager, if your num lock is on, your mouse
clicks are not recognized by the window manager.  The mouse still works in
xterm(1).  This is allegedly fixed in Destiny (4.2).

21. adjtime() doesn't work
  Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 adjtime() doesn't.
Calling `date -a' works to adjust the time slowly.

23. cron mail doesn't go through aliasing
  Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 cron mail to adm
doesn't get redirected by the aliases file.

24. fragility in xterm
  Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6, doing ~! from
a cu in xterm kills xterm.  This has been fixed in Dell 2.2.

25. csh lossage due to bad optimization
  If a csh user sources a non-existent file in their .cshrc (eg, source .alias,
where .alias doesn't exist), then the system will hang for a couple of minutes.
Eventually the user gets an "Out of memory" error and the console logs "NOTICE:
out of swap space - Insufficient memory to allocate 2 pages - system call
failed".
  This appears to be due to over-optimization of code surrounding a longjmp
call.
  (There are numerous other reports of memory leak bugs in csh).

26. Bug in cp(1)
   If ``copy'' encounters a directory before a file, it dumps core ...

--- cut ---
cd /tmp
mkdir copybug jnk
cd jnk
mkdir directory
>file
cp -r * /tmp/copybug
--- cut ---

This was reported from Consensys 4.0.3 but is probably a generic SVr4 bug.
It appears to have been fixed in ESIX SVR4.0.3A.

27. tbl -me doesn't work
   Wolfgang Denk reports that trying to use "tbl -me" for any input file causes
tbl to quit.  The problem is that newer tbl versions don't accept [nt]roff
control lines (".rm @W") after .TS.

28. who -r fragility leads to boot-time problems
  It coredumps if the name of the timezone is longer than three characters.
This can be a real problem for European sites...  and is potentially more
hazardous than immediately apparent as _a lot_ of the initialization scripts
(rc1.d, rc2.d) use ``who -r'' to see if the machine is in single- or multi-user
mode.  And when ``who'' bombs out, the ``set'' command is given an empty
command-line and can't do much else than print the shell variables, $1-$9
remain empty ... meaning that more or less all the scripts fail in various ways
and the system has an exceptionally hard time coming up.
   Peter Wemm <peter@DIALix.oz.au> reports that this bug was present in Dell
2.0, fixed in Dell 2.1, but reappeared in Dell 2.2.  Dell says it's a generic
USL bug.

29. at(1) breaks here-documents in shell scripts
   at adds gratuitous empty lines to the job submitted by the user.
This prevents shell here-documents from working.

30. UHC mouse driver ignores the middle button
   This may be a generic USL problem, but Dell (at least) has fixed it.  UHC
says they have a patch for it, but I haven't seen the patch.

31. mmap access doesn't update file mod times
   Peter Wemm <peter@DIALix.oz.au> reports that under SVr4, if one mmap()'s a
file, and writes to it via the mapped memory, when the disk is updated, the
modification time does not update.

32. AT&T select(2) is incompatible with BSD select(2)

James Buster <bitbug@lynx.com> reports:

The select() system call waits for read, write, or exception activity
on a set of file descriptors, and yields an integer telling you how
much activity it found.

BSD's select(N,&R,&W,&E,&T) can yield up to 3*N, because BSD's select()
counts the number of bits that it turns on in the R, W, and E
arguments, and R, W, and E each contain one bit per file descriptor.
However, System V Release 4 v2.1's select(N,&R,&W,&E,&T) yields at most N,
because SVR4's select() just counts the number of active file
descriptors, regardless of how many bits it turns on.

For example, the following code checks file descriptor 0.  In BSD, this
code can set n to 2 if file descriptor 0 is ready for both reading and
writing.  However, in SVR4, this code sets n to at most 1, because only
file descriptor 0 is active.

    int n;
    fd_set r, w;
    FD_ZERO(&r);  FD_SET(0, &r);
    FD_ZERO(&w);  FD_SET(0, &w);
    n = select(1, &r, &w, (fd_set*)0, (struct timeval*)0);

At least one widely used piece of software depends on the BSD
behavior, namely X11R5 (see Xt/NextEvent.c).  In this application, the
bug's symptoms are subtle and are rarely encountered, but they do
exist.
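
Code that has to run on both flavors can avoid trusting the return value
for anything beyond "greater than zero" and count from the returned sets
itself.  Here is a minimal sketch of the two counting rules described
above; the helper names are hypothetical, and <sys/select.h> is assumed
to exist (on older systems the macros come from <sys/types.h>).

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

/* Sketch only: helper names are hypothetical. */

/* 1 if fd is set in a non-null set, else 0. */
static int
isset (int fd, fd_set *set)
{
  return set != 0 && FD_ISSET (fd, set);
}

/* BSD-style count: one for every bit left set in any of the sets. */
int
count_ready_bits (int nfds, fd_set *r, fd_set *w, fd_set *e)
{
  int fd, n = 0;

  for (fd = 0; fd < nfds; fd++)
    n += isset (fd, r) + isset (fd, w) + isset (fd, e);
  return n;
}

/* SVR4-style count: one for every descriptor with any bit set. */
int
count_ready_fds (int nfds, fd_set *r, fd_set *w, fd_set *e)
{
  int fd, n = 0;

  for (fd = 0; fd < nfds; fd++)
    if (isset (fd, r) || isset (fd, w) || isset (fd, e))
      n++;
  return n;
}
```

Call either helper on the sets after select() returns a positive value;
the answer is then the same no matter which select() the system provides.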

33. (4.2) The login program requires its PPID to be 1
   Rick Richardson reports: "The "/bin/login" program has been changed to be
hardwired to require its PPID to be "1".  In all other versions of UNIX, it is
sufficient that there be an /etc/utmp entry.  This bug was reported to USL, and
I did get a fixed "login" program from them, but the fix did not make it into
the release.  I don't know how mere mortals get the fix at this point."

34. (4.2) Bad MAXMINOR values can make the system unbootable
    Rick Richardson reports: "If MAXMINOR is stune'ed to the maximum value,
0x3fff (18 bits), then the kernel will refuse to boot, cycling up to driver
initialization and then doing a processor reset.  Interestingly, this bug was
not in the beta release, but was in the final release."

35. (4.2) Incompatible change in TZ interpretation
   Rick Richardson reports: "While not really a bug, this is a surprise.  In
4.2, the TZ variable was given a new meaning.  Rather than the traditional
CST6CDT type of value, it now looks like ":US/Central".  This causes 3.2 and
4.0 binaries which use the date/time routines to report GMT time.  I have no
idea why another variable name was not chosen.  I've taken to aliasing the
binaries, e.g. "TZ=CST6CDT svr4binary"."

36. Nulls in pixmaps can crash X
   Rick Richardson reports: "Displaying XPM2 pixmaps which have NULLS in them
will crash the X server.  Admittedly, this is not much of a bug, since these
are ill-formed or corrupted pixmaps.  But the server should stay up, even in
these conditions.  A little error checking needed."

III. Serial-port and tty administration problems

1. Dropout problems with tty devices
   The most serious problem anyone has reported is that the USL asy driver is
flaky and occasionally drops characters at above 4800 baud.
   Microport, Dell, Esix, and UHC say that they believe they've fixed this.
However, Dell, at least, was mistaken when they first made this claim; a more
detailed description of the problem is given below.  I have been assured that
this is on the fix list for the next Dell release.
   Bela Lubkin at SCO comments "386 interrupt latency vs. unbuffered UARTs.
This is a tough problem.  Nobody's driver should drop characters with a
turned-on 16550.  It's not so easy with a 16450.  Anyone with 16450s or lower
should be able to solve their problems by dropping in a 16550."
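
   The arithmetic behind that advice can be sketched in a few lines of C.
This is a rough estimate only.  It assumes 10 bit times per character
(8N1), a one-character receive buffer on the 16450, a 16-character
receive FIFO on the 16550, and it ignores the receive shift register.
The function name is hypothetical.

```c
/* Sketch only: rough time the CPU has to service a receive interrupt
   before the UART overruns.  Assumes 10 bit times per character (8N1)
   and ignores the receive shift register. */

/* fifo_depth: 1 for a 16450, 16 for a 16550 with its FIFO enabled.
   trigger: number of buffered characters that raises the interrupt.
   Returns the budget in microseconds, or 0 for invalid arguments. */
unsigned long
rx_latency_budget_us (unsigned long baud, unsigned fifo_depth,
                      unsigned trigger)
{
  unsigned long slots;

  if (baud == 0 || trigger == 0 || trigger > fifo_depth)
    return 0;
  slots = fifo_depth - trigger + 1;
  return (10UL * 1000000UL * slots) / baud;
}
```

At 9600 baud this gives about one millisecond for an unbuffered part and
roughly nine times that for a FIFO part interrupting at 8 characters.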

2. Quick port setup option in sysadm is broken
   In 4.0.3 sysadm, the quick port setup option, which is used to add and
delete terminal ports, is seriously broken.  The script modifies /etc/conf/*
files, and has incorrect minor numbers, sets the 5th field of sdevice.d to Y
when it should be N, and is missing columns for node.d.  See
/usr/sadm/sysadm/bin/q-add.

3. ttymon drops DTR
  Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 the ttymon(1)
utility for HDB uucp drops DTR every few weeks.  The workaround is to disable
and re-enable it.
   The SVr4.2 ttymon is even more broken; it *never* raises DTR after the
first outgoing call.  Jeremy Chatfield at IF has confirmed that this is a
real bug in the USL sources and is on his urgent-fix list.

4. (4.2) Terminating cu to a direct line locks up the port
    The problem is the C2 security mechanisms.  Terminating cu with ~.
doesn't tear them down correctly.  Subsequently, another cu(1) will be
able to get at the port, but utilities which try to get at it directly (i.e.,
cat or stty) won't be.
    Rick Richardson <rick@digibd.com> adds: "The "cu" problem where ports
can't be used by stty, seyon, or other programs once "cu" has had its way
with them:  This problem apparently affects any program (cu, uucp) that uses
the DIAL(3) routines.  Those routines have been modified to use the "cs"
connection server daemon to open the port and/or dial a phone number on behalf
of the client (though you'd hardly realize this from reading the manual page).
The "cs" daemon does *something*, where *something* is not known yet, which
causes all subsequent termio type ioctl's to fail.  This bug has been reported
to USL and Univel, but no fix has been forthcoming."
    He continues: "I had our streams device driver guy put in a version of one
of our serial port drivers with debugging turned on, and he said that it looked
like the driver "close" routine was never getting called - possibly because the
device close call only happens on the last close of a device, and the
connection server has still got the port open.  This theory would seem to
indicate that "cu" and "uucp" are fine, but that the connection server is
broken.  We don't really know, though -- it's just a theory."

IV. Networking and File-Sharing Bugs

1. NFS locking is unusably slow
   Randy Terbush <randy@dsndata.dsndata.com> has posted code which
demonstrates a serious bug in the SVr4 NFS locking daemon.
   In his own words: "The symptoms are ~30% cpu usage by 'lockd' and
severe slowing of the machines on the network.  This program
demonstrates that it takes ~20 seconds to obtain locks from an ailing
'lockd'.  We have verified that this bug does not exist in HPUX 8.0x."
   Randy's code is too large to be included here.  He is, quite
rightly, exercised at USL's exceedingly slow response to this problem.
The comment in his makefile reads, in part:

# USL has admitted to the existence of this bug in version 4.0, 4.1,
# and 4.2 of their distributed and yet to be released sources.  This is
# a network crippling problem that they have refused to fix until
# release 4.3, which will be OVER 1 YEAR from today. (29 Oct 1992)
# If your version of 'lockd' exhibits this same problem, I would
# strongly urge you to contact your vendor and ask them to put some
# pressure on USL to fix this problem.  SVR4 is virtually useless in a
# network of shared resources while this problem exists.

2. UFS file system problems
   In stock USL 4.0.3, you can't use a UFS file system as the root; the system
hangs if you try.  Consensys, Dell, Esix, Microport, MST, and UHC all
appear to have fixed this.
   David Aitken, the UNIX product manager at UHC, writes "The ufs as root file
system [problem] was not really a bug, just a little oversight on USL's part -
we have fixed it completely by adding one line to the /stand/boot script:
rootfstype=ufs!"  He adds that they've been using ufs on their lab machines for
over 10 months with no trouble, and the latest UHC release defaults to ufs if
you have more than 120MB of disk.

3. Byte-order problem with NFS when accessing Sun disks
   Christoph Badura <bad@generics.ka.sub.org> notes that the stock USL resolver
library suffers from serious confusion about the byte order in the
sockaddr_in structure.  This bug is acknowledged by USL for the 4.0.4
release.  A symptom of this bug is that Sun disks will not mount correctly over
NFS. As a workaround, try removing the references to /usr/lib/resolv.so from
/etc/netconfig and rebooting your system.  Unfortunately, this will mean
you can't use nameservers.
   Alan Batie <batie@agora.rain.com> writes: "Actually, you don't have to
remove resolv.so, just put tcpip.so first and have a hosts file with the names
of hosts you want to do NFS mounts from.  This way you can use nameservers for
most things."

4. Under weird circumstances, lseek on UFS may cause corruption
  Christoph Badura <bad@generics.ka.sub.org> reports that a UFS lseek() to an
offset which is a multiple of 4096 but not a multiple of 8192, followed by a
write(), may corrupt the file being written.  The bug shows up only if the
file has no pages in the page pool associated with it at the seek offset and at
4k before the seek offset.  He has sent USL a kernel fix for this, which was
included in 4.0.4.
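
   For anyone who wants to poke at their own port, here is a rough sketch of a
test for the access pattern described above.  The function name and the idea of
a scratch file are illustrative only, not anything from Christoph's report.
Since the bug reportedly needs the file's pages to be out of the page pool, a
clean result from this sketch proves little; a serious test would have to be
run against a freshly mounted file system.

```c
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Seek to `offset' in a fresh scratch file, write a marker, seek back and
   read it again.  Returns 0 if the marker reads back intact, 1 if it
   differs, and -1 on any I/O error.  The scratch file is removed. */
int seek_write_check(const char *path, off_t offset)
{
	static const char msg[] = "MARK";
	char buf[sizeof msg];
	int fd, result = -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (lseek(fd, offset, SEEK_SET) == offset
	    && write(fd, msg, sizeof msg) == (ssize_t)sizeof msg
	    && lseek(fd, offset, SEEK_SET) == offset
	    && read(fd, buf, sizeof buf) == (ssize_t)sizeof buf)
		result = memcmp(buf, msg, sizeof msg) != 0;
	close(fd);
	unlink(path);
	return result;
}
```

Call it with an offset such as (off_t)4096 * 3, which is a multiple of 4096
but not of 8192.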

5. FTP problems
  The in.ftpd on SVR4.0.3 does not support all the commands listed in RFC 959.
When recent SCO UNIX/ODT versions ftp to SVR4.0.3, the SVR4 side will refuse,
drop the connection, and core dump after you authenticate.  This is because the
SCO end sends the 'SYST' command a la RFC 959, and the SVR4.0.3 end doesn't
recognise it.  Some ports have fixed this.
  Christoph Badura adds: "The bug is due to a longjmp(3) on a sigjmp_buf obtained
by sigsetjmp(3). ARGH. Testing led to a bug in the original BSD sources, which
is still present in the NET/2 ftpd."

6. A bug in the WD80x3 support
   MST reports a serious bug in the SVr4 kernel support for this card.  Here's
how to reproduce it:

	server: init 3 and share (export) /usr for example.

	client: mount -F nfs server:/usr /mnt
		cd /mnt
		find . -print | cpio -ocBuv > /dev/null

	what happens:
		server and client will "hang" together.

	"cue":
		hit keys on server and/or client, hang will go away
		for 10-20 seconds temporarily.  Yanking the BNC connectors
		does the same trick.

   They say they've heard from customers that this happens on Dell and UHC as
well as USL 4.0.4.  PCNFS/BWNFS network xcopy suffers from this as well.  The
client can be a Sun Sparc for that matter.

7. Security hole near fingerd
   Jerry Whelan <guru@stasi.bradley.edu> reports:
	We encountered a cute security hole in AT&T SVR4 2.1 (which I believe
translates to USL 4.0.2).  It apparently was fixed in AT&T SVR4 3.0.  The
hole related to the finger daemon.  If a user set his .plan to a symbolic
link pointing to a protected file (such as /etc/shadow, or somebody's
mail file) then fingering the user would cause the finger daemon to read
that file and display it.
	I don't know if the bug exists in any other vendor's versions of 4.0.2.
	We replaced our fingerd with gnu finger, only to find the same problem.
I sent the changes back to the gnu finger developer, but I don't think a
newer fixed version has been officially released yet.

V. SCSI Support Problems

1. sar is confused by SCSI
   Sar -d doesn't work on SCSI drives.  Dell fixed this in 2.1 and it's
reported to work OK in Esix 4.0.3A; no report of any other SVr4 having fixed
this yet.  SCO fixed it in 3.2.4.

2. A configuration problem
   Stock USL requires you to jumper your SCSI devices to fixed IDs
during installation (it can be changed to any other ID after).  
   Dell says they've fixed this.  The requirement is definitely still present
in Esix and Consensys 1.3.  UHC thinks they've fixed this, but their 4.0.3.6
release still seems to demand ID 1 to install. 

3. Synchronous SCSI hang problem
   David Wexelblat <dwex@mtgzfs3.att.com> reports: "Stock SVR4.0.3 will hang
the SCSI bus with a 1542 in synchronous mode.  Dell fixed this, and this has
been given to Microport [ed note: Microport 4.0.4 and Consensys 4.0.3 have
fixed the problem; MST UNIX and Esix 4.0.3 still have this problem; I have not
yet been able to determine if ESIX 4.0.4 does].  In the file /sbin/bcheckrc,
change the line:

	echo MARK > /dev/rswap

to
		
	echo MARK | dd of=/dev/rswap bs=512 conv=sync > /dev/null 2>&1

The magic is apparently the conv=sync, which forces a 512 byte block
to be written.  The original echo writes 4 bytes, which apparently causes
synchronous SCSI to go out to lunch.

Now, you ask, how can I fix this, since the system won't boot?  There are
a couple of methods.  First, if possible, disable synchronous negotiation
(1542 jumper J5-1 removed, plus whatever you may need to do to your drive).
Then boot up, edit /sbin/bcheckrc, then shutdown, restrap for synchronous,
then reboot.  Everything should be OK.

That's the easy way.  Unfortunately, some hard drives will only work
in synchronous mode.  Well, you can still recover from this phenomenon.
Here's how:

        1) Install on your hard drive
        2) Boot from the first boot floppy.  When it tells you to, insert
           the second boot floppy.  At the first prompt, hit <DEL> to
           break out to a shell.
        3) Mount your hard drive under /mnt with the following command
           (replace FS-TYPE with s5, s52, or ufs, whichever you used for
           your root partition):

                /etc/fs/FS-TYPE/mount /dev/dsk/c0t0d0s1 /mnt

        4) Now edit /mnt/sbin/bcheckrc:

                ed /mnt/sbin/bcheckrc

           You may want the 'ed' man page handy (I barely remember how to
           use 'ed' :->).  For simplicity, you can delete/comment out
           the offending line, then replace it with the correct line later.
        5) Unmount the hard drive:

                umount /mnt

        6) Reboot from the hard drive.  Everything should come up OK, and
           you can finish editing /sbin/bcheckrc, if necessary.

Note that you perform these actions at your own risk.  The first version was
performed by me on Microport SVR4, and the second was performed by someone
else (on my suggestion) on ESIX SVR4."
   This problem appears to be fixed on Consensys 1.3 and Dell 2.1; also
(pace David's remark) in ESIX 4.0.4, which has

	echo MARK | /sbin/dd.arch conv=sync > /dev/rswap 2> /dev/null

4. ps chokes on commands that do SCSI I/O
  Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6, ps
doesn't work when a SCSI command is in progress.  It stops printing at the
process executing the SCSI command.
  This is still broken in Dell 2.2.

5. Transfer speed problems with Adaptec 1542B on 486s
  If a system mount or install fails, try setting the DMA speed to 5MB/s,
rather than the default 5.7MB/s.  This is accomplished by removing the jumper
shorting the 12th pin pair of jumper block 5.

VI. Development Tools Problems

1. General UCB library brokenness
   The BSD compatibility libraries were badly broken in USL code.  A Dell
source adds "That meant that almost all the apps derived from them were broken
too.  Most stuff like automount will die when you send a SIGHUP, instead of
rereading the map file.  You can get a system into very strange states when
that happens."
   John Sully <jms@mport> of Microport opines: "This is a bug in automount
itself rather than BSD compatibility, since the automount which comes with SVR4
is not compiled with the BSD libraries.  (isn't this comforting??  :-()."

   Peter Wemm <peter@DIALix.oz.au> reports "There is a very simple and reliable
cure for this sort of thing: Using your favourite hex editor, change all
instances of "signal" in the binary file to "sigset".  Most BSD code assumes
that signal() auto-rearms after handling a signal.  On SVR4, signal() does not,
but sigset() is argument compatible, and has BSD semantics."

   Esix and UHC's BSD libraries are USL stock.  I don't yet know
the status of other ports.  Microport has run into things they think may be
symptoms of this but have no fix yet.

   John Sully <jms@mport> of Microport counters with: "One common thread I find
on reading of these problems is that the BSD compatibility libraries are
*misused*. [...] The problem is that BSD and SYSV have similarly named .h files
which sometimes contain different definitions for objects with the same name.
This has been known to cause all sorts of problems because the SYSV headers are
picked up and then the calls are satisfied from the BSD library rather than the
shared object library.  I have found that if you use /usr/ucb/cc that the BSD
compatibility is much less broken than it would seem at first because it
ensures that the correct headers are picked up."

   However, note that there is at least one *real* bug known --- as of 4.0.4
the signal emulation cannot explicitly set a handler to SIG_DFL or SIG_IGN.

   Ron Guilmette <rfg@ncd.com> writes "[Library lossage] may be easily
demonstrated by attempting to build and link the GNU C compiler with
`-L/usr/ucblib -lucb'.  The resulting compiler will most certainly
crash and die."  John Sully thinks this is because the /usr/ucb/cc
compiler should have been used, but wasn't.

2. USL emulation of BSD signals doesn't work
   A different source reports that the USL implementation of BSD signals
is broken in both 4.0.3 and 4.0.4; in particular, the sigvec() family doesn't
work properly.  It is possible to make minor tweaks to source to make such apps
work properly with the native USL signals implementation.

   Here's more on the signals problem, thanks to Richard <rc@siesoft.co.uk>:
------------------------------------------------------------------------------
The problem is to do with the signal() function that is within the BSD
compatibility libc.

To reproduce the problem do the following:

#include <stdio.h>
#include <sys/types.h>
#include <signal.h>
#include <sys/siginfo.h>

main()
{
	signal(SIGPIPE,SIG_IGN);
	pause();
}

and compile it with cc xx.c -o xx /usr/ucblib/libucb.a

(John Sully observes that this is definitely wrong; /usr/ucb/cc should have
been used rather than "cc ... -L/usr/ucblib -lucb" or the equivalent "cc ...
/usr/ucblib/libucb.a".)

If you run the program and then signal it with a SIGPIPE, the program
will die, even though you've told it to ignore SIGPIPE.

The fix is difficult unless you've got source because there's a missing 'else'
clause from the signal() code. This is the only signal fault I've found in
the BSD signal functions, details of the rumoured sigvec problem would be
useful?

If you're trying to compile an application, you could change the application
code to do the following, which does work:

void
catch(s)
int	s;
{
	/* DO NOTHING */
	;
}

main()
{
	signal(SIGPIPE,catch);
	pause();
}

SUMMARY
You can only change a signal handler to a function handler, any number of
times.  Any attempt to set the handler to SIG_DFL, or SIG_IGN will fail.

This bug has given some people working with X11R5 aggro, causing the X server
to die when you close a client. 

  Christoph Badura <bad@flatlin.ka.sub.org> confirms this bug.
He has sent USL a source fix.  It appears already to have been fixed in Dell
2.2.
------------------------------------------------------------------------------

3. Possible string library problems
   There are also persistent rumors of problems in the BSD-emulation string
libraries.  I have not been able to pin down specifics on this.

4. USL's ndbm support is broken.
   Christoph Badura <bad@generics.ka.sub.org> reports "The ndbm functions in
the ucb library are broken [apparently due to a compiler or optimizer bug in cc
-- ed.].  Try making the whatis data base for /usr/share/man with Tom
Christiansen's perl rewrite of man."
   The easiest way to fix this is to compile GNU's replacement ndbm.c with gcc
-fpcc-struct-return -traditional (gcc1.40 or 2.2 will do nicely) and install it
in your C library.  Source is available for FTP from prep.ai.mit.edu.

5. An include file is missing
   Both 4.0.3 and 4.0.4 USL versions are missing the documented dial.h
file from their /usr/include directory.  Dell 2.1 has it.

6. sscanf(3) has a potential bug
   Anthony Shipman <als@bohra.cpg.oz.au> reports: "I found the following bug
in SCO Unix 3.2.* and I think it may be common to many AT&T derived Unixes.

sscanf() calls _doscan() to read from a pretend file.  The file
uses the string as a buffer and a fake file descriptor of 60 (=_NFILE).  
Since _NFILE (for SCO UNIX) is 60 it assumes that fd 60 can never be open.

Then when sscanf() hits the end of the string it calls _filbuf() to read
into the buffer (which is the string) from fd 60.  This should fail with
an errno=9 and then _filbuf() sets EOF and it all terminates.

However in SCO Unix you can reconfigure the kernel to increase the number
of files per process to a recommended maximum of 150.  If you do this then
your program might have fd 60 open one day.  Then sscanf() will read from this
file overwriting your string.  The byte count to the read() in _filbuf() 
is some undefined but large value so a lot of memory will be overwritten.  In
my case the string was on the stack so my stack was wiped.

In short, if you configure your kernel to have NOFILES > _NFILE, i.e. more than
the default then sscanf() is a time bomb in your code."

7. shmat(2) vs. vfork(2)
   The shmat(2) call is known to interact badly with vfork(2).  Specifically,
if you attach a shared-memory segment, vfork(), and then the child releases
the segment, the parent loses it too!  Workaround: use fork(2).
   UHC and Microport both suspect that they still have this bug and opine that
anyone who uses vfork deserves to lose.  Dell has no plans to fix it.
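
   As a sketch of the fork(2) workaround (the function name is made up, and
this has not been run on any of the ports discussed here): the child gets its
own address space, so its shmdt(2) leaves the parent's attachment alone.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach a private segment, fork() a child that detaches it, then check
   that the parent can still read what it stored there.  Returns 1 on
   success and -1 on a setup failure.  If the parent really had lost the
   segment, the strcmp() below would fault instead of returning. */
int parent_keeps_segment(void)
{
	int id, status, ok;
	char *seg;
	pid_t pid;

	id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (id < 0)
		return -1;
	seg = (char *)shmat(id, NULL, 0);
	if (seg == (char *)-1) {
		shmctl(id, IPC_RMID, NULL);
		return -1;
	}
	strcpy(seg, "still here");

	pid = fork();		/* fork(), not vfork() */
	if (pid < 0) {
		shmdt(seg);
		shmctl(id, IPC_RMID, NULL);
		return -1;
	}
	if (pid == 0) {
		shmdt(seg);	/* detaches the child's mapping only */
		_exit(0);
	}
	waitpid(pid, &status, 0);

	ok = strcmp(seg, "still here") == 0;
	shmdt(seg);
	shmctl(id, IPC_RMID, NULL);	/* remove the segment when done */
	return ok;
}
```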

   John Sully <jms@mport.com> writes: "This is not a bug.  It is completely
consistent with the semantics of a change to the address space of the child.
Think about it: any change to the address space of a child process created by
vfork(2) is reflected in the parent since the child is actually executing in
the parent's address space.  Therefore if the child changes the address space
(in this case by releasing the shared memory segment) what should happen?
Right, the parent should have the same change happen.  And what does happen?
The segment is released in the parent.  One can argue about the braindead
semantics of vfork(2) all day, but the fact remains that this is exactly what
one would expect to happen.  To quote from the manual page:

     [...] vfork differs from fork in
     that the child borrows the parent's *memory* and thread of
     control until a call to execve or an exit (either by a call
     to exit or abnormally.) [ emphasis added ]

and later:

     It does not work, however, to return while
     running in the child's context from the procedure which
     called vfork since the eventual return from vfork would then
     return to a no longer existent stack frame.

Please note that the entire address space of the parent is used by
the child created by vfork(2).  The manual page also points out
several other caveats involved in doing anything to the parent's
address space except successfully calling an exec family function or
_exit (note it specifically says *not* to call exit(2)).  I do not believe 
that having a shared memory segment disappear from the parent's address 
space is out of line after reading the man page for vfork(2).

It is interesting to note that Sun after implementing its new VM system in
SunOS 4.0 initially had no plans to support vfork, since they felt that the COW
semantics of the new fork would provide the necessary efficiency gain.  Indeed
they found that most programs which used vfork worked just fine by doing
-Dvfork=fork.  All that is, except for a certain popular command interpreter
[ed: can you say C shell?].  So we are stuck with the legacy of this braindead
system call.

BTW, Microport has no plans to fix this :-)."

8. FIONREAD fails on regular files
  Christoph Badura <bad@generics.ka.sub.org> reports that the FIONREAD ioctl()
fails on regular (disk) files.  He has sent USL a one-line kernel fix.
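
  Until a fixed kernel arrives, application code can sidestep the ioctl for
regular files by computing the answer from the file size and the current
offset.  The helper below is a sketch with made-up names, not Christoph's
kernel fix.  Note also that the header defining FIONREAD varies: <sys/ioctl.h>
is assumed here, while some SVR4 systems may want <sys/filio.h>.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Number of bytes that can be read from fd right now, or -1 on error.
   Regular files are handled with fstat()/lseek() instead of FIONREAD. */
long bytes_available(int fd)
{
	struct stat st;
	off_t pos;
	int n;

	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode)) {
		pos = lseek(fd, 0, SEEK_CUR);
		if (pos < 0)
			return -1;
		return st.st_size > pos ? (long)(st.st_size - pos) : 0;
	}
	if (ioctl(fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}

/* Demo: put 10 bytes in a scratch file, seek to offset 4, and ask how
   much is left to read.  Returns the helper's answer, or -1 on error. */
long demo_regular_file(void)
{
	FILE *fp = tmpfile();
	long n;

	if (fp == NULL)
		return -1;
	if (fwrite("0123456789", 1, 10, fp) != 10 || fflush(fp) != 0
	    || lseek(fileno(fp), 4, SEEK_SET) != 4) {
		fclose(fp);
		return -1;
	}
	n = bytes_available(fileno(fp));
	fclose(fp);
	return n;
}
```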

9. fread(3) does the wrong thing on pipes and FIFOs
   Ed Hall <edhall@rand.org> writes: "Unlike the raw read() system call,
fread() is supposed to be able to make several partial reads to satisfy the
data requested by its arguments.  The exceptions are an EOF or an error on the
stream.  This characteristic is quite useful when moving data through pipes or
over network connections, since partial reads are quite common in these cases.
Well, the version of fread() in ESIX 4.0.3 (and likely other Sys5R4's) only
does a single physical read, and if it only satisfies part of the requested
number of bytes, that's all you get.  This can sting you even if you carefully
check the value returned by fread(), since the value returned is rounded down
to the number of complete "nitems" read, although your position in the stream
can be up to size-1 bytes beyond that point.  Neither ferror() nor feof()
indicate anything is wrong when this happens."
   This bug (which is also present in 4.0.4) is serious and nasty and should
be high on every porting house's list to fix.  It appears to be peculiar to
USL 4.0.3 and 4.0.4; 4.0.2 does *not* have it, nor does SCO.
   A USL source claims it has been fixed in 4.1.

10. putw appears to be broken
   There is a bug in the ESIX SVR4.0.3A putw() routine in the C shared
library which is probably USL's.  The following program demonstrates
it:

/* compile with: cc -o file file.c */
#include <stdio.h>
main()
{
	int i;
	for (i=0; i<1022; ++i) {
		putchar('1');
	}
	putw(-11, stdout);
	for (i=0; i<1022; ++i) {
		putchar('1');
	}
}

The putw() routine does not output 4 bytes, as it should.  It may be
there is some interaction with buffer flushing that is causing the
problem.  Also, note that if you change the sign of the first argument
to putw(), the program works fine.
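
One possible way around the problem, untested on the affected systems, is to
write the word with fwrite() instead of putw().  The names put_word and
demo_put_word below are hypothetical, and whether fwrite() avoids the same
flushing interaction is an assumption.

```c
#include <stdio.h>

/* Stand-in for putw(): write the bytes of an int with fwrite().
   Returns 0 on success and EOF on failure. */
int put_word(int w, FILE *fp)
{
	return fwrite(&w, sizeof w, 1, fp) == 1 ? 0 : EOF;
}

/* Replay the test case above into a scratch file and return the number
   of bytes written, or -1 on error.  A correct result is
   1022 + sizeof(int) + 1022. */
long demo_put_word(void)
{
	FILE *fp = tmpfile();
	long len;
	int i;

	if (fp == NULL)
		return -1;
	for (i = 0; i < 1022; ++i)
		putc('1', fp);
	if (put_word(-11, fp) == EOF) {
		fclose(fp);
		return -1;
	}
	for (i = 0; i < 1022; ++i)
		putc('1', fp);
	len = ftell(fp);
	fclose(fp);
	return len;
}
```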

11. Compiler problems
   Ronald Guilmette <rfg@ncd.com> also reports the following:

------------------------------------------------------------------------------
/* Here is a bug in the original SVR4 C compiler (aka C Issue 5) which
   effectively prevents you from making good use of the `const' and
   `volatile' qualifiers defined by ANSI C in conjunction with pointer
   types and typedef statements.  Compile this code and you will get:

   "qualifiers.c", line 23: left operand must be modifiable lvalue: op "="

   ...if your copy of the svr4 C compiler still has the bug.  Note that
   given these declarations, the ANSI C standard says that the thing pointed
   to by the variable `pci' should be considered to be constant... not the
   variable `pci' itself.  (The GCC compiler, either version 1.x or version
   2.x, correctly compiles this example without complaint.)
*/

typedef const int *ptr_to_const_int;

ptr_to_const_int pci;

int i;

void main ()
{
  pci = &i;
}
------------------------------------------------------------------------------
/* Here is a subtle bug in the original SVR4 C compiler (aka C Issue 5)
   which prevents you from first declaring a tagged type (i.e. a struct
   type or a union type) in a parameter list, and then defining that tagged
   type later on within the same scope.  (Note that according to the ANSI C
   standard, the scope in which parameters get declared and the outermost
   block of a function body are one and the same scope.  Thus, this really
   is legal ANSI C code!)

   Try compiling this with your C compiler on SVR4.  If your compiler still
   has the bug, you will get:

   "tagged_type.c", line 24: warning: dubious tag declaration: struct S
   "tagged_type.c", line 28: warning: improper member use: i
   "tagged_type.c", line 28: warning: improper member use: i
   "tagged_type.c", line 31: warning: dubious tag declaration: struct S
   "tagged_type.c", line 35: warning: improper member use: i
   "tagged_type.c", line 35: warning: improper member use: i

   (The GCC compiler also had this bug in version 1.x, but it has been fixed
   in version 2.x.)
*/

void foobar1 (arg)		/* use old-style without prototypes */
    struct S *arg;
{
  struct S { int i; };		/* define the type `struct S' */

  arg->i = arg->i;		/* legal according to ANSI C rules! */
}

void foobar2 (struct S *arg)	/* use new-style with prototypes */
{
  struct S { int i; };		/* define the type `struct S' */

  arg->i = arg->i;		/* legal according to ANSI C rules! */
}
------------------------------------------------------------------------------
/* Here is a serious bug in the original SVR4 `dump' program which dumps
   out parts of object files in either plain hex form or symbolically.

   To see the `dump' program get a segfault and die, save this code under
   the name `dump-bug.c' and then do:

	cc -g -c dump-bug.c
	dump -v -D dump-bug.o

   The bug arises whenever `dump' tries to read Dwarf debugging information
   for an array of pointers to any "user defined" type (e.g. `struct S' in
   this example).  Past that point, `dump' is totally confused, so further
   Dwarf debugging information finally causes it to go belly-up.
*/

struct S { int i; };
struct S *array[10];
int j;
------------------------------------------------------------------------------
It appears that the svr4 C compiler (for x86 machines) doesn't conform real
well to either the letter or the spirit of the IEEE 754 floating-point
standard.  In particular, "unordered comparisons" and other operations on
NaNs don't always produce the result that the IEEE 754 standard calls
for.

An AT&T source comments: "This is documented in the SVID as a future direction.
We do not support NaNs in -Xa and -Xt modes, only in -Xc.  Try
isnan(sqrt(-1.0)) to determine which modes support it."
------------------------------------------------------------------------------

The compiler fails to issue diagnostics for cases where a floating point
literal is given which exceeds the range of its type (either float or
double).  Actually this one could be argued either way, since IEEE FP
format includes "infinities" and the compiler probably just changes any
FP value which is out of range for its type into either positive infinity
or negative infinity (as appropriate).

The compiler fails to issue diagnostics in cases where a typedef name is
reused to declare a formal parameter, as in:

-----------------------------------------------------------------------
typedef int FOO;
void bar (FOO)
    int FOO;
{
}
-----------------------------------------------------------------------

The compiler crashes on the following invalid input:

-----------------------------------------------------------------------
int i;
volatile void *pvv;

void pvv_test ()
{
  (i ? *pvv : *pvv);    /* ERROR */
}
-----------------------------------------------------------------------

The compiler fails to issue diagnostics for cases where an attempt is
made to "forward declare" an enum type (without also defining it), as
in:

-----------------------------------------------------------------------
enum enum0 *ep;       /* ERROR */
-----------------------------------------------------------------------

The compiler rejects the following code with an error, although there
seems to be no good reason why it should (because no object is being
declared).

-----------------------------------------------------------------------
#include <limits.h>

typedef char array_type[ULONG_MAX];
-----------------------------------------------------------------------

VII. The FUBYTE Problem

(Thanks to Christoph Badura <bad@flatlin.ka.sub.org> for this info)

The kernel function fubyte() is documented to return a positive value when
given a valid user space address and -1 otherwise. In the latter case u.u_error
is set to EFAULT.  USL SysV R4.0.3 has a sign extension bug in the
implementation of fubyte() for local file descriptors (i.e. not opened via
RFS), which causes fubyte() to return negative values if the byte fetched has
its high bit set. This bug doesn't affect STREAMS drivers, as they don't call
(and in fact are normally unable to call) fubyte().  Thus writing a byte with
the high bit set to certain character device drivers returns with -1 and errno
set to EFAULT.

The bug may affect any character device driver that calls fubyte(). It's not
limited to serial card drivers. The bug is noticed most often with serial card
drivers, since uucp uses byte values > 127 very early during g-protocol setup
and drivers for serial cards tend to use fubyte() quite often.

Note also that the bug's effect is different if the driver checks for a -1
return value of fubyte() or just a negative one. In the former case it is
possible to pass bytes with the 8th bit set through fubyte(), except for 0xff
which is -1 in two's complement. That makes the bug more obscure.

The fix is easy.  First, make a backup copy of the kernel object file
/etc/conf/pack.d/kernel/vm.o!  A disassembly of vm.o(lfubyte) should reveal
*exactly* one mov[s]bl (move byte to long w/sign extend).  That one needs to be
patched into a movzbl (zero extend). The difference is one bit in the second
byte of the opcode.

The movsbl has the bit pattern 00001111 1011111w mod/rm-byte.
The movzbl has the bit pattern 00001111 1011011w mod/rm-byte.

The 'w' bit is 0 for the instruction in question. So the opcodes are 0f be and
0f b6. Here is the diff -c from dis -F lfubyte showing the patch applied to
the Dell 2.1 kernel:

*** vm.o	Mon Mar  9 00:31:38 1992
--- vm.o.org	Mon Mar  9 00:32:40 1992
***************
*** 22,28 ****
  	11c90:  85 c0                 testl  %eax,%eax
  	11c92:  75 09                 jne    0x9 <11c9d>
  	11c94:  8b 45 08              movl   8(%ebp),%eax
! 	11c97:  0f b6 00              movzbl (%eax),%eax
  	11c9a:  89 45 fc              movl   %eax,-4(%ebp)
  	11c9d:  c7 05 d8 13 00 00 00 00 00 00 movl   $0x0,0x13d8
  	11ca7:  83 3d dc 13 00 00 00  cmpl   $0x0,0x13dc
--- 22,28 ----
  	11c90:  85 c0                 testl  %eax,%eax
  	11c92:  75 09                 jne    0x9 <11c9d>
  	11c94:  8b 45 08              movl   8(%ebp),%eax
! 	11c97:  0f be 00              movsbl (%eax),%eax
  	11c9a:  89 45 fc              movl   %eax,-4(%ebp)
  	11c9d:  c7 05 d8 13 00 00 00 00 00 00 movl   $0x0,0x13d8
  	11ca7:  83 3d dc 13 00 00 00  cmpl   $0x0,0x13dc

Of course there is a workaround at the driver level.  Canonically, one would do
this by checking for fubyte() returning -1 *and* u.u_error being set to EFAULT
(u.u_error is cleared upon entering a system call).  However, in R4.0.3
fubyte() does NOT set u.u_error.  It *does* set u.u_fault_catch.fc_errno.

Christoph reports that Dell V.4 can be object-patched successfully to fix this.
I'm told that the offending 11c97 is at exactly the same address in the
Consensys 1.3 kernel.  I do not know the status of the other ports.

Another poster (Marc Boucher <marc@cam.org>) adds:

On ESIX SVR4.0.3 Rev. A, the instruction movsbl in question can be changed to
movzbl (as described above) with a binary-editor on file
/etc/conf/pack.d/kernel/vm.o. At offset 0x11eb0, change 0xbe to 0xb6.

Before patching, verify that your /etc/conf/pack.d/kernel/vm.o is the same as
mine!  On my system, the /bin/sum generated checksum of vm.o was "4440 222".

The problem results from a sign-extension bug.  The function lfubyte(), which
is called by fubyte(), is declared as

int lfubyte(char *addr);	/* actually caddr_t */

The byte is fetched with

	val = *addr;

which triggers sign extension.  Casting addr to an unsigned char * or declaring
it as such solves the problem.
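
The effect is easy to see in user space.  The two functions below are
illustrative stand-ins, not the kernel source: the first fetches through a
signed char pointer, as the buggy lfubyte() effectively does on compilers where
plain char is signed, and the second through an unsigned char pointer, as the
fix does.

```c
/* Second opcode bytes of the two instructions discussed above. */
enum { OP2_MOVSBL = 0xbe, OP2_MOVZBL = 0xb6 };

/* Buggy fetch: the byte is sign-extended to int, so any value with the
   high bit set comes back negative. */
int fetch_signed(const void *addr)
{
	return *(const signed char *)addr;
}

/* Fixed fetch: the byte is zero-extended, so the result is 0..255. */
int fetch_unsigned(const void *addr)
{
	return *(const unsigned char *)addr;
}
```

For a byte of 0xff the first returns -1, which a driver cannot tell apart from
the error return, while the second returns 255.  Bytes below 0x80 come back the
same from both, which is why plain ASCII traffic never trips the bug.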

This bug is still present in stock USL 4.0.4.  However, it has been fixed in
Dell 2.2.

VIII. Destiny and Dell

A source at UNIX System Labs Europe claims that `Destiny' (the new Release
4.2) incorporates all of Dell UNIX's fixes to 4.0.3; thus, any bug for which a
Dell fix is indicated above should be gone in Destiny.
--
	Send your feedback to: Eric Raymond = esr@snark.thyrsus.com