Newsgroups: comp.os.386bsd.bugs
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!haven.umd.edu!uunet!mcsun!sun4nl!eur.nl!pk
From: pk@cs.few.eur.nl (Paul Kranenburg)
Subject: Conspiring bugs
Message-ID: <1993Apr5.132031.10000@cs.few.eur.nl>
Keywords: VMSTAT,VM,NFS
Sender: news@cs.few.eur.nl
Reply-To: pk@cs.few.eur.nl
Organization: Erasmus University Rotterdam
Date: Mon, 5 Apr 1993 13:20:31 GMT
Lines: 184

Recently, several bugs and omissions conspired against me to cause heavy
system crashes. I was adapting the NET/2 version of vmstat(8) to work with
the current VM statistics variables in the kernel. The executable produced
reasonable output the first time it was run. Subsequent invocations,
however, dumped core with a bus error. Later on, the kernel would also
panic somewhere in the NFS code. This panic is now known to be related to
vmstat's core dump.

[ I should explain that I run in `dataless' mode: local root filesystem,
/usr NFS mounted from a Sun IPC. ]

This is what happened. First, a typo in vmstat.c tricked vmstat into
issuing a "read(fd, 0, 4)". Unfortunately, this did not lead to an
immediate SIGSEGV. The hardware does not detect a write-protection
violation while in kernel mode, because the 386 ignores page-level write
protection for supervisor-mode accesses. It does, however, mark the page
as modified. That page is mapped to the running executable's text segment.

Upon process termination, the allocated VM object is entered in the object
cache. Eventually the modified page has to be flushed, say by the pageout
daemon or as a result of an `rm a.out'. At that point the kernel tries to
write the bogus page to its backing store. The vnode pager takes control
and prepares a call to the vnode layer write routine (VOP_WRITE).

On an NFS filesystem this will always fail. The uio_procp field used by
vnode_pager_io() is set to a NULL pointer, and nfs_write() reacts badly to
it: it dereferences that pointer to check the process's RLIMIT_FSIZE limit.
In addition, the credentials passed to VOP_WRITE are those of the current
process. These may not be sufficient for the NFS operation to succeed. The
following example demonstrates this:

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/file.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SIZE 4096

int
main(void)
{
	char *ad;
	int j;
	int fd;

	fd = open("xxx", O_RDWR|O_CREAT, 0666);
	if (fd == -1) {
		perror("open");
		exit(1);
	}
	/* Make the file one page long so the whole mapping is backed. */
	if (ftruncate(fd, SIZE) == -1) {
		perror("ftruncate");
		exit(1);
	}

	ad = mmap(0, SIZE, PROT_READ|PROT_WRITE, MAP_FILE|MAP_SHARED, fd, 0);
	if (ad == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	/* Dirty every byte of the shared page. */
	for (j = 0; j < SIZE; j++)
		ad[j] = 1;

	/* munmap(ad, SIZE); deliberately omitted: keep the page mapped */
	printf("Sleeping\n");
	sleep(100);
	printf("Done\n");
	return 0;
}

Run this on an NFS filesystem. While the process is sleeping, start some
other memory hog (say, X11) under another user ID so that the modified page
gets paged out. The page never makes it back to the file.

The process on whose behalf a pageout takes place may no longer exist.
Even so, we can hang on to its credential structure for I/O operations.
The patches attached below take care of this by adding a credentials field
to the vnode pager data. A similar change could be made to the swap pager
to allow swapping to an NFS-mounted file, if the rest of the swapping code
would allow for that.
-pk
------------------------------------------------------------------------------
------- vnode_pager.c -------
*** /tmp/da12915	Mon Apr 5 15:16:17 1993
--- vnode_pager.c	Sat Apr 3 12:19:40 1993
***************
*** 149,154 ****
--- 149,157 ----
  	vnp->vnp_flags = 0;
  	vnp->vnp_vp = vp;
  	vnp->vnp_size = vattr.va_size;
+ 	vnp->vnp_cred = p->p_ucred;
+ 	if (vnp->vnp_cred)
+ 		crhold(vnp->vnp_cred);
  	queue_enter(&vnode_pager_list, pager, vm_pager_t, pg_list);
  	pager->pg_handle = handle;
  	pager->pg_type = PG_VNODE;
***************
*** 195,200 ****
--- 198,205 ----
  		vrele(vp);
  	}
  	queue_remove(&vnode_pager_list, pager, vm_pager_t, pg_list);
+ 	if (vnp->vnp_cred)
+ 		crfree(vnp->vnp_cred);
  	free((caddr_t)vnp, M_VMPGDATA);
  	free((caddr_t)pager, M_VMPAGER);
  }
***************
*** 415,421 ****
  	struct iovec aiov;
  	vm_offset_t kva, foff;
  	int error, size;
! 	struct proc *p = curproc;		/* XXX */
  
  #ifdef DEBUG
  	if (vpagerdebug & VDB_FOLLOW)
--- 420,426 ----
  	struct iovec aiov;
  	vm_offset_t kva, foff;
  	int error, size;
! 	/* struct proc *p = curproc;		/* XXX */
  
  #ifdef DEBUG
  	if (vpagerdebug & VDB_FOLLOW)
***************
*** 458,466 ****
  		    vnp->vnp_vp, kva, foff, size);
  #endif
  	if (rw == UIO_READ)
! 		error = VOP_READ(vnp->vnp_vp, &auio, 0, p->p_ucred);
  	else
! 		error = VOP_WRITE(vnp->vnp_vp, &auio, 0, p->p_ucred);
  #ifdef DEBUG
  	if (vpagerdebug & VDB_IO) {
  		if (error || auio.uio_resid)
--- 463,471 ----
  		    vnp->vnp_vp, kva, foff, size);
  #endif
  	if (rw == UIO_READ)
! 		error = VOP_READ(vnp->vnp_vp, &auio, 0, vnp->vnp_cred);
  	else
! 		error = VOP_WRITE(vnp->vnp_vp, &auio, 0, vnp->vnp_cred);
  #ifdef DEBUG
  	if (vpagerdebug & VDB_IO) {
  		if (error || auio.uio_resid)
------- vnode_pager.h -------
*** /tmp/da12918	Mon Apr 5 15:16:18 1993
--- vnode_pager.h	Sat Apr 3 12:19:41 1993
***************
*** 47,52 ****
--- 47,53 ----
  struct vnpager {
  	int		vnp_flags;	/* flags */
  	struct vnode	*vnp_vp;	/* vnode */
+ 	struct ucred	*vnp_cred;	/* user credentials */
  	vm_size_t	vnp_size;	/* vnode current size */
  };
  typedef struct vnpager	*vn_pager_t;
------- nfs_bio.c -------
*** /tmp/da12926	Mon Apr 5 15:17:19 1993
--- nfs_bio.c	Fri Apr 2 21:44:10 1993
***************
*** 235,241 ****
  	 * Maybe this should be above the vnode op call, but so long as
  	 * file servers have no limits, i don't think it matters
  	 */
! 	if (uio->uio_offset + uio->uio_resid >
  	      p->p_rlimit[RLIMIT_FSIZE].rlim_cur) {
  		psignal(p, SIGXFSZ);
  		return (EFBIG);
--- 235,242 ----
  	 * Maybe this should be above the vnode op call, but so long as
  	 * file servers have no limits, i don't think it matters
  	 */
! 	if (vp->v_type == VREG && p &&
! 	    uio->uio_offset + uio->uio_resid >
  	      p->p_rlimit[RLIMIT_FSIZE].rlim_cur) {
  		psignal(p, SIGXFSZ);
  		return (EFBIG);