Return to BSD News archive
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!howland.reston.ans.net!gatech!udel!newsserv.cs.sunysb.edu!stark.UUCP!stark!stark!gene From: stark!gene@newsserv.cs.sunysb.edu (Gene Stark) Newsgroups: comp.os.386bsd.bugs Subject: "Copyout" patch to trap.c (FIX) Date: 20 Dec 93 09:09:53 Organization: Gene Stark's home system Lines: 129 Distribution: world Message-ID: <STARK!GENE.93Dec20090953@stark.uucp> NNTP-Posting-Host: stark.uucp Several weeks ago I posted a patch to sys/i386/i386/trap.c that would fix a problem with VM allocation from within a system call (i.e. inside the copyout routine). Since then, I have discovered that this fix exposed another bug which produces system instability. Although I had already sent a fix for this new bug to the FreeBSD folks, reading another post just now reminded me that there might be other people who installed my first patch who will need the second one, too. So, here it is. If you installed my first patch, it is pretty important to install this one, too. *** /usr/src/sys/i386/i386/trap.c Sat Nov 20 14:47:44 1993 --- trap.c Sat Dec 11 20:46:53 1993 *************** *** 347,352 **** --- 347,359 ---- */ if (nss > vm->vm_ssize) vm->vm_ssize = nss; + /* + * va could be a page table address, if the fault + * occurred from within copyout. In that case, + * we have to wire it. (EWS 12/11/93) + */ + if (ispt(va)) + vm_map_pageable(map, va, round_page(va+1), FALSE); va = trunc_page(vtopte(va)); /* for page table, increment wiring as long as not a page table fault as well */ You should also be aware that the FreeBSD team is working on a revised pmap module, and the above fixes may be supplanted by totally new pmap.c and maybe trap.c code in the next release. - Gene Stark (Explanation of bug follows for those interested.) ----------------------------- The bug is exercised by doing the following in sequence: (1) Causing a new page of PTE's to be allocated from within a system call, e.g. reading a > 4MB file into emacs. (2) Causing pageout to be initiated, e.g. by stopping emacs and then running a process that allocates and accesses a large amount of memory. The system panics or does other weird stuff during the paging out. If instead of (1) above, you do: (1') Causing a new page of PTE's to be allocated by simply using brk() to increase the break and then accessing the memory. then the panics don't happen when you do (2). So there was something different about the way the new PTE's are allocated during (1) as opposed to (1'). After tracking through the system for awhile, I found that the PDE that was correctly allocated during operation (1) was getting zeroed. Later on, during (2), a call to pmap_changebit() occurs, which in turn calls pmap_pte() on a virtual address which requires the zeroed PDE. Pmap_pte() simply returns 0 in case the following test fails: if (pmap && pmap_pde_v(pmap_pde(pmap, va))) { This is bad, since 0 is for sure not a valid address of a PTE, and nobody else seems to be checking the return value from pmap_pte(). I put a panic() at the end of pmap_pte() instead of the return(0). I had a hard time figuring out what was going on, when it occurred to me that what must be happening is that the page of PTE's is getting paged out before the pages that those PTE's map. That is, for some reason the page of PTE's is not getting wired when it is allocated. I think I figured out how this happens: On a normal (non-copyout) page fault, the trap goes through trap(), which first faults the page of PTE's, if necessary, then faults the actual page. The code that faults the page of PTE's wires the page once it is allocated. This is done by the following code in trap.c: /* check if page table is mapped, if not, fault it first */ #define pde_v(v) (PTD[((v)>>PD_SHIFT)&1023].pd_v) if (!pde_v(va)) { v = trunc_page(vtopte(va)); rv = vm_fault(map, v, ftype, FALSE); if (rv != KERN_SUCCESS) goto nogo; /* check if page table fault, increment wiring */ vm_map_pageable(map, v, round_page(v+1), FALSE); } else v=0; On the other hand, when copyout() is running, there is first an actual trap generated from within copyout() by the access to the missing PTE. This trap is handled in trap(). The page itself is then faulted by a simulated trap that goes through trapwrite(). Here is the problem: the code in trap() does double duty for both page table faults and user faults. When the trap comes from copyout(), the "logic" (I use the term loosely!) that checks what kind of fault it is fails to detect that the virtual address that is getting faulted is actually a page table address, rather than an ordinary user address. So, the code shown above that is supposed to fault and then wire the new page of PTE's does not run, and instead control passes to the immediately following code: rv = vm_fault(map, va, ftype, FALSE); if (rv == KERN_SUCCESS) { /* * XXX: continuation of rude stack hack */ if (nss > vm->vm_ssize) vm->vm_ssize = nss; va = trunc_page(vtopte(va)); /* for page table, increment wiring as long as not a page table fault as well */ if (!v && type != T_PAGEFLT) vm_map_pageable(map, va, round_page(va+1), FALSE); if (type == T_PAGEFLT) return; goto out; } The only wiring that would happen here is to bump up the count on a page of PTE's that was already mapped. Now, v is zero because the page table fault code didn't execute. However, the trap came from kernel mode, so we do have type == T_PAGEFLT (rather than T_PAGEFLT | T_USER), and this code doesn't get the page of PTE's wired either. The fix I am suggesting is above. It might also be good defensive programming to put a panic at the end of pmap_pte() in pmap.c as I mentioned above, so that we know before somebody tries to use a NULL pointer as the address of a PTE! --