Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!news.Brown.EDU!noc.near.net!news.delphi.com!usenet
From: John Dyson <dysonj@delphi.com>
Newsgroups: comp.os.386bsd.development
Subject: Notes on the *new* FreeBSD V1.1 VM system
Date: Sat, 19 Feb 94 20:14:17 -0500
Organization: Delphi (info@delphi.com email, 800-695-4005 voice)
Lines: 175
Message-ID: <BcxpGux.dysonj@delphi.com>
NNTP-Posting-Host: bos1a.delphi.com

Gang,

This is some info that I sent to the freebsd-hackers mailing list a few
weeks ago (slightly edited):

As you might know, the version of the MACH VM system that is being used
by some of the BSD variants is not really complete and has some
problems.  Other implementations have fixed some of these problems too.
The inadequacies included:

1)  Incomplete handling of object inheritance (including the swap
    space grow problem.)
2)  Essentially random pageouts, thereby guaranteeing thrashing during
    paging.  Excessive CPU usage by the pageout daemon.
3)  Non-implementation of swapping.
4)  Paging in/out of ONLY single pages.
5)  Potential deadlocks in vnode_pager and swap_pager (different
    causes.)
6)  Non-implementation of UPAGE and page-table-paging in 386
    implementations.
7)  Limited kernel address space in 386 implementations.
8)  No support for use of the memory below 640K when the kernel is
    linked at 1MB.
9)  The system will hang on swap space full.

The blame for the original code should not be placed on anyone, because
of the original audience/purpose of the BSD VM code (there is really no
blame to assign.)  It was time to upgrade the capabilities of the code.

In late '92 I started looking at the VM code, and in about the
early/middle of '93 I had a fairly buggy implementation of the
page-table, UPAGE and swapping code.  In addition I had an improved
page daemon.  In the middle of '93 Davidg and I started collaborating
on integrating this code into FreeBSD.  At about this time Davidg and I
started coming up with good ideas to improve and redo sections of the
VM code.
I started coming up with test versions of improved vnode_pager,
swap_pager, pmap, and better pageout daemon code.  In late '93 and
early '94 we decided that the code would be ready for release 1.1 of
FreeBSD.  We had a couple of detours, including the growing swap space
problem (which is practically, but not completely, fixed.)

Here are the system improvements associated with the new VM code:

1)  Much improved object inheritance and sharing.  (One of the test
    programs is at the end of this mail: Test Program A.)
2)  Improved LRU algorithm for page removal.  This algorithm is
    slightly CPU usage heavy, but paging performance is better because
    of the stickiness of used pages in memory.  Still, the CPU usage
    is much less than that of the original page daemon.  Essentially,
    the algorithm is a hybrid clock/LRU.
3)  Handling of soft RSS limits.  Helpful for running large, memory
    hungry processes without greatly affecting other processes.
4)  True swapping support, tied in with RSS limits and UPAGE paging.
    The UPAGES have been moved into process space, and they are paged
    when a process is swapped out.  (UPAGES MUST be wired when a
    process is swapped in.)
5)  The pagers have been essentially redone, and vm_fault has had
    significant additions and improvements to support CLUSTER paging.
    The algorithm used is somewhat non-aggressive and was chosen to
    minimize additional system I/O.  NO paging in from a UFS vnode is
    done through the buffer cache, thereby eliminating the buffer
    cache flushing associated with paging in a process.  In addition,
    no unnecessary copying is done during paging in a process (or even
    in the cp command, which uses mmap!!!)  [V1.2 mmap 'cp' will be
    *much* faster].
6)  The original swap_pager had some problems that would end up in
    deadlock due to spc and bp resource allocation problems (fixed).
7)  The original vnode_pager had some problems with deadlock when the
    vm_pageout daemon would page objects out (fixed).
8)  When page table pages are not used, they can be freed, even while
    a process is still running.  With the UPAGE and pte paging, a
    process's page count can truly be 0 (actually 1, because of the pd
    page.)  Early versions of ps lied about a process's page count;
    the current FreeBSD version does not lie as much.  The only thing
    in a process that is not paged now is the Page Directory Page, and
    that costs only one page.  (The page directory page is allocated
    out of the kernel and explicitly mapped into the process like the
    UPAGES used to be.)
    Originally, a process with shared libraries normally required:
        1 pdpage, 2 pt pages, 1 pt page for sh lib, 2 UPAGES + data
    We now require (when a process is swapped out):
        1 pdpage, 0 pt pages, 0 UPAGES, 0 data !!!!
    And even when a process is not swapped out, not all of the pt
    pages are needed!!!
    One of my tests is the following (in multi user, 20MB):
        Run 500 w/shlibs processes sleeping for 120 secs.
        Run 6 w/shlibs processes fork/execing 500 times.
        Run a process w/shlibs that grabs all of memory repeatedly.
    When I stop the memory grab process I use only 2200 pages, with
    all but approx 500-600 pages accounted for in the malloc area and
    buffer cache.  The 500 pages are for the processes' page directory
    pages!!!  The old VM system would be using over 2500 pages for the
    various process overheads.
9)  New versions of the FreeBSD system support a programmable kernel
    address space.  Not only do we support a variable kernel virtual
    address, but also a variable kernel size.  This will help in the
    implementation of new features.
10) We can now support arbitrary discontiguous physical memory
    segments.
11) Significantly improved pmap code.  String instructions are used
    when advantageous, and the 'obnoxious' pmap_attributes have been
    eliminated.
12) Improved kern_physio, using the vm system to advantage and
    eliminating the need to explicitly fault data pages.
13) Improved allocation of the process stack space (saves on numerous
    data structures and CPU.)
14) Numerous micro-level performance improvements during page
    manipulations in object queues.
15) Efficient operation of the system in 4MB.  Others might claim
    this, but they are not fully loading their systems (just booting
    and running the daemons and an application or two does not
    count.)  (Run the test program below; other demo programs can be
    easily contrived.)
16) The system can relieve its problem of running out of swap space,
    eliminating an impending hang.

Contrary to information that may be implied by others, we plan to fully
merge and implement the VM enhancements and changes from 4.4 lite.  Any
other group or individual that will help us with future FreeBSD
releases will get full credit from me personally, from the FreeBSD
team, and in any appropriate writings.

Future enhancements for V1.2 that are already working include
EXTREMELY FAST mmap performance (including improved process exec.)
Other things coming up that Davidg and I are working on are major
improvements in the file I/O setup...

Other people on FreeBSD are doing even more important things than the
VM changes (I think.)  And I'll tell you, we cannot wait to see 4.4
lite; I believe that our changes in conjunction with 4.4 lite will be
really good.

I am sure that no matter what *BSD you are looking at using, there
will be a lot of interesting things happening in the future.

John Dyson
dyson@implode.root.com
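
Improvement 2 above describes the new page removal policy only as a
hybrid clock/LRU that keeps used pages sticky in memory.  A minimal
sketch of one way such a hybrid can work is below, assuming a
per-frame reference bit plus a small aging counter.  It is NOT the
FreeBSD V1.1 page daemon and NOT the Test Program A mentioned in the
post; the names struct frame, act_count, ACT_MAX, clock_touch and
clock_pick_victim are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ACT_MAX 4   /* hypothetical cap on the per-frame aging counter */

struct frame {
    int page;        /* resident page id, or -1 if the frame is free */
    int referenced;  /* stands in for the hardware "accessed" bit */
    int act_count;   /* aging counter: recently used pages score higher */
};

struct clock_state {
    struct frame *frames;
    size_t nframes;
    size_t hand;     /* next frame the sweep will examine */
};

/* Record a reference to a page.  Returns 1 if resident, 0 if not. */
int clock_touch(struct clock_state *cs, int page)
{
    for (size_t i = 0; i < cs->nframes; i++) {
        if (cs->frames[i].page == page) {
            cs->frames[i].referenced = 1;
            return 1;
        }
    }
    return 0;
}

/*
 * Sweep the clock hand until a victim frame is found and return its
 * index.  A referenced frame gets its bit cleared and its counter
 * bumped (the LRU-like part); an unreferenced frame is aged by one and
 * is only chosen once its counter has reached zero (the clock part).
 * The sweep terminates because every counter is bounded by ACT_MAX.
 */
size_t clock_pick_victim(struct clock_state *cs)
{
    assert(cs->nframes > 0);
    for (;;) {
        size_t idx = cs->hand;
        struct frame *f = &cs->frames[idx];

        cs->hand = (cs->hand + 1) % cs->nframes;

        if (f->page < 0)
            return idx;              /* free frame: nothing to evict */
        if (f->referenced) {
            f->referenced = 0;       /* used since last sweep: spare it */
            if (f->act_count < ACT_MAX)
                f->act_count++;
        } else if (f->act_count > 0) {
            f->act_count--;          /* idle: age it, keep it for now */
        } else {
            return idx;              /* idle and fully aged: evict */
        }
    }
}
```

A caller would invoke clock_touch on each access and, on a miss,
overwrite the frame returned by clock_pick_victim.  The aging counter
is what gives frequently used pages their stickiness: a page that was
touched once survives at least one more pass of the hand after its
reference bit has been cleared.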