*BSD News Article 27569



Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!news.Brown.EDU!noc.near.net!news.delphi.com!usenet
From: John Dyson <dysonj@delphi.com>
Newsgroups: comp.os.386bsd.development
Subject: Notes on the *new* FreeBSD V1.1 VM system
Date: Sat, 19 Feb 94 20:14:17 -0500
Organization: Delphi (info@delphi.com email, 800-695-4005 voice)
Lines: 175
Message-ID: <BcxpGux.dysonj@delphi.com>
NNTP-Posting-Host: bos1a.delphi.com

Gang,
	This is some info that I sent to the freebsd-hackers mailing
	list a few weeks ago (slightly edited):
 
 
 
	As you might know, the version of the MACH VM system that
	is being used by some of the BSD variants is not really complete
	and has some problems.  Other implementations have fixed some
	of these problems too.  The inadequacies included:
 
	1)	Incomplete handling of object inheritance (including
		the growing swap space problem.)
 
	2)	Essentially random pageouts, thereby guaranteeing
		thrashing during paging.  Excessive CPU usage of
		pageout daemon.
 
	3)	Non-implementation of swapping.
 
	4)	Paging in/out of ONLY single pages.
 
	5)	Potential deadlocks in vnode_pager and swap_pager
		(different causes.)
 
	6)	Non-implementation of UPAGE and page-table-paging in
		386 implementations.
 
	7)	Limited kernel address space in 386 implementations.
 
	8)	No support for use of <640K when kernel linked at 1MB.
 
	9)	The system will hang on swap space full.
 
	No one should be blamed for the state of the original code, given
	the original audience and purpose of the BSD VM code.  It was
	simply time to upgrade the capabilities of the code.
	
	In late '92 I started looking at the VM code, and by about
	early/middle '93 I had a fairly buggy implementation of the
	page-table, UPAGE and swapping code.  In addition I had an improved
	page daemon.  In the middle of '93 Davidg and I started collaborating
	on integrating this code into FreeBSD.  At about this time Davidg
	and I started coming up with good ideas to improve and redo sections
	of the VM code.  I started producing test versions of improved
	vnode_pager, swap_pager, pmap, and pageout daemon code.  In
	late '93 and early '94 we decided that the code would be ready for
	release 1.1 of FreeBSD.  We had a couple of detours, including the
	growing swap space problem (which is practically, but not
	completely, fixed.)  Here are the system improvements associated with
	the new VM code:
 
	1)	Much improved object inheritance and sharing.
		(One of the test programs, Test Program A, is at the
		end of this mail.)
 
	2)	Improved LRU algorithm for page removal.  This algorithm
		is slightly CPU heavy, but paging performance is better
		because recently used pages tend to stick in memory.
		Still, the CPU usage is much less than that of the original
		page daemon.  Essentially, the algorithm is a hybrid clock/LRU.
 
	3)	Handling of soft RSS limits.  Helpful for running large,
		memory hungry processes and not affecting other processes
		greatly.
	
	4)	True swapping support, tied in with RSS limits and UPAGE
		paging.  The UPAGES have been moved into process space, and
		they are paged when a process is swapped out.  (UPAGES MUST
		be wired when a process is swapped in.)
 
	5)	The pagers have been essentially redone, and vm_fault
		has had significant additions and improvements to support
		CLUSTER paging.  The algorithm used is somewhat non-aggressive
		and was chosen to minimize additional system I/O.  NO paging
		in from a UFS vnode is done through the buffer cache thereby
		eliminating the buffer cache flushing associated with paging
		in a process.  In addition, no unnecessary copying is done
		during paging in a process (or even in the cp command which
		uses mmap!!!) [V1.2 mmap 'cp' will be *much* faster].
	
	6)	Original swap_pager had some problems that would end up in
		deadlock due to spc and bp resource allocation problems (fixed).
 
	7)	Original vnode_pager had some problems with deadlock when the
		vm_pageout daemon would page objects out (fixed).
 
	8)	When page table pages are not in use, they can be freed,
		even while a process is still running.  With the UPAGE and
		pte paging, a process's page count can truly be 0 (actually
		1 because of the pd page.)  Early versions of ps lied about a
		process's page count; the current FreeBSD version does not
		lie as much.  The only thing in a process that is not paged
		now is the Page Directory Page, and that costs only one page.
		(The page directory page is allocated out of the kernel and
		explicitly mapped into the process like the UPAGES used to
		be.)  Originally, a process with shared libraries normally
		required:
 
		1 pdpage, 2 pt pages, 1 pt page for sh lib, 2 UPAGES + data
 
		We now require (when a process is swapped out):
 
		1 pdpage, 0 pt pages, 0 UPAGES, 0 data !!!!
 
		And even when a process is not swapped out, not all of the pt
		pages are needed!!!  
 
		One of my tests is the following (in multi user, 20MB):
 
			Run 500 w/shlibs processes sleeping for 120 secs.
			Run 6 w/shlibs processes fork/execing 500 times.
			Run a process w/shlibs that grabs all of memory
				repeatedly.
 
			When I stop the memory grab process I use only
			2200 pages, with all but approx 500-600 pages accounted
			for in malloc area and buffer cache.  The 500 pages
			are for the processes' page directory pages!!!
			The old VM system would be using over 2500 pages for
			the various process overheads.
				
 
	9)	New versions of the FreeBSD system support a programmable
		kernel address space.  Not only do we support variable
		kernel virtual address, but variable kernel size.  This
		will help in the implementation of new features.
 
	10)	We can now support arbitrary discontiguous physical memory
		segments.
 
	11)	Significantly improved pmap code.  String instructions used
		when advantageous, and elimination of the 'obnoxious'
		pmap_attributes.
 
	12)	Improved kern_physio, using the vm system to advantage
		eliminating the need to explicitly fault data pages.
 
	13)	Improved allocation of the process stack space (saves
		on numerous data structures and CPU.)
 
	14)	Numerous micro-level performance improvements during
		page manipulations in object queues.
 
	15)	Efficient operation of the system in 4MB.  Others
		might claim this, but they are not fully loading their
		systems (just booting and running the daemons and
		an application or two does not count.)  Run the test
		program below to see this; other demo programs can be
		easily contrived.
 
	16)	The system can relieve its problem of running out of
		swap space, eliminating an impending hang.
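
	To make item 2 a little more concrete, here is a minimal sketch of
	a clock scan with an aging counter, which is one common way to
	build a "hybrid clock/LRU".  It is NOT the FreeBSD pageout daemon
	code: struct page, AGE_MAX and clock_pick_victim are hypothetical
	names invented for this illustration, and the real daemon works on
	the VM page queues rather than a flat array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor; not the kernel's vm_page structure. */
struct page {
    bool referenced;  /* set when the page has been touched since the last scan */
    int  age;         /* recent-use credit; a higher value makes the page "stickier" */
    bool in_use;      /* false if the slot holds no evictable page */
};

#define AGE_MAX 4     /* assumed cap on the credit a page can accumulate */

/*
 * Advance the clock hand until a victim is found or every slot has been
 * visited once.  Referenced pages get their bit cleared and gain credit;
 * unreferenced pages lose credit and are chosen once their credit is 0.
 * Returns the victim's index, or -1 if no page qualified on this sweep.
 */
int clock_pick_victim(struct page *pages, size_t npages, size_t *hand)
{
    for (size_t scanned = 0; scanned < npages; scanned++) {
        size_t i = *hand;
        struct page *p = &pages[i];

        *hand = (i + 1) % npages;

        if (!p->in_use)
            continue;
        if (p->referenced) {
            p->referenced = false;      /* give it a second chance */
            if (p->age < AGE_MAX)
                p->age++;
            continue;
        }
        if (p->age > 0) {
            p->age--;                   /* cooling down, not evicted yet */
            continue;
        }
        return (int)i;                  /* idle and out of credit */
    }
    return -1;
}
```

	The aging counter is what gives the LRU-like "stickiness": a page
	that keeps getting referenced survives several sweeps after it
	goes idle, while a page touched only once is reclaimed quickly.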
 
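	For item 5, the following is a rough sketch of what a
	"non-aggressive" cluster choice could look like: on a fault, grow
	a run of adjacent non-resident pages around the faulting page, up
	to a small limit, and stop at the first page that is already in
	memory.  This is only one plausible policy under my own
	assumptions, not the actual vm_fault or pager logic; struct
	cluster, cluster_around and the resident[] array are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* A contiguous run of page indices within one object (hypothetical type). */
struct cluster {
    size_t first;   /* index of the first page to read */
    size_t count;   /* number of pages in the run, always >= 1 */
};

/*
 * Choose the pages to bring in for a fault on page `fault`.
 * Assumes fault < npages and max_cluster >= 1.
 * resident[i] is true when page i of the object is already in memory.
 * The run never crosses a resident page, so no I/O is issued for data
 * that is already present.
 */
struct cluster cluster_around(const bool *resident, size_t npages,
                              size_t fault, size_t max_cluster)
{
    struct cluster c = { fault, 1 };

    /* Extend forward first: sequential access is the common case. */
    while (c.count < max_cluster &&
           c.first + c.count < npages &&
           !resident[c.first + c.count])
        c.count++;

    /* Use any remaining budget to extend backward. */
    while (c.count < max_cluster &&
           c.first > 0 &&
           !resident[c.first - 1]) {
        c.first--;
        c.count++;
    }
    return c;
}
```

	Keeping max_cluster small bounds the extra I/O per fault, which is
	the trade-off item 5 describes.
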
 
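	The per-process figures in item 8 can be checked with a little
	arithmetic.  The sketch below only counts the 500 sleeping
	processes from the test and ignores the fork/exec processes, the
	memory grabber and all data pages, so treat it as a rough bound.
	The struct and function names are made up for this example; the
	4096-byte page size is the i386 page size.

```c
#define PAGE_SIZE_BYTES 4096UL   /* i386 page size */

/* Fixed per-process overhead, in pages (hypothetical helper type). */
struct proc_overhead {
    unsigned pd_pages;        /* page directory page */
    unsigned pt_pages;        /* page table pages for the program itself */
    unsigned shlib_pt_pages;  /* page table page for the shared library */
    unsigned upages;          /* UPAGES */
};

/* Total overhead pages for nprocs identical processes. */
unsigned long overhead_pages(struct proc_overhead o, unsigned long nprocs)
{
    return nprocs * (o.pd_pages + o.pt_pages + o.shlib_pt_pages + o.upages);
}

/* Convert a page count to kilobytes. */
unsigned long pages_to_kbytes(unsigned long pages)
{
    return pages * (PAGE_SIZE_BYTES / 1024UL);
}

/* Old VM system: 1 pdpage, 2 pt pages, 1 pt page for sh lib, 2 UPAGES. */
static const struct proc_overhead old_vm = { 1, 2, 1, 2 };

/* New VM system, process swapped out: only the pdpage remains. */
static const struct proc_overhead new_vm_swapped = { 1, 0, 0, 0 };
```

	With these numbers, overhead_pages(old_vm, 500) is 3000 pages
	(12000 KB), in line with the "over 2500 pages" quoted for the old
	system, and overhead_pages(new_vm_swapped, 500) is 500 pages
	(2000 KB), which matches the 500 page directory pages seen in the
	test.
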
	Contrary to what others may have implied, we plan to fully merge
	and implement the VM enhancements and changes from 4.4 lite.  Any
	other group or individual that helps us with future FreeBSD
	releases will get full credit from me personally, from the
	FreeBSD team, and in any appropriate writings.
 
	Future enhancements for V1.2 that are already working include
	EXTREMELY FAST mmap performance (including improved process exec.)
	Davidg and I are also working on major improvements in the file I/O
	setup...  Other people on FreeBSD are doing even more important
	things than the VM changes (I think.)  And I'll tell you, we cannot
	wait to see 4.4 lite; I believe that our changes in conjunction
	with 4.4 lite will be really good.  I am sure that no matter which
	*BSD you are looking at using, there will be a lot of interesting
	things happening in the future.
	
 
John Dyson
dyson@implode.root.com