patch-2.3.23 linux/Documentation/vm/locking

diff -u --recursive --new-file v2.3.22/linux/Documentation/vm/locking linux/Documentation/vm/locking
@@ -0,0 +1,88 @@
+Started Oct 1999 by Kanoj Sarcar <kanoj@sgi.com>
+
+The intent of this file is to have an up-to-date, running commentary
+from different people about how locking and synchronization are done
+in the Linux vm code.
+
+vmlist_access_lock/vmlist_modify_lock
+--------------------------------------
+
+Page stealers pick processes out of the process pool and scan for
+the best process to steal pages from. To guarantee the existence
+of the victim mm, an mm_count increment and a matching mmdrop are
+done in swap_out(). Page stealers hold kernel_lock to protect
+against a number of races. The vma list of the victim mm is also
+scanned by the stealer, and the vmlist_lock is used to preserve
+list sanity against the process adding to or deleting from the
+list. This also guarantees existence of the vma. The vma existence
+guarantee while invoking the driver swapout() method in
+try_to_swap_out() also relies on the fact that do_munmap()
+temporarily takes lock_kernel before tearing down the vma; thus
+the swapout() method must snapshot all the vma fields it needs
+before going to sleep (which releases the lock_kernel held by the
+page stealer). Currently, filemap_swapout is the only method that
+depends on this shaky interlocking.
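+
+As a minimal sketch of that pinning pattern (steal_from() and the
+scan helper are illustrative names, not actual kernel functions):
+
+	static void steal_from(struct task_struct *victim)
+	{
+		struct mm_struct *mm = victim->mm;
+
+		atomic_inc(&mm->mm_count);	/* pin: mm stays valid even
+						   if the victim exits */
+		scan_and_steal(mm);		/* hypothetical page scan */
+		mmdrop(mm);			/* unpin: frees mm on the
+						   last reference */
+	}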
+
+Any code that modifies the vmlist, or the vm_start/vm_end/
+vm_flags:VM_LOCKED/vm_next fields of any vma *in the list*, must
+prevent kswapd from looking at the chain. This does not include
+driver mmap() methods, for example, since at that point the vma is
+not yet in the list.
+
+The rules are:
+1. To modify the vmlist (add/delete or change fields in an element),
+you must hold mmap_sem to guard against clones doing mmap/munmap/faults
+(i.e., all vm system calls and faults), and against ptrace, swapin due
+to swap deletion, etc.
+2. To modify the vmlist (add/delete or change fields in an element),
+you must also hold vmlist_modify_lock, to guard against page stealers
+scanning the list.
+3. To scan the vmlist (find_vma()), you must either
+        a. grab mmap_sem, which is done in all cases except the page
+           stealer,
+or
+        b. grab vmlist_access_lock, which is done only by the page
+           stealer (a sketch of both patterns follows this list).
+4. While holding vmlist_modify_lock, you must be able to guarantee
+that no code path will lead to page stealing. A better guarantee is
+to claim non-sleepability, which ensures that you are not sleeping
+on a lock whose holder might in turn be doing page stealing.
+5. You must be able to guarantee that while holding vmlist_modify_lock
+or vmlist_access_lock of mm A, you will not try to get either lock
+for mm B.
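+
+As a minimal sketch of rules 1-3 (the per-vma work is a hypothetical
+placeholder; the lock macros take the mm):
+
+	/* modifier: rules 1 and 2 */
+	down(&mm->mmap_sem);		/* exclude clones doing vm
+					   syscalls and faults */
+	vmlist_modify_lock(mm);		/* exclude page stealers
+					   scanning the list */
+	vma->vm_flags |= VM_LOCKED;	/* e.g. an mlock-style change */
+	vmlist_modify_unlock(mm);
+	up(&mm->mmap_sem);
+
+	/* page stealer: rule 3b */
+	vmlist_access_lock(mm);
+	for (vma = mm->mmap; vma; vma = vma->vm_next)
+		examine(vma);		/* hypothetical per-vma work */
+	vmlist_access_unlock(mm);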
+
+The caveats are:
+1. find_vma() makes use of, and updates, the mmap_cache pointer hint.
+The update of mmap_cache is racy (the page stealer can race with other
+code that invokes find_vma() with mmap_sem held), but that is okay,
+since it is only a hint. This can be fixed, if desired, by having
+find_vma() grab the vmlist lock.
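+
+In outline, the hint path looks like this (a simplified sketch of
+find_vma(); the real code may search the AVL tree instead of walking
+the list):
+
+	struct vm_area_struct *find_vma_sketch(struct mm_struct *mm,
+					       unsigned long addr)
+	{
+		struct vm_area_struct *vma = mm->mmap_cache; /* racy read */
+
+		if (vma && vma->vm_end > addr && vma->vm_start <= addr)
+			return vma;
+		for (vma = mm->mmap; vma; vma = vma->vm_next)
+			if (addr < vma->vm_end) {
+				mm->mmap_cache = vma;	/* racy update:
+							   okay, it is
+							   only a hint */
+				return vma;
+			}
+		return NULL;
+	}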
+
+
+The code paths that add/delete elements from the vmlist chain are
+1. callers of insert_vm_struct
+2. callers of merge_segments
+3. callers of avl_remove
+
+Code that changes vm_start/vm_end/vm_flags:VM_LOCKED of vma's on
+the list:
+1. expand_stack
+2. mprotect
+3. mlock
+4. mremap
+
+It is advisable that changes to vm_start/vm_end be protected, although
+in some cases this is not strictly needed. E.g., vm_start is modified
+by expand_stack(), and it is hard to come up with a destructive
+scenario in that case even without the vmlist protection.
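+
+For illustration, a protected vm_start update in the style of
+expand_stack() (simplified; the offset and total_vm bookkeeping is
+omitted):
+
+	vmlist_modify_lock(vma->vm_mm);
+	vma->vm_start = address & PAGE_MASK;	/* stack grows downwards */
+	vmlist_modify_unlock(vma->vm_mm);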
+
+The vmlist lock nests with the inode i_shared_lock and the kmem cache
+c_spinlock spinlocks. This is okay, since code that holds i_shared_lock 
+never asks for memory, and the kmem code asks for pages after dropping
+c_spinlock.
+
+The vmlist lock can be a sleeping or spin lock. In either case, care
+must be taken that it is not held on entry to the driver methods, since
+those methods might sleep or ask for memory, causing deadlocks.
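+
+A minimal sketch of that constraint (assuming the swapout(vma, page)
+hook shape of this era):
+
+	vmlist_access_unlock(mm);	/* never hold it into the method */
+	error = vma->vm_ops->swapout(vma, page);	/* may sleep or
+							   ask for memory */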
+
+The current implementation of the vmlist lock uses the page_table_lock,
+which is also the spinlock that page stealers use to protect changes to
+the victim process' ptes. Thus we have a reduction in the total number
+of locks. 
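+
+Concretely, this reduction can be expressed as (a sketch of the
+mapping just described, not a copy of the actual header definitions):
+
+	#define vmlist_access_lock(mm)		spin_lock(&(mm)->page_table_lock)
+	#define vmlist_access_unlock(mm)	spin_unlock(&(mm)->page_table_lock)
+	#define vmlist_modify_lock(mm)		vmlist_access_lock(mm)
+	#define vmlist_modify_unlock(mm)	vmlist_access_unlock(mm)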
