Friday, May 24, 2013

Paging

Solaris uses both common types of paging in its virtual memory system: swapping, which moves out all memory associated with a user process, and demand paging, which pages out only the pages that have not been used recently. Which method is used is determined by comparing the amount of available memory with several key parameters:
  • physmem: physmem is the total page count of physical memory.
  • lotsfree: The page scanner is woken up when available memory falls below lotsfree. The default value for this is physmem/64 (or 512 KB, whichever is greater); it can be tuned in the /etc/system file if necessary. The page scanner runs in demand paging mode by default. The initial scan rate is set by the kernel parameter slowscan (which is 100 by default).
  • minfree: As available memory falls from lotsfree toward minfree, the scan rate increases linearly from slowscan to fastscan. (fastscan is determined experimentally by the system as the maximum scan rate that can be supported by the system hardware. minfree is set to desfree/2, and desfree is set to lotsfree/2 by default.) Each run of the page scanner covers desscan pages. This parameter is set dynamically based on the scan rate.
  • maxpgio: maxpgio (default 40 or 60) limits the rate at which I/O is queued to the swap devices. It is set to 40 for x86 architectures and 60 for SPARC architectures. With modern hard drives, maxpgio can safely be set to 100 times the number of swap disks.
  • throttlefree: When free memory falls below throttlefree (default minfree), the page_create routines force the calling process to wait until free pages are available.
  • pageout_reserve: When free memory falls below this value (default throttlefree/2), only the page daemon and the scheduler are allowed memory allocations.
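
The relationships between these thresholds can be sketched in a few lines of Python. This is only an illustration of the defaults and the linear ramp as described above, not the kernel's actual code; the function names default_thresholds and scan_rate are made up for this example, and all values are in pages.

```python
def default_thresholds(lotsfree):
    """Derive the dependent thresholds (in pages) from lotsfree,
    using the default ratios described in the list above."""
    desfree = lotsfree // 2
    minfree = desfree // 2
    throttlefree = minfree
    pageout_reserve = throttlefree // 2
    return {
        "desfree": desfree,
        "minfree": minfree,
        "throttlefree": throttlefree,
        "pageout_reserve": pageout_reserve,
    }


def scan_rate(freemem, lotsfree, minfree, slowscan, fastscan):
    """Simplified model of the ramp described above: no scanning at or
    above lotsfree, slowscan just below lotsfree, fastscan at minfree."""
    if freemem >= lotsfree:
        return 0  # the page scanner has not been woken up
    if freemem <= minfree:
        return fastscan
    # Linear interpolation with integer arithmetic (rounds down).
    return slowscan + (fastscan - slowscan) * (lotsfree - freemem) // (lotsfree - minfree)


if __name__ == "__main__":
    lotsfree = 3984  # value taken from the mdb session later in this post
    t = default_thresholds(lotsfree)
    print(t)
    for freemem in (4000, 3000, 2490, 996):
        print(freemem, scan_rate(freemem, lotsfree, t["minfree"], 100, 127499))
```

With lotsfree at 3984 pages, the derived values agree with the desfree, minfree and throttlefree values shown in the mdb session at the end of this post.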

The page scanner operates by first clearing a usage flag on each page at a rate reported as "scan rate" in vmstat and sar -g. After handspreadpages additional pages have been scanned, the page scanner checks whether the usage flag has been set again, which would mean the page was referenced in the meantime. If not, the page is paged out. (handspreadpages is set dynamically in current versions of Solaris. Its maximum value is pageout_new_spread.)
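
The two-step scan described above can be modeled with a short simulation. This is a toy sketch under simplifying assumptions: it makes a single linear pass instead of cycling through memory, it treats every unreferenced page as paged out, and the function name two_handed_scan and the accesses argument are invented for the example.

```python
def two_handed_scan(num_pages, handspread, accesses):
    """Simulate one pass of a two-handed scan over num_pages pages.

    handspread -- distance, in pages, between the front and back hands
    accesses   -- dict mapping a step number to the set of page indexes
                  that are referenced just before the hands move
    Returns the list of pages selected for page-out.
    """
    referenced = [True] * num_pages  # usage flag for each page
    paged_out = []
    for step in range(num_pages + handspread):
        # Application activity sets the usage flag again.
        for page in accesses.get(step, ()):
            referenced[page] = True
        front = step
        back = step - handspread
        if front < num_pages:
            referenced[front] = False  # front hand clears the flag
        if 0 <= back < num_pages and not referenced[back]:
            paged_out.append(back)  # not touched since the front hand passed
    return paged_out


if __name__ == "__main__":
    # Pages 0 and 2 are referenced between the two hands and survive.
    print(two_handed_scan(6, 2, {1: {0}, 3: {2}}))
```

A larger hand spread gives each page more time to be referenced again before the second check, so fewer pages are selected.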

Solaris 8 introduced an improved algorithm for handling file system page caching (for file systems other than ZFS). This new architecture is known as the cyclical page cache. It is designed to remove most of the problems with virtual memory that were previously caused by the file system page cache.

In the new algorithm, the cache of unmapped/inactive file pages is located on a cachelist which functions as part of the freelist.

When a file page is mapped, it is mapped to the relevant page on the cachelist if it is already in memory. If the referenced page is not on the cachelist, it is mapped to a page on the freelist and the file page is read (or “paged”) into memory. Either way, mapped pages are moved to the segmap file cache. Once all other freelist pages are consumed, additional allocations are taken from the cachelist on a least recently accessed basis.

With the new algorithm, the file system cache only competes with itself for memory. It does not force applications to be swapped out of primary memory, as sometimes happened with earlier OS versions. As a result of these changes, vmstat reports statistics that are more in line with our intuition. In particular, scan rates will be near zero unless there is a systemwide shortage of available memory. (In the past, scan rates would reflect file caching activity, which is not really relevant to memory shortfalls.)
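
The allocation order for file pages described above can be illustrated with a small model. It is a rough sketch only: the class PageLists and its methods are invented for this example, pages are identified by a (vnode, offset) pair, and an OrderedDict stands in for the least recently accessed ordering of the cachelist.

```python
from collections import OrderedDict


class PageLists:
    """Toy model of the freelist, the cachelist and mapped file pages."""

    def __init__(self, free_pages):
        self.freelist = list(range(free_pages))  # pages with no valid contents
        self.cachelist = OrderedDict()  # (vnode, offset) -> page, oldest first
        self.mapped = {}  # (vnode, offset) -> page

    def map_file_page(self, key):
        """Map a file page and report where its memory came from."""
        if key in self.mapped:
            return "already mapped"
        if key in self.cachelist:
            # Still in memory: reuse the cached copy, no disk read needed.
            self.mapped[key] = self.cachelist.pop(key)
            return "cachelist hit"
        if self.freelist:
            # Not cached: take a free page and read the data in.
            self.mapped[key] = self.freelist.pop()
            return "freelist"
        if self.cachelist:
            # Freelist is empty: recycle the oldest cached file page.
            _, page = self.cachelist.popitem(last=False)
            self.mapped[key] = page
            return "cachelist steal"
        raise MemoryError("no free or cached pages left")

    def unmap_file_page(self, key):
        """An unmapped file page stays cached on the cachelist."""
        self.cachelist[key] = self.mapped.pop(key)


if __name__ == "__main__":
    lists = PageLists(free_pages=2)
    print(lists.map_file_page(("file_a", 0)))  # freelist
    print(lists.map_file_page(("file_b", 0)))  # freelist
    lists.unmap_file_page(("file_a", 0))
    lists.unmap_file_page(("file_b", 0))
    print(lists.map_file_page(("file_a", 0)))  # cachelist hit
    print(lists.map_file_page(("file_c", 0)))  # cachelist steal
```

In this model only cached file pages are ever recycled to satisfy a new file page, which mirrors the point that the file system cache competes only with itself.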

Every active memory page in Solaris is associated with a vnode (which is a mapping to a file) and an offset (the location within that file). This references the backing store for the memory location, and may represent an area on the swap device, or it may represent a location in a file system. All pages that are associated with a valid vnode and offset are placed on the global page hash list.

vmstat -p reports paging activity details for applications (executables), data (anonymous) and file system activity.

The parameters listed above can be viewed and set dynamically via mdb, as below:

# mdb -kw
Loading modules: [ unix krtld genunix specfs dtrace ufs sd ip sctp usba fcp fctl nca lofs zfs random logindmux ptm cpc fcip sppp crypto nfs ]
> physmem/E
physmem:
physmem: 258887
> lotsfree/E
lotsfree:
lotsfree: 3984
> desfree/E
desfree:
desfree: 1992
> minfree/E
minfree:
minfree: 996
> throttlefree/E
throttlefree:
throttlefree: 996
> fastscan/E
fastscan:
fastscan: 127499
> slowscan/E
slowscan:
slowscan: 100
> handspreadpages/E
handspreadpages:
handspreadpages: 127499
> pageout_new_spread/E
pageout_new_spread:
pageout_new_spread: 161760
> lotsfree/Z fa0
lotsfree: 0xf90 = 0xfa0
> lotsfree/E
lotsfree:
lotsfree: 4000
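
Note that the value given to /Z in the session above is read as hexadecimal, while /E prints decimal, and that these parameters are page counts rather than byte counts. The following sketch checks the conversion and turns a page count into a size. The helper name pages_to_mib is made up for this example, and the page sizes used (8 KB and 4 KB) are assumptions; confirm the real value on a given system with the pagesize command.

```python
def pages_to_mib(pages, page_size):
    """Convert a page count to mebibytes for a given page size in bytes."""
    return pages * page_size / (1024 * 1024)


if __name__ == "__main__":
    old_value = int("f90", 16)  # value displayed before the write
    new_value = int("fa0", 16)  # value written with lotsfree/Z fa0
    print(old_value, new_value)

    # Assumed page sizes: 8192 bytes and 4096 bytes.
    for page_size in (8192, 4096):
        print(page_size, pages_to_mib(new_value, page_size))
```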
