Tuesday, June 11, 2013

Swap

The Solaris virtual memory system combines physical memory with available swap space via swapfs. If insufficient total virtual memory space is provided, new processes will be unable to start.

Swap space can be added, deleted or examined with the swap command. swap -l reports total and free space for each of the swap partitions or files that are available to the system. Note that this number does not reflect total available virtual memory space, since physical memory is not reflected in the output. swap -s reports the total available amount of virtual memory, as does sar -r.

If swap is mounted on /tmp via tmpfs, df -k /tmp will report on total available virtual memory space, both swap and physical. As large memory allocations are made, the amount of space available to tmpfs will decrease, meaning that the utilization percentages reported by df will be of limited use.

The DTrace Toolkit's swapinfo.d program prints out a summary of how virtual memory is currently being used:

Virtual Memory Summary

# /opt/DTT/Bin/swapinfo.d
RAM _______Total 2048 MB
RAM Unusable 25 MB
RAM Kernel 564 MB
RAM Locked 2 MB
RAM Used 189 MB
RAM Free 1266 MB

Disk _______Total 4004 MB
Disk Resv 69 MB
Disk Avail 3935 MB

Swap _______Total 5207 MB
Swap Resv 69 MB
Swap Avail 5138 MB
Swap (Minfree) 252 MB

Swapping

If free memory stays below desfree over a 30-second average, the memory scheduler will start to swap out processes. (That is, if both avefree and avefree30 are less than desfree, the swapper begins to look at processes.) Initially, the scheduler will look for processes that have been idle for maxslp seconds. (maxslp defaults to 20 seconds and can be tuned in /etc/system.) This swapping mode is known as soft swapping.
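The soft swapping test can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not kernel code: the MemoryState class, the function names, and the choice of pages and seconds as units are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryState:
    """Hypothetical snapshot of the counters named in the text (in pages)."""
    avefree: int    # short-term average of free memory
    avefree30: int  # 30-second average of free memory
    desfree: int    # desired free memory threshold


def soft_swap_wanted(state: MemoryState) -> bool:
    """True when both averages have fallen below desfree."""
    return state.avefree < state.desfree and state.avefree30 < state.desfree


def soft_swap_candidates(idle_seconds_by_pid: Dict[int, int], maxslp: int = 20) -> List[int]:
    """Return PIDs that have been idle for at least maxslp seconds."""
    return sorted(pid for pid, idle in idle_seconds_by_pid.items() if idle >= maxslp)


if __name__ == "__main__":
    state = MemoryState(avefree=900, avefree30=950, desfree=1000)
    if soft_swap_wanted(state):
        print(soft_swap_candidates({101: 5, 102: 45, 103: 20}))
```

With these made-up numbers both averages are under desfree, so the sketch reports the processes that have slept for 20 seconds or more.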

Swapping priorities are calculated for an LWP by the following formula:
epri = swapin_time - rss/(maxpgio/2) - pri
where swapin_time is the time since the thread was last swapped, rss is the amount of memory used by the LWP's process, and pri is the thread's priority.
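To get a feel for how the three terms trade off, the formula can be evaluated directly. The function below is a plain transcription of the expression above; the units and the sample values (including maxpgio=40) are assumptions chosen only to make the arithmetic easy to follow.

```python
def epri(swapin_time: float, rss: float, maxpgio: float, pri: float) -> float:
    """Evaluate epri = swapin_time - rss/(maxpgio/2) - pri."""
    return swapin_time - rss / (maxpgio / 2) - pri


if __name__ == "__main__":
    # Two hypothetical LWPs that differ only in the memory used by their process.
    small = epri(swapin_time=100, rss=200, maxpgio=40, pri=59)
    large = epri(swapin_time=100, rss=2000, maxpgio=40, pri=59)
    print(small, large)
```

A larger rss or a higher pri lowers the result, while a longer time since the last swap raises it.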

If, in addition to being below desfree of free memory, there are at least two processes in the run queue and paging activity exceeds maxpgio, the system will commence hard swapping. In this state, the kernel unloads all modules and cached memory that are not currently active and starts swapping out processes sequentially until desfree of free memory is available.
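The hard swapping conditions combine into a single test. The sketch below treats the run-queue condition as a count of two or more and compares a single paging rate against maxpgio; both of those readings, along with the function and parameter names, are assumptions for illustration rather than the kernel's actual logic.

```python
def hard_swap_wanted(avefree: int, avefree30: int, desfree: int,
                     runnable: int, paging_rate: int, maxpgio: int) -> bool:
    """Combine the three hard swapping conditions described in the text."""
    low_memory = avefree < desfree and avefree30 < desfree
    busy_run_queue = runnable >= 2          # assumed reading: two or more
    heavy_paging = paging_rate > maxpgio    # assumed: one combined paging rate
    return low_memory and busy_run_queue and heavy_paging


if __name__ == "__main__":
    print(hard_swap_wanted(avefree=500, avefree30=600, desfree=1000,
                           runnable=3, paging_rate=120, maxpgio=40))
```

If any one of the three conditions fails, the sketch returns False and only soft swapping would apply.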

Processes are not eligible for swapping if they are:

  • In the SYS or RT scheduling class.
  • Being executed or stopped by a signal.
  • Exiting.
  • Zombie.
  • A system thread.
  • Blocking a higher priority thread.
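The exclusion list above maps naturally onto a checklist. The following sketch uses a hypothetical ThreadInfo record (its field names are invented for the example and do not correspond to kernel structures) and returns the reasons, if any, that a process would be skipped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadInfo:
    """Hypothetical per-thread flags mirroring the list in the text."""
    sched_class: str = "TS"
    executing: bool = False
    stopped_by_signal: bool = False
    exiting: bool = False
    zombie: bool = False
    system_thread: bool = False
    blocking_higher_priority: bool = False


def ineligibility_reasons(t: ThreadInfo) -> List[str]:
    """Return every rule from the list that excludes this thread from swapping."""
    checks = [
        (t.sched_class in ("SYS", "RT"), "SYS or RT scheduling class"),
        (t.executing or t.stopped_by_signal, "being executed or stopped by a signal"),
        (t.exiting, "exiting"),
        (t.zombie, "zombie"),
        (t.system_thread, "system thread"),
        (t.blocking_higher_priority, "blocking a higher priority thread"),
    ]
    return [reason for failed, reason in checks if failed]


def swap_eligible(t: ThreadInfo) -> bool:
    """A thread is eligible only when no exclusion rule applies."""
    return not ineligibility_reasons(t)


if __name__ == "__main__":
    print(swap_eligible(ThreadInfo()))
    print(ineligibility_reasons(ThreadInfo(sched_class="RT", zombie=True)))
```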

The DTrace Toolkit provides the anonpgpid.d script to attempt to identify the processes that are suffering the most when the system is hard swapping. While this may be interesting, if we are hard swapping, we need to kill the culprit, not identify the victims. We are better off identifying which processes are consuming how much memory. prstat -s rss does a nice job of ranking processes by memory usage. (RSS stands for "resident set size," which is the amount of physical memory allocated to a process.)

Ranking Processes by Memory Usage

# prstat -s rss
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   213 daemon     19M   18M sleep   59    0   0:00:12 0.0% nfsmapid/4
     7 root     9336K 8328K sleep   59    0   0:00:04 0.0% svc.startd/14
     9 root     9248K 8188K sleep   59    0   0:00:07 0.0% svc.configd/15
   517 root     9020K 5916K sleep   59    0   0:00:02 0.0% snmpd/1
   321 root     9364K 5676K sleep   59    0   0:00:02 0.0% fmd/14
...
Total: 39 processes, 159 lwps, load averages: 0.00, 0.00, 0.00
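If we want to total or post-process these figures, the listing is easy to parse. The sketch below is one possible approach, assuming the ten-column layout shown above and sizes suffixed with K, M, or G; the helper names are made up for the example, and the sample text is copied from the listing rather than captured live.

```python
from typing import Dict, List

UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def size_to_kb(text: str) -> int:
    """Convert a prstat size such as '19M' or '9336K' to kilobytes."""
    return int(float(text[:-1]) * UNIT_KB[text[-1]])


def parse_prstat(lines: List[str]) -> List[Dict[str, object]]:
    """Keep only process rows; skip the header, '...' and the Total line."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        rows.append({
            "pid": int(fields[0]),
            "user": fields[1],
            "size_kb": size_to_kb(fields[2]),
            "rss_kb": size_to_kb(fields[3]),
            "process": fields[9],
        })
    return rows


SAMPLE = """\
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
213 daemon 19M 18M sleep 59 0 0:00:12 0.0% nfsmapid/4
7 root 9336K 8328K sleep 59 0 0:00:04 0.0% svc.startd/14
9 root 9248K 8188K sleep 59 0 0:00:07 0.0% svc.configd/15
517 root 9020K 5916K sleep 59 0 0:00:02 0.0% snmpd/1
321 root 9364K 5676K sleep 59 0 0:00:02 0.0% fmd/14
...
Total: 39 processes, 159 lwps, load averages: 0.00, 0.00, 0.00
""".splitlines()

if __name__ == "__main__":
    rows = parse_prstat(SAMPLE)
    for row in sorted(rows, key=lambda r: r["rss_kb"], reverse=True):
        print(row["pid"], row["process"], row["rss_kb"])
    print("total RSS (KB):", sum(r["rss_kb"] for r in rows))
```

Keep in mind that summing RSS across processes counts shared pages more than once, so the total is an upper bound rather than an exact figure.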

We may also find ourselves swapping if we are running tmpfs and someone places a large file in /tmp. It takes some effort, but we have to educate our user community that /tmp is not scratch space. It is literally part of the virtual memory space. It may help matters to set up a directory called /scratch to allow people to unpack files or manipulate data.
