Solaris Troubleshooting: Measuring Memory Shortfalls

In practice, a memory shortfall usually hurts performance far more than a CPU bottleneck does. The two primary indicators of a RAM shortage are the scan rate and swap device activity. Here are some useful commands for monitoring both types of activity:

Memory Saturation: Scan Rate

sar -g
vmstat

Memory Saturation: Swap Space Usage and Paging Rates

In both cases, the high activity rate can be due to something that does not have a consistently large impact on performance. The processes running on the system have to be examined to see how frequently they are run and what their impact is. It may be possible to re-work the program or run the process differently to reduce the amount of new data being read into memory.

(Virtual memory takes two shapes in a Unix system: physical memory and swap space. Physical memory usually comes in DIMM modules and is frequently called RAM. Swap space is a dedicated area of disk space that the operating system addresses almost as if it were physical memory. Since disk I/O is much slower than I/O to and from memory, we would prefer to use swap space as infrequently as possible. Memory address space refers to the range of addresses that can be assigned, or mapped, to virtual memory on the system. The bulk of an address space is not mapped at any given point in time.)

We have to weigh the costs and benefits of upgrading physical memory, especially to accommodate an infrequently scheduled process. If cost matters more than performance, we can use swap space to provide enough virtual memory for the application to run. If adequate total virtual memory is not provided, new processes will not be able to start. (The system may report "Not enough space" or "WARNING: /tmp: File system full, swap space limit exceeded.")

Swap space is usually used only when physical memory is too small to accommodate the system's memory requirements. At that time, space is freed in physical memory by paging out (moving) memory pages to swap space. (See “Paging” below for a more complete discussion of the process.)

If inadequate physical memory is provided, the system will be so busy paging to swap that it will be unable to keep up with demand. (This state is known as "thrashing" and is characterized by heavy I/O on the swap device and horrendous performance. In this state, the scanner can use up to 80% of CPU.)

When this happens, we can use the vmstat -p command to examine whether the stress on the system is coming from executables, application data or file system traffic. This command displays the number of paging operations for each type of data.
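The breakdown that vmstat -p provides can be totalled with a few lines of Python. This is only a sketch: it assumes the usual Solaris column names (epi/epo for executables, api/apo for anonymous application data, fpi/fpo for file system pages), it assumes the first sample is a since-boot summary that should be skipped, and the SAMPLE text is hand-written for illustration rather than captured from a real system. The function name paging_by_type is hypothetical.

```python
# Column names as printed by Solaris `vmstat -p` (assumed layout); the
# prefixes e/a/f stand for executable, anonymous, and filesystem pages.
CATEGORIES = {
    "executable": ("epi", "epo"),
    "anonymous": ("api", "apo"),
    "filesystem": ("fpi", "fpo"),
}


def paging_by_type(vmstat_p_text, skip_first=True):
    """Sum page-ins and page-outs per page type across all samples."""
    totals = {name: 0 for name in CATEGORIES}
    header = None
    skipped = not skip_first
    for line in vmstat_p_text.splitlines():
        fields = line.split()
        if "epi" in fields:
            header = fields  # column-name line (may be reprinted)
            continue
        if header is None or len(fields) != len(header):
            continue  # group-heading line or anything malformed
        if not skipped:
            skipped = True  # assumed since-boot summary line
            continue
        row = dict(zip(header, fields))
        for name, columns in CATEGORIES.items():
            totals[name] += sum(int(row[col]) for col in columns)
    return totals


# Hand-written illustrative sample, not real measurements.
SAMPLE = """\
     memory           page          executable      anonymous      filesystem
   swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 900000 400000  5  20   1   0  12    0    0    0    0    0    0    3    1    1
 880000 90000   0  10  60   0 310    0    0    0    2   40   55    1    0    5
 870000 85000   0  12  80   0 420    0    0    2    4   55   70    0    0    8
"""

totals = paging_by_type(SAMPLE)
print(max(totals, key=totals.get), totals)
```

A total dominated by the anonymous columns suggests that application data is being paged, which is the case that points most directly at a RAM shortage.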

Scan Rate

When available memory falls below certain thresholds, the system attempts to reclaim memory that is being used for other purposes. The page scanner is the program that runs through memory to see which pages can be made available by placing them on the free list. The scan rate is the number of pages per second that the page scanner examines. (The “Paging” section later in this chapter discusses some details of the page scanner's operation.)

The page scanning rate is the main tipoff that a system does not have enough physical memory. We can use sar -g or vmstat to look at the scan rate. For example, vmstat 30 reports memory usage every 30 seconds. (Ignore the first line of output, which reports summary statistics since boot.) If the sr column under the page heading is much above zero for an extended time, your system may be running short of physical memory. (Shorter sampling periods may be used to get a feel for what is happening on a smaller time scale.)
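The check described above (drop the first sample, then look for a scan rate that stays above zero) can be sketched in Python. The sketch assumes the default Solaris vmstat layout, in which the data lines have exactly as many fields as the column-name line and sr is one of those names. The function names, the threshold of 200, and the run length of 3 are hypothetical values chosen for illustration, and the SAMPLE text is hand-written rather than captured from a real system.

```python
def scan_rates(vmstat_text, skip_first=True):
    """Return the sr value of each vmstat sample as a list of ints."""
    rates = []
    header = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "sr" in fields:
            header = fields  # column-name line (vmstat reprints it)
            continue
        if header is None or len(fields) != len(header):
            continue  # group-heading line or anything malformed
        rates.append(int(fields[header.index("sr")]))
    # The first sample is the since-boot summary, so drop it by default.
    return rates[1:] if skip_first else rates


def sustained_scanning(rates, threshold, min_samples):
    """True if at least min_samples consecutive samples exceed threshold."""
    run = 0
    for rate in rates:
        run = run + 1 if rate > threshold else 0
        if run >= min_samples:
            return True
    return False


# Hand-written illustrative sample, not real measurements.
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 900000 400000 5 20 3 1 1 0 12 0 0 0 0 400 300 200 2 1 97
 0 0 0 880000 90000 0 10 0 40 60 0 310 2 0 0 0 450 320 210 5 9 86
 0 0 0 870000 85000 0 12 0 55 80 0 420 3 0 0 0 470 330 220 6 12 82
 0 0 0 860000 80000 0 11 0 50 75 0 390 3 0 0 0 460 325 215 6 11 83
"""

rates = scan_rates(SAMPLE)
print(rates, sustained_scanning(rates, threshold=200, min_samples=3))
```

Requiring a run of consecutive samples, rather than a single high reading, keeps a brief burst of scanning from being reported as a shortage.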

A very low scan rate is a sure indicator that the system is not running short of physical memory. On the other hand, a high scan rate can be caused by transient issues, such as a process reading large amounts of uncached data. The processes on the system should be examined to see how much of a long-term impact they have on performance. Historical trends need to be examined with sar -g to make sure that the page scanner has not come on for a transient, non-recurring reason.

A nonzero scan rate is not necessarily an indication of a problem. Over time, memory is allocated for caching and other activities. Eventually, the amount of free memory will fall to the lotsfree level, and the pageout scanner will be invoked. For a more thorough discussion of the paging algorithm, see “Paging” below.

Swap Device Activity

The amount of disk activity on the swap device can be measured using iostat. iostat -xPnce provides information on disk activity on a partition-by-partition basis. sar -d provides similar information on a per-physical-device basis, and vmstat provides some usage information as well. Where Veritas Volume Manager is used, vxstat provides per-volume performance information.

If there are I/Os queued for the swap device, application paging is occurring. If there is significant, persistent, heavy I/O to the swap device, a RAM upgrade may be in order.
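As a rough sketch, the queue length for the swap device can be pulled out of extended iostat output in Python. It assumes the output includes a column-name line containing wait (average number of queued transactions), %b (percent busy), and device, as iostat -xn prints. The device name c0t0d0s1 is hypothetical (swap -l lists the real swap devices), the function name swap_queue is hypothetical, and the SAMPLE text is hand-written for illustration.

```python
def swap_queue(iostat_text, swap_device):
    """Return (wait, %b) pairs for every sample of the given device."""
    samples = []
    header = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if "device" in fields and "wait" in fields:
            header = fields  # column-name line, reprinted per interval
            continue
        if header is None or len(fields) != len(header):
            continue  # banner lines, CPU blocks, anything malformed
        row = dict(zip(header, fields))
        if row["device"] == swap_device:
            samples.append((float(row["wait"]), float(row["%b"])))
    return samples


# Hand-written illustrative sample, not real measurements.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.5    1.0    4.0    8.0  0.0  0.0    0.0    5.1   0   1 c0t0d0s0
   35.0   80.0  280.0  640.0  2.5  1.8   21.7   15.6  40  95 c0t0d0s1
"""

for wait, busy in swap_queue(SAMPLE, "c0t0d0s1"):
    print("queued:", wait, "busy %:", busy)
```

Looking columns up by name rather than by position is a deliberate choice here, since extra flags such as -e add columns to each line.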

Process Memory Usage

The /usr/proc/bin/pmap command can help pin down which process is the memory hog. /usr/proc/bin/pmap -x PID prints out details of memory use by a process.

Summary statistics regarding process size can be found in the RSS column of ps -ly or top.
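To rank processes by resident set size, the ps -ly listing can be sorted on its RSS column. The Python sketch below assumes the header line contains PID and RSS and that the command name is the last field on each line. It reads columns by position from the left, so a blank WCHAN field does not disturb the PID and RSS values. The function name top_rss is hypothetical and the SAMPLE text is hand-written for illustration.

```python
def top_rss(ps_text, count=5):
    """Return up to count (rss, pid, command) tuples, largest RSS first."""
    header = None
    procs = []
    for line in ps_text.splitlines():
        fields = line.split()
        if header is None:
            if "PID" in fields and "RSS" in fields:
                header = fields
                pid_col = header.index("PID")
                rss_col = header.index("RSS")
            continue
        if len(fields) <= rss_col:
            continue  # too short to hold an RSS value
        # With -y, RSS is reported in kilobytes rather than pages.
        procs.append((int(fields[rss_col]), int(fields[pid_col]), fields[-1]))
    return sorted(procs, reverse=True)[:count]


# Hand-written illustrative sample, not real measurements.
SAMPLE = """\
 S   UID   PID  PPID  C PRI NI   RSS     SZ    WCHAN TTY         TIME CMD
 S     0   412     1  0  40 20  3120   5200        ? ?           0:03 sshd
 S   100  2381  2290  0  59 20 512400 640000       ? pts/2       4:17 java
 O   100  2455  2290  0  59 20  1800   2400          pts/2       0:00 ps
"""

for rss_kb, pid, command in top_rss(SAMPLE, count=2):
    print(pid, command, rss_kb, "KB")
```

The largest entries are the natural candidates for a closer look with pmap -x.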

dbx, the debugging utility in the SunPro package, has extensive memory leak detection built in. The source code will need to be compiled with the -g flag by the appropriate SunPro compiler.

ipcs -mb shows memory statistics for shared memory. This may be useful when attempting to size memory to fit expected traffic.

Monday, May 20, 2013
