Friday, May 24, 2013

Paging

Solaris uses both common types of paging in its virtual memory system: swapping (which moves out all memory associated with a user process) and demand paging (which moves out only the pages that have not been used recently). The method used is determined by comparing the amount of available memory with several key parameters:
  • physmem: physmem is the total page count of physical memory.
  • lotsfree: The page scanner is woken up when available memory falls below lotsfree. The default value for this is physmem/64 (or 512 KB, whichever is greater); it can be tuned in the /etc/system file if necessary. The page scanner runs in demand paging mode by default. The initial scan rate is set by the kernel parameter slowscan (which is 100 by default).
  • minfree: Between lotsfree and minfree, the scan rate increases linearly between slowscan and fastscan. (fastscan is determined experimentally by the system as the maximum scan rate that can be supported by the system hardware. minfree is set to desfree/2, and desfree is set to lotsfree/2 by default.) Each page scanner will run for desscan pages. This parameter is dynamically set based on the scan rate.
  • maxpgio: maxpgio (default 40 or 60) limits the rate at which I/O is queued to the swap devices. It is set to 40 for x86 architectures and 60 for SPARC architectures. With modern hard drives, maxpgio can safely be set to 100 times the number of swap disks.
  • throttlefree: When free memory falls below throttlefree (default minfree), the page_create routines force the calling process to wait until free pages are available.
  • pageout_reserve: When free memory falls below this value (default throttlefree/2), only the page daemon and the scheduler are allowed memory allocations.
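
The default relationships among these thresholds can be summarized in a short sketch. It is an illustration only: the function names are hypothetical, the 8 KB page size is an assumption (check with the pagesize command), and the linear ramp follows the description above rather than the kernel source.

```python
def default_lotsfree(physmem, page_size=8192):
    """Default lotsfree in pages: physmem/64 or 512 KB, whichever is greater.

    page_size is an assumption; it differs between platforms.
    """
    floor_pages = (512 * 1024) // page_size
    return max(physmem // 64, floor_pages)


def default_thresholds(lotsfree):
    """Derive the other default thresholds (in pages) from lotsfree."""
    desfree = lotsfree // 2
    minfree = desfree // 2
    throttlefree = minfree
    pageout_reserve = throttlefree // 2
    return {
        "desfree": desfree,
        "minfree": minfree,
        "throttlefree": throttlefree,
        "pageout_reserve": pageout_reserve,
    }


def scan_rate(freemem, lotsfree, minfree, slowscan, fastscan):
    """Scan rate as described above: slowscan at lotsfree, rising
    linearly to fastscan at minfree. Returns 0 when the scanner is idle."""
    if freemem >= lotsfree:
        return 0
    if freemem <= minfree:
        return fastscan
    fraction = (lotsfree - freemem) / (lotsfree - minfree)
    return slowscan + fraction * (fastscan - slowscan)


if __name__ == "__main__":
    print(default_thresholds(3984))
    print(scan_rate(2490, 3984, 996, 100, 127499))
```

With a lotsfree of 3984 pages, the derived desfree and minfree values are 1992 and 996. Note that a running kernel may report a lotsfree slightly below physmem/64, presumably because it bases the calculation on a somewhat smaller page count than physmem.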

The page scanner operates by first clearing a usage flag on each page at a rate reported as "scan rate" in vmstat and sar -g. After handspreadpages additional pages have been read, the page scanner checks to see whether the usage flag has been set again. If not, the page is paged out. (handspreadpages is set dynamically in current versions of Solaris. Its maximum value is pageout_new_spread.)
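
The two-step behavior described above (clear the flag, then check it again handspreadpages later) can be modeled with a toy simulation. This is a simplified sketch, not the kernel's implementation: the function name, the touched_at workload input, and the assumption that every page starts out referenced are all hypothetical.

```python
def two_handed_scan(n_pages, handspread, touched_at):
    """Return the page indexes the trailing check would select for pageout.

    touched_at maps a step number to the set of pages that applications
    reference just before that step (hypothetical workload input).
    """
    referenced = [True] * n_pages       # assume every page starts referenced
    victims = []
    for step in range(n_pages + handspread):
        for page in touched_at.get(step, ()):
            referenced[page] = True     # application activity sets the flag
        front = step                    # leading position clears the flag
        back = step - handspread        # trailing position checks the flag
        if front < n_pages:
            referenced[front] = False
        if 0 <= back < n_pages and not referenced[back]:
            victims.append(back)        # not touched since it was cleared
    return victims


if __name__ == "__main__":
    # Pages 0 and 2 are touched again before the trailing check reaches them.
    print(two_handed_scan(6, 2, {2: {0}, 3: {2}}))
```

A larger handspread gives applications more time to touch a page again before it is checked, so fewer pages are selected.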

Solaris 8 introduced an improved algorithm for handling file system page caching (for file systems other than ZFS). This new architecture is known as the cyclical page cache. It is designed to remove most of the problems with virtual memory that were previously caused by the file system page cache.

In the new algorithm, the cache of unmapped/inactive file pages is located on a cachelist which functions as part of the freelist.

When a file page is mapped, it is mapped to the relevant page on the cachelist if it is already in memory. If the referenced page is not on the cachelist, it is mapped to a page on the freelist and the file page is read (or “paged”) into memory. Either way, mapped pages are moved to the segmap file cache. Once all other freelist pages are consumed, additional allocations are taken from the cachelist on a least recently accessed basis. With the new algorithm, file system cache only competes with itself for memory. It does not force applications to be swapped out of primary memory as sometimes happened with the earlier OS versions. As a result of these changes, vmstat reports statistics that are more in line with our intuition. In particular, scan rates will be near zero unless there is a systemwide shortage of available memory. (In the past, scan rates would reflect file caching activity, which is not really relevant to memory shortfalls.)

Every active memory page in Solaris is associated with a vnode (which is a mapping to a file) and an offset (the location within that file). This references the backing store for the memory location, and may represent an area on the swap device, or it may represent a location in a file system. All pages that are associated with a valid vnode and offset are placed on the global page hash list.

vmstat -p reports paging activity details for applications (executables), data (anonymous) and file system activity.

The parameters listed above can be viewed and set dynamically via mdb, as below:

# mdb -kw
Loading modules: [ unix krtld genunix specfs dtrace ufs sd ip sctp usba fcp fctl nca lofs zfs random logindmux ptm cpc fcip sppp crypto nfs ]
> physmem/E
physmem:
physmem: 258887
> lotsfree/E
lotsfree:
lotsfree: 3984
> desfree/E
desfree:
desfree: 1992
> minfree/E
minfree:
minfree: 996
> throttlefree/E
throttlefree:
throttlefree: 996
> fastscan/E
fastscan:
fastscan: 127499
> slowscan/E
slowscan:
slowscan: 100
> handspreadpages/E
handspreadpages:
handspreadpages: 127499
> pageout_new_spread/E
pageout_new_spread:
pageout_new_spread: 161760
> lotsfree/Z fa0
lotsfree: 0xf90 = 0xfa0
> lotsfree/E
lotsfree:
lotsfree: 4000
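
In the session above, values are displayed in decimal but the /Z write takes hexadecimal (fa0 is 4000). A small helper can do the conversions before a value is written. This is a convenience sketch; the helper names are hypothetical and the 8 KB page size is an assumption that should be checked with pagesize.

```python
PAGE_SIZE = 8192  # assumption: verify with the pagesize command


def pages_to_hex(pages):
    """Hexadecimal string (without 0x) for a page count, as typed after /Z."""
    return format(pages, "x")


def hex_to_pages(hex_text):
    """Decimal page count for a hexadecimal value shown by mdb."""
    return int(hex_text, 16)


def pages_to_mb(pages, page_size=PAGE_SIZE):
    """Size in megabytes of a page count."""
    return pages * page_size / (1024 * 1024)


if __name__ == "__main__":
    print(pages_to_hex(4000), hex_to_pages("f90"), pages_to_mb(4000))
```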

Wednesday, May 22, 2013

Segmentation Violations

Segmentation violations occur when a process references a memory address not mapped by any segment. The resulting SIGSEGV signal originates as a major page fault hardware exception identified by the processor and is translated by as_fault() in the address space layer.

When a process overflows its stack, a segmentation violation fault results. The kernel recognizes the violation and can extend the stack size, up to a configurable limit. In a multithreaded environment, the kernel does not keep track of each user thread's stack, so it cannot perform this function. The thread itself is responsible for stack SIGSEGV (stack overflow signal) handling.

(The SIGSEGV signal is sent by the threads library when an attempt is made to write to a write-protected page just beyond the end of the stack. This page is allocated as part of the stack creation request.)

It is often the case that segmentation faults occur because of resource restrictions on the size of a process's stack. See “Resource Management” for information about how to increase these limits.

See “Process Virtual Memory” for a more detailed description of the structure of a process's address space.

Monday, May 20, 2013

Measuring Memory Shortfalls

In the real world, memory shortfalls are much more devastating than having a CPU bottleneck. Two primary indicators of a RAM shortage are the scan rate and swap device activity. The sections below describe commands for monitoring both types of activity.

In both cases, the high activity rate can be due to something that does not have a consistently large impact on performance. The processes running on the system have to be examined to see how frequently they are run and what their impact is. It may be possible to re-work the program or run the process differently to reduce the amount of new data being read into memory.

(Virtual memory takes two shapes in a Unix system: physical memory and swap space. Physical memory usually comes in DIMM modules and is frequently called RAM. Swap space is a dedicated area of disk space that the operating system addresses almost as if it were physical memory. Since disk I/O is much slower than I/O to and from memory, we would prefer to use swap space as infrequently as possible. Memory address space refers to the range of addresses that can be assigned, or mapped, to virtual memory on the system. The bulk of an address space is not mapped at any given point in time.)

We have to weigh the costs and benefits of upgrading physical memory, especially to accommodate an infrequently scheduled process. If the cost is more important than the performance, we can use swap space to provide enough virtual memory space for the application to run. If adequate total virtual memory space is not provided, new processes will not be able to start. (The system may report "Not enough space" or "WARNING: /tmp: File system full, swap space limit exceeded.")

Swap space is usually only used when physical memory is too small to accommodate the system's memory requirements. At that time, space is freed in physical memory by paging (moving) it out to swap space. (See “Paging” below for a more complete discussion of the process.)

If inadequate physical memory is provided, the system will be so busy paging to swap that it will be unable to keep up with demand. (This state is known as "thrashing" and is characterized by heavy I/O on the swap device and horrendous performance. In this state, the scanner can use up to 80% of CPU.)

When this happens, we can use the vmstat -p command to examine whether the stress on the system is coming from executables, application data or file system traffic. This command displays the number of paging operations for each type of data.

Scan Rate


When available memory falls below certain thresholds, the system attempts to reclaim memory that is being used for other purposes. The page scanner is the program that runs through memory to see which pages can be made available by placing them on the free list. The scan rate is the number of pages per second that the page scanner examines. (The “Paging” section later in this chapter discusses some details of the page scanner's operation.) The page scanning rate is the main tipoff that a system does not have enough physical memory. We can use sar -g or vmstat to look at the scan rate. vmstat 30 checks memory usage every 30 seconds. (Ignore the summary statistics on the first line.) If page/sr is much above zero for an extended time, your system may be running short of physical memory. (Shorter sampling periods may be used to get a feel for what is happening on a smaller time scale.)

A very low scan rate is a sure indicator that the system is not running short of physical memory. On the other hand, a high scan rate can be caused by transient issues, such as a process reading large amounts of uncached data. The processes on the system should be examined to see how much of a long-term impact they have on performance. Historical trends need to be examined with sar -g to make sure that the page scanner has not come on for a transient, non-recurring reason.

A nonzero scan rate is not necessarily an indication of a problem. Over time, memory is allocated for caching and other activities. Eventually, the amount of memory will reach the lotsfree memory level, and the pageout scanner will be invoked. For a more thorough discussion of the paging algorithm, see “Paging” below.
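
Checking for a sustained nonzero page/sr can be scripted. The following is a minimal sketch that assumes vmstat output has been captured as text; the sample values are made up for illustration, and the function names and the choice of "every recent sample above a threshold" as the test are assumptions.

```python
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 1200000 50000 1 5 0 0 0 0 0 0 0 0 0 400 300 200 1 1 98
 0 0 0 1190000 30000 3 40 8 120 300 0 250 2 0 0 0 450 900 300 5 9 86
 1 0 0 1185000 29000 4 55 16 200 420 0 310 3 0 0 0 470 950 320 6 11 83
"""


def scan_rates(vmstat_text):
    """Return page/sr for each sample, skipping the since-boot summary line."""
    rates = []
    sr_index = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "sr" in fields:                  # column header line
            sr_index = fields.index("sr")
            continue
        if sr_index is None or len(fields) <= sr_index:
            continue
        if not fields[0].isdigit():         # group header or other text
            continue
        rates.append(int(fields[sr_index]))
    return rates[1:]                        # first data line is the summary


def sustained_scanning(rates, samples=2, threshold=0):
    """True if the most recent samples all exceed the threshold."""
    recent = rates[-samples:]
    return len(recent) == samples and all(r > threshold for r in recent)


if __name__ == "__main__":
    rates = scan_rates(SAMPLE)
    print(rates, sustained_scanning(rates))
```

Because a brief burst of scanning is not necessarily a problem, the number of samples and the threshold should be chosen to match the sampling interval in use.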

Swap Device Activity

The amount of disk activity on the swap device can be measured using iostat. iostat -xPnce provides information on disk activity on a partition-by-partition basis. sar -d provides similar information on a per-physical-device basis, and vmstat provides some usage information as well. Where Veritas Volume Manager is used, vxstat provides per-volume performance information.

If there are I/O's queued for the swap device, application paging is occurring. If there is significant, persistent, heavy I/O to the swap device, a RAM upgrade may be in order.

Process Memory Usage

The /usr/proc/bin/pmap command can help pin down which process is the memory hog. /usr/proc/bin/pmap -x PID prints out details of memory use by a process.

Summary statistics regarding process size can be found in the RSS column of ps -ly or top.

dbx, the debugging utility in the SunPro package, has extensive memory leak detection built in. The source code will need to be compiled with the -g flag by the appropriate SunPro compiler.

ipcs -mb shows memory statistics for shared memory. This may be useful when attempting to size memory to fit expected traffic.

Friday, May 17, 2013

vmstat

The first line of vmstat output represents a summary of information since boot time. To obtain useful real-time statistics, run vmstat with a time step (for example, vmstat 30).

The vmstat output columns are as follows (use the pagesize command to determine the size of the pages):

  • procs or kthr/r: Run queue length.
  • procs or kthr/b: Processes blocked while waiting for I/O.
  • procs or kthr/w: Idle processes which have been swapped.
  • memory/swap: Free, unreserved swap space (Kb).
  • memory/free: Free memory (Kb). (Note that this will shrink until it reaches lotsfree, at which point the page scanner is started. See "Paging" for more details.)
  • page/re: Pages reclaimed from the free list. (If a page on the free list still contains data needed for a new request, it can be remapped.)
  • page/mf: Minor faults (page in memory, but not mapped). (If the page is still in memory, a minor fault remaps the page. It is comparable to the vflts value reported by sar -p.)
  • page/pi: Paged in from swap (Kb/s). (When a page is brought back from the swap device, the process will stop execution and wait. This may affect performance.)
  • page/po: Paged out to swap (Kb/s). (The page has been written and freed. This can be the result of activity by the pageout scanner, a file close, or fsflush.)
  • page/fr: Freed or destroyed (Kb/s). (This column reports the activity of the page scanner.)
  • page/de: Anticipated short-term memory shortfall (Kb). (The vmstat man page describes this column as a deficit, not as pages freed after writes.)
  • page/sr: Scan rate (pages). Note that this number is not reported as a "rate," but as a total number of pages scanned.
  • disk/s#: Disk activity for disk # (I/O's per second).
  • faults/in: Interrupts (per second).
  • faults/sy: System calls (per second).
  • faults/cs: Context switches (per second).
  • cpu/us: User CPU time (%).
  • cpu/sy: Kernel CPU time (%).
  • cpu/id: Idle + I/O wait CPU time (%).

vmstat -i reports on hardware interrupts.

vmstat -s provides a summary of memory statistics, including statistics related to the DNLC, inode and rnode caches.

vmstat -S reports on swap-related statistics such as:

  • si: Swapped in (Kb/s).
  • so: Swap outs (Kb/s).

(Note that the man page for vmstat -s incorrectly describes the swap queue length. In Solaris 2, the swap queue length is the number of idle swapped-out processes. In SunOS 4, this referred to the number of active swapped-out processes.)

Solaris 8

vmstat under Solaris 8 will report different statistics than would be expected under an earlier version of Solaris due to a different paging algorithm:
  • Page Reclaim rate higher.
  • Higher reported Free Memory: A large component of the filesystem cache is reported as free memory.
  • Low Scan Rates: Scan rates will be near zero unless there is a systemwide shortage of available memory.

vmstat -p reports paging activity details for applications (executables), data (anonymous) and filesystem activity.

Thursday, May 16, 2013

sar

The word "sar" is used to refer to two related items:

  1. The system activity report package
  2. The system activity reporter

System Activity Report Package

This facility stores a great deal of performance data about a system. This information is invaluable when attempting to identify the source of a performance problem.

The Report Package can be enabled by uncommenting the appropriate lines in the sys crontab. The sa1 program stores performance data in the /var/adm/sa directory. sa2 writes reports from this data, and sadc is a more general version of sa1.

In practice, I do not find that the sa2-produced reports are terribly useful in most cases. Depending on the issue being examined, it may be sufficient to run sa1 at intervals that can be set in the sys crontab.

Alternatively, sar can be used on the command line to look at performance over different time slices or over a restricted period of time:

sar -A -o outfile 5 2000

(Here, "5" represents the time slice and "2000" represents the number of samples to be taken. "outfile" is the output file where the data will be stored.)
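
It is worth working out how long such a collection will run before starting it. The arithmetic is simply the interval multiplied by the number of samples; the function name below is hypothetical.

```python
from datetime import timedelta


def sar_run_length(interval_seconds, samples):
    """Total wall-clock time covered by a sar collection run."""
    return timedelta(seconds=interval_seconds * samples)


if __name__ == "__main__":
    # The command above: 2000 samples, 5 seconds apart.
    print(sar_run_length(5, 2000))
```

For the command above, 2000 samples at 5-second intervals cover 10,000 seconds, or 2 hours, 46 minutes and 40 seconds.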

The data from this file can be read by using the "-f" option (see below).


System Activity Reporter

sar has several options that allow it to process the data collected by sa1 in different ways:
  • -a: Reports file system access statistics. Can be used to look at issues related to the DNLC.

    • iget/s: Rate of requests for inodes not in the DNLC. An iget will be issued for each path component of the file's path.

    • namei/s: Rate of file system path searches. (If the directory name is not in the DNLC, iget calls are made.)

    • dirbk/s: Rate of directory block reads.

  • -A: Reports all data.

  • -b: Buffer activity reporter:

    • bread/s, bwrit/s: Transfer rates (per second) between system buffers and block devices (such as disks).

    • lread/s, lwrit/s: System buffer access rates (per second).

    • %rcache, %wcache: Cache hit rates (%).

    • pread/s, pwrit/s: Transfer rates between system buffers and character devices.

  • -c: System call reporter:

    • scall/s: System call rate (per second).

    • sread/s, swrit/s, fork/s, exec/s: Call rate for these calls (per second).

    • rchar/s, wchar/s: Transfer rate (characters per second).

  • -d: Disk activity (actually, block device activity):

    • %busy: % of time servicing a transfer request.

    • avque: Average number of outstanding requests.

    • r+w/s: Rate of reads+writes (transfers per second).

    • blks/s: Rate of 512-byte blocks transferred (per second).

    • avwait: Average wait time (ms).

    • avserv: Average service time (ms). (For block devices, this includes seek, rotation, and data transfer times. Note that the iostat svc_t is equivalent to avwait+avserv.)

  • -e HH:MM: CPU usage up to time specified.

  • -f filename: Use filename as the source for the binary sar data. The default is to use today's file from /var/adm/sa.

  • -g: Paging activity (see "Paging" for more details):

    • pgout/s: Page-outs (requests per second).

    • ppgout/s: Page-outs (pages per second).

    • pgfree/s: Pages freed by the page scanner (pages per second).

    • pgscan/s: Scan rate (pages per second).

    • %ufs_ipf: Percentage of UFS inodes removed from the free list while still pointing at reusable memory pages. This is the same as the percentage of igets that force page flushes.


  • -i sec: Set the data collection interval to sec seconds.

  • -k: Kernel memory allocation:

    • sml_mem: Amount of virtual memory available for the small pool (bytes). (Small requests are less than 256 bytes)

    • lg_mem: Amount of virtual memory available for the large pool (bytes). (512 bytes-4 Kb)

    • ovsz_alloc: Memory allocated to oversize requests (bytes). Oversize requests are dynamically allocated, so there is no pool. (Oversize requests are larger than 4 Kb)

    • alloc: Amount of memory allocated to a pool (bytes). The total KMA usage is the sum of these columns.

    • fail: Number of requests that failed.

  • -m: Message and semaphore activities.

    • msg/s, sema/s: Message and semaphore statistics (operations per second).

  • -o filename: Saves output to filename.

  • -p: Paging activities.

    • atch/s: Attaches (per second). (This is the number of page faults that are filled by reclaiming a page already in memory.)

    • pgin/s: Page-in requests (per second) to file systems.

    • ppgin/s: Page-ins (per second). (Multiple pages may be affected by a single request.)

    • pflt/s: Page faults from protection errors (per second).

    • vflts/s: Address translation page faults (per second). (This happens when a valid page is not in memory. It is comparable to the vmstat-reported page/mf value.)

    • slock/s: Faults caused by software lock requests that require physical I/O (per second).

  • -q: Run queue length and percentage of the time that the run queue is occupied.

  • -r: Unused memory pages and disk blocks.

    • freemem: Pages available for use (Use pagesize to determine the size of the pages).

    • freeswap: Disk blocks available in swap (512-byte blocks).

  • -s time: Start looking at data from time onward.

  • -u: CPU utilization.

    • %usr: User time.

    • %sys: System time.

    • %wio: Waiting for I/O (does not include time when another process could be scheduled to the CPU).

    • %idle: Idle time.

  • -v: Status of process, inode, file tables.

    • proc-sz: Number of process entries (proc structures) currently in use, compared with max_nprocs.

    • inod-sz: Number of inodes in memory compared with the number currently allocated in the kernel.

    • file-sz: Number of entries in and size of the open file table in the kernel.

    • lock-sz: Shared memory record table entries currently used/allocated in the kernel. This size is reported as 0 for standards compliance (space is allocated dynamically for this purpose).

    • ov: Overflows between sampling points.

  • -w: System swapping and switching activity.

    • swpin/s, swpot/s, bswin/s, bswot/s: Number of LWP transfers or 512-byte blocks per second.

    • pswch/s: Process switches (per second).

  • -y: TTY device activity.

    • rawch/s, canch/s, outch/s: Input character rate, character rate processed by canonical queue, output character rate.

    • rcvin/s, xmtin/s, mdmin/s: Receive, transmit and modem interrupt rates.
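
Several of the sar columns listed above use units that need conversion before they can be compared with other tools. The helpers below are a small sketch of those conversions; the function names are hypothetical, and the page size must be supplied from the pagesize command.

```python
def iostat_svc_t(avwait_ms, avserv_ms):
    """iostat svc_t equivalent from the sar -d avwait and avserv columns."""
    return avwait_ms + avserv_ms


def blocks_to_mb(blocks, block_size=512):
    """Convert 512-byte blocks (sar -d blks/s, sar -r freeswap) to megabytes."""
    return blocks * block_size / (1024 * 1024)


def pages_to_mb(pages, page_size):
    """Convert pages (sar -r freemem) to megabytes for a given page size."""
    return pages * page_size / (1024 * 1024)


if __name__ == "__main__":
    print(iostat_svc_t(2.5, 7.5))
    print(blocks_to_mb(2048))
    print(pages_to_mb(128, 8192))
```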

Wednesday, May 15, 2013

nfsstat

nfsstat can be used to examine NFS performance.

nfsstat -s reports server-side statistics. In particular, the following are important:

  • calls: Total RPC calls received.
  • badcalls: Total number of calls rejected by the RPC layer.
  • nullrecv: Number of times an RPC call was not available even though it was believed to have been received.
  • badlen: Number of RPC calls with a length shorter than that allowed for RPC calls.
  • xdrcall: Number of RPC calls whose header could not be decoded by XDR (External Data Representation).
  • readlink: Number of times a symbolic link was read.
  • getattr: Number of attribute requests.
  • null: Null calls are made by the automounter when looking for a server for a filesystem.
  • writes: Data written to an exported filesystem.

Sun recommends the following tuning actions for some common conditions:

  • writes > 10%: Write caching (either array-based or host-based, such as a Prestoserv card) would speed up operation.
  • badcalls >> 0: The network may be overloaded and should be checked out. The rsize and wsize mount options can be set on the client side to reduce the effect of a noisy network, but this should only be considered a temporary workaround.
  • readlink > 10%: Replace symbolic links with directories on the server.
  • getattr > 40%: The client attribute cache can be increased by setting the actimeo mount option. Note that this is not appropriate where the attributes change frequently, such as on a mail spool. In these cases, mount the filesystems with the noac option.
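
The percentage thresholds above can be checked mechanically once the counters have been read from nfsstat -s. The following sketch assumes the counts have already been extracted into a dictionary (the input shape and function name are hypothetical); it does not parse nfsstat output, and it leaves out the badcalls check because no numeric threshold is given for it.

```python
def server_findings(counts):
    """Apply the percentage thresholds listed above to server-side counters.

    counts maps a counter name to its value and must include "calls".
    """
    total = counts.get("calls", 0)
    if total == 0:
        return []

    def pct(name):
        return 100.0 * counts.get(name, 0) / total

    findings = []
    if pct("writes") > 10:
        findings.append("writes > 10%: consider write caching")
    if pct("readlink") > 10:
        findings.append("readlink > 10%: replace symbolic links with directories")
    if pct("getattr") > 40:
        findings.append("getattr > 40%: consider actimeo (or noac where attributes change often)")
    return findings


if __name__ == "__main__":
    # Made-up counter values for illustration.
    for line in server_findings({"calls": 1000, "writes": 150, "readlink": 20, "getattr": 450}):
        print(line)
```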

nfsstat -c reports client-side statistics. The following statistics are of particular interest:

  • calls: Total number of calls made.
  • badcalls: Total number of calls rejected by RPC.
  • retrans: Total number of retransmissions. If this number is larger than 5%, the requests are not reaching the server consistently. This may indicate a network or routing problem.
  • badxid: Number of times a duplicate acknowledgement was received for a single request. If this number is roughly the same as badcalls, the network is congested. The rsize and wsize mount options can be set on the client side to reduce the effect of a noisy network, but this should only be considered a temporary workaround.
    If, on the other hand, badxid=0, this can be an indication of a slow network connection.
  • timeout: Number of calls that timed out. If this is roughly equal to badxid, the requests are reaching the server, but the server is slow.
  • wait: Number of times a call had to wait because a client handle was not available.
  • newcred: Number of times the authentication was refreshed.
  • null: A large number of null calls indicates that the automounter is retrying the mount frequently. The timeo parameter should be changed in the automounter configuration.
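
The client-side rules of thumb above can be combined into a rough triage function. This is a sketch under stated assumptions: "roughly equal" is taken to mean within 20% (the tolerance is an arbitrary choice), the badxid=0 case is only flagged when timeouts were recorded, and the function names are hypothetical.

```python
def roughly_equal(a, b, tolerance):
    """True if a and b differ by no more than tolerance times the larger."""
    return abs(a - b) <= tolerance * max(a, b)


def client_findings(calls, retrans, badxid, timeout, badcalls, tolerance=0.2):
    """Apply the nfsstat -c rules of thumb described above."""
    findings = []
    if calls and 100.0 * retrans / calls > 5.0:
        findings.append("retrans > 5%: requests are not reaching the server consistently")
    if badxid > 0 and roughly_equal(badxid, badcalls, tolerance):
        findings.append("badxid ~ badcalls: the network may be congested")
    if timeout > 0 and badxid == 0:
        # Assumption: only meaningful when timeouts were actually recorded.
        findings.append("badxid = 0: possible slow network connection")
    elif badxid > 0 and roughly_equal(timeout, badxid, tolerance):
        findings.append("timeout ~ badxid: requests arrive, but the server is slow")
    return findings


if __name__ == "__main__":
    # Made-up counter values for illustration.
    for line in client_findings(calls=1000, retrans=80, badxid=50, timeout=52, badcalls=48):
        print(line)
```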

nfsstat -m (from the client) provides server-based performance data.

  • srtt: Smoothed round-trip time. If this number is larger than 50ms, the mount point is slow.
  • dev: Estimated deviation.
  • cur: Current backed-off timeout value.
  • Lookups: If cur>80 ms, the requests are taking too long.
  • Reads: If cur>150 ms, the requests are taking too long.
  • Writes: If cur>250 ms, the requests are taking too long.
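
The limits above are easy to check once the srtt and cur values have been read from nfsstat -m. The sketch below assumes they have been collected by hand into a dictionary keyed by request type; the input shape and names are hypothetical, and no parsing of nfsstat output is attempted.

```python
SRTT_LIMIT_MS = 50
CUR_LIMITS_MS = {"Lookups": 80, "Reads": 150, "Writes": 250}


def slow_mount_findings(timings):
    """Return (request type, field, value) for each value over its limit.

    timings maps a request type to a dict with "srtt" and "cur" in ms.
    """
    findings = []
    for kind, values in timings.items():
        if values.get("srtt", 0) > SRTT_LIMIT_MS:
            findings.append((kind, "srtt", values["srtt"]))
        limit = CUR_LIMITS_MS.get(kind)
        if limit is not None and values.get("cur", 0) > limit:
            findings.append((kind, "cur", values["cur"]))
    return findings


if __name__ == "__main__":
    # Made-up timings for illustration.
    sample = {
        "Lookups": {"srtt": 10, "cur": 90},
        "Reads": {"srtt": 60, "cur": 100},
        "Writes": {"srtt": 20, "cur": 200},
    }
    print(slow_mount_findings(sample))
```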

Tuesday, May 14, 2013

p-commands

In Unix, every object is either a file or a process. With the /proc virtual file system, even processes may be treated like files.

/proc (or procfs) is a virtual file system that allows us to examine processes like files. This means that /proc allows us to use file-like operations and intuitions when looking at processes. /proc does not occupy disk space; it is located in working memory. This structure was originally designed as a programming interface for writing debuggers, but it has grown considerably since then.

To avoid confusion, we will refer to the virtual file system as /proc or procfs. The man page for procfs is proc(4). proc, on the other hand, will be used to refer to the process data structure discussed in the Process Structure page.

Under /proc is a list of numbers, each of which is a Process ID (PID) for a process on our system. Under these directories are subdirectories referring to the different components of interest of each process. This directory structure can be examined directly, but we usually prefer to use commands written to extract information from this structure. These are known as the p-commands.

  • pcred: Display process credentials (for example, EUID/EGID, RUID/RGID, saved UIDs/GIDs).
  • pfiles: Reports fstat() and fcntl() information for all open files. This includes information on the inode number, file system, ownership and size.
  • pflags: Prints the tracing flags, pending and held signals and other /proc status information for each LWP.
  • pgrep: Finds processes matching certain criteria.
  • pkill: Kills specified processes.
  • pldd: Lists dynamic libraries linked to the process.
  • pmap: Prints process address space map.
  • prun: Starts stopped processes.
  • prstat: Display process performance-related statistics.
  • ps: List process information.
  • psig: Lists signal actions.
  • pstack: Prints a stack trace for each LWP in the process.
  • pstop: Stops the process.
  • ptime: Times the command; does not time children.
  • ptree: Prints process genealogy.
  • pwait: Wait for specified processes to complete.
  • pwdx: Prints process working directory.
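
Because /proc presents each process as a directory named after its PID, ordinary file operations are enough to enumerate processes. The following is a minimal sketch of that idea; the function name is hypothetical, and the proc_root parameter exists only so the function can be pointed at a different directory.

```python
import os


def list_pids(proc_root="/proc"):
    """Return the PIDs that appear as numeric entries under proc_root."""
    return sorted(int(name) for name in os.listdir(proc_root) if name.isdigit())


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        print(list_pids())
```

The p-commands remain the better choice for anything beyond a simple listing, since they interpret the per-process files for us.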

prstat Example 1

CPU saturation can be measured directly via prstat. (Saturation refers to a situation where there is not enough CPU capacity to adequately handle requests for processing resources.) To measure it, look at the CPU latency time for each thread reported by prstat -mL. (LAT is reported as a percentage of the time that a thread is waiting to use a processor.)

This example shows the prstat -mL output from a single-CPU system that has been overloaded. Notice the load average and LAT numbers.

PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
2724 root 24 0.2 0.0 0.0 0.0 0.0 2.2 74 284 423 361 0 gzip/1
2729 root 21 0.3 0.0 0.0 0.0 0.0 3.3 75 396 564 518 0 gzip/1
2733 root 20 0.3 0.0 0.0 0.0 0.0 5.3 75 391 514 484 0 gzip/1
2737 root 14 0.2 0.0 0.0 0.0 0.0 4.1 81 176 415 383 0 gzip/1
2730 root 3.3 0.3 0.0 0.0 0.0 0.0 96 0.7 602 258 505 0 gunzip/1
2734 root 2.9 0.3 0.0 0.0 0.0 0.0 92 4.5 522 280 457 0 gunzip/1
2738 root 2.7 0.2 0.0 0.0 0.0 0.0 93 3.9 377 147 370 0 gunzip/1
2725 root 2.4 0.2 0.0 0.0 0.0 0.0 95 2.4 495 179 355 0 gunzip/1
2728 root 0.1 1.4 0.0 0.0 0.0 0.0 97 1.7 769 11 2K 0 tar/1
2732 root 0.1 1.3 0.0 0.0 0.0 0.0 99 0.2 762 14 2K 0 tar/1
2723 root 0.0 1.1 0.0 0.0 0.0 0.0 99 0.1 564 7 1K 0 tar/1
2731 root 0.3 0.4 0.0 0.0 0.0 0.0 98 1.2 754 3 1K 0 tar/1
2735 root 0.3 0.4 0.0 0.0 0.0 0.0 98 0.9 722 0 1K 0 tar/1
2736 root 0.0 0.6 0.0 0.0 0.0 0.0 99 0.0 341 2 1K 0 tar/1
2726 root 0.3 0.3 0.0 0.0 0.0 0.0 98 1.0 473 145 1K 0 tar/1
2739 root 0.2 0.2 0.0 0.0 0.0 0.0 99 0.3 335 1 664 0 tar/1
2749 scromar 0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 23 0 194 0 prstat/1
337 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 6 0 36 6 xntpd/1
2716 scromar 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 3 1 21 0 sshd/1
124 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 3 0 17 0 picld/4
119 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 21 0 63 0 nscd/26
Total: 51 processes, 164 lwps, load averages: 4.12, 2.13, 0.88
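
Picking out the threads with high LAT values can be automated when the output is long. The sketch below parses a few lines copied from the example above; the function name and the 50% cutoff are assumptions made for illustration, not recommended values.

```python
PRSTAT_SAMPLE = """\
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
2724 root 24 0.2 0.0 0.0 0.0 0.0 2.2 74 284 423 361 0 gzip/1
2737 root 14 0.2 0.0 0.0 0.0 0.0 4.1 81 176 415 383 0 gzip/1
2730 root 3.3 0.3 0.0 0.0 0.0 0.0 96 0.7 602 258 505 0 gunzip/1
2728 root 0.1 1.4 0.0 0.0 0.0 0.0 97 1.7 769 11 2K 0 tar/1
"""


def high_latency_threads(text, lat_threshold=50.0):
    """Return (pid, thread name, LAT) for threads above lat_threshold percent."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    lat_col = header.index("LAT")
    name_col = header.index("PROCESS/LWPID")
    result = []
    for line in lines[1:]:
        fields = line.split()
        # Skip the "Total:" line and anything else that is not a thread row.
        if len(fields) != len(header) or not fields[0].isdigit():
            continue
        lat = float(fields[lat_col])
        if lat > lat_threshold:
            result.append((int(fields[0]), fields[name_col], lat))
    return result


if __name__ == "__main__":
    for row in high_latency_threads(PRSTAT_SAMPLE):
        print(row)
```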

prstat Example 2

In this case, we sort prstat output to look for the processes with heavy memory utilization:

# prstat -s rss
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
471 juser 125M 58M sleep 59 0 4:26:46 0.6% java/17
200 daemon 62M 55M sleep 59 0 0:01:21 0.0% nfsmapid/4
18296 juser 116M 39M sleep 26 11 0:05:36 0.1% java/23
...
254 root 3968K 1016K sleep 59 0 0:00:03 0.0% sshd/1
Total: 47 processes, 221 lwps, load averages: 0.20, 0.21, 0.20

Other Usage Examples

# ps -ef | grep more | grep -v grep
root 18494 8025 0 08:53:09 pts/3 0:00 more
# pgrep more
18494
# pmap -x 18494
18494: more
Address Kbytes RSS Anon Locked Mode Mapped File
00010000 32 32 - - r-x-- more
00028000 8 8 8 - rwx-- more
0002A000 16 16 16 - rwx-- [ heap ]
FF200000 864 824 - - r-x-- libc.so.1
FF2E8000 32 32 32 - rwx-- libc.so.1
FF2F0000 8 8 8 - rwx-- libc.so.1
FF300000 16 16 - - r-x-- en_US.ISO8859-1.so.3
FF312000 16 16 16 - rwx-- en_US.ISO8859-1.so.3
FF330000 8 8 - - r-x-- libc_psr.so.1
FF340000 8 8 8 - rwx-- [ anon ]
FF350000 168 104 - - r-x-- libcurses.so.1
FF38A000 32 32 24 - rwx-- libcurses.so.1
FF392000 8 8 8 - rwx-- libcurses.so.1
FF3A0000 24 16 16 - rwx-- [ anon ]
FF3B0000 184 184 - - r-x-- ld.so.1
FF3EE000 8 8 8 - rwx-- ld.so.1
FF3F0000 8 8 8 - rwx-- ld.so.1
FFBFC000 16 16 16 - rw--- [ stack ]
-------- ------- ------- ------- -------
total Kb 1456 1344 168 -
# pstack 18494
18494: more
ff2c0c7c read (2, ffbff697, 1)
00015684 ???????? (0, 1, 43858, ff369ad4, 0, 28b20)
000149a4 ???????? (ffbff82f, 28400, 15000000, 28af6, 0, 28498)
00013ad8 ???????? (0, 28b10, 28c00, 400b0, ff2a4a74, 0)
00012780 ???????? (2a078, ff393050, 0, 28b00, 2a077, 6b)
00011c68 main (28b10, ffffffff, 28c00, 0, 0, 1) + 684
000115cc _start (0, 0, 0, 0, 0, 0) + 108
# pfiles 18494
18494: more
Current rlimit: 256 file descriptors
0: S_IFIFO mode:0000 dev:292,0 ino:2083873 uid:0 gid:0 size:0
O_RDWR
1: S_IFCHR mode:0620 dev:284,0 ino:12582922 uid:1000 gid:7 rdev:24,3
O_RDWR|O_NOCTTY|O_LARGEFILE
/devices/pseudo/pts@0:3
2: S_IFCHR mode:0620 dev:284,0 ino:12582922 uid:1000 gid:7 rdev:24,3
O_RDWR|O_NOCTTY|O_LARGEFILE
/devices/pseudo/pts@0:3
# pcred 18494
18494: e/r/suid=0 e/r/sgid=0
groups: 0 1 2 3 4 5 6 7 8 9 12
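
When a process has many mappings, it can help to total the pmap -x RSS column by mapped file. The sketch below does this for a few lines copied from the pmap output above; the function name is hypothetical, and the parsing assumes the seven-column layout shown in that output.

```python
PMAP_SAMPLE = """\
00010000 32 32 - - r-x-- more
00028000 8 8 8 - rwx-- more
0002A000 16 16 16 - rwx-- [ heap ]
FF200000 864 824 - - r-x-- libc.so.1
FF2E8000 32 32 32 - rwx-- libc.so.1
FF2F0000 8 8 8 - rwx-- libc.so.1
FFBFC000 16 16 16 - rw--- [ stack ]
"""


def rss_by_mapping(text):
    """Sum the RSS column (Kbytes) of pmap -x output per mapped file."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Mapping rows have address, Kbytes, RSS, Anon, Locked, Mode, file.
        if len(fields) < 7 or not fields[2].isdigit():
            continue
        name = " ".join(fields[6:])     # rejoin names such as "[ heap ]"
        totals[name] = totals.get(name, 0) + int(fields[2])
    return totals


if __name__ == "__main__":
    for name, rss in sorted(rss_by_mapping(PMAP_SAMPLE).items()):
        print(name, rss)
```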