Intuitively, the load average is an average over time of the number of processes in the run queue. uptime reports load averages over 1-, 5- and 15-minute intervals. Typically, load averages are divided by the number of CPU cores to find the load per CPU. A load average of 1 per CPU indicates that the CPUs are fully utilized; above that level, runnable work is queuing for a processor. Depending on the type of load and the I/O requirements, user-visible performance may not suffer until levels of 2 per CPU are reached. A general rule of thumb is that load averages persistently above 4 times the number of CPUs will result in sluggish performance.
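For example, the per-CPU figure can be eyeballed by comparing uptime against the processor count reported by psrinfo. The output below is illustrative only, not from any particular system:

    uptime
     10:15am  up 12 day(s),  3:02,  2 users,  load average: 3.85, 3.60, 3.42
    psrinfo | wc -l
           4
    # 3.85 / 4 is roughly 0.96 load per CPU: the processors are essentially
    # saturated, but still well under the 4-per-CPU "sluggish" threshold.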
Prior to Solaris 10, the algorithm computed the load average directly by periodically sampling the length of the run queue. Since this measurement can be skewed by threads that enter and exit the queue more quickly than the sampling interval, Solaris 10 altered the algorithm to use microstate accounting instead. Solaris 10 applies an exponential decay function to a combination of high-resolution usr, sys and thread wait times; the resulting numbers are comparable to a traditional load average.
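The exact kernel computation isn't reproduced here, but the general shape of an exponentially decayed average is easy to sketch. The awk fragment below is purely illustrative; the decay period, update interval and sample values are made up:

    awk 'BEGIN {
        T = 60          # decay period in seconds (a 1-minute average)
        dt = 5          # update interval in seconds
        avg = 0.50      # previous average
        sample = 3.0    # current instantaneous value being folded in
        f = exp(-dt / T)
        printf "new average = %.3f\n", avg * f + sample * (1 - f)
    }'
    new average = 0.700

Each update pulls the average a fraction of the way toward the current sample, so brief spikes fade gradually rather than disappearing at the next sample.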
The load averages can be monitored intermittently via uptime, or over extended time periods by looking at run queue lengths and the amount of time the run queue is occupied via sar -q.
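As a rough sketch (the numbers are invented), sar -q reports the average run queue length (runq-sz) and the percentage of time the run queue was occupied (%runocc):

    sar -q 5 3
    10:20:01  runq-sz %runocc swpq-sz %swpocc
    10:20:06      2.0      45     0.0       0
    10:20:11      3.5      62     0.0       0
    10:20:16      1.0      20     0.0       0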
One issue to watch for is the number of processes that are blocked while waiting for I/O. Check the disk I/O page for information on monitoring this.
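A quick first look is the b column in vmstat's kthr section, which counts kernel threads blocked waiting for I/O (sample output is illustrative):

    vmstat 5
     kthr      memory            page            disk          faults      cpu
     r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
     2 3 0 812416 94321   5  20  0  0  0  0  0  1  0  0  0  412 9800 1200 55 20 25
    # b is 3 here: three threads were blocked on I/O during the interval.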
Solaris 10 allows us to directly monitor the amount of time threads wait for a processor via the LAT column of prstat -mL output.
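For example (the values are invented), LAT is the percentage of its time each LWP spent runnable but waiting for a CPU:

    prstat -mL 5
       PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
      1234 oracle    55  10 0.1 0.0 0.0 0.0  15  20  1K 300  5K   0 oracle/2
    # A LAT of 20 means this thread spent about a fifth of the interval
    # waiting for a processor.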
For non-NFS servers, another danger sign is when the system consistently spends more time in sys than usr mode. (nfsd operates in the kernel, in sys mode.) McDougall and Mauro comment that a typical usr/sys ratio is in the neighborhood of 70/30 on a reasonably loaded system.
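The usr/sys split can be checked with sar -u; in the illustrative sample below the ratio is roughly 62/28, close to the 70/30 guideline:

    sar -u 5 3
    10:30:01   %usr   %sys   %wio  %idle
    10:30:06     62     28      4      6
    10:30:11     64     26      3      7
    10:30:16     60     30      4      6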
Another issue to watch for is a high number of system calls per second per processor. With today's faster CPUs, 20,000 would represent a reasonable threshold. This can be monitored via sar -c.
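sar -c breaks the rate down by call type; scall/s is the total, with fork/s and exec/s called out separately (sample numbers are invented):

    sar -c 5 3
    10:35:01  scall/s sread/s swrit/s  fork/s  exec/s rchar/s wchar/s
    10:35:06    24000    3200    1800    2.20    1.80  812416  418212
    10:35:11    26500    3500    2100    2.60    2.00  903112  450031
    # On a 2-CPU system this is over 12,000 calls per second per processor,
    # still under the 20,000 threshold mentioned above.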
In particular, large numbers of forks or execs may indicate excessive context switching. (Slower processors will be able to handle fewer system calls per second.) Context switching can be monitored with vmstat or mpstat.
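mpstat reports per-CPU voluntary (csw) and involuntary (icsw) context switches per second, while vmstat's cs column gives a system-wide figure. The mpstat sample below is illustrative:

    mpstat 5
    CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
      0   10   0   20   420  180 3400  220   15   40    0 21000   60  25   0  15
      1   12   0   25   380  150 3100  190   12   35    0 19500   58  27   0  15
    # A high icsw relative to csw suggests threads are being forced off the
    # CPU rather than yielding it voluntarily.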