- Mutexes
- Semaphores (counters) (not the same as IPC semaphores)
- Condition variables (generalized semaphores)
- Multiple-reader, single-writer locks
The following types of locking problems can occur:
- Lock contention (due to excessively coarse granularity or inappropriate lock type)
- Deadlock (each process is waiting for a lock held by another process)
- Lost locks
- Race conditions
- Incomplete or buggy lock implementation
Mutex Locks
A "mutex lock" is a "mutual exclusion lock." On SPARC, it is implemented with the LDSTUB (load-store-unsigned-byte) instruction, which is an atomic (indivisible) operation that reads a byte from memory and writes 0xFF into that location. (When the lock is cleared, 0x00 is written back to the memory location.)
If the value that was read from memory is already 0xFF, another processor has already set the lock. At that point, the processor can "spin" by sitting in a loop and testing to see if the lock has cleared (i.e., been written back to 0x00). This sort of "spin lock" is usually used when the wait time for the lock is expected to be short. (If the wait is expected to be longer, the process should sleep so that the CPU can be used by another process. This is known as a "block.")
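The test-and-set behavior described above can be sketched in portable C. This is a minimal illustration, not the Solaris kernel implementation: it assumes a C11 compiler and uses atomic_flag as a stand-in for the byte that LDSTUB operates on. The names spinlock, spin_trylock, spin_lock, and spin_unlock are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin lock type. A clear flag plays the role of 0x00
 * (unlocked) and a set flag plays the role of 0xFF (locked). */
typedef struct {
    atomic_flag held;
} spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One atomic test-and-set attempt, analogous to a single LDSTUB.
 * Returns true if the flag was previously clear, meaning we now own
 * the lock; false if another holder had already set it. */
bool spin_trylock(spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is obtained. Only sensible when the expected
 * wait is short; a longer wait should block instead. */
void spin_lock(spinlock *l) {
    while (!spin_trylock(l)) {
        /* busy-wait */
    }
}

/* Clear the lock, analogous to writing 0x00 back to the lock byte. */
void spin_unlock(spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would declare a lock with spinlock l = SPINLOCK_INIT; and bracket the critical section with spin_lock(&l) and spin_unlock(&l).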
Adaptive Locks
Solaris 2.x provides a type of locking known as adaptive locks. When one thread attempts to acquire one of these that is held by another thread, it checks to see if the second thread is active on a processor. If it is, the first thread spins. If the second thread is blocked, the first thread blocks as well.
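The decision an adaptive lock makes can be reduced to a small sketch. This is only an illustration of the rule stated above, not kernel code: the wait_strategy type and the adaptive_wait_strategy function are hypothetical, and how the kernel actually determines whether the lock holder is on a processor is not shown.

```c
#include <stdbool.h>

/* Hypothetical names for the two ways a waiting thread can wait. */
typedef enum {
    WAIT_SPIN,   /* busy-wait: the holder is running and may release soon */
    WAIT_BLOCK   /* sleep: the holder is not running, so spinning is wasted */
} wait_strategy;

/* Choose how to wait for a lock that is currently held, given whether
 * the holding thread is active on a processor (an input assumed to be
 * supplied by the scheduler). */
wait_strategy adaptive_wait_strategy(bool holder_running_on_cpu) {
    return holder_running_on_cpu ? WAIT_SPIN : WAIT_BLOCK;
}
```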
Read/Write Locks
This type of lock allows multiple concurrent reads, but prevents other accesses of the resource when writes are taking place.
Lock Contention Indicators
One indicator of a possible lock contention problem is when vmstat reports that the system is not idle, but that cpu/sy dominates cpu/us. (Note: this indicator is only meaningful if the system is not running an NFS server or other major service that runs from inside the kernel.)
One way to pin down a lock contention problem is by tracing the problem process with truss.
Another way to attempt to track down the problem is with mpstat. The smtx measurement shows the number of times a CPU failed to obtain a mutex immediately. The master CPU (the one taking the clock interrupt, usually CPU 0) will tend to have a high reading. Depending upon CPU speed, a reading of more than 500 may be an indication of a system in trouble. If smtx is greater than 500 on a single CPU and sys dominates usr (i.e., system time is larger than user time, and system time is greater than 20%), it is likely that mutex contention is occurring.
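The rule of thumb above can be written down as a small check. This is a sketch only: the mpstat_sample structure and mutex_contention_likely function are hypothetical, the values are assumed to have been read from one per-CPU line of mpstat output, and the thresholds are simply the ones quoted in the text (which, as noted, depend on CPU speed).

```c
#include <stdbool.h>

/* Hypothetical container for the mpstat columns the heuristic uses. */
typedef struct {
    int cpu;   /* CPU id */
    int smtx;  /* times this CPU failed to obtain a mutex immediately */
    int usr;   /* percent user time */
    int sys;   /* percent system time */
} mpstat_sample;

/* Returns true when a single CPU's sample matches the heuristic:
 * smtx above 500, system time larger than user time, and system
 * time above 20%. */
bool mutex_contention_likely(const mpstat_sample *s) {
    return s->smtx > 500 && s->sys > s->usr && s->sys > 20;
}
```

A positive result is only a hint to investigate further, for example with lockstat; it is not proof of a contention problem.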
Similarly, the mpstat/srw value reports the number of times that a CPU failed to obtain a read/write lock immediately.
For Solaris 2.6 and above, the lockstat command can help to pin down the culprit. The kernel takes a performance hit while lockstat is running, so you probably only want to use this command while you are actually looking at the output.
With lockstat, look for large counts (indv), especially with long locking times (nsec).
In any case, extreme mutex contention problems should be reported to Sun. Changes have been implemented in current versions of the SunOS 5.x kernel that dramatically increase the scalability of the operating system over multiple processors. Unless additional issues are brought to the vendor's attention, the vendor cannot be expected to correct them in future releases.