A process can exist in one of the following states: running, sleeping or ready.
Kernel Threads Model
The Solaris 10 kernel threads model consists of the following major objects:
- kernel threads: This is what is scheduled/executed on a processor.
- user threads: The user-level thread state within a process.
- process: The object that tracks the execution environment of a program.
- lightweight process (lwp): Execution context for a user thread. Associates a user thread with a kernel thread.
In the Solaris 10 kernel, kernel services and tasks are executed as kernel threads. When a user thread is created, the associated lwp and kernel threads are also created and linked to the user thread.
(This single-level model was first introduced in Solaris 8's alternative threads library, which was made the default in Solaris 9. Prior to that, user threads had to bind to an available lwp before becoming eligible to run on the processor.)
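The relationship between these objects can be sketched as a small data model. This is only an illustration of the one-to-one linkage described above; the class names, fields and the `create_thread` helper are hypothetical and do not correspond to the kernel's actual structures.

```python
from dataclasses import dataclass, field
from itertools import count

_kthread_ids = count(1)  # hypothetical ID source for this sketch


@dataclass
class KernelThread:
    """The entity that is scheduled and executed on a processor."""
    kid: int


@dataclass
class LWP:
    """Execution context that associates a user thread with a kernel thread."""
    kthread: KernelThread


@dataclass
class UserThread:
    """User-level thread state within a process."""
    name: str
    lwp: LWP


@dataclass
class Process:
    """Tracks the execution environment of a program."""
    pid: int
    threads: list = field(default_factory=list)

    def create_thread(self, name: str) -> UserThread:
        # Single-level model: creating a user thread also creates its LWP
        # and kernel thread, and links all three together.
        kthread = KernelThread(kid=next(_kthread_ids))
        thread = UserThread(name=name, lwp=LWP(kthread=kthread))
        self.threads.append(thread)
        return thread


proc = Process(pid=100)
worker = proc.create_thread("worker")
print(worker.name, "runs on kernel thread", worker.lwp.kthread.kid)
```

Each call to `create_thread` yields a distinct kernel thread, which is the point of the single-level model: no user thread waits for a free lwp.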
The Solaris kernel is fully preemptible. This means that all threads, including the threads that support the kernel's own activities, can be deferred to allow a higher-priority thread to run.
Solaris recognizes 170 different priorities, 0-169. Within these priorities fall a number of different scheduling classes:
- TS (timeshare): This is the default class for processes and their associated kernel threads. Priorities within this class range 0-59, and are dynamically adjusted in an attempt to allocate processor resources evenly.
- IA (interactive): This is an enhanced version of the TS class that applies to the in-focus window in the GUI. Its intent is to give extra resources to processes associated with that specific window. Like TS, IA's range is 0-59.
- FSS (fair-share scheduler): This class is share-based rather than priority-based. Threads managed by FSS are scheduled based on their associated shares and the processor's utilization. FSS also has a range 0-59.
- FX (fixed-priority): The priorities for threads associated with this class are fixed. (In other words, they do not vary dynamically over the lifetime of the thread.) FX also has a range 0-59.
- SYS (system): The SYS class is used to schedule kernel threads. Threads in this class are "bound" threads, which means that they run until they block or complete. Priorities for SYS threads are in the 60-99 range.
- RT (real-time): Threads in the RT class are fixed-priority, with a fixed time quantum. Their priorities range 100-159, so an RT thread will preempt a system thread.
Of these, FSS and FX were implemented in Solaris 9. (An extra-cost option for Solaris 8 included the SHR (share-based) class, but this has been subsumed into FSS.)
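As a rough illustration, the priority ranges listed above can be written as a lookup table. The function names are hypothetical, and the label for 160-169 is an assumption: the class list above does not say what occupies that range.

```python
# Global priority ranges taken from the scheduling class list above.
# The 160-169 entry is an assumption; the list does not name that range.
PRIORITY_RANGES = [
    (0, 59, "TS/IA/FSS/FX"),
    (60, 99, "SYS"),
    (100, 159, "RT"),
    (160, 169, "interrupt threads (assumption, not in the list above)"),
]


def classes_for_priority(priority: int) -> str:
    """Return the scheduling classes whose range contains this priority."""
    for low, high, label in PRIORITY_RANGES:
        if low <= priority <= high:
            return label
    raise ValueError(f"priority {priority} is outside 0-169")


def preempts(candidate: int, running: int) -> bool:
    """A thread with a higher global priority preempts a lower one."""
    return candidate > running


print(classes_for_priority(45))
print(classes_for_priority(120))
print(preempts(100, 99))  # lowest RT priority versus highest SYS priority
```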
Fair Share Scheduler
The default Timesharing (TS) scheduling class in Solaris attempts to allow each process on the system to have relatively equal CPU access. The nice command allows some management of process priority, but the new Fair Share Scheduler (FSS) allows more flexible process priority management that integrates with the project framework.
Each project is allocated a certain number of CPU shares via the project.cpu-shares resource control. Each project is allocated CPU time based on its cpu-shares value divided by the sum of the cpu-shares values for all active projects.
Anything with a zero cpu-shares value will not be granted CPU time until all projects with non-zero cpu-shares are done with the CPU.
The maximum number of shares that can be assigned to any one project is 65535.
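The share arithmetic can be sketched in a few lines. This is a simplified model of the ratio described above, not the scheduler's actual algorithm; the function name and the project names are made up for illustration, and the input is assumed to contain only active projects.

```python
MAX_SHARES = 65535  # maximum shares per project, as stated above


def cpu_fractions(shares: dict) -> dict:
    """Map each active project to its share of CPU under contention.

    Each project's entitlement is its cpu-shares value divided by the sum
    of the cpu-shares values of all active projects.
    """
    for project, value in shares.items():
        if not 0 <= value <= MAX_SHARES:
            raise ValueError(f"{project}: shares must be in 0..{MAX_SHARES}")
    total = sum(shares.values())
    if total == 0:
        # No project holds any shares, so none is entitled to anything.
        return {project: 0.0 for project in shares}
    return {project: value / total for project, value in shares.items()}


# Hypothetical projects: "idle" has zero shares and so receives no
# entitlement while the other two projects want the CPU.
print(cpu_fractions({"web": 30, "batch": 10, "idle": 0}))
```

Note that shares are relative: doubling every project's value leaves the resulting fractions unchanged.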
FSS can be assigned to processor sets, resulting in finer control of priorities on a server than raw processor sets provide. The dispadmin command controls the assignment of schedulers to processor sets, using a form like:
dispadmin -d FSS
To enable this change now, rather than after the next reboot, run a command like the following:
priocntl -s -c FSS -i all
prctl can set cpu-shares for a project:
prctl -r -n project.cpu-shares -v number-shares -i project project-name
The Fair Share Scheduler should not be combined with the TS, FX (fixed-priority) or IA (interactive) scheduling classes on the same CPU or processor set. All of these scheduling classes use priorities in the same range, so unexpected behavior can result from combining FSS with any of these. (There is no problem, however, with running TS and IA on the same processor set.)
To move a specific project's processes into FSS, run something like:
priocntl -s -c FSS -i projid project-ID
All processes can be moved into FSS by first converting init, then the
rest of the processes:
priocntl -s -c FSS -i pid 1
priocntl -s -c FSS -i all
Time Slicing for TS and IA
TS and IA scheduling classes implement an adaptive time slicing scheme that increases the priority of I/O-bound processes at the expense of compute-bound processes. The exact values that are used to implement this can be found in the dispatch table. To examine the TS dispatch table, run the command dispadmin -c TS -g. (If units are not specified, dispadmin reports time values in ms.)
The following values are reported in the dispatch table:
- ts_quantum: This is the default length of time assigned to a process with the specified priority.
- ts_tqexp: This is the new priority that is assigned to a process that uses its entire time quantum.
- ts_slpret: The new priority assigned to a process that blocks before using its entire time quantum.
- ts_maxwait: If a thread does not receive CPU time during a time interval of ts_maxwait, its priority is raised to ts_lwait.
The man page for ts_dptbl contains additional information about these parameters.
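To make the role of each column concrete, here is a toy model of a dispatch table lookup. The three-level table and its values are invented for illustration (a real TS table has 60 levels), the event names are hypothetical, and ts_lwait is taken from the ts_dptbl man page as the priority a thread receives after waiting longer than ts_maxwait.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TsEntry:
    ts_quantum: int  # time slice length at this priority
    ts_tqexp: int    # new priority after using the entire quantum
    ts_slpret: int   # new priority after blocking before the quantum expires
    ts_maxwait: int  # wait interval after which the starvation boost applies
    ts_lwait: int    # new priority after waiting longer than ts_maxwait


# Invented three-level table, indexed by priority; not real Solaris values.
TABLE = {
    0: TsEntry(ts_quantum=200, ts_tqexp=0, ts_slpret=1, ts_maxwait=0, ts_lwait=1),
    1: TsEntry(ts_quantum=120, ts_tqexp=0, ts_slpret=2, ts_maxwait=0, ts_lwait=2),
    2: TsEntry(ts_quantum=40, ts_tqexp=1, ts_slpret=2, ts_maxwait=0, ts_lwait=2),
}


def next_priority(table: dict, priority: int, event: str) -> int:
    """Look up the priority a thread moves to after a scheduling event."""
    entry = table[priority]
    if event == "quantum_expired":
        return entry.ts_tqexp
    if event == "sleep_return":
        return entry.ts_slpret
    if event == "waited_too_long":
        return entry.ts_lwait
    raise ValueError(f"unknown event: {event}")


def run_events(table: dict, priority: int, events: list) -> int:
    """Apply a sequence of events and return the final priority."""
    for event in events:
        priority = next_priority(table, priority, event)
    return priority


# A compute-bound thread keeps burning its quantum and decays.
print(run_events(TABLE, 2, ["quantum_expired", "quantum_expired"]))
# An I/O-bound thread keeps blocking early and climbs.
print(run_events(TABLE, 0, ["sleep_return", "sleep_return"]))
```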
dispadmin can be used to edit the dispatch table to affect the decay of priority for compute-bound processes or the growth in priority for I/O-bound processes. The importance of the different types of processing on different systems will make a difference in how these parameters are tweaked. In particular, ts_lwait can prevent CPU starvation, and raising ts_tqexp slightly can slow the decline in priority of CPU-bound processes.
In any case, the dispatch tables should only be altered slightly at each step in the tuning process, and should only be altered at all if you have a specific goal in mind.
The following are some of the sorts of changes that can be made:
- Decreasing ts_quantum favors IA class objects.
- Increasing ts_quantum favors compute-bound objects.
- ts_maxwait and ts_lwait control CPU starvation.
- ts_tqexp can cause compute-bound objects' priorities to decay more or less rapidly.
- ts_slpret can cause I/O-bound objects' priorities to rise more or less rapidly.
RT objects time slice differently in that ts_tqexp and ts_slpret do not increase or decrease the priority of the object. Each RT thread will execute until its time slice is up or it is blocked while waiting for a resource.
IA objects add 10 to the regular TS priority of the process in the active window. This priority shifts with the focus on the active window.
Time Slicing for FSS
In FSS, the time quantum is the length of time that a thread is allowed
to run before it has to release the processor. This can be checked using
dispadmin -c FSS -g
QUANTUM is reported in ms. (The output of the above command
displays the resolution in the
RES parameter. The default is
1000 slices per second.) It can be adjusted using
dispadmin as well. First, run the above command and capture the
output to a text file (filename.txt). Then run the command:
dispadmin -c FSS -s filename.txt
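The capture-edit-load cycle above can be scripted. The sketch below only manipulates the captured text; it does not run dispadmin. The sample layout (comment lines plus RES= and QUANTUM= lines) and the value 110 are assumptions about what the command prints, and the helper names are hypothetical.

```python
# Assumed layout of the captured "dispadmin -c FSS -g" output.
SAMPLE = """#
# Fair Share Scheduler Configuration
#
RES=1000
#
# Time Quantum
#
QUANTUM=110
"""


def parse_fss_config(text: str) -> dict:
    """Collect KEY=VALUE integer settings, skipping comments and blanks."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = int(value)
    return values


def quantum_ms(config: dict) -> float:
    """Convert QUANTUM from 1/RES-second units into milliseconds."""
    return config["QUANTUM"] * 1000 / config["RES"]


def with_quantum(text: str, new_quantum: int) -> str:
    """Return the configuration text with the QUANTUM line replaced."""
    lines = []
    for line in text.splitlines():
        if line.strip().startswith("QUANTUM="):
            line = f"QUANTUM={new_quantum}"
        lines.append(line)
    return "\n".join(lines) + "\n"


config = parse_fss_config(SAMPLE)
print(quantum_ms(config), "ms")
print(with_quantum(SAMPLE, 20))
```

The text returned by `with_quantum` would be written to filename.txt before loading it with the -s option.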
Callouts
Solaris handles callouts with a callout thread that runs at maximum system priority, which is still lower than any RT thread. RT callouts are handled separately and are invoked at the lowest interrupt level, which ensures prompt processing.
Priority Inheritance
Each thread has two priorities: global priority and inherited priority. The inherited priority is normally zero unless the thread is sitting on a resource that is required by a higher-priority thread.
When a thread blocks on a resource, it attempts to "will" or pass on its priority to all threads that are directly or indirectly blocking it. The pi_willto() function checks each thread that is blocking the resource or that is blocking a thread in the synchronization chain. When it sees threads that are a lower priority, those threads inherit the priority of the blocked thread. It stops traversing the synchronization chain when it hits an object that is not blocked or is higher priority than the willing thread.
This mechanism is of limited use when considering condition variables, semaphores or read/write locks. In the latter case, an owner-of-record is defined, and the inheritance works as above. If there are several threads sharing a read lock, however, the inheritance only works on one thread at a time.
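The chain traversal can be illustrated with a simplified model. This is not the kernel's pi_willto() code; the `Thread` class, its fields and `will_priority` are hypothetical, each thread is assumed to wait on at most one owner, and stopping on equal priority is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    name: str
    priority: int                          # global priority
    inherited: int = 0                     # inherited priority, normally zero
    blocked_by: Optional["Thread"] = None  # owner of the resource we wait on

    @property
    def effective(self) -> int:
        """The priority the thread actually runs at."""
        return max(self.priority, self.inherited)


def will_priority(waiter: Thread) -> list:
    """Pass the waiter's priority down the blocking chain.

    Returns the names of the threads that were boosted, in chain order.
    """
    boosted = []
    owner = waiter.blocked_by
    while owner is not None:
        if owner.effective >= waiter.effective:
            break  # already at least as high: stop traversing
        owner.inherited = waiter.effective
        boosted.append(owner.name)
        owner = owner.blocked_by  # None once the owner is not itself blocked
    return boosted


# "high" waits on "mid", which in turn waits on "low".
low = Thread("low", priority=10)
mid = Thread("mid", priority=20, blocked_by=low)
high = Thread("high", priority=50, blocked_by=mid)
print(will_priority(high), low.effective, mid.effective)
```

Boosting `low` as well as `mid` matters here: `mid` cannot release its resource until `low` runs, so raising only the direct owner would not unblock `high`.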
Thundering Herd
When a resource is freed, all threads awaiting that resource are woken. This results in a footrace to obtain access to that object; one succeeds and the others return to sleep. This can lead to wasted overhead for context switches, as well as a problem with lower-priority threads obtaining access to an object before a higher-priority thread. This is called a "thundering herd" problem.
Priority inheritance is an attempt to deal with this problem, but some types of synchronization do not use inheritance.
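A back-of-the-envelope count shows why waking everyone is expensive. The model below is a deliberate simplification with hypothetical function names: it assumes no new waiters arrive, and that each release either wakes every remaining waiter (one wins, the rest go back to sleep) or wakes exactly one.

```python
def wakeups_with_broadcast(waiters: int) -> int:
    """Total wakeups needed to serve all waiters when every release wakes all."""
    total = 0
    remaining = waiters
    while remaining > 0:
        total += remaining  # every remaining thread is woken
        remaining -= 1      # exactly one of them obtains the resource
    return total


def wakeups_with_signal(waiters: int) -> int:
    """Total wakeups when each release wakes a single waiter."""
    return waiters


for n in (4, 16, 64):
    print(n, wakeups_with_broadcast(n), wakeups_with_signal(n))
```

In this model the broadcast cost grows quadratically with the number of waiters, while the single-wakeup cost grows linearly.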
Turnstiles
Each synchronization object (lock) contains a pointer to a structure known as a turnstile. These contain the data needed to manipulate the synchronization object, such as a queue of blocked threads and a pointer to the thread that is currently using the resource. Turnstiles are dynamically allocated based on the number of allocated threads on the system. A turnstile is allocated by the first thread that blocks on a resource and is freed when no more threads are blocked on the resource.
Turnstiles queue the blocked threads according to their priority. Turnstiles may issue a signal to wake up the highest-priority thread, or they may issue a broadcast to wake up all sleeping threads.
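The queueing behavior can be modeled with a priority heap. The `Turnstile` class below is a hypothetical illustration, not the kernel structure; in particular, waking threads of equal priority in arrival order is an assumption of this sketch.

```python
import heapq
from itertools import count


class Turnstile:
    """Toy model: the resource owner plus a priority-ordered sleep queue."""

    def __init__(self, owner=None):
        self.owner = owner      # thread currently using the resource
        self._heap = []         # entries are (-priority, arrival, name)
        self._arrival = count() # tie-breaker: FIFO within one priority

    def block(self, name: str, priority: int) -> None:
        """Queue a thread that must wait for the resource."""
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))

    def signal(self):
        """Wake only the highest-priority waiter, or return None if empty."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name

    def broadcast(self) -> list:
        """Wake every waiter, highest priority first."""
        woken = []
        while self._heap:
            woken.append(self.signal())
        return woken

    def __len__(self) -> int:
        return len(self._heap)


ts = Turnstile(owner="holder")
ts.block("a", 10)
ts.block("b", 50)
ts.block("c", 30)
print(ts.signal())     # highest-priority waiter
print(ts.broadcast())  # everyone else, in priority order
```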
Adjusting Priorities
The priority of a process can be adjusted with nice, and the priority of an LWP can be controlled with priocntl().
Real Time Issues
STREAMS processing is moved into its own kernel threads, which run at a lower priority than RT threads. If an RT thread places a STREAMS request, it may be serviced at a lower priority level than is merited.
Real time processes also lock all their pages in memory. This can cause problems on a system that is underconfigured for the amount of memory that is required.
Since real time processes run at such a high priority, system daemons may suffer if the real time process does not permit them to run.
When a real time process forks, the new process also inherits real time privileges. The programmer must take care to prevent unintended consequences. Loops can also be hard to stop, so the programmer also needs to make sure that the program does not get caught in an infinite loop.
Interrupts
Interrupt levels run between 0 and 15. Some typical interrupts include:
- soft interrupts
- SCSI/FC disks (3)
- Tape, Ethernet
- clock() (10)
- serial communications
- real-time CPU clock
- Nonmaskable interrupts (15)